Re: NiFi processor to convert CSV to XML

2016-08-25 Thread Andy LoPresto
Hi Ram,

NiFi does not currently have a ConvertCSVToXML processor. A couple suggestions:

* Use the ExecuteScript processor with a small Groovy script which reads the 
CSV file (located via an attribute holding the file path, or taken from the 
incoming flowfile content), parses the contents (a simple split on lines and 
then “,” should be sufficient), and then serializes to XML using 
`groovy.xml.MarkupBuilder` to format the output.
* Use an XSLT transform with the TransformXML processor. I found this XSLT 
file which reads a CSV file and transforms it, but I cannot vouch for it [1].

[1] http://andrewjwelch.com/code/xslt/csv/csv-to-xml_v2.html 
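
The parse-then-serialize approach in the first suggestion can be sketched roughly as follows. This is a minimal standalone Python sketch rather than the Groovy/MarkupBuilder script itself; the function name `csv_to_xml` and the `rows`/`row` element names are assumptions made for illustration, and it assumes the first CSV line is a header whose names are valid XML element names. Inside ExecuteScript the text would come from the incoming flowfile content instead of a string literal.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, root_tag="rows", row_tag="row"):
    """Convert CSV text (first line is the header) to an XML string.

    root_tag and row_tag are illustrative names, not anything NiFi defines.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    root = ET.Element(root_tag)
    for record in reader:
        row = ET.SubElement(root, row_tag)
        for name, value in record.items():
            # Each header name becomes an element name, so it must be a
            # valid XML name (no spaces, not starting with a digit).
            ET.SubElement(row, name).text = value
    return ET.tostring(root, encoding="unicode")


sample = "id,name\n1,Alice\n2,Bob\n"
print(csv_to_xml(sample))
```

Using the `csv` module instead of a plain split on "," means quoted fields that contain commas are handled as well.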


Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Aug 25, 2016, at 3:16 PM, Nathamuni, Ramanujam  wrote:
> 
> Hello All,
> 
> I am looking for a processor to convert a CSV file to XML. I looked at the 
> processors available but I do not see one for CSV to XML. Is there any 
> workaround using other processors to do this job? Or can anyone write a new 
> processor for this function?
> 
> 
> Thanks,
> Ram
> 
> *
> This e-mail may contain confidential or privileged information.
> If you are not the intended recipient, please notify the sender immediately 
> and then delete it.
> 
> TIAA
> *
> 





Re: NiFi processor to convert CSV to XML

2016-08-25 Thread Matt Burgess
Ram,

You could use the ExecuteScript processor if you are comfortable with
scripting in Groovy, JavaScript, Jython, JRuby, or Lua. I have an
example [1] of reading in a file and splitting on a delimiter (like a
comma). If you use Groovy, you can leverage the MarkupBuilder [2] to
build XML. Please let me know if I can offer more guidance.

Regards,
Matt

[1] 
http://funnifi.blogspot.com/2016/02/executescript-explained-split-fields.html
[2] http://groovy-lang.org/processing-xml.html#_creating_xml

On Thu, Aug 25, 2016 at 6:16 PM, Nathamuni, Ramanujam
 wrote:
> Hello All,
>
>
>
> I am looking for a processor to convert a CSV file to XML. I looked at the
> processors available but I do not see one for CSV to XML. Is there any
> workaround using other processors to do this job? Or can anyone write a new
> processor for this function?
>
>
>
>
>
> Thanks,
>
> Ram
>
>


NiFi processor to convert CSV to XML

2016-08-25 Thread Nathamuni, Ramanujam
Hello All,

I am looking for a processor to convert a CSV file to XML. I looked at the 
processors available but I do not see one for CSV to XML. Is there any 
workaround using other processors to do this job? Or can anyone write a new 
processor for this function?


Thanks,
Ram


Re: dynamic getTwitter ?

2016-08-25 Thread Andy LoPresto
And of course the Developer Guide [1] and Contributor Guide [2] on the NiFi 
site.

[1] https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html
[2] https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide 



Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Aug 25, 2016, at 11:48 AM, Andy LoPresto  wrote:
> 
> Another crazy idea — would it be more computationally efficient to use NiFi’s 
> REST API to add a new instance of the GetTwitter processor if a new endpoint 
> was needed? Basically track using the state manager which terms are currently 
> registered (a map of terms to processor IDs) and if a new term needs to be 
> searched, duplicate an existing processor and replace the search term? They 
> could all be located in a specific PG to allow for isolation from the 
> “meta-flow” that is operating on NiFi itself.
> 
> Andy LoPresto
> alopre...@apache.org 
> alopresto.apa...@gmail.com 
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> 
>> On Aug 25, 2016, at 11:45 AM, Andy LoPresto wrote:
>> 
>> Yeah I had a feeling there was a reason it didn’t support EL in the first 
>> place but didn’t know enough of the context. Thanks Aldrin.
>> 
>> @Sven,
>> 
>> Writing a custom processor is always a good exercise. If you are familiar 
>> with Python/Groovy/Ruby I would suggest prototyping with ExecuteScript to 
>> get a feel for the processor lifecycle and very rapid development feedback 
>> loop, and then transition to full-scale NAR development.
>> 
>> If you run into any roadblocks or have more in-depth questions, I would 
>> recommend asking on the developer list as it is a bit more technical and 
>> some of the experienced NiFi users (even those not on the core development 
>> team) respond quickly to questions on that list.
>> 
>> Matt Burgess has written a number of articles about this that are very 
>> helpful [1][2].
>> 
>> [1] 
>> https://funnifi.blogspot.com/2016/02/executescript-processor-hello-world.html
>>  
>> 
>> [2] 
>> https://funnifi.blogspot.com/2016/02/writing-reusable-scripted-processors-in.html
>>  
>> 
>> 
>> Andy LoPresto
>> alopre...@apache.org 
>> alopresto.apa...@gmail.com 
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>> 
>>> On Aug 25, 2016, at 10:34 AM, Sven Davison wrote:
>>> 
>>> Good to know. The more I think about this, the more it seems like a 
>>> tech/dev version of the movie called 'Pentagon Wars'. Maybe a custom 
>>> processor would serve a dual purpose: getting it done, and building my 
>>> first custom processor.
>>> 
>>> On Thu, Aug 25, 2016 at 1:24 PM, Aldrin Piri wrote:
>>> One reason it does not support EL is that the client the processor wraps 
>>> registers with a given endpoint.  EL would require a disconnect/reconnect 
>>> cycle to potentially happen on every FlowFile presented to the processor 
>>> (some smart caching could certainly lessen the effect). Currently, 
>>> filtering and such is very much integrated with the lifecycle of the 
>>> processor.  A more dynamic processor could be achieved, but it would come 
>>> with a few caveats.
>>> 
>>> On Thu, Aug 25, 2016 at 1:03 PM, Sven Davison wrote:
>>> That's close to the same flow I was looking at, really, but it was chucked 
>>> out for lack of EL support within GetTwitter. The good news is... we're 
>>> learning!
>>> 
>>> On Thu, Aug 25, 2016 at 12:52 PM, Andy LoPresto wrote:
>>> Hi Sven,
>>> 
>>> Someone may have a more streamlined solution, but I’d suggest taking a look 
>>> at ExecuteSQL [1] to read from the database, ConvertAvroToJSON [2] to 
>>> convert the output of the SQL query to JSON, and EvaluateJsonPath [3] to 
>>> extract the specific values you are interested in. Then use UpdateAttribute 
>>> [4] to populate those values from the flowfile content to an attribute, and 
>>> finally use GetTwitter [5] to filter on those values.
>>> 
>>> However, at this time the query fields in GetTwitter do not support 
>>> Expression Language, so you will have to:
>>> 
>>> * Modify the source of GetTwitter to support EL
>>> * Raise a Jira requesting this feature
>>> * Write a small script wrapping GetTwitter using ExecuteScript [6] to 
>>> populate those values
>>> 
>>> Sorry it’s 

Re: dynamic getTwitter ?

2016-08-25 Thread Andy LoPresto
Another crazy idea — would it be more computationally efficient to use NiFi’s 
REST API to add a new instance of the GetTwitter processor if a new endpoint 
was needed? Basically track using the state manager which terms are currently 
registered (a map of terms to processor IDs) and if a new term needs to be 
searched, duplicate an existing processor and replace the search term? They 
could all be located in a specific PG to allow for isolation from the 
“meta-flow” that is operating on NiFi itself.
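
The bookkeeping half of this idea (a map of terms to processor IDs, with a new processor created only for unseen terms) might look roughly like the sketch below. It is an illustration under stated assumptions: `registry` is a plain dict standing in for the state manager, and `clone_processor` is a hypothetical callable that would wrap whatever REST calls duplicate a processor. No actual NiFi REST endpoints are shown or implied.

```python
def ensure_term_tracked(term, registry, clone_processor):
    """Return the processor ID tracking `term`, creating one only if needed.

    registry: dict mapping search term -> processor ID (stand-in for the
        state manager).
    clone_processor: hypothetical callable that duplicates an existing
        processor with the given search term and returns the new ID.
    """
    key = term.strip().lower()
    if key not in registry:
        registry[key] = clone_processor(key)
    return registry[key]


# Usage with a fake cloner that just hands out sequential IDs.
created = []


def fake_clone(term):
    created.append(term)
    return "proc-%d" % len(created)


registry = {}
ensure_term_tracked("#usa", registry, fake_clone)
ensure_term_tracked("#earth", registry, fake_clone)
ensure_term_tracked("#USA ", registry, fake_clone)  # already tracked
print(registry)
```

Normalizing the term before the lookup keeps near-duplicate terms from spawning extra processors; whether that is desirable depends on how the search terms are matched.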

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Aug 25, 2016, at 11:45 AM, Andy LoPresto  wrote:
> 
> Yeah I had a feeling there was a reason it didn’t support EL in the first 
> place but didn’t know enough of the context. Thanks Aldrin.
> 
> @Sven,
> 
> Writing a custom processor is always a good exercise. If you are familiar 
> with Python/Groovy/Ruby I would suggest prototyping with ExecuteScript to get 
> a feel for the processor lifecycle and very rapid development feedback loop, 
> and then transition to full-scale NAR development.
> 
> If you run into any roadblocks or have more in-depth questions, I would 
> recommend asking on the developer list as it is a bit more technical and some 
> of the experienced NiFi users (even those not on the core development team) 
> respond quickly to questions on that list.
> 
> Matt Burgess has written a number of articles about this that are very 
> helpful [1][2].
> 
> [1] 
> https://funnifi.blogspot.com/2016/02/executescript-processor-hello-world.html 
> 
> [2] 
> https://funnifi.blogspot.com/2016/02/writing-reusable-scripted-processors-in.html
>  
> 
> 
> Andy LoPresto
> alopre...@apache.org 
> alopresto.apa...@gmail.com 
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> 
>> On Aug 25, 2016, at 10:34 AM, Sven Davison wrote:
>> 
>> Good to know. The more I think about this, the more it seems like a tech/dev 
>> version of the movie called 'Pentagon Wars'. Maybe a custom processor would 
>> serve a dual purpose: getting it done, and building my first custom 
>> processor.
>> 
>> On Thu, Aug 25, 2016 at 1:24 PM, Aldrin Piri wrote:
>> One reason it does not support EL is that the client the processor wraps 
>> registers with a given endpoint.  EL would require a disconnect/reconnect 
>> cycle to potentially happen on every FlowFile presented to the processor 
>> (some smart caching could certainly lessen the effect). Currently, 
>> filtering and such is very much integrated with the lifecycle of the 
>> processor.  A more dynamic processor could be achieved, but it would come 
>> with a few caveats.
>> 
>> On Thu, Aug 25, 2016 at 1:03 PM, Sven Davison wrote:
>> That's close to the same flow I was looking at, really, but it was chucked 
>> out for lack of EL support within GetTwitter. The good news is... we're 
>> learning!
>> 
>> On Thu, Aug 25, 2016 at 12:52 PM, Andy LoPresto wrote:
>> Hi Sven,
>> 
>> Someone may have a more streamlined solution, but I’d suggest taking a look 
>> at ExecuteSQL [1] to read from the database, ConvertAvroToJSON [2] to 
>> convert the output of the SQL query to JSON, and EvaluateJsonPath [3] to 
>> extract the specific values you are interested in. Then use UpdateAttribute 
>> [4] to populate those values from the flowfile content to an attribute, and 
>> finally use GetTwitter [5] to filter on those values.
>> 
>> However, at this time the query fields in GetTwitter do not support 
>> Expression Language, so you will have to:
>> 
>> * Modify the source of GetTwitter to support EL
>> * Raise a Jira requesting this feature
>> * Write a small script wrapping GetTwitter using ExecuteScript [6] to 
>> populate those values
>> 
>> Sorry it’s not a cleaner solution. I would encourage you to raise the Jira 
>> [7] to have GetTwitter support EL in the query properties. It’s likely I am 
>> overlooking a potential simpler flow, but without EL support in GetTwitter, 
>> I don’t see an easy way forward.
>> 
>> [1] 
>> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ExecuteSQL/index.html
>>  
>> 
>> [2] 
>> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.avro.ConvertAvroToJSON/index.html
>>  
>> 

Re: dynamic getTwitter ?

2016-08-25 Thread Sven Davison
Good to know. The more I think about this, the more it seems like a
tech/dev version of the movie called 'Pentagon Wars'. Maybe a custom
processor would serve a dual purpose: getting it done, and building my
first custom processor.

On Thu, Aug 25, 2016 at 1:24 PM, Aldrin Piri  wrote:

> One reason it does not support EL is that the client the processor wraps
> registers with a given endpoint.  EL would require a disconnect/reconnect
> cycle to potentially happen on every FlowFile presented to the processor
> (some smart caching could certainly lessen the effect). Currently,
> filtering and such is very much integrated with the lifecycle of the
> processor.  A more dynamic processor could be achieved, but it would come
> with a few caveats.
>
> On Thu, Aug 25, 2016 at 1:03 PM, Sven Davison 
> wrote:
>
>> That's close to the same flow I was looking at, really, but it was
>> chucked out for lack of EL support within GetTwitter. The good news is...
>> we're learning!
>>
>> On Thu, Aug 25, 2016 at 12:52 PM, Andy LoPresto 
>> wrote:
>>
>>> Hi Sven,
>>>
>>> Someone may have a more streamlined solution, but I’d suggest taking a
>>> look at ExecuteSQL [1] to read from the database, ConvertAvroToJSON [2] to
>>> convert the output of the SQL query to JSON, and EvaluateJsonPath [3] to
>>> extract the specific values you are interested in. Then use UpdateAttribute
>>> [4] to populate those values from the flowfile content to an attribute, and
>>> finally use GetTwitter [5] to filter on those values.
>>>
>>> However, at this time the query fields in GetTwitter do not support
>>> Expression Language, so you will have to:
>>>
>>> * Modify the source of GetTwitter to support EL
>>> * Raise a Jira requesting this feature
>>> * Write a small script wrapping GetTwitter using ExecuteScript [6] to
>>> populate those values
>>>
>>> Sorry it’s not a cleaner solution. I would encourage you to raise the
>>> Jira [7] to have GetTwitter support EL in the query properties. It’s likely
>>> I am overlooking a potential simpler flow, but without EL support in
>>> GetTwitter, I don’t see an easy way forward.
>>>
>>> [1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ExecuteSQL/index.html
>>> [2] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.avro.ConvertAvroToJSON/index.html
>>> [3] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.EvaluateJsonPath/index.html
>>> [4] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.attributes.UpdateAttribute/index.html
>>> [5] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.twitter.GetTwitter/index.html
>>> [6] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.script.ExecuteScript/index.html
>>> [7] https://issues.apache.org/jira/secure/CreateIssue!default.jspa
>>>
>>> Andy LoPresto
>>> alopre...@apache.org
>>> *alopresto.apa...@gmail.com *
>>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>>
>>> On Aug 25, 2016, at 9:10 AM, Sven Davison  wrote:
>>>
>>> i have a GetTwitter processor which works wonders. I'm tracking a few
>>> people and a couple hash tags but i'm also pulling all hashtags out of the
>>> posts and tracking how many times i saw it and when the last time was that
>>> i saw it.
>>>
>>> example tweet: "hello world #earth #usa"
>>>
>>> if i'm watching #usa, i'll still get both tags and put them into my
>>> database. using the tag as the id, a count for how many times it's been
>>> seen and a lastSeen field for when it was last seen.
>>>
>>> what i would like to do, is dynamically follow new tags upon condition
>>> X. Say... once #earth gets more than 500 posts and only if the tag was seen
>>> in the last 7 days. I can make a view in MySQL to build the result set, but
>>> how do i get that result set into nifi, to follow those tags that will
>>> change.
>>>
>>>
>>>
>>>
>>
>


Re: dynamic getTwitter ?

2016-08-25 Thread Aldrin Piri
One reason it does not support EL is that the client the processor wraps
registers with a given endpoint.  EL would require a disconnect/reconnect
cycle to potentially happen on every FlowFile presented to the processor
(some smart caching could certainly lessen the effect). Currently,
filtering and such is very much integrated with the lifecycle of the
processor.  A more dynamic processor could be achieved, but it would come
with a few caveats.
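
The "smart caching" mentioned here could, for instance, mean reconnecting only when the evaluated filter terms actually differ from the ones the client is already registered with. The sketch below illustrates that idea only; `CachedStreamClient` and its `connect` callable are hypothetical stand-ins and say nothing about how the client inside GetTwitter really works.

```python
class CachedStreamClient:
    """Reconnects only when the set of filter terms actually changes."""

    def __init__(self, connect):
        # connect: hypothetical callable taking a frozenset of terms and
        # returning a connection object; it stands in for the real client.
        self._connect = connect
        self._terms = None
        self._connection = None
        self.reconnects = 0

    def connection_for(self, terms):
        wanted = frozenset(t.strip().lower() for t in terms if t.strip())
        if wanted != self._terms:
            # Terms changed: this is the expensive disconnect/reconnect step.
            self._connection = self._connect(wanted)
            self._terms = wanted
            self.reconnects += 1
        return self._connection


client = CachedStreamClient(connect=lambda terms: sorted(terms))
client.connection_for(["#usa", "#earth"])
client.connection_for(["#earth", "#USA"])  # same set of terms, cached
client.connection_for(["#usa", "#earth", "#mars"])
print(client.reconnects)
```

If most FlowFiles evaluate to the same terms, the reconnect cost is paid only on a real change, though every change would still interrupt the stream.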

On Thu, Aug 25, 2016 at 1:03 PM, Sven Davison  wrote:

> That's close to the same flow I was looking at, really, but it was chucked
> out for lack of EL support within GetTwitter. The good news is... we're
> learning!
>
> On Thu, Aug 25, 2016 at 12:52 PM, Andy LoPresto 
> wrote:
>
>> Hi Sven,
>>
>> Someone may have a more streamlined solution, but I’d suggest taking a
>> look at ExecuteSQL [1] to read from the database, ConvertAvroToJSON [2] to
>> convert the output of the SQL query to JSON, and EvaluateJsonPath [3] to
>> extract the specific values you are interested in. Then use UpdateAttribute
>> [4] to populate those values from the flowfile content to an attribute, and
>> finally use GetTwitter [5] to filter on those values.
>>
>> However, at this time the query fields in GetTwitter do not support
>> Expression Language, so you will have to:
>>
>> * Modify the source of GetTwitter to support EL
>> * Raise a Jira requesting this feature
>> * Write a small script wrapping GetTwitter using ExecuteScript [6] to
>> populate those values
>>
>> Sorry it’s not a cleaner solution. I would encourage you to raise the
>> Jira [7] to have GetTwitter support EL in the query properties. It’s likely
>> I am overlooking a potential simpler flow, but without EL support in
>> GetTwitter, I don’t see an easy way forward.
>>
>> [1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ExecuteSQL/index.html
>> [2] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.avro.ConvertAvroToJSON/index.html
>> [3] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.EvaluateJsonPath/index.html
>> [4] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.attributes.UpdateAttribute/index.html
>> [5] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.twitter.GetTwitter/index.html
>> [6] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.script.ExecuteScript/index.html
>> [7] https://issues.apache.org/jira/secure/CreateIssue!default.jspa
>>
>> Andy LoPresto
>> alopre...@apache.org
>> *alopresto.apa...@gmail.com *
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>
>> On Aug 25, 2016, at 9:10 AM, Sven Davison  wrote:
>>
>> i have a GetTwitter processor which works wonders. I'm tracking a few
>> people and a couple hash tags but i'm also pulling all hashtags out of the
>> posts and tracking how many times i saw it and when the last time was that
>> i saw it.
>>
>> example tweet: "hello world #earth #usa"
>>
>> if i'm watching #usa, i'll still get both tags and put them into my
>> database. using the tag as the id, a count for how many times it's been
>> seen and a lastSeen field for when it was last seen.
>>
>> what i would like to do, is dynamically follow new tags upon condition X.
>> Say... once #earth gets more than 500 posts and only if the tag was seen in
>> the last 7 days. I can make a view in MySQL to build the result set, but
>> how do i get that result set into nifi, to follow those tags that will
>> change.
>>
>>
>>
>>
>


Re: dynamic getTwitter ?

2016-08-25 Thread Sven Davison
That's close to the same flow I was looking at, really, but it was chucked
out for lack of EL support within GetTwitter. The good news is... we're
learning!

On Thu, Aug 25, 2016 at 12:52 PM, Andy LoPresto 
wrote:

> Hi Sven,
>
> Someone may have a more streamlined solution, but I’d suggest taking a
> look at ExecuteSQL [1] to read from the database, ConvertAvroToJSON [2] to
> convert the output of the SQL query to JSON, and EvaluateJsonPath [3] to
> extract the specific values you are interested in. Then use UpdateAttribute
> [4] to populate those values from the flowfile content to an attribute, and
> finally use GetTwitter [5] to filter on those values.
>
> However, at this time the query fields in GetTwitter do not support
> Expression Language, so you will have to:
>
> * Modify the source of GetTwitter to support EL
> * Raise a Jira requesting this feature
> * Write a small script wrapping GetTwitter using ExecuteScript [6] to
> populate those values
>
> Sorry it’s not a cleaner solution. I would encourage you to raise the Jira
> [7] to have GetTwitter support EL in the query properties. It’s likely I am
> overlooking a potential simpler flow, but without EL support in GetTwitter,
> I don’t see an easy way forward.
>
> [1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ExecuteSQL/index.html
> [2] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.avro.ConvertAvroToJSON/index.html
> [3] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.EvaluateJsonPath/index.html
> [4] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.attributes.UpdateAttribute/index.html
> [5] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.twitter.GetTwitter/index.html
> [6] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.script.ExecuteScript/index.html
> [7] https://issues.apache.org/jira/secure/CreateIssue!default.jspa
>
> Andy LoPresto
> alopre...@apache.org
> *alopresto.apa...@gmail.com *
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Aug 25, 2016, at 9:10 AM, Sven Davison  wrote:
>
> i have a GetTwitter processor which works wonders. I'm tracking a few
> people and a couple hash tags but i'm also pulling all hashtags out of the
> posts and tracking how many times i saw it and when the last time was that
> i saw it.
>
> example tweet: "hello world #earth #usa"
>
> if i'm watching #usa, i'll still get both tags and put them into my
> database. using the tag as the id, a count for how many times it's been
> seen and a lastSeen field for when it was last seen.
>
> what i would like to do, is dynamically follow new tags upon condition X.
> Say... once #earth gets more than 500 posts and only if the tag was seen in
> the last 7 days. I can make a view in MySQL to build the result set, but
> how do i get that result set into nifi, to follow those tags that will
> change.
>
>
>
>


Re: dynamic getTwitter ?

2016-08-25 Thread Andy LoPresto
Hi Sven,

Someone may have a more streamlined solution, but I’d suggest taking a look at 
ExecuteSQL [1] to read from the database, ConvertAvroToJSON [2] to convert the 
output of the SQL query to JSON, and EvaluateJsonPath [3] to extract the 
specific values you are interested in. Then use UpdateAttribute [4] to populate 
those values from the flowfile content to an attribute, and finally use 
GetTwitter [5] to filter on those values.

However, at this time the query fields in GetTwitter do not support Expression 
Language, so you will have to:

* Modify the source of GetTwitter to support EL
* Raise a Jira requesting this feature
* Write a small script wrapping GetTwitter using ExecuteScript [6] to populate 
those values

Sorry it’s not a cleaner solution. I would encourage you to raise the Jira [7] 
to have GetTwitter support EL in the query properties. It’s likely I am 
overlooking a potential simpler flow, but without EL support in GetTwitter, I 
don’t see an easy way forward.

[1] 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ExecuteSQL/index.html
[2] 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.avro.ConvertAvroToJSON/index.html
[3] 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.EvaluateJsonPath/index.html
[4] 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.attributes.UpdateAttribute/index.html
[5] 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.twitter.GetTwitter/index.html
[6] 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.script.ExecuteScript/index.html
[7] https://issues.apache.org/jira/secure/CreateIssue!default.jspa
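
As a rough illustration of the "extract the values" step in the flow above, the sketch below turns a JSON array of result rows into a comma-separated term list, which is the kind of value one would then want to hand to the filter. It is plain Python, not a NiFi configuration; the `tag` and `count` field names and the helper name `terms_from_rows` are assumptions about what the MySQL view might return.

```python
import json


def terms_from_rows(rows_json, tag_field="tag"):
    """Build a comma-separated term list from a JSON array of row objects.

    tag_field is an assumed column name; duplicates are dropped while the
    original row order is kept.
    """
    rows = json.loads(rows_json)
    seen = []
    for row in rows:
        tag = str(row.get(tag_field, "")).strip()
        if tag and tag not in seen:
            seen.append(tag)
    return ",".join(seen)


rows_json = (
    '[{"tag": "#earth", "count": 512},'
    ' {"tag": "#usa", "count": 900},'
    ' {"tag": "#earth", "count": 512}]'
)
print(terms_from_rows(rows_json))
```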

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Aug 25, 2016, at 9:10 AM, Sven Davison  wrote:
> 
> i have a GetTwitter processor which works wonders. I'm tracking a few people 
> and a couple hash tags but i'm also pulling all hashtags out of the posts and 
> tracking how many times i saw it and when the last time was that i saw it.
> 
> example tweet: "hello world #earth #usa"
> 
> if i'm watching #usa, i'll still get both tags and put them into my database. 
> using the tag as the id, a count for how many times it's been seen and a 
> lastSeen field for when it was last seen.
> 
> what i would like to do, is dynamically follow new tags upon condition X. 
> Say... once #earth gets more than 500 posts and only if the tag was seen in 
> the last 7 days. I can make a view in MySQL to build the result set, but how 
> do i get that result set into nifi, to follow those tags that will change.
> 
> 





Re: Need to read a small local file into a flow file property

2016-08-25 Thread McDermott, Chris Kevin (MSDU - STaTS/StorefrontRemote)
Thanks, everyone.  I’ll give the ExecuteScript solution a try.

Chris McDermott

Remote Business Analytics
STaTS/StoreFront Remote
HPE Storage
Hewlett Packard Enterprise
Mobile: +1 978-697-5315


From: "Oxenberg, Jeff" 
Reply-To: "users@nifi.apache.org" 
Date: Thursday, August 25, 2016 at 11:52 AM
To: "users@nifi.apache.org" 
Subject: RE: Need to read a small local file into a flow file property

Yeah, that would work. Here’s a quick example in Python that works on my local 
machine.

https://gist.github.com/jeffoxenberg/327b0dfeaa6bb63882279dd290222582

Thanks,


Jeff Oxenberg

From: Andre [mailto:andre-li...@fucs.org]
Sent: Thursday, August 25, 2016 8:41 AM
To: users@nifi.apache.org
Subject: Re: Need to read a small local file into a flow file property



Wouldn't a scripted task using ExecuteScript solve this issue?

You could simply use Jython, Groovy, JRuby, LuaJ or JavaScript to read the 
contents and add them to the attributes. Just be mindful that, if I recall 
correctly, attributes are size constrained.

Cheers

On Thu, Aug 25, 2016 at 11:17 PM, McDermott, Chris Kevin (MSDU - 
STaTS/StorefrontRemote) wrote:
Sorry, I should have been more clear.  I have a flow file with content.  To 
that flow file, I need to add the content of a disk file as an attribute 
without losing the original content.

Does that better explain things?

Chris McDermott

Remote Business Analytics
STaTS/StoreFront Remote
HPE Storage
Hewlett Packard Enterprise
Mobile: +1 978-697-5315


From: Matt Burgess
Reply-To: "users@nifi.apache.org"
Date: Wednesday, August 24, 2016 at 5:13 PM
To: "users@nifi.apache.org"
Subject: Re: Need to read a small local file into a flow file property

Chris,

Are you looking to have a flow file that has its own content also as an 
attribute? With EvaluateJsonPath, are you taking in the entire document? If so, 
you could use ExtractText with a regex that captures all text and puts it in an 
attribute; I believe the content of the flow file is left untouched.
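
A regex that "captures all text" can be illustrated as below. The pattern `(?s)(^.*$)` is one common choice, shown here with Python's `re` module as an approximation; NiFi evaluates Java regular expressions, and the attribute name `file.content` is made up for the example. ExtractText also has settings that limit how much content it buffers and captures, so those are worth checking for larger files.

```python
import re

# (?s) lets "." match newlines, so the single group spans the whole content.
CAPTURE_ALL = re.compile(r"(?s)(^.*$)")


def content_as_attribute(content, name="file.content"):
    """Return {name: whole content}, mimicking a capture-all extraction."""
    match = CAPTURE_ALL.match(content)
    return {name: match.group(1)} if match else {}


print(content_as_attribute("line one\nline two\n"))
```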

Please let me know if I've misunderstood your use case, I'm a little confused 
as to why you have two paths and step 3. Wouldn't #1 and #2 (with 
"flowfile-attribute" as the Destination) read the file into an attribute and 
also keep it in the content?

Regards,
Matt

On Wed, Aug 24, 2016 at 4:33 PM, McDermott, Chris Kevin (MSDU - 
STaTS/StorefrontRemote) wrote:
Hi folks,

I’m looking for some ideas here.  I need to read the content of a small local 
file into a flow file attribute.  I can’t find a processor that does this.  Did 
I miss one that does?

Without such a processor, I’ve been trying to do this using a MergeContent 
processor.

First, I assign a correlation UUID and store it in an attribute.

I then split the flow down two processing paths.  The left-hand path goes 
straight to the MergeContent processor.

In the right hand path I

1.   Read the content of the local file using FetchFile

2.   Pull the content of the FlowFile into an attribute using 
EvaluateJSONPath

3.   Clear the content of the FlowFile using ReplaceText


Then I combine the left and right legs using MergeContent using the assigned 
correlation UUID to merge the files.

This generally works, except when it doesn’t. ☺

The problem seems to be that the left-hand path flows relatively faster than 
the right-hand path, which makes sense.  This can lead to the “bins” in the 
MergeContent processor being reused before the file in the bin can be merged 
with the file traveling down the right-hand path.  Uncorrelated files are 
then sent to the merged output.
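
The failure mode described above can be reproduced with a simplified model: a fixed number of open bins keyed by correlation ID, where the oldest bin is flushed when a new ID arrives and no bin is free. This is only a toy simulation written for illustration (the function `merge_pairs` and its eviction rule are assumptions), not a description of how MergeContent is actually implemented.

```python
from collections import OrderedDict


def merge_pairs(arrivals, max_bins):
    """Pair items by correlation ID with a bounded number of open bins.

    arrivals: iterable of (correlation_id, side) tuples in arrival order.
    Returns (merged, uncorrelated): IDs merged as a complete pair, and IDs
    flushed alone because their bin was evicted to make room.
    """
    bins = OrderedDict()
    merged, uncorrelated = [], []
    for cid, side in arrivals:
        if cid in bins:
            bins.pop(cid)  # partner found, the pair is complete
            merged.append(cid)
            continue
        if len(bins) >= max_bins:
            oldest, _ = bins.popitem(last=False)  # evict the oldest open bin
            uncorrelated.append(oldest)
        bins[cid] = side
    return merged, uncorrelated


# Three "left" files arrive before any "right" file catches up.
arrivals = [("a", "L"), ("b", "L"), ("c", "L"), ("a", "R"), ("b", "R"), ("c", "R")]
print(merge_pairs(arrivals, max_bins=2))
print(merge_pairs(arrivals, max_bins=3))
```

In this model, having fewer bins than files in flight means halves get flushed before their partners arrive, while enough bins lets every pair complete. That suggests looking at the number of bins relative to how far the fast path can run ahead.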

Does it sound like I am using the MergeContent processor in the right way?

Any other ideas?


Thanks in advance,

Chris McDermott

Remote Business Analytics
STaTS/StoreFront Remote
HPE Storage
Hewlett Packard Enterprise
Mobile: +1 978-697-5315





RE: Need to read a small local file into a flow file property

2016-08-25 Thread Oxenberg, Jeff
Yeah, that would work. Here’s a quick example in Python that works on my local 
machine.

https://gist.github.com/jeffoxenberg/327b0dfeaa6bb63882279dd290222582

Thanks,


Jeff Oxenberg

From: Andre [mailto:andre-li...@fucs.org]
Sent: Thursday, August 25, 2016 8:41 AM
To: users@nifi.apache.org
Subject: Re: Need to read a small local file into a flow file property



Wouldn't a scripted task using ExecuteScript solve this issue?

You could simply use Jython, Groovy, JRuby, LuaJ or JavaScript to read the 
contents and add them to the attributes. Just be mindful that, if I recall 
correctly, attributes are size constrained.

Cheers

On Thu, Aug 25, 2016 at 11:17 PM, McDermott, Chris Kevin (MSDU - 
STaTS/StorefrontRemote) wrote:
Sorry, I should have been more clear.  I have a flow file with content.  To 
that flow file, I need to add the content of a disk file as an attribute 
without losing the original content.

Does that better explain things?

Chris McDermott

Remote Business Analytics
STaTS/StoreFront Remote
HPE Storage
Hewlett Packard Enterprise
Mobile: +1 978-697-5315


From: Matt Burgess
Reply-To: "users@nifi.apache.org"
Date: Wednesday, August 24, 2016 at 5:13 PM
To: "users@nifi.apache.org"
Subject: Re: Need to read a small local file into a flow file property

Chris,

Are you looking to have a flow file that has its own content also as an 
attribute? With EvaluateJsonPath, are you taking in the entire document? If so, 
you could use ExtractText with a regex that captures all text and puts it in an 
attribute; I believe the content of the flow file is left untouched.

Please let me know if I've misunderstood your use case, I'm a little confused 
as to why you have two paths and step 3. Wouldn't #1 and #2 (with 
"flowfile-attribute" as the Destination) read the file into an attribute and 
also keep it in the content?

Regards,
Matt

On Wed, Aug 24, 2016 at 4:33 PM, McDermott, Chris Kevin (MSDU - 
STaTS/StorefrontRemote) wrote:
Hi folks,

I’m looking for some ideas here.  I need to read the content of a small local 
file into a flow file attribute.  I can’t find a processor that does this.  Did 
I miss one that does?

So without one of these I’ve been trying to do this using a MergeContent 
processor.

First, I assign a correlation UUID and store it in an attribute

I split the flow down two processing paths.  The left hand path goes straight to 
the MergeContent processor.

In the right hand path I

1.   Read the content of the local file using FetchFile

2.   Pull the content of the FlowFile into an attribute using 
EvaluateJsonPath

3.   Clear the content of the FlowFile using ReplaceText


Then I combine the left and right legs using MergeContent using the assigned 
correlation UUID to merge the files.

This generally works, except when it doesn’t. ☺

The problem seems to be that the left hand path flows faster than the right 
hand path, which makes sense.  This can lead to the “bins” in the MergeContent 
processor being reused before the file in the bin can be merged with the file 
traveling down the right hand path, causing uncorrelated files to be sent to 
the merged output.

Does it sound like I am using the MergeContent processor in the right way?

Any other ideas?


Thanks in advance,

Chris McDermott

Remote Business Analytics
STaTS/StoreFront Remote
HPE Storage
Hewlett Packard Enterprise
Mobile: +1 978-697-5315





Re: NiFi global variables / persisting state outside of a pipeline

2016-08-25 Thread Mike Harding
Thanks Bryan - I was unaware of the MapCacheServer functionality - I've now
implemented the approach suggested and it works perfectly.

Mike

On 25 August 2016 at 15:05, Joe Witt  wrote:

> Also, this is a great use case which has been done quite a bit in the
> past using exactly the sort of logic Bryan calls out.  We've also done
> things like writing custom controller services specific to the type of
> data and data structures needed for the job.  The
> plumbing/infrastructure for it is well supported: it avoids the RPC
> calls you mention, keeps the cache frequently updated live, and
> lets numerous components use the cache at once.
>
> Thanks
> Joe
>
> On Thu, Aug 25, 2016 at 9:57 AM, Bryan Bende  wrote:
> > Hi Mike,
> >
> > I think one approach might be the following...
> >
> > Set up controller services for DistributedMapCacheServer and
> > DistributedMapCacheClient. Then have one part of your flow run
> > periodically to query your Hive table, split/parse the results as needed,
> > and store them in the cache with the PutDistributedMapCache processor.
> >
> > In the other part of your flow, use FetchDistributedMapCache to do a
> > lookup against the cache.
> >
> > I haven't worked through all of the exact steps, but I think something like
> > that should work.
> >
> > Thanks,
> >
> > Bryan
> >
> > On Thu, Aug 25, 2016 at 6:38 AM, Mike Harding 
> > wrote:
> >>
> >> Hi All,
> >>
> >> I have a mapping table stored in hive that maps an ID to a readable name
> >> string. When a JSON object enters my nifi pipeline as a flowfile I want to
> >> be able to inject the readable name string into the JSON object. The
> >> problem is that currently, as each flowfile enters the pipeline, I have to
> >> make a SelectHiveQL call to first get the lookup table data and store it
> >> as attributes.
> >>
> >> Is there a way I can load the lookup table data once or on a periodic
> >> basis into nifi (as a global variable/attribute) to save having to make the
> >> select call for each flowfile, which translates to 1000's of calls a
> >> minute?
> >> Thanks,
> >> Mike
> >
> >
>


Re: NiFi global variables / persisting state outside of a pipeline

2016-08-25 Thread Joe Witt
Also, this is a great use case which has been done quite a bit in the
past using exactly the sort of logic Bryan calls out.  We've also done
things like writing custom controller services specific to the type of
data and data structures needed for the job.  The
plumbing/infrastructure for it is well supported: it avoids the RPC
calls you mention, keeps the cache frequently updated live, and
lets numerous components use the cache at once.

Thanks
Joe

On Thu, Aug 25, 2016 at 9:57 AM, Bryan Bende  wrote:
> Hi Mike,
>
> I think one approach might be the following...
>
> Set up controller services for DistributedMapCacheServer and
> DistributedMapCacheClient. Then have one part of your flow run
> periodically to query your Hive table, split/parse the results as needed,
> and store them in the cache with the PutDistributedMapCache processor.
>
> In the other part of your flow use FetchDistributedMapCache to do a look up
> against the cache.
>
> I haven't worked through all of the exact steps, but I think something like
> that should work.
>
> Thanks,
>
> Bryan
>
> On Thu, Aug 25, 2016 at 6:38 AM, Mike Harding 
> wrote:
>>
>> Hi All,
>>
>> I have a mapping table stored in hive that maps an ID to a readable name
>> string. When a JSON object enters my nifi pipeline as a flowfile I want to
>> be able to inject the readable name string into the JSON object. The problem
>> is that currently, as each flowfile enters the pipeline, I have to make a
>> SelectHiveQL call to first get the lookup table data and store it as
>> attributes.
>>
>> Is there a way I can load the lookup table data once or on a periodic
>> basis into nifi (as a global variable/attribute) to save having to make the
>> select call for each flowfile which translates to 1000's of calls a minute?
>>
>> Thanks,
>> Mike
>
>


Re: NiFi global variables / persisting state outside of a pipeline

2016-08-25 Thread Bryan Bende
Hi Mike,

I think one approach might be the following...

Set up controller services for DistributedMapCacheServer and
DistributedMapCacheClient. Then have one part of your flow run
periodically to query your Hive table, split/parse the results as needed,
and store them in the cache with the PutDistributedMapCache processor.

In the other part of your flow use FetchDistributedMapCache to do a look up
against the cache.

I haven't worked through all of the exact steps, but I think something like
that should work.
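
To make the two legs concrete, here is a rough plain-Python analogy rather
than NiFi code. The dict stands in for the DistributedMapCacheServer, and the
"id" and "name" field names and the sample rows are assumptions made up for
the illustration:

```python
import json

# Stand-in for the DistributedMapCacheServer: a plain in-memory dict.
# In NiFi the cache lives in a controller service, not in script memory.
cache = {}


def refresh_cache(rows):
    """Periodic leg: store every (id, name) row, as PutDistributedMapCache would."""
    for row_id, name in rows:
        cache[str(row_id)] = name


def enrich(flowfile_content):
    """Per-flowfile leg: look up the ID, as FetchDistributedMapCache would."""
    record = json.loads(flowfile_content)
    # "id" and "name" are hypothetical field names for this illustration.
    name = cache.get(str(record.get("id")))
    if name is not None:
        record["name"] = name
    return json.dumps(record, sort_keys=True)


# Hypothetical rows, standing in for the result of the periodic Hive query.
refresh_cache([(1, "pump-a"), (2, "pump-b")])
enriched = enrich('{"id": 2, "value": 7.5}')
missing = enrich('{"id": 9, "value": 1.0}')
```

The point of the split is that the Hive query runs only in the periodic leg,
while each incoming flowfile costs a single cache lookup. An ID that is not in
the cache leaves the JSON unchanged in this sketch; a real flow would probably
route that case separately.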

Thanks,

Bryan

On Thu, Aug 25, 2016 at 6:38 AM, Mike Harding 
wrote:

> Hi All,
>
> I have a mapping table stored in hive that maps an ID to a readable name
> string. When a JSON object enters my nifi pipeline as a flowfile I want to
> be able to inject the readable name string into the JSON object. The
> problem is that currently, as each flowfile enters the pipeline, I have to
> make a SelectHiveQL call to first get the lookup table data and store it as
> attributes.
>
> Is there a way I can load the lookup table data once or on a periodic
> basis into nifi (as a global variable/attribute) to save having to make the
> select call for each flowfile which translates to 1000's of calls a minute?
>
> Thanks,
> Mike
>


Re: Need to read a small local file into a flow file property

2016-08-25 Thread Andre
Wouldn't a scripted task using ExecuteScript solve this issue?

You could simply use Jython, Groovy, JRuby, LuaJ or JavaScript to read the
contents and add them to the attributes. Just be mindful that, if I recall
correctly, attributes are size constrained.
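
A minimal sketch of the idea in plain Python, outside NiFi. The attribute map
is an ordinary dict, and the attribute name and the 1024-character cap are
assumptions for the example, not NiFi's actual limits:

```python
import os
import tempfile

# Hypothetical cap for this sketch only; NiFi's real attribute limits are not
# stated in this thread.
MAX_ATTRIBUTE_CHARS = 1024


def add_file_as_attribute(attributes, name, path, limit=MAX_ATTRIBUTE_CHARS):
    """Return a copy of the attribute map with the file's text stored under name."""
    with open(path, "r", encoding="utf-8") as handle:
        text = handle.read()
    if len(text) > limit:
        raise ValueError("file too large to hold in an attribute: %d chars" % len(text))
    updated = dict(attributes)
    updated[name] = text
    return updated


# Demo with a throwaway file; the flowfile content itself is never touched.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("small lookup text")
attrs = add_file_as_attribute({"filename": "data.json"}, "disk.file.content", tmp.name)
os.remove(tmp.name)
```

Inside ExecuteScript the dict update would instead be a call to
session.putAttribute(flowFile, name, value) followed by a transfer of the
returned flow file. The size check is there because a large file held as an
attribute ends up in memory with every flow file that carries it.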

Cheers

On Thu, Aug 25, 2016 at 11:17 PM, McDermott, Chris Kevin (MSDU -
STaTS/StorefrontRemote)  wrote:

> Sorry, I should have been more clear.  I have a flow file with content.  To
> that flow file, I need to add the content of a disk file as an attribute
> without losing the original content.
>
>
>
> Does that better explain things?
>
>
>
> Chris McDermott
>
>
>
> Remote Business Analytics
>
> STaTS/StoreFront Remote
>
> HPE Storage
>
> Hewlett Packard Enterprise
>
> Mobile: +1 978-697-5315
>
>
>
>
>
> From: Matt Burgess
> Reply-To: "users@nifi.apache.org"
> Date: Wednesday, August 24, 2016 at 5:13 PM
> To: "users@nifi.apache.org"
> Subject: Re: Need to read a small local file into a flow file property
>
>
>
> Chris,
>
>
>
> Are you looking to have a flow file that has its own content also as an
> attribute? With EvaluateJsonPath, are you taking in the entire document? If
> so, you could use ExtractText with a regex that captures all text and puts
> it in an attribute; I believe the content of the flow file is left untouched.
>
>
>
> Please let me know if I've misunderstood your use case. I'm a little
> confused as to why you have two paths and step 3. Wouldn't #1 and #2 (with
> "flowfile-attribute" as the Destination) read the file into an attribute
> and also keep it in the content?
>
>
>
> Regards,
>
> Matt
>
>
>
> On Wed, Aug 24, 2016 at 4:33 PM, McDermott, Chris Kevin (MSDU -
> STaTS/StorefrontRemote)  wrote:
>
> Hi folks,
>
>
>
> I’m looking for some ideas here.  I need to read the content of a small
> local file into a flow file attribute.  I can’t find a processor that does
> this.  Did I miss one that does?
>
>
>
> So without one of these I’ve been trying to do this using a MergeContent
> processor.
>
>
>
> First, I assign a correlation UUID and store it in an attribute
>
>
>
> I split the flow down two processing paths.  The left hand path goes
> straight to the MergeContent processor.
>
>
>
> In the right hand path I
>
> 1.   Read the content of the local file using FetchFile
>
> 2.   Pull the content of the FlowFile into an attribute using
> EvaluateJsonPath
>
> 3.   Clear the content of the FlowFile using ReplaceText
>
>
>
> Then I combine the left and right legs using MergeContent using the
> assigned correlation UUID to merge the files.
>
>
>
> This generally works, except when it doesn’t. ☺
>
>
>
> The problem seems to be that the left hand path flows faster than the
> right hand path, which makes sense.  This can lead to the “bins” in the
> MergeContent processor being reused before the file in the bin can be
> merged with the file traveling down the right hand path, causing
> uncorrelated files to be sent to the merged output.
>
>
>
> Does it sound like I am using the MergeContent processor in the right way?
>
>
>
> Any other ideas?
>
>
>
>
>
> Thanks in advance,
>
>
>
> Chris McDermott
>
>
>
> Remote Business Analytics
>
> STaTS/StoreFront Remote
>
> HPE Storage
>
> Hewlett Packard Enterprise
>
> Mobile: +1 978-697-5315
>
>
>
>
>
>


Re: Need to read a small local file into a flow file property

2016-08-25 Thread McDermott, Chris Kevin (MSDU - STaTS/StorefrontRemote)
Sorry, I should have been more clear.  I have a flow file with content.  To that 
flow file, I need to add the content of a disk file as an attribute without 
losing the original content.

Does that better explain things?

Chris McDermott

Remote Business Analytics
STaTS/StoreFront Remote
HPE Storage
Hewlett Packard Enterprise
Mobile: +1 978-697-5315


From: Matt Burgess 
Reply-To: "users@nifi.apache.org" 
Date: Wednesday, August 24, 2016 at 5:13 PM
To: "users@nifi.apache.org" 
Subject: Re: Need to read a small local file into a flow file property

Chris,

Are you looking to have a flow file that has its own content also as an 
attribute? With EvaluateJsonPath, are you taking in the entire document? If so, 
you could use ExtractText with a regex that captures all text and puts it in an 
attribute; I believe the content of the flow file is left untouched.
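
As a rough illustration of the capture-everything regex, here is a plain-Python
sketch. Python's re module stands in for the Java regex engine that NiFi uses,
and the attribute name "file.text" is made up for the example:

```python
import re

# (?s) turns on DOTALL so "." also matches newlines; group 1 is the whole text.
CAPTURE_ALL = re.compile(r"(?s)(^.*$)")


def content_to_attribute(content, attributes, name):
    """Copy the full content into an attribute map, leaving content as-is."""
    match = CAPTURE_ALL.search(content)
    updated = dict(attributes)
    if match:
        updated[name] = match.group(1)
    return content, updated


content = '{"a": 1,\n "b": 2}'
same_content, attrs = content_to_attribute(content, {}, "file.text")
# Without DOTALL the same pattern fails on multi-line input.
no_dotall = re.search(r"(^.*$)", content)
```

The DOTALL flag matters as soon as the content spans more than one line.
ExtractText also has its own settings for how much text it will buffer and
capture, if I recall correctly, so those are worth checking for anything
beyond a small file.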

Please let me know if I've misunderstood your use case. I'm a little confused 
as to why you have two paths and step 3. Wouldn't #1 and #2 (with 
"flowfile-attribute" as the Destination) read the file into an attribute and 
also keep it in the content?

Regards,
Matt

On Wed, Aug 24, 2016 at 4:33 PM, McDermott, Chris Kevin (MSDU - 
STaTS/StorefrontRemote) wrote:
Hi folks,

I’m looking for some ideas here.  I need to read the content of a small local 
file into a flow file attribute.  I can’t find a processor that does this.  Did 
I miss one that does?

So without one of these I’ve been trying to do this using a MergeContent 
processor.

First, I assign a correlation UUID and store it in an attribute

I split the flow down two processing paths.  The left hand path goes straight to 
the MergeContent processor.

In the right hand path I

1.   Read the content of the local file using FetchFile

2.   Pull the content of the FlowFile into an attribute using 
EvaluateJsonPath

3.   Clear the content of the FlowFile using ReplaceText


Then I combine the left and right legs using MergeContent using the assigned 
correlation UUID to merge the files.

This generally works, except when it doesn’t. ☺

The problem seems to be that the left hand path flows faster than the right 
hand path, which makes sense.  This can lead to the “bins” in the MergeContent 
processor being reused before the file in the bin can be merged with the file 
traveling down the right hand path, causing uncorrelated files to be sent to 
the merged output.

Does it sound like I am using the MergeContent processor in the right way?

Any other ideas?


Thanks in advance,

Chris McDermott

Remote Business Analytics
STaTS/StoreFront Remote
HPE Storage
Hewlett Packard Enterprise
Mobile: +1 978-697-5315




NiFi global variables / persisting state outside of a pipeline

2016-08-25 Thread Mike Harding
Hi All,

I have a mapping table stored in hive that maps an ID to a readable name
string. When a JSON object enters my nifi pipeline as a flowfile I want to
be able to inject the readable name string into the JSON object. The
problem is that currently, as each flowfile enters the pipeline, I have to
make a SelectHiveQL call to first get the lookup table data and store it as
attributes.

Is there a way I can load the lookup table data once or on a periodic basis
into nifi (as a global variable/attribute) to save having to make the
select call for each flowfile which translates to 1000's of calls a minute?

Thanks,
Mike