Re: How do you use ElasticSearch with NiFi?

2019-02-20 Thread Joe Percivall
Hey Mike,

As a data point, we're ingesting into ES v6 using PutElasticsearchHttp and
PutElasticsearchHttpRecord. We do almost no querying of anything in ES
using NiFi. Continued improvement around ingesting into ES would be our
core use-case.

One item that frustrated me was the issue around failures in the record
processor, for which I put up a PR here[1]. Another example of a potential
improvement would be to not load the entire request body (and thus all the
records/FF content) into memory when inserting into ES using those
processors. Not 100% sure how you would go about doing that but would be an
awesome improvement. Of course, any other improvements around performance
would also be welcome.

[1] https://github.com/apache/nifi/pull/3299
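
For illustration, a rough sketch of how a streaming request body could avoid
that buffering, using OkHttp (which these processors use under the hood). The
class and the content stream are hypothetical, not the actual processor code:

    import java.io.IOException;
    import java.io.InputStream;
    import okhttp3.MediaType;
    import okhttp3.RequestBody;
    import okio.BufferedSink;

    // Streams FlowFile content straight to the socket instead of building
    // the whole bulk request body in memory first.
    class StreamingBulkBody extends RequestBody {
        private final InputStream content; // the FlowFile's content stream

        StreamingBulkBody(InputStream content) {
            this.content = content;
        }

        @Override
        public MediaType contentType() {
            return MediaType.parse("application/json");
        }

        @Override
        public void writeTo(BufferedSink sink) throws IOException {
            byte[] buffer = new byte[8192];
            int len;
            while ((len = content.read(buffer)) != -1) {
                sink.write(buffer, 0, len); // only 8 KB in memory at a time
            }
        }
    }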

Cheers,
Joe

On Wed, Feb 20, 2019 at 8:08 AM Mike Thomsen  wrote:

> I'm looking for feedback from ElasticSearch users on how they use and how
> they **want** to use ElasticSearch v5 and newer with NiFi.
>
> So please respond with some use cases and what you want, what frustrates
> you, etc. so I can prioritize Jira tickets for the ElasticSearch REST API
> bundle.
>
> (Note: basic JSON DSL queries are already supported via
> JsonQueryElasticSearch. If you didn't know that, please try it out and drop
> some feedback on what is needed to make it work for your use cases.)
>
> Thanks,
>
> Mike
>


-- 
*Joe Percivall*
linkedin.com/in/Percivall
e: jperciv...@apache.com


Re: PutElasticsearchHttp can not use Flowfile attribute for ES_URL

2019-02-04 Thread Joe Percivall
I believe also one of the reasons this was done is because
PutElasticsearchHttp takes in batches of FlowFiles and does a bulk insert.
In order to support FlowFile attribute expression on the URL, we would have
to either only act on one FlowFile at a time or determine another mechanism
for handling that ambiguity.

PutElasticsearchHttpRecord on the other hand only takes in a single FlowFile
with each onTrigger and could be more easily updated to support that
use-case.

Cheers,
Joe

On Mon, Feb 4, 2019 at 5:05 PM Matt Burgess  wrote:

> The restriction to using the variable registry only has always been
> there AFAIK, but as of 1.6 we made the distinction in documentation on
> how expression language would be evaluated for each property. The
> choice was so that we weren't constantly recreating connections for
> each flow file; in fact, all concurrent tasks share the same underlying
> OkHttpClient.
>
> We could probably do something fancier where we allow flowfile
> attributes to be evaluated as well, but have a modestly-sized
> least-recently-used (LRU) cache of clients, keeping them open until
> they are evicted (and closing them all when stopped). Please feel free
> to file an improvement Jira and we can discuss further there.
>
> Regards,
> Matt
>
> On Mon, Feb 4, 2019 at 4:16 PM Jean-Sebastien Vachon
>  wrote:
> >
> > Hi all,
> >
> > I was just finishing modifying my flow to make it more reusable by
> having my source document containing information about where to store the
> final document (some Elasticsearch index)
> > Everything was fine until I found out that the PutElasticsearchHttp's
> documentation was saying this...
> >
> > Supports Expression Language: true (will be evaluated using variable
> registry only)
> >
> >
> > It looks like this restriction appeared around Nifi 1.6 (as per the
> documentation)... is there a reason for such a limitation?
> >
> > My current flow was extracting the information from the input JSON
> document and saving the information inside a Flow attribute.
> >
> > What can I do about this?  I don't like monkey patching.. is there any
> other way to get around this?
> >
> > Thanks
>
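
For illustration, a minimal sketch of the LRU client cache Matt describes
above, assuming OkHttp clients keyed by the evaluated URL (hypothetical, not
NiFi code):

    import java.util.LinkedHashMap;
    import java.util.Map;
    import okhttp3.OkHttpClient;

    // Keeps one client per URL, evicting and shutting down the
    // least-recently-used client once the cache is full.
    class ClientCache extends LinkedHashMap<String, OkHttpClient> {
        private static final int MAX_CLIENTS = 10;

        ClientCache() {
            super(16, 0.75f, true); // accessOrder = true gives LRU ordering
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<String, OkHttpClient> eldest) {
            if (size() > MAX_CLIENTS) {
                // Release the evicted client's resources.
                eldest.getValue().dispatcher().executorService().shutdown();
                return true;
            }
            return false;
        }
    }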


-- 
*Joe Percivall*
linkedin.com/in/Percivall
e: jperciv...@apache.com


Re: Anyone using HashAttribute?

2018-09-05 Thread Joe Percivall
Hey Andy,

We're currently using the HashAttribute processor. The use-case is that we
have various events that come in but sometimes those events are just
updates of previous ones. We store everything in ElasticSearch. So for
certain events, we'll calculate a hash based on a couple of attributes in
order to have a composite unique key to upsert as the ES _id. This allows
us to easily just insert/update events that are the same (as determined by
the hashed composite key).

As for the configuration of the processors, we're essentially just
specifying exact attributes as dynamic properties of HashAttribute. Then
passing that FF to PutElasticSearchHttp with the resulting attribute from
HashAttribute as the "Identifier Attribute".
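
Roughly, the composite key amounts to something like the following; an
illustrative sketch, not HashAttribute's exact algorithm:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    class CompositeKey {
        // Hashes a fixed set of attribute values into a stable ID for the ES
        // _id, so repeated events with the same key values upsert the same doc.
        static String of(String... attributeValues) throws NoSuchAlgorithmException {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (String value : attributeValues) {
                digest.update(value.getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator so ("ab","c") != ("a","bc")
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        }
    }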

Joe

On Mon, Sep 3, 2018 at 9:52 PM Andy LoPresto  wrote:

> I opened PRs for 2980 [1] and 2983 [2] which add more performant,
> consistent, and full-featured processors to calculate cryptographic hashes
> of flowfile content and flowfile attributes. I would like to deprecate and
> drop support for HashAttribute, as it performs a convoluted calculation
> that was probably useful in an old scenario, but doesn’t “hash attributes”
> like the name implies. As it blocks the new implementation from using that
> name and following our naming convention, I am hoping to find anyone still
> using the old implementation and understand their use case. Thanks for your
> help.
>
> [1] https://github.com/apache/nifi/pull/2980
> [2] https://github.com/apache/nifi/pull/2983
>
>
>
> Andy LoPresto
> alopre...@apache.org
> *alopresto.apa...@gmail.com *
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
>

-- 
*Joe Percivall*
linkedin.com/in/Percivall
e: jperciv...@apache.com


Re: Best practices for running Apache NiFi in production in a Docker container

2018-09-04 Thread Joe Percivall
Hi Peter,

Thanks for the follow-up. Yup, I agree with the relationship between
Xmx/Xms and UseCGroupMemoryLimitForHeap/MaxRAMFraction.

For the MaxRAMFraction=1 though, it seems that it does leave at least some
room for off-heap memory. That said, that may not be enough for NiFi and
its normal use-cases. Has anyone run the Native Memory Tracker[1] on a
"real-world" system? It gives a nice dump of where/how the JVM is using
memory[2].

[1]
https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr007.html
[2] http://trustmeiamadeveloper.com/2016/03/18/where-is-my-memory-java/
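
For anyone who wants to try it: NMT is enabled with a JVM flag and queried
with jcmd. In NiFi the flag would typically go into conf/bootstrap.conf as an
extra java.arg entry (the index below is arbitrary):

    # conf/bootstrap.conf
    java.arg.20=-XX:NativeMemoryTracking=summary

    # then, against the running NiFi JVM:
    jcmd <nifi-pid> VM.native_memory summary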

Joe

On Fri, Aug 31, 2018 at 5:10 AM Peter Wilcsinszky <
peterwilcsins...@gmail.com> wrote:

> Hi,
>
> I haven't done extensive research in this area but ran through the
> articles and also found another one [1]. From what I understand
> UseCGroupMemoryLimitForHeap is just the dynamic version of setting memory
> limits manually using Xmx and Xms which is currently done by the NiFi start
> script explicitly. In an environment where it should be done in a more
> dynamic fashion the UseCGroupMemoryLimitForHeap with proper MaxRAMFraction
> should be used but for caveats check the comments here: [1] and here: [2]
> (My understanding: MaxRAMFraction=1 is considered to be unsafe,
> MaxRAMFraction=2 leaves half the memory unused)
>
> [1] https://banzaicloud.com/blog/java-resource-limits/
> [2]
> https://stackoverflow.com/questions/49854237/is-xxmaxramfraction-1-safe-for-production-in-a-containered-environment
>
>
> On Thu, Aug 30, 2018 at 7:54 PM Joe Percivall 
> wrote:
>
>> Hey everyone,
>>
>> I was recently searching for a best practice guide for running a
>> production instance of Apache NiFi within a Docker container and couldn't
>> find anything specific other than the normal guidance for best practices of
>> a high-performance instance[1]. I did expand my search for best practices
>> on running the JVM within a container and found a couple good
>> articles[2][3]. The first of which explains why the JVM will take up more
>> than is set via "Xmx" and the second is about 2 JVM options which were
>> backported from Java 9 to JDK 8u131 specifically for configuring the JVM
>> heap for running in a "VM".
>>
>> So with that, a couple questions:
>> 1: Does anyone have any best practices or lessons learned specifically
>> for running NiFi in a container?
>> 2:  "UseCGroupMemoryLimitForHeap" and "MaxRAMFraction" are technically
>> "Experimental VM Options", has anyone used them in practice?
>>
>> [1]
>> https://community.hortonworks.com/articles/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.html
>>
>> [2]
>> https://developers.redhat.com/blog/2017/04/04/openjdk-and-containers/#more-433899
>> [3]
>> https://blog.csanchez.org/2017/05/31/running-a-jvm-in-a-container-without-getting-killed/
>>
>> Thanks,
>> Joe
>> --
>> *Joe Percivall*
>> linkedin.com/in/Percivall
>> e: jperciv...@apache.com
>>
>

-- 
*Joe Percivall*
linkedin.com/in/Percivall
e: jperciv...@apache.com


Best practices for running Apache NiFi in production in a Docker container

2018-08-30 Thread Joe Percivall
Hey everyone,

I was recently searching for a best practice guide for running a production
instance of Apache NiFi within a Docker container and couldn't find
anything specific other than the normal guidance for best practices of a
high-performance instance[1]. I did expand my search for best practices on
running the JVM within a container and found a couple good articles[2][3].
The first of which explains why the JVM will take up more than is set via
"Xmx" and the second is about 2 JVM options which were backported from Java
9 to JDK 8u131 specifically for configuring the JVM heap for running in a
"VM".

So with that, a couple questions:
1: Does anyone have any best practices or lessons learned specifically for
running NiFi in a container?
2:  "UseCGroupMemoryLimitForHeap" and "MaxRAMFraction" are technically
"Experimental VM Options", has anyone used them in practice?

[1]
https://community.hortonworks.com/articles/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.html

[2]
https://developers.redhat.com/blog/2017/04/04/openjdk-and-containers/#more-433899
[3]
https://blog.csanchez.org/2017/05/31/running-a-jvm-in-a-container-without-getting-killed/
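
For reference, wiring those experimental options into NiFi would look
something like the following in conf/bootstrap.conf (assuming JDK 8u131+; the
indices are arbitrary, and the default java.arg.2/java.arg.3 Xms/Xmx entries
would need to be removed so they don't take precedence):

    java.arg.20=-XX:+UnlockExperimentalVMOptions
    java.arg.21=-XX:+UseCGroupMemoryLimitForHeap
    java.arg.22=-XX:MaxRAMFraction=2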

Thanks,
Joe
-- 
*Joe Percivall*
linkedin.com/in/Percivall
e: jperciv...@apache.com


Re: Prioritizing flowFiles to tailor throughput

2017-12-18 Thread Joe Percivall
Hey James,

Sorry, no one responded when you first sent the message but I'm curious
what you ended up doing and any findings you had. Also, wanted to bring
this thread back up to the attention of the larger group as it brings up
some interesting questions I haven't found discussed elsewhere.

On the topic of the re-sorting of the queue, I was curious about the
answer, so I dug down to the StandardFlowFileQueue and found that it's
primarily just wrapping an instance of Java's PriorityQueue for its active
queue[1]. This means that sorting is done each time a FlowFile is enqueued
but also that we have immediate access to the head of the queue. I'm sure
someone else (Mark Payne?) could explain better how we make use of the
nuances of the queue for better performance and the impacts the different
queue prioritizers have.

For the higher priority FlowFiles starving out lower priority ones, I'm
thinking about a way to give a weight instead of a priority. So in essence,
a "weighted funnel processor", which grabs X Flowfiles each time but has a
weighting assigned to different categories such that you take a certain
number of each category based on a given weight. That said, I'm not sure
that would be guaranteed to work when FlowFiles in the queue are swapped
out since even if we iterated over everything in the incoming connection,
there are still others swapped to disk. Also, there are probably performance
concerns if we tried to implement it using the current tools offered to a
processor.
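
To make the weighting idea concrete, a toy sketch (purely illustrative,
ignoring the swapping and performance concerns above):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Queue;

    class WeightedFunnel {
        // Drains items from several category queues in proportion to their
        // weights, e.g. weights {3, 1} takes up to 3 high-priority items for
        // every 1 low-priority item, so neither category starves the other.
        static <T> List<T> take(List<Queue<T>> categories, int[] weights) {
            List<T> batch = new ArrayList<>();
            for (int i = 0; i < categories.size(); i++) {
                Queue<T> queue = categories.get(i);
                for (int n = 0; n < weights[i] && !queue.isEmpty(); n++) {
                    batch.add(queue.poll());
                }
            }
            return batch;
        }
    }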

For the separate NiFis approach, I'm curious what others' views are.
Personally, it makes sense to me, that for flows that are dramatically
different in priority you'd want to section it off to another instance of
NiFi. Essentially the separation between data-plane and control-plane
instances of NiFi.


Lastly, James, I assume you're limited to using the 0.7.x release for a
specific reason? I'd highly suggest upgrading to the latest version
whenever possible. There are many security and performance improvements,
and of course many new features.

[1]
https://github.com/apache/nifi/blob/7f4cfd51ea07ead6c9b71b6c6d6f87a352b801d3/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/StandardFlowFileQueue.java#L89

Joe

On Thu, Oct 19, 2017 at 8:58 AM, James McMahon <jsmcmah...@gmail.com> wrote:

> Our team is considering ways to accelerate delivery for certain subsets of
> content we process through NiFi. We are using Apache NiFi 0.7.x as our
> baseline.
>
> This link discusses a recommended approach to content prioritization using
> PriorityAttributePrioritizer on a connector (queue) to tailor throughput
> based on a priority attribute we set upstream in our flow:
>
> https://stackoverflow.com/questions/42528993/how-to-specify-priority-attributes-for-individual-flowfiles
>
> How often does the connector queue have to re-sort contents in order to
> enforce our priority attribute? Is it re-sorting *every* single time new
> flowFiles hit the queue? Won't that markedly and negatively impact
> performance?
>
> If our priority 1s are a huge volume of flowfiles that persists over time,
> won't this approach cause our priority 2s, 3s, etc etc to languish in queue?
>
> The described approach seems to embed significant business logic in the
> NiFi workflows. In an environment where priorities change often, would that
> be considered a poor approach? Might it be better to enforce priority
> processing at a higher architectural level - a lightweight NiFi server to
> accelerate delivery of priority one content and email alerts, a priority
> two suite of NiFi servers for standard flowfile volume, a priority three
> suite of servers to handle long-term bulk processing, etc etc?
>
> Thanks in advance for your help.  -Jim
>



-- 
*Joe Percivall*
linkedin.com/in/Percivall
e: jperciv...@apache.com


Re: Back Pressure Object threshold not honored

2017-04-27 Thread Joe Percivall
Hello Kevin,

I believe there are two things at play here. The first is the processor
being very fast and processing the FlowFiles before back pressure gets
applied. The second is that in the current distribution, UpdateAttribute
uses an old style of getting higher performance and grabs batches of 100
with each onTrigger[1]. Since back-pressure gets applied per onTrigger the
UpdateAttribute will process at least 100 FlowFiles before it gets told to
stop processing.

In the changes for 1.2.0 though I updated it to bring only 1 FlowFile in
per onTrigger. So if you test this on a build of master then you should see
more appropriate back-pressure application.

[1]
https://github.com/apache/nifi/blob/rel/nifi-1.1.2/nifi-nar-bundles/nifi-update-attribute-bundle/nifi-update-attribute-processor/src/main/java/org/apache/nifi/processors/attributes/UpdateAttribute.java#L338
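
The batching pattern in question looks roughly like this inside a processor
(simplified; see the linked source for the real code):

    import java.util.List;
    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.processor.AbstractProcessor;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;
    import org.apache.nifi.processor.exception.ProcessException;

    public class BatchingProcessor extends AbstractProcessor {
        @Override
        public void onTrigger(ProcessContext context, ProcessSession session)
                throws ProcessException {
            // Pulls up to 100 FlowFiles per invocation. Back pressure is only
            // evaluated between onTrigger calls, so a whole batch can slip past
            // a threshold of 1 before the processor is told to stop.
            List<FlowFile> flowFiles = session.get(100);
            if (flowFiles.isEmpty()) {
                return;
            }
            // ... process and transfer the batch ...
        }
    }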

Joe

On Thu, Apr 27, 2017 at 7:21 PM, Kevin Verhoeven <kevin.verhoe...@ds-iq.com>
wrote:

> Thank you for your help Andy. I think you are correct, the flowfiles are
> very small and the previous Processor is very fast – this might explain
> what is happening. I’ve enclosed screenshots of the connection properties
> and the workflow. In the screenshot I see 400 flowfiles were allowed
> through before back pressure was applied. The back pressure object
> threshold is set to 1. Do you have any recommendations?
>
>
>
> Kevin
>
>
>
>
>
>
>
> *From:* Andy LoPresto [mailto:alopre...@apache.org]
> *Sent:* Thursday, April 27, 2017 4:16 PM
> *To:* users@nifi.apache.org
> *Subject:* Re: Back Pressure Object threshold not honored
>
>
>
> Hi Kevin,
>
>
>
> Sorry to hear you are having this issue. Can you please provide a
> screenshot of the connection properties in the configuration dialog? How
> quickly do those flowfiles get enqueued? I think there’s a chance if they
> are very small & the previous processor is very fast (e.g.
> RouteOnAttribute, SplitText) that it could enqueue a higher number before
> the back pressure check is executed.
>
>
>
> Andy LoPresto
>
> alopre...@apache.org
>
> *alopresto.apa...@gmail.com <alopresto.apa...@gmail.com>*
>
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
>
>
> On Apr 27, 2017, at 4:07 PM, Kevin Verhoeven <kevin.verhoe...@ds-iq.com>
> wrote:
>
>
>
> I have an odd problem. I set the Back Pressure Object threshold on a link
> between two Processors to 1, but 200 flowfiles are passed to the queue
> before back pressure is honored. I need the back pressure to be set to a
> small number of flowfiles to keep the source from flooding the destination.
> Has anyone come across this problem before? I am running 12 instances of
> NiFi on version 1.1.1 on Ubuntu 14.04.
>
>
>
> Regards,
>
>
>
> Kevin
>
>
>



-- 
*Joe Percivall*
e: joeperciv...@gmail.com


Re: MiNiFi's differentiator

2017-04-20 Thread Joe Percivall
Hello Jeff,

Glad to hear the WholeConfigDifferentiator is working well. While it is
configurable, that is currently the only implementation. Is there a
specific place that suggests there are currently more implementations? The
documentation lists it as the only one[1]. When I wrote it up I implemented
it in such a way so that we could easily add other differentiators in the
future. One such example: once we have the NiFi Registry in place, a
differentiator that takes advantage of whatever uniqueness scheme is
implemented for it.

[1]
https://github.com/apache/nifi-minifi/blob/master/minifi-docs/src/main/markdown/System_Admin_Guide.md#automatic-warm-redeploy

Joe

On Thu, Apr 20, 2017 at 10:06 AM, Jeff Zemerick <jzemer...@apache.org>
wrote:

> Hi all,
>
> MiNiFi's WholeConfigDifferentiator along with the PullHttpChangeIngestor
> is working well for me. I see that the differentiator is configurable and
> additional implementations can be provided. Are there any examples of
> circumstances in which using the WholeConfigDifferentiator would not be the
> best choice?
>
> Thanks,
> Jeff
>



-- 
*Joe Percivall*
linkedin.com/in/Percivall
e: jperciv...@apache.com


Re: Expression language access to flowfile content?

2017-01-16 Thread Joe Percivall
Nick,

A main reason why content isn't referenceable by expression language is
that it promotes an anti-pattern. Reading the content takes significantly
more time than attributes and should be limited as much as possible. Also
content can potentially be very large and reading the entire thing in
memory can cause many adverse effects.

Having fewer processors does not mean the flow is better (similar to concise
code != efficient code). You can have a couple very specialized processors
that very effectively stream the content and do operations as needed.
Alternatively using EL to grab the content each time ${content()} is called
would introduce many processors' worth of memory and IO inefficiency.

You can learn more about the different repositories by reading this doc[1].

[1] https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html

Joe

On Mon, Jan 16, 2017 at 3:15 PM, Nick Carenza <
nick.care...@thecontrolgroup.com> wrote:

> I found a section in the nifi expression language documentation of
> Subjectless Functions. Could/should this feature be added as one of those?
>
> On Mon, Jan 16, 2017 at 11:53 AM, Nick Carenza <
> nick.care...@thecontrolgroup.com> wrote:
>
>> Thanks Bryan, that is what I ended up doing. PutEmail works the same way
>> as PutSlack in this regard. It expects you to specify a message as a
>> property that supports expression language and doesn't give you an option
>> to use the flowfile's content in the message but _does_ allow you to attach
>> the flowfile.
>>
>> If expression language was capable of retrieving flowfile content
>> directly, I could reduce the number of processors significantly.
>>
>> A major problem with this might be the need to introduce a reserved
>> attribute which would have the potential to cause compatibility problems in
>> some flows. Or perhaps a function that could serve as the root of an
>> expression `${content()}`.
>>
>> On Mon, Jan 16, 2017 at 11:26 AM, Bryan Bende <bbe...@gmail.com> wrote:
>>
>>> Nick,
>>>
>>> The current approach is to use ExtractText to extract the entire flow
>>> file content to an attribute which can then be referenced in expression
>>> language.
>>>
>>> Keep in mind this means the entire content will be read into memory
>>> which in some cases may not be a good idea.
>>>
>>> I would think that PutSlack should have a strategy to decide where the
>>> message should come from (attribute vs content), but I am not familiar enough
>>> with that processor to really say if it is a good idea.
>>>
>>> -Bryan
>>>
>>>
>>> On Mon, Jan 16, 2017 at 1:40 PM, Nick Carenza <
>>> nick.care...@thecontrolgroup.com> wrote:
>>>
>>>> Is there any way to access flowfile content with expression language?
>>>>
>>>> I am trying to use monitor activity with putslack but monitor activity
>>>> creates flowfiles with configurable content but putslack requires you to
>>>> supply a message property using expression language which as far as I can
>>>> tell doesn't have access to that flowfile content.
>>>>
>>>> Without having to put another processor in between monitor activity and
>>>> put slack, is there a way to use the flowfile content directly from
>>>> expression language?
>>>>
>>>> If not does anyone else think this would be really useful to make
>>>> processors like these more compatible?
>>>>
>>>
>>>
>>
>
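
For reference, Bryan's ExtractText approach above amounts to adding a dynamic
property along these lines; the property name is arbitrary, and the
processor's buffer-size and capture-group-length limits may need raising for
large content:

    message.body = (?s)(^.*$)

The captured content then shows up as an attribute (e.g. message.body.1 for
the first capture group) that expression language can reference.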


-- 
*Joe Percivall*
linkedin.com/in/Percivall
e: jperciv...@apache.com


Re: How to use Rest API ressource data-transfer?

2017-01-16 Thread Joe Percivall
This question was answered by Bryan Bende here[1]. The response was:


That REST end-point was introduced when NiFi introduced Site-To-Site over
HTTP, previously Site-To-Site was always over TCP. There is a Site-To-Site
Java client which probably makes more sense to use rather than going
directly to the API:

https://github.com/apache/nifi/blob/master/nifi-commons/nifi-site-to-site-client/src/main/java/org/apache/nifi/remote/client/SiteToSiteClient.java

Set the transport protocol to HTTP.




[1]
https://community.hortonworks.com/questions/78054/how-to-use-rest-api-ressource-data-transfer.html
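
A minimal usage sketch of that client (the host, port name, and payload below
are hypothetical):

    import java.nio.charset.StandardCharsets;
    import java.util.Collections;
    import org.apache.nifi.remote.Transaction;
    import org.apache.nifi.remote.TransferDirection;
    import org.apache.nifi.remote.client.SiteToSiteClient;
    import org.apache.nifi.remote.protocol.SiteToSiteTransportProtocol;

    public class SiteToSiteExample {
        public static void main(String[] args) throws Exception {
            SiteToSiteClient client = new SiteToSiteClient.Builder()
                    .url("http://nifi-host:8080/nifi")        // hypothetical host
                    .portName("my-input-port")                // hypothetical port
                    .transportProtocol(SiteToSiteTransportProtocol.HTTP)
                    .build();

            // The client manages the transaction IDs that the raw REST
            // endpoint exposes; one transaction can carry many flow files.
            Transaction transaction = client.createTransaction(TransferDirection.SEND);
            transaction.send("hello".getBytes(StandardCharsets.UTF_8),
                    Collections.singletonMap("filename", "hello.txt"));
            transaction.confirm();
            transaction.complete();
            client.close();
        }
    }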

On Mon, Jan 16, 2017 at 9:24 AM, Matt Gilman <matt.c.gil...@gmail.com>
wrote:

> I'm not super familiar with this endpoint so hopefully someone else can
> chime in here if necessary. But I believe the transactionId is already
> composed in the Location header of this endpoint
>
> POST
> /data-transfer/input-ports/{portId}/transactions
>
> There is a Java class [1] available in NiFi commons site to site client
> that can be used to interact with these endpoints. Hopefully, this can be
> helpful.
>
> Matt
>
> [1] https://github.com/apache/nifi/blob/master/nifi-commons/nifi-site-to-site-client/src/main/java/org/apache/nifi/remote/util/SiteToSiteRestApiClient.java
>
> On Mon, Jan 16, 2017 at 8:35 AM, iboumedien <iboumed...@gmail.com> wrote:
>
>> Hi guys,
>>
>> I want to use the REST API of Nifi in particularly this one:
>>
>> POST
>> /data-transfer/input-ports/{portId}/transactions/{transactionId}/flow-files
>>
>> But I haven't found how I can get the transactionId.
>>
>> Can anyone help me?
>>
>> Best regards
>>
>> Ismael
>>
>>
>>
>> --
>> View this message in context: http://apache-nifi-users-list.2361937.n4.nabble.com/How-to-use-Rest-API-ressource-data-transfer-tp666.html
>> Sent from the Apache NiFi Users List mailing list archive at Nabble.com.
>> Sent from the Apache NiFi Users List mailing list archive at Nabble.com.
>>
>
>


-- 
*Joe Percivall*
linkedin.com/in/Percivall
e: jperciv...@apache.com


Re: Publish NiFi 1.1.1 to Maven

2016-12-29 Thread Joe Percivall
Hello Kevin,

Thanks for emailing the users list. What problems exactly are you running
into?

The repository.apache.org is the Apache repo that holds the artifacts after
they are released by the Release Manager. I released 1.1.1 in the same way
as 0.7.1, 1.0.0 and 1.0.1 so I'm not sure where the trouble would be.

Also I checked maven central repo (which I believe is the default) and it
has 1.1.1, as seen here:
http://search.maven.org/#artifactdetails%7Corg.apache.nifi%7Cnifi%7C1.1.1%7Cpom

Joe

On Thu, Dec 29, 2016 at 5:48 PM, Kevin Verhoeven 
wrote:

> Looks like an alternate repo has it: https://repository.apache.org/content/repositories/releases/org/apache/nifi/nifi-api/. So if we add a
> repo config line to the maven pom file to use
> https://repository.apache.org/content/repositories/releases/, instead of
> the default central it can find it.
>
>
>
> Thanks,
>
>
>
> Kevin
>
>
>
> *From:* Kevin Verhoeven [mailto:kevin.verhoe...@ds-iq.com]
> *Sent:* Thursday, December 29, 2016 2:44 PM
> *To:* users@nifi.apache.org
> *Subject:* Publish NiFi 1.1.1 to Maven
>
>
>
> I see that NiFi 1.1.1 is available as a download, do you know if/when the
> artifact will be published to Maven?
>
>
>
> Kevin
>
>
>
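
For anyone hitting the same thing, the override Kevin describes is just an
extra repository entry in the project's pom.xml, along these lines (the
repository id is arbitrary):

    <repositories>
      <repository>
        <id>apache-releases</id>
        <url>https://repository.apache.org/content/repositories/releases/</url>
      </repository>
    </repositories>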



-- 

- - - - - -
*Joseph Percivall*
linkedin.com/in/Percivall
e: jperciv...@apache.com


Re: Building nifi locally on my mac

2016-12-29 Thread Joe Percivall
Hello,

I don't see any text between "... test error :" and "(on CentOS it works
fine)". Could you try reformatting and resending?


Joe
On Thu, Dec 29, 2016 at 5:26 AM, ddewaele  wrote:

> When I try to create a local build of nifi on my mac I always get the
> following test error : (on CentOS it works fine).
>
> Any idea what is causing this and how this can be fixed ?
>
>
>
>
>
> --
> View this message in context: http://apache-nifi-users-list.2361937.n4.nabble.com/Building-nifi-locally-on-my-mac-tp542.html
> Sent from the Apache NiFi Users List mailing list archive at Nabble.com.
>



-- 

- - - - - -
*Joseph Percivall*
linkedin.com/in/Percivall
e: jperciv...@apache.com


New 'Powered by Apache NiFi' Website Section

2016-12-27 Thread Joe Percivall
Hello NiFi Users,

Thanks to a contribution by Andre de Miranda we have a new "Powered By
Apache NiFi" section on our webpage[1]. It would be great to expand this
list out with many more companies and organizations that are using Apache
NiFi.

If you'd like to be added to this list please feel free to reply on this
thread or create a PR with the necessary information:

* organization name
* industry
* a brief description on how Apache NiFi is used to help your organization.

[1] https://nifi.apache.org/powered-by-nifi.html

Thanks,
Joe

- - - - - -
*Joseph Percivall*
e: jperciv...@apache.com


Re: NiFi Cron scheduling

2016-12-21 Thread Joe Percivall
Totally forgot that UpdateAttribute with State was not in the previous release 
but is currently merged to master. You'd have to run 1.2.0-SNAPSHOT in order to 
do that workaround.

Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Wednesday, December 21, 2016, 2:06:44 PM EST, Joe Percivall 
<joeperciv...@yahoo.com> wrote:
I have created a JIRA here[1] for the issue.

If you need a fix/workaround now (and are using the latest version), then you 
may want to utilize the newly added ability to use state with UpdateAttribute. 
You can set up a rule to remember the last time a FlowFile triggered the 
processor and if it is a misfire (done too soon after) then add an attribute to 
it so you can route it off. If you want I can create a template explaining that.


[1] https://issues.apache.org/jira/browse/NIFI-3242

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Wednesday, December 21, 2016, 9:04:32 AM EST, Joe Percivall 
<joeperciv...@yahoo.com> wrote:
Totally didn't realize I accidentally moved this off of the mailing list. 
Bringing it back.

--

I just realized as well that NiFi is only using Quartz for the CronExpression. 
I think this bug is due to the logic going on in this block[1]. One thing 
specifically that worries me is in the CronExpression code here[2]. It sets the 
milliseconds to 0 when doing the computation and then in NiFi's next line it 
finds the delay using milliseconds.

That said, this code is part of the initial contribution and has been running 
for many years. I'm surprised this has never come up before.

I can write up a Jira ticket for this later today.

[1] 
https://github.com/apache/nifi/blob/c10d11d378ffd7c306830e24d50c5befc98a/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/scheduling/QuartzSchedulingAgent.java#L177-L177
[2] 
https://github.com/quartz-scheduler/quartz/blob/quartz-2.2.1/quartz-core/src/main/java/org/quartz/CronExpression.java#L1170

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Wednesday, December 21, 2016, 5:33:32 AM EST, Davy De Waele 
<ddewa...@gmail.com> wrote:
Hi,

Also on 10.10.5. A colleague of mine has the same issue on his mac. Here is his 
output

2016-12-21 08:32:00,046 DEBUG [Timer-Driven Process Thread-2] 
o.a.n.c.scheduling.QuartzSchedulingAgent Finished task for 
GenerateFlowFile[id=20476217-0159-1000-63f4-7a3821b0f21a]; next scheduled time 
is at Wed Dec 21 08:34:00 CET 2016 after a delay of 119954 milliseconds
2016-12-21 08:34:00,001 DEBUG [Timer-Driven Process Thread-9] 
o.a.n.c.scheduling.QuartzSchedulingAgent Finished task for 
GenerateFlowFile[id=20476217-0159-1000-63f4-7a3821b0f21a]; next scheduled time 
is at Wed Dec 21 08:36:00 CET 2016 after a delay of 11 milliseconds
2016-12-21 08:36:00,002 DEBUG [Timer-Driven Process Thread-7] 
o.a.n.c.scheduling.QuartzSchedulingAgent Finished task for 
GenerateFlowFile[id=20476217-0159-1000-63f4-7a3821b0f21a]; next scheduled time 
is at Wed Dec 21 08:38:00 CET 2016 after a delay of 119998 milliseconds
2016-12-21 08:37:59,999 DEBUG [Timer-Driven Process Thread-6] 
o.a.n.c.scheduling.QuartzSchedulingAgent Finished task for 
GenerateFlowFile[id=20476217-0159-1000-63f4-7a3821b0f21a]; next scheduled time 
is at Wed Dec 21 08:38:00 CET 2016 after a delay of 1 milliseconds
2016-12-21 08:38:00,003 DEBUG [Timer-Driven Process Thread-6] 
o.a.n.c.scheduling.QuartzSchedulingAgent Finished task for 
GenerateFlowFile[id=20476217-0159-1000-63f4-7a3821b0f21a]; next scheduled time 
is at Wed Dec 21 08:40:00 CET 2016 after a delay of 119997 milliseconds
2016-12-21 08:40:00,002 DEBUG [Timer-Driven Process Thread-6] 
o.a.n.c.scheduling.QuartzSchedulingAgent Finished task for 
GenerateFlowFile[id=20476217-0159-1000-63f4-7a3821b0f21a]; next scheduled time 
is at Wed Dec 21 08:42:00 CET 2016 after a delay of 119998 milliseconds
2016-12-21 08:42:00,004 DEBUG [Timer-Driven Process Thread-8] 
o.a.n.c.scheduling.QuartzSchedulingAgent Finished task for 
GenerateFlowFile[id=20476217-0159-1000-63f4-7a3821b0f21a]; next scheduled time 
is at Wed Dec 21 08:44:00 CET 2016 after a delay of 119996 milliseconds
2016-12-21 08:44:00,002 DEBUG [Timer-Driven Process Thread-6] 
o.a.n.c.scheduling.QuartzSchedulingAgent Finished task for 
GenerateFlowFile[id=20476217-0159-1000-63f4-7a3821b0f21a]; next scheduled time 
is at Wed Dec 21 08:46:00 CET 2016 after a delay of 119998 milliseconds

So it seems that sometimes the schedule is fired too soon. In this case:
* processor task finished at 08:37:59,999, a result of the job that was 
scheduled at 08:36:00,002 (I assume that for some reason this job was 
scheduled x ms too soon, causing the GenerateFlowFileProcessor to finish the 
job early)
* processor task finished at 08:38:00,003, a result of the job that was 
scheduled at 08:37:59,999 (this gave the scheduler a window to schedule an 
additional job that could be completed).

Re: NiFi Cron scheduling

2016-12-21 Thread Joe Percivall
I have created a JIRA here[1] for the issue.

If you need a fix/workaround now (and are using the latest version), then you 
may want to utilize the newly added ability to use state with UpdateAttribute. 
You can set up a rule to remember the last time a FlowFile triggered the 
processor and if it is a misfire (done too soon after) then add an attribute to 
it so you can route it off. If you want I can create a template explaining that.


[1] https://issues.apache.org/jira/browse/NIFI-3242

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Wednesday, December 21, 2016, 9:04:32 AM EST, Joe Percivall 
<joeperciv...@yahoo.com> wrote:
Totally didn't realize I accidentally moved this off of the mailing list. 
Bringing it back.

--

I just realized as well that NiFi is only using Quartz for the CronExpression. 
I think this bug is due to the logic going on in this block[1]. One thing 
specifically that worries me is in the CronExpression code here[2]. It sets the 
milliseconds to 0 when doing the computation and then in NiFi's next line it 
finds the delay using milliseconds.

That said, this code is part of the initial contribution and has been running 
for many years. I'm surprised this has never come up before.

I can write up a Jira ticket for this later today.

[1] 
https://github.com/apache/nifi/blob/c10d11d378ffd7c306830e24d50c5befc98a/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/scheduling/QuartzSchedulingAgent.java#L177-L177
[2] 
https://github.com/quartz-scheduler/quartz/blob/quartz-2.2.1/quartz-core/src/main/java/org/quartz/CronExpression.java#L1170

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Wednesday, December 21, 2016, 5:33:32 AM EST, Davy De Waele 
<ddewa...@gmail.com> wrote:
Hi,

Also on 10.10.5. A colleague of mine has the same issue on his mac. Here is his 
output

2016-12-21 08:32:00,046 DEBUG [Timer-Driven Process Thread-2] 
o.a.n.c.scheduling.QuartzSchedulingAgent Finished task for 
GenerateFlowFile[id=20476217-0159-1000-63f4-7a3821b0f21a]; next scheduled time 
is at Wed Dec 21 08:34:00 CET 2016 after a delay of 119954 milliseconds
2016-12-21 08:34:00,001 DEBUG [Timer-Driven Process Thread-9] 
o.a.n.c.scheduling.QuartzSchedulingAgent Finished task for 
GenerateFlowFile[id=20476217-0159-1000-63f4-7a3821b0f21a]; next scheduled time 
is at Wed Dec 21 08:36:00 CET 2016 after a delay of 11 milliseconds
2016-12-21 08:36:00,002 DEBUG [Timer-Driven Process Thread-7] 
o.a.n.c.scheduling.QuartzSchedulingAgent Finished task for 
GenerateFlowFile[id=20476217-0159-1000-63f4-7a3821b0f21a]; next scheduled time 
is at Wed Dec 21 08:38:00 CET 2016 after a delay of 119998 milliseconds
2016-12-21 08:37:59,999 DEBUG [Timer-Driven Process Thread-6] 
o.a.n.c.scheduling.QuartzSchedulingAgent Finished task for 
GenerateFlowFile[id=20476217-0159-1000-63f4-7a3821b0f21a]; next scheduled time 
is at Wed Dec 21 08:38:00 CET 2016 after a delay of 1 milliseconds
2016-12-21 08:38:00,003 DEBUG [Timer-Driven Process Thread-6] 
o.a.n.c.scheduling.QuartzSchedulingAgent Finished task for 
GenerateFlowFile[id=20476217-0159-1000-63f4-7a3821b0f21a]; next scheduled time 
is at Wed Dec 21 08:40:00 CET 2016 after a delay of 119997 milliseconds
2016-12-21 08:40:00,002 DEBUG [Timer-Driven Process Thread-6] 
o.a.n.c.scheduling.QuartzSchedulingAgent Finished task for 
GenerateFlowFile[id=20476217-0159-1000-63f4-7a3821b0f21a]; next scheduled time 
is at Wed Dec 21 08:42:00 CET 2016 after a delay of 119998 milliseconds
2016-12-21 08:42:00,004 DEBUG [Timer-Driven Process Thread-8] 
o.a.n.c.scheduling.QuartzSchedulingAgent Finished task for 
GenerateFlowFile[id=20476217-0159-1000-63f4-7a3821b0f21a]; next scheduled time 
is at Wed Dec 21 08:44:00 CET 2016 after a delay of 119996 milliseconds
2016-12-21 08:44:00,002 DEBUG [Timer-Driven Process Thread-6] 
o.a.n.c.scheduling.QuartzSchedulingAgent Finished task for 
GenerateFlowFile[id=20476217-0159-1000-63f4-7a3821b0f21a]; next scheduled time 
is at Wed Dec 21 08:46:00 CET 2016 after a delay of 119998 milliseconds

So it seems that sometimes the schedule is fired too soon. In this case:
* processor task finished at 08:37:59,999, a result of the job that was 
scheduled at 08:36:00,002 (I assume that for some reason this job was 
scheduled x ms too soon, causing the GenerateFlowFileProcessor to finish the 
job early)
* processor task finished at 08:38:00,003, a result of the job that was 
scheduled at 08:37:59,999 (this gave the scheduler a window to schedule an 
I'm running it in a docker container now and don't have any issues.
As soon as I run the distribution directly on my mac I can reproduce it. 

I also just realised that NiFi isn't using Quartz at all for its 
scheduling (besides the CronExpression parser class). QuartzSchedulingAgent 
would make one believe that scheduling is handled by a Quartz engine.

Re: NiFi Cron scheduling

2016-12-21 Thread Joe Percivall
Totally didn't realize I accidentally moved this off of the mailing list. 
Bringing it back.

--

I just realized as well that NiFi is only using Quartz for the CronExpression. 
I think this bug is due to the logic going on in this block[1]. One thing 
specifically that worries me is in the CronExpression code here[2]. It sets the 
milliseconds to 0 when doing the computation and then in NiFi's next line it 
finds the delay using milliseconds.

That said, this code is part of the initial contribution and has been running 
for many years. I'm surprised this has never come up before.

I can write up a Jira ticket for this later today.

[1] 
https://github.com/apache/nifi/blob/c10d11d378ffd7c306830e24d50c5befc98a/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/scheduling/QuartzSchedulingAgent.java#L177-L177
[2] 
https://github.com/quartz-scheduler/quartz/blob/quartz-2.2.1/quartz-core/src/main/java/org/quartz/CronExpression.java#L1170
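
The suspected interaction, boiled down (illustrative, not the exact NiFi
code):

    import java.util.Date;
    import org.quartz.CronExpression;

    public class MisfireSketch {
        public static void main(String[] args) throws Exception {
            CronExpression cron = new CronExpression("0 0/2 * * * ?");

            // Suppose "now" is 08:37:59.999. getTimeAfter() zeroes the
            // milliseconds during its computation, so it can return 08:38:00.000.
            Date now = new Date();
            Date next = cron.getTimeAfter(now);

            // NiFi then schedules with millisecond precision: a 1 ms delay here
            // means the task fires almost immediately, the misfire in the logs.
            long delayMs = next.getTime() - now.getTime();
            System.out.println("delay of " + delayMs + " milliseconds");
        }
    }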

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Wednesday, December 21, 2016, 5:33:32 AM EST, Davy De Waele 
<ddewa...@gmail.com> wrote:
Hi,

Also on 10.10.5. A colleague of mine has the same issue on his mac. Here is his 
output

2016-12-21 08:32:00,046 DEBUG [Timer-Driven Process Thread-2] 
o.a.n.c.scheduling.QuartzSchedulingAgent Finished task for 
GenerateFlowFile[id=20476217-0159-1000-63f4-7a3821b0f21a]; next scheduled time 
is at Wed Dec 21 08:34:00 CET 2016 after a delay of 119954 milliseconds
2016-12-21 08:34:00,001 DEBUG [Timer-Driven Process Thread-9] 
o.a.n.c.scheduling.QuartzSchedulingAgent Finished task for 
GenerateFlowFile[id=20476217-0159-1000-63f4-7a3821b0f21a]; next scheduled time 
is at Wed Dec 21 08:36:00 CET 2016 after a delay of 11 milliseconds
2016-12-21 08:36:00,002 DEBUG [Timer-Driven Process Thread-7] 
o.a.n.c.scheduling.QuartzSchedulingAgent Finished task for 
GenerateFlowFile[id=20476217-0159-1000-63f4-7a3821b0f21a]; next scheduled time 
is at Wed Dec 21 08:38:00 CET 2016 after a delay of 119998 milliseconds
2016-12-21 08:37:59,999 DEBUG [Timer-Driven Process Thread-6] 
o.a.n.c.scheduling.QuartzSchedulingAgent Finished task for 
GenerateFlowFile[id=20476217-0159-1000-63f4-7a3821b0f21a]; next scheduled time 
is at Wed Dec 21 08:38:00 CET 2016 after a delay of 1 milliseconds
2016-12-21 08:38:00,003 DEBUG [Timer-Driven Process Thread-6] 
o.a.n.c.scheduling.QuartzSchedulingAgent Finished task for 
GenerateFlowFile[id=20476217-0159-1000-63f4-7a3821b0f21a]; next scheduled time 
is at Wed Dec 21 08:40:00 CET 2016 after a delay of 119997 milliseconds
2016-12-21 08:40:00,002 DEBUG [Timer-Driven Process Thread-6] 
o.a.n.c.scheduling.QuartzSchedulingAgent Finished task for 
GenerateFlowFile[id=20476217-0159-1000-63f4-7a3821b0f21a]; next scheduled time 
is at Wed Dec 21 08:42:00 CET 2016 after a delay of 119998 milliseconds
2016-12-21 08:42:00,004 DEBUG [Timer-Driven Process Thread-8] 
o.a.n.c.scheduling.QuartzSchedulingAgent Finished task for 
GenerateFlowFile[id=20476217-0159-1000-63f4-7a3821b0f21a]; next scheduled time 
is at Wed Dec 21 08:44:00 CET 2016 after a delay of 119996 milliseconds
2016-12-21 08:44:00,002 DEBUG [Timer-Driven Process Thread-6] 
o.a.n.c.scheduling.QuartzSchedulingAgent Finished task for 
GenerateFlowFile[id=20476217-0159-1000-63f4-7a3821b0f21a]; next scheduled time 
is at Wed Dec 21 08:46:00 CET 2016 after a delay of 119998 milliseconds

So it seems that sometimes the schedule is fired too soon. In this case:
* processor task finished at 08:37:59,999, a result of the job that was 
scheduled at 08:36:00,002 (I assume that for some reason this job was 
scheduled x ms too soon, causing the GenerateFlowFileProcessor to finish the 
job early)
* processor task finished at 08:38:00,003, a result of the job that was 
scheduled at 08:37:59,999 (this gave the scheduler a window to schedule an 
additional job that could be completed).
I'm running it in a docker container now and don't have any issues.
As soon as I run the distribution directly on my mac I can reproduce it. 

I also just realised that NiFi isn't using Quartz at all for its 
scheduling (besides the CronExpression parser class). QuartzSchedulingAgent 
would make one believe that scheduling is handled by a Quartz engine.

My mac is a pretty std dev laptop. Lots of stuff installed, but nothing that 
could explain this behavior

Should I log a ticket for this, or post this on the mailing list?


On Tue, Dec 20, 2016 at 5:03 PM, Joe Percivall <joeperciv...@yahoo.com> wrote:
> I am running on a Mac version 10.10.5. I pulled down the 1.1.0 zip, created a 
> flow with a GenerateFF -> ReplaceText -> LogAttribute (one for success and 
> one for failure). I configured GenerateFF with a Run Schedule of "0 0/2 * * * 
> ?" and didn't change anything else. I let it run for over an hour and haven't 
> seen any errors.

Re: ReplaceText and special characters

2016-12-18 Thread Joe Percivall
This question actually gets back to a discussion on the user entering literal 
vs. escaped text. In the NiFi UI the user inputs the text into the box and then 
it is converted into a Java String which gets automatically escaped in order to 
pass along the string as the user wrote it (so a processor would see the 
literal characters "\" and "n" when the user wrote "\n"). Though sometimes (as 
evidenced by this case) the user wants the control character instead of the 
literal values entered and Koji's suggestion of using EL as a work-around is 
great. That said, I do believe that "${literal('\r')}" can be used instead so 
that a replace isn't needed.
 
Joe


- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Sunday, December 18, 2016 9:06 PM, Koji Kawamura  
wrote:



Hello,

I did some experiments to see if I can append a carriage return.
GenerateFlowFile generating 10 random bytes, followed by:

1. ReplaceText
  - Replacement Value: SHIFT-ENTER
  - Replacement Strategy: Append
2. ReplaceText
  - Replacement Value: ${literal(''):replaceFirst('','\r')}
  - Replacement Strategy: Append
3. ExecuteStreamCommand
  - Command Arguments: -0;printf;%s\r
  - Command Path: xargs

Results:
- All of above processors (1, 2 and 3) generated flow file containing 11 bytes
- Saved the result flow files, then confirmed bytes with hexdump command:

- #1: SHIFT-ENTER added '0a'
000 61 51 ed f1 8f ab be a1 3d 7a 0a
00b

- #2: '\r' seems working, it added CR '0d'
000 61 51 ed f1 8f ab be a1 3d 7a 0d
00b

- #3: xargs and printf can also add CR '0d'
000 61 51 ed f1 8f ab be a1 3d 7a 0d
00b

From the above experiment results, NiFi Expression Language or the printf
command will be able to provide the results you wanted. Please let me know
if those are different from what you need.

For the 2nd question, the usage of GenerateFlowFile: if it does what you
need, I think it's just fine. If a specific byte array is needed for
some reason, then I'd use FetchFlowFile.

Hope this helps.

Thanks,
Koji


On Mon, Dec 19, 2016 at 6:26 AM, ddewaele  wrote:
> Hi,
>
> I need to send a byte sequence to a TCP socket every 10 minutes.
>
> I've setup a GenerateFlowFile processor to generate 1 random byte every 10
> minutes, followed by a replaceText processor that will replace that 1 byte
> with my byte sequence (a string literal).
>
> I can use SHIFT-ENTER in the ReplaceText processor to generate newlines, but
> I would like to generate a carriage return instead of a newline.
>
> Is this possible with the ReplaceText processor? I've tried using "\r" and
> "\\r" in both regex and literal mode, but I cannot get the carriage return in
> the outgoing flowfile.
>
> Any ideas on how to do this with a standard processor ?
>
> Also, is there another way to generate a flowfile in a CRON-like fashion ? I
> read that the GenerateFlowFile is typically used for load testing, where
> here it used to trigger a CRON based flow. I feel like I'm abusing the
> GenerateFlowFile processor for this.
>
> Thanks.
>
>
>
> --
> View this message in context: 
> http://apache-nifi-users-list.2361937.n4.nabble.com/ReplaceText-and-special-characters-tp480.html
> Sent from the Apache NiFi Users List mailing list archive at Nabble.com.


Re: ExecuteStreamcommand

2016-12-16 Thread Joe Percivall
Hey Juan,

I believe you need to escape the escape, so in order to do a double quote it 
would be \\" as the property value.
 

Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Friday, December 16, 2016 2:05 PM, Juan Sequeiros  
wrote:



Hi all,

Escape character "\" does not seem to work on ExecuteStreamCommand "command 
arguments" section, any suggestions?

I want to escape double quotes but the command is taking the backslash as 
literal.

thanks


Re: Controller services visibility problem

2016-11-17 Thread Joe Percivall
Hello Panos,
With the 1.0.0 update, security policies were added to process groups and 
controller-level features. This means that there is now a difference between 
creating a Controller Service (CS) which is scoped to the controller (i.e. for 
Reporting Tasks) and creating a CS which is scoped to a process group.

So in order to create a CS that is available to all processors, just create it 
in the root process group. All processors will be able to reference it since 
they are all within the scope of the root group.

Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com
 

On Thursday, November 17, 2016 9:11 AM, Panos Geo 
 wrote:
 

Hello all,

We are using NiFi 1.0 over http without any authentication, so all our users 
are coming to NiFi as anonymous.

The problem we are having is one of controller service visibility. If we create 
a controller service (say, a database connection) from the top-right option of 
the canvas, it is not visible within a group of processors. The contrary is 
also true: if we create a controller service for a group of processors, it is 
not visible to the rest of the canvas.
 
Is there a way to assign visibility for a controller service, e.g. set global 
visibility for a service, so that we don’t have to recreate it in all the 
groups of processors that need it? 
As a side note, we didn't have this problem with NiFi versions before 1.0. 
 
Many thanks,
Panos 


   

Re: Getting the number of logs

2016-11-09 Thread Joe Percivall
Hello Sai,

I'm gonna paraphrase what I think your use-case is first, let me know if this 
is wrong. You want to keep track of the number of logs coming in, and every hour 
you want to document how many came in in that hour. Currently NiFi doesn't 
handle this type of "stateful" event processing very well, and with what NiFi 
currently offers you are very limited.

That said, I've done some work in order to help NiFi into the "stateful" event 
processing space that may help you. I currently have an open PR[1] to add state 
to UpdateAttribute. This allows you to keep stateful values (like a count) and 
even acts as a Stateful Rule Engine (using UpdateAttribute's 'Advanced Tab'). 
So in order to solve your use-case you can set up one stateful UpdateAttribute 
along your main flow that counts all your incoming FlowFiles. Then add a 
GenerateFlowFile processor running on an hourly cron job that is routed to the 
stateful UpdateAttribute to act as a trigger. When the stateful UpdateAttribute 
is triggered it adds the count as an attribute of the triggering FlowFile and 
resets the count. Then just do a RouteOnAttribute after the stateful 
UpdateAttribute to separate the triggering FlowFile from the incoming data and 
put it to ElasticSearch.

That may not have been the best explanation and if not I can create a template 
and take screenshots tomorrow if you're interested. One thing to keep in mind 
though: this stateful processing does have a limitation in this PR in that it 
will only work with local state. So no tracking counts across a whole cluster, 
just per node.

[1] https://github.com/apache/nifi/pull/319

Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com
 

On Wednesday, November 9, 2016 11:41 AM, "Peddy, Sai" 
 wrote:
 

Hi All,

Previously posted this in the Dev listserv, moving it over to the Users 
listserv.

I'm currently working on a use case to be able to track the number of 
individual logs that come in and put that information in ElasticSearch. I 
wanted to see if there is an easy way to do this and whether anyone had any 
good ideas.

Current approach I am considering: route the log files coming in to a 
SplitText & RouteText processor to make sure no empty logs get through, and 
get the individual log count when files contain multiple logs. At the end of 
this the total number of logs is visible in the UI queue, where it displays 
the queueCount, but this information is not readily available to any 
processor. Current thought process is that I can use the ExecuteScript 
processor and update a local file to keep track, and insert the document into 
ElasticSearch hourly.

Any advice would be appreciated.

Thanks,
Sai Peddy
   

Re: MQTT Publisher errors

2016-11-07 Thread Joe Percivall
Not a problem Michail, glad to hear it wasn't something I messed up, 
hah! Regardless, thanks for responding and letting us know the reason for the 
failure.

Feel free to message the list again if you encounter other issues,
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Monday, November 7, 2016 5:18 AM, michail salichos 
<michail.salic...@gmail.com> wrote:



False alarm, we had recently enabled client id authentication enabled in our 
broker and I wasn't aware of it.

Though many thanks for the quick response.

Michail


On Thu, Nov 3, 2016 at 2:39 PM, Joe Percivall <joeperciv...@yahoo.com> wrote:

Hello,
>
>
>Sorry you are having problems with PublishMQTT. What MQTT broker are you 
>trying to hit? And could you pass along what configuration you have set 
>(including scheduling tab)? I'd like to try and reproduce if possible.
>
>
>I'm guessing there is nothing more to that stacktrace? I ask (hoping that 
>there is) because nowhere in the stacktrace does it indicate a line in the 
>NiFi codebase which would mean it's potentially a timing issue related to the 
>concurrency of background threads. Making it harder to track down and fix.
>
>
>Any other insight into your configuration/set-up would be appreciated.
> 
>Thanks,
>Joe
>- - - - - - 
>Joseph Percivall
>linkedin.com/in/Percivall
>e: joeperciv...@yahoo.com
>
>
>
>
>
>On Thursday, November 3, 2016 5:10 AM, michail salichos 
><michail.salic...@gmail.com> wrote:
>
>
>
>Hello,
>
>
>I am getting random errors using MQTT Publisher processor from v1.0.0.
>
>
>Although the exceptions are raised, sometimes messages are published, 
>sometimes not. I have not managed to find a pattern nor the cause, it seems to 
>be totally random.
>
>
>I have tried publishing messages to the same broker, using the same MQTT 
>client configuration (e.g. QoS) and the same credentials using mosquitto_pub 
>and JAVA paho custom client (same version as the one used in NiFI 1.0.0), and 
>everything works well. Only when I use NiFI MQTT publisher I get these errors. 
>
>
>Any tips or hints?
>
>
>
>
>(*)
>-MQTT client is disconnected and re-connecting failed. Transferring FlowFile 
>to fail and yielding
>
>
>or
>
>
>-o.a.nifi.processors.mqtt.PublishMQTT 
>PublishMQTT[id=f6ee1833-0157-1000-63a0-ad3c0072ca5f] Was disconnected from 
>client or was never connected, attempting to connect
>
>
>or
>
>
>-o.a.nifi.processors.mqtt.PublishMQTT 
>PublishMQTT[id=f6f5b090-0157-1000-d924-211d02de8856] Connection to 
>tcp://api-test.iotcloud.swisscom.com:1883 lost
>org.eclipse.paho.client.mqttv3.MqttException: Connection lost
>        at org.eclipse.paho.client.mqttv3.internal.CommsReceiver.run(CommsReceiver.java:146) [org.eclipse.paho.client.mqttv3-1.0.2.jar:na]
>        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
>Caused by: java.io.EOFException: null
>        at java.io.DataInputStream.readByte(DataInputStream.java:267) ~[na:1.8.0_91]
>        at org.eclipse.paho.client.mqttv3.internal.wire.MqttInputStream.readMqttWireMessage(MqttInputStream.java:65) ~[org.eclipse.paho.client.mqttv3-1.0.2.jar:na]
>        at org.eclipse.paho.client.mqttv3.internal.CommsReceiver.run(CommsReceiver.java:107) [org.eclipse.paho.client.mqttv3-1.0.2.jar:na]
>        ... 1 common frames omitted
>
>
>
>


Re: MQTT Publisher errors

2016-11-03 Thread Joe Percivall
Hello,

Sorry you are having problems with PublishMQTT. What MQTT broker are you trying 
to hit? And could you pass along what configuration you have set (including the 
scheduling tab)? I'd like to try and reproduce if possible.

I'm guessing there is nothing more to that stacktrace? I ask (hoping that there 
is) because nowhere in the stacktrace does it indicate a line in the NiFi 
codebase, which would mean it's potentially a timing issue related to the 
concurrency of background threads, making it harder to track down and fix.

Any other insight into your configuration/set-up would be appreciated.

Thanks,
Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com
 

On Thursday, November 3, 2016 5:10 AM, michail salichos 
 wrote:
 

Hello,

I am getting random errors using the MQTT Publisher processor from v1.0.0.

Although the exceptions are raised, sometimes messages are published, sometimes 
not. I have not managed to find a pattern nor the cause; it seems to be totally 
random.

I have tried publishing messages to the same broker, using the same MQTT client 
configuration (e.g. QoS) and the same credentials, using mosquitto_pub and a 
JAVA paho custom client (same version as the one used in NiFI 1.0.0), and 
everything works well. Only when I use the NiFI MQTT publisher do I get these 
errors.

Any tips or hints?

(*)
-MQTT client is disconnected and re-connecting failed. Transferring FlowFile 
to fail and yielding

or

-o.a.nifi.processors.mqtt.PublishMQTT 
PublishMQTT[id=f6ee1833-0157-1000-63a0-ad3c0072ca5f] Was disconnected from 
client or was never connected, attempting to connect

or

-o.a.nifi.processors.mqtt.PublishMQTT 
PublishMQTT[id=f6f5b090-0157-1000-d924-211d02de8856] Connection to 
tcp://api-test.iotcloud.swisscom.com:1883 lost
org.eclipse.paho.client.mqttv3.MqttException: Connection lost
        at org.eclipse.paho.client.mqttv3.internal.CommsReceiver.run(CommsReceiver.java:146) [org.eclipse.paho.client.mqttv3-1.0.2.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
Caused by: java.io.EOFException: null
        at java.io.DataInputStream.readByte(DataInputStream.java:267) ~[na:1.8.0_91]
        at org.eclipse.paho.client.mqttv3.internal.wire.MqttInputStream.readMqttWireMessage(MqttInputStream.java:65) ~[org.eclipse.paho.client.mqttv3-1.0.2.jar:na]
        at org.eclipse.paho.client.mqttv3.internal.CommsReceiver.run(CommsReceiver.java:107) [org.eclipse.paho.client.mqttv3-1.0.2.jar:na]
        ... 1 common frames omitted


   

Re: Stable version of 1.x?

2016-10-10 Thread Joe Percivall
Hello John,

Ah sorry, that mention of "Non-stable" is my fault. The current 1.0.0 release 
is stable. That note was for the 1.0.0-BETA release and I forgot to remove it 
when changing over to 1.0.0. 

It has been removed.

Thank you for bringing this up!
Joe 

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Monday, October 10, 2016 8:06 AM, John Wiesel  wrote:



Hello,

the release notes state that version 1.0.0 is non-stable.
I'd be glad to know if there are any details about any upcoming stable 
release (release schedule, issue tracker, or similar).
I could not find any further information in the documentation or in the 
mailing list archive.

Thanks and best wishes
John

-- 
John Wiesel
Team Lead Information Extraction
itembase.com

Wilhelm Strasse 118 (Aufgang B) | 10963 Berlin | Germany
Email:  j...@itembase.biz

itembase GmbH | Commercial register: Berlin (Charlottenburg) | HRB 138369 B
| Responsible for content pursuant to § 55 Abs. 2 RStV: Managing Director
Stefan Jørgensen


Re: Processor that Decompresses Files?

2016-10-05 Thread Joe Percivall
Hello Keren,

The "decompress" mode is an option of the CompressContent[1] processor and 
should solve your use-case.

[1] 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.CompressContent/index.html
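For example, a minimal configuration along those lines might be (the compression
format here is just an example; pick whichever matches your files):

    Mode: decompress
    Compression Format: gzip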
 
Joe

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Wednesday, October 5, 2016 9:45 AM, "Tseytlin, Keren" 
 wrote:




Hi all,
 
I am wondering if it would be useful to create a processor whose purpose is to 
unzip large files. I currently have a lot of zip log files coming in, and we 
need to unpackage them in Nifi. I know there is an ExecuteStreamCommand 
processor where I can put in a command to do the unzipping, which effectively 
does the same thing. Is it best practice for me to use the ExecuteStreamCommand 
processor, or perhaps make something like a DecompressFile processor?
 
Best,
Keren




Re: Provenance expiration error

2016-09-10 Thread Joe Percivall
Hello Adam,


Sorry no one has responded yet.

Taking a look at the stack trace, I think you are running into NIFI-2087[1]. 
This was addressed in 1.0.0.
[1] https://issues.apache.org/jira/browse/NIFI-2087



Joe 
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Saturday, September 10, 2016 12:42 AM, Adam J. Shook  
wrote:



--bump--

Any ideas on the below issue?

Thanks,
--Adam


On Wed, Aug 31, 2016 at 4:46 PM, Adam J. Shook  wrote:

Hello,
>
>
>I continue to receive the below error regarding deleting entries from the 
>provenance repository.  The Googles aren't returning anything too helpful.
>
>
>NiFi v0.7.0 on RHEL 6.8, JDK 1.8.0_60
>
>
>Any ideas?
>
>
>Thanks,
>--Adam
>
>
>2016-08-31 16:42:17,763 WARN [Provenance Maintenance Thread-3] o.a.n.p.PersistentProvenanceRepository Failed to perform Expiration Action org.apache.nifi.provenance.lucene.DeleteIndexAction@4aff1156 on Provenance Event file /data01/nifi/provenance_repository/5190858.prov.gz due to java.lang.IllegalArgumentException: Cannot skip to block -1 because the value is negative; will not perform additional Expiration Actions on this file at this time
>2016-08-31 16:42:17,763 WARN [Provenance Maintenance Thread-3] o.a.n.p.PersistentProvenanceRepository
>java.lang.IllegalArgumentException: Cannot skip to block -1 because the value is negative
>at org.apache.nifi.provenance.StandardRecordReader.skipToBlock(StandardRecordReader.java:111) ~[nifi-persistent-provenance-repository-0.7.0.jar:0.7.0]
>at org.apache.nifi.provenance.StandardRecordReader.getMaxEventId(StandardRecordReader.java:458) ~[nifi-persistent-provenance-repository-0.7.0.jar:0.7.0]
>at org.apache.nifi.provenance.lucene.DeleteIndexAction.execute(DeleteIndexAction.java:52) ~[nifi-persistent-provenance-repository-0.7.0.jar:0.7.0]
>at org.apache.nifi.provenance.PersistentProvenanceRepository.purgeOldEvents(PersistentProvenanceRepository.java:907) ~[nifi-persistent-provenance-repository-0.7.0.jar:0.7.0]
>at org.apache.nifi.provenance.PersistentProvenanceRepository$2.run(PersistentProvenanceRepository.java:261) [nifi-persistent-provenance-repository-0.7.0.jar:0.7.0]
>at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_60]
>at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_60]
>at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_60]
>at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_60]
>at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
>at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
>at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
>
>


Re: Request for enhancement

2016-08-29 Thread Joe Percivall
- Moving users list to BCC

Hello Gunjan,

This seems like a good potential idea. The proper place to submit the 
suggestion is through the Apache NiFi Jira[1]. It can more easily be discussed 
and worked on there.

[1] https://issues.apache.org/jira/browse/NIFI


Suggestions/ideas from users are always welcome!

Joe 
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Tuesday, August 30, 2016 12:06 PM, Gunjan Dave  
wrote:



Seems like the below did not get delivered.


On Mon, Aug 29, 2016, 12:30 PM Gunjan Dave 
wrote:

> Hi Team,
> I would like to propose the following enhancement, if it is seen as
> feasible, for the provenance graph.
>
> The current graph only shows the processor type; rather, I would like to
> suggest that we actually put in the component name along with the processor
> type. That would make the graph more unique to each flow and more visually
> intuitive.
>
> Just a suggestion, not mandatory.
>


Re: How to deal with decimals while they're not supported?

2016-08-12 Thread Joe Percivall
Hey Stephane,

Currently, working with decimals in NiFi is like putting a square peg in a round
hole. I haven't tried to do it much, but for a simple use-case like yours I
believe there are two options:

1: Use Expression Language and a regex to move the decimal
2: (Not 100% sure this works) Use the scripting processors to write a script to
do the routing

For option 1 I think you can do something like this:
${myAttr:replaceAll('0\.(\d)(\d*)', '$1'):lt(1)}. This should "move" the
decimal over one place so that you can check it against the 1. This assumes
there is nothing else in that attribute. For option 2, Matt Burgess would need
to weigh in.
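As a quick sanity check of that expression (the attribute name myAttr and the
sample values are just for illustration): with myAttr = 0.123,
replaceAll('0\.(\d)(\d*)', '$1') yields "1", so lt(1) returns false, matching
the fact that 0.123 < 0.1 is false; with myAttr = 0.05 it yields "0", so lt(1)
returns true.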
As for NIFI-1662, it got de-prioritized in that I got super busy with everything
else I was doing for MiNiFi 0.0.1 and NiFi 1.0.0 (on-going) and it went to the
back-burner. It's something I really want to get in though, along with a few
other changes, because I think they open NiFi up to many new possibilities. I'd
like to do it (decimals) in 1.1.0 since it wouldn't be a breaking change.

Hope that helps,
Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com
 

On Thursday, August 11, 2016 8:16 PM, Stéphane Maarek 
 wrote:
 

 Hi,
I have a flow in which I extract an attribute from json using jsonpath. That 
attribute happens to be a decimal number (0.123). I wanted to do a simple 
operation such as myAttr:lt(0.1) but obviously that won't work. What also won't 
work is myAttr:multiply(10):lt(1). I'm kinda stuck and I really need this logic 
to be working. What do you advise as a workaround?
Also, I've seen there is a JIRA for this: 
https://issues.apache.org/jira/browse/NIFI-1662 but stuff hasn't moved much 
since it first appeared. Not sure if it got de-prioritized or something
Congrats on the 1.0.0 beta, it looks great !!
Cheers,
Stephane

  

[ANNOUNCE] Apache NiFi 1.0.0-Beta release

2016-08-09 Thread Joe Percivall
Hello
The Apache NiFi team would like to announce the release of Apache NiFi
1.0.0-BETA.
The upcoming 1.0.0 release will mark the culmination of a lot of work over
the last few months with many new framework level features being added.
This Beta release was cut to give our Apache NiFi users a chance to help
test this upcoming major release. We encourage users to download the
binary, give the new UI a try, and report any bugs to the Apache NiFi Jira.
https://issues.apache.org/jira/browse/NIFI/
Disclaimer: This release is meant for testing and may not be stable in
terms of features or functionality.
The release artifacts can be downloaded from here:
https://nifi.apache.org/download.html
Maven artifacts have been made available here:
https://repository.apache.org/content/repositories/releases/org/apache/nifi/
Issues closed/resolved for this list can be found here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020=12338066
Release note highlights can be found here:
https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.0.0-Beta
Thank you,
The Apache NiFi team


Re: InvokeHTTP[xxxx] failed to process session due to java.lang.NoClassDefFoundError: android/util/Log:

2016-06-29 Thread Joe Percivall
Hello Scott,


I got the chance to dig in further and it appears the underlying 3rd party 
library (okhttp-digest) uses "android.util.Log" to log a couple cases such as 
the one you are running into[1].

The only dependency okhttp-digest lists in its pom is OkHttp[2] (also true for
the latest version[3]). So while OkHttp doesn't assume you're running on
Android, okhttp-digest (OkHttp's recommended solution for doing Digest
Authentication) does.

I will open an issue for okhttp-digest and raise a Jira for us to upgrade 
OkHttp and okhttp-digest versions if/when a fix is implemented. If 
okhttp-digest chooses not to change logger and OkHttp continues to only use 
that interceptor we will need to reassess the processor/feature. 
[1] 
https://github.com/rburgst/okhttp-digest/blob/d8ea75368ad802aefc92f8b096736a03b953f0e4/src/main/java/com/burgstaller/okhttp/AuthenticationCacheInterceptor.java#L34

[2] https://www.versioneye.com/java/com.burgstaller:okhttp-digest/0.6
[3] https://www.versioneye.com/java/com.burgstaller:okhttp-digest/1.5

Sorry for not catching this in testing and thank you for your patience,
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Wednesday, June 29, 2016 1:51 PM, Scott Stroud <scottstr...@gmail.com> wrote:



I manually put in the okhttp-digest-1.5.jar and its required okhttp-3.2.0.jar.
Unfortunately, I think the package structure changed in the okhttp jar, and the
InvokeHttp processor code then fails because it refers to the old package
structure. I could've missed something in my patching process as well, since it
was very manual.

Unfortunately I do not have the time right now to make all the updates to 
resolve this.  But I do think this is a significant bug if InvokeHTTP does not 
support digest authentication (across batches). 


org.apache.nifi.processor.Processor: Provider 
org.apache.nifi.processors.standard.InvokeHTTP could not be instantiated
java.util.ServiceConfigurationError: org.apache.nifi.processor.Processor: 
Provider org.apache.nifi.processors.standard.InvokeHTTP could not be 
instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:232) ~[na:1.8.0_72]
at java.util.ServiceLoader.access$100(ServiceLoader.java:185) ~[na:1.8.0_72]
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384) 
~[na:1.8.0_72]
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404) 
~[na:1.8.0_72]
at java.util.ServiceLoader$1.next(ServiceLoader.java:480) ~[na:1.8.0_72]
at 
org.apache.nifi.nar.ExtensionManager.loadExtensions(ExtensionManager.java:107) 
~[nifi-nar-utils-0.6.1.jar:0.6.1]
at 
org.apache.nifi.nar.ExtensionManager.discoverExtensions(ExtensionManager.java:88)
 ~[nifi-nar-utils-0.6.1.jar:0.6.1]
at org.apache.nifi.NiFi.<init>(NiFi.java:120) ~[nifi-runtime-0.6.1.jar:0.6.1]
at org.apache.nifi.NiFi.main(NiFi.java:227) ~[nifi-runtime-0.6.1.jar:0.6.1]
Caused by: java.lang.NoClassDefFoundError: com/squareup/okhttp/Authenticator
at java.lang.Class.getDeclaredConstructors0(Native Method) ~[na:1.8.0_72]
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671) 
~[na:1.8.0_72]
at java.lang.Class.getConstructor0(Class.java:3075) ~[na:1.8.0_72]
at java.lang.Class.newInstance(Class.java:412) ~[na:1.8.0_72]
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380) 
~[na:1.8.0_72]
... 6 common frames omitted
Caused by: java.lang.ClassNotFoundException: com.squareup.okhttp.Authenticator
at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[na:1.8.0_72]
at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[na:1.8.0_72]
at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[na:1.8.0_72]
... 11 common frames omitted




On Tue, Jun 28, 2016 at 2:16 PM, Joe Percivall <joeperciv...@yahoo.com> wrote:

Okhttp has elected to not have built-in support for Digest Auth[1] and instead 
relies on third-party support offered by the okhttp-digest interceptor[2]. 
Something in this dependency 
(com.burgstaller.okhttp.AuthenticationCacheInterceptor from the stacktrace) is 
causing problems with the logs.
>
>Without digging in fully I notice that we are on version 0.6 (released in 
>December 2015) and the current release of okhttp-digest is 1.5 (released last 
>Month). This issue may be resolved in the latest release.
>
>Scott are you in a position to try upgrading the version to see if it does fix 
>the issue?
>
>[1] https://github.com/square/okhttp/issues/205
>[2] https://github.com/rburgst/okhttp-digest
>
>Joe
>
>- - - - - -
>Joseph Percivall
>linkedin.com/in/Percivall
>e: joeperciv...@yahoo.com
>
>
>
>
>On Tuesday, June 28, 2016 1:34 PM, Joe Witt <joe.w...@gmail.com> wrote:
>
>
>
>scott
>
>glad you found a path to keep making progress for the moment.
>
>anyone else in community familiar with okhttp that can help look into
>this and raise the appropria

Re: Replace Text

2016-06-13 Thread Joe Percivall
Awesome, and what processor were you planning to use to split on "#|#|#"? The
SplitContent processor[1] can be used to split the content on a sequence of
text characters, which could split on "#|#|#". Depending on your data, the
SplitXml processor[2] may also be worth a look.

[1]
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitContent/index.html
[2]
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitXml/index.html

Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Monday, June 13, 2016 11:26 AM, Anuj Handa <anujha...@gmail.com> wrote:
Yes that's exactly correct. 


> On Jun 13, 2016, at 11:14 AM, Joe Percivall <joeperciv...@yahoo.com> wrote:
> 
> Sorry I got a bit confused, in your original question you said that you 
> wanted to append the value and I took it that you just wanted to append the 
> value to the end of the line or text. 
> 
> Let me try and restate your goal so I'm sure I understand, ultimately you 
> want to split the incoming FlowFile on each occurrence of " xmlns" and you are planning on using ReplaceText to add "#|#|#" before each 
> occurrence so that it will be easy to split?
> 
> 
> Joe
> - - - - - - 
> Joseph Percivall
> linkedin.com/in/Percivall
> e: joeperciv...@yahoo.com
> 
> 
> 
> On Monday, June 13, 2016 11:05 AM, Anuj Handa <anujha...@gmail.com> wrote:
> 
> 
> 
> Anuj 
> Hi Joe,
> 
> I modified the process per your suggestion but it only works to replace the
> first occurrence; there are multiple such tags which it doesn't replace.
> When I used evaluation mode line-by-line, it appended it to every line in
> the file and not just to the one I wanted.
> 
> 
> 
> 
> On Mon, Jun 13, 2016 at 10:40 AM, Joe Percivall <joeperciv...@yahoo.com> 
> wrote:
> 
> Hello,
>> 
>> In order to use ReplaceText[1] to solely append a value to the end of the
>> entire text, change the "Replacement Strategy" to "Append" and leave
>> "Evaluation Mode" as "Entire Text". This will take whatever is in the
>> "Replacement Value" and append it as a literal (without interpreting
>> back-references) to the end of the text.
>> 
>> Alternatively, if you want to append to the end of each line then change 
>> "Evaluation Mode" to "Line-by-Line".
>> 
>> [1] 
>> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ReplaceText/index.html
>> 
>> 
>> Hope that helps,
>> Joe
>> - - - - - - Joseph Percivall
>> linkedin.com/in/Percivall
>> e: joeperciv...@yahoo.com
>> 
>> 
>> 
>> 
>> On Monday, June 13, 2016 10:05 AM, Anuj Handa <anujha...@gmail.com> wrote:
>> 
>> 
>> 
>> Hi,
>> 
>> I am trying to read a file and then use ReplaceText to append a string so I
>> can split the line in the next step. I am unable to make the ReplaceText
>> work. The FlowFile is going through as success without the string being
>> appended or replaced.
>> 
>> Any thoughts what i could be doing wrong
>> 


Re: Replace Text

2016-06-13 Thread Joe Percivall
Sorry I got a bit confused, in your original question you said that you wanted 
to append the value and I took it that you just wanted to append the value to 
the end of the line or text. 

Let me try and restate your goal so I'm sure I understand, ultimately you want
to split the incoming FlowFile on each occurrence of " xmlns" and you are
planning on using ReplaceText to add "#|#|#" before each occurrence so that it
will be easy to split?

Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com

On Monday, June 13, 2016 11:05 AM, Anuj Handa <anujha...@gmail.com> wrote:



Anuj 
Hi Joe,

I modified the process per your suggestion but it only works to replace the
first occurrence; there are multiple such tags which it doesn't replace.
When I used evaluation mode line-by-line, it appended it to every line in the
file and not just to the one I wanted.




On Mon, Jun 13, 2016 at 10:40 AM, Joe Percivall <joeperciv...@yahoo.com> wrote:

Hello,
>
>In order to use ReplaceText[1] to solely append a value to the end of the
>entire text, change the "Replacement Strategy" to "Append" and leave
>"Evaluation Mode" as "Entire Text". This will take whatever is in the
>"Replacement Value" and append it as a literal (without interpreting
>back-references) to the end of the text.
>
>Alternatively, if you want to append to the end of each line then change 
>"Evaluation Mode" to "Line-by-Line".
>
>[1] 
>https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ReplaceText/index.html
>
>
>Hope that helps,
>Joe
>- - - - - - Joseph Percivall
>linkedin.com/in/Percivall
>e: joeperciv...@yahoo.com
>
>
>
>
>On Monday, June 13, 2016 10:05 AM, Anuj Handa <anujha...@gmail.com> wrote:
>
>
>
>Hi,
>
>I am trying to read a file and then use ReplaceText to append a string so I
>can split the line in the next step. I am unable to make the ReplaceText work.
>The FlowFile is going through as success without the string being appended or
>replaced.
>
>Any thoughts what i could be doing wrong
>


Re: Replace Text

2016-06-13 Thread Joe Percivall
Hello,

In order to use ReplaceText[1] to solely append a value to the end of the
entire text, change the "Replacement Strategy" to "Append" and leave
"Evaluation Mode" as "Entire Text". This will take whatever is in the
"Replacement Value" and append it as a literal (without interpreting
back-references) to the end of the text.

Alternatively, if you want to append to the end of each line then change 
"Evaluation Mode" to "Line-by-Line".
 
[1] 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ReplaceText/index.html


Hope that helps,
Joe
- - - - - - Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Monday, June 13, 2016 10:05 AM, Anuj Handa  wrote:



Hi,

I am trying to read a file and then use ReplaceText to append a string so I can
split the line in the next step. I am unable to make the ReplaceText work.
The FlowFile is going through as success without the string being appended or
replaced.

Any thoughts what i could be doing wrong 


Re: Processor Question

2016-06-06 Thread Joe Percivall
For number one, you can also use RouteText[1] with the matching strategy 
"Satisfies Expression". Then as a dynamic property use this expression 
"${lineNo:le(10)}". This will route first 10 lines to the "matched" 
relationship (assuming "Route to each matching Property Name" is not selected). 
This option also allows you to route those unmatched lines elsewhere if you 
need (if not just auto-terminate the "unmatched" relationship).
 
Then for number two, instead of ReplaceText, you could also use RouteText. Set
the matching strategy to "Matches Regular Expression". Then set the dynamic 
property to match everything and end with "unambiguously" (an example being 
"((\w|\W)*unambiguously)"). This will route all the text that matches the Regex 
apart from the end of the file and gives you the option to route the ending 
text differently if needed.

[1] 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.RouteText/index.html
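As a sketch, the configuration for the first option would look something like
this (the dynamic property name is arbitrary, and the routing strategy value is
paraphrased from the RouteText documentation):

    Matching Strategy: Satisfies Expression
    Routing Strategy: Route to 'matched' if line matches all conditions
    first10Lines (dynamic property): ${lineNo:le(10)}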


Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Sunday, June 5, 2016 4:41 AM, Leslie Hartman  wrote:



Matthew:

The ModifyBytes processor would be the best if it would allow one to specify
the bytes to keep. I could calculate the number of bytes to delete, but when I
try and place a variable in the End Offset it says it is not in the right
format.

As for SegmentContent and SplitText, I have tried both of these. The problem is
that they just take the original file and split it into a bunch of little
files. So if I wanted say 256 bytes of a 30 MB file, after running out of
memory it would give me 125,829,119 files to get rid of.

For the 2nd case ReplaceText should work, I'm just having problems getting the
correct syntax. If someone could provide an example of the correct syntax I
would appreciate it.

Thank you.

Leslie Hartman


Matthew Clarke wrote:

You may also want to look at using the modifyBytes processor for number 1.
>
>On Jun 4, 2016 1:49 PM, "Thad Guidry"  wrote:
>
>For your 1st case, you can use either SegmentContent by your 256 bytes (or 
>perhaps you can even use SplitText)
>>https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SegmentContent/index.html
>>
>>https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitText/index.html
>>
>>
>>
>>For your 2nd case, you can use ReplaceText
>>https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ReplaceText/index.html
>>
>>
>>
>>Thad 
>>+ThadGuidry
>>
>>


Re: Guidance for NiFi output streaming

2016-05-26 Thread Joe Percivall
Hello Stephane,

Just to be sure I have your use-case correct, you are ingesting a continuous 
stream of lat/lon information for various devices. Every 1 second you want to 
take the information from the previous second and write out just the most 
recent lat/lon of each device. 

An important question: do you only want this file to include devices that have
been seen in the last second, or do you want to write out the last known
lat/lon of every device ever seen? That is an important question because it is
the difference between having to store state or not. If you need the last known
position of all devices seen, and thus need to store state, the use-case gets
much trickier.

Another question: what order of magnitude of data are you planning on
ingesting? If it's relatively low and your use-case does not need to store
state, you could create a processor that would analyze all FlowFiles currently
on the queue to grab the latest lat/lon for each device and then emit a
FlowFile with a content of the file you want to write. Set it to trigger every
1 second and it would batch up the latest lat/lon for each device for the
previous second. This would start to cause problems when it tries to batch up a
large quantity of FlowFiles, similar to MergeContent.
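To make that concrete, here is a bare-bones sketch of such a processor's core
logic. The batch size, attribute names, and CSV-ish output format are all
assumptions for illustration, and property descriptors and error handling are
omitted:

import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

public class LatestPositionPerDevice extends AbstractProcessor {

    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success").description("One summary FlowFile per trigger").build();

    @Override
    public Set<Relationship> getRelationships() {
        return Collections.singleton(REL_SUCCESS);
    }

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session)
            throws ProcessException {
        // Grab everything queued since the last run; 10,000 is an arbitrary cap.
        final List<FlowFile> batch = session.get(10000);
        if (batch.isEmpty()) {
            return;
        }
        // Assuming the queue hands FlowFiles back oldest-first, the last entry
        // per device_id wins, leaving only the most recent position.
        final Map<String, String> latest = new LinkedHashMap<>();
        for (final FlowFile ff : batch) {
            latest.put(ff.getAttribute("device_id"),
                    ff.getAttribute("lat") + "," + ff.getAttribute("lon"));
        }
        // Emit a single summary FlowFile containing one line per device.
        FlowFile summary = session.create();
        summary = session.write(summary, (OutputStream out) -> {
            for (final Map.Entry<String, String> e : latest.entrySet()) {
                out.write((e.getKey() + "," + e.getValue() + "\n")
                        .getBytes(StandardCharsets.UTF_8));
            }
        });
        session.transfer(summary, REL_SUCCESS);
        session.remove(batch); // the individual position reports have been consumed
    }
}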
 
Joe

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Thursday, May 26, 2016 1:06 AM, Stéphane Maarek  
wrote:



I have tried a ControlRate but it doesn't work because it seems to stop 
processing once the threshold of 1 is reached, even though I set a grouping 
property (I know there are two different values for my group in my queue). Any 
clue?

On Thu, May 26, 2016 at 2:30 PM Stéphane Maarek  
wrote:

Hi,
>
>
>I need to output some data streaming from multiple devices directly into a map 
>(mapboxjs). 
>
>
>Basically, every 1 second, I want to only write the last data point for each 
>device to a json file. My problem resides in "how to pick the latest data 
>point by device"
>
>
>My incoming flow file has three attributes: device_id, lat, lon. 
>at some point they may queue up like this:
>
>
>1, (-37,20)
>1, (-37.1,20.1)
>2, (-40,30)
>2, (-40.1, 29.9)
>
>
>At the end, I wish to only have the latest point for each device ID
>1, (-37.1,20.1)
>2, (-40.1, 29.9)
>
>How can I design a processor for this?
>
>
>Thanks!
>Stephane


Re: Doing development on nifi

2016-04-28 Thread Joe Percivall
Hello Stéphane,

Just adding on to Matt's and Andy's answers, Andy mentioned Provenance[1] for 
replaying events but I also find it very useful for debugging processors/flows 
as well. Data Provenance is a core feature of NiFi and it allows you to see 
exactly what the FlowFile looked like (attributes and content) before and after 
a processor acted on it as well as the ability to see a map of the journey that 
FlowFile underwent through your flow. The easiest way to see the provenance of 
a processor is to right click on it and then click "Data provenance".

The documentation below should be a great introduction and if you have any 
questions feel free to ask!
 
[1] https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#data-provenance


Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Thursday, April 28, 2016 7:30 PM, Matt Burgess  wrote:



Stéphane,

Welcome to NiFi, glad to have you aboard!  May I ask what version you
are using? I believe as of at least 0.6.0, you can view the items in a
queued connection. So for your example, you can have a GetHttp into a
SplitJson, but don't start the SplitJson, just the GetHttp. You will
see any flowfiles generated by GetHttp queued up in the success (or
response?) connection (whichever you have wired to SplitJson). Then
you can right-click on the connection (the line between the
processors) and choose List Queue. In that dialog you can choose an
element by clicking on the Info icon ('i' in a circle) and see the
information about it, including a View button for the content.

The best part is that you don't have to do a "preview" run, then a
"real" run. The data is in the connection's queue, so you can make
alterations to your SplitJson, then start it to see if it works. If it
doesn't, stop it and start the GetHttp again (if stopped) to put more
data in the queue.  For fine-grained debugging, you can temporarily
set the Run schedule for the SplitJson to something like 10 seconds,
then when you start it, it will likely only bring in one flow file, so
you can react to how it works, then stop it before it empties the
queue.

I hope that makes sense, I apologize in advance if I made things more
confusing. The good news is there is a solution to your problem, even
if I am not the right person to describe it :)

Cheers,
Matt


On Thu, Apr 28, 2016 at 7:06 PM, Stéphane Maarek
 wrote:
> Hi,
>
> I'm very new to nifi and love the concept. As part of the process, I'm
> learning. My biggest frustration is that I can't see the data flowing
> through the system as I do development.
>
> Maybe I missed an article or a link, but is it possible to view the data
> while in the flow? I.e. Say I create a get http, I'd like it to fire once,
> get some data so I can see what it looks like. Then if I do a split json,
> I'd like to see if my output of it is what I expected or if I somehow messed
> up, etc etc
>
> I hope my question is clear
>
> Thanks in advance,
> Stéphane


Re: Help with replace method

2016-04-26 Thread Joe Percivall
Hello Igor,

I got your template working by using the below replacement string and changing 
the "Replacement Strategy" to "Always Replace". I've attached a template that 
works for me.

{"test":"${teststr:replaceAll('"','"')}"}


The backslashes are a bit weird because they escape characters and are used to
escape themselves. So when you're trying to use them explicitly it can lead to
needing to repeat them multiple times (in this case 4).
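As a worked example using Igor's sample text: with teststr set to Here "we" go,
the expression above yields {"test":"Here \"we\" go"}. The four backslashes are
needed because the replacement value is unescaped twice, once by the Expression
Language string parser and once by the regex replacement handling, leaving a
single literal backslash in the output.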

Hope this helps,
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Tuesday, April 26, 2016 6:10 PM, Igor Kravzov  wrote:



Attached please find the test template (NiFi 0.6.1). I am trying to replace "
with \" in a text, so "Here "we" go" should become \"Here \"we\" go\"


The call is in ReplaceText processor: {"test":"${teststr:replace('"','\\"')}"}
teststr is created in UpdateAttribute.


For some reason I am unable to make it work. What can be wrong?

Thanks in advance.

[Attachment: "Fixed_Replace_Test" NiFi template XML (GenerateFlowFile ->
UpdateAttribute -> ReplaceText -> LogAttribute), omitted]
Re: prepend attribute value to flowfile content

2016-04-25 Thread Joe Percivall
Hello Sumo,

Check out the "Replacement Strategy" property of the "ReplaceText" 
processor[1]. Setting this to "prepend", an "Evaluation Mode" of "Entire text" 
and a character set of UTF-8 you should be able to accomplish this.
 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ReplaceText/index.html
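A sketch of that configuration (the attribute name is just an example):

    Replacement Strategy: Prepend
    Evaluation Mode: Entire text
    Character Set: UTF-8
    Replacement Value: ${myJsonAttribute}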

Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Monday, April 25, 2016 8:22 PM, Sumanth Chinthagunta  
wrote:
I need to prepend an attribute value to FlowFile content; both are JSON/UTF-8
strings. I am looking for advice on whether I can use any of the built-in
processors to efficiently produce the new combined FlowFile.
Thanks 
Sumo 


Re: Help on creating that flow that requires processing attributes in a flow content but need to preserve the original flow content

2016-03-21 Thread Joe Percivall
Hello Chris,

The EvaluateJsonPath processor has the property "Destination" which gives you 
the option to send it either to the FlowFile content or a FlowFile attribute. 
Selecting "flowfile-attribute" will place the value in the "kafka.key" 
attribute of the FlowFile. You can find documentation for EvaluateJsonPath 
here[1].

[1] 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.EvaluateJsonPath/index.html
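As a sketch, the configuration would look something like this (the dynamic
property name and JsonPath are hypothetical):

    Destination: flowfile-attribute
    kafka.key.parsed (dynamic property): $.someField

With Destination set to flowfile-attribute, each dynamic property becomes a
FlowFile attribute holding the matched value, and the FlowFile content is left
untouched.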
 

Hope your NiFi use is going well,
Joe

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Monday, March 21, 2016 1:34 PM, "McDermott, Chris Kevin (MSDU - 
STaTS/StorefrontRemote)"  wrote:
What I need to do is read a file from Kafka. The Kafka key contains a JSON
string which I need to turn into FlowFile attributes while preserving the
original FlowFile content. Obviously I can use EvaluateJsonPath but that
necessitates replacing the FlowFile content with the kafka.key attribute, thus
losing the original FlowFile content. I feel like I'm missing something
fundamental.


Re: List Files

2016-03-04 Thread Joe Percivall
Hello,

ListFile is a source processor so this behavior is expected. It supports
Expression Language so that it can be configured to utilize certain methods.
For example, some people may want to get a list of files from a rotating
directory that gets created every hour. To do that they would need to use the
date functions included in Expression Language.

Does forbidding input hinder a specific use-case you have?
 
Joe

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Friday, March 4, 2016 10:38 AM, Charlie Frasure  
wrote:



I'm using the 0.5.1 build and having what I think is odd behavior for ListFile. 
 The processor supports expression language in the Input Directory property, 
however I can't figure out how to configure an attribute as input.

I tried using UpdateAttribute prior to ListFile, but ListFile doesn't seem to 
allow incoming connections.  I also tried creating an attribute within 
ListFile, but it doesn't seem to be available for referencing within the same 
processor.

Is this expected?  If so, what is the intended purpose of allowing expression 
language in this attribute?


Re: How to configure a ExecuteStreamCommand

2016-03-03 Thread Joe Percivall
Hello,

Glad you were able to get it working. 

NiFi was designed to work just as well during failures as successes. If you 
notice, there are "success" and "failure" relationships coming out of the 
PutFile processor. You can create a connection for the failure relationship in 
the same way you did for success. 

So if you purely just want to log failures then route the failure relationship 
to the LogAttribute processor. If you want more immediate notifications, you 
could also use the PutEmail processor. It all depends on what your system needs 
are.

Joe
 - - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Thursday, March 3, 2016 2:50 AM, jose antonio rodriguez diaz 
<josearodrigu...@gmail.com> wrote:
Hello All,

Finally I could make it work. In addition to the errors identified by Joe, I
had other errors, specifically on the connection between PutFile and
ExecuteStreamCommand: the success relationship wasn't checked. So, to try to
improve this data flow, what could I do to log failure relationships? What do
you suggest? Any comment will be nice to hear, no matter if it is about any
other way to improve the whole data flow.

Thanks again Joe for answer me.

Regards 



> El 3 mar 2016, a las 0:12, jose antonio rodriguez diaz 
> <josearodrigu...@gmail.com> escribió:
> 
> Hello Joe,
> 
> I am going to explain the whole picture of what I have and what I would like
> to have. Right now I receive a file (in fact I receive several files) in a
> shared network drive called "Z", then manually I move the file to a local
> folder called "data", and finally I execute a batch file; this batch file
> consumes (reads) the file and, after reading it, moves it to another folder
> called "imported". That's why I just try to invoke a batch program once the
> file has been dropped into the "data" folder.
> 
> I have changed Max Attribute Length and set it to 256 and also left the
> Output Destination Attribute empty. Even so, again I haven't been able to
> execute the batch file (there is no new file called foo.txt on my desktop).
> 
> Have you any idea what I am doing wrong? I am pretty sure it should be an
> easy fix. Please feel free to make any comment or suggestion regarding my
> case.
> fix. Please fell free to make any comment or suggestion regarding to my case.
> 
> Thanks in advance.
> 
>> El 2 mar 2016, a las 17:36, Joe Percivall <joeperciv...@yahoo.com> escribió:
>> 
>> Hello,
>> 
>> Welcome to NiFi!
>> 
>> I just tried running an ExecuteStreamCommand processor with the properties 
>> you have (I created a script and modified the paths to point to a folder 
>> that exists) and two things jump out. One, the Max attribute length must 
>> take an integer. If you set it to be a path the processor will be invalid 
>> and you'll see a yellow warning icon in the top left of the processor. This 
>> means the processor will not run and you'll see the flowfiles queue up in 
>> the relationship preceding it.
>> 
>> Second, the Output Destination Attribute is only for when you want to output 
>> the results of the command to an attribute instead of the content of a new 
>> flowfile (useful for running a command to find the character encoding of the 
>> contents). Using an integer for the max attribute length I am able to 
>> correctly run the script.
>> 
>> As a helpful hint, you can see the description of a property by hovering 
>> over the light blue "?" icon in the configure processor tab. Also you can 
>> see the documentation for the processor by right clicking on it and 
>> selecting "usage" from the list.
>> 
>> Also what will you eventually be doing with your script? The way the 
>> ExecuteStreamCommand is designed to work is by taking in a FlowFile and then 
>> running an external command on it. So you may make your flow more efficient 
>> and user friendly by putting the ExecuteStreamCommand between the Get and 
>> Put.
>> 
>> Hope that helps,
>> Joe
>> - - - - - - 
>> Joseph Percivall
>> linkedin.com/in/Percivall
>> e: joeperciv...@yahoo.com
>> 
>> 
>> 
>> 
>> On Sunday, February 28, 2016 4:53 PM, jose antonio rodriguez diaz 
>> <josearodrigu...@gmail.com> wrote:
>> Hello All,
>> 
>> I am just getting started with Apache NiFi, doing a kind of PoC (Proof of
>> Concept). My data flow is composed as follows:
>> GetFile -> PutFile -> ExecuteStreamCommand
>> 
>> The idea is to move a file from one folder to another and then execute a
>> script. The first step (moving the file from one side to the other) works
>> perfectly but I haven't been able to execute the script. The script is very
>> simple, I just want to create a file on my desktop.

Re: How to configure a ExecuteStreamCommand

2016-03-02 Thread Joe Percivall
Hello,

Welcome to NiFi!

I just tried running an ExecuteStreamCommand processor with the properties you 
have (I created a script and modified the paths to point to a folder that 
exists) and two things jump out. One, the Max attribute length must take an 
integer. If you set it to be a path the processor will be invalid and you'll 
see a yellow warning icon in the top left of the processor. This means the 
processor will not run and you'll see the flowfiles queue up in the 
relationship preceding it.

Second, the Output Destination Attribute is only for when you want to output 
the results of the command to an attribute instead of the content of a new 
flowfile (useful for running a command to find the character encoding of the 
contents). Using an integer for the max attribute length I am able to correctly 
run the script.

As a helpful hint, you can see the description of a property by hovering over 
the light blue "?" icon in the configure processor tab. Also you can see the 
documentation for the processor by right clicking on it and selecting "usage" 
from the list.

Also what will you eventually be doing with your script? The way the 
ExecuteStreamCommand is designed to work is by taking in a FlowFile and then 
running an external command on it. So you may make your flow more efficient and 
user friendly by putting the ExecuteStreamCommand between the Get and Put.
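For reference, a corrected configuration along those lines might look like this
(paths taken from the original message; 256 is just an example length):

    Command Path: /Users/joseantoniorodriguez/Desktop/script.sh
    Ignore STDIN: true
    Working Directory: /Users/joseantoniorodriguez/Desktop
    Argument Delimiter: ;
    Output Destination Attribute: (empty, so the command output goes to the FlowFile content)
    Max Attribute Length: 256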
 
Hope that helps,
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Sunday, February 28, 2016 4:53 PM, jose antonio rodriguez diaz 
 wrote:
Hello All,

I am just getting started with Apache NiFi, doing a kind of PoC (Proof of
Concept). My data flow is composed as follows:
GetFile -> PutFile -> ExecuteStreamCommand

The idea is to move a file from one folder to another and then execute a
script. The first step (moving the file from one side to the other) works
perfectly but I haven't been able to execute the script. The script is very
simple, I just want to create a file on my desktop.

The script, called script.sh, is located on my desktop ($HOME/Desktop/script.sh):

#!/bin/bash

echo "This is a test" >> /Users/joseantoniorodriguez/Desktop/foo.txt




Also the ExecuteStreamCommand is configured as follow

Command Path: /Users/joseantoniorodriguez/Desktop/script.sh
Ignore STDIN: true
Working directory: /Users/joseantoniorodriguez/Desktop
Argument delimiter: ;
Output destination attribute: /Users/joseantoniorodriguez/Desktop —> ¿Is this 
necessary?
Max attribute length: /Users/joseantoniorodriguez/Desktop


The files I'm using to test are both CSV, one of about 324 KB and the other 22 MB.

After executing, I could see the file had been moved from one folder to the
other, but I didn't see any foo.txt file on my desktop; I also didn't see any
error in the flow.

Could anybody give me a hand with this? I am pretty sure this is a trivial
error or misconfiguration. By the way, the OS is Mac OS X.

Thanks in advance.


Re: Processor with State

2016-03-02 Thread Joe Percivall
I created a jira ticket to track this idea for a processor that enables 
updating an attribute using state, which should enable the very basics of data 
science: https://issues.apache.org/jira/browse/NIFI-1582
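In the meantime, a bare-bones custom-processor sketch of the threshold-crossing
idea using the state-management API might look like this (the processor name,
the "value" attribute, and the fixed threshold are all assumptions for
illustration; property descriptors and input validation are omitted):

import java.io.IOException;
import java.util.Collections;
import java.util.Set;

import org.apache.nifi.components.state.Scope;
import org.apache.nifi.components.state.StateManager;
import org.apache.nifi.components.state.StateMap;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

public class ThresholdCrossingAlert extends AbstractProcessor {

    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success").description("All FlowFiles, tagged on threshold crossings").build();

    private static final double THRESHOLD = 100.0; // would normally be a property

    @Override
    public Set<Relationship> getRelationships() {
        return Collections.singleton(REL_SUCCESS);
    }

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session)
            throws ProcessException {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }
        try {
            final StateManager stateManager = context.getStateManager();
            final StateMap stateMap = stateManager.getState(Scope.LOCAL);
            final boolean wasAbove = Boolean.parseBoolean(stateMap.toMap().get("above"));
            final boolean isAbove = Double.parseDouble(flowFile.getAttribute("value")) > THRESHOLD;
            if (isAbove != wasAbove) {
                // Tag only the first FlowFile past the threshold in either direction.
                flowFile = session.putAttribute(flowFile, "threshold.crossed", isAbove ? "up" : "down");
                stateManager.setState(
                        Collections.singletonMap("above", Boolean.toString(isAbove)), Scope.LOCAL);
            }
            session.transfer(flowFile, REL_SUCCESS);
        } catch (final IOException e) {
            throw new ProcessException("Unable to access processor state", e);
        }
    }
}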
 
Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Wednesday, March 2, 2016 11:19 AM, Joe Percivall <joeperciv...@yahoo.com> 
wrote:
Hello Claudio,

Your use-case actually could leverage a couple of recently added features to 
create a really cool open-source processor. The two key features that were 
added are State Management and the ability to reference processor specific 
variables in expression language. You can take a look at RouteText to see both 
in action. 

By utilizing both you can create a processor that is configured with multiple 
Expression language expressions. There would be dynamic properties which would 
accept expression language and then store the evaluated value via state 
management. Then there would be a routing property (that supports expression 
language) that could simply add an attribute to the flowfile with the evaluated 
value which would allow it to be used by following processors for routing.

This would allow you to do your use-case where you store the value for the 
incoming stream and route differently once you go over a threshold. It could 
even allow more complex use-cases. One instance that I believe would be
possible is to have a running average and standard deviation and route data to
different locations based on its standard deviation.


You can think of this like an UpdateAttribute with the ability to store and 
calculate variables using expression language.
Joe

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Thursday, February 25, 2016 1:12 PM, Claudio Caldato 
<claud...@microsoft.com> wrote:



I expect that in the future I’ll need something a little more sophisticated but 
for now my problem is very simple:
I want to be able to trigger an alert (only once) when an attribute in an 
incoming stream, for instance, goes over a predefined threshold. The Processor 
should then trigger (only once again) another alert when the signal goes back
to normal (below threshold). Basically a RouteOnAttribute but with memory.

Thanks 
Claudio





On 2/24/16, 8:56 PM, "Joe Witt" <joe.w...@gmail.com> wrote:

>Claudio
>
>Hello there and welcome to the nifi community.  There are some
>processors available now that allow you to store values in distributed
>(across the cluster) maps and to retrieve them.  And now within
>processors there is the ability interact with state management
>features built into the framework.  So the basic pieces are there.  I
>would like to better understand the idea though because it may be even
>more straight forward.
>
>Where does the state or signal come from that would prompt you to
>store a value away?  And is this source/signal separate from the feed
>of data you'd like to tag with this value?
>
>For example, we have the UpdateAttribute processor which can be used
>to tag attributes onto flow files going by.  You can of course simply
>call the rest api to change the tag being applied as needed and that
>can be done by whatever the signal/source is potentially.
>
>Thanks
>Joe
>
>On Wed, Feb 24, 2016 at 11:49 PM, Claudio Caldato
><claud...@microsoft.com> wrote:
>>
>> I need to be able to store a simple value (it can be true/false) in the
>> processor across messages, basically I need a processor with a local state
>> (set of properties) that I can use to set the value of properties on output
>> messages
>>
>> Can it be done or do I need to build a custom processor?
>>
>> Thanks
>> Claudio
>>


Re: Processor with State

2016-03-02 Thread Joe Percivall
Hello Claudio,

Your use-case actually could leverage a couple of recently added features to 
create a really cool open-source processor. The two key features that were 
added are State Management and the ability to reference processor specific 
variables in expression language. You can take a look at RouteText to see both 
in action. 

By utilizing both you can create a processor that is configured with multiple 
Expression language expressions. There would be dynamic properties which would 
accept expression language and then store the evaluated value via state 
management. Then there would be a routing property (that supports expression 
language) that could simply add an attribute to the flowfile with the evaluated 
value which would allow it to be used by following processors for routing.

This would allow you to do your use-case where you store the value for the 
incoming stream and route differently once you go over a threshold. It could 
even allow more complex use-cases. One instance that I believe would be
possible is to have a running average and standard deviation and route data to
different locations based on its standard deviation.


You can think of this like an UpdateAttribute with the ability to store and 
calculate variables using expression language.
Joe

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Thursday, February 25, 2016 1:12 PM, Claudio Caldato 
 wrote:



I expect that in the future I’ll need something a little more sophisticated but 
for now my problem is very simple:
I want to be able to trigger an alert (only once) when an attribute in an 
incoming stream, for instance, goes over a predefined threshold. The Processor 
should then trigger (only once again) another alert when the signal goes back
to normal (below threshold). Basically a RouteOnAttribute but with memory.

Thanks 
Claudio





On 2/24/16, 8:56 PM, "Joe Witt"  wrote:

>Claudio
>
>Hello there and welcome to the nifi community.  There are some
>processors available now that allow you to store values in distributed
>(across the cluster) maps and to retrieve them.  And now within
>processors there is the ability to interact with state management
>features built into the framework.  So the basic pieces are there.  I
>would like to better understand the idea though because it may be even
>more straight forward.
>
>Where does the state or signal come from that would prompt you to
>store a value away?  And is this source/signal separate from the feed
>of data you'd like to tag with this value?
>
>For example, we have the UpdateAttribute processor which can be used
>to tag attributes onto flow files going by.  You can of course simply
>call the rest api to change the tag being applied as needed and that
>can be done by whatever the signal/source is potentially.
>
>Thanks
>Joe
>
>On Wed, Feb 24, 2016 at 11:49 PM, Claudio Caldato
> wrote:
>>
>> I need to be able to store a simple value (it can be true/false) in the
>> processor across messages, basically I need a processor with a local state
>> (set of properties) that I can use to set the value of properties on output
>> messages
>>
>> Can it be done or do I need to build a custom processor?
>>
>> Thanks
>> Claudio
>>


Re: Maximum attribute size

2016-02-19 Thread Joe Percivall
Hello Lars,
You are correct that the WAL is different from swapping. 
Swapping is used when a single connection queue grows to be very large. A chunk
of the FlowFiles are then swapped out of JVM memory and written to disk, where
they are stored until they are swapped back in for processing. The WAL is
almost solely for persistence of information when a NiFi instance is stopped
for some reason (i.e. restarts or hardware failures).

I am currently working on finishing up a document which will explain these and
many other concepts utilized by the underlying system, so look out for that in
the relatively near future.

Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com
 

On Wednesday, February 17, 2016 6:48 PM, Lars Francke 
 wrote:
 

 Thanks a lot for confirming my suspicions.
One last clarification: The WAL is different from the swapping concept, 
correct? I guess it's way faster to swap in a dedicated "dump" than replaying a 
WAL.
On Wed, Feb 17, 2016 at 7:53 PM, Joe Witt  wrote:

Lars,

You are right about the thought process.  We've never provided solid
guidance here but we should.  It is definitely the case that flow file
content is streamed to and from the underlying repository and the only
way to access it is through that API.  Thus well behaved extensions
and the framework itself can handle basically data as large as the
underlying repository has space for.  For the flow file attributes
though these are held in memory in a map with each flowfile object.
So it is important to avoid having vast (undefined) quantities of
attributes or attributes with really large (undefined) values.

There are things we can and should do to make even this relatively
transparent to the users and it is why actually we support swapping
flowfiles to disk when there are large queues because even those inmem
attributes can really add up.

Thanks
Joe

On Wed, Feb 17, 2016 at 11:06 AM, Lars Francke  wrote:
> Hi and sorry for all these questions.
>
> I know that FlowFile content is persisted to the content_repository and can
> handle reasonably large amounts of data. Is the same true for attributes?
>
> I download JSON files (up to 200kb I'd say) and I want to insert them as
> they are into a PostgreSQL JSONB column. I'd love to use the PutSQL
> processor for that but it requires parameters in attributes.
>
> I have a feeling that putting large objects in attributes is a bad idea?




  

Re: Log4j/logback parser via syslog

2016-02-12 Thread Joe Percivall
Hello Madhu,


If you're looking for a template to show how to create a dynamic property for 
RouteOnAttribute to use, I'd suggest checking out this template[1]. It is a 
simple template that checks to see if an attribute matches 'NiFi'.

Also provenance can be a very powerful debugging tool. If a flowfile gets 
routed to a relationship you don't expect, simply check the provenance for the 
destination of the relationship. You'll be able to see the exact attributes for 
any recent flowfile that was routed there.
[1] 
https://github.com/hortonworks-gallery/nifi-templates/blob/master/templates/simple-httpget-route-flow.xml

 
Hope that helps,
Joe

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Friday, February 12, 2016 2:28 PM, Madhukar Thota  
wrote:



I am getting my log4j logs on facility value 23 (LOCAL7); how can I route only
the facility 23 logs for further extraction?

I added a RouteOnAttribute processor and defined this property:
${facility:contains(23)}, but none of the messages are getting matched. I am
not sure my defined property is correct. How can I route messages based on a
field value to different processors?

-Madhu


On Fri, Feb 12, 2016 at 11:33 AM, Madhukar Thota  
wrote:

Thanks Bryan. Looking forward for the release.
>
>
> 
>
>
>On Fri, Feb 12, 2016 at 10:55 AM, Bryan Bende  wrote:
>
>I believe groovy, python, jython, jruby, ruby, javascript, and lua.
>>
>>
>>The associated JIRA is here:
>>https://issues.apache.org/jira/browse/NIFI-210
>>
>>
>>
>>There are some cool blogs about them here:
>>http://funnifi.blogspot.com/2016/02/executescript-processor-hello-world.html
>>
>>
>>
>>-Bryan
>>
>>
>>On Fri, Feb 12, 2016 at 10:48 AM, Madhukar Thota  
>>wrote:
>>
>>Thanks Bryan. I will look into ExtractText processor.
>>>
>>>
>>>Do you know what scripting languages are supported with new processors?
>>>
>>>
>>>-Madhu
>>>
>>>
>>>On Fri, Feb 12, 2016 at 9:27 AM, Bryan Bende  wrote:
>>>
>>>Hello,


Currently there are no built in processors to parse log formats, but have 
you taken a look at the ExtractText processor [1]? 


If you can come up with a regular expression for whatever you are trying to 
extract, then you should be able to use ExtractText.


Other options... 


You could write a custom processor, but this sounds like it might be 
overkill for your scenario.
In the next release (hopefully out in a few days) there will be two new 
processors that support scripting languages. It may be easier to use a 
scripting language to manipulate/parse the text. 


Thanks,


Bryan


[1] 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ExtractText/index.html




On Fri, Feb 12, 2016 at 12:16 AM, Madhukar Thota  
wrote:

Hi 
>
>
>I am very new to Apache Nifi and just started learning about how to use it.
>
>
>We have a requirement where we need to parse log4j/logback pattern 
>messages coming from SyslogAppenders via syslog UDP. I can read the 
>standard syslog messages, but how can I further extract log4j/logback 
>messages from the syslog body?
>
>
>Are there any log parsers (log4j/logback/Apache access log format) 
>available in Apache NiFi?
>
>
>
>
>Any help on this is much appreciated. 
>
>
>Thanks in Advance.
>
>

>>>
>>
>


Re: File Upload to a ListenHTTP Processor

2016-02-05 Thread Joe Percivall
Hello Andrew,

I believe I was running into something similar before and my problem was 
actually with the curl command itself. When I added "--data-binary" to the 
command it worked.

Relevant SO question: 
http://stackoverflow.com/questions/9134003/binary-data-posting-with-curl
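
For example (a sketch, reusing the filename and endpoint from Andrew's message), 
posting the raw bytes as the request body instead of a multipart form:

curl -X POST -H "filename: maven-4.0.0.xsd" --data-binary "@maven-4.0.0.xsd" http://localhost:8080/ingest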

Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Friday, February 5, 2016 5:30 PM, Andrew Serff  
wrote:



Hello, I’m new to NiFi, and I’m just trying out my options for ingest into a 
data flow we are trying to set up.  We want to expose both the capability to 
SFTP a file into a dropbox or perform an HTTP File Upload. So I’ve set up both 
a GetFile and ListenHTTP processors.  After both processors, I just add a 
success relationship to a PutFile processor to write the file to an archive 
directory.  The dropbox works as expected, however the ListenHTTP processor 
isn’t working quite as I’d expect it to, so I’m hoping someone can point out 
what I’m doing wrong.  

As a simple case, I just set up a ListenHTTP processor and have it listening on 
port 8080 at the “ingest” context.  So I can hit it at 
http://localhost:8080/ingest. No security or anything yet. Then from the 
command line, I’m trying to upload file like so:

curl -i -X POST -H "Content-Type: multipart/form-data" -H "filename: 
maven-4.0.0.xsd" -F "data=@maven-4.0.0.xsd" http://localhost:8080/ingest

The file is uploaded and written to the directory, however it is wrapped with 
the form encoding.  So the beginning of the file now looks like:

--78b35889e5299cc2
Content-Disposition: form-data; name="data"; filename="maven-4.0.0.xsd"
Content-Type: application/octet-stream






<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
elementFormDefault="qualified" xmlns="http://maven.apache.org/POM/4.0.0" 
targetNamespace="http://maven.apache.org/POM/4.0.0">
  



Obviously the file is no longer a valid xsd file nor the same as what was sent 
from the source system. I used an xsd file just for testing, but we could have 
any type of file (binary, text, what have you…). I have also tried this same 
upload using a Java client and I get the same result.  

Can anyone let me know how we can get this to work? 
Thanks
Andrew


Re: Add date in the flow file attribute

2016-02-02 Thread Joe Percivall
Hello Sudeep,

How precise do you need the date/time to be? What you could do is add an 
UpdateAttribute processor[1] after ingesting which uses the Expression language 
functions "now" [2] and "format" [3] to add the date/time down to the 
millisecond.

There would of course be a bit of error between when it was ingested and when 
it is processed by UpdateAttribute but UpdateAttribute is very fast and there 
may actually not be any measurable delay.

[1] 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.attributes.UpdateAttribute/index.html
[2] 
https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#now
[3] 
https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#format
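
For example (a sketch; the attribute name and format pattern are up to you), an 
UpdateAttribute property named 'ingest.timestamp' could be set to:

${now():format("yyyy-MM-dd HH:mm:ss.SSS")}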

Hope that helps,
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Tuesday, February 2, 2016 1:17 AM, sudeep mishra  
wrote:



Hi,

I need to create some audits around the NiFi flows and want to add the time a 
flow file was received by a particular processor. Is there a way to add this 
date in the attributes for flow files?

I can see a date in the 'Details' section for a data provenance entry but can 
we get such a date in the attributes as well?


Thanks & Regards,

Sudeep


Re: Add date in the flow file attribute

2016-02-02 Thread Joe Percivall
Glad UpdateAttribute works for you.

You are seeing AttributesToJSON append the information to the content? That is 
not what the documentation says, nor how it should behave (it should replace 
the contents). Could you send more information documenting this?
 
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Tuesday, February 2, 2016 12:11 PM, sudeep mishra <sudeepshekh...@gmail.com> 
wrote:



Thanks Joe.

The UpdateAttribute processor can be helpful for my case. Also, is it possible 
to push only the attributes to Mongo? I could see an AttributesToJSON processor, 
but it seems to append the information to the flow file content or an attribute. 
What is a good way to capture only the attributes and send them to MongoDB?



On Tue, Feb 2, 2016 at 8:42 PM, Joe Percivall <joeperciv...@yahoo.com> wrote:

Hello Sudeep,
>
>How precise do you need the date/time to be? What you could do is add an 
>UpdateAttribute processor[1] after ingesting which uses the Expression 
>language functions "now" [2] and "format" [3] to add the date/time down to the 
>millisecond.
>
>There would of course be a bit of error between when it was ingested and when 
>it is processed by UpdateAttribute but UpdateAttribute is very fast and there 
>may actually not be any measurable delay.
>
>[1] 
>https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.attributes.UpdateAttribute/index.html
>[2] 
>https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#now
>[3] 
>https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#format
>
>Hope that helps,
>Joe
>- - - - - -
>Joseph Percivall
>linkedin.com/in/Percivall
>e: joeperciv...@yahoo.com
>
>
>
>
>On Tuesday, February 2, 2016 1:17 AM, sudeep mishra <sudeepshekh...@gmail.com> 
>wrote:
>
>
>
>Hi,
>
>I need to create some audits around the NiFi flows and want to add the time a 
>flow file was received by a particular processor. Is there a way to add this 
>date in the attributes for flow files?
>
>I can see a date in the 'Details' section for a data provenance entry but can 
>we get such a date in the attributes as well?
>
>
>Thanks & Regards,
>
>Sudeep
>


-- 

Thanks & Regards,

Sudeep 


Re: JDBC External Table File Target

2016-01-28 Thread Joe Percivall
Hello Obaid,

Sorry no one has gotten back to you sooner, many of the developers are working 
diligently to get 0.5.0 done.

I don't know too much about loading SQL tables, but when you say "local file 
system filename", is this a file that exists on the target system? If so, you 
may just be able to send that command as-is, because my understanding is that 
with ExecuteSQL you have an input query which gets sent to the target server, 
and the target server then runs that command.

Can anyone else that has more experience with SQL and loading tables chime in?
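
As a rough sketch (Netezza external-table syntax from memory, so please verify 
it against the Netezza docs; the path is hypothetical), if the file lives on 
the machine issuing the JDBC call, the driver can stream it for you:

insert into MYTABLE
select * from external '/tmp/staged/data.csv'
using (delimiter ',' remotesource 'JDBC');

FlowFile content is not exposed as a stable filesystem path, so a common 
pattern is to write it out first (e.g. PutFile to a staging directory) and then 
reference that path in the SQL.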
 

Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Saturday, January 23, 2016 12:31 AM, obaidul karim  
wrote:



Hi,

I am developing a NiFi processor for Netezza.
As per my experience, the best way to load data into Netezza is using a Netezza 
JDBC external table.

I want to run the below command to load a file within a NiFi processor.

insert into MYTABLE
select * FROM EXTERNAL '<filename>'
USING (<options>);

My question is: which file to use for '<filename>'?
- Is it the flow file? If yes, then how can I get the full path of a flowfile?
- Or can I directly load the file in a spool directory?
 

Thanks for your help in advance.

-Obaid


Re: Trying to send a PostHTTP with a "non-chunked" payload.

2016-01-19 Thread Joe Percivall
Hello Richard,

This actually came up a couple of days ago and is a known bug [1]. In another 
ticket [2] I added the option to InvokeHttp to expose chunked encoding, and my 
patch is pending.

[1] https://issues.apache.org/jira/browse/NIFI-1396
[2] https://issues.apache.org/jira/browse/NIFI-1405

Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com
 

On Tuesday, January 19, 2016 1:37 PM, Richard Catlin 
 wrote:
 

 
I am trying to create a POST in NiFi which carries a standard non-chunked 
payload.

Here is a POST request that works:

POST / HTTP/1.1
Host: 127.0.0.1:8085
User-Agent: curl/7.43.0
Accept: */*
Content-Length: 60
Content-Type: application/x-www-form-urlencoded

client_id=test_id=test_secret∾cess_token=test_wells

Here is the POST request created in NiFi using PostHTTP. The settings are "Send 
as FlowFile" false and "Use chunked encoding" false. Why does the POST below 
show chunked encoding to be enabled?

POST / HTTP/1.1
Content-Type: application/x-www-form-urlencoded
x-prefer-acknowledge-uri: true
x-nifi-transfer-protocol-version: 3
x-nifi-transaction-id: 22297370-a69e-4210-b872-f010c2046c50
Transfer-Encoding: chunked
Host: 127.0.0.1:8085
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.4.1 (Java/1.7.0_79)

3c
client_id=test_id=test_secret∾cess_token=test_wells

  

Re: how to sort json array in dataflow

2016-01-18 Thread Joe Percivall
Hello Roland,

Were you able to achieve any success incorporating the JSON sorting into a 
custom processor?
 
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Monday, January 11, 2016 2:03 AM, 彭光裕  wrote:



Hello Joe,
Thanks for your response.

You can refer to http://www.trentrichardson.com/jsonsql/ for more details 
about what I've mentioned.

Besides that, http://doc.snaplogic.com/jsonpath is a project that also extends 
JsonPath with more functions, such as sort, etc.
For example: $.children.sort_asc(value.age) / 
$.children.sort_desc(value.age)

Roland.
-Original Message-
From: Joe Witt [mailto:joe.w...@gmail.com] 
Sent: Monday, January 11, 2016 10:52 AM
To: users@nifi.apache.org
Subject: Re: how to sort json array in dataflow

Hello Roland

I don't believe we have out-of-the-box support for that, but it would likely 
make for a well-scoped custom processor.

You mention support for sort syntax.  Can you share what that might look like?

Thanks
Joe

On Sun, Jan 10, 2016 at 9:45 PM, 彭光裕  wrote:
> Hi,
>
> I have a json array flowfile and would like to sort the json 
> array by some certain value (let’s say price). Is it possible sort 
> json array within nifi dataflow? I have tried EvaluateJsonPath 
> processor, but jsonpath doesn’t support sort syntax. I still can’t 
> figure it out how to do this. Any suggestion would be welcome, thanks in 
> advance.
>
>
>
> Roland.
>
>
>




Re: Is there a way to configure a processor to run only N times

2016-01-18 Thread Joe Percivall
Hello Lars and Sudeep,
I created a Jira ticket for this issue: 
https://issues.apache.org/jira/browse/NIFI-1407
We can continue the conversation there instead of in two different email threads.

Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com
 

On Thursday, January 14, 2016 4:44 AM, Lars Francke 
 wrote:
 

 No, I'm not. I wanted to file one but haven't gotten around to it yet.
On Thu, Jan 14, 2016 at 10:09 AM, sudeep mishra  
wrote:

Thanks Lars for your inputs. Just to confirm, are you aware of a JIRA to 
implement a 'max executions' property for processors?
Regards,
Sudeep
On Thu, Jan 14, 2016 at 1:31 PM, Lars Francke  wrote:

Hi Sudeep,
I asked the same question just four days ago. You'll find the thread here.
The short answer currently is: no, it's not possible.
Cheers,
Lars
On Thu, Jan 14, 2016 at 8:42 AM, sudeep mishra  wrote:

Hi,
Can we configure a processor to run only 'N' times? In my data flow I 
want some processors to run only once. How can I achieve it?


Thanks & Regards,
Sudeep






-- 
Thanks & Regards,
Sudeep Shekhar Mishra
+91-9167519029 | sudeepshekh...@gmail.com



  

Re: PutDistributedMapCache

2016-01-14 Thread Joe Percivall
Hello Sudeep,
Sorry, not following your emails, did you need more help importing the 
processor?
Currently the way you would clear a DistributedMapCache is to just remove the 
DistributedMapCacheServer controller service and make a new one.
Joe - - - - - - Joseph Percivalllinkedin.com/in/Percivalle: 
joeperciv...@yahoo.com
 

On Thursday, January 14, 2016 7:04 AM, sudeep mishra 
<sudeepshekh...@gmail.com> wrote:
 

 Thanks Joe. The GetDistributedMapCache seems to be working fine. 
Is there a way to clear DistributedMapCache on demand?
Regards,
Sudeep
On Thu, Jan 14, 2016 at 12:42 PM, sudeep mishra <sudeepshekh...@gmail.com> 
wrote:

Upon building the repository we get different .nar files which can be updated 
in the lib for my requirement. Thanks for your help.
On Thu, Jan 14, 2016 at 9:27 AM, sudeep mishra <sudeepshekh...@gmail.com> wrote:

Is it possible to build the code for only a particular processor? Just curious 
if we can build and deploy a particular processor in an existing NiFi 
environment.
On Wed, Jan 13, 2016 at 9:33 PM, sudeep mishra <sudeepshekh...@gmail.com> wrote:

Thanks Joe. I will try out the patch.
On Wed, Jan 13, 2016 at 9:31 PM, Joe Percivall <joeperciv...@yahoo.com> wrote:

You would need to clone the nifi source from github and then apply the patch 
using git.

Here is how to clone a repo: 
https://help.github.com/articles/cloning-a-repository/
Along with the nifi repo itself: https://github.com/apache/nifi

and how to apply a patch: 
http://makandracards.com/makandra/2521-git-how-to-create-and-apply-patches
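
As a sketch (the patch filename is hypothetical; use the actual .patch 
attachment from the NIFI-1382 ticket):

git clone https://github.com/apache/nifi.git
cd nifi
git apply NIFI-1382.patch
mvn clean install -DskipTests

The build output under nifi-assembly should then contain a distribution with 
the patched processor.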

Let me know if you have any other questions,
Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Wednesday, January 13, 2016 10:56 AM, sudeep mishra 
<sudeepshekh...@gmail.com> wrote:



Thank you very much Joe.

Can you please let me know how I can use the .patch file? I am using NiFi 
via the binaries... Do I need to set up the source code and build it along 
with the patch?

Thanks & Regards,

Sudeep


On Wed, Jan 13, 2016 at 9:02 PM, Joe Percivall <joeperciv...@yahoo.com> wrote:

Hello Sudeep,
>
>I put up a patch on the GetDistributedMapCache ticket[1]. Let me know what you 
>think.
>
>The PutDistributedMapCache processor and GetDistributedMapCache work with the 
>data as a byte[] so it should be format agnostic. That being said it will be 
>up to you to know what is in there in order to use it later.
>
>[1] https://issues.apache.org/jira/browse/NIFI-1382
>
>Joe
>- - - - - -
>Joseph Percivall
>linkedin.com/in/Percivall
>e: joeperciv...@yahoo.com
>
>
>
>
>On Tuesday, January 12, 2016 11:34 PM, sudeep mishra 
><sudeepshekh...@gmail.com> wrote:
>
>
>
>Thanks Joe.
>
>I do not have specific configuration as of now as I am still exploring NiFi. 
>Though I think it would be helpful to let user store and retrieve the cache 
>values in different formats json, avro etc.
>
>Thanks & Regards,
>
>Sudeep
>
>
>
>
>
>On Tue, Jan 12, 2016 at 9:15 PM, Joe Percivall <joeperciv...@yahoo.com> wrote:
>
>Hello Sudeep,
>>
>>
>>We are currently lacking a "GetDistributedMapCache" processor that 
>>corresponds to the "PutDistributedMapCache". I created a ticket[1] and will 
>>be working on it today. If you have any comments, configuration suggestions, 
>>etc. please let me know or comment on the ticket.
>>
>>
>>[1] https://issues.apache.org/jira/browse/NIFI-1382
>>
>>Joe
>>- - - - - -
>>Joseph Percivall
>>linkedin.com/in/Percivall
>>e: joeperciv...@yahoo.com
>>
>>
>>
>>
>>
>>On Tuesday, January 12, 2016 9:46 AM, sudeep mishra 
>><sudeepshekh...@gmail.com> wrote:
>>
>>
>>
>>Thanks Matt.
>>
>>
>>In my data flow I am expected to perform certain validations on data. I am 
>>loading some SQLServer data into HDFS using Sqoop (not part of NiFi flow). For 
>>each record in the HDFS file I have to query another database and then save the 
>>validated record again in HDFS which will be processed by some Spark jobs.
>>
>>
>>Since I have to query for each record thus I was planning to cache the 
>>database records against which I have to validate the HDFS. Thus I was 
>>evaluating the DistributedCacheServer. But looks like its purpose is 
>>different. Alternatively can we integrate Redis or another distributed cache 
>>with NiFi as I do not see any processor for it.
>>
>>
>>Appreciate your help.
>>
>>
>>Thanks & Regards,
>>
>>
>>Sudeep
>>
>>
>>
>>
>>On Tue, Jan 12, 2016 at 6:59 PM, Matthew Clarke <matt.clarke@gmail.com> 

Re: PutSQL question

2016-01-13 Thread Joe Percivall
Hello Ralph,
A common way to replace the contents of a FlowFile with specific text is to use 
the ReplaceText processor. The default search query will match the entire 
content and replace it with whatever your replacement value is. So if you set 
the replacement value (complete with expression language usage) to the query 
you want, the FlowFile content will become the query.

In order to do that you need to have the ID from the JSON message as a FlowFile 
attribute. You will need to use ExtractText [1] to get the ID out of the 
content. If the JSON tag is unique (found via regex), you could forgo the 
EvaluateJSONPath processor and just use the ExtractText processor.

[1] 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ExtractText/index.html
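
For example (a sketch; the attribute and table names are hypothetical), if the 
ID ends up in an attribute named 'message.id', the ReplaceText Replacement 
Value could be:

INSERT INTO my_audit_table (id) VALUES ('${message.id}')

The FlowFile content then becomes that statement for PutSQL to execute.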
Hope that helps,
Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com
 

On Wednesday, January 13, 2016 3:16 PM, "Perko, Ralph J" 
 wrote:
 

Hi,

I want to use the PutSQL processor to execute an insert statement. The use case 
is straightforward. I need to pull an id attribute from a JSON message and 
write it to a database. I am using EvaluateJSONPath to grab the id, but I'm 
hung up on how to then pass the SQL statement as the content. I could put the 
SQL statement in a file to be loaded as content, but I am wondering if there is 
a way to do this inline with the flow? Ideally I would have the content of the 
FlowFile be the SQL and pass in the id as a SQL parameter as mentioned in the 
documentation.

Thanks for your help,
Ralph


  

Re: PutDistributedMapCache

2016-01-13 Thread Joe Percivall
Hello Sudeep, 

I put up a patch on the GetDistributedMapCache ticket[1]. Let me know what you 
think.

The PutDistributedMapCache processor and GetDistributedMapCache work with the 
data as a byte[] so it should be format agnostic. That being said it will be up 
to you to know what is in there in order to use it later.

[1] https://issues.apache.org/jira/browse/NIFI-1382
 
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Tuesday, January 12, 2016 11:34 PM, sudeep mishra <sudeepshekh...@gmail.com> 
wrote:



Thanks Joe.

I do not have a specific configuration as of now, as I am still exploring NiFi. 
Though I think it would be helpful to let users store and retrieve the cache 
values in different formats: JSON, Avro, etc.

Thanks & Regards,

Sudeep





On Tue, Jan 12, 2016 at 9:15 PM, Joe Percivall <joeperciv...@yahoo.com> wrote:

Hello Sudeep,
>
>
>We are currently lacking a "GetDistributedMapCache" processor that corresponds 
>to the "PutDistributedMapCache". I created a ticket[1] and will be working on 
>it today. If you have any comments, configuration suggestions, etc. please let 
>me know or comment on the ticket.
>
>
>[1] https://issues.apache.org/jira/browse/NIFI-1382
> 
>Joe
>- - - - - - 
>Joseph Percivall
>linkedin.com/in/Percivall
>e: joeperciv...@yahoo.com
>
>
>
>
>
>On Tuesday, January 12, 2016 9:46 AM, sudeep mishra <sudeepshekh...@gmail.com> 
>wrote:
>
>
>
>Thanks Matt.
>
>
>In my data flow I am expected to perform certain validations on data. I am 
>loading some SQLServer data into HDFSusing Sqoop (not part of NiFi flow). For 
>each record in HDFS file I have to query another database and then save the 
>validated record again in HDFS which will be processed bysome Spark jobs.
>
>
>Since I have to query for each record thus I was planning to cache the 
>database records against which I have to validate the HDFS. Thus I was 
>evaluating the DistributedCacheServer. But looks like its purpose is 
>different. Alternatively can we integrate Redis or another distributed cache 
>with NiFi as I do not see any processor for it.
>
>
>Appreciate your help.
>
>
>Thanks & Regards,
>
>
>Sudeep
>
>
>
>
>On Tue, Jan 12, 2016 at 6:59 PM, Matthew Clarke <matt.clarke@gmail.com> 
>wrote:
>
>Sudeep,
>>   I was a little off on my second scenario.  The detectduplicate 
>> processor uses the distributedcache service all on its own.. Files that are 
>> route through it are loaded into the cache if they do not already exist in 
>> the cache.  if they do already exist they are routed to duplicate.  The 
>> putDistributedCache processor was a community contribution to which there 
>> are no processor that make use of the info that it caches.
>>
>>   We should probably build a processor that would make use of the data 
>> that can be loaded by the putDistributeCache processor.  Is there a 
>> particular use case you are trying to solve where this would be applicable?
>>
>>
>>Thanks,
>>Matt
>>
>>
>>On Tue, Jan 12, 2016 at 8:11 AM, Matthew Clarke <matt.clarke@gmail.com> 
>>wrote:
>>
>>Sudeep,
>>>The DistributedMapCache is typically used to prevent the consumption of 
>>> duplicate data by some of the ingest type processors (GetHBASE, ListHDFS, 
>>> and ListSFTP).  NiFi uses the service to keep a listing of what has been 
>>> consumed so the same files are not consumed multiple times. The Service can 
>>> also be used to detect if duplicate data already exists within a NiFi 
>>> Instance or cluster. This would be the scenario where some source is 
>>> pushing data to your NiFi and perhaps they push the same data more than 
>>> once. You want to catch these duplicates so you can perhaps kick them out 
>>> of your flow. For this you would use the PutDistributedCache processor to 
>>> cache all incoming data and then use the DetectDuplicate processor to find 
>>> those duplicates.
>>>
>>>Was there a different use case you were looking to solve using the 
>>> Distributed cache service?
>>>
>>>
>>>Thanks,
>>>Matt
>>>
>>>
>>>On Tue, Jan 12, 2016 at 4:36 AM, sudeep mishra <sudeepshekh...@gmail.com> 
>>>wrote:
>>>
>>>Hi,
>>>>
>>>>
>>>>I can cache some data to be used in NiFi flow. I can see the processor 
>>>>PutDistributedMapCache in the documentation which saves key-value pairs in 
>>>>DistributedMapCache for NiFi but I do not see any processor to red this 
>>>>data. How can I read data from DistributedMapCache in my data flow?
>>>>
>>>>
>>>>
>>>>
>>>>Thanks & Regards,
>>>>
>>>>
>>>>Sudeep Shekhar Mishra
>>>>
>>>>
>>>
>>
>
>
>
>-- 
>
>Thanks & Regards,
>
>
>Sudeep Shekhar Mishra
>
>
>+91-9167519029
>sudeepshekh...@gmail.com
>
>


-- 

Thanks & Regards,

Sudeep Shekhar Mishra

+91-9167519029
sudeepshekh...@gmail.com


Re: PutDistributedMapCache

2016-01-12 Thread Joe Percivall
Hello Sudeep,
We are currently lacking a "GetDistributedMapCache" processor that corresponds 
to the "PutDistributedMapCache". I created a ticket[1] and will be working on 
it today. If you have any comments, configuration suggestions, etc. please let 
me know or comment on the ticket.

[1] https://issues.apache.org/jira/browse/NIFI-1382

Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com
 

On Tuesday, January 12, 2016 9:46 AM, sudeep mishra 
 wrote:
 

Thanks Matt.

In my data flow I am expected to perform certain validations on data. I am 
loading some SQLServer data into HDFS using Sqoop (not part of the NiFi flow). 
For each record in the HDFS file I have to query another database and then save 
the validated record again in HDFS, which will be processed by some Spark jobs.
Since I have to query for each record thus I was planning to cache the database 
records against which I have to validate the HDFS. Thus I was evaluating the 
DistributedCacheServer. But looks like its purpose is different. Alternatively 
can we integrate Redis or another distributed cache with NiFi as I do not see 
any processor for it.
Appreciate your help.
Thanks & Regards,
Sudeep

On Tue, Jan 12, 2016 at 6:59 PM, Matthew Clarke  
wrote:

Sudeep,

I was a little off on my second scenario. The DetectDuplicate processor uses 
the distributed cache service all on its own. Files that are routed through it 
are loaded into the cache if they do not already exist in the cache; if they do 
already exist, they are routed to duplicate. The PutDistributedMapCache 
processor was a community contribution for which there are no processors that 
make use of the info that it caches.

We should probably build a processor that would make use of the data that can 
be loaded by the PutDistributedMapCache processor. Is there a particular use 
case you are trying to solve where this would be applicable?

Thanks,
Matt
On Tue, Jan 12, 2016 at 8:11 AM, Matthew Clarke  
wrote:

Sudeep,

The DistributedMapCache is typically used to prevent the consumption of 
duplicate data by some of the ingest-type processors (GetHBase, ListHDFS, and 
ListSFTP). NiFi uses the service to keep a listing of what has been consumed so 
the same files are not consumed multiple times. The service can also be used to 
detect if duplicate data already exists within a NiFi instance or cluster. This 
would be the scenario where some source is pushing data to your NiFi and 
perhaps they push the same data more than once. You want to catch these 
duplicates so you can perhaps kick them out of your flow. For this you would 
use the PutDistributedMapCache processor to cache all incoming data and then 
use the DetectDuplicate processor to find those duplicates.

Was there a different use case you were looking to solve using the distributed 
cache service?

Thanks,
Matt
On Tue, Jan 12, 2016 at 4:36 AM, sudeep mishra  wrote:

Hi,

I want to cache some data to be used in a NiFi flow. I can see the processor 
PutDistributedMapCache in the documentation, which saves key-value pairs in a 
DistributedMapCache for NiFi, but I do not see any processor to read this data. 
How can I read data from the DistributedMapCache in my data flow?


Thanks & Regards,
Sudeep Shekhar Mishra








-- 
Thanks & Regards,
Sudeep Shekhar Mishra
+91-9167519029 | sudeepshekh...@gmail.com

  

Re: Is it possible to to create a dynamic Processor through API call from PHP

2016-01-06 Thread Joe Percivall
Hello Kacem,

When you say you are receiving them with random names, do you mean the Flowfile 
filenames or Facebook names? 

Also when you say "i'll try to find out how i can parse them to rename them as 
JSON files instead of POST fields.", are you trying to change the content of 
the Flowfile into JSON using the POST fields (or http headers)? If so, the 
headers should already be added to each Flowfile as attributes and you can just 
use the AttributesToJSON processor.

[1] 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.AttributesToJSON/index.html
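
For example (a sketch; the attribute names are hypothetical), a FlowFile with 
attributes filename=abc.json and http.remote.user=kacem, run through 
AttributesToJSON with Destination set to flowfile-content, should end up with 
content along the lines of:

{"filename":"abc.json","http.remote.user":"kacem"}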
 
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Saturday, January 2, 2016 11:09 AM, "BOUKRAA, Kacem" <bk_bouk...@esi.dz> 
wrote:



I don't have an SSL certificate (in fact I do, but I used a Let's Encrypt 
certificate which, it seems, is not recognized by the Facebook API).

On 2 January 2016 at 16:15, Simon Ball <sb...@hortonworks.com> wrote:

Why not use the SSL layer provided by ListenHTTP? 
>
>
>Simon
>
>
>
>
>On 2 Jan 2016, at 08:47, BOUKRAA, Kacem <bk_bouk...@esi.dz> wrote:
>>
>>Hello,
>>I'm using a php script because the source requires an ssl connexion which is 
>>not the case with Nifi connexions.
>>I've set up a listener (php file with secured connexion), and then once 
>>retrieved, the data is sent to Nifi ListenHTTP processor through POST 
>>request. The thing is i receive them with random names, i'll try to find out 
>>how i can parse them to rename them as JSON files instead of POST fields.
>>Thanks for your help.
>>
>>
>>On 29 December 2015 at 16:50, Joe Percivall <joeperciv...@yahoo.com> wrote:
>>
>>Hello Kacem,
>>>
>>>There are multiple different ways to get information into NiFi. ListenHTTP 
>>>would be the easiest way to start an HTTP Server that is used to receive 
>>>FlowFiles from remote sources. Assuming your PHP code only acts as a router, 
>>>ListenHTTP should allow you to bypass
the PHP code entirely and just launch the workflow when it receives an HTTP 
request from the Web Service.
>>>
>>>If the PHP code is necessary and it's running on the same box as NiFi then 
>>>you could use the GetFile processor. That mean you write a file via PHP then 
>>>NiFi would grab the file and process it whenever possible. Of course there 
>>>are considerations you have to
take into account when you start putting things directly onto the file system.
>>>
>>>There are many other options for getting files into NiFi and if either of 
>>>these options don't fit your use-case just let us know.
>>>
>>>Joe
>>>- - - - - -
>>>Joseph Percivall
>>>linkedin.com/in/Percivall
>>>e: joeperciv...@yahoo.com
>>>
>>>
>>>
>>>
>>>On Tuesday, December 29, 2015 4:56 AM, "BOUKRAA, Kacem" <bk_bouk...@esi.dz> 
>>>wrote:
>>>
>>>
>>>
>>>Hello everyone,
>>>
>>>So i'm subscribing to an API Callback in a Web service that send a post 
>>>request once new data is available. I'm receiving this call through PHP.
>>>
>>>Is it possible to trigger a processor to retrieve the whole data and launch 
>>>the workflow of processing for that data through another API Call from my 
>>>PHP Code?
>>>
>>>(Web service send an API Callback --> My PHP code was listening --> Send an 
>>>API call to Nifi to launch a processor with specific attributes)
>>>
>>>Another question: It seems like Nifi has ListenHTTP processor. does it allow 
>>>to have an url to be used as a callback (which means is accessible through 
>>>the network to be used as a API Callback url?).
>>>
>>>Thanks in advance.
>>>
>>
>>
>>
>>
-- 
>>
>>
>>
>> Kacem BOUKRAA
>>5th year student at ESI | Higher National School Of Computer Science 
>>(Information Systems)
>>Google Student Ambassador in Algeria
>>Kouba - Alger
>> 
>>mobile: +213 559 859 858 |  email: m...@kacemb.com
>>twitter: @kacem4dz |  website: www.kacemb.com
>> 
>>
>>
>>
>>
>


-- 



Kacem BOUKRAA
5th year student at ESI | Higher National School Of Computer Science 
(Information Systems)
Google Student Ambassador in Algeria
Kouba - Alger

mobile: +213 559 859 858 |  email: m...@kacemb.com
twitter: @kacem4dz |  website: www.kacemb.com


Re: Is it possible to to create a dynamic Processor through API call from PHP

2015-12-29 Thread Joe Percivall
Hello Kacem,

There are multiple different ways to get information into NiFi. ListenHTTP 
would be the easiest way to start an HTTP Server that is used to receive 
FlowFiles from remote sources. Assuming your PHP code only acts as a router, 
ListenHTTP should allow you to bypass the PHP code entirely and just launch the 
workflow when it receives an HTTP request from the Web Service.

If the PHP code is necessary and it's running on the same box as NiFi, then you 
could use the GetFile processor. That means you would write a file via PHP and 
NiFi would grab the file and process it whenever possible. Of course there are 
considerations you have to take into account when you start putting things 
directly onto the file system.

There are many other options for getting files into NiFi, and if neither of 
these options fits your use-case just let us know.

Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Tuesday, December 29, 2015 4:56 AM, "BOUKRAA, Kacem"  
wrote:



Hello everyone,

So I'm subscribing to an API callback in a web service that sends a POST request 
once new data is available. I'm receiving this call through PHP.

Is it possible to trigger a processor to retrieve the whole data and launch the 
workflow of processing for that data through another API call from my PHP code?

(Web service sends an API callback --> my PHP code is listening --> send an API 
call to NiFi to launch a processor with specific attributes)

Another question: it seems like NiFi has a ListenHTTP processor. Does it allow 
you to have a URL to be used as a callback (meaning, is it accessible through 
the network to be used as an API callback URL)?

Thanks in advance.


Re: InvokeHTTP request parameters

2015-12-18 Thread Joe Percivall
Hello James,

I'm actually working on a repo that includes many different templates. There are 
a few that have InvokeHttp in them (you can find the exact ones using the Excel 
doc at the top level), but this is a simple one that uses InvokeHttp:

https://github.com/hortonworks-gallery/nifi-templates/blob/master/templates/InvokeHttp_And_Route_Original_On_Status.xml


Does that fit your use-case?

Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Friday, December 18, 2015 2:52 PM, Burrows James A 
 wrote:




I’m wondering if anyone has a sample, or tutorial on how to configure the 
InvokeHTTP request parameters.
I see that you can configure the request headers using the Attributes to Send 
property, but I need to be able to specify the request parameters to integrate 
with a rest api.
 
 
Thanks,
James


Fw: [RMX:NL] Re: [RMX:NL] Re: [RMX:NL] Re: InvokeHTTP request parameters

2015-12-18 Thread Joe Percivall
I accidentally replied to just James with one of my responses. Forwarding the 
thread back.
 - - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Friday, December 18, 2015 5:20 PM, Joe Percivall <joeperciv...@yahoo.com> 
wrote:
Following this post[1] on sending and retrieving HTTP form data I believe I 
know what you want to do.

The variables should be sent as the body of the POST request. The InvokeHttp 
processor POSTs whatever the content of the FlowFile is. You'll need to add a 
ReplaceText processor before InvokeHttp that replaces the entire content of the 
FlowFile with the value you want to send. 

The default configuration for the ReplaceText processor is to replace the 
entire contents so you should just be able to change the "Replacement Value" 
property to "Identifier=${MessageIdentifier}=${MessageBody}".

[1] 
https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/Forms/Sending_and_retrieving_form_data

Sorry, I am not versed in the terminology for sending forms via HTTP but hope 
this works,
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com





On Friday, December 18, 2015 5:03 PM, Burrows James A 
<james.burr...@dematic.com> wrote:
This is what I currently have:

Get a message from a kafka queue which may contain multiple messages in an 
array ->
Split the messages into separate flow files ->
Create some attributes from each message so I can access them 
(MessageIdentifier, MessageBody) ->
POST the message to a rest api 
(Identifier=${MessageIdentifier}=${MessageBody})

With the example template you provided it would be 
http://localhost/submit?Identifier=${MessageIdentifier}=${MessageBody} for 
a GET request, but I need them to be in form variables since the rest api is a 
POST method

IE. 
GET /submit?Identifier=${MessageIdentifier}=${MessageBody} HTTP/1.1
Host: localhost

Vs

POST /submit HTTP/1.1
Host: localhost
Identifier=${MessageIdentifier}=${MessageBody}

James


-Original Message-
From: Joe Percivall [mailto:joeperciv...@yahoo.com] 
Sent: Friday, December 18, 2015 2:51 PM
To: Burrows James A
Subject: Re: [RMX:NL] Re: [RMX:NL] Re: [RMX:NL] Re: InvokeHTTP request 
parameters

The only parsing of the URL that InvokeHTTP does is to analyze it for the 
protocol (http or https) and for expression language (EL), like the template 
had (${q}). Aside from that, whatever the URL is, is where the request will be 
routed to.

When you say variables, do you mean request headers? If so there are a couple 
different ways to add them that I can explain depending on how you want to 
configure it (attributes of the incoming flowfiles, same header every time, 
etc.).

Sorry, just having a little trouble figuring out what you mean, but I'm happy to 
work through it with you,
Joe

- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Friday, December 18, 2015 4:39 PM, Burrows James A 
<james.burr...@dematic.com> wrote:
Is it supposed to parse out the query string from the url, and convert the 
parameters to form variables?
I tried converting the method to POST, but they still came across as GET 
parameters.

Thanks again, I've been stuck on this for days.

James


-Original Message-
From: Joe Percivall [mailto:joeperciv...@yahoo.com] 
Sent: Friday, December 18, 2015 2:32 PM
To: Burrows James A
Subject: Re: [RMX:NL] Re: [RMX:NL] Re: [RMX:NL] Re: InvokeHTTP request 
parameters

Awesome! Glad to hear that fixed it and sorry I linked you to an incorrect 
template.

For other HTTP methods just change the "HTTP Method" property to your desired 
method.

Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Friday, December 18, 2015 4:28 PM, Burrows James A 
<james.burr...@dematic.com> wrote:
Thanks for updating the template; it worked perfectly for GET methods.
I'm now curious how I would do the same thing for POST, PUT, and DELETE methods?

James


-Original Message-
From: Joe Percivall [mailto:joeperciv...@yahoo.com] 
Sent: Friday, December 18, 2015 1:58 PM
To: users@nifi.apache.org
Subject: [RMX:NL] Re: [RMX:NL] Re: [RMX:NL] Re: InvokeHTTP request parameters

Apparently that template doesn't actually do what it said it did. It wasn't 
using the "q" attribute in the URL so it was just hitting google without any 
query.

I just pushed out a change to the repo which fixes it to properly hit 
"http://www.google.com/search?q=${q}&rct=j", where the ${q} is replaced with the 
attribute that's created in the previous processor. The "rct=j" was just part 
of the URL when I did a manual Google search in my browser, and it allowed me to 
do the search in NiFi as well.

I'll be changing it on Confluence as well.

Sorry about that,
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Friday, December 18, 2015 3:32 PM, Burrows James A 
<james.burr..

Re: [RMX:NL] Re: [RMX:NL] Re: InvokeHTTP request parameters

2015-12-18 Thread Joe Percivall
Apparently that template doesn't actually do what it said it did. It wasn't 
using the "q" attribute in the URL so it was just hitting google without any 
query.

I just pushed out a change to the repo which fixes it to properly hit 
"http://www.google.com/search?q=${q}&rct=j", where the ${q} is replaced with the 
attribute that's created in the previous processor. The "rct=j" was just part 
of the URL when I did a manual Google search in my browser, and it allowed me to 
do the search in NiFi as well.

I'll be changing it on Confluence as well.

Sorry about that,
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Friday, December 18, 2015 3:32 PM, Burrows James A 
<james.burr...@dematic.com> wrote:
Yes the HTTP call being generated is sending without the query string values.
For the configuration of my InvokeHTTP I used that template, but changed the 
remote URL to point to my server (https://www.google.com -> http://localhost).

James


-----Original Message-
From: Joe Percivall [mailto:joeperciv...@yahoo.com] 
Sent: Friday, December 18, 2015 1:25 PM
To: users@nifi.apache.org
Subject: [RMX:NL] Re: [RMX:NL] Re: InvokeHTTP request parameters

Yeah I compiled from the wiki as well. I want to get a listing of as many 
templates as possible so when someone needs an example template I/we can check 
the excel doc to see which processors are used in which template. Still working 
on generating more content though.

Are you saying that the HTTP call you're generating in InvokeHttp is arriving 
at the server without any query string values? 

If that's the case and you're able, can you reply with your InvokeHttp config?

Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Friday, December 18, 2015 3:16 PM, Burrows James A 
<james.burr...@dematic.com> wrote:
Thanks for the reply.

That appears to be the same template from the wiki, and it seems like it 
should work, but when I get the request on the server there are no query string 
values.


-Original Message-
From: Joe Percivall [mailto:joeperciv...@yahoo.com] 
Sent: Friday, December 18, 2015 12:58 PM
To: users@nifi.apache.org
Subject: [RMX:NL] Re: InvokeHTTP request parameters

Hello James,

I'm actually working on repo that includes many different templates. There are 
a few that have InvokeHttp in them (can find the exact ones using the excel doc 
at the top level) but this is a simple one that uses InvokeHttp:

https://github.com/hortonworks-gallery/nifi-templates/blob/master/templates/InvokeHttp_And_Route_Original_On_Status.xml


Does that fit your use-case?

Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Friday, December 18, 2015 2:52 PM, Burrows James A 
<james.burr...@dematic.com> wrote:




I’m wondering if anyone has a sample, or tutorial on how to configure the 
InvokeHTTP request parameters.
I see that you can configure the request headers using the Attributes to Send 
property, but I need to be able to specify the request parameters to integrate 
with a rest api.


Thanks,
James


Re: Content replacement

2015-12-15 Thread Joe Percivall
Hey Chandu,
You messed up bit in the config of your ReplaceText processor. You should be 
running "Line-by-line" mode so each line is treated separately. 
Also your search value is a bit messed up. It was trying to use capture groups 
in it ("$2"), it wasn't successfully handling an initial date of only 1 digit 
("\d{2}"), it wasn't configured to match the spaces in column one and you need 
to either need explicitly make groups  "non-capture groups" using "?:" or don't 
have them as capture groups. This search value works for me:
((?:\d{1,2}) (?:[a-zA-Z]{3}) (?:\d{4})),(.*),(.*)
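
For example (hypothetical sample data), a line like

14 Dec 2015,foo,bar

is matched by that search value with back references $1 = "14 Dec 2015", 
$2 = "foo" and $3 = "bar", which your replace value can then reassemble.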
Your replace value looks good though.

Hope that helps,
Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com
 


On Monday, December 14, 2015 5:21 PM, Chandu Koripella 
 wrote:
 

Hi Mark,

I am testing the replace text feature in 0.4.0. I don't see it working. I tried 
it with a couple of different columns. None of them are replacing the text. Can 
you please take a look?

Thanks,
Chandu

From: Mark Payne [mailto:marka...@hotmail.com]
Sent: Friday, December 04, 2015 2:05 PM
To: users@nifi.apache.org
Subject: Re: Content replacement

Chandru,

This ticket has now been merged to master. This feature will be available in 
0.4.0, which should be out next week. If you prefer to build from source, it is 
now available for use from master. You should be able to now configure 
ReplaceText to easily replace the values: [inline screenshot omitted]

Here, we use the Search Value of (.*?),(.*?),(\d+.*) so the first capturing 
group gets the first column, the second capturing group gets the second column, 
and the third capturing group gets the third column (but only if it starts with 
a digit, so this won't match the header line and the header line will remain 
unchanged).

For the Replacement Value, we use: 
$1,$2,${ '$3':toDate('ddMMMyyyy'):format('yyyy/MM/dd') } 
So here we are using back references to replace the line with the first two 
columns, followed by an Expression Language expression that parses the third 
back reference (the third column). Since the variable that we want to reference 
is named $3, we need to enclose it in quotes to escape the name because it 
begins with a non-alpha character. We can then call any Expression Language 
expression that we want. So toDate() can be used to parse the string as a Date 
and then we can use format() to format that date as a string in a different 
format.

You can also see the template available on the NIFI-1249 ticket, also attached 
for convenience here, but I don't know if the apache mailing list will let the 
template through.

This was a great addition to NiFi that I think will help out in a lot of cases 
- thanks for reaching out to us on this! Please let us know if you have issues 
getting this to work, or if you have any further questions. We're happy to help 
however we can.

Thanks
-Mark

On Dec 4, 2015, at 3:44 PM, Joe Witt wrote:

Chandru,

Correct, what you're trying doesn't work just yet. But once this [1] is 
reviewed and pushed to master we should have your original request covered 
quite nicely.

[1] https://issues.apache.org/jira/browse/NIFI-1249

Thanks
Joe

On Fri, Dec 4, 2015 at 3:26 PM, Chandu Koripella wrote:

Thanks Joe,

I believe I tried this option earlier. As you notice I have 4 groups in the 
regular expression. I tried with all four groups. Nothing gets replaced in the 
put file.

Group4 is the 4-digit year, group3 is the 3-letter month, group2 is the date, 
and group1 is the ddMMMyyyy date.

From: Joe Witt [mailto:joe.w...@gmail.com]
Sent: Thursday, December 03, 2015 7:00 PM

Re: Mirroring nifi lists on gmane

2015-12-14 Thread Joe Percivall
Does anyone have any more opinions on this?

- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com
 


On Saturday, December 5, 2015 6:54 AM, Nigel Jones  
wrote:
 

I have no experience of Nabble but I'm also open to any similar suggestions. 
I see some Apache mailing lists there too.

On Sat, 5 Dec 2015 03:55 Tony Kurc  wrote:

Similar to Nabble, right?

On Dec 4, 2015 10:18 PM, "Joe Witt" wrote:

Is anyone aware of any reasons not to do this?


On Fri, Dec 4, 2015 at 8:59 AM, Nigel Jones  wrote:
> Would it be ok if I asked gmane to mirror these nifi lists on the gmane nntp
> server?
>
> I personally find it a little easier to work through & catch up with
> discussions - though doing this shouldn't affect anyone not using gmane.
>
> I'm happy to do the submission as per http://gmane.org/add.php
>
> Other apache mailing lists already feature there including for example
> Apache Atlas
> ie
> gmane.comp.apache.incubator.atlas.dev at
> news://news.gmane.org/gmane.comp.apache.incubator.atlas.devel
>
> Thanks
> Nigel Jones




  

Re: InvokeHTTP Content Type

2015-12-04 Thread Joe Percivall
I see; as for the request, I'm guessing you mean you pass a header with key 
"Accept" and value "application/json"? If so, in the configuration tab for the 
processor properties, create a new dynamic one (using the plus in the top right) 
with a key of "Accept" and a value of "application/json".

You use the "Attributes to Send" property when you already have the header 
value as an attribute of the flowfile and want to send a header with that 
attribute's name and its value.

Joe
 
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Friday, December 4, 2015 4:06 PM, indus well <indusw...@gmail.com> wrote:



Thanks, Joe. My use case involves calling a REST service that returns an XML 
response by default. However, it also has an option for JSON, for which I can 
pass "Accept: application/json" along with the request using curl, for example. 
I tried to set that on the "Attributes to Send" property in the InvokeHTTP 
processor, but it did not work. It always returns XML.

Thanks,

Indus 


On Fri, Dec 4, 2015 at 2:49 PM, Joe Percivall <joeperciv...@yahoo.com> wrote:

Hello Indus,
>
>Are you receiving a specific error when you attempt to receive a JSON response?
>
>I can configure an InvokeHTTP processor to GET "http://api.randomuser.me/" by 
>just setting the URL and it properly receives back a JSON response.
>
>Joe
>
>
>- - - - - -
>Joseph Percivall
>linkedin.com/in/Percivall
>e: joeperciv...@yahoo.com
>
>
>
>
>
>On Friday, December 4, 2015 3:27 PM, indus well <indusw...@gmail.com> wrote:
>
>
>
>Hi All:
>
>How would I configure the InvokeHTTP processor to receive content type of JSON 
>response?
>
>Thanks,
>
>Indus Well
>


Re: InvokeHTTP Content Type

2015-12-04 Thread Joe Percivall
Hello Indus,

Are you receiving a specific error when you attempt to receive a JSON response? 

I can configure an InvokeHTTP processor to GET "http://api.randomuser.me/" by 
just setting the URL and it properly receives back a JSON response.

Joe


- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Friday, December 4, 2015 3:27 PM, indus well  wrote:



Hi All:

How would I configure the InvokeHTTP processor to receive content type of JSON 
response?

Thanks,

Indus Well


Re: InvokeHTTP Content Type

2015-12-04 Thread Joe Percivall
Sorry, what version are you running? Adding the property like that is something 
I added for 0.4.0.
 
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Friday, December 4, 2015 4:27 PM, indus well <indusw...@gmail.com> wrote:



Yes, I passed a header with key "Accept" and value "application/json". I 
followed your instruction and added a new property in the InvokeHTTP processor 
called "Accept" with a value of "application/json", but I now get an error 
saying "'Accept' validated against 'application/json' is invalid because 
'Accept' is not a supported property."

Did I do something wrong?

Thanks,

Indus


On Fri, Dec 4, 2015 at 3:12 PM, Joe Percivall <joeperciv...@yahoo.com> wrote:

I see, as for the request I'm guessing you mean you pass a header with key 
"Accept" and value "application/json"? If so in the configuration tab for the 
processor properties create a new dynamic one (using the plus in the top right) 
with a key of "Accept" and value of "application/json".
>
>You use the "Attributes to Send" property when you already have the header 
>value as an attribute of the flowfile and want to send an attribute with that 
>name and it's value.
>
>Joe
>
>- - - - - -
>Joseph Percivall
>linkedin.com/in/Percivall
>e: joeperciv...@yahoo.com
>
>
>
>
>
>On Friday, December 4, 2015 4:06 PM, indus well <indusw...@gmail.com> wrote:
>
>
>
>Thanks, Joe. My use case involves calling a REST service that returns XML 
>response by default. However, it also has option for JSON which I can pass 
>"Accept: application/json" along with the request using CURL for example. I 
>tried to set that on the "Attributes to Send" property in the InvokeHTTP 
>processor, but it did not work. It always returns XML.
>
>Thanks,
>
>Indus
>
>
>On Fri, Dec 4, 2015 at 2:49 PM, Joe Percivall <joeperciv...@yahoo.com> wrote:
>
>Hello Indus,
>>
>>Are you receiving a specific error when you attempt to receive a JSON 
>>response?
>>
>>I can configure an InvokeHTTP processor to GET "http://api.randomuser.me/" by 
>>just setting the URL and it properly receives back a JSON response.
>>
>>Joe
>>
>>
>>- - - - - -
>>Joseph Percivall
>>linkedin.com/in/Percivall
>>e: joeperciv...@yahoo.com
>>
>>
>>
>>
>>
>>On Friday, December 4, 2015 3:27 PM, indus well <indusw...@gmail.com> wrote:
>>
>>
>>
>>Hi All:
>>
>>How would I configure the InvokeHTTP processor to receive content type of 
>>JSON response?
>>
>>Thanks,
>>
>>Indus Well
>>
>


Re: Is it possible to create instances of a processor dynamically based on HTTP response?

2015-12-01 Thread Joe Percivall
Ah that makes the use-case a lot simpler. 

Given the list of commenters, you can pass it into SplitJson, which will split 
the data objects into multiple flowfiles. Then use EvaluateJsonPath to add the 
ids to an attribute of each flowfile. Finally, use InvokeHTTP and the 
expression language to create a request out of the attributes.
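
As a rough sketch (the JSON paths are assumptions about the shape of the 
Facebook response, and the attribute name is hypothetical): SplitJson with a 
JsonPath Expression such as $.comments.data, then EvaluateJsonPath adding a 
property like commenter.id = $.from.id, and finally InvokeHTTP with a Remote 
URL along the lines of:

http://graphs.facebook.com/${commenter.id}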
 
Cheers,
Joe

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Tuesday, December 1, 2015 11:23 AM, "BOUKRAA, Kacem" <bk_bouk...@esi.dz> 
wrote:



Thanks for the answer, I'll be considering your suggestion for further 
documentation.

For your second question, there is no policy for doing this; I just want that, 
from a given list of IDs (from a text file, maybe, with an ID on each line), 
the structure of the link is the same (i.e. graphs.facebook.com/user+id, only 
the id changes).
Here is the comments example: I have many people who commented on the post, and 
I need to do an HTTP request for each one. So I need to invoke not one HTTP 
request but many after each retrieval of one post's comments.
Kindly.



On 1 December 2015 at 17:07, Joe Percivall <joeperciv...@yahoo.com> wrote:

Hello Kacem,
>
>For your first question you can set up a relatively simple flow to achieve 
>this. Assuming the first retrieval is a GET method call you can do: GetHTTP -> 
>ExtractText -> InvokeHTTP. What this does is, it first gets the JSON files 
>from the social network using GetHTTP. Then with ExtractText you extract the 
>values you are interested in using in the last Invoke (in other words add the 
>values as attributes of the FlowFile). Then in InvokeHTTP use the attribute 
>expression language [1] to form your request. The docs for the processors can 
>be found here[2].
>
>For the second scenario, are you specifically asking how to create calls that 
>hit the ids 1-1000 in a round robin fashion or is there a specific way you 
>determine the ids/scheduling?
>
>[1] https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html
>[2] https://nifi.apache.org/docs.html
>
>Welcome to NiFi!
>Joe
>
>
>- - - - - - Joseph Percivall
>linkedin.com/in/Percivall
>e: joeperciv...@yahoo.com
>
>
>
>
>On Tuesday, December 1, 2015 10:35 AM, "BOUKRAA, Kacem" <bk_bouk...@esi.dz> 
>wrote:
>
>
>
>
>
>Hello everyone,
>So my case is as follows: I'm trying to retrieve JSON files from a 
>social network (Facebook), and based on some values in the HTTP 
>response file I want to invoke new HTTP requests dynamically (as a function of 
>the total number of values).
>Ex: retrieve a JSON file for a page publication on a Facebook Page (with an 
>HTTP request), and then retrieve the info about the profile of each one who 
>commented on that publication.
>So is there any way to achieve this?
>Another question: is there any loop possibility for creating new instances 
>of an HTTP processor with a different parameter for each (the profile ID of 
>each one who commented)?
>(Ex: graphs.facebook.com/ID1, graphs.facebook.com/ID2 .. 
>graphs.facebook.com/ID1000 without creating each processor separately.) If 
>this is not possible, do you recommend using Flume for this purpose?
>Thanks in advance :)
>--
>
>
>
>
>Kacem BOUKRAA
>5th year student at ESI | Higher National School Of Computer Science 
>(Information Systems)
>Google Student Ambassador in Algeria
>Kouba - Alger
>
>mobile: +213 559 859 858 |  email: m...@kacemb.com
>twitter: @kacem4dz |  website: www.kacemb.com
>


-- 



Kacem BOUKRAA
5th year student at ESI | Higher National School Of Computer Science 
(Information Systems)
Google Student Ambassador in Algeria
Kouba - Alger

mobile: +213 559 859 858 |  email: m...@kacemb.com
twitter: @kacem4dz |  website: www.kacemb.com


Re: Is it possible to create instances of a processor dynamically based on HTTP response?

2015-12-01 Thread Joe Percivall
Hello Kacem,

For your first question, you can set up a relatively simple flow to achieve 
this. Assuming the first retrieval is a GET method call, you can do: GetHTTP -> 
ExtractText -> InvokeHTTP. What this does is: it first gets the JSON files from 
the social network using GetHTTP. Then with ExtractText you extract the values 
you want to use in the final InvokeHTTP (in other words, you add the values as 
attributes of the FlowFile). Then in InvokeHTTP use the attribute expression 
language [1] to form your request. The docs for the processors can be found 
here[2].
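To make the ExtractText step concrete, a small sketch of what it does (the 
content, pattern, and attribute name are illustrative): the capture group 
becomes a FlowFile attribute, which the InvokeHTTP URL then references with 
something like http://graphs.facebook.com/${user.id}.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class ExtractTextSketch {
        public static void main(String[] args) {
            String flowFileContent = "{\"from\":{\"id\":\"1234567890\"}}";
            // An ExtractText dynamic property named "user.id" holding this
            // regex would store the first capture group as the "user.id"
            // attribute on the FlowFile.
            Matcher m = Pattern.compile("\"id\"\\s*:\\s*\"(\\d+)\"")
                               .matcher(flowFileContent);
            if (m.find()) {
                System.out.println("http://graphs.facebook.com/" + m.group(1));
            }
        }
    }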

For the second scenario, are you specifically asking how to create calls that 
hit the ids 1-1000 in a round robin fashion or is there a specific way you 
determine the ids/scheduling?
 
[1] https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html
[2] https://nifi.apache.org/docs.html

Welcome to NiFi!
Joe


- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Tuesday, December 1, 2015 10:35 AM, "BOUKRAA, Kacem"  
wrote:





Hello everyone,
So my case is as follows: I'm trying to retrieve JSON files from a social 
network (Facebook), and based on some values in the HTTP response file I want 
to invoke new HTTP requests dynamically (as a function of the total number of 
values).
Ex: retrieve a JSON file for a page publication on a Facebook Page (with an 
HTTP request), and then retrieve the info about the profile of each one who 
commented on that publication.
So is there any way to achieve this?
Another question: is there any loop possibility for creating new instances of 
an HTTP processor with a different parameter for each (the profile ID of each 
one who commented)? 
(Ex: graphs.facebook.com/ID1, graphs.facebook.com/ID2 .. 
graphs.facebook.com/ID1000 without creating each processor separately.) If this 
is not possible, do you recommend using Flume for this purpose? 
Thanks in advance :)
--



Kacem BOUKRAA
5th year student at ESI | Higher National School Of Computer Science 
(Information Systems)
Google Student Ambassador in Algeria
Kouba - Alger

mobile: +213 559 859 858 |  email: m...@kacemb.com
twitter: @kacem4dz |  website: www.kacemb.com


Re: another proxied question from irc

2015-11-30 Thread Joe Percivall
Hey Tony,

I can't speak to the best practice for this controller service but just wanted 
to follow up to see if this user got their question answered.

Joe
 
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Monday, November 23, 2015 3:32 PM, Tony Kurc  wrote:




12:56 < userX> Hey, does anyone have experience setting up the SSL Context in
 nifi?  I realize my knowledge of how certs/ssl is limited, but
 I was hoping someone could point me in the correct direction.
 I am getting a "No X509TrustManager available" -- any help?
13:02 < userX> I set the keystore and trust manager to the same file, and it
 is now working
13:02 < userX> I don't believe that is best practice, but I am not sure -- I
 will do more research



Tony
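For anyone who hits the same "No X509TrustManager available" error, a minimal 
sketch of what the SSL context setup needs (the path and password are 
illustrative): a loadable truststore containing at least one trusted 
certificate. Pointing the keystore and truststore properties at the same file 
works, as userX found, but it mixes private keys with trust anchors, so a 
separate truststore is the more common practice.

    import java.io.FileInputStream;
    import java.security.KeyStore;
    import javax.net.ssl.SSLContext;
    import javax.net.ssl.TrustManagerFactory;

    public class TrustStoreCheck {
        public static void main(String[] args) throws Exception {
            KeyStore trustStore = KeyStore.getInstance("JKS");
            try (FileInputStream in =
                    new FileInputStream("/opt/nifi/conf/truststore.jks")) {
                trustStore.load(in, "changeit".toCharArray());
            }
            TrustManagerFactory tmf = TrustManagerFactory
                    .getInstance(TrustManagerFactory.getDefaultAlgorithm());
            // A truststore with no trusted certs yields no X509TrustManager.
            tmf.init(trustStore);
            SSLContext ctx = SSLContext.getInstance("TLS");
            ctx.init(null, tmf.getTrustManagers(), null);
            System.out.println(tmf.getTrustManagers().length + " trust manager(s)");
        }
    }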


Re: Encrypting Attributes

2015-11-30 Thread Joe Percivall
That is one way you could do it if it fits your use-case: take in the JSON 
FlowFile, pass it through either the EvaluateJsonPath or RouteText (if it's 
newline-delimited) processor, then encrypt the entire FlowFile's content.

The main problem is that, at the moment, the only processor that does 
encryption is the EncryptContent processor, which encrypts the entire FlowFile 
contents, and there is no concept of sensitive attributes (that I've seen). So 
you could either route the sensitive properties out of the original FlowFile 
and encrypt that (mentioned above), or extract the JSON objects you need to 
work with into attributes and then encrypt the entire content.
 
If your use-case only works when you can encrypt specific JSON values within a 
JSON text FlowFile, I'd suggest you file a Jira for it and we can work on it 
together.
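To make the feature request concrete, a rough sketch of what such field-level 
encryption could look like (the record shape, field name, and key handling are 
all made up for illustration; no NiFi processor does this today): only the one 
sensitive value is replaced with AES/GCM ciphertext.

    import java.security.SecureRandom;
    import java.util.Base64;
    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.GCMParameterSpec;

    public class FieldEncryptSketch {
        public static void main(String[] args) throws Exception {
            SecretKey key = KeyGenerator.getInstance("AES").generateKey();
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);

            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));

            String ssn = "123-45-6789"; // the one sensitive field value
            String cipherText = Base64.getEncoder().encodeToString(
                    cipher.doFinal(ssn.getBytes("UTF-8")));

            // Reassemble the record with only that field replaced.
            System.out.println("{\"name\":\"jdoe\",\"ssn\":\"" + cipherText + "\"}");
        }
    }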

Joe

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Monday, November 30, 2015 12:39 PM, "Madhire, Naveen" 
<naveen.madh...@capitalone.com> wrote:
Hey Joe,

My requirement is to encrypt a few sensitive customer fields before
processing those files; decrypting those fields is not important now. I
want to see how NiFi can be useful here, e.g. by encrypting only a few fields
before processing.


Is it possible to create a new flow file with those fields, route and
encrypt that flow file?

Thanks,
Naveen


On 11/30/15, 11:28 AM, "Joe Percivall" <joeperciv...@yahoo.com> wrote:

>Hello Naveen,
>
>Simply encrypting only certain parts of a text file is a use-case I
>hadn't thought of before. It could potentially be added as an option to
>the ReplaceText processor.
>
>That being said, what is the end-goal of encrypting certain elements of
>the flowfile content and why do you not want to encrypt the whole thing?
>Are you expecting to encrypt certain elements of the JSON contents
>and un-encrypt them at a later point (in NiFi or otherwise)? Keep in
>mind that would require keeping track of what password was used to encrypt
>every single FlowFile.
>
>I can see the usefulness, but I'm just trying to see if that's the solution
>you really want (a feature not easily, if at all, supported now) or if
>there is a workaround to meet your end-goal.
>
>Joe
>
>
>- - - - - - 
>Joseph Percivall
>linkedin.com/in/Percivall
>e: joeperciv...@yahoo.com
>
>
>
>
>
>On Monday, November 30, 2015 11:56 AM, "Madhire, Naveen"
><naveen.madh...@capitalone.com> wrote:
>
>
>
>Hi,
>
>My requirement is to encrypt only selected elements of the JSON flow
>file. I was thinking of adding those selected elements into attributes
>and encrypt those.
>
>Is there a way to encrypt specific/sensitive elements of the flow file
>instead of the whole flow file?
>
>
>Thanks,
>Naveen
>
>


Re: queued files

2015-11-24 Thread Joe Percivall
Hello Charlie,

I was looking back through and saw this wasn't totally resolved yet. 


A couple of questions. First, what system are you using? There are a couple of 
options for the stream command depending on what you're using. Also, are you 
able to install new commands (using yum or brew)?

The key thing I want to solve is finding the encoding of a file based on its 
contents alone, without relying on access to the original file. 
ExecuteStreamCommand should enable this, because you can pass any FlowFile into 
ExecuteStreamCommand and it will route the FlowFile contents to STDIN for the 
command to execute on.

The Mac's (what I am using) default command for finding file encodings is "file 
-bi filename.txt", but it doesn't allow you to pass in a file via STDIN. I 
found a command called "uchardet"[1] which finds file encodings and allows you 
to pass the file in via STDIN. 

I attached a template that takes in a file using GetFile (deletes the original) 
and routes that FlowFile to ExecuteStreamCommand. ExecuteStreamCommand then 
runs "uchardet" on the contents of the FlowFile and outputs the encoding to the 
"encoding" attribute of the original FlowFile.
 
[1] https://github.com/BYVoid/uchardet
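A minimal sketch of that STDIN hand-off, which is essentially what 
ExecuteStreamCommand does with the FlowFile content (it assumes uchardet is on 
the PATH, and the sample bytes are illustrative):

    import java.io.OutputStream;
    import java.nio.charset.StandardCharsets;

    public class DetectEncoding {
        public static void main(String[] args) throws Exception {
            Process p = new ProcessBuilder("uchardet").start();
            try (OutputStream stdin = p.getOutputStream()) {
                // ExecuteStreamCommand streams the FlowFile content here.
                stdin.write("héllo wörld\n".getBytes(StandardCharsets.ISO_8859_1));
            }
            byte[] out = p.getInputStream().readAllBytes();
            // uchardet prints its best guess, e.g. ISO-8859-1 or WINDOWS-1252.
            System.out.println(new String(out, StandardCharsets.UTF_8).trim());
        }
    }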

If this doesn't satisfy your needs just let me know!
Joe

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Friday, November 20, 2015 9:53 AM, Charlie Frasure 
 wrote:



I'm definitely game for that.  Let me know what I can do to help.



On Fri, Nov 20, 2015 at 9:35 AM, Joe Witt  wrote:

Charlie
>
>Got ya.  I missed the 'encoding vs content type' thing.  I agree let's
>find a way to avoid the extra copy.  We don't expose the storage
>location of the underlying bytes.  So on the ListFile thing, what I
>was thinking was this (and honestly I've not tested this, so maybe I'm
>skipping something important):
>
>ListFile to get a listing of names/etc.. of interest
>
>Execute the 'file --mime-encoding ${filename}' to get more attributes
>available to work with
>
>RouteOnAttribute to decide what to do with the file next.  You can
>Fetch/delete what you don't want you can Fetch/pass on what you do
>
>I was looking for a way to check the mime-encoding while passing the
>data to detect into an input stream, because that is actually how
>ExecuteStreamCommand wants to work.
>
>This is a use case that should be pretty easy so if you're willing to
>chat through it with us we'll figure out a path to make it work well.
>
>Thanks
>Joe
>
>On Fri, Nov 20, 2015 at 9:17 AM, Charlie Frasure
>
> wrote:
>> Thanks Joe,
>>
>> The use case is that I'm receiving data without knowing what character set
>> it is coming in.  --mime-encoding is giving its best guess on character set
>> rather than the content type.
>>
>> The ListFile sounds interesting, but I wonder if I really even need that.  I
>> don't want to leave the files in place, I just want to run an external
>> command on them as part of the data flow.  Is there a way I can run an
>> external command against the physical file such as
>> /opt/nifi/somedir/12345.uuid?  Would that info be in an attribute somewhere?
>> It just seems wasteful to make an extra copy of the file, in order to run a
>> read-only command on it, then delete it.  If ListFiles is still the right
>> way to go, please let me know.
>>
>>
>> On Fri, Nov 20, 2015 at 6:45 AM, Joe Witt  wrote:
>>>
>>> For identifying the mime type you may have sufficient results with the
>>> existing processor 'IdentifyMimeType' which you can put into the flow.
>>>
>>> For better logic around identifying files to pull but first calling an
>>> external command to learn more about them the upcoming
>>> ListFile/FetchFile combo that comes from this JIRA [1] might give you
>>> better flexibility.
>>>
>>> [1] https://issues.apache.org/jira/browse/NIFI-631
>>>
>>> Thanks
>>> Joe
>>>
>>> On Fri, Nov 20, 2015 at 12:08 AM, Charlie Frasure
>>>  wrote:
>>> > Thanks everyone for the help.  The trouble started a few processors
>>> > earlier
>>> > in an ExecuteStreamCommand on ${filename} with the result of "file not
>>> > found".  I had originally set my GetFile processor to not remove files,
>>> > but
>>> > recently changed that.  Now it seems that my ExecuteStreamCommand may
>>> > not be
>>> > the best way to accomplish this.
>>> >
>>> > The command that gets executed is: file -b --mime-encoding ${filename}
>>> > in the working directory: ${absolute.path}
>>> >
>>> > Now that the file is no longer in the source directory when the
>>> > processor
>>> > fires, the command is broken.  I could PutFile somewhere temporarily; is
>>> > there a better way?
>>> >
>>> > On Thu, Nov 19, 2015 at 10:33 PM, Joe Witt  wrote:
>>> >>
>>> >> Charlie,
>>> >>
>>> >> The fact that this is confusing is something we agree should be more
>>> >> clear and we will improve.  We're tackling 

Re: High CPU usage in FileSystemRepository.java

2015-11-19 Thread Joe Percivall
Hello Adam,


Are you still seeing high cpu usage?

Sorry no one has gotten back to you sooner; we are all working very hard to get 
0.4.0 out. 

Joe

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Friday, November 13, 2015 4:10 PM, Adam Lamar  wrote:



Mark,

For this development system, I'm running the packaged OpenJDK from 
Ubuntu 14.04:

$ java -version
java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.6) (7u79-2.5.6-0ubuntu1.14.04.1)
OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)

Interestingly, I tried another system running the Oracle JDK (same 
version) and didn't see the same issue. It does seem to benefit 
from the additional sleep in that loop, but just barely (maybe a 1% CPU 
difference - I could be making that up).

I hadn't uncommented those values, but I tried with no noticeable 
difference on the OpenJDK system.

Hope that helps,
Adam


On 11/13/15 5:28 AM, Mark Payne wrote:
> Adam,
>
> What version of Java are you running?
>
> Do you have the following lines from conf/bootstrap.conf uncommented, or are 
> they still commented out?
>
> java.arg.7=-XX:ReservedCodeCacheSize=256m
> java.arg.8=-XX:CodeCacheFlushingMinimumFreeSpace=10m
> java.arg.9=-XX:+UseCodeCacheFlushing
> java.arg.11=-XX:PermSize=128M
> java.arg.12=-XX:MaxPermSize=128M
>
> Thanks
> -Mark
>
>
>> On Nov 13, 2015, at 12:28 AM, Joe Witt  wrote:
>>
>> sorry - i see now :-)
>>
>> Thanks for the analysis.  Will dig in.
>>
>> Joe
>>
>> On Fri, Nov 13, 2015 at 12:28 AM, Joe Witt  wrote:
>>> Adam,
>>>
>>> Are you on a recent master build?
>>>
>>> Thanks
>>> Joe
>>>
>>> On Fri, Nov 13, 2015 at 12:27 AM, Adam Lamar  wrote:
 Hi everybody!

 I'm following up from my previous thread about high CPU usage in GetSQS. I
 ran into high CPU usage while developing a patch for that processor, and
 while investigating with "top", I noticed one NiFi thread in particular
 showed high CPU usage, even after turning off all processors and restarting
 NiFi.

 A jstack showed this thread was busy at FileSystemRepository.java line 1287
 [1]. Since that is a continue statement, it suggests that the thread was
 churning in the surrounding for loop.

 I didn't debug any further, but I did add a sleep statement just before the
 continue, and CPU usage dropped wildly, settling around 2-4%.

 I hope this is useful information, and I'm happy to debug further if 
 needed.

 Cheers,
 Adam

 [1]
 https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/repository/FileSystemRepository.java#L1287
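For anyone curious, the pattern being described reduces to a loop whose 
continue branch never yields the CPU; a toy reproduction (the 100 ms back-off 
is illustrative):

    public class BusyLoopDemo {
        private static volatile boolean workAvailable = false;

        public static void main(String[] args) throws InterruptedException {
            while (true) {
                if (!workAvailable) {
                    Thread.sleep(100L); // remove this and the loop pins a core,
                    continue;           // matching the reported CPU usage
                }
                // ... otherwise do the actual cleanup work ...
            }
        }
    }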


Re: Example for mongo processors

2015-11-17 Thread Joe Percivall
Hello Subhash,

Sorry no one has gotten back to you sooner but what kind of examples are you 
looking for? Are you looking for examples of how they are configured, a 
template or just cool applications?
 

Joe

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Friday, November 13, 2015 6:38 AM, Subhash Parise 
 wrote:



Hi Team,

Can anyone give me examples for the GetMongo processor?

Regards,
Subhash.


Re: Nifi - InvokeHTTP - retrying on 3xx with Location header issue

2015-11-10 Thread Joe Percivall
Mark,

That's fair, I think for now I'll take your approach (also easier on me) and if 
users want something else later I can change it.

Mans,

No problem, glad I can help.

Joe

 
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Monday, November 9, 2015 9:56 PM, M Singh <mans2si...@yahoo.com> wrote:



Joe:

No problem and thanks for your support in fixing the issue.

Mans



On Monday, November 9, 2015 12:35 PM, Joe Percivall <joeperciv...@yahoo.com> 
wrote:



Mans,

I did a bit of testing using my refactor and I understand what the issue is. I 
was able to successfully complete the first use-case, where "follow redirect" 
is true and the redirect is followed within the same onTrigger.

The second use-case, where "follow redirect" is false, didn't work as 
intended. It ends up routing to "No Retry" and emitting no response FlowFile. 
The problem is that the location header is not added to the request FlowFile, 
so there is no way to know later where it was supposed to redirect to. 

This actually brings up a bigger problem that is at the core of this ticket: 
when the processor reaches out to a website and routes to any of the non-2xx 
relationships, the headers (other than status code and message) that are 
returned by the response aren't stored anywhere. I'm thinking we add a 
property that is a regular expression that allows users to match specific 
headers to add as attributes to the request FlowFile. Thoughts?

Not trying to discourage you from contributing but I'm going to knock this out 
as part of 1086. There are plenty of other tickets that could use contributors!

Thanks for bringing this up,
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com





On Monday, November 9, 2015 2:05 PM, Joe Percivall <joeperciv...@yahoo.com> 
wrote:
Hey Mans,

Thanks for creating the ticket. First, have you tried the patch I supplied in 
1086 [1]? I changed the underlying implementation  of InvokeHttp so it very 
well could already be fixed.

The functionality should work such that if you have "follow redirects" set to 
true, the processor follows the 3xx response to hit the other server 
automatically within the same onTrigger call. If "follow redirects" is false, 
it does not follow the redirect to the location set in the header and instead 
routes the request to the 3xx relationship.


[1] https://issues.apache.org/jira/browse/NIFI-1086
Thanks,
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com





On Monday, November 9, 2015 1:54 PM, M Singh <mans2si...@yahoo.com> wrote:



Hi:

I could not find Jira ticket for InvokeHTTP processor not saving location 
header, so I've created a Jira ticket for this - [NIFI-1133] InvokeHTTP 
Processor does not save Location header for 3xx responses - ASF JIRA.

My understanding is that once the http status is saved and a relationship for 
3xx is available then we can route the flow file with appropriate attribute 
(based on location header) to the invoke http processor so it can make a new 
request.


Please let me know if it is ok and if I can start working on it.

Thanks 
  
  
[NIFI-1133] InvokeHTTP Processor does not save Location header for 3xx 
responses - ASF JIRA
InvokeHTTP processor does not save the Location header for 3xx responses, so 
we cannot hook up the processor to route based on redirect location header 
values.




On Monday, November 9, 2015 6:51 AM, Joe Witt <joe.w...@gmail.com> wrote:



Mans,

Being new to NiFi or even contributing to open source at all is not a
problem.  We're here to help.  Ask questions as needed and we can help
you contribute.

Thanks
Joe


On Mon, Nov 9, 2015 at 9:38 AM, M Singh <mans2si...@yahoo.com> wrote:
> Hi Joe:
>
> I was looking at the InvokeHttp code and I can work on enhancing the 3xx
> issue based on the pattern for handling other statuses.  However, I would
> like to add a newbie caveat here.
>
> Let me know if that would help.
>
> Thanks
>
> Mans
>
>
>
> On Sunday, November 8, 2015 8:44 PM, M Singh <mans2si...@yahoo.com> wrote:
>
>
> Hi Joe:
>
> You are right - setting follow-redirects did not work and I did mix retry
> with redirect.
>
> I will wait for your enhancements.
>
> Thanks again for your help.
>
>
>
> On Sunday, November 8, 2015 8:31 PM, Joe Percivall <joeperciv...@yahoo.com>
> wrote:
>
>
> Hello,
>
>
> Firstly, I think you're mixing up "retry" and "redirect". The 3xx status
> code is for redirecting to another url and 5xx is to try again. The property
> we have is "Follow Redirects". Retrying doesn't involve a location header
> but the redirect does. That being said, I did a bit of digging and I don't

Re: Nifi - InvokeHTTP - retrying on 3xx with Location header issue

2015-11-09 Thread Joe Percivall
Mans,

I did a bit of testing using my refactor and I understand what the issue is. I 
was able to successfully complete the first use-case, where "follow redirect" 
is true and the redirect is followed within the same onTrigger.

The second use-case, where "follow redirect" is false, didn't work as 
intended. It ends up routing to "No Retry" and emitting no response FlowFile. 
The problem is that the location header is not added to the request FlowFile, 
so there is no way to know later where it was supposed to redirect to. 

This actually brings up a bigger problem that is at the core of this ticket: 
when the processor reaches out to a website and routes to any of the non-2xx 
relationships, the headers (other than status code and message) that are 
returned by the response aren't stored anywhere. I'm thinking we add a 
property that is a regular expression that allows users to match specific 
headers to add as attributes to the request FlowFile. Thoughts?
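For context, a minimal sketch of the redirect behavior in question, written 
against the OkHttp client the refactor moves to (the URL is illustrative and 
the API shown is the current okhttp3 one):

    import okhttp3.OkHttpClient;
    import okhttp3.Request;
    import okhttp3.Response;

    public class RedirectDemo {
        public static void main(String[] args) throws Exception {
            OkHttpClient client = new OkHttpClient.Builder()
                    .followRedirects(false) // true = chase the 3xx in-call
                    .build();
            Request request = new Request.Builder()
                    .url("http://httpbin.org/redirect/1") // any URL that 302s
                    .build();
            try (Response response = client.newCall(request).execute()) {
                // With followRedirects=false the 3xx comes back as-is, and the
                // target lives only in this header - the value proposed above
                // for copying onto the request FlowFile as an attribute.
                System.out.println(response.code() + " -> "
                        + response.header("Location"));
            }
        }
    }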

Not trying to discourage you from contributing but I'm going to knock this out 
as part of 1086. There are plenty of other tickets that could use contributors!

Thanks for bringing this up,
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Monday, November 9, 2015 2:05 PM, Joe Percivall <joeperciv...@yahoo.com> 
wrote:
Hey Mans,

Thanks for creating the ticket. First, have you tried the patch I supplied in 
1086 [1]? I changed the underlying implementation  of InvokeHttp so it very 
well could already be fixed.

The functionality should work such that if you have "follow redirects" set to 
true, the processor follows the 3xx response to hit the other server 
automatically within the same onTrigger call. If "follow redirects" is false, 
it does not follow the redirect to the location set in the header and instead 
routes the request to the 3xx relationship.


[1] https://issues.apache.org/jira/browse/NIFI-1086
Thanks,
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com





On Monday, November 9, 2015 1:54 PM, M Singh <mans2si...@yahoo.com> wrote:



Hi:

I could not find Jira ticket for InvokeHTTP processor not saving location 
header, so I've created a Jira ticket for this - [NIFI-1133] InvokeHTTP 
Processor does not save Location header for 3xx responses - ASF JIRA.

My understanding is that once the http status is saved and a relationship for 
3xx is available then we can route the flow file with appropriate attribute 
(based on location header) to the invoke http processor so it can make a new 
request.


Please let me know if it is ok and if I can start working on it.

Thanks 
  
  
[NIFI-1133] InvokeHTTP Processor does not save Location header for 3xx 
responses - ASF JIRA
InvokeHTTP processor does not save the Location header for 3xx responses, so 
we cannot hook up the processor to route based on redirect location header 
values.




On Monday, November 9, 2015 6:51 AM, Joe Witt <joe.w...@gmail.com> wrote:



Mans,

Being new to NiFi or even contributing to open source at all is not a
problem.  We're here to help.  Ask questions as needed and we can help
you contribute.

Thanks
Joe


On Mon, Nov 9, 2015 at 9:38 AM, M Singh <mans2si...@yahoo.com> wrote:
> Hi Joe:
>
> I was looking at the InvokeHttp code and I can work on enhancing the 3xx
> issue based on the pattern for handling other statuses.  However, I would
> like to add a newbie caveat here.
>
> Let me know if that would help.
>
> Thanks
>
> Mans
>
>
>
> On Sunday, November 8, 2015 8:44 PM, M Singh <mans2si...@yahoo.com> wrote:
>
>
> Hi Joe:
>
> You are right - setting follow-redirects did not work and I did mix retry
> with redirect.
>
> I will wait for your enhancements.
>
> Thanks again for your help.
>
>
>
> On Sunday, November 8, 2015 8:31 PM, Joe Percivall <joeperciv...@yahoo.com>
> wrote:
>
>
> Hello,
>
>
> Firstly, I think you're mixing up "retry" and "redirect". The 3xx status
> code is for redirecting to another url and 5xx is to try again. The property
> we have is "Follow Redirects". Retrying doesn't involve a location header
> but the redirect does. That being said, I did a bit of digging and I don't
> think InvokeHttp was handling redirects properly. All we were doing was setting
> the "setInstanceFollowRedirects" to true, which according to this site [1]
> doesn't fully handle it.
>
> I am going to attach a patch to ticket 1086[2] tonight for InvokeHttp's
> refactor to use OkHttp instead of HttpUrlConnection. If you'd like to test
> that out and see if that it solves the redirect and location header problem
> that would be awesome.
>
> [1]
> http://www.mkyong.com/jav

Re: Help me figure out why this wont work

2015-11-09 Thread Joe Percivall
Hey Chris,

I feel your "Ugg" and I'd be happy to knock it out. I'm doing a bunch of work 
with InvokeHttp already so I should be able to get this done pretty quick.

 
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Monday, November 9, 2015 3:23 PM, Christopher Hamm 
 wrote:



Ugg
https://issues.apache.org/jira/browse/NIFI-993



On Mon, Nov 9, 2015 at 3:10 PM, Christopher Hamm  
wrote:

Ok, tried that and it still won't work. I think it has something to do with the 
message "supports expression language: false" for the URL. Attached are the 
error and the XML template. Any help is appreciated. I need to understand how 
the dynamic querying should work in order to complete an actual project. Many 
thanks.
>
>
>
>​
>
>
>
>On Fri, Nov 6, 2015 at 8:04 AM, Aldrin Piri  wrote:
>
>Christopher,
>>
>>
>>For this setup, you want to use GetHTTP. InvokeHTTP operates off of FlowFile 
>>input.  
>>
>>
>>
>>
>>On Fri, Nov 6, 2015 at 2:33 AM, Christopher Hamm  
>>wrote:
>>
>>I have a dataflow set up that looks up an API call using the current date. I 
>>turn it on and nothing seems to happen. Help appreciated.
>>>
>>>
>>>
>>>
>>>
>>>-- 
>>>
>>>Sincerely,
>>>Chris Hamm
>>>(E) ceham...@gmail.com
>>>(Twitter) http://twitter.com/webhamm
>>>(Linkedin) http://www.linkedin.com/in/chrishamm
>>
>
>
>
>-- 
>
>Sincerely,
>Chris Hamm
>(E) ceham...@gmail.com
>(Twitter) http://twitter.com/webhamm
>(Linkedin) http://www.linkedin.com/in/chrishamm


-- 

Sincerely,
Chris Hamm
(E) ceham...@gmail.com
(Twitter) http://twitter.com/webhamm
(Linkedin) http://www.linkedin.com/in/chrishamm


Re: Help me figure out why this wont work

2015-11-09 Thread Joe Percivall
Chris,

Just posted a patch to allow expression language for the URL property[1]. I 
also enabled expression language for the filename property since there didn't 
seem to be a reason not to. 

[1] https://issues.apache.org/jira/browse/NIFI-993
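For example, with that patch applied, a GetHTTP URL along the lines of 
http://example.com/api/records?date=${now():format('yyyy-MM-dd')} (the endpoint 
is illustrative; now() and format() are standard expression language functions) 
is re-evaluated on each run, so the query always carries the current date.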

Hope that helps,
Joe

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Monday, November 9, 2015 3:26 PM, Joe Percivall <joeperciv...@yahoo.com> 
wrote:
Hey Chris,

I feel your "Ugg" and I'd be happy to knock it out. I'm doing a bunch of work 
with InvokeHttp already so I should be able to get this done pretty quick.


Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com





On Monday, November 9, 2015 3:23 PM, Christopher Hamm 
<em...@christopherhamm.com> wrote:



Ugg
https://issues.apache.org/jira/browse/NIFI-993



On Mon, Nov 9, 2015 at 3:10 PM, Christopher Hamm <em...@christopherhamm.com> 
wrote:

Ok, tried that and it still won't work. I think it has something to do with the 
message "supports expression language: false" for the URL. Attached are the 
error and the XML template. Any help is appreciated. I need to understand how 
the dynamic querying should work in order to complete an actual project. Many 
thanks.
>
>
>
>​
>
>
>
>On Fri, Nov 6, 2015 at 8:04 AM, Aldrin Piri <aldrinp...@gmail.com> wrote:
>
>Christopher,
>>
>>
>>For this setup, you want to use GetHTTP. InvokeHTTP operates off of FlowFile 
>>input.  
>>
>>
>>
>>
>>On Fri, Nov 6, 2015 at 2:33 AM, Christopher Hamm <em...@christopherhamm.com> 
>>wrote:
>>
>>I have a dataflow set up that looks up an API call using the current date. I 
>>turn it on and nothing seems to happen. Help appreciated.
>>>
>>>
>>>
>>>
>>>
>>>-- 
>>>
>>>Sincerely,
>>>Chris Hamm
>>>(E) ceham...@gmail.com
>>>(Twitter) http://twitter.com/webhamm
>>>(Linkedin) http://www.linkedin.com/in/chrishamm
>>
>
>
>
>-- 
>
>Sincerely,
>Chris Hamm
>(E) ceham...@gmail.com
>(Twitter) http://twitter.com/webhamm
>(Linkedin) http://www.linkedin.com/in/chrishamm


-- 

Sincerely,
Chris Hamm
(E) ceham...@gmail.com
(Twitter) http://twitter.com/webhamm
(Linkedin) http://www.linkedin.com/in/chrishamm


Re: ConvertCharacterSet

2015-10-27 Thread Joe Percivall
Hey Charlie,

Sorry no one has followed up with you yet. One way I see around 
ConvertCharacterSet not supporting expression language is to route on an 
attribute (assuming the character set has been extracted into an attribute) to 
different ConvertCharacterSet processors, one per expected input character set.

That being said, I don't see a reason why ConvertCharacterSet shouldn't 
support expression language. If no one objects, I'll put in a ticket later 
today and knock it out real quick.
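Under the hood the conversion itself is just a decode and re-encode; a minimal 
sketch (the byte and input charset are illustrative, standing in for whatever 
was detected upstream):

    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;

    public class CharsetConvert {
        public static void main(String[] args) {
            byte[] source = {(byte) 0xE9}; // "é" in ISO-8859-1
            // The input charset would come from the detected attribute.
            String decoded = new String(source, Charset.forName("ISO-8859-1"));
            byte[] utf8 = decoded.getBytes(StandardCharsets.UTF_8);
            System.out.println(decoded + " -> " + utf8.length + " UTF-8 byte(s)");
        }
    }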
 

Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Sunday, October 25, 2015 7:13 PM, Charlie Frasure  
wrote:



I'm looking to process many files into common formats.  The source files are 
coming in various character sets, mime types, and new line terminators.

My thinking for a data flow was along these lines:

GetFile (from many sub directories) -> 
ExecuteStreamCommand (file -i) ->
ConvertCharacterSet (from previous command to utf8) ->
ReplaceText (to change any \r\n into \n) ->
PutFile (into a directory structure based on values found in the original file 
path and filename)

Additional steps would be added for archiving a copy of the original, 
converting xml files, etc.

Attempting to process these with NiFi leaves me confused as to how to do this 
within the tool.  If I want to use ConvertCharacterSet, I have to know the 
input character set.  I set up an ExecuteStreamCommand to run file -i 
${absolute.path:append(${filename})}, which returned the expected values.  I 
don't see a way to turn these results into input for the processor, which 
doesn't accept expression language for that field.

I also considered ConvertCSVToAvro as an interim step but noticed the same 
issue.  Any suggestions on what this dataflow should look like?


Charlie


Re: Nifi Clustering - work distribution on workers

2015-10-23 Thread Joe Percivall
Hey Mans,



To load balance and send the FlowFiles to a processing group or processor in 
the same cluster you will still set up an RPG but you will just point the RPG 
to the current cluster's NCM.

The NiFi Cluster Manager (NCM) is in charge of splitting the incoming FlowFiles 
among the nodes. It knows the current load of each and splits up the files 
accordingly.



Hope that helps, 
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Tuesday, October 20, 2015 9:34 AM, M Singh  wrote:



Hi Matt:

The screenshot seems to be truncated - after the FetchHDFS processor 
- but I am not sure if that is important.

I have a question though - the ListHDFS processor running on a separate cluster 
produces one flow file for each file on HDFS, and from your comments it appears 
that the RPG will load balance the flow files to its processor nodes so that 
they process each flow file separately.  Can the ListHDFS send the flow files 
to processors or a process group in the same cluster that can then fetch the 
data from HDFS? Also, you indicate that the input port is running on each RPG 
node, so how do nodes in the RPG coordinate splitting of the incoming flow 
files among them?

Thanks again.

Mans




On Friday, October 16, 2015 3:16 PM, M Singh  wrote:




Hi Matt:

Thanks for taking the time to describe and draw out the scenario for me.  I'll 
go through your notes and the documentation to understand the concepts.

Thanks again for your generous support in helping me understand NiFi better.

Mans



On Thursday, October 15, 2015 12:53 PM, Matthew Clarke 
 wrote:



Mans,
 I have attached a screenshot showing how ListHDFS and FetchHDFS would be 
configured in a NiFi cluster to achieve what we believe you're looking to 
accomplish. At the end you will have each of your nodes fetching different 
files from HDFS.  These nodes will work on each of their files independently of 
the other nodes.  The NCM serves as your eyes into your cluster.  Every 
processor on your graph exists on every node. Unless specifically configured to 
run 'on primary node' only, the processors all run on every node using the 
configured values.  Setting the 'concurrent tasks' on a processor will have the 
effect of setting that number of concurrent tasks on that processor on every 
node.

Thanks,
Matt


On Thu, Oct 15, 2015 at 12:17 PM, M Singh  wrote:

Hi Mark:
>
>Thanks for your answers but being a newbie I am still not clear about some 
>issues:
>
>
>Regarding hdfs multiple files:
>
>
>Typically, if you want to pull from HDFS and partition that data
>across the cluster, you would run ListHDFS on the Primary Node only, and then 
>use Site-to-Site [1] to distribute
>that listing to all nodes in the cluster. 
>
>
>Question - I believe that this requires distributing the list of files to the 
>NCM at the other site, which will take care of distributing it to its worker 
>nodes.  Do we send the list of files to the NCM as a single message that the 
>NCM will split up to distribute one part to each of the nodes, or should we 
>send separate messages to the NCM, which will then send one message to each 
>worker node? Also, if we send a single list of files to the NCM, does it send 
>the same list to all its workers? If the NCM sends the same list, then won't 
>there be duplication of work?
>
>
>Regarding concurrent tasks - 
>
>
>Question - How do they help in parallelizing the processing?
>
>
>Regarding passing separate arguments to workers:
>
>
>Question - This is related to the above two, i.e., how to partition the tasks 
>across worker nodes in a cluster?
>
>
>Thanks again for your help.
>
>
>Mans
>
>
>
>
>
>
>
>
>
>
>On Wednesday, October 14, 2015 2:08 PM, Mark Payne  
>wrote:
> 
>
>
>Mans,
>
>
>Nodes in a cluster work independently from one another and do not know about 
>each other. That is accurate.
>Each node in a cluster runs the same flow. Typically, if you want to pull from 
>HDFS and partition that data
>across the cluster, you would run ListHDFS on the Primary Node only, and then 
>use Site-to-Site [1] to distribute
>that listing to all nodes in the cluster. Each node would then pull the data 
>that it is responsible to pull and begin
>working on it. We do realize that it is not ideal to have to set it up this way, 
>and it is something that we are working
>on so that it is much easier to have that listing automatically distributed 
>across the cluster.
>
>
>I'm not sure that I understand your #3 - how do we design the workflow so that 
>the nodes work on one file at a time?
>For each Processor, you can configure how many threads (Concurrent Tasks) are 
>to be used in the Scheduling tab
>of the Processor Configuration dialog. You can certainly configure that to run 
>only a single Concurrent Task. 
>This is the number of Concurrent Tasks that will run on each node in the 
>cluster, not 

Re: Feeding & Consuming data to & from Nifi

2015-09-20 Thread Joe Percivall
Hey Krish,
Welcome to NiFi! Storm is how I got my start with data flow as well, and NiFi 
has been awesome. NiFi's "processors" are equivalent to Storm's "spouts" and 
"bolts".

NiFi can certainly handle the logging/auditing use case among others. With 
NiFi's data provenance you can see exactly how each message was handled, the 
content and attributes at each stage, as well as replay any message at any 
stage if you feel the need.

For ingesting into NiFi there are a couple of options depending on your system. 
If the files already exist on the same system then there is a simple GetFile 
processor. For other set-ups you can check out the "Get" processors in the 
documentation [1]. For forwarding to another service, again it depends on your 
system. If you set up two NiFi instances you can check out the Remote Process 
Group. If you're just "Put"ing it through HTTP there is a processor for that as 
well. Again, there are other ways to exfiltrate the data depending on your 
set-up.

So for the configuration it all depends on what you want to do and how your 
system is set up. If you give a bit more information on the specific services 
you're using or your use-case we can give even more direction.

[1] https://nifi.apache.org/docs.html

Again, welcome to NiFi,
Joe

- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com
 


 On Sunday, September 20, 2015 1:15 AM, Krish  
wrote:
   

Hi,

I am a n00b at NiFi, although I have worked with Storm; currently that is how 
we handle data flow logic.

I am evaluating using NiFi for a logging/auditing use case (we might move to 
other uses if this works) in a ~50-machine cluster, and am wondering if it 
would be a good fit to ingest messages from various sources and spit out the 
graph of how a message was handled at each stage, by each of the services.

Anyway, I was wondering how we feed data into a NiFi cluster to get the first 
stage started. Also, how does the data exit the system if I want it to be 
forwarded to another service after NiFi is done with its processing [similar 
to spouts and bolts in Apache Storm]?

Thanks.
--
κρισhναν