Re: NiFi Queue Monitoring

2021-07-21 Thread Andrew Grande
Can't you leverage some of the recent NiFi features and basically run SQL
queries over NiFi metrics directly as part of the flow? Then act on it with
the full flexibility of the flow. Kind of like a push design.

Andrew
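
For reference, a sketch of the kind of query Andrew means, using the
QueryNiFiReportingTask (added around NiFi 1.11) and its CONNECTION_STATUS
table; the column names here are from memory, so verify them against the
docs for your version:

    -- flags connections at 80%+ of their back-pressure threshold
    SELECT name, queuedCount, percentUsedCount
    FROM CONNECTION_STATUS
    WHERE percentUsedCount >= 80

The task hands the result rows to a record sink or back into the flow,
where ordinary processors can alert on them - the "push design" described
above.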

On Tue, Jul 20, 2021, 2:31 PM scott  wrote:

> Hi all,
> I'm trying to set up some monitoring of all queues in my NiFi instance, to
> catch queues before they become full. One solution I am looking at is to use the
> API, but because I have a secure NiFi that uses LDAP, it seems to require a
> token that expires in 24 hours or so. I need this to be an automated
> solution, so that is not going to work. Has anyone else tackled this
> problem with a secure LDAP enabled cluster?
>
> Thanks,
> Scott
>
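
A minimal sketch of the token handling for an LDAP-secured instance; the
hostname and credentials below are placeholders. Fetching a fresh token at
the top of each scripted run sidesteps the 24-hour expiry entirely:

    # POST username/password; the JWT comes back as the response body
    TOKEN=$(curl -sk -X POST "https://nifi.example.com:8443/nifi-api/access/token" \
      --data-urlencode "username=${NIFI_USER}" \
      --data-urlencode "password=${NIFI_PASS}")
    # poll aggregate queue counts for the whole flow
    curl -sk -H "Authorization: Bearer ${TOKEN}" \
      "https://nifi.example.com:8443/nifi-api/flow/process-groups/root/status?recursive=true"

The expiry only bites if one token is reused forever; a monitoring cron job
can simply authenticate on every run.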


Re: NiFi configuration files changes

2021-06-29 Thread Andrew Grande
The physical files will get synchronized to the reference state from a
central config management source (CM). There's no point watching them on
the file system. If you need a change log for config files, I'd look into
the CM API to fetch those instead.

On Tue, Jun 29, 2021, 8:30 AM Tomislav Novosel <
tomislav.novo...@clearpeaks.com> wrote:

> Hi to all,
>
>
>
> Is there a good way to capture NiFi configuration file changes
> (nifi.properties, authorizers.xml, …etc.)
>
> and to forward those changes (or just a notification) to some other system or app?
>
> Can I do it with NiFi itself?
>
> The question is in the context of Cloudera platform – CFM.
>
>
>
> Thanks,
>
> Regards,
>
> Tom
>


Re: Stopping processor after MAX number of retries

2021-02-26 Thread Andrew Grande
I saw it several times and I have a strong conviction this is an
anti-pattern. A dataflow must not mess with the start/stop state of
processors or process groups.

Instead, a flow is always running and one puts a conditional check to
either not get the data in or reroute/deny it.

If you need to coordinate between processors, use wait/notify.

If you need to coordinate across nodes, consider DistributedMapCache with a
variety of implementations.

Finally, one can use external stores directly if there is a need to
coordinate with other systems.

These become part of the flow design.

Andrew

On Fri, Feb 26, 2021, 7:42 AM Tomislav Novosel <
tomislav.novo...@clearpeaks.com> wrote:

> Hi guys,
>
>
>
> I want to stop the processor after it exceeds the maximum number of retries.
>
> For that I'm using the RetryFlowFile processor; after 5 retries, it
> routes
>
> the flowfile to retries_exceeded.
>
>
>
> When that kicks in, I want to stop the processor which was retried 5 times.
>
>
>
> What is the best approach? I have a few:
>
>
>
>- Execute shell script which sends request to nifi-api to set
>processor state to STOPPED
>- Put InvokeHTTP processor to send request
>
>
>
> The downside is, what if the processor ID changes, e.g. deploying to another
> env or after a NiFi restart; not sure about that.
>
> Also, it is a NiFi cluster with authentication and SSL, so that complicates
> things.
>
>
>
> Maybe someone has much simpler approach, with backpressure or something.
>
>
>
> Regards,
>
> Tom
>
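
For completeness, Tom's option 1 boils down to two REST calls; the
processor ID and host below are hypothetical, and - per Andrew's point
above - a conditional check in the flow is usually the better design:

    # TOKEN as obtained from /access/token (see the queue-monitoring thread above)
    PID="0170100a-1234-5678-9abc-def012345678"   # hypothetical processor id
    # read the current revision, then flip the run status
    REV=$(curl -sk -H "Authorization: Bearer ${TOKEN}" \
      "https://nifi.example.com:8443/nifi-api/processors/${PID}" | jq -r '.revision.version')
    curl -sk -X PUT -H "Authorization: Bearer ${TOKEN}" \
      -H "Content-Type: application/json" \
      -d "{\"revision\":{\"version\":${REV}},\"state\":\"STOPPED\"}" \
      "https://nifi.example.com:8443/nifi-api/processors/${PID}/run-status"

The ID-changes-on-redeploy concern can be softened by resolving the ID by
name first via GET /nifi-api/flow/search-results?q=<processor name>.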


Re: Tls-toolkit.sh?

2020-12-11 Thread Andrew Grande
NiFi toolkit link here https://nifi.apache.org/download.html

Enjoy :)
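
For a quick local setup, standalone mode generates a CA, a server
keystore/truststore, and a client certificate in one shot (the hostname and
DN here are just examples):

    # run from the unpacked nifi-toolkit directory
    ./bin/tls-toolkit.sh standalone -n 'localhost' -C 'CN=admin,OU=NiFi' -o ./tls-output

Copy the generated keystore/truststore and nifi.properties values into your
NiFi conf, and import the client .p12 into your browser.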

On Fri, Dec 11, 2020, 8:59 AM Darren Govoni  wrote:

> Hi
>
> I want to setup a secure local nifi and the online docs refer to this
> script but i cant find it anywhere.
>
> Any clues?
>
> Darren
>
>


Re: Rehosting flow files from cluster nodes to primary node only

2020-11-13 Thread Andrew Grande
I think a better design is for every node to write to this network share
with some form of a partition (node) id in the filename. In a 5 node
cluster you will have 5 parts.

Next, query over this directory directly with your engine of choice.

Funneling all traffic to one node will create a bottleneck and is not how
distributed processing generally operates.
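
A small sketch of that partitioning: an UpdateAttribute just before the
write can prefix the filename with the node's name using expression
language, e.g. set filename to

    ${hostname(true)}_${filename}

so the five nodes produce five distinct, non-colliding parts on the share.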

On Fri, Nov 13, 2020, 8:54 AM James McMahon  wrote:

> My flow files are distributed across my cluster nodes. I am using an
> ExecuteScript processor running a python script to write custom log
> messages to a log file I maintain. It appears that I am losing records when
> all the nodes are attempting to write to the one log file that exists on a
> common network lz.
>
> I'd like to work around this problem by reassigning all my flow files from
> my cluster nodes to the primary node, only. How can I do this? Thank you in
> advance for your help.
>


Re: NIFI HandleHttpRequest API - Health Check when API or Node Down

2020-09-04 Thread Andrew Grande
You can always hit the NiFi REST API status endpoint. It won't give you any
idea about that specific HTTP endpoint you exposed, though, as it is the
general NiFi REST API.

Your LB would need to understand how to hit this URL too, especially if
it's secured. Coming back to the easiest path, you'd rather implement a
standard integration pattern for the custom endpoint and filter out any GET
requests which come through to /path/ping, as an example. If it fails, the
LB knows the endpoint is dead; if it returns 200, it's live; and your NiFi
flow would simply terminate any requests for the status-check path.

If you were asking about having a real-time integrated system where a LB
would be able to route ONLY to healthy nodes and maintain and discover that
list - I don't think you can do it with the AWS LB, at least not unless you
have full control over it and can drive it with APIs.

Andrew
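
A sketch of that filter in flow terms (attribute names as written by
HandleHttpRequest): in a RouteOnAttribute right after HandleHttpRequest,
add a property such as "healthcheck" with

    ${http.method:equals('GET'):and(${http.request.uri:equals('/path/ping')})}

and wire the healthcheck relationship straight to a HandleHttpResponse
returning 200, while everything else continues into the real flow.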

On Fri, Sep 4, 2020, 9:26 AM jgunvaldson  wrote:

> It seems a bit like a chicken and egg thing. Using ‘anything’ configured
> on the disconnected node as a health check is not unlike trying to get to
> the API (listening port) itself? Kinda.
>
> Anyway
>
> I was hoping that the NIFI infrastructure had a generalized, centralized
> (REST API?  or other) that would give me the answer is this NODE up and
> listening on this PORT, and that it could be called by a Load Balancer?
>
> ~John
>
>
>
> On Sep 4, 2020, at 9:19 AM, Etienne Jouvin 
> wrote:
>
> Because you implemented a HandleHttpRequest listener, why don't you
> configure a handler on something like http(s)://server/ping?
> The response would just be pong.
>
>
>
> Le ven. 4 sept. 2020 à 18:02, jgunvaldson  a écrit :
>
>> Hi,
>>
>> Our network administrators are unable to wire up advanced Load Balancer
>> (AWS Application Load Balancer) or (Apache reverse proxy) to leverage a
>> NIFI API that may be listening on a port across several nodes.
>>
>> For instance, a HandleHttpRequest listening on Node-1 on PORT 5112, Node-2
>> on 5112, Node-3 on 5112, and so on and so forth…
>>
>> In the event a NODE is down (or the API stops listening, it happens), or
>> disconnected, a call to that Node and PORT will fail and be a pretty bad
>> experience for the customer.
>>
>> So
>>
>> What we would like to have is an external Load Balancer be able to use
>> Round Robin (Advanced Features) to redirect the request to an UP Node, but
>> to do this the Load Balancer needs a proper health check.
>>
>> What is a proper “Health Check” for this scenario? How would it be
>> created and wired up?
>>
>> Right now, an API requested that is hosted on NIFI that is proxied by our
>> API Manager (WSO2) will fail on the down NODE and not recover - user will
>> probably get a 500. APIM is not a good load balancer.
>>
>> Thanks in advance for this discussion
>>
>>
>> Best Regards
>> John Gunvaldson
>>
>>
>


Re: Looking for Help! Custom Processor NAR autoloader

2020-05-28 Thread Andrew Grande
I confirmed previously that autoload applies to the initial load only and
won't hot-reload on NAR update (that still requires an instance restart).

If the custom component doesn't come up, I'm pretty sure there was a
problem with dependency resolution. Check the startup logs for errors and
any WARN messages from the NiFi NAR loader; they should tell you more.

Hope this helps,
Andrew
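
A quick way to pull the relevant lines out of the startup log (class names
from memory - adjust to whatever your version actually logs):

    # NAR discovery/loading messages come from the nar loader classes
    grep -iE 'NarAutoLoader|StandardNarLoader|Loaded NAR' logs/nifi-app.log | tail -50

If a NAR lands in ./extensions but never gets a corresponding "loaded"
line, a missing parent NAR dependency is the usual suspect.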


On Thu, May 28, 2020, 1:21 PM margeaux.egor...@spglobal.com <
margeaux.egor...@spglobal.com> wrote:

> Hello!
>
>
>
> I am looking for guidance on an issue I am facing concerning utilizing the
> autoloading of custom processors using the extensions directory.
>
>
>
> The issue is with autoloading custom processor NARs and being able to see
> them in the UI without a restart of the docker container.  I can confirm
> that the NARs are showing up in the extensions folder as they should, but
> while tailing the docker logs to see if the autoloader is picking them up,
> it is not.
>
>
>
> If anyone has any tips on troubleshooting this I would be very grateful!
>
>
>
> Thank you,
>
>
>
>
>
> Margeaux Egorova
>
> Data Engineer, Architecture
>
> S&P Global | Market Intelligence
>
> +1.804.499.9414
>
> East Coast, USA (Remote)
>
>
>
>
>
>
>
> #NiFittyNARwhals
>
>
>
> ~~~ *~*~ Happy Data Flowing ~*~* ~~~
>
>
>
>
>


Re: Connecting Controller Services Automatically

2020-05-23 Thread Andrew Grande
Maybe something is going on with specific types or hierarchies. I've
noticed DefaultSslContext didn't get assigned, even though it was the only
one available. Does autowiring logic apply to this one?

Andrew

On Sat, May 23, 2020, 3:54 PM Eric Secules  wrote:

> Hi Bryan,
>
> I have noticed this behaviour sometimes, but not all the time. I am running
> the latest registry and NiFi versions. I haven't found a conclusive pattern
> but I have a hunch that it has to do with having versioned process groups
> within versioned process groups. My deployment strategy is this:
>
>- Have an outer process group which only contains controller services,
>called the "Controller Container"
>   - For now I just have one controller service per type of controller
>   service.
>   - When deploying, download all production flows inside the
>Controller Container.
>- I noticed that some of the controller services find their match, but
>    others don't, leaving me with roughly 70 invalid processors out of 800.
>
> If you could point me in the right direction of the code which is supposed
> to do the matching I might be able to debug better.
>
> Thanks,
> Eric
>
> On Sat, May 23, 2020 at 3:27 PM Bryan Bende  wrote:
>
>> If you use registry >= 0.5.0 and nifi >= 1.10.0, then it will auto-select
>> external controller services with the same name, as long as there is only
>> one of the same type with that name (the name is not unique).
>>
>> On Sat, May 23, 2020 at 3:34 PM Andy LoPresto 
>> wrote:
>>
>>> My position is that we don’t claim completely automated deployment as a
>>> feature, so manually setting the controller service IDs is not exposed.
>>> Technically, they are defined in the flow.xml.gz and could be modified by
>>> an administrator to be static after generation. This would require frequent
>>> manual manipulation of the flow.xml.gz in various environments and frequent
>>> restarts of the NiFi service. I do not recommend this.
>>>
>>>
>>> Andy LoPresto
>>> alopre...@apache.org
>>> *alopresto.apa...@gmail.com *
>>> He/Him
>>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>>
>>> On May 23, 2020, at 11:05 AM, Andrew Grande  wrote:
>>>
>>> Aren't those IDs generated? How can one enforce it?
>>>
>>> Andrew
>>>
>>> On Sat, May 23, 2020, 10:53 AM Andy LoPresto 
>>> wrote:
>>>
>>>> If you want the process to be completely automated, you would have to
>>>> enforce the controller service IDs to be identical across environments.
>>>> Otherwise deployment would need a manual intervention to reference the
>>>> specific controller service in the proper component.
>>>>
>>>> Andy LoPresto
>>>> alopre...@apache.org
>>>> *alopresto.apa...@gmail.com *
>>>> He/Him
>>>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>>>
>>>> On May 22, 2020, at 3:57 PM, Eric Secules  wrote:
>>>>
>>>> Hi Andy,
>>>>
>>>> Given that you have a flow which operates on two different S3 accounts
>>>> for example, how would you do deployment automation? Do you mandate that
>>>> the controller service with the same ID must exist in both a development
>>>> and production environment rather than try to connect a processor to a
>>>> matching controller service?
>>>>
>>>> -Eric
>>>>
>>>> On Fri, May 22, 2020 at 3:44 PM Andy LoPresto 
>>>> wrote:
>>>>
>>>>> Eric,
>>>>>
>>>>> I can’t answer all these questions but I would definitely have
>>>>> hesitations around building an expectation that there is only one instance
>>>>> of any given controller service type in an entire canvas. I can think of
>>>>> numerous flows (this may not affect your particular flows, but the 
>>>>> concepts
>>>>> still apply) which require multiple instances of the same controller
>>>>> service type to be available:
>>>>>
>>>>> * A flow which invokes a mutually-authenticated TLS HTTP API, consumes
>>>>> data, transforms it, and posts it to another mTLS API
>>>>> * A flow which retrieves objects from one S3 bucket and puts them into
>>>>> an S3 bucket in a different AWS account
>>>>> * A flow which connects to one database and retrieves data, transforms
>>>&

Re: Connecting Controller Services Automatically

2020-05-23 Thread Andrew Grande
Aren't those IDs generated? How can one enforce it?

Andrew

On Sat, May 23, 2020, 10:53 AM Andy LoPresto  wrote:

> If you want the process to be completely automated, you would have to
> enforce the controller service IDs to be identical across environments.
> Otherwise deployment would need a manual intervention to reference the
> specific controller service in the proper component.
>
> Andy LoPresto
> alopre...@apache.org
> *alopresto.apa...@gmail.com *
> He/Him
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On May 22, 2020, at 3:57 PM, Eric Secules  wrote:
>
> Hi Andy,
>
> Given that you have a flow which operates on two different S3 accounts for
> example, how would you do deployment automation? Do you mandate that the
> controller service with the same ID must exist in both a development and
> production environment rather than try to connect a processor to a matching
> controller service?
>
> -Eric
>
> On Fri, May 22, 2020 at 3:44 PM Andy LoPresto 
> wrote:
>
>> Eric,
>>
>> I can’t answer all these questions but I would definitely have
>> hesitations around building an expectation that there is only one instance
>> of any given controller service type in an entire canvas. I can think of
>> numerous flows (this may not affect your particular flows, but the concepts
>> still apply) which require multiple instances of the same controller
>> service type to be available:
>>
>> * A flow which invokes a mutually-authenticated TLS HTTP API, consumes
>> data, transforms it, and posts it to another mTLS API
>> * A flow which retrieves objects from one S3 bucket and puts them into an
>> S3 bucket in a different AWS account
>> * A flow which connects to one database and retrieves data, transforms
>> it, and persists it to another database
>>
>> If there is only _one_ StandardSSLContextService,
>> AWSCredentialsProviderControllerService, or DBCPConnectionPool available in
>> the entire controller, these flows cannot exist.
>>
>> I am not saying the retrieval of new flow versions and the matching of
>> referenced controller services cannot be improved, but I would definitely
>> advise caution before going too far down this path without considering all
>> possible side effects and potential constraints on future flow development.
>>
>>
>>
>> Andy LoPresto
>> alopre...@apache.org
>> *alopresto.apa...@gmail.com *
>> He/Him
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>
>> On May 22, 2020, at 3:01 PM, Eric Secules  wrote:
>>
>> Hello everyone,
>>
>> I am running into an issue with automated deployment using nipyapi
>> . We would like to be able to
>> pull down flows from a registry and have them ready to go once all their
>> controller services have been turned on. But there are a few issues.
>> Sometimes the flows that we download from the registry reference controller
>> service IDs that don't exist on this machine because the flow was developed
>> in a different environment. That's easy enough to fix if there is just one
>> applicable controller service, but not when there are two or more.
>>
>> We have taken the decision to put all our controller services at the top
>> level and have one of each kind we need, rather than have multiple of the
>> same controller service attached to individual process groups.
>>
>> We are running into a problem where some processors can either connect to
>> a JSONTreeReader or a CSVReader and there's no indication in the
>> ProcessorDTO object which type it was originally connected to, just a GUID
>> of a controller service that doesn't exist in this deployment.
>>
>> Would it be possible to include the type or name of the controller
>> service in the component.config.descriptors section? Are we going about it
>> the wrong way trying to simplify down to the least number of controller
>> services?
>>
>> Thanks,
>> Eric
>>
>>
>>
>


Re: Can Nifi authenticate and authorize using 389DS?

2020-01-29 Thread Andrew Grande
It should work, since 389DS is an LDAP implementation, but I'm not sure it
was tested explicitly with this one.

Authorization is local, however; as a best practice, one doesn't put
application-specific policies into LDAP. Instead, NiFi manages its own policies.

On Wed, Jan 29, 2020, 12:32 PM Dan Stromberg 
wrote:

>
> Hi folks.
>
> The subject pretty much says it.
>
> But here it is again: Can Apache Nifi authenticate and authorize using
> Redhat (IBM) 389 Directory Server?
>
> I know it /should/ work - my question is "does it?"
>
> Alternatively, I'll ask: Is anyone combining these two today?  If so, are
> they "playing nicely together"?
>
> Thanks!
>
>


Re: Holiday scheduling

2020-01-15 Thread Andrew Grande
Maybe a good fit for a Drools engine rule set? I remember there was a
community processor.

Andrew
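
If maintaining a short list is acceptable after all, a lighter sketch than
the per-date expressions quoted below: keep the dates in a process-group
variable (hypothetical name "holidays", e.g. 2020-01-20,2020-05-25) and
route on

    ${holidays:contains(${now():format('yyyy-MM-dd')})}

which keeps the calendar in one editable place instead of baked into each
RouteOnAttribute property.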

On Wed, Jan 15, 2020, 10:40 AM Dave Andrews 
wrote:

> Hello community,
> Has anyone come up with a clean, internal-to-NiFi solution for changing
> workflow based on date/time?  For example, a RouteOnAttribute that will
> route files differently for the upcoming weekend and Monday holiday?  I
> have written different routing strategies like:
>
> ${filename:toUpper():startsWith('XYZ')
> :and(${now():toNumber():format('MM'):equals('01')})
> :and(${now():toNumber():format('dd'):equals('20')})
> :and(${now():toNumber():format('HH'):lt('17')})
> :and(${now():toNumber():format('yyyy'):equals('2020')})}
>
>   But I'm curious if someone came up with something more elegant which is more
> maintainable - I don't want to maintain a calendar.
>
> Should I query a shared database, or is there an online calendar I could
> invoke http against?   or??
>
> thanks --da
>


Re: Abstract Processor > Static final properties

2019-08-12 Thread Andrew Grande
Also, static fields are scoped to the classloader; they are not global in the
sense you described. NiFi uses classloader isolation for its processors.

Andrew

On Mon, Aug 12, 2019, 12:24 AM Craig Knell  wrote:

> Thanks
>
> Best regards
>
> Craig Knell
>
> Mobile 61 402128615
> Skype craigknell
>
> On 12 Aug 2019, at 13:57, Bryan Bende  wrote:
>
> Hello,
>
> The final static variables are usually the descriptors, which are just the
> definition of the properties. The actual values of the properties are
> stored in a separate map per instance of the processor.
>
> -Bryan
>
> On Mon, Aug 12, 2019 at 12:03 AM Craig Knell 
> wrote:
>
>> Hi Folks,
>>
>> I would like some assistance to clarify maybe some basic java and
>> using Nifi processors.
>>
>> My use case is to run multiple instances of a custom processor with
>> different settings - just about all processors in NiFi operate this
>> way.
>>
>> As processor properties are declared STATIC, and hence I believe are
>> class variables, how is this managed in NiFi with Java, where there
>> are multiple processor instances of a single class?
>>
>> --
>> Regards
>>
>> Craig
>>
> --
> Sent from Gmail Mobile
>
>


Re: Postgres table as Cached Lookup Service

2019-08-10 Thread Andrew Grande
Matt, the 1.9.2 docs don't list anything like that. Are you sure? Is it
something coming out in the next, unreleased version? A more than welcome
addition :)

Silly me - I was searching the page for everything 'jdbc', of course.

Andrew
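
Once on a release that bundles them, the wiring is roughly this sketch
(property names from memory - verify against the docs):

    LookupRecord
      Record Reader / Writer -> JsonTreeReader / JsonRecordSetWriter
      Lookup Service         -> DatabaseRecordLookupService
      Result RecordPath      -> /NAME
      (dynamic property) key -> /ID
    DatabaseRecordLookupService
      Database Connection Pooling Service -> your Postgres DBCPConnectionPool
      Table Name -> lookup table, Lookup Key Column -> ID,
      Lookup Value Columns -> NAME, plus a cache size/expiration

That gives Craig's enrich-in-place below without custom code.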

On Sat, Aug 10, 2019, 11:08 AM Matt Burgess  wrote:

> There is, sorry I’m AFK ATM but there’s a SimpleDatabaseLookup and a
> DatabaseRecordLookup (or something similarly named) :)
>
> Sent from my iPhone
>
> On Aug 10, 2019, at 1:31 PM, Andrew Grande  wrote:
>
> Maybe this would help?
> https://github.com/mrcsparker/nifi-sqllookup-services-bundle/blob/master/README.md
>
> I wish there was a standard bundled jdbc lookup record implementation.
>
> Andrew
>
> On Fri, Aug 9, 2019, 11:56 PM Craig Knell  wrote:
>
>> Hi Folks
>>
> Cached Postgres Lookup Service
> What's the best way to create a Postgres lookup service within NiFi?
>>
> I have an incoming ff with content in JSON.
> What I want to do is use the field
> =  "ID" : 12344
> to look up a Postgres table and ADD a new JSON field
> =  "NAME" : lookupvalue
> back into the ff content in JSON format.
>>
>> I would ideally like the lookup processor to get/refresh the cached data
>> daily.
>>
> Avro Records
> It looks like the Record processing should work with the cache;
> however, I get a little lost with using Record processors and AVRO.
> 1. Are Avro schemas case sensitive?
> 2. How do I convert from an incoming JSON ff with uppercase fields to an
> outgoing ff with lowercase fields?
> 3. I ran a ValidateRecord processor using a lowercase Avro schema
> against the incoming ff with uppercase fields and it returned 3 of the
> 10 fields, in lowercase, with null values; not sure what this means.
>>
>> Thanks
>>
>> Craig
>>
>


Re: Postgres table as Cached Lookup Service

2019-08-10 Thread Andrew Grande
Maybe this would help?
https://github.com/mrcsparker/nifi-sqllookup-services-bundle/blob/master/README.md

I wish there was a standard bundled jdbc lookup record implementation.

Andrew

On Fri, Aug 9, 2019, 11:56 PM Craig Knell  wrote:

> Hi Folks
>
> Cached Postgres Lookup Service
> What's the best way to create a Postgres lookup service within NiFi?
>
> I have an incoming ff with content in JSON.
> What I want to do is use the field
> =  "ID" : 12344
> to look up a Postgres table and ADD a new JSON field
> =  "NAME" : lookupvalue
> back into the ff content in JSON format.
>
> I would ideally like the lookup processor to get/refresh the cached data
> daily.
>
> Avro Records
> It looks like the Record processing should work with the cache;
> however, I get a little lost with using Record processors and AVRO.
> 1. Are Avro schemas case sensitive?
> 2. How do I convert from an incoming JSON ff with uppercase fields to an
> outgoing ff with lowercase fields?
> 3. I ran a ValidateRecord processor using a lowercase Avro schema
> against the incoming ff with uppercase fields and it returned 3 of the
> 10 fields, in lowercase, with null values; not sure what this means.
>
> Thanks
>
> Craig
>


Re: DistributeLoad across a NiFi cluster

2019-07-02 Thread Andrew Grande
Jim,

There's a better solution in NiFi. Right-click on the connection between
ListFile and FetchFile and select a cluster load-balance strategy in its
settings (available since NiFi 1.8.0). That's it :)

Andrew

On Tue, Jul 2, 2019, 7:37 AM James McMahon  wrote:

> We would like to employ a DistributeLoad processor, restricted to run on
> the primary node of our cluster. Is there a recommended approach employed
> to efficiently distribute across nodes in the cluster?
>
> As I understand it, and using a FetchFile running in "all nodes" as the
> first processor following the DistributeLoad, I can have it distribute by
> round robin, next available, or load distribution service.  Can anyone
> provide a link to an example that employs the load distribution service? Is
> that the recommended distribution approach when running in clustered mode?
>
> I am interested in maintaining load balance across my cluster nodes when
> running at high flowfile volumes. Flow files will vary greatly in contents,
> so I'd like to design with an approach that helps me balance processing
> distribution.
>
> Thanks very much in advance. -Jim
>


Re: Would a currency parser record path function be useful?

2019-05-28 Thread Andrew Grande
Honestly, I have seen so many unique ways to mangle the currency format
that I'm not even sure one can reliably parse it anymore.

But more importantly, isn't it the responsibility of a representation layer
to format and pretty-print the value? I just have a hard time seeing a
widespread need for that within NiFi itself.

This is just me sharing my experiences, not to discourage you from
contributing :)

Andrew

On Tue, May 28, 2019, 7:59 AM Mike Thomsen  wrote:

> We have a controller service that does some basic cleanup on currency to
> take a string and turn it into a float or double. It's not intended to be
> anything like the Java Money API, it just purges everything except raw
> currency-related characters and formats the text into a float or double.
>
> Would that be a helpful addition to the standard record path? I would
> imagine something like this:
>
> toCurrencyString(parseRawCurrency("USD10.0"), "US")
>
> = $100,00.00
>
> So the first function would just try to clean the string down to a bare
> double and the next one could use NumberFormats and Locales to give you a
> clean currency string.
>
> I looked into the Java Money API before proposing this, and it seems like
> it might be overkill and not even that helpful since converting currencies
> is largely useless without an accurate conversion rate data source.
>
> Thoughts?
>


Re: PutKafka use with large quantity of data?

2019-04-04 Thread Andrew Grande
What's the concurrency for these processors? What's the global NiFi thread
pool size?

I wonder if you might be running out of available threads while they are
waiting for external system I/O under load.

Andrew

On Thu, Apr 4, 2019, 8:24 AM l vic  wrote:

> What this particular processing group does: writes a large dataset to a
> Kafka topic; one consumer reads from the topic and saves data to an HBase/PQS
> table, another consumer writes to an ES index.
>
> On Thu, Apr 4, 2019 at 10:58 AM Joe Witt  wrote:
>
>> Can you share screenshots, logs, and a more detailed description of what
>> you're doing, observing with nifi and the system and what you expect it to
>> be doing.
>>
>> Thanks
>>
>> On Thu, Apr 4, 2019 at 10:56 AM l vic  wrote:
>>
>>> No, actually what happens is - NiFi stops responding (if I use it
>>> without rate control)
>>>
>>>
>>> On Thu, Apr 4, 2019 at 10:42 AM Joe Witt  wrote:
>>>
 Hello

 There isn't really a feedback mechanism based on load on the Kafka
 topic.  When you say overrunning the topic do you mean that you don't want
 there to be a large lag between consumers and their current offset and if
 that grows you want NiFi to slow down?  I don't believe there is anything
 inherent to the Kafka producer protocol that would inform us of this.  We
 could periodically poll for this information and optionally back off.

 Is this what you have in mind?

 Thanks

 On Thu, Apr 4, 2019 at 10:34 AM l vic  wrote:

> I have to ingest a large (200,000 messages) data set into a Kafka topic as
> quickly as possible without overrunning the topic... Right now I just use a rate
> limiter to do it, but is there some better "adaptive" way to do it?
> Thank you...
> -V
>



Re: Weird ListFile Issue

2019-03-22 Thread Andrew Grande
Looks like the processor started listing the CWD (its current working directory).

On Fri, Mar 22, 2019, 2:00 PM William Gosse 
wrote:

> The real var name was aimuploaddir and the error was aimduploaddir.  I think
> I have a workaround to prevent the tragedy of wiping out NiFi. I did this:
>
> ${aimuploaddir:replaceEmpty('')}
>
>
>
>
>
> *From:* Joe Witt 
> *Sent:* Friday, March 22, 2019 4:06 PM
> *To:* users@nifi.apache.org
> *Subject:* Re: Weird ListFile Issue
>
>
>
>
> William
>
>
>
> What was the real dir name vs the erroneous name?   What might happen
> depends on many factors such as OS behaviors.
>
>
>
> thanks
>
>
>
> On Fri, Mar 22, 2019, 12:47 PM William Gosse 
> wrote:
>
> I ran into kind of a weird issue with the ListFile processor.   I was
> referencing a variable for my input directory and had entered the variable
> name incorrectly.
>
> So I assumed that with my incorrect variable name the value of the
> Input Directory would be null or an empty string.
>
> When I started the ListFile it started listing all the files inside of my
> NiFi install directory.  Things got bad when my FetchFile started to
> delete them.
>
> I'm on 1.9.1 but I was wondering if this was the normal behavior or some
> kind of bug.
>
> If anyone has seen this behavior and has some kind of viable workaround
> please let me know.
>
>


Re: Log Queries being executed by PutDatabaseRecord

2019-02-28 Thread Andrew Grande
Could be a good idea to log values at TRACE level then.
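
For reference, the logback line in question goes into conf/logback.xml
(NiFi rescans that file periodically, so no restart is needed):

    <logger name="org.apache.nifi.processors.standard.PutDatabaseRecord" level="DEBUG"/>

A level="TRACE" variant would be where per-record values could land if the
improvement discussed below gets implemented.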

On Wed, Feb 27, 2019, 7:56 AM Matt Burgess  wrote:

> True, at a DEBUG level we could output the record values, although for
> large flow files this will be quite verbose :) Also the point of the
> ?s is not necessarily to not show the values, but that we are
> technically only issuing one statement (i.e. PreparedStatement), and
> just the values change. This usually gives much better performance,
> and also when we execute them as a batch, we get more of a
> transactional capability (one flow file = one batch = one
> transaction).
>
> Please feel free to file a Jira to add record output to
> PutDatabaseRecord at a DEBUG level.
>
> Regards,
> Matt
>
> On Wed, Feb 27, 2019 at 7:05 AM Mike Thomsen 
> wrote:
> >
> > I could be mistaken, but I think that's standard JDBC behavior to not
> show the values. That said, yes it would be a fairly trivial improvement to
> add a dump of the record to a debug logger.
> >
> > On Wed, Feb 27, 2019 at 4:36 AM Fred Affini 
> wrote:
> >>
> >> Hi Matt and Phillip, thanks a lot for the help
> >>
> >> Setting PutDatabaseRecord log level to DEBUG (changing Bulletin Level in the
> >> GUI or inserting the XML line Matt sent) almost gave me what I need; the log
> >> now shows:
> >>
> >> 10:32:55 CETDEBUGadcd1a7d-1000-1169-8aa8-92d8f2e891e5
> >> PutDatabaseRecord[id=adcd1a7d-1000-1169-8aa8-92d8f2e891e5] Executing
> query
> >> INSERT INTO SYSTEM.NIFI_INPUT_TEST (ID_BB_GLOBAL, BLOOMBERG_CODE,
> ID_ISIN,
> >> ID_CUSIP, ID_SEDOL, TICKER, ID_BB_COMPANY, ID_GRID, CRNCY, INSTYP,
> ISSUE_DT,
> >> CPN_TYP, CPN, MTY_TYP, MATURITY, WORKOUT_DT_BID) VALUES
> >> (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?); fieldIndexes: [0, 1, 2, 3, 4, 5, 6,
> 7, 8,
> >> 9, 10, 11, 12, 13, 14, 15]; batch index: 1; batch size: 1499
> >>
> >> I think the log should put the values in the parameters; it looks like it is
> >> being logged before preparing the statement. I still have a way to figure it
> >> out, since it gives me the field index, and I suppose that *batch index: 1;
> >> batch size: 1499* means the first line of my file (which has 1499 lines).
> >>
> >> Should this be a request for improvement in the processor?
> >>
> >> Regards,
> >> Fred
> >>
> >>
> >>
> >> --
> >> Sent from: http://apache-nifi-users-list.2361937.n4.nabble.com/
>


Re: Using variables in SSLContextService

2019-02-19 Thread Andrew Grande
Andy,

An incoming message dynamically setting the path to the trust/keystore was
the concern. Essentially, one could brute-force a file system path by trying
out locations; I doubt that is ever an intended behavior. The keytab CS was
a good analogy of a better implementation for separation of access.

Then, I'm not even sure the code would work and reinit the TLS subsystem,
but still.

I believe we can address the requirement with VR alone. After all, shouldn't
we promote best practices? Limiting the evaluation scope to VR only is an
easy concept to explain, has nice UX already, and provides a solid SDLC
story. It clicks easily with everyone I showed it to.

Supporting all kinds of runtime environment property evaluation is not
something I'd promote; too many ways to skin a cat and confuse things.

Andrew

On Tue, Feb 19, 2019, 7:39 PM Andy LoPresto  wrote:

> I think there are a couple distinct issues to address here.
>
>
>1. The claim that allowing EL evaluation for the keystore/truststore
>path is a security concern. What is the risk here? This input should be
>trusted (if someone is configuring the SSLContextService, they are aware of
>& using a system which has a keystore & truststore to which the OS user
>running NiFi has read access). Any user input which is used to read from
>the local filesystem anywhere in the application should be validated, but
>at some point, input from an authenticated and authorized user must be
>allowed in order to configure the system.
>   1. One could make the argument that this controller service should
>   be @Restricted, similar to the KeytabCredentialsService used to provide
>   protected access to various Kerberos key tabs without exposing file
>   locations to unauthorized users.
>2. I am unclear on the apparent distinction being drawn by some people
>here between variable substitution and expression language evaluation. My
>understanding is that a property descriptor can support expression language
>or not — a boolean decision. _If_ it supports expression language, it can
>allow variable access to “only" the Variable Registry (which includes
>custom properties files and OS environment variables) or it can include the
>flowfile attributes of each flowfile that passes through the component.
>Some properties are scoped to not allow per-flowfile access, but I am
>unaware of any property descriptor which supports variable substitution
>which does not allow the full complement of EL functions to be evaluated. I
>have verified this in NiFi 1.9.0-RC2 by putting EL code containing
>functions into the Input Directory PD of ListFile, which is scoped to
>“Variable Registry Only” — it successfully executes the EL functions. See
>code in [1] and [2] for more.
>3. I think it is a fair request for the keystore/truststore path
>property descriptors of the implementations of SSLContextService and
>RestrictedSSLContextService to evaluate EL with the variable scope of VR
>only. However, the password properties still will not accept EL at all. I
>think there are legitimate discussions to be had around adding the
>Restricted component permission to those controller services, and around
>separating EL function evaluation from simple variable substitution, but
>currently those topics have not been addressed.
>
>
>
> [1]
> https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/components/PropertyDescriptor.java#L269
> [2]
> https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/expression/ExpressionLanguageScope.java#L41
>
>
> Andy LoPresto
> alopre...@apache.org
> *alopresto.apa...@gmail.com *
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Feb 19, 2019, at 6:16 PM, Beutel, Maximilian <
> maximilian.beu...@credit-suisse.com> wrote:
>
> Andrew,
>
> Yes, I was exploring to see if I can use variable registry values in the
> properties of the SSL Context Service. I wouldn’t need full expression
> language support.
>
> To give an example of what I want to do:
>
> My keystore is a .p12 file, call it store.p12. On my development box it
> resides in a different location than on my prod.
>
> On dev: C:/Users/max/store.p12
> On prod: /etc/store.p12
>
> So my idea was to define the keystore path such as
> ${nifi.system.properties.dir}/store.p12 in the CS and then I can easily
> override the directory based on the stage I am on, using the variable
> registry.
>
> If you guys agree that this is a reasonable request, is it ok for me then
> to raise such a feature request in
> https://issues.apache.org/jira/projects/NIFI/issues/NIFI-4610?filter=allopen

Re: Using variables in SSLContextService

2019-02-19 Thread Andrew Grande
Mike, I think the ask here is for this CS to support variable registry
values. IIRC, there are other cases in NiFi where EL is not supported, but
VR is. A fair request, IMO.

Supporting a full EL for the keystore/truststore path is a bad idea, no
doubt.

Do you agree?

Andrew

On Tue, Feb 19, 2019, 3:33 AM Mike Thomsen  wrote:

> When expression language is not supported by a field, it won't accept any
> variables.
>
> Mike
>
> On Mon, Feb 18, 2019 at 10:34 PM Beutel, Maximilian <
> maximilian.beu...@credit-suisse.com> wrote:
>
>> Hello!
>>
>>
>>
>> Also asked the question on IRC, but figured the mailing list might be
>> better for this longer question.
>>
>>
>>
>> For an InvokeHTTP processor I defined a SSL Context Service. In the SSL
>> Context Service, in Keystore Filename property, I’d like to use a variable
>> which I defined in a nifi.registry file. So in my nifi.properties I have:
>>
>>
>>
>> nifi.variable.registry.properties=./conf/custom.properties
>>
>>
>>
>> And in the conf/custom.properties I have:
>>
>>
>>
>> nifi.system.properties.file=C:/Users/some.file
>>
>>
>>
>> And in the field Keystore Filename in the SSL Context Service I input:
>>
>>
>>
>> ${nifi.system.properties.file}
>>
>>
>>
>> But then saving the SSL Context Service doesn’t work anymore, the
>> validation fails and says “${nifi.system.properties.file} does not exist”.
>> The actual file does exist however, but I suspect that the variable doesn’t
>> get interpolated.
>>
>>
>>
>> According to
>> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-ssl-context-service-nar/1.5.0/org.apache.nifi.ssl.StandardSSLContextService/index.html
>> it seems like expression language is not supported for Keystore Filename
>> property. Does this also imply that variables won’t work in that field?
>>
>>
>>
>> Thanks for your help!
>>
>>
>>
>> Max
>>
>>
>>
>>
>>
>>
>


Re: Nifi provenance indexing throughput if it is being used as an event store

2019-02-15 Thread Andrew Grande
NiFi provenance searches are not a good integration pattern for external
systems. I.e., using them to periodically fetch history burdens the cluster
(those searches can be heavy) and disrupts normal processing SLAs.

Pushing provenance events out to an external system (potentially even
filtered down to components of interest) is a much more predictable pattern
and provides lots of flexibility in how to interpret the events.

Andrew
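
A sketch of that push setup (property names approximate - check the docs):
add a SiteToSiteProvenanceReportingTask under Controller Settings ->
Reporting Tasks on the busy cluster, e.g.:

    SiteToSiteProvenanceReportingTask
      Destination URL -> https://collector-nifi:8443/nifi   (hypothetical collector)
      Input Port Name -> provenance-in                      (hypothetical port)
      Batch Size      -> 1000

The receiving flow gets the events as batches of JSON and can filter by
event/component type before indexing them wherever you like.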

On Thu, Feb 14, 2019, 11:26 PM Ali Nazemian  wrote:

> Can I expect the Nifi search provenance part do the job for me?
>
> On Fri, 15 Feb. 2019, 13:21 Mike Thomsen 
>> Ali,
>>
>> There is a site-to-site provenance reporting task that you can add at
>> the controller level that would be great here. It'll just take all
>> of your provenance data periodically and ship it off to another NiFi server
>> or cluster that can process all of the provenance data as blocks of JSON
>> data. A common pattern there is to filter down to the events you want and
>> publish to ElasticSearch.
>>
>> On Thu, Feb 14, 2019 at 7:05 PM Ali Nazemian 
>> wrote:
>>
>>> Hi All,
>>>
>>> I am investigating to see how Nifi provenance can be used as an event
>>> store for a long period of time. Our use case is very burst based and
>>> sometimes we may not receive any event for a period of time and sometimes
>>> we may get burst traffic. On average we can say maybe around 1000 eps is
>>> the expected throughput at this stage. Nifi has a powerful provenance that
>>> gives you an ability to also index based on some attributes. I am
>>> investigating how reliable is to use Nifi provenance store for a long
>>> period of time and enable index for a few extra attributes. Has anybody
>>> used Nifi provenance at this scale? Can lots of Lucene indices create other
>>> issues within Nifi as provenance uses Lucene for the indexing?
>>>
>>> P.S: Our use case is pretty light for Nifi as we are not going to have
>>> any ETL and Nifi is being used mostly as an Orchestrator of multiple
>>> Microservices.
>>>
>>> Regards,
>>> Ali
>>>
>>


Re: Record-oriented DetectDuplicate?

2019-02-08 Thread Andrew Grande
Can I suggest a time-based option for specifying the window? I think we
only mentioned the number of records.

Andrew

On Fri, Feb 8, 2019, 8:22 AM Mike Thomsen  wrote:

> Thanks. That answers it succinctly for me. I'll build out a
> DetectDuplicateRecord processor to handle this.
>
> On Fri, Feb 8, 2019 at 11:17 AM Mark Payne  wrote:
>
>> Matt,
>>
>> That would work if you want to select distinct records in a given
>> FlowFIle but not across FlowFiles.
>> PartitionRecord -> UpdateAttribute (optionally to combine multiple
>> attributes into one) -> DetectDuplicate
>> would work, but given that you expect the records to be unique generally,
>> this would have the effect of
>> splitting each FlowFile into Record-per-FlowFile, which is certainly not
>> ideal.
>>
>> Thanks
>> -Mark
>>
>>
>> > On Feb 8, 2019, at 11:14 AM, Matt Burgess  wrote:
>> >
>> > Mike,
>> >
>> > I don't think so, but you could try a SELECT DISTINCT in QueryRecord,
>> > might be a bit of a pain if you want to select all columns and there
>> > are lots of them.
>> >
>> > Alternatively you could try PartitionRecord -> QueryRecord (select *
>> > limit 1). Neither PartitionRecord nor QueryRecord keeps state so you'd
>> > likely need to use distributed cache or UpdateAttribute.
>> >
>> > Regards,
>> > Matt
>> >
>> > On Fri, Feb 8, 2019 at 11:08 AM Mike Thomsen 
>> wrote:
>> >>
>> >> Do we have anything like DetectDuplicate for the Record API already?
>> Didn't see anything, but wanted to ask before reinventing the wheel.
>> >>
>> >> Thanks,
>> >>
>> >> Mike
>>
>>


Re: Preferred schema registry

2019-01-16 Thread Andrew Grande
Isn't AvroSchemaRegistry one embedded in NiFi? I.e., it's not a
comparable alternative to an external dedicated schema registry.

Andrew

On Wed, Jan 16, 2019 at 7:43 AM Mike Thomsen  wrote:

> I think I figured out what it was. We backed out a change and it was
> fungible with an earlier version of the schema. Therefore it appears that
> it didn't create a new version of the schema. When I made a more meaningful
> change, it created a new 4th version.
>
> Thanks,
>
> Mike
>
> On Wed, Jan 16, 2019 at 10:39 AM Bryan Bende  wrote:
>
>> Mike,
>>
>> Do you have anymore details about why the Hortonworks schema registry
>> stopped working?
>>
>> I have used it before and didn’t have any issues.
>>
>> Thanks
>>
>> Bryan
>>
>> On Tue, Jan 15, 2019 at 8:38 PM dan young  wrote:
>>
>>> We used the AvroSchemaRegistry
>>>
>>> Dano
>>>
>>> On Tue, Jan 15, 2019, 12:51 PM Mike Thomsen >> wrote:
>>>
 What schema registry are others using in production use cases? We tried
 out the HortonWorks registry, but it seemed to stop accepting updates once
 we hit "v3" of our schema (we didn't name it v3, that's the version it
 showed in the UI). So I'd like to know what others are doing for their
 registry use since we're looking at either Confluent or going back to the
 AvroSchemaRegistry.

 Thanks,

 Mike

>>> --
>> Sent from Gmail Mobile
>>
>


Re: Truststore/Trusted hostname

2019-01-09 Thread Andrew Grande
Walter, you could point to the default JRE truststore file, maybe.

Andrew
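
Since the keystore/truststore fields don't evaluate expression language,
the path has to be literal. A sketch for trusting the same CAs the JRE does
(the default cacerts password is "changeit", unless it was changed):

    Truststore Filename -> /path/to/jre/lib/security/cacerts
    Truststore Type     -> JKS
    Truststore Password -> changeit

If the target's CA isn't in cacerts either, importing its certificate with
keytool -importcert into a copy of that file is the usual fallback.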

On Wed, Jan 9, 2019, 7:12 AM Kevin Doran  wrote:

> Hi Walter,
>
> I could be mistaken, but my interpretation of the Trusted Hostname
> configuration option is that it is designed to work with/in-addition-to the
> truststore, not instead of a truststore as an alternative trust mechanism.
>
> Basically, I think it is to be used in situations when the default
> hostname verifier (i.e., the remote hostname must match the hostname/SANs
> of the certificate) prevents the connection. IF you have a reason the
> hostname does not match the cert (for example, a dev/test environment) you
> could whitelist an alternative hostname while still making an HTTPS
> connection.
>
> Note that when using this option there are man-in-the-middle attack
> implications you should consider.
>
> Hope this helps!
>
> Cheers,
> Kevin
>
>
> On January 9, 2019 at 03:28:45, Vos, Walter (walter@ns.nl) wrote:
> > Hi,
> >
> > I'm trying to use the invokeHttp processor to POST to an https site
> through a proxy. The
> > proxy is http. Through some googling I found references that Java is
> rather finicky with
> > SSL connections and wants the target server certificate in its
> truststore, but InvokeHttp
> > also offers the trusted hostname parameter.
> >
> > Because I don't have CLI access to the server that NiFi runs on, that
> seemed like the way
> > to get what I want and I added the hostname to the Trusted Hostname. The
> domain is in a form
> > of subsub.sub.domain.tld and I've tried it just like, as well as
> *.sub.domain.tld and
> > *.domain.tld and domain.tld, but I keep getting this Java exception:
> >
> > sun.security.validator.ValidatorException: PKIX path building failed:
> sun.security.provider.certpath.SunCertPathBuilderException:
> > unable to find valid certification path to requested target
> >
> > Am I doing something wrong? Is truststore really the only way to go?
> We're working with
> > HDF 3.1.0 / NiFi 1.5.0.*
> >
> > Cheers, Walter
> >
>
>


Re: Variables handling in MiNiFi

2018-12-19 Thread Andrew Grande
Luis,

Which version are you looking at? I think there was a recent discussion
about supporting those in MiNiFi; not sure if it got implemented already,
though.

Andrew

On Wed, Dec 19, 2018, 9:26 AM luis_size  wrote:

> Hi
>
> I am a big fan of the NiFi variable registry. It makes instantiating
> versions of the same workflow easier. However, when I save a PG as a
> template, export it, and convert it to YML for a MiNiFi agent, the value of
> the variable is not exported.
>
> Is there support for variables in MiNiFi? Any ongoing work on this?
>
> Thanks
> Luis
>


Re: NiFi JSON enrichment

2018-12-17 Thread Andrew Grande
James,

The easiest would be to merge the JSON in a custom processor. Not easy as in
no work at all, but given your constraints on the NiFi version it could be
done sooner, maybe.

Andrew
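
A hedged sketch of the ReplaceText idea James floats at the end of his
message, assuming FetchHBaseRow ran with Destination = flowfile-attributes
so the vehicle JSON sits in the hbase.row attribute (escaping may need
tuning for real payloads):

    ReplaceText
      Replacement Strategy -> Regex Replace
      Evaluation Mode      -> Entire text
      Search Value         -> (?s)^(.*)\}\s*$
      Replacement Value    -> $1, "vehicle": ${hbase.row}}

i.e. drop the closing brace of the position JSON and splice the vehicle
JSON back in as one more field. Attributes live in memory, so this only
suits small enrichment payloads.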

On Mon, Dec 17, 2018, 9:53 AM James Srinivasan 
wrote:

> Hi all,
>
> I'm trying to enrich a data stream using NiFi. So far I have the following:
>
> 1) Stream of vehicle data in JSON format containing (id, make, model)
> 2) This vehicle data goes into HBase, using id as the row key and the
> json data as the cell value (cf:json)
> 3) Stream of position data in JSON format, containing (id, lat, lon)
> 4) I extract the id from each of these items, then use FetchHBaseRow
> to populate the hbase.row attribute with the json content
> corresponding to that vehicle
> 5) I want to merge the NiFI attribute (which is actually JSON) into
> the rest of the content, so I end up with (id, lat, lon, make, model).
> This is where I am stuck - using the Jolt processor, I keep getting
> unable to unmarshal json to an object
>
> Caveats
>
> 1) I'm on NiFi 1.3
> 2) Much as I would like to use the new record functionality, I'm
> trying to be schema agnostic as much as possible
>
> Is this the right approach? Is there an easy way to add the attribute
> value as a valid JSON object? Maybe ReplaceText capturing the trailing
> } would work?
>
> Thanks in advance,
>
> James
>


Re: NiFi Toolkit CLI issues with NiFi/Registry SSL handshake

2018-10-29 Thread Andrew Grande
Guessing - if it's a new container on the first invocation, maybe the JVM
is generating the binary class cache? This operation is performed only once
ever, but with a clean environment every time I can see it being invoked
again and again.

Other than that, you'd need to connect a profiler and see where it spends
most of the time.

Andrew
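
One mitigation while profiling: amortize the JVM start by batching commands
in the toolkit CLI's interactive mode instead of one cli.sh call per
command (the properties file path is hypothetical):

    ./bin/cli.sh
    #> session set nifi.props /path/to/nifi-cli.properties
    #> registry list-buckets
    #> nifi pg-list

Only the first command pays the startup cost, which matches the behavior
described below.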

On Mon, Oct 29, 2018, 1:22 PM ara m.  wrote:

> Hi Andrew - yes, it's a container environment. Everything I work with is in
> effect 'dockerized'. I created the container environment with 128m min memory,
> 256m max memory, and 500m CPU, which is half a core (2.40GHz per core). I
> think that should be fine for the CLI, but you tell me otherwise.
>
> Hi Andy - I was hoping the TLS handshake was the reason, but that parameter
> for the JVM did not solve it. The speed remained the same: slow. I found
> out
> the JVM I was using was already reading from /dev/urandom.
>
> Calling cli.sh   is still 10+ seconds.
>
> Here's the odd thing: using the CLI, the very first command is again 10+
> seconds, but subsequent commands take 1 second, if that.
> The only one that takes 2-3 seconds is the pg-import one, but that's because
> it's a hefty flow (100+ processors there).
>
> I also tried setting '-XX:+UseAES -XX:+UseAESIntrinsics' as well. ("TLS/SSL
> Performance Improvements for Java"
>
> https://docs.hazelcast.org/docs/latest-development/manual/html/Security/TLS-SSL.html
> ).
>
> I'm puzzled why, inside the CLI, only the first command is slow and the
> rest are fast,
> while calling the cli from outside always hits the same 10+ seconds of slowness...
>
>
>
> --
> Sent from: http://apache-nifi-users-list.2361937.n4.nabble.com/
>


Re: NiFi Toolkit CLI issues with NiFi/Registry SSL handshake

2018-10-28 Thread Andrew Grande
Are you running in some container environment? It should never take 15 secs
and there's no caching performed by cli. I would review the container
environment and see why it's taking forever to start.

Andrew

On Sun, Oct 28, 2018, 10:11 AM ara m.  wrote:

> The Keystore user had "view processor" and "modify processor" permissions.
> Still did not work. It's odd, since the NiFi UI can make imports, so the CLI
> should be able to.
>
> Speed-wise, the JVM takes a few seconds to start. The command to list buckets
> takes ~13 seconds. It's kind of rough. I noticed however from inside the
> cli,
> the first command is slow, but then things start getting cached and running
> faster. You don't get the same benefits when executing many commands by
> calling ./cli.sh  .
>
> I think I'm going to have a problem in deployment with setting variables..
> since every operation takes 12-15 seconds.
>
> You can imagine a standard deployment goes like this
>
> bucket_id = registry list-buckets with this $bucket_name
> flow_id = registry list-flows get flow_id with this bucket_id and this
> $flow_name
> pg_id = nifi pg-import with this bucket_id, flow_id, and flow_version
>
> now lets say you have 20-30 variables to set
> nifi pg-set-var -pgid $PGID -var someVar -val someVal
>
> nifi pg-start
>
> That is a typical deployment. But if you have 20-30 variables to set and
> each one takes ~12-13 seconds to set, you're looking at ~5 minutes just
> setting variables.
>
> Thoughts? Why isn't there an equivalent Registry REST call for it? Perhaps
> that would be quicker... although I like using the CLI only.
>
> I noticed there is a REST API for NiFi itself, but it seems to want to
> replace the whole contents of the variable registry rather than just set one
> variable, which is what the CLI provides
> (/process-groups/{id}/variable-registry).
>
>
>
>
>
> --
> Sent from: http://apache-nifi-users-list.2361937.n4.nabble.com/
>


Re: how to organise processor or PG in the Nifi canvas ?

2018-09-14 Thread Andrew Grande
Try importing via the NiFi CLI tool. I've implemented the logic to
auto-layout newly imported PGs. But I agree, this should be something
included in the core NiFi logic itself.

Andrew

On Fri, Sep 14, 2018, 4:17 PM Dominique De Vito  wrote:

> Hi,
>
> Is there a way to automatically re-organize the processors or PGs inside
> the NiFi canvas?
>
> I create PGs through the REST API.
> I set originX and originY for all PGs I create, but a created PG could be
> on top of another PG (for example, a previously existing one) and the first
> one could be hiding the second one.
>
> So, I wonder if there is already a way in NiFi to re-organize the "boxes"
> inside the NiFi canvas to minimize box overlapping, for example.
>
> Any idea ?
>
> Thanks.
>
> Dominique
>
>


Re: AVRO is the only output format with ExecuteSQL

2018-08-07 Thread Andrew Grande
I'd really like to see the Record suffix on the processor for
discoverability, as already mentioned.

Andrew

On Tue, Aug 7, 2018, 2:16 PM Matt Burgess  wrote:

> Yeah that's definitely doable, most of the logic for writing a
> ResultSet to a Flow File is localized (currently to JdbcCommon but
> also in ResultSetRecordSet), so I wouldn't think it would be too much
> refactor. What are folks thoughts on whether to add a Record Writer
> property to the existing ExecuteSQL or subclass it to a new processor
> called ExecuteSQLRecord? The former is more consistent with how the
> SiteToSite reporting tasks work, but this is a processor. The latter
> is more consistent with the way we've done other record processors,
> and the benefit there is that we don't have to add a bunch of
> documentation to fields that will be ignored (such as the Use Avro
> Logical Types property which we wouldn't need in a ExecuteSQLRecord).
> Having said that, we will want to offer the same options in the Avro
> Reader/Writer, but Peter is working on that under NIFI-5405 [1].
>
> Thanks,
> Matt
>
> [1] https://issues.apache.org/jira/browse/NIFI-5405
>
> On Tue, Aug 7, 2018 at 2:06 PM Andy LoPresto  wrote:
> >
> > Matt,
> >
> > Would extending the core ExecuteSQL processor with an ExecuteSQLRecord
> processor also work? I wonder about discoverability if only one processor
> is present and in other places we explicitly name the processors which
> handle records as such. If the ExecuteSQL processor handled all the SQL
> logic, and the ExecuteSQLRecord processor just delegated most of the
> processing in its #onTrigger() method to super, do you foresee any
> substantial difficulties? It might require some refactoring of the parent
> #onTrigger() to service methods.
> >
> >
> > Andy LoPresto
> > alopre...@apache.org
> > alopresto.apa...@gmail.com
> > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> >
> > On Aug 7, 2018, at 10:25 AM, Andrew Grande  wrote:
> >
> > As a side note, one has to have a serious justification _not_ to use
> record-based processors. The benefits, including performance, are too
> numerous to call out here.
> >
> > Andrew
> >
> > On Tue, Aug 7, 2018, 1:15 PM Mark Payne  wrote:
> >>
> >> Boris,
> >>
> >> Using a Record-based processor does not mean that you need to define a
> schema upfront. This is
> >> necessary if the source itself cannot provide a schema. However, since
> it is pulling structured data
> >> and the schema can be inferred from the database, you wouldn't need to.
> As Matt was saying, your
> >> Record Writer can simply be configured to Inherit Record Schema. It can
> then write the schema to
> >> the "avro.schema" attribute or you can choose "Do Not Write Schema".
> This would still allow the data
> >> to be written in JSON, CSV, etc.
> >>
> >> You could also have the Record Writer choose to write the schema using
> the "avro.schema" attribute,
> >> as mentioned above, and then have any down-stream processors read the
> schema from this attribute.
> >> This would allow you to use any record-oriented processors you'd like
> without having to define the
> >> schema yourself, if you don't want to.
> >>
> >> Thanks
> >> -Mark
> >>
> >>
> >>
> >> On Aug 7, 2018, at 12:37 PM, Boris Tyukin 
> wrote:
> >>
> >> thanks for all the responses! it means I am not the only one interested
> in this topic.
> >>
> >> Record-aware version would be really nice, but a lot of times I do not
> want to use record-based processors since I need to define a schema for
> input/output upfront and just want to run SQL query and get whatever
> results back. It just adds an extra step that will be subject to
> break/support.
> >>
> >> Similar to Kafka processors, it is nice to have an option of
> record-based processor vs. message oriented processor. But if one processor
> can do it all, it is even better :)
> >>
> >>
> >> On Tue, Aug 7, 2018 at 9:28 AM Matt Burgess 
> wrote:
> >>>
> >>> I'm definitely interested in supporting a record-aware version as well
> >>> (I wrote the Jira up last year [1] but haven't gotten around to
> >>> implementing it), however I agree with Peter's comment on the Jira.
> >>> Since ExecuteSQL is an oft-touched processor, if we had two processors
> >>> that only differed in how the output is formatted, it could be harder
> >>> to maintain (bugs to be fixed in two pl
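
ExecuteSQL's default output is Avro with the schema embedded, which is easy to
check outside NiFi. A minimal sketch (assuming the fastavro library and a flow
file's content saved locally as result.avro; both names are illustrative):

    from fastavro import reader

    with open("result.avro", "rb") as fo:
        avro_reader = reader(fo)
        print(avro_reader.writer_schema)   # schema inferred from the DB columns
        for record in avro_reader:         # one dict per result-set row
            print(record)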

Re: AVRO is the only output format with ExecuteSQL

2018-08-07 Thread Andrew Grande
As a side note, one has to have a serious justification _not_ to use
record-based processors. The benefits, including performance, are too
numerous to call out here.

Andrew

On Tue, Aug 7, 2018, 1:15 PM Mark Payne  wrote:

> Boris,
>
> Using a Record-based processor does not mean that you need to define a
> schema upfront. This is
> necessary if the source itself cannot provide a schema. However, since it
> is pulling structured data
> and the schema can be inferred from the database, you wouldn't need to. As
> Matt was saying, your
> Record Writer can simply be configured to Inherit Record Schema. It can
> then write the schema to
> the "avro.schema" attribute or you can choose "Do Not Write Schema". This
> would still allow the data
> to be written in JSON, CSV, etc.
>
> You could also have the Record Writer choose to write the schema using the
> "avro.schema" attribute,
> as mentioned above, and then have any down-stream processors read the
> schema from this attribute.
> This would allow you to use any record-oriented processors you'd like
> without having to define the
> schema yourself, if you don't want to.
>
> Thanks
> -Mark
>
>
>
> On Aug 7, 2018, at 12:37 PM, Boris Tyukin  wrote:
>
> thanks for all the responses! it means I am not the only one interested in
> this topic.
>
> Record-aware version would be really nice, but a lot of times I do not
> want to use record-based processors since I need to define a schema for
> input/output upfront and just want to run SQL query and get whatever
> results back. It just adds an extra step that will be subject to
> break/support.
>
> Similar to Kafka processors, it is nice to have an option of record-based
> processor vs. message oriented processor. But if one processor can do it
> all, it is even better :)
>
>
> On Tue, Aug 7, 2018 at 9:28 AM Matt Burgess  wrote:
>
>> I'm definitely interested in supporting a record-aware version as well
>> (I wrote the Jira up last year [1] but haven't gotten around to
>> implementing it), however I agree with Peter's comment on the Jira.
>> Since ExecuteSQL is an oft-touched processor, if we had two processors
>> that only differed in how the output is formatted, it could be harder
>> to maintain (bugs to be fixed in two places, e.g.). I think we should
>> add an optional RecordWriter property to ExecuteSQL, and the
>> documentation would reflect that if it is not set, the output will be
>> Avro with embedded schema as it has always been. If the RecordWriter
>> is set, either the schema can be hardcoded, or they can use "Inherit
>> Record Schema" even though there's no reader, and that would mimic the
>> current behavior where the schema is inferred from the database
>> columns and used for the writer. There is precedence for this pattern
>> in the SiteToSite reporting tasks.
>>
>> To Bryan's point about history, Avro at the time was the most
>> descriptive of the solutions because it maintains the schema and
>> datatypes with the data, unlike JSON, CSV, etc. Also before the record
>> readers/writers, as Bryan said, you pretty much had to split,
>> transform, merge. We just need to make that processor (and others with
>> specific input/output formats) "record-aware" for better performance.
>>
>> Regards,
>> Matt
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-4517
>> On Tue, Aug 7, 2018 at 9:20 AM Bryan Bende  wrote:
>> >
>> > I would also add that the pattern of splitting to 1 record per flow
>> > file was common before the record processors existed, and generally
>> > this can/should be avoided now in favor of processing/manipulating
>> > records in place, and keeping them together in large batches.
>> >
>> >
>> >
>> > On Tue, Aug 7, 2018 at 9:10 AM, Andrew Grande 
>> wrote:
>> > > Careful, that makes too much sense, Joe ;)
>> > >
>> > >
>> > > On Tue, Aug 7, 2018, 8:45 AM Joe Witt  wrote:
>> > >>
> >> I think we just need to make an ExecuteSqlRecord processor.
>> > >>
>> > >> thanks
>> > >>
>> > >> On Tue, Aug 7, 2018, 8:41 AM Mike Thomsen 
>> wrote:
>> > >>>
>> > >>> My guess is that it is due to the fact that Avro is the only record
>> type
>> > >>> that can match sql pretty closely feature to feature on data types.
>> > >>> On Tue, Aug 7, 2018 at 8:33 AM Boris Tyukin 
>> > >>> wrote:
>> > >>>>
> >> > >>>

Re: Multiple registry instances sharing Git repo

2018-08-07 Thread Andrew Grande
There are also registry event hooks to help with the automation.

Andrew

On Tue, Aug 7, 2018, 12:46 PM Bryan Bende  wrote:

> Mike,
>
> It is not really made to have multiple instances pointing at the same git
> repo.
>
> Each registry has a metadata DB and a flow storage component which can
> be filesystem or git.
>
> So the metadata DB won't know about the stuff created in the other
> instance.
>
> Generally it is either a single shared registry, or multiple
> registries each with their own back-end storage mechanisms and then
> you can use the CLI to promote flows between the registry instances.
>
> -Bryan
>
>
> On Tue, Aug 7, 2018 at 12:32 PM, Mike Thomsen 
> wrote:
> > Has anyone tried having two or more registry instances pointing to the
> same
> > repo and keeping them in sync?
> >
> > We have a NiFi deployment where it would be an easier sell to have 3
> > instances of the registry sharing the same repo than to have one instance
> > that is a big exception to the network security posture that separates
> dev,
> > test and prod environments.
> >
> > Any ideas on how to do this?
> >
> > Thanks,
> >
> > Mike
>
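
For the promotion path Bryan mentions, nifi-toolkit-cli is the supported
route; underneath it is plain REST, so a script can do the same thing. A rough
sketch (host names are hypothetical, and the endpoint paths and payload shape
are assumptions based on the NiFi Registry REST API, so verify them first):

    import requests

    DEV  = "http://dev-registry:18080/nifi-registry-api"
    PROD = "http://prod-registry:18080/nifi-registry-api"
    BUCKET, FLOW = "bucket-id", "flow-id"   # ids generally differ per registry

    # Pull the latest snapshot (metadata plus full flow contents) from dev.
    snapshot = requests.get(
        f"{DEV}/buckets/{BUCKET}/flows/{FLOW}/versions/latest").json()

    # Re-target the metadata at the prod bucket/flow (version bookkeeping
    # simplified; the prod registry expects its own next version number).
    meta = snapshot["snapshotMetadata"]
    meta["bucketIdentifier"], meta["flowIdentifier"] = BUCKET, FLOW
    meta["version"] += 1

    requests.post(f"{PROD}/buckets/{BUCKET}/flows/{FLOW}/versions",
                  json=snapshot).raise_for_status()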


Re: AVRO is the only output format with ExecuteSQL

2018-08-07 Thread Andrew Grande
Careful, that makes too much sense, Joe ;)

On Tue, Aug 7, 2018, 8:45 AM Joe Witt  wrote:

> I think we just need to make an ExecuteSqlRecord processor.
>
> thanks
>
> On Tue, Aug 7, 2018, 8:41 AM Mike Thomsen  wrote:
>
>> My guess is that it is due to the fact that Avro is the only record type
>> that can match sql pretty closely feature to feature on data types.
>> On Tue, Aug 7, 2018 at 8:33 AM Boris Tyukin 
>> wrote:
>>
>>> I've been wondering since I started learning NiFi why ExecuteSQL
>>> processor only returns AVRO formatted data. All community examples I've
>>> seen then convert AVRO to json and pretty much all of them then split
>>> json to multiple flows.
>>>
>>> I found myself doing the same thing over and over and over again.
>>>
>>> Since everyone is doing it, is there a strong reason why AVRO is liked
>>> so much? And why everyone continues doing this 3 step pattern rather than
>>> providing users with an option to output json instead and another option to
>>> output one flowfile or multiple (one per record).
>>>
>>> thanks
>>> Boris
>>>
>>


Re: docker nifi cluster not working

2018-07-20 Thread Andrew Grande
I think the idea is, if there is anything available today, it will have to
be built from sources, including the new docker image. I.e. it's not a
public image yet.

Andrew

On Fri, Jul 20, 2018, 7:20 AM Chris Herssens 
wrote:

> Hello Mike,
>
> Where can I find nifi 1.8 image for docker ?
> I get
> Pulling nifi (apache/nifi:1.8.0-SNAPSHOT-dockermaven)...
> ERROR: manifest for apache/nifi:1.8.0-SNAPSHOT-dockermaven not found
>
> Regards,
>
> Chris
>
>
>
> On Fri, Jul 20, 2018 at 1:10 PM Mike Thomsen 
> wrote:
>
>> Cluster support is only in 1.8.
>> On Fri, Jul 20, 2018 at 7:02 AM Chris Herssens 
>> wrote:
>>
>>> Hello All,
>>>
>>> I try to set up a nifi cluster with Docker for Windows.
>>> For that I use the docker-compose file from the nifi repository
>>> (https://github.com/apache/nifi/tree/master/nifi-docker/docker-compose)
>>>
>>> I launch 3 nifi instances, but the instances are not clustered.  In the
>>> nifi properties files the cluster variables are not changed:
>>>
>>> nifi.cluster.is.node=false
>>> nifi.cluster.node.address=
>>> nifi.cluster.node.protocol.port=
>>>
>>> also zookeeper connect setting is empty
>>>
>>> docker image for nifi is set to  apache/nifi:1.7.1
>>>
>>> regards,
>>>
>>> Chris
>>>
>>


Re: NiFi Registry with nested PGs

2018-07-19 Thread Andrew Grande
I would also check the poll period. E.g. give it 40 secs or more to detect
new versions in the registry, it's not real time.

Andrew

On Thu, Jul 19, 2018, 6:35 AM Mike Thomsen  wrote:

> Thanks, Kevin. If I get a chance I'll try it out again because it appeared
> to be the case that after the definition changed to B:v2 that A did not
> show any sign it was changed. Could just be me misremembering it because it
> was late when we tried it.
>
> On Wed, Jul 18, 2018 at 9:50 PM Kevin Doran  wrote:
>
>> Hi Mike,
>>
>> Yes, this is expected behavior.
>>
>> Let's say I have a PG A that has a nested versioned PG B, both are at
>> version 1. Because PG B is versioned, the full definition of PG A does not
>> extend down into PG B, it stops at a reference to "PG B:v1". Because PG B
>> is versioned independently, a reference to that versioned PG's id and version
>> number fully defines PG A. Let's say I change the definition of PG B. PG B
>> will show up on my canvas as "version 1 with local changes", but until I
>> commit those changes, it is still at version 1 (modified), so PG A is still
>> defined both locally and in NiFi Registry as "containing PG B:v1".
>> Therefore, PG A, at this point in time, does not have any local changes
>> (because compared to Registry, it has the same definition), which is why it
>> shows up as such.
>>
>> Once I commit PG B's local changes, PG B on my canvas is now at version
>> 2, and my local PG A's definition has changed to include a reference to PG
>> B:v2. This differs from what is in Registry for PG A:v1, so now PG A shows
>> local changes. If I revert local changes, I will go back to PG A:v1
>> referencing PG B:v1. If I commit local changes, PG A is now also at version
>> 2, which references PG B:v2.
>>
>> Fair to say this can be surprising/confusing behavior at first, even
>> though technically correct. This is why the top level version indicators on
>> the global status bar are useful to see if there are any changes, nested or
>> otherwise, in the overall flow.
>>
>> Best,
>> Kevin
>>
>>
>> On Wed, Jul 18, 2018 at 9:24 PM, Mike Thomsen 
>> wrote:
>>
>>> We have 0.2 hooked up to a NiFi instance that has nested PGs. All PGs
>>> are versioned. When one of the inner ones has local changes, the out of
>>> sync icon doesn't appear on the parent PGs. Is that expected behavior? No
>>> one really minds it, but I didn't have an answer as to whether we stumbled
>>> onto a bug or it's expected behavior.
>>>
>>> Thanks,
>>>
>>> Mike
>>>
>>
>>


Re: Only get file when a set exists.

2018-05-27 Thread Andrew Grande
Martijn,

Here's an idea you could explore. Have the ListFile processor work as usual
and create a custom component (start with a scripting one to prototype)
grouping the filenames as needed. I don't know if the number of files in a
set is different every time, so try to be robust about that.

Once you group and count the set, you can transfer the names to the success
relationship; otherwise ignore them and wait until the set is full.

Andrew

On Sun, May 27, 2018, 7:29 AM Martijn Dekkers 
wrote:

> Hello all,
>
> I am trying to work out an issue with little success.
>
> I need to ingest files generated by some application. I can only ingest
> these files when a specific set exists. For example:
>
> file_123_456_ab.ex1
> file_123_456_cd.ex1
> file_123_456_ef.ex1
> file_123_456_gh.ex1
> file_123_456.ex2
>
> Only when a set like that exists should I pick them up into the Flow. The
> parts I am looking for to "group" would "ab.ex1", "cd.ex1", "ef.ex1",
> "gh.ex1", ".ex2".
>
> I tried to do this with some expression, but couldn't work it out.
>
> What would be the best way to achieve this?
>
> Many thanks!
>
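
To make the grouping suggestion concrete, here is a prototype of the
bookkeeping in plain Python (the filename pattern and expected-suffix set are
taken from the example above; treat it as a sketch to port into a scripted or
custom processor):

    import re
    from collections import defaultdict

    EXPECTED = {"ab.ex1", "cd.ex1", "ef.ex1", "gh.ex1", ".ex2"}
    pending = defaultdict(set)

    def offer(filename):
        """Collects a listed filename; returns the full set once complete."""
        m = re.match(r"(file_\d+_\d+)(?:_(\w+\.ex1)|(\.ex2))$", filename)
        if not m:
            return None
        prefix, suffix = m.group(1), m.group(2) or m.group(3)
        pending[prefix].add(suffix)
        if pending[prefix] == EXPECTED:
            return {prefix + s if s == ".ex2" else f"{prefix}_{s}"
                    for s in pending.pop(prefix)}
        return None   # set still incomplete; keep waiting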


Re: Allowing all users to connect

2018-05-17 Thread Andrew Grande
Juan,

A cert implies one knows the identity of the cert holder.

I'd imagine if you shared it with multiple users, you would have achieved
this semi-anonymous requirement.

I would take a really deep look into why you want to do it this way,
though. Defeats the purpose of security. Is there a problem issuing client
certificates?

Andrew

On Wed, May 16, 2018, 9:50 PM Juan Sequeiros  wrote:

> Hello all,
>
> Is there a way I can set up authorization to the secured UI for anyone who has
> a valid cert without knowing who they will be?
>


Re: PDF generating processor

2018-05-11 Thread Andrew Grande
Hi Mike,

Thanks for sharing this. May I also suggest you provide a binary NAR
download on the releases page of this GitHub repo? The NiFi build pipeline is
non-trivial, so it would help folks get started quickly.

Andrew

On Fri, May 11, 2018, 7:40 AM Mike Thomsen  wrote:

> A few nights ago, I wrote a NiFi PDF processor that generates PDFs from
> HTML built from Mustache templates. It uses iText 7, so unfortunately it's
> stuck with the AGPL  and thus cannot ever be merged into the official code
> base.
>
> I chose Mustache over FreeMarker, Velocity, etc. because the goal was to
> keep it really simple for interacting with the JSON inputs.
>
> https://github.com/MikeThomsen/nifi-pdf-generator-bundle
>
>


Re: NiFi cluster with DistributedMapCacheServer/Client

2018-04-13 Thread Andrew Grande
HBase-backed cache is a great choice here. Redis is nice and nimble, but
when it comes to clustering and enterprise security, may not be the best
fit. The original legacy cache server in NiFi is... well, it should be
deprecated and removed, IMO :)

Andrew

On Fri, Apr 13, 2018, 3:45 PM James Srinivasan 
wrote:

> Thanks, I might try moving to the HBase implementation anyway because:
>
> 1) It is already in NiFi 1.3
> 2) We already have HBase installed (but unused) on our cluster
> 3) There doesn't seem to be a limit to the number of cache entries.
> For our use case (avoiding downloading the same file multiple times)
> it was always a bit icky to set the number of cache entries to
> something that should be "big enough"
>
> Thanks again,
>
> James
>
> On 13 April 2018 at 20:24, Joe Witt  wrote:
> > James,
> >
> > You have it right about the proper solution path..  I think we have a
> > Redis one in there now too that might be interesting (not in 1.3.0
> > perhaps but..).
> >
> > We offered a simple out of the box one early and to ensure the
> > interfaces are right.  Since then the community has popped up some
> > real/stronger implementations like you're mentioning.
> >
> > Thanks
> >
> > On Fri, Apr 13, 2018 at 7:14 PM, James Srinivasan
> >  wrote:
> >> Hi all,
> >>
> >> Is there a recommended way to set up a
> >> DistributedMapCacheServer/Client on a cluster, ideally with some
> >> amount of HA (NiFi 1.3.0)? I'm using a shared persistence directory,
> >> and when adding and enabling the controller it seems to start on my
> >> primary node (but not the other two - status keeps saying "enabling"
> >> rather than "enabled"). Adding the DistributedMapCacheClientService is
> >> harder, because I have to specify the host it runs on. Setting it to
> >> the current primary node works, but presumably won't fail over?
> >>
> >> I guess the proper solution is to use the HBase versions (or even
> >> implement my own Accumulo one for our cluster)
> >>
> >> Thanks very much,
> >>
> >> James
>


Re: [Nifi 1.5.0] DistributedMapCacheClientService fails to open port

2018-04-10 Thread Andrew Grande
It's a client, it connects to a cache service. You need to start another
service and point the client to it.

Andrew

On Tue, Apr 10, 2018, 6:29 AM françois lacombe 
wrote:

> Hi all,
>
> Does anyone experience problems with DistributedMapCacheClientService
> service?
> It currently doesn't manage to open a port to listen on, according to lsof
> -i on my server.
>
> Then I got "connection refused" errors on runtime on several modules which
> rely on it.
>
> Are there any nifi.properties adjustments to make?
>
> All the best
>
> François Lacombe
>


Re: how to edit queue content

2018-03-27 Thread Andrew Grande
How is it going to work with e.g. 20GB of events in the queue? I'd be
careful, as requirements blow up into a full db with indexes, search, and a
UI on top. If one wanted to filter events, wouldn't a standard processor do
the job better?

Andrew

On Tue, Mar 27, 2018, 12:11 AM Joe Witt  wrote:

> Scott
> Yep definitely something we've talked about [1].  We've not pursued it
> directly as of yet since it is indeed a queue and we're just letting
> you peak into it.  We dont have facilities built in to really alter
> the queue in a particular position.  Also, the complexity comes in
> when folks want to have paging/selection of various items down the
> list/etc..  (but it isn't a list - its a queue).
>
> If you could bound the range of what you'd expect to be able to do
> that would probably help constrain it into something reasonably
> implementable.
>
> Thanks
>
> [1]
> https://cwiki.apache.org/confluence/display/NIFI/Interactive+Queue+Management
>
> On Tue, Mar 27, 2018 at 12:06 AM, scott  wrote:
> > Hi community,
> >
> > I've got a question about a feature I would find useful. I've been
> setting
> > up a lot of new flows and testing various configurations, and I thought
> it
> > would be really useful if I could edit the content of queues. For
> example, I
> > can examine each file in the queue, then decide I want to keep the second
> > one and the third one, then remove the rest before resuming my flow
> testing.
> > I know I can delete all files, but is there a way to have more control
> over
> > the queue content? Could I delete a specific file, or change the order of
> > the queue?
> >
> > Thanks for your time,
> >
> > Scott
> >
>


Re: 答复: put pictures from remote server into hdfs

2018-03-23 Thread Andrew Grande
I think the MOB expectation for HBase was around 10MB.

I agree it will require some thought put into organizing the space and region
server splits with column families, once this volume becomes significant.

Andrew

On Fri, Mar 23, 2018, 9:08 AM Mike Thomsen  wrote:

> Off the top of my head, try PutHBaseCell for that. If you run into
> problems, let us know.
>
> As a side note, you should be careful about storing large binary blobs in
> HBase. I don't know to what extent our processors support HBase MOBs
> either. In general, you'll probably be alright if the pictures are on the
> small side (< 1MB), but be very careful beyond that.
>
> If you have to store a lot of images and aren't able to commit to a small
> file size, I would recommend looking at BLOB store like S3 or OpenStack
> Swift. Maybe Ceph as well.
>
> On Thu, Mar 22, 2018 at 8:59 PM, 李 磊  wrote:
>
>> Hi Bryan:
>>
>> Thanks for your response.
>>
>> Using GetSFTP and PutHDFS is helpful.
>>
>> Now I have another problem. Besides HDFS, the pictures from the remote
>> server also need to go into HBase, with the filename as the rowkey and the
>> file as a column.
>>
>> This is the reason why I store the pictures locally and then use
>> ExecuteFlumeSource with spooldir which can read the picture as a whole, but
>> I lose the filename.
>>
>> -----Original Message-----
>> From: Bryan Bende [mailto:bbe...@gmail.com]
>> Sent: Friday, March 23, 2018 0:42
>> To: users@nifi.apache.org
>> Subject: Re: put pictures from remote server into hdfs
>>
>> Hello,
>>
>> It would probably be best to use GetSFTP -> PutHDFS.
>>
>> No need to write the files out to local disk somewhere else with PutFile,
>> they can go straight to HDFS.
>>
>> The filename in HDFS will be the "filename" attribute of the flow file,
>> which GetSFTP should be setting to the filename it picked up.
>>
>> If you need a different filename, you can stick an UpdateAttribute before
>> PutHDFS and change the filename attribute to whatever makes sense.
>>
>> -Bryan
>>
>>
>> On Thu, Mar 22, 2018 at 12:18 PM, 李 磊  wrote:
>> > Hi all,
>> >
>> >
>> >
>> > My requirement is to put pictures from a remote server (not in the nifi
>> > cluster) into hdfs.
>> >
>> > First I use GetSFTP and PutFile to get the pictures onto local disk, and
>> > then use ExecuteFlumeSource and ExecuteFlumeSink to put the pictures into
>> > hdfs from there.
>> >
>> >
>> >
>> > However, there is a problem: the names of the pictures put into
>> > hdfs do not stay the same as the local ones.
>> >
>> >
>> >
>> > Could you tell me a way to keep the names the same, or a better way to put
>> > pictures into hdfs from a remote server with nifi?
>> >
>> >
>> >
>> > Thanks!
>>
>
>


Re: setting processor concurrency based on the development/production environment

2018-03-02 Thread Andrew Grande
There are 2 efforts, with somewhat different focus. You are already aware
of the community-driven nipyapi, but there's also an official module I
mentioned before, which will be included with the 1.6 release.

Andrew

On Fri, Mar 2, 2018, 8:59 AM Boris Tyukin <bo...@boristyukin.com> wrote:

> Hi Andrew,
>
> thanks for the idea. I've been playing with nipyapi recently so might give
> this a try.
>
> Thanks
>
> On Thu, Mar 1, 2018 at 7:32 PM, Andrew Grande <apere...@gmail.com> wrote:
>
>> Boris,
>>
>> Here's an idea you could explore _today_.
>>
>> Assume your dev and prod flows live in different bucket/registry
>> instance. Given that you are trying out NiFi 1.6, you should be able to
>> extract the versioned flow from DEV and process it to change the
>> concurrency level for PROD before committing it to the prod registry
>> instance. Any script which understands json would do. nifi-toolkit-cli will
>> take care of extracting and moving flow versions.
>>
>> It's not ideal (yes, would like concurrency to be a customizable flow
>> var), and it assumes an explicit process to promote between environments,
>> but technically it is possible already. The user experience can be improved
>> in the future.
>>
>> Andrew
>>
>>
>> On Thu, Mar 1, 2018, 1:52 PM Kevin Doran <kdo...@apache.org> wrote:
>>
>>> I think you could put it under either project. Ultimately, if we go with
>>> that approach, most (all?) of the logic/enhancement would be in the NiFi
>>> code base during save version / import flow / change version operations, so
>>> probably best to create it there.
>>>
>>>
>>>
>>> Glad you are finding NiFi useful.
>>>
>>>
>>>
>>> Cheers,
>>>
>>> Kevin
>>>
>>>
>>>
>>> *From: *Boris Tyukin <bo...@boristyukin.com>
>>> *Reply-To: *<users@nifi.apache.org>
>>> *Date: *Thursday, March 1, 2018 at 13:44
>>> *To: *<users@nifi.apache.org>
>>> *Subject: *Re: setting processor concurrency based on the
>>> development/production environment
>>>
>>>
>>>
>>> thanks Bryan and Kevin. I will be happy to open a jira - would it be a
>>> NiFi jira or NiFi registry?
>>>
>>>
>>>
>>> I like the approach that Bryan suggested.
>>>
>>>
>>>
>>> I guess for now I will just color code the processors that need to be
>>> changed in production.
>>>
>>>
>>>
>>> P.S. I really, really like where NiFi is going...I've looked at
>>> StreamSets and Cask, but for my purposes, I was looking for a tool where I
>>> can process various tables without creating a flow per table. I was able to
>>> create a very simple flow in NiFi, that will handle 25 tables. My next
>>> project is to handle 600 tables in near real-time. I just couldn't see how I
>>> would do that with StreamSets or Cask, where you have to create a pipeline
>>> per table. I was only able to do something similar with Apache
>>> Airflow, but airflow cannot do things in near real-time. The concept of
>>> FlowFiles with attributes is a genius idea, and I am blown away with all
>>> the possibilities to extend the functionality of NiFi with custom
>>> processors and Groovy scripts. Awesome job, guys.
>>>
>>>
>>>
>>> On Thu, Mar 1, 2018 at 1:29 PM, Kevin Doran <kdo...@apache.org> wrote:
>>>
>>> Hi Boris,
>>>
>>> Good point regarding concurrent tasks; thanks for sharing!
>>>
>>> This is a great candidate for something that one should be able to
>>> create environment-specific values for, as Bryan suggests. I agree we
>>> should create a NiFi JIRA to track this enhancement.
>>>
>>> Thanks,
>>> Kevin
>>>
>>>
>>> On 3/1/18, 11:44, "Bryan Bende" <bbe...@gmail.com> wrote:
>>>
>>> Hello,
>>>
>>> Glad you are having success with NiFi + NiFi Registry!
>>>
>>> You brought up an interesting point about the concurrent tasks...
>>>
>>> I think we may want to consider making the concurrent tasks work
>>> similar to variables, in that we capture the concurrent tasks that
>>> the
>>> flow was developed with and would use it initially, but then if you
>>> have modified this value in the target environment it would not
>>> trigger a local change and would be retained 

Re: setting processor concurrency based on the development/production environment

2018-03-01 Thread Andrew Grande
Boris,

Here's an idea you could explore _today_.

Assume your dev and prod flows live in different bucket/registry instance.
Given that you are trying out NiFi 1.6, you should be able to extract the
versioned flow from DEV and process it to change the concurrency level for
PROD before committing it to the prod registry instance. Any script which
understands json would do. nifi-toolkit-cli will take care of extracting
and moving flow versions.

It's not ideal (yes, would like concurrency to be a customizable flow var),
and it assumes an explicit process to promote between environments, but
technically it is possible already. The user experience can be improved in
the future.

Andrew

On Thu, Mar 1, 2018, 1:52 PM Kevin Doran  wrote:

> I think you could put it under either project. Ultimately, if we go with
> that approach, most (all?) of the logic/enhancement would be in the NiFi
> code base during save version / import flow / change version operations, so
> probably best to create it there.
>
>
>
> Glad you are finding NiFi useful.
>
>
>
> Cheers,
>
> Kevin
>
>
>
> *From: *Boris Tyukin 
> *Reply-To: *
> *Date: *Thursday, March 1, 2018 at 13:44
> *To: *
> *Subject: *Re: setting processor concurrency based on the
> development/production environment
>
>
>
> thanks Bryan and Kevin. I will be happy to open a jira - would it be a
> NiFi jira or NiFi registry?
>
>
>
> I like the approach that Bryan suggested.
>
>
>
> I guess for now I will just color code the processors that need to be
> changed in production.
>
>
>
> P.S. I really, really like where NiFi is going...I've looked at StreamSets
> and Cask, but for my purposes, I was looking for a tool where I can process
> various tables without creating a flow per table. I was able to create a
> very simple flow in NiFi, that will handle 25 tables. My next project is to
> handle 600 tables in near real-time. I just couldn't see how I would do that
> with StreamSets or Cask, where you have to create a pipeline per table. I
> was only able to do something similar with Apache Airflow, but
> airflow cannot do things in near real-time. The concept of FlowFiles with
> attributes is a genius idea, and I am blown away with all the possibilities
> to extend the functionality of NiFi with custom processors and Groovy
> scripts. Awesome job, guys.
>
>
>
> On Thu, Mar 1, 2018 at 1:29 PM, Kevin Doran  wrote:
>
> Hi Boris,
>
> Good point regarding concurrent tasks; thanks for sharing!
>
> This is a great candidate for something that one should be able to create
> environment-specific values for, as Bryan suggests. I agree we should
> create a NiFi JIRA to track this enhancement.
>
> Thanks,
> Kevin
>
>
> On 3/1/18, 11:44, "Bryan Bende"  wrote:
>
> Hello,
>
> Glad you are having success with NiFi + NiFi Registry!
>
> You brought up an interesting point about the concurrent tasks...
>
> I think we may want to consider making the concurrent tasks work
> similar to variables, in that we capture the concurrent tasks that the
> flow was developed with and would use it initially, but then if you
> have modified this value in the target environment it would not
> trigger a local change and would be retained across upgrades so that
> you don't have to reset it.
>
> For now you could probably always leave the versioned flow with the
> lower value of 2, then once you are in prod you bump it to 4 until the
> next upgrade is available, you then revert the local changes, do the
> upgrade, and put it back to 4, but its not ideal because it shows a
> local change the entire time.
>
> I don't think there is much you can do differently right now, but I
> think this is a valid case to create a JIRA for.
>
> Thanks,
>
> Bryan
>
> On Thu, Mar 1, 2018 at 11:29 AM, Boris Tyukin 
> wrote:
> > Hello NiFi community,
> >
> > started using NiFi recently and fell in love with it! We run 1.6
> NiFi alone
> > with new NiFi registry and I am trying to figure out how to promote
> NiFi
> > flow, created in VM environment to our cluster.
> >
> > One of the things is "Concurrent Tasks" processor parameter. I bump
> it to 2
> > or 4 for some processors in my flow, when I develop / test it in VM.
> >
> > Then we deploy this flow to a beefy cluster node (with 48 cores) and
> want to
> > change concurrency to let's say 8 or 10 or 12 for some processors.
> >
> > Then I work on a new version/make some changes in my VM, and need to
> be more
> > shy with concurrency so set it back to 2 or 4.
> >
> > Then the story repeats...
> >
> > Is there a better way than to manually set this parameter? I do not
> believe
> > I can use a variable there and have to type the actual number of
> tasks.
> >
> >
> > Thanks
> 
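
The "any script which understands json" step might look like the following (a
sketch: the field names processors, processGroups and
concurrentlySchedulableTaskCount are assumptions based on the versioned-flow
snapshot format that nifi-toolkit-cli exports, and the processor names are
hypothetical):

    import json

    PROD_TASKS = {"FetchTable": 8, "PublishKafka": 12}   # hypothetical names

    def retune(group):
        for proc in group.get("processors", []):
            if proc["name"] in PROD_TASKS:
                proc["concurrentlySchedulableTaskCount"] = PROD_TASKS[proc["name"]]
        for child in group.get("processGroups", []):     # recurse into nested groups
            retune(child)

    with open("flow-dev.json") as f:
        snapshot = json.load(f)

    retune(snapshot["flowContents"])

    with open("flow-prod.json", "w") as f:
        json.dump(snapshot, f, indent=2)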

Re: Registry, sdlc and promotion between environments

2018-02-21 Thread Andrew Grande
Yes, Georg, there's something coming up to address exactly that, please
take a look at https://github.com/apache/nifi/pull/2477

Andrew

On Wed, Feb 21, 2018, 2:45 AM Georg Heiler 
wrote:

> Hi
>
> Can I use the new nifi registry to promote / move my nifi flows between
> environments?
>
> To me it currently looks like I can only click around in a single
> environment and then follow up / diff the changes over time via the
> registry.
>
> Also can the registry integrate with git?
>
> Best
> Georg
>


Re: Routing Files Based on Dynamic Pattern Matching

2018-02-20 Thread Andrew Grande
I think this is exactly what a Lookup Service was designed to do. You are
free to implement any logic of yours behind the scenes.

Andrew

On Mon, Feb 19, 2018, 5:12 PM Shawn Weeks <swe...@weeksconsulting.us> wrote:

> The problem I see with the Scan Processors is they don’t return anything
> other than matched or not matched. And Lookup Services assume you have a
> specific key you’re trying to match. I need to look for regular expression
> matches by comparing against a list of regular expressions not specific
> values. Let’s say I have a file named 123_blah_test_blah.csv, I might have
> a regular expression defined that says if the file name matches
> .*blah_test_blah(?!_somethingelse).* then it should goto directory
> /tmp/blah_test
>
>
>
> Thanks
>
> Shawn
>
>
>
> *From:* Andrew Grande <apere...@gmail.com>
> *Sent:* Monday, February 19, 2018 3:48 PM
> *To:* users@nifi.apache.org
> *Subject:* Re: Routing Files Based on Dynamic Pattern Matching
>
>
>
> Maybe take a look at one of the ScanContent/Attribute processors? It lets
> you map in a reloadable external file.
>
>
>
> Or, better yet, one of the LookupService variants, which is more generic.
>
>
>
> HTH,
>
> Andrew
>
> On Mon, Feb 19, 2018, 3:10 PM Shawn Weeks <swe...@weeksconsulting.us>
> wrote:
>
> Hi, I’m looking for some ideas on how to handle a workflow I’m developing.
> I have NiFi monitoring a drop off location where files are delivered. Files
> from this drop off location need to be routed to various target directories
> based on a series of file name patterns. Currently I have a custom
> processor I developed that queries a database table comparing the incoming
> file name against known file name patterns stored as regular expressions
> and attaches those attributes to the flow file. I feel like there is a
> better way to do this but I’m still fairly new to NiFi.
>
>
>
> Thanks
>
> Shawn Weeks
>
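
The lookup boils down to an ordered rule table. Prototyped in Python with the
regular expression from the example above (the second rule is made up for
illustration; in practice the rules would be reloaded from the database):

    import re

    RULES = [
        (re.compile(r".*blah_test_blah(?!_somethingelse).*"), "/tmp/blah_test"),
        (re.compile(r"^\d+_orders_.*\.csv$"), "/data/orders"),   # hypothetical
    ]

    def route(filename):
        for pattern, target in RULES:
            if pattern.search(filename):
                return target
        return None   # unmatched -> send to an 'unmatched' relationship

    assert route("123_blah_test_blah.csv") == "/tmp/blah_test"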
>


Re: Modularize nifi flows

2018-02-19 Thread Andrew Grande
A Process Group would serve as a reusable unit, which then has version
control applied to it. You would model your flows to abstract interactions
with e.g. in/output port objects.

They also mentioned that nested PGs can be versioned separately, which looks
like git submodule behavior, but I haven't tried multi-level versioned PGs yet.

Andrew

On Mon, Feb 19, 2018, 4:46 PM Georg Heiler 
wrote:

> How can I modularize and reuse parts of my nifi flows?
> I.e. Have a shared logging and error handling strategy which can be
> centrally configured for all processor groups.
>
> Best Georg
>


Re: Routing Files Based on Dynamic Pattern Matching

2018-02-19 Thread Andrew Grande
Maybe take a look at one of the ScanContent/Attribute processors? It lets
you map in a reloadable external file.

Or, better yet, one of the LookupService variants, which is more generic.

HTH,
Andrew

On Mon, Feb 19, 2018, 3:10 PM Shawn Weeks  wrote:

> Hi, I’m looking for some ideas on how to handle a workflow I’m developing.
> I have NiFi monitoring a drop off location where files are delivered. Files
> from this drop off location need to be routed to various target directories
> based on a series of file name patterns. Currently I have a custom
> processor I developed that queries a database table comparing the incoming
> file name against known file name patterns stored as regular expressions
> and attaches those attributes to the flow file. I feel like there is a
> better way to do this but I’m still fairly new to NiFi.
>
>
>
> Thanks
>
> Shawn Weeks
>


Re: NiFi Registry Create Item in the UI

2018-02-10 Thread Andrew Grande
Looking more at the UI. Not sure listing all items across buckets is a very
scalable approach. I expect to have lots and lots of buckets and many many
versions within a bucket. Having more of a 'nested' navigation, where one
can dive into a bucket, should provide for a more responsive UI and, as a
benefit, make a logical place for the New/Import button.

And yes, working from the cli and taking a registry-first approach is what
drives all these activities.

Cheers,
Andrew

On Sat, Feb 10, 2018 at 2:41 PM Bryan Bende <bbe...@gmail.com> wrote:

> Andrew,
>
> I agree it is a little confusing, especially when you first enter a
> brand new registry instance with no data.
>
> However, the main page of the registry is not a listing of buckets, it
> is a listing of "items" across all buckets, where currently the only
> time of item is a flow. You then click on an item to see the details
> which includes the version list. So I'm not sure if having a "New
> Bucket" button on the main page makes sense when you aren't looking at
> the list of buckets.
>
> Regarding "New Flow" and "New Version"...
>
> Currently these are not options from with in the registry because they
> are expected to happen from NiFi. However, if we wanted to add the
> ability to export/import between registries (which I think we should)
> then I agree with your suggestions. There could be an "Import" button
> at the top-level which would create the flow and the first version as
> one operation, I view this as what you are calling "New Flow", and
> then there could be an "Import Version" action from the actions
> drop-down of a specific flow.
>
> -Bryan
>
>
>
> On Sat, Feb 10, 2018 at 11:39 AM, Andrew Grande <apere...@gmail.com>
> wrote:
> > Hi,
> >
> > I wanted to share some feedback and suggestions on the registry UI. It
> was a
> > little unnatural to look for a New Bucket button in the Settings menu.
> Same
> > sentiment with the Create Flow for when we expect to be able to import
> it in
> > the UI in the near future.
> >
> > How about making the (+) or New button much more prominent and where one
> > would expect it? I propose:
> >
> > New Bucket action in the top menu on the main screen. Can hide if no
> > permission.
> > New Flow button once inside the bucket.
> > New Flow version once inside a flow.
> >
> > The semantics of the action change between Create and Import depending on
> > the context, but the expected action button location should be the least
> > confusing.
> >
> > Makes sense?
> >
> > Andrew
>


NiFi Registry Create Item in the UI

2018-02-10 Thread Andrew Grande
Hi,

I wanted to share some feedback and suggestions on the registry UI. It was
a little unnatural to look for a New Bucket button in the Settings menu.
Same sentiment with the Create Flow for when we expect to be able to import
it in the UI in the near future.

How about making the (+) or New button much more prominent and where one
would expect it? I propose:

   - New Bucket action in the top menu on the main screen. Can hide if no
   permission.
   - New Flow button once inside the bucket.
   - New Flow version once inside a flow.

The semantics of the action change between Create and Import depending on
the context, but the expected action button location should be the least
confusing.

Makes sense?

Andrew


Re: Persistence mechanism in Nifi

2018-01-18 Thread Andrew Grande
I don't think you need to do anything. If your target is down, the data is
stored in the NiFi connection. You can raise the backpressure limits as
long as you have enough disk space (I don't recommend disabling it).

Andrew

On Thu, Jan 18, 2018, 12:31 AM Vikram KR  wrote:

> Hi,
> I have a NiFi data flow which reads from a source and writes to a Kafka
> target.
> I'm writing my custom processor for fetching from the source and it is
> supposed to keep reading data and not apply any back pressure. What I want
> to do is that even if my target goes down I want my source to read data and
> persist it so that when the target comes up the data can be written to it.
> Does NiFi allow this? If so how to configure it for the intermediate
> persistence?
>
>
> Regards,
> Vikram
>


Re: Polling Processors impact on Latency

2017-11-07 Thread Andrew Grande
Yes, polling increases latency in some cases. But no, NiFi is not just
polling. It has all kinds of sources, and listening vs polling vs
subscribing purely depends on the protocol of that given processor.

Hope this helps,
Andrew

On Tue, Nov 7, 2017, 1:39 AM Chirag Dewan  wrote:

> Hi All,
>
> I am a layman to NiFi. I am exploring NiFi as a data flow engine to be
> integrated with my Flink processing engine. A brief history of our approach
> :
>
> We are trying to build a Streaming Data processing engine. We started off
> with Flink as the sole core engine, which is responsible for
> collection (through Flink Sources) as well as processing the data.
>
> Soon we fumbled onto NiFi and the data flow world.
>
> So far, my understanding is that the NiFi processors are polling processors
> and not pub-sub processors. That makes me wonder, what's the impact of
> polling on latency? I know I can configure my processors to trade off
> latency against throughput, but is there a hard limit on the latency I can
> achieve using NiFi?
>
> As I said, I am a layman as yet. Perhaps my understanding falls short here. Any
> leads would be much appreciated.
>
> P.S - Not diving much into Event Driven Processors. They look like
> something which might clear my thoughts. But since they are marked
> experimental, would be more interested in understanding the timer driven
> processors.
>
> Thanks,
>
> Chirag
>
>


Re: Back pressure deadlock

2017-10-23 Thread Andrew Grande
I wonder which jms broker you are using. The situation where a jms
destination is full is absurd; the whole point was to decouple publishers
and consumers. I would additionally look into what jms broker settings are
available to address the situation.

Andrew

On Mon, Oct 23, 2017, 10:32 AM Arne Degenring 
wrote:

> Hi Mark,
>
> Don’t get me wrong, NiFi is great! Much appreciated that it is constantly
> being improved. Would be great if better support for looping connections
> would be one of those improvements in the future :-) In the meantime, we
> can live with one of the solutions you suggested. Thanks for describing the
> options!
>
> Keep up the good work!
> Arne
>
>
> On 23. Oct 2017, at 16:05, Mark Payne  wrote:
>
> Arne,
>
> Fair enough. NiFi could perhaps be smarter about looping connections
> instead of stopping at self-loops.
>
> Another approach to this situation, which I have used, though, would be
> rather than having a flow that loops like you laid out
> with PublishJMS -> LogAttribute -> Back to PublishJMS,
> you could instead connect the 'failure' relationship to both PublishJMS as
> a self-loop and also connect it to the LogAttribute (or alerting
> processor or whatever you have), and then set an age-off on that
> connection. So in this setup, even if the log/alerting processor
> was having trouble, you'd not cause back pressure to be applied to
> PublishJMS because of the age-off. Typically in such a situation,
> sending data to some sort of alerting/status publishing case, it is the
> case that age-off is appropriate (though granted it may not be 100%
> of the time).
>
> Another useful approach to consider in such a case may actually be to have
> Reporting Tasks [1] that would monitor the flow for large queues,
> etc. While you can build such monitoring capabilities into the flow, I am
> a fan personally of 'pulling up' this logic out of the flow because it tends
> to result in much cleaner, easier-to-understand, and easier-to-implement
> flows.
>
> So I'm certainly not saying that what NiFi does is correct and perfect and
> can't be improved upon - any solution can probably be improved upon,
> and NiFi is certainly rapidly improving each day. But I wanted to point
> out some ways that you can think about attacking the concerns that you
> have with the current implementation.
>
> Thanks!
> -Mark
>
>
> [1]
> http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#Reporting_Tasks
>
>
>
> On Oct 23, 2017, at 9:45 AM, Arne Degenring 
> wrote:
>
> Hi Mark,
>
> Thanks for clarifying that self-looping connections will still be
> processed in back pressure situations.
>
> For this specific case, we can probably live without the additional
> routing to the logging component and back.
>
> I think, however, that there are cases when such ping-pong routing in
> failure cases can be very useful. E.g. for alerting someone actively,
> publishing some information on a status page, ... etc.
>
> Therefore I feel it would be great if NiFi could be extended to avoid such
> back pressure deadlock situations. Maybe through some kind of automatic
> deadlock detection, or by marking certain incoming relations as not back
> pressure relevant (same as self-looping connections).
>
> Thanks,
> Arne
>
>
> On 23. Oct 2017, at 15:00, Mark Payne  wrote:
>
> Hi Arne,
>
> Generally, the approach that is used in such a situation would be to route
> failure back to the PublishJMS processor
> itself (without diverting first to a LogAttribute processor). The
> PublishJMS processors itself should be logging an error
> with the FlowFile's identity. Then, troubleshooting can be done by
> inspecting the queue (right-click, List Queue) or
> via Data Provenance [1]. When a processor encounters backpressure, it
> still will continue to process data that comes
> in on self-looping connections. So the failure relationship would still
> get processed.
>
> Does this help?
>
> Thanks
> -Mark
>
>
>
> [1]
> http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#data_provenance
>
>
>
> On Oct 23, 2017, at 6:46 AM, Arne Degenring 
> wrote:
>
> Hi,
>
> We came across a situation when we experience a kind of “back pressure
> dead lock”.
>
> In our setup, this occurs around PublishJMS when the target JMS queue is
> full. Please find attached a screenshot of the relevant flow.
>
> The failure relation we route to a logging component, and then back to
> PublishJMS for retry. Sooner or later, the failure and retry queues will
> become full and produce backpressure towards the main input (which is
> good). The problem is that the same back pressure is also applied to the
> retry queue.
>
> In this situation, PublishJMS will not be called at all any longer. Even
> when the JMS problem resolves, the whole thing stays deadlocked.
>
> Is there a recommended way to avoid such situation?
>
> Obviously, an admin can 

Re: Dynamically adding avro schema to AvroSchemaRegistry at runtime

2017-10-23 Thread Andrew Grande
Hi,

Using an external schema registry is the way to go. The embedded one is
meant for ease of use, but once you grow beyond an initial phase, the best
practice is to have a full service and potentially use the standard InvokeHTTP
to perform operations beyond just lookups.

Does it help?
Andrew

On Mon, Oct 23, 2017, 6:43 AM Sönke Liebau 
wrote:

> Hi everybody,
>
> I am developing a custom ingest processor that writes data out as a binary
> avro stream. Currently I simply store the schema in an attribute of the
> flowfile and that works quite nicely with an AvroRecordReader deserializing
> it.
>
> To make things "nicer" I thought I'd have a look at the AvroSchemaRegistry
> and use that to store and look up schemas. However I cannot find a way for
> my processor to register a schema with the registry, but only to retrieve
> schemas [1]. I understand that I can manually add the schema as a dynamic
> property, but what I want to accomplish is that the processor can
> automatically add evolving schemas to the registry at runtime.
>
> Am I missing something obvious here, or is the registry simply not
> supposed to work like that?
>
> Kind regards,
> Sönke
>
> [1]
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-registry-bundle/nifi-registry-service/src/main/java/org/apache/nifi/schemaregistry/services/AvroSchemaRegistry.java#L83
>
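
What the InvokeHTTP registration call would carry, sketched against a
Confluent-style registry API (the host is hypothetical; adapt the endpoint to
whichever registry you actually run):

    import json
    import requests

    REGISTRY = "http://schema-registry:8081"          # hypothetical host
    SUBJECT = "my-ingest-value"

    schema = {
        "type": "record", "name": "Reading",
        "fields": [{"name": "id", "type": "long"},
                   {"name": "value", "type": "double"}],
    }

    resp = requests.post(
        f"{REGISTRY}/subjects/{SUBJECT}/versions",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        data=json.dumps({"schema": json.dumps(schema)}),  # schema sent as an escaped string
    )
    resp.raise_for_status()
    print("registered schema id:", resp.json()["id"])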


Re: org.apache.nifi.spark.NiFiReceiver and spark spark.streaming.receiver.maxRate

2017-09-18 Thread Andrew Grande
A typical production setup is to use Kafka in the middle.

Andrew

On Mon, Sep 18, 2017, 3:02 AM Margus Roo  wrote:

> Hi
>
> I need to consume a flow from a Nifi port with Spark Streaming. As we know
> Spark supports spark.streaming.receiver.maxRate and
> spark.streaming.receiver.backpressure
>
> Seems that org.apache.nifi.spark.NiFiReceiver doesn't support it at all.
>
> https://github.com/apache/nifi/tree/master/nifi-external/nifi-spark-receiver/src/main/java/org/apache/nifi/spark
> - code is quite old too.
>
> My question is - what is the recommended approach today for getting a stream
> from nifi with Spark? Is
>
> https://github.com/apache/nifi/tree/master/nifi-external/nifi-spark-receiver/src/main/java/org/apache/nifi/spark
> the best we have? If it is, then what is the best approach to integrate
> spark maxRate or backpressure in it?
>
> --
> Margus (margusja) Roo
> http://margus.roo.ee
> skype: margusja
> https://www.facebook.com/allan.tuuring
> +372 51 48 780
>
>


Re: Re: QueryDatabaseTable - Deleted Records

2017-09-16 Thread Andrew Grande
An interesting architectural approach we took eons ago, before NiFi, was
to take daily snapshots of a full table. Every row would then be
hashed/digested or in some other way uniquely identified, and the two
datasets would be cross-compared to find inserts/deletes/updates. It was
involved, but worked.

Andrew

On Sat, Sep 16, 2017, 2:38 AM Uwe Geercken  wrote:

> Bryan,
>
> yes, the change log would be possible. In my use case I have Oracle 11 as
> the source - and I can not change the source easily (takes long - is
> expensive).
>
> I was expecting this answer but wanted to make sure that I have not missed
> anything. I will try to build my use case around something else then.
>
> Thanks for your response(s).
>
> Rgds,
>
> Uwe
>
> *Gesendet:* Freitag, 15. September 2017 um 16:15 Uhr
> *Von:* "Bryan Bende" 
> *An:* users@nifi.apache.org
> *Betreff:* Re: QueryDatabaseTable - Deleted Records
> Uwe,
>
> Typically you need to process the change log of the database in this
> case, which unfortunately usually becomes database specific.
>
> I believe we have a processor CaptureChangeMySQL that can process the
> MySQL change log.
>
> -Bryan
>
>
> On Tue, Sep 12, 2017 at 1:39 PM, Uwe Geercken  wrote:
> > Hello,
> >
> > apparently the QueryDatabaseTable processor catches changes made to the
> data
> > of the source database - updates and inserts.
> >
> > Does anybody have a good idea or strategy for how to handle deletes in the
> > source database? Of course one could flag a record as deleted instead of
> physically
> > deleting it. But this means changing the source system in many cases and
> > that is sometimes not possible. And yes, if you process the change log
> (if
> > available) of the source system that is also a good option.
> >
> > Would be grateful for any tips or a best practice of how you do it.
> >
> > Rgds,
> >
> > Uwe
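
The snapshot-diff approach in outline (a sketch assuming full daily extracts
land as CSV with an "id" key column; file and column names are illustrative):

    import csv
    import hashlib

    def digest(path, key="id"):
        """Map each row's key to a digest of the whole row."""
        rows = {}
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                payload = "|".join(f"{k}={row[k]}" for k in sorted(row))
                rows[row[key]] = hashlib.sha256(payload.encode()).hexdigest()
        return rows

    old, new = digest("snapshot_day1.csv"), digest("snapshot_day2.csv")

    inserts = new.keys() - old.keys()
    deletes = old.keys() - new.keys()
    updates = {k for k in new.keys() & old.keys() if new[k] != old[k]}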
>


Re: Missing nifi-app.log files

2017-08-17 Thread Andrew Grande
Typically this is better handled by using the -F switch instead of -f; it
has more robust file handling and copes with files disappearing correctly.
Unfortunately, some OS don't have that switch in their toolchain.

Andrew

On Thu, Aug 17, 2017, 5:07 PM James McMahon  wrote:

> Interesting. Thank you Russ. I wonder whether I somehow interrupted or
> deleted the log file when I Ctrl-C'ed out of the tail -f? I'll have to
> test that and see. -Jim
>
> On Thu, Aug 17, 2017 at 4:57 PM, Russell Bateman 
> wrote:
>
>> James,
>>
>> It's the case that, NiFi running, deleting a log file will result in that
>> file no longer existing and no longer written to again until NiFi is
>> restarted. This is my observation anyway.
>>
>> Hope this observation is useful.
>>
>> Russ
>>
>> On 08/17/2017 02:11 PM, James McMahon wrote:
>>
>> Thank you Joe. I agree and will monitor it closely going forward. I
>> suspect there were some external factors at play here.
>>
>> On Thu, Aug 17, 2017 at 4:05 PM, Joe Witt  wrote:
>>
>>> Ok, if 50,000 is the max then I'm doubtful that it ran out.
>>>
>>> In the event of exhaustion of allowed open file handle count NiFi will
>>> run but its behavior will be hard to reason over.  That means it
>>> cannot create any new files or open existing files but can merely
>>> operate using the handles it already has. It is a situation to avoid.
>>>
>>> As far as what actually happened resulting in logfile issues it is not
>>> easy to tell at this stage but should be monitored for system state
>>> when it happens again.
>>>
>>> Thanks
>>>
>>>
>>> On Thu, Aug 17, 2017 at 1:02 PM, James McMahon 
>>> wrote:
>>> > 50,000.
>>> > Is NiFi robust enough that it can continue to run without the log file
>>> for
>>> > write attempts?
>>> > It is back up and running like a champ now, so I will keep an eye on
>>> it.
>>> >
>>> > On Thu, Aug 17, 2017 at 3:40 PM, Joe Witt  wrote:
>>> >>
>>> >> It sounds like a case of exhausted file handles.
>>> >>
>>> >> Ulimit -a
>>> >>
>>> >> How many open files are allowed for the user nifi runs as?
>>> >>
>>> >> On Aug 17, 2017 12:26 PM, "James McMahon" 
>>> wrote:
>>> >>>
>>> >>> Our nifi instance appeared to be running fine but we noticed that
>>> there
>>> >>> were no log files for today in the logs subdirectory. We could not
>>> find any
>>> >>> nifi logs for today anywhere on our system.
>>> >>>
>>> >>> I was surprised that NiFi continued to run. Has anyone experienced
>>> such
>>> >>> behavior?
>>> >>>
>>> >>> How is NiFi able to continue to run without a nifi-app.log - do all
>>> its
>>> >>> log messages effectively go to bit bucket heaven?
>>> >>>
>>> >>> I ultimately did an orderly shutdown via
>>> >>> service nifi stop
>>> >>> and an orderly start via
>>> >>> service nifi start
>>> >>> after which the log files were there as expected.
>>> >>>
>>> >>> Thanks in advance for any insights. -Jim
>>> >>>
>>> >>>
>>> >>>
>>> >
>>> >
>>>   
>>>
>>
>>
>>
>


Re: Parameterizing the nifi flow

2017-08-11 Thread Andrew Grande
Hi,

Read up on the variable registry in the docs; that sounds like a good fit.
I don't remember if it was available in 1.1, though.

Andrew

On Fri, Aug 11, 2017, 5:12 PM More, Vikram (CONT) <
vikram.m...@capitalone.com> wrote:

> Hi,
>
>
>
> I have a nifi flow which pulls/extracts from source database table and
> loads into target database table. This flow will run several times in a day
> to get delta records from source table (more like batch process running
> every 3-4 hrs). Now I need to replicate this same process for 100+
> different source tables. So rather than creating 100+ nifi flows for each
> separate table, can I create main flow (let's say template) and pass
> parameter like source extract sql, target load sql to main flow. And repeat
> these steps for each source table . Has anyone tried parameterizing the
> nifi flows, can you please advice . We are using NiFi 1.1.0
>
>
>
> Appreciate any thoughts here.
>
>
>
>
>
> Thanks & Regards,
>
> *Vikram*
>
>
>
> --
>
> The information contained in this e-mail is confidential and/or
> proprietary to Capital One and/or its affiliates and may only be used
> solely in performance of work or services for Capital One. The information
> transmitted herewith is intended only for use by the individual or entity
> to which it is addressed. If the reader of this message is not the intended
> recipient, you are hereby notified that any review, retransmission,
> dissemination, distribution, copying or other use of, or taking of any
> action in reliance upon this information is strictly prohibited. If you
> have received this communication in error, please contact the sender and
> delete the material from your computer.
>
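
For reference, the variable registry route looks roughly like this: point
nifi.properties at one or more custom properties files and reference the keys
with Expression Language in EL-enabled processor properties (the property name
is from later 1.x releases, so verify it exists in 1.1; the keys below are
hypothetical):

    # nifi.properties
    nifi.variable.registry.properties=./conf/tables.properties

    # conf/tables.properties
    source.extract.sql=SELECT * FROM orders WHERE updated_ts > ?
    target.table=orders_stage

A processor property that supports Expression Language can then use
${source.extract.sql} and ${target.table}. For 100+ tables, driving one
generic flow with per-table flow file attributes may scale better than one
flow per table.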


Re: ExecuteSQL with MSSQL Server Express

2017-07-19 Thread Andrew Grande
Try with forward slashes, please.

On Wed, Jul 19, 2017, 12:17 PM Praveen Reddy 
wrote:

> Hi,
>
> I am using ExecuteSQL processor to connect with MSSql server express to
> get some data. But I am getting the below exception:
>
> [image: Inline image 1]
>
> My Connection URL is:
> *jdbc:sqlserver:\\localhost\SQLEXPRESS:1433;Database=Test*
> Driver class name is* :
> com.microsoft.sqlserver.jdbc.SQLServerDriver*
> Db driver location: file*  : \\\C:\New
> folder\sqljdbc_6.2\enu\mssql-jdbc-6.2.1.jre8.jar*
>
> Can you please let me know if I am doing something wrong?
>
> Thank you.
>
> With Best Regards,
> Praveen.B
>
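
The backslashes are the issue here: the Microsoft JDBC URL takes forward
slashes after the scheme, with a named instance after a single backslash. A
corrected URL would look like this (assuming the default SQLEXPRESS instance;
databaseName is the property name the Microsoft driver documents):

    jdbc:sqlserver://localhost\SQLEXPRESS:1433;databaseName=Test

The driver location should likewise be a plain path such as
C:\New folder\sqljdbc_6.2\enu\mssql-jdbc-6.2.1.jre8.jar, without the \\\
prefix.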


Re: Processor classpath

2017-07-04 Thread Andrew Grande
It's not a classpath, but rather a user-configurable location, exposed as a
processor property. Does that help?

Andrew

On Tue, Jul 4, 2017, 4:32 PM James Srinivasan 
wrote:

> Hi,
>
> I'm developing a processor which needs to read some of its config from
> the classpath. Reading the docs etc., NiFi's classpath is a little
> funky - where's the best (least worst?) location for such files? I
> note that the HDFS processors can read their config (core-site.xml
> etc) from the classpath, but I can't find where that actually is.
>
> Thanks in advance,
>
> James
>


Re: Quickly find queues with data...

2017-05-05 Thread Andrew Grande
Pro tip - this connection backlog info is available in the json returned
from the status API. This is a good way to integrate with a monitoring
system via polling.

Andrew

On Fri, May 5, 2017, 5:41 PM Russell Bateman  wrote:

> Ah, I see. Thanks very much. (I spend too much time writing custom
> processors and not enough time dog-fooding so when I do, I tend to choke
> like a beginning NiFi user.)
>
>
> On 05/05/2017 03:37 PM, Joe Witt wrote:
>
> Go here: 
> https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#Summary_Page
>  It shows you how to get the summary view.  Click on connections.
> Sort on the column of interest.
>
> Thanks
>
> On Fri, May 5, 2017 at 5:35 PM, Russell Bateman  
>  wrote:
>
> Is there some NiFi power-user trick to quick-finding queues that have
> flowfiles sitting in them?
>
> I have a pretty big flow with lots of processor groups, processors, input-
> and output-ports, etc. and I ingest some files. I realize, however, that
> they don't all show up at the end, so I must go looking.
>
> I need to find out which queues contain the ones that never made it to the
> end. Have I no other choice than to inspect every processor group and every
> queue visually in search of my lost files?
>
> I know that processors that failed will have a red blob in their upper-right
> corner and that I'll find that the queue in front of them may contain files.
> I also know that a stopped processor's (i.e.: a processor I never started)
> preceding queue may contain files. But, is there a faster way than to
> inspect each processor, queue, etc?
>
> Thanks for any comments, experience, tricks, etc.
>
>
>
>


Re: Publish with persistence using PublishAMQP?

2017-05-05 Thread Andrew Grande
James, you make too much sense :) Mind filing a jira for that enhancement?
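
For the persistence case specifically, a minimal example (the value follows
standard AMQP semantics): in UpdateAttribute add a property named
amqp$deliveryMode with value 2 (2 = persistent, 1 = transient), then route
to PublishAMQP.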

Thanks,
Andrew

On Thu, May 4, 2017, 9:56 AM James McMahon  wrote:

> amqp$deliveryMode
>
> from the documentation:
> Attributes extracted from the FlowFile are considered candidates for AMQP
> properties if their names are prefixed with *amqp$* (e.g.,
> amqp$contentType=text/xml). To enrich message with additional AMQP
> properties you may use *UpdateAttribute* processor between the source
> processor and PublishAMQP processor. The following is the list of available
> standard AMQP properties: *("amqp$contentType", "amqp$contentEncoding",
> "amqp$headers", "amqp$deliveryMode", "amqp$priority", "amqp$correlationId",
> "amqp$replyTo", "amqp$expiration", "amqp$messageId", "amqp$timestamp",
> "amqp$type", "amqp$userId", "amqp$appId", "amqp$clusterId")*
>
> I assume folks use this approach, preceding PublishAMQP with an extra
> UpdateAttribute.
>
> If anyone does something differently, I'd welcome your feedback.
>
> A question that suggests itself: why aren't these options included as
> optional config params in PublishAMQP itself? Seems odd to require the extra
> UpdateAttribute step.
>
> On Thu, May 4, 2017 at 7:21 AM, James McMahon 
> wrote:
>
>> New to using PublishAMQP and interested in applying best practices. I've
>> made a mistake in my initial use, in which messages I posted to RabbitMQ
>> were gone from my queues after I restarted the message broker. Goofy
>> mistake, but thankfully I am in development prior to production use.
>>
>> I realize that I am not publishing persistently. I don't see any
>> immediate configuration option in the PublishAMQP processor to dictate
>> persistent publication to the exchange and bound queues.
>>
>> What is the best practice folks employ to publish messages that persist?
>> Thanks very much in advance for any insights.   - Jim
>>
>
>


Re: Input flowfile for GetHTTP

2017-05-02 Thread Andrew Grande
Try with InvokeHttp.

Andrew

On Tue, May 2, 2017, 3:42 AM Buntu Dev  wrote:

> I'm trying to connect ExtractText to the GetHTTP processor but that doesn't
> seem to be allowed. Wanted to check whether it's true that the GetHTTP
> processor doesn't accept input flowfiles and, if so, how I can figure out
> from the documentation which processors do not accept input flowfiles?
>
>
> Thanks!
>


Re: Apache nifi 1.0.0 consuming high CPU utilization

2017-05-02 Thread Andrew Grande
What is your flow doing? It very much depends on this. While 6 cores is not
great hardware to run a full NiFi system on, I'd guess ReplaceText and
EvaluateJsonPath are the hotspots. See how much data queues up in front
of these processors.

Andrew

On Tue, May 2, 2017, 5:09 AM kunal  wrote:

> Hi All,
>
> We are using NiFi for RESTful API development, and use
> mostly the below components.
> HttpRequestHandler
> HttpResponseHandler
> Route on attribute
> ReplaceText
> ExecuteSQL
> PutSQL
> EvaluateJsonPath
> InvokeHttp
> etc
> After development we noticed that CPU utilisation is very high: almost
> 300% on the 6-core processor, with a load average above 3. After changing
> the run schedule on HttpRequestHandler from 0 to 1, it reduced to 160-190,
> which is still very high. Also, there is no abnormal behaviour found in the
> nifi logs.
> Please suggest how to resolve this issue.


Re: csv output processor?

2017-05-01 Thread Andrew Grande
There is something interesting potentially coming out in 1.2.0: the new
pairs of RecordReaders/RecordSetWriters. I did see CSV format support in
there. Take another look maybe.

Andrew

On Fri, Apr 28, 2017, 4:55 PM Frank Maritato 
wrote:

> Is there a nifi processor that will output a bunch of flowfile attributes
> as delimited text (i.e. csv)? I had checked google a while ago and the
> suggestion was to use ReplaceText. The problem with this is that we have to
> add all the field delimiters, quotes and escaping of characters to each
> attribute manually.
>
> Thanks!
> --
> Frank Maritato
> Data Architect
>


Re: Back Pressure Object threshold not honored

2017-04-27 Thread Andrew Grande
I would also suggest the ControlRate processor in front of a sensitive
step. There's a chance data may accumulate and be sent in a burst after
downtime, so ControlRate ensures a predictable outflow.
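
A sketch of the configuration (property names are from the standard
ControlRate processor; the numbers are only an illustration):

    Rate Control Criteria = flowfile count
    Maximum Rate          = 100
    Time Duration         = 1 min

That caps the connection at 100 FlowFiles per minute no matter how large
the upstream backlog grows.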

Andrew

On Thu, Apr 27, 2017, 8:03 PM Joe Percivall  wrote:

> Hello Kevin,
>
> I believe there are two things at play here. The first is the processor
> being very fast and processing the FlowFiles before back pressure gets
> applied. The second is that in the current distribution, UpdateAttribute
> uses an old style of getting higher performance and grabs batches of 100
> with each onTrigger[1]. Since back-pressure gets applied per onTrigger the
> UpdateAttribute will process at least 100 FlowFiles before it gets told to
> stop processing.
>
> In the changes for 1.2.0 though I updated it to bring only 1 FlowFile in
> per onTrigger. So if you test this on a build of master then you should see
> more appropriate back-pressure application.
>
> [1]
> https://github.com/apache/nifi/blob/rel/nifi-1.1.2/nifi-nar-bundles/nifi-update-attribute-bundle/nifi-update-attribute-processor/src/main/java/org/apache/nifi/processors/attributes/UpdateAttribute.java#L338
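>
> An illustrative fragment of the difference, using the ProcessSession API
> (a sketch, not the actual UpdateAttribute code):
>
>     // old style: pulls up to 100 FlowFiles per trigger, so ~100 slip
>     // through before the per-onTrigger back pressure check engages
>     List<FlowFile> batch = session.get(100);
>
>     // 1.2.0 style: one FlowFile per trigger, so back pressure is
>     // re-checked after every FlowFile
>     FlowFile flowFile = session.get();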
>
> Joe
>
> On Thu, Apr 27, 2017 at 7:21 PM, Kevin Verhoeven <
> kevin.verhoe...@ds-iq.com> wrote:
>
>> Thank you for your help Andy. I think you are correct, the flowfiles are
>> very small and the previous Processor is very fast – this might explain
>> what is happening. I’ve enclosed screenshots of the connection properties
>> and the workflow. In the screenshot I see 400 flowfiles were allowed
>> through before back pressure was applied. The back pressure object
>> threshold is set to 1. Do you have any recommendations?
>>
>>
>>
>> Kevin
>>
>>
>>
>>
>>
>>
>>
>> *From:* Andy LoPresto [mailto:alopre...@apache.org]
>> *Sent:* Thursday, April 27, 2017 4:16 PM
>> *To:* users@nifi.apache.org
>> *Subject:* Re: Back Pressure Object threshold not honored
>>
>>
>>
>> Hi Kevin,
>>
>>
>>
>> Sorry to hear you are having this issue. Can you please provide a
>> screenshot of the connection properties in the configuration dialog? How
>> quickly do those flowfiles get enqueued? I think there’s a chance if they
>> are very small & the previous processor is very fast (i.e.
>> RouteOnAttribute, SplitText) that it could enqueue a higher number before
>> the back pressure check is executed.
>>
>>
>>
>> Andy LoPresto
>>
>> alopre...@apache.org
>>
>> *alopresto.apa...@gmail.com *
>>
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>
>>
>>
>> On Apr 27, 2017, at 4:07 PM, Kevin Verhoeven 
>> wrote:
>>
>>
>>
>> I have an odd problem. I set the Back Pressure Object threshold on a link
>> between two Processors to 1, but 200 flowfiles are passed to the queue
>> before back pressure is honored. I need the back pressure to be set to a
>> small number of flowfiles to keep the source from flooding the destination.
>> Has anyone come across this problem before? I am running 12 instances of
>> NiFi on version 1.1.1 on Ubuntu 14.04.
>>
>>
>>
>> Regards,
>>
>>
>>
>> Kevin
>>
>>
>>
>
>
>
> --
> *Joe Percivall*
> e: joeperciv...@gmail.com
>


Re: NPE in ListenSyslog processor

2017-04-25 Thread Andrew Grande
I wonder if the cause of the zero-length messages is the health check from
the F5 balancer. Worth verifying with your team.

Andrew

On Tue, Apr 25, 2017, 3:15 PM Andy LoPresto  wrote:

> PR 1694 [1] is available for this issue.
>
> [1] https://github.com/apache/nifi/pull/1694
>
> Andy LoPresto
> alopre...@apache.org
> *alopresto.apa...@gmail.com *
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Apr 25, 2017, at 10:07 AM, Conrad Crampton  > wrote:
>
> Hi,
> Thanks for the swift reply (as usual).
> NIFI-3738 created [1].
>
> I have passed this over to infrastructure to try to establish the cause of
> the zero-length datagrams, but at least I now know there isn't anything
> fundamentally wrong here and can (safely) ignore the errors.
>
> Thanks
> Conrad
>
>
> [1] https://issues.apache.org/jira/browse/NIFI-3738
>
> On 25/04/2017, 17:46, "Bryan Bende"  wrote:
>
>Hi Conrad,
>
>Line 431 of ListenSyslog has the following code:
>
>if (!valid || !event.isValid())
>
>So to get an NPE there means event must be null, and event comes from
> this code:
>
>boolean valid = true;
>try {
>event = parser.parseEvent(rawSyslogEvent.getData(), sender);
>} catch (final ProcessException pe) {
>getLogger().warn("Failed to parse Syslog event; routing to
> invalid");
>valid = false;
>}
>
>The parser returns null if the bytes sent in are null or length 0.
>
>We should be checking if (!valid || event == null || !event.isValid())
>to avoid this case, and I think a similar situation exists in the
>ParseSyslog processor. It appears this would only happen if parsing
>messages is enabled in ListenSyslog.
>
>Do you want to create a JIRA for this?
>
>The other question is why you are ending up with these 0 length
>messages, but that one I am not sure about. In the case of UDP, its
>just reading from a datagram channel into a byte buffer and passing
>those bytes a long, so I think it means its receiving a 0 byte
>datagram from the sender.
>
>Thanks,
>
>Bryan
>
>
>On Tue, Apr 25, 2017 at 12:31 PM, Conrad Crampton
> wrote:
>
> Hi,
>
> Been away for a bit from this community due to other work pressures, but
> picking up Nifi again and successfully upgraded to 1.1.2 (apart from
> screwing up one of the nodes temporarily).
>
> So, with the renewed interest in log processing our infrastructure team has
> put in an F5 load balancer to distribute the syslog traffic I am collecting
> to my 6 node cluster. This is to stop one node being the only workhorse for
> receiving syslog traffic. I had previously used the ‘standard’ pattern of
> having the ListenSyslog processor connect to a RPG and then the rest of my
> data processing flow receive via a local port – to effectively distribute
> the processing load. I was finding though that the single node was getting
> too many warnings about buffer, sockets being full etc. – hence the
> external
> load balancing.
>
>
>
> I am no load balancing expert, but what I believe happens is the F5 load
> balancer receives syslog traffic (over UDP) then distributes this load to
> all Nifi nodes (gives a bit of syslog traffic to each I believe). All
> appears fine, but then I start getting NPE in my node logs thus:
>
>
>
> 2017-04-25 17:16:34,832 ERROR [Timer-Driven Process Thread-7]
> o.a.n.processors.standard.ListenSyslog
> ListenSyslog[id=0a932c37-0158-1000--656754bf]
> ListenSyslog[id=0a932c37-0158-1000--656754bf] failed to process due
> to java.lang.NullPointerException; rolling back session:
> java.lang.NullPointerException
>
> 2017-04-25 17:16:34,833 ERROR [Timer-Driven Process Thread-7]
> o.a.n.processors.standard.ListenSyslog
>
> java.lang.NullPointerException: null
>
>at
>
> org.apache.nifi.processors.standard.ListenSyslog.onTrigger(ListenSyslog.java:431)
> ~[nifi-standard-processors-1.1.2.jar:1.1.2]
>
>at
>
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
> ~[nifi-api-1.1.2.jar:1.1.2]
>
>at
>
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1099)
> [nifi-framework-core-1.1.2.jar:1.1.2]
>
>at
>
> org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136)
> [nifi-framework-core-1.1.2.jar:1.1.2]
>
>at
>
> org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
> [nifi-framework-core-1.1.2.jar:1.1.2]
>
>at
>
> org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
> [nifi-framework-core-1.1.2.jar:1.1.2]
>
>at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [na:1.8.0_51]
>
>at 

Re: NiFi best practices to manage big flowfiles

2017-04-21 Thread Andrew Grande
Let me ask you this. All those processing CLI steps, do they change file
format, content, etc.? If yes, NiFi is not doing anything that you aren't
doing already. E.g. unpacking a file requires space for the original and
decompressed file to be available.

You can use ListFile and not move any files in NiFi. It will have a full
file path as an attribute which you can pass around to your tool
invocations.
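
Concretely (the attribute names are from the standard ListFile docs): a
ListFile -> ExecuteStreamCommand flow, with Command Arguments set to
${absolute.path}${filename}, hands your CLI tool the original path without
ever pulling the content into NiFi.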

HTH,
Andrew

On Fri, Apr 21, 2017, 7:17 AM Simone Giannecchini <
simone.giannecch...@geo-solutions.it> wrote:

> Dear Andrew,
> I am working with Damiano on this, so let me first thank you for your
> indications.
>
> The use case is as follows:
>
> - a satellite acquisition is placed on a shared file system. It can be
> significant in size, e.g. 10GB
> - it has to be pulled through a chain of operations out of a larger
> DAG, where the elements of the sequence are decided depending on the
> data itself
> - we will surely create a number of intermediate files as we are going
> to use standard CLI tools for the processing
> - the resulting file will be placed again in a shared file system to
> be served by a cluster of mapping servers to generate maps on the fly
>
> We are getting thousands of these files per day, hence we are trying to
> minimize file move operations.
>
> If you are still not sleeping, here is the point. Can I avoid, without
> having to customize too many parts of NiFi, bringing the original file
> into the content repository, or are we stretching NiFi too far from its
> intended usage patterns?
>
> Thanks for your patience.
>
>
> Regards,
> Simone Giannecchini
> ==
> GeoServer Professional Services from the experts!
> Visit http://goo.gl/it488V for more information.
> ==
> Ing. Simone Giannecchini
> @simogeo
> Founder/Director
>
> GeoSolutions S.A.S.
> Via di Montramito 3/A
> 55054  Massarosa (LU)
> Italy
> phone: +39 0584 962313
> fax: +39 0584 1660272
> mob:   +39  333 8128928
>
> http://www.geo-solutions.it
> http://twitter.com/geosolutions_it
>
>
> On Fri, Apr 21, 2017 at 1:01 PM, Andrew Grande <apere...@gmail.com> wrote:
> > Hi,
> >
> > First, there won't be multiple copies of a file within NiFi. If you pass
> > around the content and don't change it (only attributes), it will merely
> > hold a reference to it, no more.
> >
> > You need to decide if you want to delete processed files, this is what
> > GetFile does. Might want to look into ListFile/FetchFile instead, it
> > maintains internal state of files already processed.
> >
> > Assuming you want to delete the file from the original location, you can
> use
> > PutFile in your flow to write it to the new working directory and connect
> > the success relationship to ExecuteStreamCommand.
> >
> > Andrew
> >
> >
> > On Fri, Apr 21, 2017, 5:37 AM damiano.giampa...@geo-solutions.it
> > <damiano.

Re: NiFi best practices to manage big flowfiles

2017-04-21 Thread Andrew Grande
Hi,

First, there won't be multiple copies of a file within NiFi. If you pass
around the content and don't change it (only attributes), it will merely
hold a reference to it, no more.

You need to decide if you want to delete processed files, this is what
GetFile does. Might want to look into ListFile/FetchFile instead, it
maintains internal state of files already processed.

Assuming you want to delete the file from the original location, you can
use PutFile in your flow to write it to the new working directory and
connect the success relationship to ExecuteStreamCommand.

Andrew

On Fri, Apr 21, 2017, 5:37 AM damiano.giampa...@geo-solutions.it <
damiano.giampa...@geo-solutions.it> wrote:

> Hi list,
>
> I'm a NiFi newbie and I'm trying to figure out the best way to use it as a
> batch ingestion system for satellite imagery as raster files.
> The files are pushed on the FS by an external system and then they must be
> processed and published through WMS protocols.
> I tried to draft the flow using the NiFi processors available, summarizing
> I used:
>
> - GetFile and PutFile processors to watch for incoming files to process
> - UpdateAttributes to manage the location of the incoming files and the
> intermediate processing products
> - ExecuteStreamProcess to call the gdalwarp and gdaladdo command line
> utilities to do the geospatial processing needed (http://www.gdal.org/)
>
> The issue I found with my use case is that the flowfiles represent big
> raster files (1 to 6GB), and I would like to minimize the
> number of copies of that resource.
>
> I used the GetFile processor to watch a FileSystem folder for incoming
> files.
> Once a new file is found, it is imported in the NiFi internal content
> repository so I can't reference it with its absolute FS path anymore since
> it is transformed into a flowfile (Did I misunderstand something here?)
> So I have to copy it again somewhere else on the FS to process it, the
> geospatial processing utilities I have to use require the abs path of the
> files to process.
>
> There could maybe be a solution which better addresses the design of this
> flow; for example, I could watch not for the real file but for a txt file
> which describes its FS abs path.
>
> Anyway I was wondering if it is possible to configure the GetFile
> processors to use as flowfile payload only the absolute path of the file
> found, but I guess that in that case I have to write my own GetFile
> processor. (The same could apply to other Getxxx processors.)
>
>
> Does anyone have hints to suggest? Am I on the right path?
> I would like to be sure that I am not overlooking some NiFi concept/feature
> that would allow me to better manage this use case.
>
>
> I hope to have been clear enough; anything shared would be extremely
> appreciated!
>
>
> Best regards,
> Damiano
>
> --
>
> ==
> GeoServer Professional Services from the experts!
> Visit http://goo.gl/it488V for more information.
> ==
> Damiano Giampaoli
> Software Engineer
>
> GeoSolutions S.A.S.
> Via di Montramito 3/A
> 55054  Massarosa (LU)
> Italy
> phone: +39 0584 962313
> fax: +39 0584 1660272
> mob:   +39 333 8128928
>
> http://www.geo-solutions.it
> http://twitter.com/geosolutions_it
>

Re: Clustering Best Practices?

2017-04-20 Thread Andrew Grande
BTW, your NiFi instance is not single-threaded, it's a single node. It
still runs multiple worker threads in the flow.

Andrew

On Thu, Apr 20, 2017, 7:01 AM James McMahon  wrote:

> Good morning. I have established an initial single-threaded NiFi server
> instance for my customers. It works well, but I anticipate increasing usage
> as groups learn more about it. I also want to move beyond our
> single-threadedness.
>
> I would like to take the next step in the evolution of our NiFi
> capability, implementing a clustered NiFi server configuration to help
> me address the following requirements:
> 1. increase our fault tolerance
> 2. permit our configuration to scale to peak processing demands during
> bulk data loads and as more customers begin to leverage our NiFi instance
> 3. permit our configuration to load balance
>
> I do intend to begin by reading through the clustering sections in the
> NiFi Sys Admin guide. I am also interested in hearing from our user
> community, particularly regarding clustering "best practices" and practical
> insights based on your experiences. Thanks in advance for any insights you
> are willing to share.  -Jim
>


Re: Change data capture processor

2017-04-12 Thread Andrew Grande
Yes, it's in 1.2.0. However, it's not released yet. Should be soon; you can
track progress by following the 1.2.0 release email thread on the dev list.

Andrew

On Wed, Apr 12, 2017, 1:55 AM Buntu Dev  wrote:

> Is the new CaptureChangeMySQL processor available in 1.2 and is there any
> documentation available for this processor?
>
>  https://issues.apache.org/jira/browse/NIFI-3413
>
>
> Thanks!
>


Re: Logging changes to workflow

2017-04-10 Thread Andrew Grande
John, the Flow Configuration History menu item already captures it. Does it
look like a fit?

Andrew

On Sun, Apr 9, 2017, 10:50 PM HARRIOTT, John 
wrote:

> Is it possible to log changes to a workflow so they can be captured by a
> central logging/auditing capability, e.g. syslog?
>
> This will allow security auditors to detect changes to a system's
> configuration, e.g. the addition/removal/modification of processor(s).
>
>
>
> Thanks
>


Re: Re: new Nifi Processors

2017-03-01 Thread Andrew Grande
Basically, the GPL license puts restrictions on how one can distribute, in
practical terms. Meaning your work may live under a GPL license as long as
it's not part of the official package; end users will have to download your
NAR themselves.

Andrew

On Wed, Mar 1, 2017, 8:43 AM Matt Burgess  wrote:

> Uwe,
>
> Sorry for misspeaking, by "official Apache NiFi repo" I meant the
> Apache NiFi codebase (the "built-in" processors, e.g.). For the
> licensing part, if you distribute something with GPL binary
> dependencies, I believe the entire distribution must be licensed as
> GPL or something GPL-compatible.  This is not a bad thing, but due to
> Apache licensing, such a processor could not be accepted as-is into
> the NiFi codebase. Even LGPL binary dependencies are not allowed.
> However as you have made your processor available via your own repo,
> the community is free to download and use your processor under the
> terms of your license.  However if someone packaged up a NiFi
> distribution with a GPL-licensed processor (for example), they would
> not be allowed to distribute that package under an Apache 2.0 license;
> rather I believe the whole package would have to be licensed under the
> GPL.
>
> I am no licensing expert by any means, but I have had experience with
> these kinds of things, both for NiFi and other extensible open-source
> projects.
>
> Regards,
> Matt
>
> On Wed, Mar 1, 2017 at 7:01 AM, Uwe Geercken  wrote:
> > Matt,
> >
> > I did not know there is an official Apache Nifi repo. If you send me a
> link, I will have a look.
> >
> > Also, is there an official way of tagging, annotating or otherwise
> documenting the license model for a processor? At which point in the code
> or documentation do I have to place license information?
> >
> > I will check if the Apache license fits to my personal ideas of how my
> software should be protected. I am not a license expert, so I will have to
> spend some time to understand what that means. Also I need to check what it
> means for the software (and current users) if I change the license model.
> >
> > Anyway, this is still a first version of the processors. So they will
> mature over time and I hope at that point the extension registry is there.
> >
> > In general - as you know Matt - I am creating open source software
> (since 2000). I believe in the idea of open source and of sharing for the
> benefit of all of us.
> >
> > If I can, I will adjust whatever is necessary, so that the license is
> not a hurdle for using the processors. Nifi is a really great product and I
> still remember my first impression when I saw it.
> >
> > Greetings,
> >
> > Uwe
> >
> >> Sent: Wednesday, 1 March 2017 at 03:56
> >> From: "Matt Burgess" 
> >> To: users@nifi.apache.org
> >> Subject: Re: new Nifi Processors
> >>
> >> Uwe G has made his processors available (thank you!) via his own repo
> >> vs the official Apache NiFi repo; this may be directly related to your
> >> point about licensing.  Having said that, he is of course at liberty
> >> to license those separate processors as he sees fit (assuming it is
> >> also in accordance with the licenses he has employed).  Apache NiFi
> >> welcomes to its codebase Apache-friendly contributions (FAQ [1]), but
> >> alternatively and even before an Extension Registry [2] is supported,
> >> authors can make their NiFi processors and such available under the
> >> appropriate licenses.  If there are commercial (or other) entities
> >> looking to package such extensions with the official Apache NiFi
> >> distribution, they would be subject to the same terms of the License &
> >> Notice (L&N) of Apache NiFi as well as whatever extensions are added.
> >>
> >> Regards,
> >> Matt
> >>
> >> [1] https://www.apache.org/legal/resolved.html
> >> [2]
> https://cwiki.apache.org/confluence/display/NIFI/Extension+Repositories+%28aka+Extension+Registry%29+for+Dynamically-loaded+Extensions
> >>
> >>
> >> On Tue, Feb 28, 2017 at 9:33 PM, Angry Duck Studio
> >>  wrote:
> >> > Hi, Uwe,
> >> >
> >> > These look useful. However, typically custom processors are either
> Apache
> >> > 2.0 or MIT licensed. These don't seem to specify a license, but your
> >> > business rule engine (jare) seems to be GPL 3.0 licensed. I'm not
> sure that
> >> > fits with most uses of NiFi.
> >> >
> >> > Can you please clarify?
> >> >
> >> > Thanks
> >> >
> >> > -Matt
> >> >
> >> > On Tue, Feb 28, 2017 at 4:47 PM, Uwe Geercken 
> wrote:
> >> >>
> >> >> Hello everyone,
> >> >>
> >> >> I just wanted to let you know, that I have created four processors
> for
> >> >> Nifi
> >> >>
> >> >> 1) GenerateData - generates random data (test data) based on word
> lists,
> >> >> regular expressions or purely random
> >> >> 2) RuleEngine - a ruleengine which allows to process complex business
> >> >> logic. But the logic is maintained in a separate web app and thus
> 

Wait/Notify State in a Cluster

2017-02-16 Thread Andrew Grande
Hi Guys,

I've noticed the upcoming version has a wait/notify set of processors (great
write-ups by Koji, again). Does it support clustered state, though? Planned
maybe? For high-throughput scenarios, is it possible to switch it out to
alternative implementations if e.g. the default relies on ZK?

Thanks in advance!
Andrew


Re: Nifi taking forever to start

2017-02-15 Thread Andrew Grande
I'm not sure piggy-backing on the host entropy will work reliably. I have
seen this issue on EC2, OpenStack boxes, etc. A newly spun-up box will
often exhibit this issue.

Andrew

On Wed, Feb 15, 2017, 10:09 AM Bryan Rosander  wrote:

> Hey Arnaud,
>
> Andy's solution is definitely the right answer for Java applications in
> general (on docker or in vm or anywhere with more limited entropy).
>
> A more general way to take care of entropy issues in docker containers
> (applicable beyond NiFi) is to mount the host's /dev/random or /dev/urandom
> as the container's /dev/random. [1]
>
> If you want to use the host's /dev/random, the host machine will likely
> have significantly more entropy:
> -v /dev/random:/dev/random
>
> If you just want to force the container to use your host's /dev/urandom so
> it will never block for entropy (should be fine in the majority of cases
> [2]):
> -v /dev/urandom:/dev/random
>
> [1]
> http://stackoverflow.com/questions/26021181/not-enough-entropy-to-support-dev-random-in-docker-containers-running-in-boot2d#answer-26024403
> [2] http://www.2uo.de/myths-about-urandom/
>
> On Wed, Feb 15, 2017 at 5:15 AM, Andy LoPresto 
> wrote:
>
> Glad this fixed it and sorry it happened in the first place. This one is a
> personal antagonist of mine and I’ll be happy when it’s fixed for everyone.
> Good luck using the project.
>
> Andy LoPresto
> alopre...@apache.org
> *alopresto.apa...@gmail.com *
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Feb 15, 2017, at 2:09 AM, Arnaud G  wrote:
>
> Hi Andy,
>
> Thank you very much, and indeed it seems that you pointed at the right
> problem. The Docker host is running in a VM and it seems that I had a lack
> of entropy.
>
> I changed the entropy source to /dev/urandom and Nifi was able to start
> immediately.
>
> Thank you very much for your help
>
> Arnaud
>
> On Wed, Feb 15, 2017 at 10:41 AM, Andy LoPresto 
> wrote:
>
> Hi Arnaud,
>
> I’m sorry you are having trouble getting NiFi going. We want to minimize
> any inconvenience and get you up and running quickly.
>
> Are you by any chance running on a VM that does not have access to any
> physical inputs to generate entropy for secure random seeding? There is a
> known issue [1] (being worked on for the next release) where this can cause
> the application to block because insufficient entropy is available (without
> the physical inputs, there is not enough random data to properly seed
> secure operations).
>
> I recommend you check if this the case (run this command in your terminal
> — if it hangs, this is the cause):
>
> head -n 1 /dev/random
>
> If it hangs, follow the instructions on this page [2] to modify the Java
> secure random source (ignore the warning that this is “less secure” — this
> is an urban legend propagated by a misunderstanding in the Linux kernel
> manual pages [3]).
>
> Modify $JAVA_HOME/jre/lib/security/java.security to change
> securerandom.source=file:/dev/random to
> securerandom.source=file:/dev/urandom
>
>
> [1] https://issues.apache.org/jira/browse/NIFI-3313
> [2]
> https://docs.oracle.com/cd/E13209_01/wlcp/wlss30/configwlss/jvmrand.html
> [3] http://www.2uo.de/myths-about-urandom/
>
> Andy LoPresto
> alopre...@apache.org
> *alopresto.apa...@gmail.com *
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Feb 15, 2017, at 1:29 AM, Arnaud G  wrote:
>
> Hi guys!
>
> I'm trying to play with NiFi (1.1.1) in a Docker image. I tried different
> configurations (cluster, single node, secured, etc.), however whatever I
> try, NiFi takes forever to start (like 30-45 minutes). This is not related
> to data, as I observe this behavior even when I instantiate the Docker
> image for the first time.
>
> In the log it stops here:
>
> nifi-bootstrap.log
> 2017-02-14 08:52:34,624 INFO [NiFi Bootstrap Command Listener]
> org.apache.nifi.bootstrap.RunNiFi Apache NiFi now running and listening for
> Bootstrap requests on port 46553
>
> nifi-app.log
> 2017-02-14 08:53:11,225 INFO [main]
> o.a.nifi.properties.NiFiPropertiesLoader Loaded 121 properties from
> /opt/nifi/./conf/nifi.properties
>
> and then it waits for bootstrapping (if I set debug log level)
>
> Any idea what may cause this?
>
> Thanks in advance!
>
> AG
>
>
>
>
>
>
>
>


Re: nifi at AWS

2017-01-22 Thread Andrew Grande
Isn't it more advisable to use the HTTP mode instead, i.e. no additional
ports to open? Make sure to change the client RPG mode to http from RAW (in
the UI).
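
(Site-to-site over HTTP rides on the existing web API port; in 1.x it's
governed by nifi.remote.input.http.enabled in nifi.properties, which
defaults to true, though that's worth verifying for your version.)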

Andrew

On Sun, Jan 22, 2017, 10:47 AM Bryan Bende  wrote:

> Hello,
>
> I'm assuming you are using site-to-site since you mentioned failing to
> create a transaction.
>
> In nifi.properties on the AWS instance, there is probably a value for  
> nifi.remote.input.socket.port
> which would also need to be opened.
>
> -Bryan
>
> On Sat, Jan 21, 2017 at 7:00 PM, mohammed shambakey 
> wrote:
>
> Hi
>
> I'm trying to send a file from a local NiFi instance to a remote NiFi
> instance in AWS. Security rules at the remote instance have port 8080
> opened, yet each time I try to send the file, the local NiFi says it failed
> to create a transaction to the remote instance.
>
> Regards
>
> --
> Mohammed
>
>
>


Re: DetectDuplicate

2016-12-19 Thread Andrew Grande
Juan, no change from how you remember this processor yet. I personally
would love to have a more pluggable backend for it, too.

Andrew

On Mon, Dec 19, 2016, 2:35 PM Juan Sequeiros  wrote:

> Hello,
>
> I am wondering if DetectDuplicate still has single dependency on
> Distributed Cache Service?
> And if so can I assume that DetectDuplicate will fail if Distributed Cache
> server is down?
>
>
> I want to replace our DetectDuplicate solution ("external DB") and use
> NiFi's, but single-point reliance on the cache server is a blocker. Not
> sure if I am missing something; possibly it now uses ZooKeeper?
>
>
>


Re: How to integrate a custom protocol with Apache Nifi

2016-12-06 Thread Andrew Grande
Kant,

Look into a custom processor. You have a choice of either implementing e.g.
a parser for your data or, if the protocol is more involved, implementing a
receiver in the processor as well, which would emit meaningful data
messages next.
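
A minimal skeleton of such a source processor (a sketch against the public
nifi-api; the receive() helper is hypothetical and stands in for your
protocol logic):

    import java.nio.charset.StandardCharsets;
    import java.util.Collections;
    import java.util.Set;
    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.processor.AbstractProcessor;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;
    import org.apache.nifi.processor.Relationship;

    public class ListenMyProtocol extends AbstractProcessor {

        static final Relationship REL_SUCCESS = new Relationship.Builder()
                .name("success").description("Decoded messages").build();

        @Override
        public Set<Relationship> getRelationships() {
            return Collections.singleton(REL_SUCCESS);
        }

        @Override
        public void onTrigger(ProcessContext context, ProcessSession session) {
            byte[] message = receive(); // protocol-specific receive logic
            if (message == null) {
                return; // nothing arrived on this trigger
            }
            FlowFile flowFile = session.create();
            // stream the decoded bytes into the content repository
            flowFile = session.write(flowFile, out -> out.write(message));
            session.transfer(flowFile, REL_SUCCESS);
        }

        private byte[] receive() {
            // placeholder: wire your custom protocol client in here
            return "hello".getBytes(StandardCharsets.UTF_8);
        }
    }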

Andrew

On Tue, Dec 6, 2016, 7:47 PM kant kodali  wrote:

> Hi All,
>
> I understand that Apache NiFi has integrations with many systems, but what
> if I have an application that talks a custom protocol? How do I integrate
> Apache NiFi with the custom protocol?
>
> Thanks,
> kant
>


Re: Data Provenance is not available

2016-11-21 Thread Andrew Grande
Yes.

It's a combination of a generic view data permission in the global menu and
specific access in the process group.

Andrew

On Mon, Nov 21, 2016, 12:28 PM Pablo Lopez 
wrote:

> Hi,
>
> Any ideas as to why the Data Provenance option is grayed out (not
> available) and none of the processors show the option in the context menu?
> Is this something to do with security?
>
> Thanks,
> Pablo.
>


Re: NiFi 1.0.0 canvas background

2016-11-15 Thread Andrew Grande
Hi Russ,

Alignment or any snapping to a grid wasn't there before, but that would be
a very welcome feature, I agree.

Regarding the background, I personally didn't notice a change, isn't it the
same?

Andrew

On Tue, Nov 15, 2016, 3:14 PM Russell Bateman <
russell.bate...@perfectsearchcorp.com> wrote:

> Now that I'm working in the 1.x world, there's no way I can restore the
> nice, light grid from 0.x to 1.x in place of the heavy, dull "fabric"
> background of the canvas in the new version, is there?
>
> I've Googled and prowled around the interface with no success.
>
> Also, wasn't there a way to align rectangles? I thought I remembered
> seeing that in 0.x, but couldn't find how to do it there or in 1.x. Maybe I
> was having a LibreOffice Draw-inspired dream or something.
>
> Thanks,
>
> Russ
>


Re: Delay Processor

2016-11-15 Thread Andrew Grande
Joe,

It's good to know some thinking went into this feature before. Basically,
I'm trying to put a spotlight on these 2 areas:

   1. Making it, potentially, more generic than a retry loop. E.g. enhance
   the ControlRate processor.
   2. Making these policies more explicit, so a user wouldn't have to
   second-guess where to go to configure behavior. Here I don't have a
   strong opinion on whether a dedicated (or enhanced) processor or changes
   to the standard processor configuration screens will be the way.

Andrew


Re: Delay Processor

2016-11-15 Thread Andrew Grande
Oleg,

I'll break my response into 2 threads. I understand your use case of 'delay
until X', but frankly would design it differently if we're talking about
long-term transactions or schedules. It may involve systems external to
NiFi. Anyway, I'd like to keep this use case out of scope for the delay
processor, at least for the immediate discussion.

Now jumping to another thread..


Delay Processor

2016-11-15 Thread Andrew Grande
Hi,

I'd like to check where discussions are on this, or propose the new
component otherwise.

Use case: make delay strategies explicit, easier to use. E.g. think of a
failure retry loop.

Currently, ControlRate is somewhat related, but can be improved. E.g.
introduce delay strategies a la prioritizers on the connection?

Thinking out loud, something like an exponential backoff strategy could be
kept stateless by adding a number of housekeeping attributes to the FF,
which eliminates the need for any state in the processor itself.
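
A sketch of that stateless idea with today's building blocks (standard EL
functions): UpdateAttribute sets retry.count to
${retry.count:replaceNull(0):plus(1)} on each pass through the loop, and a
RouteOnAttribute check such as ${retry.count:gt(5)} routes the FlowFile out
of the loop once the limit is hit.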

I'll stop here to see if any ideas were captured prior and what the
community thinks of it.

Andrew


Re: stream one large file, only once

2016-11-14 Thread Andrew Grande
Neither GetFile nor FetchFile read the file into memory, they only deal
with the file handle and pass the contents via a handle to the content
repository (NiFi streams data into and reads as a stream).

What you will face, however, is an issue with a SplitText when you try to
split it in 1 transaction. This might fail based on the JVM heap allocated
and file size. A recommended best practice in this case is to introduce a
series of 2 SplitText processors: the 1st pass would split into e.g. 10,000
row chunks, the 2nd into individual rows. Adjust for your expected file
sizes and available memory.
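
A sketch of the two-stage layout (the property name is from the standard
SplitText processor):

    source -> SplitText (Line Split Count = 10000)
           -> SplitText (Line Split Count = 1)
           -> downstream processing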

HTH,
Andrew

On Mon, Nov 14, 2016 at 7:23 AM Raf Huys  wrote:

> I would like to read in a large (several gigs) file of logdata, and route every
> line to a (potentially different) Kafka topic.
>
> - I don't want this file to be in memory
> - I want it to be read once, not more
>
> using `GetFile` takes the whole file in memory. Same with `FetchFile` as
> far as I can see.
>
> I also used an `ExecuteProcess` processor in which the file is `cat`-ed and
> which splits off a flowfile every millisecond. This looked to be a somewhat
> streaming approach to the problem, but this processor runs continuously (or
> cron based) and by consequence the logfile is re-injected all the time.
>
> What's the typical NiFi approach for this? Tx
>
> Raf Huys
>


Re: unable to empty the connection queue between 2 processors in NIFI secure cluster

2016-11-13 Thread Andrew Grande
Hi,

There are 2 levels basically. One is the global policies in the top right
menu. Another is in the operator menu on the left and is specific to every
process group.

Sometimes you need a combination of both to allow for an action. E.g. try
data provenance and modify data permissions to allow emptying a queue.

Andrew

On Sun, Nov 13, 2016, 10:11 PM yinwencai Ywc  wrote:

> Hi guys, I've just setup a secure NIFI 1.0.0 cluster and tried to check
> how NIFI cluster works.
>
> I set up my NIFI secure cluster with LDAP based authorization and set the
> Initial Admin Identity to one of the users inside the LDAP server.
> I could successfully log into the NIFI user interface and could do almost
> anything inside, but when I tried to empty the connection queue between 2
> processors inside a processor group,
> it told me I don't have enough permissions to do it. I checked the
> policies menu inside NiFi and gave this user all possible permissions,
> but it still failed. You can see the snapshots
> below:
>
> [screenshots omitted]
>
> Does anyone have any idea why this would happen? Thanks.
>


Re: Enable Compression on Remote Port?

2016-11-11 Thread Andrew Grande
Disable transmission on RPG, go into the ports view again. Now, you should
be able to modify settings like compression, concurrent threads and
security on ports.

Andrew

On Thu, Nov 10, 2016, 2:26 PM Peter Wicks (pwicks) 
wrote:

> When I have a Remote Process Group and I view its Remote Ports I can see
> that all my ports show “Compressed” as No.  How can I change this so that
> the ports use compression?
>


Re: Trouble with PublishKafka10

2016-10-30 Thread Andrew Grande
2 things to check:

1. Are you connecting to a Kafka 0.10 broker?
2. Which port are you using? Recent Kafka clients must point to the Kafka
broker port directly. Older clients connected through ZooKeeper and used a
different host/port.
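
(For example, brokers typically listen on host:9092, while the old
ZooKeeper-based clients pointed at host:2181; the exact ports depend on
your installation.)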

Andrew

On Sun, Oct 30, 2016, 1:11 PM Daniel Einspanjer <
daniel.einspan...@designet.com> wrote:

> These db rows are fairly small. Five or six small text fields and five or
> six integer fields plus a timestamp.  Looking at the queue in NiFi, about
> 440 bytes each.
>
>
> If I exported the template correctly, the flow should show that I am
> trying to generate a set of query statement flow files, each 10k records,
> and then executesql on them.  Next I use SplitAvro and AvroToJSON to get a
> set of flow files, each one record in json format.  Those are what I'm
> feeding in to Kafka.
>
> -Daniel
>
> On Oct 30, 2016 11:00 AM, "Joe Witt"  wrote:
>
> Daniel
>
> How large is each object you are trying to write to Kafka?
>
> Since it was working based on a different source of data but is now
> problematic, that is the direction I am looking in, in terms of changes.
> The output of the db stuff could need demarcation, for example.
>
>
>


Re: How I put the cluster down.

2016-10-28 Thread Andrew Grande
Hi,

I'd suggest a couple of things. Have you configured backpressure controls
on connections? NiFi 1.0.0 adds 10k events/1GB by default, IIRC. This can
help avoid overwhelming components in a flow.

Next, a 2-core CPU is really inadequate for a high-throughput system; see
if you can get something better. It seems there's a lot going on in your
cluster. A full NiFi node with many flows does a lot of housekeeping in the
background and needs some power.

Andrew

On Fri, Oct 28, 2016, 8:36 AM Alessio Palma 
wrote:

> Hello Witt,
> before anything else thanks for your help.
> Fortunatly I  put down only the NIFI cluster, otherwise I was already in
> vacation :)
>
> After I posted this problem I kept to torture staging NIFI and
> discovered that when CPU LOAD gets very high, nodes loose connection and
> anything starts going in the bad directory. Also the WEB GUI becomes not
> responsive, you have no option to stop workflows.
>
> You can reproduce this issue starting some workflows composed by
> 1) GenerateFlowFile ( 1 Kb size, Timer driven, 0 sec run schedule )
> 2) ReplaceText ( just to force the use of regexp )
> 3) HashContent, ( auto terminate both relationships )
>
> Currently my staging cluster is composed by 2 virtual host configured as:
> 2 Core cpu ( Intel(R) Xeon(R) CPU E7- 2870  @ 2.40GHz )
> 2 GB RAM
> 18 GB HD
>
> The problem arises when the CPU load goes over 8, which basically means
> when you start 8 of the above WFs.
>
> I noticed NiFi attempts to reduce the load, but this does not work well
> and does not avoid the general failure.
>
> Here you can see the errors which started to show under stress:
>
> https://drive.google.com/drive/folders/0B7NTMIqrCjESN0JURnRtZWp5Tms?usp=sharing
>
>
> The 1st question is: is there a way to keep the load under some critical
> values? Is there some "how to" which helps me to configure NiFi?
> Currently it is using the factory settings and no customization has been
> performed except LDAP login.
>
> AP
>
>
>
> On 28/10/2016 13:24, Joe Witt wrote:
> > Alessio
> >
> > You have two clusters here potentially.  The NiFi cluster and the
> > Hadoop cluster.  Which one went down?
> >
> > If NiFi went down I'd suspect memory exhaustion issues because other
> > resource exhaustion issues like full file system, exhausted file
> > handles, pegged CPU, etc.. tend not to cause it to restart.  If memory
> > related you'll probably see something in the nifi-app.log.  Try going
> > with a larger heap as can be controlled in conf/bootstrap.conf.
> >
> > Thanks
> > Joe
> >
> > On Fri, Oct 28, 2016 at 5:55 AM, Alessio Palma
> >  wrote:
> >> Hello all,
> >> yesterday, by mistake, I basically executed "ls -R /" using the
> >> ListHDFS processor and the whole cluster went down (not just a node).
> >>
> >> Something like this also happened when I was playing with some DO WHILE
> >> / WHILE DO patterns. I have only the nifi logs and they show the
> >> heartbeat has been lost. About the CPU LOAD, NETWORK TRAFFIC I have no
> >> info. Any pointers about where do I have look for the problem's root ?
> >>
> >> Today I'm trying to repeat the problems I got with DO/WHILE, nothing bad
> >> is happening although CPU LOAD is enough high and NETWORK  TRAFFIC
> >> increased up to 282 Kb/sec.
> >>
> >> Of course I can redo the "ls -R /" on production, however I like to
> >> avoid it since there are already some ingestion flows running.
> >>
> >> AP
> > .
> >
>


Re: SelectHiveQL Error

2016-10-07 Thread Andrew Grande
I remember this error; it basically means your Hive is too old. There's no
way to make a generic Hive client, a line has to be drawn somewhere. Same
as, e.g., a car built for premium gas won't work with regular.

You need at least Hive 1.2.

Andrew

On Fri, Oct 7, 2016, 10:20 AM Nathamuni, Ramanujam <rnatham...@tiaa.org>
wrote:

> I have a similar client protocol issue. How can we make the Hive
> processors generic, so users can point to a LIB directory that
> contains the JAR files for their Hadoop cluster?
>
>
>
> The SAS Hadoop Access connector uses the approach below, from their
> Enterprise Guide.
>
>
>
> - Download the JAR files from the Hadoop cluster
>
> - Download the config files from the Hadoop cluster
>
>
>
> Export two configuration variables
>
>
>
> Export HADOOP_LIB_PATH=/opt/cdh/5.7.1/lib/ (which will have
> all the jar files)
>
> Export HADOOP_CONFIG_PATH=/opt/cdh/5.7.1/conf/
>
>
>
> Can we have similar options on all the Hadoop-related processors? That
> would make things work with all the different versions of Hadoop.
>
>
>
> Thanks,
>
> Ram
>
> *From:* Dan Giannone [mailto:dgiann...@humana.com]
> *Sent:* Friday, October 07, 2016 9:49 AM
>
>
> *To:* users@nifi.apache.org
> *Subject:* RE: SelectHiveQL Error
>
>
>
> It turns out the port needed to be changed for HiveServer2 as well. That
> seemed to fix the below issue. However, now I get:
>
>
>
> > org.apache.thrift.TApplicationException: Required field
> 'client_protocol' is unset!
>
>
>
> Which according to this
> <http://stackoverflow.com/questions/30931599/error-jdbc-hiveconnection-error-opening-session-hive>
> indicates my hive and hive-jdbc versions are mismatched. "hive --version"
> gives me 1.1.0. If I were to download the hive-jdbc 1.1.0 jar, is there a
> way I could specify that it use that?
>
>
>
>
>
> -Dan
>
>
>
> *From:* Dan Giannone [mailto:dgiann...@humana.com <dgiann...@humana.com>]
> *Sent:* Friday, October 07, 2016 9:25 AM
> *To:* users@nifi.apache.org
> *Subject:* RE: SelectHiveQL Error
>
>
>
> Hi Matt,
>
>
>
> When I try to change to jdbc:hive2://, I get a different set of
> errors.
>
>
>
> >Error getting Hive connection
>
> >org.apache.commons.dbcp.SQLNestedException: Cannot create
> PoolableConnectionFactory (Could not open client transport with JDBC Uri:
> jdbc:hive2://…)
>
> >Caused by: java.sql.SQLException: Could not open client transport with
> JDBC Uri: jdbc:hive2://…
>
> >Caused by: org.apache.thrift.transport.TTransportException: null
>
>
>
> I am thinking you are right in that it is an issue with my connection URL.
> Is there some command I can run that will generate this for me? Or a
> specific place I should look? The only mention of a url in hive-site.xml
> that I see is:
>
>
>
> 
>
> hive.metastore.uris
>
> thrift://server:port
>
> 
>
>
>
>
>
> -Dan
>
>
>
> *From:* Matt Burgess [mailto:mattyb...@gmail.com <mattyb...@gmail.com>]
> *Sent:* Thursday, October 06, 2016 5:17 PM
> *To:* users@nifi.apache.org
> *Subject:* Re: SelectHiveQL Error
>
>
>
> Andrew is correct. Although the HiveServer 1 driver is included with the
> NAR, the HiveConnectionPool is hardcoded to
> use the HiveServer 2 driver (since the former doesn't allow for
> simultaneous connections and we are using a connection pool :) the scheme
> should be jdbc:hive2:// not hive.
>
>
>
> If that was a typo and you are using the correct scheme, could you provide
> your configuration details/properties?
>
>
>
> Thanks,
>
> Matt
>
>
>
>
> On Oct 6, 2016, at 4:07 PM, Andrew Grande <apere...@gmail.com> wrote:
>
> Are you sure the jdbc url is correct? Iirc, it was jdbc:hive2://
>
> Andrew
>
>
>
> On Thu, Oct 6, 2016, 3:46 PM Dan Giannone <dgiann...@humana.com> wrote:
>
> Hi Matt,
>
> Here is the whole error trace, starting from when I turned on the
> SelectHiveQL processor:
>
> INFO [StandardProcessScheduler Thread-2]
> o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled
> SelectHiveQL[id=0157102a-94da-11ec-0f7e-17fd3119aa00] to run with 1 threads
> 2016-10-06 15:37:06,554 INFO [Timer-Driven Process Thread-7]
> o.a.nifi.dbcp.hive.HiveConnectionPool
> HiveConnectionPool[id=0157102d-94da-11ec-4d91-5a8952e888bd] Simple
> Authentication
> 2016-10-06 15:37:06,556 ERROR [Timer-Driven Process Thread-7]
> o.a.nifi.dbcp.hive.HiveConnectionPool
> HiveConnectionPool[id=0157102d-94da-11ec-4d91-5a8952e888bd] Error getting
> Hive connection
> 2016-10-06 15:37:06,55

Re: nifi Rest API to get full details of the flow.

2016-09-28 Thread Andrew Grande
This isn't an ideal approach, IMO. There is a standard API to get a summary
of the flow and the status of every processor; check what URL the Summary
tab is invoking. You can then drill into any specific component by ID.
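
(On 1.x that URL is typically
/nifi-api/flow/process-groups/root/status?recursive=true; it's easy to
confirm in your browser's developer tools.)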

Andrew

On Wed, Sep 28, 2016, 6:42 AM Sandeep Khurana  wrote:

> Just now looked at the flow.xml.gz file. It serves the purpose. Thx
>
> On Wed, Sep 28, 2016 at 4:02 PM, Sandeep Khurana 
> wrote:
>
>> Hello
>>
>> Is there a way to get the full details of the flow which I created from
>> the NiFi UI?
>>
>> I want to get the IDs of processors programmatically (without looking at
>> the NiFi UI) and then, based upon some conditions, see the status of 1 or
>> more processors.
>>
>>  Is there any way ?
>>
>>
>>
>>
>
>
> --
> Thanks and regards
> Sandeep Khurana
>


Re: read in values from nifi.properties in a Groovy ExecuteScript processor

2016-09-23 Thread Andrew Grande
Which NiFi version? With 1.0 there are some bits for variable registry
available, basically one can reference values from external config files
via regular EL expressions.
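
A minimal sketch of the 1.0 file-based variable registry (the property name
is from nifi.properties; the custom file and key are made up for
illustration):

    # nifi.properties
    nifi.variable.registry.properties=./conf/custom.properties

    # ./conf/custom.properties
    some.key=some value

Any EL-capable property can then reference ${some.key}. For ExecuteScript
specifically, you can also add a dynamic property and read it from the
script's bindings.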

Andrew

On Fri, Sep 23, 2016, 6:00 PM Tom Gullo  wrote:

> I want to read in values from nifi.properties in a Groovy ExecuteScript
> processor.  What's the best way to do that?
>
> Thanks
> -Tom
>


Re: UI: flow status and counters feedback

2016-09-21 Thread Andrew Grande
Alright guys, do we have enough consensus to start filing jira work items?
:)

Andrew

On Tue, Sep 20, 2016, 2:01 PM Andrew Grande <apere...@gmail.com> wrote:

> Let's fade the connection slowly to an inverted color if backpressure engages?
>
> On Tue, Sep 20, 2016, 1:17 PM Rob Moran <rmo...@gmail.com> wrote:
>
>> Agreed – thanks for calling that out, Andy.
>>
>> Rob
>>
>> On Tue, Sep 20, 2016 at 1:13 PM, Andy LoPresto <alopre...@apache.org>
>> wrote:
>>
>>> In this and other UI discussions going on, I would request that everyone
>>> keep in mind the usability of the software by people with visual and other
>>> impairments. The US Federal Government has guidelines referred to as
>>> “Section 508” [1] which cover the design and usability of softwares
>>> specifically to ensure access for as many people as possible. Now, NiFi is
>>> not explicitly governed by these rules, but it seems to me that we should
>>> work towards accessibility from the beginning, not as a bolt-on effort.
>>>
>>> In that vein, one of the simplest and easiest rules is “color is great
>>> as a secondary indicator, but should not be the *only* indicator”. In
>>> practice — changing the color of a connection to indicate back pressure is
>>> a great feature, but there should be another indicator of back pressure
>>> that does not require the ability to discern color.
>>>
>>> [1]
>>> https://www.section508.gov/content/learn/standards/quick-reference-guide
>>>
>>> Andy LoPresto
>>> alopre...@apache.org
>>> *alopresto.apa...@gmail.com <alopresto.apa...@gmail.com>*
>>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>>
>>> On Sep 20, 2016, at 8:28 AM, Andrew Grande <apere...@gmail.com> wrote:
>>>
>>> I like the tooltip addition of yours.
>>>
>>> For more interactive feedback on the canvas I can immediately think of 2
>>> items.
>>>
>>> 1. Indicator for when backpressure was configured on a connection
>>> (although it's now always added by default, maybe less useful).
>>>
>>> 2. Changing the color of a connection when backpressure has engaged
>>> could go a long way. Can go further, gradient color based on how close the
>>> connection backlog is to triggering the backpressure controls. Immediately
>>> highlights hotspots visually.
>>>
>>> Andrew
>>>
>>> On Tue, Sep 20, 2016, 9:40 AM Rob Moran <rmo...@gmail.com> wrote:
>>>
>>>> Andrew,
>>>>
>> Thanks for the feedback on the status bar. Separation between each item
>> helps, but I realize after your comments how it can fail to feel like a
>> single, cohesive group of items. We could probably tighten things up a bit.
>>>>
>>>> I think another part of this that could help would be to address some
>>>> of the discussion around awareness of stats updating. Being able to call
>>>> more attention (without being too intrusive) when stats change could help
>>>> ease some of the burden of having to routinely scan the status bar to look
>>>> for changes.
>>>>
>>>> Also related, I would like to see us get a tooltip that is seen when
>>>> you hover anywhere on the status bar. That tooltip would provide more
>>>> descriptive text about what each item means. It would help new users learn
>>>> as well as provide detail and follow-on action when something is alerted.
>>>>
>>>> Let's see what others think and then I can work on filing a jira to
>>>> capture thoughts.
>>>>
>>>> Rob
>>>>
>>>> On Mon, Sep 19, 2016 at 6:22 PM, Andrew Grande <apere...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I'd like to provide some feedback on the NiFi 1.0 UI now that I had a
>>>>> chance to use it for a while, as well as pass along what I heard directly
>>>>> from other end users.
>>>>>
>>>>> Attached is a screenshot of a status bar right above the main flow
>>>>> canvas. The biggest difference from the 0.x UI is how much whitespace it
>>>>> now has between elements. To a point where it's not possible to quickly
>>>>> scan the state with a glance.
>>>>>
>>>>> Does anyone have other opinions? Can we adjust things slightly so they
>>>>> are easier on the eye and have less horizontal friction?
>>>>>
>>>>> Thanks!
>>>>> Andrew
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>


Re: UI: feedback on the processor 'color' in NiFi 1.0

2016-09-20 Thread Andrew Grande
No need to go wild, changing processor colors should be enough, IMO. PG and
RPG are possible candidates, but they are different enough already, I guess.

What I heard quite often was to differentiate between regular processors,
incoming sources of data, and output-only ones (data producers?). Maybe even
with a shape?

Andrew

On Tue, Sep 20, 2016, 12:35 PM Rob Moran <rmo...@gmail.com> wrote:

> Good points. I was thinking a label would be tied to the group of
> components to which it was applied, but that could also introduce problems
> as things move and are added to a flow.
>
> So would you all expect to be able to change the color of every component
> type, or just processors?
>
> Andrew - your comment about coloring terminators red is interesting as
> well. What are some other parts of a flow you might use color to identify?
> Along with backpressure, we could explore other ways to call these things
> out so users do not come up with their own methods. Perhaps there are layer
> options, like on a map (e.g., "show terrain" or "show traffic").
>
> Rob
>
> On Tue, Sep 20, 2016 at 11:23 AM, Andrew Grande <apere...@gmail.com>
> wrote:
>
>> I agree. Labels are great for grouping, beyond PGs. Processor colors
>> individually add value. E.g. flow terminator colored in red was a very
>> common pattern I used. Besides, labels are not grouped with components, so
>> moving things and re-arranging is a pain.
>>
>> Andrew
>>
>> On Tue, Sep 20, 2016, 11:21 AM Joe Skora <jsk...@gmail.com> wrote:
>>
>>> Rob,
>>>
>>> The labelling functionality you described sounds very useful in
>>> general.  But, I miss the processor color too.
>>>
>>> I think labels are really useful for identifying groups of components
>>> and areas in the flow, but I worry that needing to use them in volume for
>>> processor coloring will increase the API and browser canvas load for
>>> elements that don't actually affect the flow.
>>>
>>> On Tue, Sep 20, 2016 at 10:40 AM, Rob Moran <rmo...@gmail.com> wrote:
>>>
>>>> What if we promote the use of Labels as a way to highlight things. We
>>>> could add functionality to expand their usefulness as a way to highlight
>>>> things on the canvas. I believe that is their intended use.
>>>>
>>>> Today you can create a label and change its color to highlight single
>>>> or multiple components. Even better you can do it for any component (not
>>>> just processors).
>>>>
>>>> To expand on functionality, I'm imagining a context menu and palette
>>>> action to "Label" a selected component or components. This would prompt
>>>> a user to pick a background and add text which would place a label
>>>> around everything once it's applied.
>>>>
>>>> Rob
>>>>
>>>> On Mon, Sep 19, 2016 at 6:42 PM, Jeff <jtsw...@gmail.com> wrote:
>>>>
>>>>> I was thinking, in addition to changing the color of the icon on the
>>>>> processor, that the color of the drop shadow could be changed as well.
>>>>> That would provide more contrast, but preserve readability, in my opinion.
>>>>>
>>>>> On Mon, Sep 19, 2016 at 6:39 PM Andrew Grande <apere...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> Rolling with UI feedback threads. This time I'd like to discuss how
>>>>>> NiFi 'lost' its ability to change processor boxes color. I.e. as you can
>>>>>> see from a screenshot attached, it does change color for the processor in
>>>>>> the flow overview panel, but the processor itself only changes the icon in
>>>>>> the top-left of the box. I came across a few users who definitely miss the
>>>>>> old way. I personally think changing the icon color for the processor
>>>>>> doesn't go far enough, especially when one is dealing with a flow of
>>>>>> several dozen processors, zooms in and out often. The overview helps, but
>>>>>> it's not the same.
>>>>>>
>>>>>> Proposal - can we restore how color selection for the processor
>>>>>> changed the actual background of the processor box on the canvas? Let the
>>>>>> user go wild with colors and deal with readability, but at least it's easy
>>>>>> to spot 'important' things this way. And with multi-tenant authorization it
>>>>>> becomes a poor-man's doc between teams, to an extent.
>>>>>>
>>>>>> Thanks for any feedback,
>>>>>> Andrew
>>>>>>
>>>>>
>>>>
>>>
>


Re: UI: flow status and counters feedback

2016-09-20 Thread Andrew Grande
I like the tooltip addition of yours.

For more interactive feedback on the canvas I can immediately think of 2
items.

1. Indicator for when backpressure was configured on a connection (although
it's now always added by default, maybe less useful).

2. Changing the color of a connection when backpressure has engaged could
go a long way. Can go further, gradient color based on how close the
connection backlog is to triggering the backpressure controls. Immediately
highlights hotspots visually.

Andrew

On Tue, Sep 20, 2016, 9:40 AM Rob Moran <rmo...@gmail.com> wrote:

> Andrew,
>
> Thanks for the feedback on the status bar. Separation between each item
> helps, but I realize after your comments how it can fail to feel like a
> single, cohesive group of items. We could probably tighten things up a bit.
>
> I think another part of this that could help would be to address some of
> the discussion around awareness of stats updating. Being able to call more
> attention (without being too intrusive) when stats change could help ease
> some of the burden of having to routinely scan the status bar to look for
> changes.
>
> Also related, I would like to see us get a tooltip that is seen when you
> hover anywhere on the status bar. That tooltip would provide more
> descriptive text about what each item means. It would help new users learn
> as well as provide detail and follow-on action when something is alerted.
>
> Let's see what others think and then I can work on filing a jira to
> capture thoughts.
>
> Rob
>
> On Mon, Sep 19, 2016 at 6:22 PM, Andrew Grande <apere...@gmail.com> wrote:
>
>> Hi All,
>>
>> I'd like to provide some feedback on the NiFi 1.0 UI now that I had a
>> chance to use it for a while, as well as pass along what I heard directly
>> from other end users.
>>
>> Attached is a screenshot of a status bar right above the main flow
>> canvas. The biggest difference from the 0.x UI is how much whitespace it
>> now has between elements. To a point where it's not possible to quickly
>> scan the state with a glance.
>>
>> Does anyone have other opinions? Can we adjust things slightly so they
>> are easier on the eye and have less horizontal friction?
>>
>> Thanks!
>> Andrew
>>
>>
>>
>


Re: UI: feedback on the processor 'color' in NiFi 1.0

2016-09-20 Thread Andrew Grande
I agree. Labels are great for grouping, beyond PGs. Processor colors
individually add value. E.g. flow terminator colored in red was a very
common pattern I used. Besides, labels are not grouped with components, so
moving things and re-arranging is a pain.

Andrew

On Tue, Sep 20, 2016, 11:21 AM Joe Skora <jsk...@gmail.com> wrote:

> Rob,
>
> The labelling functionality you described sounds very useful in general.
> But, I miss the processor color too.
>
> I think labels are really useful for identifying groups of components and
> areas in the flow, but I worry that needing to use them in volume for
> processor coloring will increase the API and browser canvas load for
> elements that don't actually affect the flow.
>
> On Tue, Sep 20, 2016 at 10:40 AM, Rob Moran <rmo...@gmail.com> wrote:
>
>> What if we promote the use of Labels as a way to highlight things. We
>> could add functionality to expand their usefulness as a way to highlight
>> things on the canvas. I believe that is their intended use.
>>
>> Today you can create a label and change its color to highlight single or
>> multiple components. Even better you can do it for any component (not just
>> processors).
>>
>> To expand on functionality, I'm imagining a context menu and palette
>> action to "Label" a selected component or components. This would prompt
>> a user to pick a background and add text which would place a label
>> around everything once it's applied.
>>
>> Rob
>>
>> On Mon, Sep 19, 2016 at 6:42 PM, Jeff <jtsw...@gmail.com> wrote:
>>
>>> I was thinking, in addition to changing the color of the icon on the
>>> processor, that the color of the drop shadow could be changed as well.
>>> That would provide more contrast, but preserve readability, in my opinion.
>>>
>>> On Mon, Sep 19, 2016 at 6:39 PM Andrew Grande <apere...@gmail.com>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> Rolling with UI feedback threads. This time I'd like to discuss how
>>>> NiFi 'lost' its ability to change processor boxes color. I.e. as you can
>>>> see from a screenshot attached, it does change color for the processor in
>>>> the flow overview panel, but the processor itself only changes the icon in
>>>> the top-left of the box. I came across a few users who definitely miss the
>>>> old way. I personally think changing the icon color for the processor
>>>> doesn't go far enough, especially when one is dealing with a flow of
>>>> several dozen processors, zooms in and out often. The overview helps, but
>>>> it's not the same.
>>>>
>>>> Proposal - can we restore how color selection for the processor changed
>>>> the actual background of the processor box on the canvas? Let the user go
>>>> wild with colors and deal with readability, but at least it's easy to spot
>>>> 'important' things this way. And with multi-tenant authorization it becomes
>>>> a poor-man's doc between teams, to an extent.
>>>>
>>>> Thanks for any feedback,
>>>> Andrew
>>>>
>>>
>>
>


UI: flow status and counters feedback

2016-09-19 Thread Andrew Grande
Hi All,

I'd like to provide some feedback on the NiFi 1.0 UI now that I had a
chance to use it for a while, as well as pass along what I heard directly
from other end users.

Attached is a screenshot of a status bar right above the main flow canvas.
The biggest difference from the 0.x UI is how much whitespace it now has
between elements. To a point where it's not possible to quickly scan the
state with a glance.

Does anyone have other opinions? Can we adjust things slightly so they are
easier on the eye and have less horizontal friction?

Thanks!
Andrew


Re: Configure Multiple NCM

2016-09-18 Thread Andrew Grande
Hi Tijo,

Take a look at the clustering docs for NiFi 1.0. Zero-master clustering
changed a few things; any node can be elected to be the primary node or
cluster coordinator now. The 0.x concept of an NCM is gone.

From the UI access standpoint, one can hit any node in a cluster to get the
same experience.
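
For example, a client-side sketch of that (Python; the node hostnames are
made up, and /flow/cluster/summary is the 1.x endpoint as I recall, so
double-check it on your build):

    import requests

    NODES = ["http://nifi-1:8080", "http://nifi-2:8080"]  # hypothetical hosts

    for node in NODES:
        try:
            r = requests.get(node + "/nifi-api/flow/cluster/summary", timeout=5)
            r.raise_for_status()
            print(node, "->", r.json()["clusterSummary"]["connectedNodes"])
            break  # any one healthy node gives the same answer
        except requests.RequestException as exc:
            print("node unavailable:", node, exc)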

Andrew

On Sun, Sep 18, 2016, 11:56 AM Tijo Thomas  wrote:

> Hi ,
>
> Is there any way to configure multiple NCMs and load balance among
> them?
>
> Tijo
>


Re: Processor to send flowfile to two different destinations?

2016-09-13 Thread Andrew Grande
It's actually very simple - connect a processor output to 2 or more other
processors or ports, and use the 'success' relationship if prompted to
choose from multiple.

Andrew

On Tue, Sep 13, 2016, 6:14 PM Russell Bateman <
russell.bate...@perfectsearchcorp.com> wrote:

> *DuplicateFlowFile* sends multiple copies all to the Success
> relationship, mostly for testing load.
>
> How does one legitimately send two copies of the same flowfile to
> different relationships to result essentially in two parallel workflows?
> (I'm probably missing some simple understanding here...)
>
> Russ
>


Re: ExecuteProcess (fetch output)

2016-07-03 Thread Andrew Grande
Sven, take a look at the ExtractText component; it will allow you to promote
a result into an attribute.
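
If the regex approach doesn't fit, here is an alternative sketch (an
ExecuteScript processor running Jython, not the ExtractText route above)
that copies the whole content into an attribute; 'script.output' is a name
picked for illustration, and beware of stuffing large content into
attributes, since they are held in memory:

    from org.apache.nifi.processor.io import InputStreamCallback
    from org.apache.commons.io import IOUtils
    from java.nio.charset import StandardCharsets

    class ReadContent(InputStreamCallback):
        def __init__(self):
            self.text = None
        def process(self, inputStream):
            # Read the flowfile body (the script's output) as UTF-8 text.
            self.text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)

    flowfile = session.get()
    if flowfile is not None:
        reader = ReadContent()
        session.read(flowfile, reader)
        flowfile = session.putAttribute(flowfile, "script.output",
                                        reader.text.strip())
        session.transfer(flowfile, REL_SUCCESS)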

Andrew

On Sat, Jul 2, 2016, 7:41 PM Sven Davison  wrote:

> I’ve been trying to run a script and fetch the output from the script as
> a variable to work with. The process executes and I see the content in the
> body if I send it off to a LogAttribute processor, but I’m not sure how to
> get the output of the script as a variable. I want to be able to send this
> content to a database, and it would be much easier if I could use it as a
> variable, I think.
>
>
>
>
>
> Link to LogAttribute screenshot showing desired content.
> http://prntscr.com/bo2cbw
>
>
>
> I’m guessing I need to use the ExtractText processor but I’m not sure how
> to address the output of the script.
>
>
>
>
>
> -Sven
>
>
>
>
>
> Sent from Mail for Windows 10
>
>
>

