RE: Exception on Processor ConvertJSONToSQL

2016-08-15 Thread Peter Wicks (pwicks)
Carlos,

I ran into this same error when querying Teradata. It looks like a lot of 
databases don't include this.
I submitted a bug a couple weeks ago: 
https://issues.apache.org/jira/browse/NIFI-2356

I did something similar to your suggestion locally in a modified version of the 
code.

Regards,
  Peter



From: Carlos Manuel Fernandes (DSI) [mailto:carlos.antonio.fernan...@cgd.pt]
Sent: Thursday, August 11, 2016 9:20 AM
To: users@nifi.apache.org
Subject: Exception on Processor ConvertJSONToSQL

Hi All,

I am running some tests to move data from DB2 to Netezza using NiFi. If I don't 
use custom processors, it's an indirect route:

ExecuteSQL(on db2) -> ConvertAvroToJSON -> ConvertJSONToSQL -> PutSQL (bulk on 
netezza)

and this way, I get an Exception on ConvertJSONToSQL:
org.netezza.error.NzSQLException: The column name IS_AUTOINCREMENT not found.
at org.netezza.sql.NzResultSet.findColumn(NzResultSet.java:266) 
~[nzjdbc.jar:na]
at org.netezza.sql.NzResultSet.getString(NzResultSet.java:1407) 
~[nzjdbc.jar:na]
at 
org.apache.commons.dbcp.DelegatingResultSet.getString(DelegatingResultSet.java:263)
 ~[na:na]
at 
org.apache.nifi.processors.standard.ConvertJSONToSQL$ColumnDescription.from(ConvertJSONToSQL.java:678)
 ~[nifi-standard-processors-0.7.0.jar:0.7.0]

The Netezza JDBC driver doesn't implement the IS_AUTOINCREMENT metadata column (the 
same is true for the Oracle driver). The likely reason is that Netezza and Oracle 
don't have auto-increment columns; they use sequences for this purpose.

One possible solution (not beautiful) is to put a try/catch around
final String autoIncrementValue = resultSet.getString("IS_AUTOINCREMENT");  
(ConvertJSONToSQL.java:678)
and, in the catch block, set autoIncrementValue = "NO".
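
For illustration, a minimal sketch of that idea as a hypothetical helper (not the 
actual change committed for NIFI-2356):

import java.sql.ResultSet;
import java.sql.SQLException;

// Fall back to "NO" when the driver's DatabaseMetaData result set does not
// expose an IS_AUTOINCREMENT column (e.g. the Netezza and Oracle drivers).
final class AutoIncrementFallback {
    static String autoIncrementOrDefault(final ResultSet columnResultSet) {
        try {
            return columnResultSet.getString("IS_AUTOINCREMENT");
        } catch (final SQLException e) {
            return "NO";
        }
    }
}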


Besides this error, we could remove the ConvertAvroToJSON step from the flow if 
ExecuteSQL were changed to offer an optional output format: Avro or JSON.

What do you think?

Thanks

Carlos








RE: PutSQL ERROR bulletin

2016-08-15 Thread Peter Wicks (pwicks)
Sven,

I've run into this a couple times.  In my case some records would insert and 
some would not.  To find my issue:

 - I routed all failures back to PutSQL
 - Reduced the batch size down to about 10
 - Changed the prioritization on the failure relationship so that hopefully 
failures will move to the back.

I then let it run until my failure count had stabilized. I stopped the 
processor and looked at the values. I then built and executed a SQL statement 
by hand in my SQL editor.

In my case we found two issues:
 - We had Unicode characters being inserted into a non-Unicode field (NVARCHAR 
into VARCHAR)
 - PutSQL requires timestamps to be timestamps, but if you are using a 
JSONToSQL processor upstream it expects timestamp values to be epochs, so it can 
convert them back to timestamps... This was one of our big issues.
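
As a concrete sketch of that epoch point (hypothetical values, not taken from our 
flow), this is the sort of conversion the upstream processor does so PutSQL can 
bind the value as a TIMESTAMP parameter:

import java.sql.Timestamp;

public class EpochExample {
    public static void main(String[] args) {
        // A timestamp as it might appear in the JSON document
        String formatted = "2016-08-12 19:05:00";
        // PutSQL (in the NiFi versions discussed here) expects the TIMESTAMP
        // argument value, e.g. sql.args.1.value, as epoch milliseconds
        long epochMillis = Timestamp.valueOf(formatted).getTime();
        System.out.println(epochMillis);
    }
}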

Regards,
  Peter

-Original Message-
From: Joe Witt [mailto:joe.w...@gmail.com] 
Sent: Friday, August 12, 2016 5:11 PM
To: users@nifi.apache.org
Subject: Re: PutSQL ERROR bulletin

Hello Sven

LogAttributes will just show the attributes as they are understood by the flow 
file itself.  But the PutSQL processor may be doing something more specific 
with the data.  Can you share your configuration for PutSQL?

Thanks
Joe

On Fri, Aug 12, 2016 at 7:05 PM, Sven Davison  wrote:
> Actually… I’m fairly sure I found it. I sent stuff off to the “logAttribute”
> processor and found the input is not escaped.
>
>
>
> http://prntscr.com/c51je3
>
>
>
>
>
>
>
> Sent from Mail for Windows 10
>
>
>
> From: Sven Davison
> Sent: Friday, August 12, 2016 7:02 PM
> To: users@nifi.apache.org
> Subject: PutSQL ERROR bulletin
>
>
>
> I’m getting several inserts but every once in a while (every 1-2 
> minutes or so)… I get this error. Anyone know what might cause this?
>
>
> PutSQL[id=b8d54aa6-567a-4686-96bc-1c00e5d43461] Failed to update 
> database due to a failed batch update. There were a total of 1 
> FlowFiles that failed,
> 0 that succeeded, and 0 that were not execute and will be routed to 
> retry;
>
>
>
>
>
> -Sven
>
>
>
> Sent from Mail for Windows 10
>
>
>
>


RE: NiFi 1.1.0 stuck starting, no errors

2017-01-27 Thread Peter Wicks (pwicks)
It clocked in at around 200MB’s… 
https://drive.google.com/file/d/0B4yjbe5sOeAuRXAxT3Z6anRRN1k/view?usp=sharing


From: Andy LoPresto [mailto:alopre...@apache.org]
Sent: Thursday, January 26, 2017 3:06 PM
To: users@nifi.apache.org
Subject: Re: NiFi 1.1.0 stuck starting, no errors


Hi Peter,

Can you provide a thread dump of the process? You should be able to do this via 
the jcmd tool [1].

[1] 
https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr006.html#BABEHABG
 
<https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr006.html#BABEHABG>
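
For example, assuming you know the NiFi JVM's PID (jps -l will list it), something 
like the following should capture a dump:

jcmd <nifi-pid> Thread.print > nifi-thread-dump.txt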

Andy LoPresto
alopre...@apache.org<mailto:alopre...@apache.org>
alopresto.apa...@gmail.com<mailto:alopresto.apa...@gmail.com>
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Jan 26, 2017, at 12:42 PM, Peter Wicks (pwicks) 
> <pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
>
> I’m looking for help in troubleshooting my NiFi 1.1.0 install.  It’s been 
> running stably for some time, but I restarted it this morning when I deployed 
> an updated custom NAR. Now it gets stuck at startup, see logs at the end.

> There are no error messages, and the processes don’t die. The process just 
> seems to be hanging waiting for something.
>
> · My first thought was to try rolling back the modified nar, and even 
> just removing the nar all together since it was custom.  Neither of these 
> made any difference.

> · I also tried deleting the “work” folder, which has fixed nar 
> versioning issues for me in the past (not really related, but was worth a 
> shot). This made no difference.

> · NiFi is set to start with java.arg.2=-Xms4G and java.arg.3=-Xmx8G, 
> 22GB’s of free RAM are available on the system (out of some 60GB’s total).

> · I’ve checked running processes, and when I stop NiFi no rogue 
> instances are left running.
> · Since NiFi gets stuck right around the JettyServer step I checked 
> to see if any processes were using port 8443. No other processes are using 
> this port.

> · I thought maybe a key file was being locked, but with NiFi off 
> `lsof | grep nifi` returns no locked files.
>
> Nifi-app Log:
> 2017-01-26 20:23:43,359 INFO [main] org.eclipse.jetty.util.log Logging 
> initialized @90357ms
> 2017-01-26 20:23:43,418 INFO [main] org.apache.nifi.web.server.JettyServer 
> Configuring Jetty for HTTPs on port: 8443
> 2017-01-26 20:23:43,691 INFO [main] org.apache.nifi.web.server.JettyServer 
> Loading WAR: 
> /data/nifi/nifi-1.1.0/./work/nar/extensions/nifi-media-nar-1.1.0.nar-unpacked/META-INF/bundled-dependencies/nifi-image-viewer-1.1.0.war
>  with context path set to /nifi-image-viewer-1.1.0

> 2017-01-26 20:23:43,702 INFO [main] org.apache.nifi.web.server.JettyServer 
> Loading WAR: 
> /data/nifi/nifi-1.1.0/./work/nar/extensions/nifi-update-attribute-nar-1.1.0.nar-unpacked/META-INF/bundled-dependencies/nifi-update-attribute-ui-1.1.0.war
>  with context path set to /nifi-update-attribute-ui-1.1.0

> 2017-01-26 20:23:43,703 INFO [main] org.apache.nifi.web.server.JettyServer 
> Loading UI extension [ProcessorConfiguration, 
> /nifi-update-attribute-ui-1.1.0] for 
> [org.apache.nifi.processors.attributes.UpdateAttribute]

> 2017-01-26 20:23:43,713 INFO [main] org.apache.nifi.web.server.JettyServer 
> Loading WAR: 
> /data/nifi/nifi-1.1.0/./work/nar/extensions/nifi-standard-nar-1.1.0.nar-unpacked/META-INF/bundled-dependencies/nifi-standard-content-viewer-1.1.0.war
>  with context path set to /nifi-standard-content-viewer-1.1.0

> 2017-01-26 20:23:43,723 INFO [main] org.apache.nifi.web.server.JettyServer 
> Loading WAR: 
> /data/nifi/nifi-1.1.0/./work/nar/extensions/nifi-standard-nar-1.1.0.nar-unpacked/META-INF/bundled-dependencies/nifi-jolt-transform-json-ui-1.1.0.war
>  with context path set to /nifi-jolt-transform-json-ui-1.1.0

> 2017-01-26 20:23:43,724 INFO [main] org.apache.nifi.web.server.JettyServer 
> Loading UI extension [ProcessorConfiguration, 
> /nifi-jolt-transform-json-ui-1.1.0] for 
> [org.apache.nifi.processors.standard.JoltTransformJSON]

> 2017-01-26 20:23:43,729 INFO [main] org.apache.nifi.web.server.JettyServer 
> Loading WAR: 
> /data/nifi/nifi-1.1.0/./work/nar/framework/nifi-framework-nar-1.1.0.nar-unpacked/META-INF/bundled-dependencies/nifi-web-ui-1.1.0.war
>  with context path set to /nifi

> 2017-01-26 20:23:43,733 INFO [main] org.apache.nifi.web.server.JettyServer 
> Loading WAR: 
> /data/nifi/nifi-1.1.0/./work/nar/framework/nifi-framework-nar-1.1.0.nar-unpacked/META-INF/bundled-dependencies/nifi-web-api-1.1.0.war
>  with context path set to /nifi-api

> 2017-01-26 20:23:43,735 INFO [main] org.apache.nifi.web.server.JettyServer 
> Loading WAR: 
> /data/nifi/nifi-1.1.0/./work/nar/framework/nif

NiFi 1.1.0 stuck starting, no errors

2017-01-26 Thread Peter Wicks (pwicks)
I'm looking for help in troubleshooting my NiFi 1.1.0 install.  It's been 
running stably for some time, but I restarted it this morning when I deployed 
an updated custom NAR. Now it gets stuck at startup, see logs at the end.
There are no error messages, and the processes don't die. The process just 
seems to be hanging waiting for something.


* My first thought was to try rolling back the modified nar, and even 
just removing the nar all together since it was custom.  Neither of these made 
any difference.

* I also tried deleting the "work" folder, which has fixed nar 
versioning issues for me in the past (not really related, but was worth a 
shot). This made no difference.

* NiFi is set to start with java.arg.2=-Xms4G and java.arg.3=-Xmx8G, 
22GB's of free RAM are available on the system (out of some 60GB's total).

* I've checked running processes, and when I stop NiFi no rogue 
instances are left running.

* Since NiFi gets stuck right around the JettyServer step I checked to 
see if any processes were using port 8443. No other processes are using this 
port.

* I thought maybe a key file was being locked, but with NiFi off `lsof 
| grep nifi` returns no locked files.

Nifi-app Log:
2017-01-26 20:23:43,359 INFO [main] org.eclipse.jetty.util.log Logging 
initialized @90357ms
2017-01-26 20:23:43,418 INFO [main] org.apache.nifi.web.server.JettyServer 
Configuring Jetty for HTTPs on port: 8443
2017-01-26 20:23:43,691 INFO [main] org.apache.nifi.web.server.JettyServer 
Loading WAR: 
/data/nifi/nifi-1.1.0/./work/nar/extensions/nifi-media-nar-1.1.0.nar-unpacked/META-INF/bundled-dependencies/nifi-image-viewer-1.1.0.war
 with context path set to /nifi-image-viewer-1.1.0
2017-01-26 20:23:43,702 INFO [main] org.apache.nifi.web.server.JettyServer 
Loading WAR: 
/data/nifi/nifi-1.1.0/./work/nar/extensions/nifi-update-attribute-nar-1.1.0.nar-unpacked/META-INF/bundled-dependencies/nifi-update-attribute-ui-1.1.0.war
 with context path set to /nifi-update-attribute-ui-1.1.0
2017-01-26 20:23:43,703 INFO [main] org.apache.nifi.web.server.JettyServer 
Loading UI extension [ProcessorConfiguration, /nifi-update-attribute-ui-1.1.0] 
for [org.apache.nifi.processors.attributes.UpdateAttribute]
2017-01-26 20:23:43,713 INFO [main] org.apache.nifi.web.server.JettyServer 
Loading WAR: 
/data/nifi/nifi-1.1.0/./work/nar/extensions/nifi-standard-nar-1.1.0.nar-unpacked/META-INF/bundled-dependencies/nifi-standard-content-viewer-1.1.0.war
 with context path set to /nifi-standard-content-viewer-1.1.0
2017-01-26 20:23:43,723 INFO [main] org.apache.nifi.web.server.JettyServer 
Loading WAR: 
/data/nifi/nifi-1.1.0/./work/nar/extensions/nifi-standard-nar-1.1.0.nar-unpacked/META-INF/bundled-dependencies/nifi-jolt-transform-json-ui-1.1.0.war
 with context path set to /nifi-jolt-transform-json-ui-1.1.0
2017-01-26 20:23:43,724 INFO [main] org.apache.nifi.web.server.JettyServer 
Loading UI extension [ProcessorConfiguration, 
/nifi-jolt-transform-json-ui-1.1.0] for 
[org.apache.nifi.processors.standard.JoltTransformJSON]
2017-01-26 20:23:43,729 INFO [main] org.apache.nifi.web.server.JettyServer 
Loading WAR: 
/data/nifi/nifi-1.1.0/./work/nar/framework/nifi-framework-nar-1.1.0.nar-unpacked/META-INF/bundled-dependencies/nifi-web-ui-1.1.0.war
 with context path set to /nifi
2017-01-26 20:23:43,733 INFO [main] org.apache.nifi.web.server.JettyServer 
Loading WAR: 
/data/nifi/nifi-1.1.0/./work/nar/framework/nifi-framework-nar-1.1.0.nar-unpacked/META-INF/bundled-dependencies/nifi-web-api-1.1.0.war
 with context path set to /nifi-api
2017-01-26 20:23:43,735 INFO [main] org.apache.nifi.web.server.JettyServer 
Loading WAR: 
/data/nifi/nifi-1.1.0/./work/nar/framework/nifi-framework-nar-1.1.0.nar-unpacked/META-INF/bundled-dependencies/nifi-web-content-viewer-1.1.0.war
 with context path set to /nifi-content-viewer
2017-01-26 20:23:43,738 INFO [main] org.apache.nifi.web.server.JettyServer 
Loading WAR: 
/data/nifi/nifi-1.1.0/./work/nar/framework/nifi-framework-nar-1.1.0.nar-unpacked/META-INF/bundled-dependencies/nifi-web-docs-1.1.0.war
 with context path set to /nifi-docs
2017-01-26 20:23:43,753 INFO [main] org.apache.nifi.web.server.JettyServer 
Loading documents web app with context path set to /nifi-docs
2017-01-26 20:23:43,761 INFO [main] org.apache.nifi.web.server.JettyServer 
Loading WAR: 
/data/nifi/nifi-1.1.0/./work/nar/framework/nifi-framework-nar-1.1.0.nar-unpacked/META-INF/bundled-dependencies/nifi-web-error-1.1.0.war
 with context path set to /
2017-01-26 20:23:43,804 INFO [main] org.eclipse.jetty.server.Server 
jetty-9.3.9.v20160517
2017-01-26 20:23:44,748 INFO [main] o.e.jetty.server.handler.ContextHandler 
Started 
o.e.j.w.WebAppContext@4b511e61{/nifi-image-viewer-1.1.0,file:///data/nifi/nifi-1.1.0/work/jetty/nifi-image-viewer-1.1.0.war/webapp/,AVAILABLE}{./work/nar/extensions/nifi-media-nar-1.1.0.nar-unpacked/META-INF/bundled-dependencies/nifi-image-viewer-1.1.0.war}
2017-01-26 20:23:46,566 

RE: adding dependencies like jdbc drivers to the build

2016-08-19 Thread Peter Wicks (pwicks)
While you are at it… can you make it so it supports more than one file in that 
field, probably comma-delimited?  I have one JDBC driver that for whatever 
reason requires two separate JARs.

From: Joe Witt [mailto:joe.w...@gmail.com]
Sent: Friday, August 19, 2016 5:30 AM
To: users@nifi.apache.org
Subject: RE: adding dependencies like jdbc drivers to the build


Adding JARs to the lib directory is not ideal as it pollutes all classloaders.  
We should add expression language support to the path property if it isn't there 
already, as that makes variable registry access available, which makes it easier 
to use the same template or flow in different environments.

On Aug 19, 2016 7:27 AM, "Peter Wicks (pwicks)" 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
Sumanth,

If the driver is in your lib directory already then you should leave the path 
empty.  All JARs in your lib directory are loaded on the classpath for all 
NARs.

Personally I have three different JDBC drivers in my lib directory to make them 
available for whoever needs them (MS SQL, SAP Hana, Teradata, and will add 
Oracle soon).

--Peter

From: Sumanth Chinthagunta [mailto:xmlk...@gmail.com<mailto:xmlk...@gmail.com>]
Sent: Thursday, August 18, 2016 8:56 PM
To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: Re: adding dependencies like jdbc drivers to the build

It would be nice if we supported relative paths for the driver JAR, e.g. 
./lib/mariadb-java-client-1.1.7.jar
This would let flow templates stay portable (dev -> prod).



Sent from my iPhone

On Aug 18, 2016, at 2:25 PM, Bryan Bende 
<bbe...@gmail.com<mailto:bbe...@gmail.com>> wrote:
For JDBC, if you are talking about the DBCPConnectionPool, you should be able to 
reference a driver as an external file such as 
file:///var/tmp/mariadb-java-client-1.1.7.jar
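
For example, a DBCPConnectionPool pointed at an external MariaDB driver might be 
configured roughly like this (hypothetical connection details; the exact property 
names vary slightly between NiFi versions):

Database Connection URL     jdbc:mariadb://dbhost:3306/mydb
Database Driver Class Name  org.mariadb.jdbc.Driver
Database Driver Location    file:///var/tmp/mariadb-java-client-1.1.7.jar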

If you are talking about something other than the DBCPConnectionPool then 
it depends on which processor/component...
If you look in the lib directory you will see all the NAR files, each NAR has 
one or more components along with all of the other JARs it needs, and each NAR 
has isolated class loading so that they will not interfere with each other.

You would need to figure out which NAR you are dealing with and add a 
dependency to one of the poms related to that NAR.

-Bryan


On Thu, Aug 18, 2016 at 4:33 PM, Tom Gullo 
<tomgu...@gmail.com<mailto:tomgu...@gmail.com>> wrote:
If I want to add a jdbc driver or any third party dependency where should I add 
that dependency in the Maven build for Nifi?

Thanks
-Tom




v0.* QueryDatabaseTable vs v1 GenerateTableFetch

2016-08-15 Thread Peter Wicks (pwicks)
What is the future of QueryDatabaseTable? Unless I'm misunderstanding how it 
works it looks like GenerateTableFetch can do everything QueryDatabaseTable can 
do and then some.  Is there a plan to phase out QueryDatabaseTable?  Is there a 
reason for a new processor instead of an update to QueryDatabaseTable? It 
looked like the only user facing change was result paging, which could have had 
a default of 0 (no paging).  Just curious.

Regards,
  Peter


RE: v0.* QueryDatabaseTable vs v1 GenerateTableFetch

2016-08-15 Thread Peter Wicks (pwicks)
Oh, disregard :). I misread GenerateTableFetch as being an actual data fetch vs 
a query builder.

From: Peter Wicks (pwicks)
Sent: Monday, August 15, 2016 9:11 PM
To: 'users@nifi.apache.org' <users@nifi.apache.org>
Subject: v0.* QueryDatabaseTable vs v1 GenerateTableFetch

What is the future of QueryDatabaseTable? Unless I'm misunderstanding how it 
works it looks like GenerateTableFetch can do everything QueryDatabaseTable can 
do and then some.  Is there a plan to phase out QueryDatabaseTable?  Is there a 
reason for a new processor instead of an update to QueryDatabaseTable? It 
looked like the only user facing change was result paging, which could have had 
a default of 0 (no paging).  Just curious.

Regards,
  Peter


RE: I need help configuring Site-to-Site in Secure Mode.

2016-09-02 Thread Peter Wicks (pwicks)
Matt,

That was the case on our first go round, when we only had SSL certs.  We went 
back yesterday and got new certificates that support both Server Auth and 
Client Auth and rebuilt our KeyStore.

When I use keytool to look at my KeyStore I can see both of these on the 
certificate:

#6: ObjectId: 2.5.29.37 Criticality=false
ExtendedKeyUsages [
  clientAuth
  serverAuth
]

Thanks,
  Peter

From: Matthew Clarke [mailto:matt.clarke@gmail.com]
Sent: Friday, September 02, 2016 9:23 AM
To: users@nifi.apache.org
Subject: Re: I need help configuring Site-to-Site in Secure Mode.

Do the certs you created/obtained support being used for both client and server 
auth?  If they were created for server auth only, this could explain your 
issue.  NiFi instances need to act as a client and a server at times.

Thanks,
Matt

On Fri, Sep 2, 2016 at 10:59 AM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
Bryan,

We’ve fixed our certs, with no change to the outcome.

We have username/password authentication enabled, via Kerberos, are there 
issues having Kerberos enabled (username/password) and trying to do 
site-to-site? When I try to initiate site-to-site with an instance of NiFI 
configured for Kerberos all requests come through to the server as anonymous 
because no challenge appears to be sent.  We’ve debugged the code and even deep 
down in NiFiUserUtils.getNiFiUser the request is already marked as anonymous by 
the Spring framework. It appears to me that the client has a cert, and is 
waiting for a challenge(?) from the server, and the server is configured for 
Kerberos and it’s waiting for a ‘bearer’ token…

We’ve debugged both client and server, the client sends the request and gets 
back a 401 (Unauthorized). SSL verifies good.
Server doesn’t appear to get any authorization information of any kind.

Looking for further guidance/next steps.

Thanks,
  Peter

From: Bryan Bende [mailto:bbe...@gmail.com<mailto:bbe...@gmail.com>]
Sent: Thursday, September 01, 2016 9:44 AM

To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: Re: I need help configuring Site-to-Site in Secure Mode.

Peter,

Yes, by no means am I saying everyone should use the TLS toolkit. I was just 
using that because many people are not familiar with how to create 
certificates, and for people trying to follow a tutorial it is the easiest 
option.

In your case you definitely want to be using your CA. What you described about 
not having a cert for client authentication definitely sounds like it would be 
a problem. Let us know if everything works out after getting the new certs.

-Bryan


On Thu, Sep 1, 2016 at 11:34 AM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
Bryan,

Paul and I have been working on this, and I think our issue is related to 
certificates.

In your blog posting you used TLS-Toolkit in your example, but I think that is 
unrealistic for many environments.  For example, this also creates the 
certificates for SSL right? But these will be self-signed and thus untrusted by 
default in web browsers.  In our environment we generated SSL certificates from 
our CA and loaded them into the KeyStore.  We then extracted public keys for 
the SSL certs and put them in each of the Trust Stores.  This I think is where 
our main problem is…

I’m making a few assumptions here, so feel free to correct me, but my 
understanding is that when you use TLS-Toolkit it either creates multiple certs 
(SSL & Client Auth), or it creates a cert that you are allowed to use for both 
activities.  In our case we ONLY have SSL certs, and the certs are marked such 
that they aren’t allowed to be used for Client Authentication. I believe this 
is the reason why our requests are showing up as ‘Anonymous’, because there are 
no Client Authentication certificates in the KeyStore, just SSL certs.

I’ve asked our security team for Client Authentication certs for each server, 
since it would be our preference to use our CA rather than having TLS-Toolkit 
be its own CA.

Thoughts?

Thanks,
  Peter

From: Bryan Bende [mailto:bbe...@gmail.com<mailto:bbe...@gmail.com>]
Sent: Thursday, September 01, 2016 9:26 AM

To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: Re: I need help configuring Site-to-Site in Secure Mode.

Paul,

Clustering is not a requirement for site-to-site... This sounds strange since 
"anonymous" is used to represent a user when NiFi is not secured.

Can you double-check all your configs and make sure you have the following 
properties set...

nifi.remote.input.secure=true

nifi.web.https.host=
nifi.web.https.port=

nifi.security.keystore=
nifi.security.keystoreType=
nifi.security.keystorePasswd=
nifi.security.keyPasswd=
nifi.security.truststore=
nifi.security.truststoreType=
nifi.security.truststorePasswd=

After your question the other day I went through the steps of setting secure 
site-to-site to make

RE: I need help configuring Site-to-Site in Secure Mode.

2016-09-02 Thread Peter Wicks (pwicks)
Bryan,

In the log on the server side we see this message:

INFO [NiFi Web Server-324] o.a.n.w.a.c.AccessDeniedExceptionMapper anonymous 
does not have permission to access the requested resource. Returning 
Unauthorized response.

I forgot to mention, we tried adding a user named anonymous and granting it 
access.  When we did this Site-to-Site started working.  Obviously that is not 
a course of action we want to take, but it was a good exercise.

When I debugged the X509AuthenticationFilter it found no certs, here is the 
screenshot showing some of the stack/variables at the time.
https://goo.gl/photos/938E5A8vsb7nQ6Kh8

What would be the equivalent source code for debugging the client to find out 
why no cert is being sent?

Thanks,
  Peter



From: Bryan Bende [mailto:bbe...@gmail.com]
Sent: Friday, September 02, 2016 10:09 AM
To: users@nifi.apache.org
Subject: Re: I need help configuring Site-to-Site in Secure Mode.

Paul/Peter,

Having Kerberos enabled should not have any impact, you can have Kerberos or 
LDAP enabled, but if a certificate is provided that should always take 
precedence.

What do you see in the nifi-user.log on the second instance (the one where the 
remote process group is pointing at)?

If it found the incoming cert it should log something like:

  "Attempting request for  %s %s (source ip: %s)"
  "Authentication success for "

If it didn't find one I think it should log:

  "Rejecting access to web api: %s", ae.getMessage()

It might be helpful to try debugging into the X509AuthenticationFilter:

https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-web/nifi-web-security/src/main/java/org/apache/nifi/web/security/x509/X509AuthenticationFilter.java

The base class it extends is:

https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-web/nifi-web-security/src/main/java/org/apache/nifi/web/security/NiFiAuthenticationFilter.java

If it goes into the X509 filter and returns null from attemptAuthentication, we 
can at least narrow it down to the fact that the client certificate is not in 
the request for some reason.

It sounds like you might have already done this, but there is a line in 
bootstrap.conf that you can uncomment to hook up a remote debugger:

java.arg.debug=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000

-Bryan



On Fri, Sep 2, 2016 at 11:27 AM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
Matt,

That was the case on our first go round, when we only had SSL certs.  We went 
back yesterday and got new certificates that support both Server Auth and 
Client Auth and rebuilt our KeyStore.

When I use keytool to look at my KeyStore I can see both of these on the 
certificate:

#6: ObjectId: 2.5.29.37 Criticality=false
ExtendedKeyUsages [
  clientAuth
  serverAuth
]

Thanks,
  Peter

From: Matthew Clarke 
[mailto:matt.clarke@gmail.com<mailto:matt.clarke@gmail.com>]
Sent: Friday, September 02, 2016 9:23 AM

To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: Re: I need help configuring Site-to-Site in Secure Mode.

Do the certs you created/obtained support being used for both client and server 
auth?  If they were created for server auth only, this could explain your 
issue.  NiFi instances need to act as a client and a server at times.

Thanks,
Matt

On Fri, Sep 2, 2016 at 10:59 AM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
Bryan,

We’ve fixed our certs, with no change to the outcome.

We have username/password authentication enabled, via Kerberos, are there 
issues having Kerberos enabled (username/password) and trying to do 
site-to-site? When I try to initiate site-to-site with an instance of NiFI 
configured for Kerberos all requests come through to the server as anonymous 
because no challenge appears to be sent.  We’ve debugged the code and even deep 
down in NiFiUserUtils.getNiFiUser the request is already marked as anonymous by 
the Spring framework. It appears to me that the client has a cert, and is 
waiting for a challenge(?) from the server, and the server is configured for 
Kerberos and it’s waiting for a ‘bearer’ token…

We’ve debugged both client and server, the client sends the request and gets 
back a 401 (Unauthorized). SSL verifies good.
Server doesn’t appear to get any authorization information of any kind.

Looking for further guidance/next steps.

Thanks,
  Peter

From: Bryan Bende [mailto:bbe...@gmail.com<mailto:bbe...@gmail.com>]
Sent: Thursday, September 01, 2016 9:44 AM

To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: Re: I need help configuring Site-to-Site in Secure Mode.

Peter,

Yes, by no means am I saying everyone should use the TLS toolkit. I was just 
using that because many people are not familiar with how to create 
certificates, and for people tryin

RE: Kill-and-Fill Pattern?

2016-08-29 Thread Peter Wicks (pwicks)
Toivo,

I started down this path, but then came up with a broader solution (which I 
have not tested):

1. Do a normal JSONToSQL conversion.
2. Use MergeContent to group all of the FlowFiles from the same batch into a 
single new FlowFile using the FlowFile Stream merge format.
3. Update PutSQL to support merged FlowFiles.

--Peter

From: Toivo Adams [mailto:toivo.ad...@gmail.com]
Sent: Sunday, August 28, 2016 7:27 AM
To: users@nifi.apache.org
Subject: Re: Kill-and-Fill Pattern?

Hi,
Could a new PutAvroSQL processor help?
The processor would take data in Avro format and insert all records at once.
Thanks,
toivo

2016-08-26 16:45 GMT+03:00 Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>>:
I have a source SQL table that I’m reading with a SQL select statement.  I want 
to kill and fill a destination SQL table with this source data on an interval.

My non kill-and-fill pattern is: ExecuteSQL -> Avro To JSON -> JSON To SQL -> 
PutSQL.

I’m trying to come up with a good way to delete existing data first before 
loading new data.
One option I’ve considered is to mark the original Avro file with a UUID and 
add this attribute as a field in the destination table; then do a split off, 
ReplaceText, and delete all rows where the UUID doesn’t match this batch.  I 
think this could work, but I’m worried about timing the SQL DELETE.  I kind of 
want the kill and the fill steps to happen in a single transaction.
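
For illustration, the statement that ReplaceText would emit for the delete step 
might look something like this (hypothetical table and column names):

DELETE FROM target_table WHERE batch_uuid <> '<uuid-of-current-batch>'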

The other issue is what happens if PutSQL has to go down for a while due to 
database downtime and I get several kill-and-fill batches piled up.  Is there a 
way I can use backpressure to make sure only a single file gets converted from 
JSON to SQL at a time in order to avoid mixing batches?
I also considered FlowFile expiration, but is there a way I can tell NiFi to 
only expire a FlowFile when a new FlowFile has entered the queue? Ex: with one 
FlowFile in the queue, no expiration occurs; when a second (newer) FlowFile 
enters the queue, the first file expires.

Thanks,
  Peter



RE: I need help configuring Site-to-Site in Secure Mode.

2016-09-01 Thread Peter Wicks (pwicks)
Bryan,

Paul and I have been working on this, and I think our issue is related to 
certificates.

In your blog posting you used TLS-Toolkit in your example, but I think that is 
unrealistic for many environments.  For example, this also creates the 
certificates for SSL right? But these will be self-signed and thus untrusted by 
default in web browsers.  In our environment we generated SSL certificates from 
our CA and loaded them into the KeyStore.  We then extracted public keys for 
the SSL certs and put them in each of the Trust Stores.  This I think is where 
our main problem is…

I’m making a few assumptions here, so feel free to correct me, but my 
understanding is that when you use TLS-Toolkit it either creates multiple certs 
(SSL & Client Auth), or it creates a cert that you are allowed to use for both 
activities.  In our case we ONLY have SSL certs, and the certs are marked such 
that they aren’t allowed to be used for Client Authentication. I believe this 
is the reason why our requests are showing up as ‘Anonymous’, because there are 
no Client Authentication certificates in the KeyStore, just SSL certs.

I’ve asked our security team for Client Authentication certs for each server, 
since it would be our preference to use our CA rather than having TLS-Toolkit 
be its own CA.

Thoughts?

Thanks,
  Peter

From: Bryan Bende [mailto:bbe...@gmail.com]
Sent: Thursday, September 01, 2016 9:26 AM
To: users@nifi.apache.org
Subject: Re: I need help configuring Site-to-Site in Secure Mode.

Paul,

Clustering is not a requirement for site-to-site... This sounds strange since 
"anonymous" is used to represent a user when NiFi is not secured.

Can you double-check all your configs and make sure you have the following 
properties set...

nifi.remote.input.secure=true

nifi.web.https.host=
nifi.web.https.port=

nifi.security.keystore=
nifi.security.keystoreType=
nifi.security.keystorePasswd=
nifi.security.keyPasswd=
nifi.security.truststore=
nifi.security.truststoreType=
nifi.security.truststorePasswd=

After your question the other day I went through the steps of setting secure 
site-to-site to make sure I knew what I was talking about :)

I wrote up the steps here:  
http://bryanbende.com/development/2016/08/30/apache-nifi-1.0.0-secure-site-to-site

Thanks,

Bryan

On Thu, Sep 1, 2016 at 10:44 AM, Paul Gibeault (pagibeault) 
> wrote:
Bryan,

Thanks for the reply.  After increasing the log level for Authentication I saw 
the target NiFi instance used the account “anonymous” for the Site-to-Site 
connection.  After creating a policy for “anonymous”, I was able to view the 
remote ports and connect to them.

Obviously this is not ideal.  We would prefer to make policies for remote 
hosts/users rather than anonymous.

We are using the same SSL Certificate for both Key Store and Trust Store on a 
NiFi Instance.  This is likely the cause of the “anonymous” user as it doesn’t 
have a DN.  We are working to correct this.

However, after getting my work flow set up across NiFi instances I see this 
error:

Unable to refresh Remote Group's peers due to Unable to communicate with remote 
NiFi cluster in order to determine which nodes exist in the remote cluster

Our NiFi servers are not set up for clustering.  Is clustering required to 
perform Site-to-Site?

Thanks,
Paul Gibeault

From: Bryan Bende [mailto:bbe...@gmail.com]
Sent: Tuesday, August 30, 2016 5:09 PM
To: users@nifi.apache.org
Subject: Re: I need help configuring Site-to-Site in Secure Mode.

Paul,

It sounds like you probably have the certificates/truststores setup correctly 
and just need to create the appropriate policies...

Let's say you have nifi-1 with a Remote Process Group pointing at the URL of 
nifi-2, and nifi-2 has an Input Port to receive data.

In nifi-2 there needs to be a user for the certificate of nifi-1, and then in 
the global policies of nifi-2 (top right menu) there needs to be a policy for 
"retrieve site-to-site details" with the nifi-1 user added to the policy. I 
think this is what is causing the error message you are seeing since nifi-1 is 
not authorized to query nifi-2 for site-to-site information (available ports, 
etc).

I believe you also need to create a policy on the Input Port on nifi-2... 
select the input port and use the lock icon in the left palette and choose 
"receive data over site-to-site" and add the user of nifi-1. This gives nifi-1 
access to the specific port.

Let us know if that works. If so we should definitely look at updating some of 
the documentation to explain this.

Thanks,

Bryan


On Tue, Aug 30, 2016 at 6:28 PM, Paul Gibeault (pagibeault) 
> wrote:
Hello,

We have been attempting to set up Site-to-Site for NiFi in secure mode and have 
not been successful.

When I create a Remote Process Group, and enter the URL* 

Processor scheduling resets on Nifi Restart

2016-08-31 Thread Peter Wicks (pwicks)
We've noticed that a job we have set up to run once a day will run again the 
same day if we restart NiFi (and again, and again, depending on how often we 
restart NiFi).
Is this by design?


RE: Erroneous Queue has No FlowFiles message

2016-09-09 Thread Peter Wicks (pwicks)
Matt,

This is not a cluster.
Yes, it’s secured. Kerberos.

The thing that gets me is I can list another queue on the same graph/same 
processor group.

--Peter


From: Matt Gilman [mailto:matt.c.gil...@gmail.com]
Sent: Friday, September 09, 2016 5:25 AM
To: users@nifi.apache.org
Subject: Re: Erroneous Queue has No FlowFiles message

Peter,

Thanks for the details! These will be very helpful investigating what's 
happening here. A couple follow-up questions...

- Is this a cluster?
- Is this instance secured?

Thanks

Matt

On Fri, Sep 9, 2016 at 12:13 AM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
Gunjan,

Thanks for the response. I included those messages to emphasize the difference 
between a normal Queue List and mine.  In a normal queue list the GET step 
includes a non-empty “flowFileSummaries” array, assuming there are FlowFiles to 
show.
When I list my other queue, the one with 23 FlowFiles in it, I get back an 
array with 23 entries.  Based on the JSON I’m assuming that my queue with 
100,000 files in it should return 100, but instead I get 0.

Thanks,
  Peter

From: Gunjan Dave 
[mailto:gunjanpiyushd...@gmail.com<mailto:gunjanpiyushd...@gmail.com>]
Sent: Thursday, September 08, 2016 9:26 PM
To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: Re: Erroneous Queue has No FlowFiles message


Hi Peter, once you POST the request (your first step), you get a listing-request 
reference UUID as part of the response.
This UUID is used to perform all the operations on the queue.
It stays active until a DELETE request is sent. Once you delete the active 
request, you get the message you mentioned in the logs; this is not an issue.
If you check the developer panel in Chrome, you will see all three operations, 
POST-GET-DELETE, in succession.
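
As a sketch of that sequence against the endpoints shown in the responses below 
(authentication for a secured instance is omitted here):

curl -X POST   https://localhost:8443/nifi-api/flowfile-queues/<queue-id>/listing-requests
curl           https://localhost:8443/nifi-api/flowfile-queues/<queue-id>/listing-requests/<request-id>
curl -X DELETE https://localhost:8443/nifi-api/flowfile-queues/<queue-id>/listing-requests/<request-id>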

On Fri, Sep 9, 2016, 8:48 AM Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
Running NiFI 1.0.0, I’m listing a queue that has 100k files queued. I’ve 
stopped both the incoming and outgoing processors, so the files are just 
hanging out in the queue, no possible motion.

I get, “The queue has no FlowFiles” message.  Here are the actual responses 
from the REST calls:

POST - Listing-requests
{"listingRequest":{"id":"0cee44de-0157-1000-5668-6e93a465e227","uri":"https://localhost:8443/nifi-api/flowfile-queues/0bacce2d-0157-1000-1a6d-6e0fd84a6bd6/listing-requests/0cee44de-0157-1000-5668-6e93a465e227","submissionTime":"09/09/2016
 03:12:04.318 GMT+00:00","lastUpdated":"03:12:04 
GMT+00:00","percentCompleted":0,"finished":false,"maxResults":100,"state":"Waiting
 for other queue requests to 
complete","queueSize":{"byteCount":2540,"objectCount":10},"sourceRunning":false,"destinationRunning":false}}

GET
{"listingRequest":{"id":"0cee44de-0157-1000-5668-6e93a465e227","uri":"https:// 
localhost:8443/nifi-api/flowfile-queues/0bacce2d-0157-1000-1a6d-6e0fd84a6bd6/listing-requests/0cee44de-0157-1000-5668-6e93a465e227","submissionTime":"09/09/2016
 03:12:04.318 GMT+00:00","lastUpdated":"03:12:04 
GMT+00:00","percentCompleted":100,"finished":true,"maxResults":100,"state":"Completed
 
successfully","queueSize":{"byteCount":2540,"objectCount":10},"flowFileSummaries":[],"sourceRunning":false,"destinationRunning":false}}

DELETE
{"listingRequest":{"id":"0cee44de-0157-1000-5668-6e93a465e227","uri":"https:// 
localhost:8443/nifi-api/flowfile-queues/0bacce2d-0157-1000-1a6d-6e0fd84a6bd6/listing-requests/0cee44de-0157-1000-5668-6e93a465e227","submissionTime":"09/09/2016
 03:12:04.318 GMT+00:00","lastUpdated":"03:12:04 
GMT+00:00","percentCompleted":100,"finished":true,"maxResults":100,"state":"Completed
 
successfully","queueSize":{"byteCount":2540,"objectCount":10},"sourceRunning":false,"destinationRunning":false}}

On a subsequent test (thus the difference in ID’s) I checked the nifi-app.log 
file and found this single message:

2016-09-09 03:15:50,043 INFO [NiFi Web Server-828] 
o.a.n.controller.StandardFlowFileQueue Canceling ListFlowFile Request with ID 
0cf1b178-0157-1000-9111-9b889415bcdc

Not clear why it was canceled.

I went up one step in the process, and that queue has 23 items in it. I was 
able to list it without issue.

Any ideas why I can’t list the queue?

Thanks,
  Peter Wicks



RE: Erroneous Queue has No FlowFiles message

2016-09-09 Thread Peter Wicks (pwicks)
PutSQL.  The 100k FlowFiles are all SQL INSERT statements with associated 
attributes, generated by a JSONToSQL processor.

From: Matt Gilman [mailto:matt.c.gil...@gmail.com]
Sent: Friday, September 09, 2016 8:51 AM
To: users@nifi.apache.org
Subject: Re: Erroneous Queue has No FlowFiles message

Peter,

What is the processor downstream of the connection in question? Thanks.

Matt

On Fri, Sep 9, 2016 at 10:39 AM, Matt Gilman 
<matt.c.gil...@gmail.com<mailto:matt.c.gil...@gmail.com>> wrote:
Peter,

Thanks for the answers. Still not quite sure what's causing this and am trying 
to narrow down the possible cause. Are you still able to replicate the issue? 
If so, can you enable debug level logging for

org.apache.nifi.controller.StandardFlowFileQueue

and see if there are any meaningful messages in the nifi-app.log?

Thanks!

Matt


On Fri, Sep 9, 2016 at 9:52 AM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
Matt,

This is not a cluster.
Yes, it’s secured. Kerberos.

The thing that gets me is I can list another queue on the same graph/same 
processor group.

--Peter


From: Matt Gilman 
[mailto:matt.c.gil...@gmail.com<mailto:matt.c.gil...@gmail.com>]
Sent: Friday, September 09, 2016 5:25 AM

To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: Re: Erroneous Queue has No FlowFiles message

Peter,

Thanks for the details! These will be very helpful investigating what's 
happening here. A couple follow-up questions...

- Is this a cluster?
- Is this instance secured?

Thanks

Matt

On Fri, Sep 9, 2016 at 12:13 AM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
Gunjan,

Thanks for the response. I included those messages to emphasize the difference 
between a normal Queue List and mine.  In a normal queue list the GET step 
includes a non-empty “flowFileSummaries” array, assuming there are FlowFiles to 
show.
When I list my other queue, the one with 23 FlowFiles in it, I get back an 
array with 23 entries.  Based on the JSON I’m assuming that my queue with 
100,000 files in it should return 100, but instead I get 0.

Thanks,
  Peter

From: Gunjan Dave 
[mailto:gunjanpiyushd...@gmail.com<mailto:gunjanpiyushd...@gmail.com>]
Sent: Thursday, September 08, 2016 9:26 PM
To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: Re: Erroneous Queue has No FlowFiles message


Hi Peter, once you POST the request (your first step), you get a listing-request 
reference UUID as part of the response.
This UUID is used to perform all the operations on the queue.
It stays active until a DELETE request is sent. Once you delete the active 
request, you get the message you mentioned in the logs; this is not an issue.
If you check the developer panel in Chrome, you will see all three operations, 
POST-GET-DELETE, in succession.

On Fri, Sep 9, 2016, 8:48 AM Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
Running NiFI 1.0.0, I’m listing a queue that has 100k files queued. I’ve 
stopped both the incoming and outgoing processors, so the files are just 
hanging out in the queue, no possible motion.

I get, “The queue has no FlowFiles” message.  Here are the actual responses 
from the REST calls:

POST - Listing-requests
{"listingRequest":{"id":"0cee44de-0157-1000-5668-6e93a465e227","uri":"https://localhost:8443/nifi-api/flowfile-queues/0bacce2d-0157-1000-1a6d-6e0fd84a6bd6/listing-requests/0cee44de-0157-1000-5668-6e93a465e227","submissionTime":"09/09/2016
 03:12:04.318 GMT+00:00","lastUpdated":"03:12:04 
GMT+00:00","percentCompleted":0,"finished":false,"maxResults":100,"state":"Waiting
 for other queue requests to 
complete","queueSize":{"byteCount":2540,"objectCount":10},"sourceRunning":false,"destinationRunning":false}}

GET
{"listingRequest":{"id":"0cee44de-0157-1000-5668-6e93a465e227","uri":"https:// 
localhost:8443/nifi-api/flowfile-queues/0bacce2d-0157-1000-1a6d-6e0fd84a6bd6/listing-requests/0cee44de-0157-1000-5668-6e93a465e227","submissionTime":"09/09/2016
 03:12:04.318 GMT+00:00","lastUpdated":"03:12:04 
GMT+00:00","percentCompleted":100,"finished":true,"maxResults":100,"state":"Completed
 
successfully","queueSize":{"byteCount":2540,"objectCount":10},"flowFileSummaries":[],"sourceRunning":false,"destinationRunning":false}}

DELETE
{"listingRequest":{"id":"0cee44de-0157-1000-5668-6e93a465e227","uri":"https:// 
localhost:8443/nifi-api/flowfile-queues/0bacce2d-0157-1000-1a6d-6e0fd84a6bd6/li

Interesting Site-to-Site quirk with nifi.security.identity.mapping.pattern.dn

2016-09-11 Thread Peter Wicks (pwicks)
I've been playing with site-to-site and found an interesting quirk.  I had the 
full DNs from my certificates as my usernames, but decided to set up 
nifi.security.identity.mapping patterns for both the DNs and for Kerberos; 
this, by the way, works great for normal users.
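
For reference, this is the kind of mapping I mean, sketched with a simplified 
pattern (real patterns must match the DN layout of your certificates):

nifi.security.identity.mapping.pattern.dn=^CN=(.*?), OU=.*$
nifi.security.identity.mapping.value.dn=$1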

I renamed just my own account in users.xml so I could log in.  I was getting 
site-to-site login errors, so I renamed the user accounts to be just the CN, 
and in nifi-user.log I started seeing successful authentications.

Then I started seeing this message in the nifi-app.log and eventually it 
started showing up as bulletin messages:

EndpointConnectionPool[Cluster URL=https://host1:8443/nifi] failed to 
communicate with Peer[url=nifi://host1:8500,CLOSED] due to 
org.apache.nifi.remote.exception.HandshakeException: Received unexpected 
response
User Not Authorized: 
StandardRootGroupPort[id=1c60dcc0-0157-1000-c554-002d2b3e3702] authorization 
failed for user EMAILADDRESS=pwi...@micron.com, CN=host2, OU=ou, O=Micron 
Technology Inc., L=Boise, ST=ID, C=US because Unknown user with identity 
'EMAILADDRESS=pwi...@micron.com, CN=host2, OU=ou, O=Micron Technology Inc., 
L=Boise, ST=ID, C=US'.

I worked around my site-to-site auth issue by adding a second account with the 
full DN from the certificate.  This allowed site-to-site to start working again.

This feels like a bug in Site-to-Site (StandardRootGroupPort). I cut a Jira for 
it: https://issues.apache.org/jira/browse/NIFI-2757.

If I'm missing something from a configuration perspective please let me know.


RE: Nifi 1.0.0 compatibility with Hive 1.1.0

2016-09-08 Thread Peter Wicks (pwicks)
Also, ORC file support was pulled out into its own library on the Hive side.
If you are willing to compile and run your own version, you might need to 
include orc-core as a Maven dependency: 
https://mvnrepository.com/artifact/org.apache.orc/orc-core/1.2.0.
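
In pom.xml terms that would be roughly:

<dependency>
    <groupId>org.apache.orc</groupId>
    <artifactId>orc-core</artifactId>
    <version>1.2.0</version>
</dependency>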


From: Andre [mailto:andre-li...@fucs.org]
Sent: Thursday, September 08, 2016 4:51 AM
To: users@nifi.apache.org
Subject: Re: Nifi 1.0.0 compatibility with Hive 1.1.0

Yari,

Is there any chance you can cherry pick commit 
80224e3e5ed7ee7b09c4985a920a7fa393bff26c and try again?

Post 1.0.0 there have been some changes to streamline compilation using vendor 
provided libraries.

Cheers

On Thu, Sep 8, 2016 at 8:44 PM, Yari Marchetti 
> wrote:
Hello,
I'd like to use Nifi 1.0.0 with Hive 1.1.0 (on CDH 5.5.2) but after some 
investigation I realised that the hive-jdbc driver included in Nifi is 
incompatible with the Hive version we're using (1.1.0 on CDH 5.5.2) as I'm 
getting the error:

org.apache.hive.jdbc.HiveConnection Error opening session
org.apache.thrift.TApplicationException: Required field 'client_protocol' is 
unset! Struct:TOpenSessionReq(client_protocol:null, 
configuration:{use:database=unifieddata})

So I just tried to recompile Nifi using the Cloudera profile 5.5.2 but 
compilation is failing:

[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.2:compile (default-compile) on 
project nifi-hive-processors: Compilation failure: Compilation failure:
[ERROR] 
/home/matteo/git/nifi/nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/hadoop/hive/ql/io/orc/NiFiOrcUtils.java:[26,43]
 error: package 
org.apache.hadoop.hive.ql.io.filters does 
not exist
[ERROR] 
/home/matteo/git/nifi/nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/hadoop/hive/ql/io/orc/OrcFlowFileWriter.java:[45,43]
 error: package 
org.apache.hadoop.hive.ql.io.filters does 
not exist
[ERROR] 
/home/matteo/git/nifi/nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/hadoop/hive/ql/io/orc/OrcFlowFileWriter.java:[643,24]
 error: cannot find symbol
[ERROR] symbol:   class BloomFilterIO
[ERROR] location: class TreeWriter
[ERROR] 
/home/matteo/git/nifi/nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/hadoop/hive/ql/io/orc/OrcFlowFileWriter.java:[645,30]
 error: cannot find symbol
[ERROR] symbol:   class BloomFilterIndex
[ERROR] location: class OrcProto
[ERROR] 
/home/matteo/git/nifi/nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/hadoop/hive/ql/io/orc/OrcFlowFileWriter.java:[646,30]
 error: cannot find symbol
[ERROR] symbol:   class BloomFilter
[ERROR] location: class OrcProto
[ERROR] 
/home/matteo/git/nifi/nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/hadoop/hive/ql/io/orc/NiFiOrcUtils.java:[450,32]
 error: cannot find symbol
[ERROR] symbol:   variable BloomFilterIO
[ERROR] location: class NiFiOrcUtils
[ERROR] 
/home/matteo/git/nifi/nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/hadoop/hive/ql/io/orc/OrcFlowFileWriter.java:[200,20]
 error: cannot find symbol
[ERROR] symbol:   variable OrcUtils
[ERROR] location: class OrcFlowFileWriter


Is there any way to get Nifi to work with Hive 1.1.0 and CDH 5.5.2?

Thanks,
Yari



RE: Erroneous Queue has No FlowFiles message

2016-09-08 Thread Peter Wicks (pwicks)
Gunjan,

Thanks for the response. I included those messages to emphasize the difference 
between a normal Queue List and mine.  In a normal queue list the GET step 
includes a non-empty “flowFileSummaries” array, assuming there are FlowFiles to 
show.
When I list my other queue, the one with 23 FlowFiles in it, I get back an 
array with 23 entries.  Based on the JSON I’m assuming that my queue with 
100,000 files in it should return 100, but instead I get 0.

Thanks,
  Peter

From: Gunjan Dave [mailto:gunjanpiyushd...@gmail.com]
Sent: Thursday, September 08, 2016 9:26 PM
To: users@nifi.apache.org
Subject: Re: Erroneous Queue has No FlowFiles message


Hi Peter, once you POST the request (your first step), you get a listing-request 
reference UUID as part of the response.
This UUID is used to perform all the operations on the queue.
It stays active until a DELETE request is sent. Once you delete the active 
request, you get the message you mentioned in the logs; this is not an issue.
If you check the developer panel in Chrome, you will see all three operations, 
POST-GET-DELETE, in succession.

On Fri, Sep 9, 2016, 8:48 AM Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
Running NiFI 1.0.0, I’m listing a queue that has 100k files queued. I’ve 
stopped both the incoming and outgoing processors, so the files are just 
hanging out in the queue, no possible motion.

I get, “The queue has no FlowFiles” message.  Here are the actual responses 
from the REST calls:

POST - Listing-requests
{"listingRequest":{"id":"0cee44de-0157-1000-5668-6e93a465e227","uri":"https://localhost:8443/nifi-api/flowfile-queues/0bacce2d-0157-1000-1a6d-6e0fd84a6bd6/listing-requests/0cee44de-0157-1000-5668-6e93a465e227","submissionTime":"09/09/2016
 03:12:04.318 GMT+00:00","lastUpdated":"03:12:04 
GMT+00:00","percentCompleted":0,"finished":false,"maxResults":100,"state":"Waiting
 for other queue requests to 
complete","queueSize":{"byteCount":2540,"objectCount":10},"sourceRunning":false,"destinationRunning":false}}

GET
{"listingRequest":{"id":"0cee44de-0157-1000-5668-6e93a465e227","uri":"https:// 
localhost:8443/nifi-api/flowfile-queues/0bacce2d-0157-1000-1a6d-6e0fd84a6bd6/listing-requests/0cee44de-0157-1000-5668-6e93a465e227","submissionTime":"09/09/2016
 03:12:04.318 GMT+00:00","lastUpdated":"03:12:04 
GMT+00:00","percentCompleted":100,"finished":true,"maxResults":100,"state":"Completed
 
successfully","queueSize":{"byteCount":2540,"objectCount":10},"flowFileSummaries":[],"sourceRunning":false,"destinationRunning":false}}

DELETE
{"listingRequest":{"id":"0cee44de-0157-1000-5668-6e93a465e227","uri":"https:// 
localhost:8443/nifi-api/flowfile-queues/0bacce2d-0157-1000-1a6d-6e0fd84a6bd6/listing-requests/0cee44de-0157-1000-5668-6e93a465e227","submissionTime":"09/09/2016
 03:12:04.318 GMT+00:00","lastUpdated":"03:12:04 
GMT+00:00","percentCompleted":100,"finished":true,"maxResults":100,"state":"Completed
 
successfully","queueSize":{"byteCount":2540,"objectCount":10},"sourceRunning":false,"destinationRunning":false}}

On a subsequent test (thus the difference in ID’s) I checked the nifi-app.log 
file and found this single message:

2016-09-09 03:15:50,043 INFO [NiFi Web Server-828] 
o.a.n.controller.StandardFlowFileQueue Canceling ListFlowFile Request with ID 
0cf1b178-0157-1000-9111-9b889415bcdc

Not clear why it was canceled.

I went up one step in the process, and that queue has 23 items in it. I was 
able to list it without issue.

Any ideas why I can’t list the queue?

Thanks,
  Peter Wicks


Erroneous Queue has No FlowFiles message

2016-09-08 Thread Peter Wicks (pwicks)
Running NiFI 1.0.0, I'm listing a queue that has 100k files queued. I've 
stopped both the incoming and outgoing processors, so the files are just 
hanging out in the queue, no possible motion.

I get, "The queue has no FlowFiles" message.  Here are the actual responses 
from the REST calls:

POST - Listing-requests
{"listingRequest":{"id":"0cee44de-0157-1000-5668-6e93a465e227","uri":"https://localhost:8443/nifi-api/flowfile-queues/0bacce2d-0157-1000-1a6d-6e0fd84a6bd6/listing-requests/0cee44de-0157-1000-5668-6e93a465e227","submissionTime":"09/09/2016
 03:12:04.318 GMT+00:00","lastUpdated":"03:12:04 
GMT+00:00","percentCompleted":0,"finished":false,"maxResults":100,"state":"Waiting
 for other queue requests to 
complete","queueSize":{"byteCount":2540,"objectCount":10},"sourceRunning":false,"destinationRunning":false}}

GET
{"listingRequest":{"id":"0cee44de-0157-1000-5668-6e93a465e227","uri":"https:// 
localhost:8443/nifi-api/flowfile-queues/0bacce2d-0157-1000-1a6d-6e0fd84a6bd6/listing-requests/0cee44de-0157-1000-5668-6e93a465e227","submissionTime":"09/09/2016
 03:12:04.318 GMT+00:00","lastUpdated":"03:12:04 
GMT+00:00","percentCompleted":100,"finished":true,"maxResults":100,"state":"Completed
 
successfully","queueSize":{"byteCount":2540,"objectCount":10},"flowFileSummaries":[],"sourceRunning":false,"destinationRunning":false}}

DELETE
{"listingRequest":{"id":"0cee44de-0157-1000-5668-6e93a465e227","uri":"https:// 
localhost:8443/nifi-api/flowfile-queues/0bacce2d-0157-1000-1a6d-6e0fd84a6bd6/listing-requests/0cee44de-0157-1000-5668-6e93a465e227","submissionTime":"09/09/2016
 03:12:04.318 GMT+00:00","lastUpdated":"03:12:04 
GMT+00:00","percentCompleted":100,"finished":true,"maxResults":100,"state":"Completed
 
successfully","queueSize":{"byteCount":2540,"objectCount":10},"sourceRunning":false,"destinationRunning":false}}

On a subsequent test (thus the difference in ID's) I checked the nifi-app.log 
file and found this single message:

2016-09-09 03:15:50,043 INFO [NiFi Web Server-828] 
o.a.n.controller.StandardFlowFileQueue Canceling ListFlowFile Request with ID 
0cf1b178-0157-1000-9111-9b889415bcdc

Not clear why it was canceled.

I went up one step in the process, and that queue has 23 items in it. I was 
able to list it without issue.

Any ideas why I can't list the queue?

Thanks,
  Peter Wicks


RE: Erroneous Queue has No FlowFiles message

2016-09-09 Thread Peter Wicks (pwicks)
Matt,

I followed the swapping train of thought and debugged the code. When I debug 
the code where it gets the files the `size` variable looks like this:

FlowFile Queue Size[ ActiveQueue=[0, 0 Bytes], Swap Queue=[10, 2660 
Bytes], Swap Files=[10], Unacknowledged=[0, 0 Bytes] ]

But the List FlowFiles command only looks at the Active queue…

That looks like the root cause; what I don’t know is whether this is by design.

--Peter

From: Peter Wicks (pwicks)
Sent: Friday, September 09, 2016 3:28 PM
To: 'users@nifi.apache.org' <users@nifi.apache.org>
Subject: RE: Erroneous Queue has No FlowFiles message

Matt,

You also asked in an earlier email if I could still reproduce it, and if so to 
try enabling DEBUG level logging.  I am able to reproduce, so I enabled it:

2016-09-09 21:27:28,352 DEBUG [List FlowFiles for Connection 
0f620e2d-0157-1000-4a1d-fd988c59e290] o.a.n.controller.StandardFlowFileQueue 
FlowFileQueue[id=0f620e2d-0157-1000-4a1d-fd988c59e290] Acquired lock to perform 
listing of FlowFiles

2016-09-09 21:27:28,353 DEBUG [List FlowFiles for Connection 
0f620e2d-0157-1000-4a1d-fd988c59e290] o.a.n.controller.StandardFlowFileQueue 
FlowFileQueue[id=0f620e2d-0157-1000-4a1d-fd988c59e290] Finished listing 
FlowFiles for active queue with a total of 0 results

2016-09-09 21:27:29,656 INFO [NiFi Web Server-112] 
o.a.n.controller.StandardFlowFileQueue Canceling ListFlowFile Request with ID 
10d92339-0157-1000-42f4-464c37340fdb

Thanks,
  Peter

From: Peter Wicks (pwicks)
Sent: Friday, September 09, 2016 3:15 PM
To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: RE: Erroneous Queue has No FlowFiles message

Matt,

PutSQL is the end of the line, no downstream processors.
Batch size is 1000, yes I have fragmented transactions set to false.

nifi.queue.swap.threshold=2

--Peter


From: Matt Gilman [mailto:matt.c.gil...@gmail.com]
Sent: Friday, September 09, 2016 2:23 PM
To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: Re: Erroneous Queue has No FlowFiles message

Peter,

Would you be able to share what you've configured for the batch size of PutSQL 
(assuming that 'fragmented transactions' is disabled) and what your swap 
threshold is configured to (nifi.queue.swap.threshold in nifi.properties)?

Also, what is following the PutSQL? Had any of those connections exceeded their 
configured back pressure threshold?

Thanks again.

Matt

On Fri, Sep 9, 2016 at 11:18 AM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
PutSQL.  The 100k FlowFiles are all SQL Insert queries with associated 
attributes, generated by a JSONToSQL processor.

From: Matt Gilman 
[mailto:matt.c.gil...@gmail.com<mailto:matt.c.gil...@gmail.com>]
Sent: Friday, September 09, 2016 8:51 AM

To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: Re: Erroneous Queue has No FlowFiles message

Peter,

What is the processor downstream of the connection in question? Thanks.

Matt

On Fri, Sep 9, 2016 at 10:39 AM, Matt Gilman 
<matt.c.gil...@gmail.com<mailto:matt.c.gil...@gmail.com>> wrote:
Peter,

Thanks for the answers. Still not quite sure what's causing this and am trying 
to narrow down the possible cause. Are you still able to replicate the issue? 
If so, can you enable debug level logging for

org.apache.nifi.controller.StandardFlowFileQueue

and see if there are any meaningful messages in the nifi-app.log?

Thanks!

Matt


On Fri, Sep 9, 2016 at 9:52 AM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
Matt,

This is not a cluster.
Yes, it’s secured. Kerberos.

The thing that gets me is I can list another queue on the same graph/same 
processor group.

--Peter


From: Matt Gilman 
[mailto:matt.c.gil...@gmail.com<mailto:matt.c.gil...@gmail.com>]
Sent: Friday, September 09, 2016 5:25 AM

To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: Re: Erroneous Queue has No FlowFiles message

Peter,

Thanks for the details! These will be very helpful investigating what's 
happening here. A couple follow-up questions...

- Is this a cluster?
- Is this instance secured?

Thanks

Matt

On Fri, Sep 9, 2016 at 12:13 AM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
Gunjan,

Thanks for the response. I included those messages to emphasize the difference 
between a normal Queue List and mine.  In a normal queue list the GET step 
includes a non-empty “flowFileSummaries” array, assuming there are FlowFiles to 
show.
When I list my other queue, the one with 23 FlowFiles in it, I get back an 
array with 23 entries.  Based on the JSON I’m assuming that my queue with 
100,000 files in it should return 100, but instead I get 0.

Thanks,
  Peter

From: Gunjan Dave 
[mailto:gunjanpiyushd...@gmail.com<mailto:gunjanpiyushd...@gmail.com>]
Sent: Thursday, September 08, 2016 9:26 PM
To: users@nifi.apache.org<mailto:users@nifi.apache.org

RE: Erroneous Queue has No FlowFiles message

2016-09-09 Thread Peter Wicks (pwicks)
Matt,

PutSQL is the end of the line, no downstream processors.
Batch size is 1000, yes I have fragmented transactions set to false.

nifi.queue.swap.threshold=2

--Peter


From: Matt Gilman [mailto:matt.c.gil...@gmail.com]
Sent: Friday, September 09, 2016 2:23 PM
To: users@nifi.apache.org
Subject: Re: Erroneous Queue has No FlowFiles message

Peter,

Would you be able to share what you've configured for the batch size of PutSQL 
(assuming that 'fragmented transactions' is disabled) and what your swap 
threshold is configured to (nifi.queue.swap.threshold in nifi.properties)?

Also, what is following the PutSQL? Had any of those connections exceeded their 
configured back pressure threshold?

Thanks again.

Matt

On Fri, Sep 9, 2016 at 11:18 AM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
PutSQL.  The 100k FlowFiles are all SQL Insert queries with associated 
attributes, generated by a JSONToSQL processor.

From: Matt Gilman 
[mailto:matt.c.gil...@gmail.com<mailto:matt.c.gil...@gmail.com>]
Sent: Friday, September 09, 2016 8:51 AM

To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: Re: Erroneous Queue has No FlowFiles message

Peter,

What is the processor downstream of the connection in question? Thanks.

Matt

On Fri, Sep 9, 2016 at 10:39 AM, Matt Gilman 
<matt.c.gil...@gmail.com<mailto:matt.c.gil...@gmail.com>> wrote:
Peter,

Thanks for the answers. Still not quite sure what's causing this and am trying 
to narrow down the possible cause. Are you still able to replicate the issue? 
If so, can you enable debug level logging for

org.apache.nifi.controller.StandardFlowFileQueue

and see if there are any meaningful messages in the nifi-app.log?

Thanks!

Matt


On Fri, Sep 9, 2016 at 9:52 AM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
Matt,

This is not a cluster.
Yes, it’s secured. Kerberos.

The thing that gets me is I can list another queue on the same graph/same 
processor group.

--Peter


From: Matt Gilman 
[mailto:matt.c.gil...@gmail.com<mailto:matt.c.gil...@gmail.com>]
Sent: Friday, September 09, 2016 5:25 AM

To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: Re: Erroneous Queue has No FlowFiles message

Peter,

Thanks for the details! These will be very helpful investigating what's 
happening here. A couple follow-up questions...

- Is this a cluster?
- Is this instance secured?

Thanks

Matt

On Fri, Sep 9, 2016 at 12:13 AM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
Gunjan,

Thanks for the response. I included those messages to emphasize the difference 
between a normal Queue List and mine.  In a normal queue list the GET step 
includes a non-empty “flowFileSummaries” array, assuming there are FlowFiles to 
show.
When I list my other queue, the one with 23 FlowFiles in it, I get back an 
array with 23 entries.  Based on the JSON I’m assuming that my queue with 
100,000 files in it should return 100, but instead I get 0.

Thanks,
  Peter

From: Gunjan Dave 
[mailto:gunjanpiyushd...@gmail.com<mailto:gunjanpiyushd...@gmail.com>]
Sent: Thursday, September 08, 2016 9:26 PM
To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: Re: Erroneous Queue has No FlowFiles message


Hi Peter, once you post the request, your first step, you get a listing request 
reference handle UUID as part of response.
This UUID is used to perform the all the operations on the queue.
This UUID is active until a DELETE request is sent. Once you delete the active 
request, you get the message you mentioned in the logs, this is not an issue.
If you check the developer panel in chrome, you will see all 3 operations, 
post-get-delete in succession.

On Fri, Sep 9, 2016, 8:48 AM Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
Running NiFi 1.0.0, I’m listing a queue that has 100k files queued. I’ve 
stopped both the incoming and outgoing processors, so the files are just 
hanging out in the queue, no possible motion.

I get, “The queue has no FlowFiles” message.  Here are the actual responses 
from the REST calls:

POST - Listing-requests
{"listingRequest":{"id":"0cee44de-0157-1000-5668-6e93a465e227","uri":"https://localhost:8443/nifi-api/flowfile-queues/0bacce2d-0157-1000-1a6d-6e0fd84a6bd6/listing-requests/0cee44de-0157-1000-5668-6e93a465e227","submissionTime":"09/09/2016
 03:12:04.318 GMT+00:00","lastUpdated":"03:12:04 
GMT+00:00","percentCompleted":0,"finished":false,"maxResults":100,"state":"Waiting
 for other queue requests to 
complete","queueSize":{"byteCount":2540,"objectCount":10},"sourceRunning":false,"destinationRunning":false}}

GET
{"listingRequest":{&quo

RE: Erroneous Queue has No FlowFiles message

2016-09-09 Thread Peter Wicks (pwicks)
Matt,

You also asked in an earlier email if I could still reproduce it, and if so to 
try enabling DEBUG level logging.  I am able to reproduce, so I enabled it:

2016-09-09 21:27:28,352 DEBUG [List FlowFiles for Connection 
0f620e2d-0157-1000-4a1d-fd988c59e290] o.a.n.controller.StandardFlowFileQueue 
FlowFileQueue[id=0f620e2d-0157-1000-4a1d-fd988c59e290] Acquired lock to perform 
listing of FlowFiles

2016-09-09 21:27:28,353 DEBUG [List FlowFiles for Connection 
0f620e2d-0157-1000-4a1d-fd988c59e290] o.a.n.controller.StandardFlowFileQueue 
FlowFileQueue[id=0f620e2d-0157-1000-4a1d-fd988c59e290] Finished listing 
FlowFiles for active queue with a total of 0 results

2016-09-09 21:27:29,656 INFO [NiFi Web Server-112] 
o.a.n.controller.StandardFlowFileQueue Canceling ListFlowFile Request with ID 
10d92339-0157-1000-42f4-464c37340fdb

Thanks,
  Peter

From: Peter Wicks (pwicks)
Sent: Friday, September 09, 2016 3:15 PM
To: users@nifi.apache.org
Subject: RE: Erroneous Queue has No FlowFiles message

Matt,

PutSQL is the end of the line, no downstream processors.
Batch size is 1000, yes I have fragmented transactions set to false.

nifi.queue.swap.threshold=2

--Peter


From: Matt Gilman [mailto:matt.c.gil...@gmail.com]
Sent: Friday, September 09, 2016 2:23 PM
To: users@nifi.apache.org
Subject: Re: Erroneous Queue has No FlowFiles message

Peter,

Would you be able to share what you've configured for the batch size of PutSQL 
(assuming that 'fragmented transactions' is disabled) and what your swap 
threshold is configured to (nifi.queue.swap.threshold in nifi.properties)?

Also, what is following the PutSQL? Had any of those connections exceeded their 
configured back pressure threshold?

Thanks again.

Matt

On Fri, Sep 9, 2016 at 11:18 AM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
PutSQL.  The 100k FlowFiles are all SQL Insert queries with associated 
attributes, generated by a JSONToSQL processor.

From: Matt Gilman 
[mailto:matt.c.gil...@gmail.com<mailto:matt.c.gil...@gmail.com>]
Sent: Friday, September 09, 2016 8:51 AM

To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: Re: Erroneous Queue has No FlowFiles message

Peter,

What is the processor downstream of the connection in question? Thanks.

Matt

On Fri, Sep 9, 2016 at 10:39 AM, Matt Gilman 
<matt.c.gil...@gmail.com<mailto:matt.c.gil...@gmail.com>> wrote:
Peter,

Thanks for the answers. Still not quite sure what's causing this and am trying 
to narrow down the possible cause. Are you still able to replicate the issue? 
If so, can you enable debug level logging for

org.apache.nifi.controller.StandardFlowFileQueue

and see if there are any meaningful messages in the nifi-app.log?

Thanks!

Matt


On Fri, Sep 9, 2016 at 9:52 AM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
Matt,

This is not a cluster.
Yes, it’s secured. Kerberos.

The thing that gets me is I can list another queue on the same graph/same 
processor group.

--Peter


From: Matt Gilman 
[mailto:matt.c.gil...@gmail.com<mailto:matt.c.gil...@gmail.com>]
Sent: Friday, September 09, 2016 5:25 AM

To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: Re: Erroneous Queue has No FlowFiles message

Peter,

Thanks for the details! These will be very helpful investigating what's 
happening here. A couple follow-up questions...

- Is this a cluster?
- Is this instance secured?

Thanks

Matt

On Fri, Sep 9, 2016 at 12:13 AM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
Gunjan,

Thanks for the response. I included those messages to emphasize the difference 
between a normal Queue List and mine.  In a normal queue list the GET step 
includes a non-empty “flowFileSummaries” array, assuming there are FlowFiles to 
show.
When I list my other queue, the one with 23 FlowFiles in it, I get back an 
array with 23 entries.  Based on the JSON I’m assuming that my queue with 
100,000 files in it should return 100, but instead I get 0.

Thanks,
  Peter

From: Gunjan Dave 
[mailto:gunjanpiyushd...@gmail.com<mailto:gunjanpiyushd...@gmail.com>]
Sent: Thursday, September 08, 2016 9:26 PM
To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: Re: Erroneous Queue has No FlowFiles message


Hi Peter, once you post the request, your first step, you get a listing request 
reference handle UUID as part of response.
This UUID is used to perform the all the operations on the queue.
This UUID is active until a DELETE request is sent. Once you delete the active 
request, you get the message you mentioned in the logs, this is not an issue.
If you check the developer panel in chrome, you will see all 3 operations, 
post-get-delete in succession.

On Fri, Sep 9, 2016, 8:48 AM Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
Running NiFi 1.0.0, I’m listing a queue that has 100k files

RE: Erroneous Queue has No FlowFiles message

2016-09-09 Thread Peter Wicks (pwicks)
Matt,

I’ve identified the source of the issue, created a patch/unit test, and opened a PR. In 
StandardFlowFileQueue: writeSwapFilesIfNecessary.  When it calculates the 
`numSwapFiles`, if the number of FlowFiles in the queue is perfectly splittable 
(in my case 10/2 = 5) and the Active Queue is empty, then ALL files move 
to swap and none are left in Active.
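
To spell out the arithmetic of that boundary case (illustrative numbers from my test, not the real NiFi constants):

public class SwapBoundaryCase {
    public static void main(String[] args) {
        int queued = 10;       // FlowFiles waiting to be swapped in my test
        int perSwapFile = 2;   // FlowFiles written per swap file (illustrative value)

        int numSwapFiles = queued / perSwapFile;                 // 10 / 2 = 5, divides evenly
        int leftInActive = queued - numSwapFiles * perSwapFile;  // 0 -> nothing for the listing to see

        System.out.println(numSwapFiles + " swap files, " + leftInActive + " FlowFiles left active");
    }
}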

https://github.com/apache/nifi/pull/1000

If you have a chance to take a look at the PR I’d appreciate it.

Thanks,
  Peter

From: Matt Gilman [mailto:matt.c.gil...@gmail.com]
Sent: Friday, September 09, 2016 5:00 PM
To: users@nifi.apache.org
Subject: Re: Erroneous Queue has No FlowFiles message

Peter,

Thanks for the confirmation. I think there is some case we're hitting here where 
some flowfiles are being swapped instead of added back to the active queue. The 
queue listing only returns the top 100 entries in the active queue. Haven't 
identified the case that's causing it yet but definitely have a better idea 
what's going on now.

Thanks

Matt

Sent from my iPhone

On Sep 9, 2016, at 5:42 PM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
Matt,

I followed the swapping train of thought and debugged the code. When I debug 
the code where it gets the files the `size` variable looks like this:

FlowFile Queue Size[ ActiveQueue=[0, 0 Bytes], Swap Queue=[10, 2660 
Bytes], Swap Files=[10], Unacknowledged=[0, 0 Bytes] ]

But the List FlowFiles command only looks at the Active queue…

That looks like the root cause, what I don’t know is if this is by design.

--Peter

From: Peter Wicks (pwicks)
Sent: Friday, September 09, 2016 3:28 PM
To: 'users@nifi.apache.org<mailto:users@nifi.apache.org>' 
<users@nifi.apache.org<mailto:users@nifi.apache.org>>
Subject: RE: Erroneous Queue has No FlowFiles message

Matt,

You also asked in an earlier email if I could still reproduce it, and if so to 
try enabling DEBUG level logging.  I am able to reproduce, so I enabled it:

2016-09-09 21:27:28,352 DEBUG [List FlowFiles for Connection 
0f620e2d-0157-1000-4a1d-fd988c59e290] o.a.n.controller.StandardFlowFileQueue 
FlowFileQueue[id=0f620e2d-0157-1000-4a1d-fd988c59e290] Acquired lock to perform 
listing of FlowFiles

2016-09-09 21:27:28,353 DEBUG [List FlowFiles for Connection 
0f620e2d-0157-1000-4a1d-fd988c59e290] o.a.n.controller.StandardFlowFileQueue 
FlowFileQueue[id=0f620e2d-0157-1000-4a1d-fd988c59e290] Finished listing 
FlowFiles for active queue with a total of 0 results

2016-09-09 21:27:29,656 INFO [NiFi Web Server-112] 
o.a.n.controller.StandardFlowFileQueue Canceling ListFlowFile Request with ID 
10d92339-0157-1000-42f4-464c37340fdb

Thanks,
  Peter

From: Peter Wicks (pwicks)
Sent: Friday, September 09, 2016 3:15 PM
To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: RE: Erroneous Queue has No FlowFiles message

Matt,

PutSQL is the end of the line, no downstream processors.
Batch size is 1000, yes I have fragmented transactions set to false.

nifi.queue.swap.threshold=2

--Peter


From: Matt Gilman [mailto:matt.c.gil...@gmail.com]
Sent: Friday, September 09, 2016 2:23 PM
To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: Re: Erroneous Queue has No FlowFiles message

Peter,

Would you be able to share what you've configured for the batch size of PutSQL 
(assuming that 'fragmented transactions' is disabled) and what your swap 
threshold is configured to (nifi.queue.swap.threshold in nifi.properties)?

Also, what is following the PutSQL? Had any of those connections exceeded their 
configured back pressure threshold?

Thanks again.

Matt

On Fri, Sep 9, 2016 at 11:18 AM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
PutSQL.  The 100k FlowFiles are all SQL Insert queries with associated 
attributes, generated by a JSONToSQL processor.

From: Matt Gilman 
[mailto:matt.c.gil...@gmail.com<mailto:matt.c.gil...@gmail.com>]
Sent: Friday, September 09, 2016 8:51 AM

To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: Re: Erroneous Queue has No FlowFiles message

Peter,

What is the processor downstream of the connection in question? Thanks.

Matt

On Fri, Sep 9, 2016 at 10:39 AM, Matt Gilman 
<matt.c.gil...@gmail.com<mailto:matt.c.gil...@gmail.com>> wrote:
Peter,

Thanks for the answers. Still not quite sure what's causing this and am trying 
to narrow down the possible cause. Are you still able to replicate the issue? 
If so, can you enable debug level logging for

org.apache.nifi.controller.StandardFlowFileQueue

and see if there are any meaningful messages in the nifi-app.log?

Thanks!

Matt


On Fri, Sep 9, 2016 at 9:52 AM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
Matt,

This is not a cluster.
Yes, it’s secured. Kerberos.

The thing that gets me is I can list another queue on the same graph/same 
processor group.

--Peter


From: 

RE: Download item from queue - what permission is required?

2016-09-20 Thread Peter Wicks (pwicks)
Andre/Matt,

Sorry, my memory was wrong. My experience matches Andre’s, it only errors when 
I click Download; View is fine.

We are running a customized build of 1.0 and I made the assumption that this 
was an issue caused by a bad merge on our part and wasn’t paying it much 
attention. I have not submitted a JIRA ticket.

We are not clustered, running Kerberos for authentication.

Thanks,
  Peter


From: Matt Gilman [mailto:matt.c.gil...@gmail.com]
Sent: Tuesday, September 20, 2016 9:55 AM
To: users@nifi.apache.org
Subject: Re: Download item from queue - what permission is required?

Downloading and viewing should be the same permissions. If you're seeing 
otherwise please file a JIRA with the details. Is the instance clustered, what 
permissions to you have set on the source component, etc?

Andre,

'View the data' is the correct policy that you need to configure. Is your 
instance clustered, or is there anything proxying user requests? Any endpoint 
that will be transferring 'data' (or 'metadata' like flow file attributes) will 
require that every link in the chain has the 'view the data' policy enabled. 
This ensures that every system between the user and NiFi is authorized to have 
the data.

Let me know if that helps.

Matt

On Tue, Sep 20, 2016 at 11:41 AM, Andre 
<andre-li...@fucs.org<mailto:andre-li...@fucs.org>> wrote:
Peter,

Quite curious as I am able to view the flowfile but unable to download it.

Seems something we should either document (how to setup properly) or to fix in 
the next release.

Have you already raised a JIRA?


On Wed, Sep 21, 2016 at 12:30 AM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
No help here, except to share that I’ve also seen this error.  I’ve been 
working around it by downloading the FlowFile instead of viewing it.

From: Andre [mailto:andre-li...@fucs.org<mailto:andre-li...@fucs.org>]
Sent: Monday, September 19, 2016 11:18 PM
To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: Download item from queue - what permission is required?

Hi there,


I am puzzled by one of the 1.0.0 features. I had some flowfiles in the queue and 
as customary I did a list queue.

Flowfile was in there, attributes in perfect shape. Yet when I try to download 
the data of the flowfile (i.e. click the download button) it reports I don't 
have permissions.

I would assume the permissions required would be "view the data"?


Cheers




RE: Does NiFi support multiple queries

2016-09-21 Thread Peter Wicks (pwicks)
Karthik,

PutSQL will handle both styles, and for multiple tables, without issue.

Internally it creates a separate SQL Batch for each distinct SQL statement in 
the queue and then executes these batches separately.  Feel free to mix as many 
Inserts/Updates as you wish for as many tables as you wish.
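
A rough sketch of that idea in plain JDBC looks something like this (this is not the PutSQL source; the per-FlowFile parameter binding from the sql.args.N.* attributes is elided):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Group the queued statements by their SQL text, then run one JDBC batch per group.
public class BatchPerDistinctStatement {
    public static void execute(Connection conn, List<String> queuedSql) throws Exception {
        Map<String, Integer> groups = new LinkedHashMap<>();
        for (String sql : queuedSql) {
            groups.merge(sql, 1, Integer::sum);        // one bucket per distinct INSERT/UPDATE
        }
        for (Map.Entry<String, Integer> group : groups.entrySet()) {
            try (PreparedStatement ps = conn.prepareStatement(group.getKey())) {
                for (int i = 0; i < group.getValue(); i++) {
                    // In PutSQL the parameters for each FlowFile would be bound here.
                    ps.addBatch();
                }
                ps.executeBatch();                     // each distinct statement gets its own batch
            }
        }
    }
}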

Thanks,
  Peter

From: Karthik Ramakrishnan [mailto:karthik.ramakrishna...@gmail.com]
Sent: Tuesday, September 20, 2016 9:57 PM
To: users@nifi.apache.org
Subject: Does NiFi support multiple queries

Hello -

I was wondering if NiFi can support multiple queries in the same PutSQL 
processor. For example, if an attribute is set to 'update' - will PutSQL run 
the defined update query and next time when it is an 'insert' - it runs the 
insert query. Or should we go ahead and add two separate processors and make a 
decision on the RouteAttributes processor? Any thoughts would be welcome!!

TIA!!

--
Thanks,
Karthik Ramakrishnan
Data Services Intern
Copart Inc.
Contact : +1 (469) 951-8854



PutHiveQL Multiple Ordered Statements

2016-09-23 Thread Peter Wicks (pwicks)
I have a PutHDFS processor drop a file, I then have a long chain of ReplaceText 
-> PutHiveQL processors that runs a series of steps.
The below ~4 steps allow me to take the file generated by NiFi in one format 
and move it into the final table, which is ORC with several Timestamp columns 
(thus why I'm not using AvroToORC, since I'd lose my Timestamps).

The exact HQL, all in one block, is roughly:

DROP TABLE `db.tbl_${filename}`;

CREATE TABLE `db.tbl_${filename}`(
   Some list of columns goes here that exactly matches the schema of `prod_db.tbl`
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
STORED AS TEXTFILE;

LOAD DATA INPATH '${absolute.hdfs.path}/${filename}' INTO TABLE `db.tbl_${filename}`;

INSERT INTO `prod_db.tbl`
SELECT * FROM `db.tbl_${filename}`;

DROP TABLE `db.tbl_${filename}`;

Right now I'm having to split this into 5 separate ReplaceText steps, each one 
followed by a PutHiveQL.  Is there a way I can push a multi-statement, 
order-dependent script like this to Hive in a simpler way?

Thanks,
  Peter


RE: Nifi UI configured to be accessed over HTTPS not displayed in Internet Explorer

2016-09-22 Thread Peter Wicks (pwicks)
Nicolas,

According to the NiFi users guide, you need to be running the Microsoft Edge 
browser to get NiFi working on the IE side of things.  If you are running IE10 
or IE11 it just isn't going to work because of the lack of functionality in 
those versions.

Not sure with which version of Windows you can start installing Edge, but I 
don't believe it's available on Win7.

Thanks,
  Peter

From: Provenzano Nicolas [mailto:nicolas.provenz...@gfi.fr]
Sent: Thursday, September 22, 2016 1:25 AM
To: users@nifi.apache.org
Subject: Nifi UI configured to be accessed over HTTPS not displayed in Internet 
Explorer

Hi all,

I installed a Nifi 1.0.0 instance and configured the User Interface to be 
accessed over HTTPS.

I installed the client certificate in IE and Firefox.

Everything works fine with Firefox when connecting to https://host1:9443/nifi/.

When I try to connect using IE, the tab is correctly renamed "Nifi" but the 
Canvas is not displayed. Instead, a simple blue screen is displayed.

Did someone else have the same issue?

Thanks and regards,

Nicolas



RE: Requesting Obscene FlowFile Batch Sizes

2016-09-20 Thread Peter Wicks (pwicks)
Andy/Bryan,

Thanks for all of the detail, it’s been helpful.
I actually did an experiment this morning where I modified the processor to 
force it to keep calling `get` until it had all 1 million FlowFiles.  Since I 
was calling it sequentially it was able to move files out of swap and into 
active on each request. I was able to retrieve them and process them through, 
which was great until… NiFi tried to move them through provenance.  At that 
point NiFi ran out of memory and fell over (stopped responding).  Right before 
NiFi ran out of memory I received several bulletins related to Provenance being 
written to too quickly, and that it was being slowed down.

I found another solution to my mass insert and got it up and running. Using a 
Teradata JDBC proprietary flag called FastLoadCSV, and a new custom processor, 
I was able to pass in a CSV file to my JDBC driver and get the same result.  In 
this scenario there was just a single FlowFile and everything went smoothly.

Thanks again!

Peter Wicks



From: Bryan Bende [mailto:bbe...@gmail.com]
Sent: Tuesday, September 20, 2016 3:38 PM
To: users@nifi.apache.org
Subject: Re: Requesting Obscene FlowFile Batch Sizes

Andy,

That was my thinking. An easy test might be to bump the threshold up to 100k 
(increase heap if needed) and see if it starts grabbing 100k every time.

If it does then I would think it is swapping related, then need to figure out 
if you really want to get all 1 million in a single batch, and if theres enough 
heap to support that.

-Bryan

On Tue, Sep 20, 2016 at 5:29 PM, Andy LoPresto 
<alopre...@apache.org<mailto:alopre...@apache.org>> wrote:
Bryan,

That’s a good point. Would running with a larger Java heap and higher swap 
threshold allow Peter to get larger batches out?

Andy LoPresto
alopre...@apache.org<mailto:alopre...@apache.org>
alopresto.apa...@gmail.com<mailto:alopresto.apa...@gmail.com>
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

On Sep 20, 2016, at 1:41 PM, Bryan Bende 
<bbe...@gmail.com<mailto:bbe...@gmail.com>> wrote:

Peter,

Does 10k happen to be your swap threshold in nifi.properties by any chance (it 
defaults to 20k I believe)?

I suspect the behavior you are seeing could be due to the way swapping works, 
but Mark or others could probably confirm.

I found this thread where Mark explained how swapping works with a background 
thread, and I believe it still works this way:
http://apache-nifi.1125220.n5.nabble.com/Nifi-amp-Spark-receiver-performance-configuration-td524.html

-Bryan

On Tue, Sep 20, 2016 at 10:22 AM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
I’m using JSONToSQL, followed by PutSQL.  I’m using Teradata, which supports a 
special JDBC mode called FastLoad, designed for a minimum of 100,000 rows of 
data per batch.

What I’m finding is that when PutSQL requests a new batch of FlowFiles from the 
queue, which has over 1 million rows in it, with a batch size of 100, it 
always returns a maximum of 10k.  How can I get my obscenely sized batch 
request to return all the FlowFiles I'm asking for?

Thanks,
  Peter





RE: Download item from queue - what permission is required?

2016-09-20 Thread Peter Wicks (pwicks)
No help here, except to share that I’ve also seen this error.  I’ve been 
working around it by downloading the FlowFile instead of viewing it.

From: Andre [mailto:andre-li...@fucs.org]
Sent: Monday, September 19, 2016 11:18 PM
To: users@nifi.apache.org
Subject: Download item from queue - what permission is required?

Hi there,


I am puzzled by one of the 1.0.0 features. I had some flowfiles in the queue and 
as customary I did a list queue.

Flowfile was in there, attributes in perfect shape. Yet when I try to download 
the data of the flowfile (i.e. click the download button) it reports I don't 
have permissions.

I would assume the permissions required would be "view the data"?


Cheers


Kill-and-Fill Pattern?

2016-08-26 Thread Peter Wicks (pwicks)
I have a source SQL table that I'm reading with a SQL select statement.  I want 
to kill and fill a destination SQL table with this source data on an interval.

My non kill-and-fill pattern is: ExecuteSQL -> Avro To JSON -> JSON To SQL -> 
PutSQL.

I'm trying to come up with a good way to delete existing data first before 
loading new data.
One option I've considered is to mark the original Avro file with a UUID and 
add this attribute as a field in the destination table; then do a split off, 
ReplaceText, and delete all rows where the UUID doesn't match this batch.  I 
think this could work, but I'm worried about timing the SQL DELETE.  I kind of 
want the kill and the fill steps to happen in a single transaction.

The other issue is what happens if PutSQL has to go down for a while due to 
database downtime and I get several kill-and-fill batches piled up.  Is there a 
way I can use backpressure to make sure only a single file gets converted from 
JSON to SQL at a time in order to avoid mixing batches?
I also considered FlowFile expiration, but is there a way I can tell NiFi to 
only expire a FlowFile when a new FlowFile has entered the queue? Ex: 1 flow 
file in queue, no expiration occurs. 2nd (newer) FlowFile enters queue then 
first file will expire itself.

Thanks,
  Peter


Can't export Templates from NiFi 1.0.0 BETA

2016-08-24 Thread Peter Wicks (pwicks)
I've created a template in NiFi 1.0.0 BETA, but am unable to export it.
Security is enabled, Kerberos.  I get the following error:

Unable to perform the desired action due to insufficient permissions. Contact 
the system administrator.

I've tried giving myself and a group containing me both `view the component` 
and `modify the component` access to the template; this made no difference. I 
am able to place new copies of the same template without issue.

I spun up NiFi 1.0.0 from source, so not an exact replica... with no 
authentication and it exported fine, but with no auth enabled and an auth error 
message that isn't too surprising.

I can submit a bug, but thought I'd check if there was more I should try.

Thanks,
  Peter



RE: Access denied for kerberos users

2016-09-26 Thread Peter Wicks (pwicks)
Nicolas,

If Bryan’s suggestion doesn’t work (and he’s probably correct), you may not 
have named your user correctly in NiFi.  Go try to authenticate again, then go 
to {nifi install directory}/logs and look at the end of nif-user.log.  You 
should see more details about your authentication request and what name it 
tried to use to authenticate you. This was how I worked around getting my 
naming conventions to match.

In my case I had enabled “Identity Mapping Properties” in nifi.properties so 
that I could use both certificates and Kerberos, but had forgotten to rename 
the account objects I had already added to NiFi.

Thanks,
  Peter



From: Bryan Bende [mailto:bbe...@gmail.com]
Sent: Monday, September 26, 2016 10:14 AM
To: users@nifi.apache.org
Subject: Re: Access denied for kerberos users

Hello,

Since you are getting to "insufficient permissions" page this means that NiFi 
successfully authenticated your user against the KDC, but then the authorizer 
in NiFi said the user didn't have permissions for something.

What policies did you grant to the kerberos user in NiFi?

At a minimum they need a policy for "view the user interface" from the global 
policies in the top-right menu.

-Bryan

On Mon, Sep 26, 2016 at 11:43 AM, Provenzano Nicolas 
> wrote:
Hi all,

I configured an 1.0.0 NIFI instance to use Kerberos services for authentication.

I can connect to the UI using the certificate corresponding to the user 
declared in the Initial Admin Identity.

However, when I try to connect using a user declared in the Kerberos server :


1.   Based on some docs, I should be able to submit a request to get access 
to the UI. It’s not the case.

2.   Using the initial admin user, I created a user in Nifi and add in some 
profiles.

However, I still have the following message :

“Access Denied
Unable to perform the desired action due to insufficient permissions. Contact 
the system administrator.”

The user is correctly declared in the Kerberos server. When it is not, a pop-up 
displays :
The supplied username and password are not valid.
Has someone already run into this issue?

Thanks in advance

BR

Nicolas



Enable Compression on Remote Port?

2016-11-10 Thread Peter Wicks (pwicks)
When I have a Remote Process Group and I view its Remote Ports I can see that 
all my ports show "Compressed" as No.  How can I change this so that the ports 
use compression?


RE: How to increase the processing speed of the ExtractText and ReplaceText Processor?

2016-10-24 Thread Peter Wicks (pwicks)
Prabhu,

Lee mentioned making sure you have good indexes, but I would caution you on 
this point.  If you have a unique constraint then SQL Server will build an 
index on this automatically, but I would suggest dropping all other indexes 
that aren’t related to data integrity. Each time SQL Server updates a column 
that is indexed it’s going to be updating that index also.  This will add a lot 
of overhead.

You might be thinking that you need these indexes though for user queries. To 
work around this I often see the use of a staging table. This table has no 
indexes beyond the absolute minimum to ensure data integrity, and sometimes 
even these are removed and data integrity/duplicate removal is handled through 
the use of SQL or a Stored Procedure.  A periodic job will move all data from 
this staging table into a final table.  If you execute the copy and a truncate 
in a single transaction it allows you to do this safely:

INSERT INTO “Some_Final_Table” SELECT * FROM “Staging_Table_With_Exact_Same_schema”;
TRUNCATE TABLE “Staging_Table_With_Exact_Same_schema”;

If you do it this way you can keep the indexes you need for user access while 
still allowing maximum data throughput to SQL Server.
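
As a minimal sketch, the copy plus truncate as one unit of work through JDBC looks like this (table names are placeholders; on SQL Server a TRUNCATE can participate in a transaction, so a failure rolls both statements back):

import java.sql.Connection;
import java.sql.Statement;

public class MoveStagingToFinal {
    public static void move(Connection conn) throws Exception {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);                    // both statements commit together
        try (Statement stmt = conn.createStatement()) {
            stmt.executeUpdate("INSERT INTO Some_Final_Table SELECT * FROM Staging_Table_With_Exact_Same_schema");
            stmt.executeUpdate("TRUNCATE TABLE Staging_Table_With_Exact_Same_schema");
            conn.commit();
        } catch (Exception e) {
            conn.rollback();                          // leave the staging rows in place on failure
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}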

I’ve seen a lot of comments online about batch sizing around 500 being optimal, 
but of course this will vary on the system configuration; both your NiFi server 
and the SQL Server.

I have had issues getting good performance out of PutSQL even with the above, I 
don’t think this is the fault of the processor, but more due to the volume of 
data and JDBC batch row processing not really being designed for this kind of 
volume. In my case I was trying to push about 10M rows over a longer time 
period, but was still running into trouble. After working on the issue for a 
while I found that a database specific loader was needed. I am loading to 
Teradata, so I wrote up a Teradata FastLoad processor.  In your case the MS SQL 
Server JDBC Driver includes a `SQLServerBulkCopy` loader, 
https://msdn.microsoft.com/en-us/library/mt221490%28v=sql.110%29.aspx.  
Unfortunately, this would require writing code either through a scripted 
processor, or as a whole new processor.
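
For what it's worth, the core of such a scripted or custom processor would look roughly like this, using the bulk copy classes from the MSDN page above (file, table, column metadata, and connection details are placeholders):

import com.microsoft.sqlserver.jdbc.SQLServerBulkCopy;
import com.microsoft.sqlserver.jdbc.SQLServerBulkCSVFileRecord;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Types;

public class BulkCopySketch {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:sqlserver://myserver;databaseName=mydb;user=me;password=secret"; // placeholder
        try (Connection conn = DriverManager.getConnection(url)) {
            // Describe the CSV file (first row holds column names in this example).
            SQLServerBulkCSVFileRecord csv =
                    new SQLServerBulkCSVFileRecord("staged_rows.csv", "UTF-8", ",", true);
            csv.addColumnMetadata(1, "id", Types.INTEGER, 0, 0);
            csv.addColumnMetadata(2, "name", Types.NVARCHAR, 50, 0);

            SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(conn);
            bulkCopy.setDestinationTableName("dbo.MyStagingTable");
            bulkCopy.writeToServer(csv);   // streams the whole file in one bulk operation
            bulkCopy.close();
        }
    }
}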

Since writing a custom processor may be more than you want to jump into right 
now you should probably take a look at `bcp`.  I didn’t catch if you were on 
Windows or a Unix platform, but if you are on Windows I’d check out the command 
line based Bulk Copy Program for MS SQL: 
https://msdn.microsoft.com/en-us/library/ms162802.aspx.  Using this would allow 
you to prepare your data into an intermediary format, like CSV first, then send 
it all at once through `bcp`.


Thanks,
  Peter Wicks

From: Lee Laim [mailto:lee.l...@gmail.com]
Sent: Monday, October 24, 2016 7:17 AM
To: users@nifi.apache.org
Subject: Re: How to increase the processing speed of the ExtractText and 
ReplaceText Processor?

Hello Prabhu,

50 minutes is a good start! Now we have to determine where the next bottleneck 
is. Check to see where the flow files are queueing.  You can also check the 
"average task duration" statistic for each processor.  I suspect the bottleneck 
is at  PutSQL and will carry this assumption forward.

There are several knobs you can adjust at the assumed PutSQL bottleneck:
1.  Increase the run duration and keep the PutSQL processor running for 2 
seconds before releasing the thread.
2. Set Fragmented Transactions to false.  This removes constraints that take 
time to check.
3. Consider changing the batch size systematically and observe throughput changes. 
I'd move up in increments of 100.
4*. Increase the number of concurrent tasks for the bottleneck processor to 3 
or higher.  Increase systematically to observe if you get more flow files 
through.   You can increase the max timer driven threads of the NiFi instance 
in the NiFi Flow Settings (top right of the canvas).  you can set the max to 
25, but you are truly limited by hardware here. Consider a more powerful system 
to manage this flow, especially with the time constraint you need. It is often 
easier to throw more hardware at the problem than to debug.
Other areas:
5. On the output of the last SplitText processor, set the back pressure object 
threshold to 1.  This will slow (temporarily stop) the first split text 
processor and reduce the number of overall flow files to manage.  It also 
reduces the NiFi processor demand for the cpu threads.
6. Increase nifi.queue.swap.threshold in nifi.properties to reduce disk access.
7. Check connection/load on the SQL server.

To address your queries, I used the same expression you provided: 
(.+)[|](.+)[|](.+)[|](.+)
You can use an ExecuteStreamCommand processor to 'extract text', but in this 
case, with small flow files, it won't offer much gain.

*With an i5 processor, you have 4 cpu threads to process flow files, manage 
NiFi, read/write to disk, and handle all other non-NiFi processes.  Moving to 
an i7 or Xeon, hyper threading will provide NiFi more resources 

RE: Enable Compression on Remote Port?

2016-11-11 Thread Peter Wicks (pwicks)
Thanks Matt, I’m in a position to update, but I’ll probably wait for 1.1 as I 
already have a Compressor processor hooked up on both ends.

From: Matt Gilman [mailto:matt.c.gil...@gmail.com]
Sent: Friday, November 11, 2016 12:39 PM
To: users@nifi.apache.org
Subject: Re: Enable Compression on Remote Port?

Peter,

Sorry for the inconvenience. Unfortunately, the issue you're running into is a 
known issue in the 1.0.0 release [1]. Are you possibly in a position to run a 
SNAPSHOT build based off the current state of master. This should have the 
patch incorporated. Alternatively, the 1.1.0 release candidate should be 
created soon. We are in the process of closing down the remaining JIRAs 
assigned to the next release.

Matt

[1] https://issues.apache.org/jira/browse/NIFI-2687

On Fri, Nov 11, 2016 at 9:37 AM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
Andrew, that did enable the option, but when I change it to Compressed and save 
I get the error, “Unable to find remote process group with id 
'01571036-21a9-139e-ce64-52eca7af6646'”.  This is very odd since this is a 
working connection…

Thanks,
  Peter

From: Andrew Grande [mailto:apere...@gmail.com<mailto:apere...@gmail.com>]
Sent: Friday, November 11, 2016 6:26 AM
To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: Re: Enable Compression on Remote Port?


Disable transmission on RPG, go into the ports view again. Now, you should be 
able to modify settings like compression, concurrent threads and security on 
ports.

Andrew

On Thu, Nov 10, 2016, 2:26 PM Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
When I have a Remote Process Group and I view its Remote Ports I can see that 
all my ports show “Compressed” as No.  How can I change this so that the ports 
use compression?



Failing to Start NiFi 1.1.0, OverlappingFileLockException

2016-12-20 Thread Peter Wicks (pwicks)
I've successfully upgraded my DEV and TEST environments from NiFi 1.0.0 to NiFi 
1.1.0. So I felt comfortable upgrading PROD until...
I completed all the work and went to start the server, but am receiving the 
below stack trace.  I dug through the code a bit and found that it's locking a 
file named wali.lock, so I tried deleting that file and starting NiFi up again 
but got the same stack dump.  The lock file did get recreated on the next run.

Our version of NiFi 1.1.0 is a couple of commits newer than official due to a 
merged branch (rather than a cherry pick). I don't have the exact commit we are 
running, but we haven't had this issue in our other environments using the same 
code.

---BEGIN STACK TRACE---
org.springframework.beans.factory.BeanCreationException: Error creating bean 
with name 'flowService': FactoryBean threw exception on object creation; nested 
exception is org.springframework.beans.factory.BeanCreationException: Error 
creating bean with name 'flowController': FactoryBean threw exception on object 
creation; nested exception is java.lang.RuntimeException: 
java.nio.channels.OverlappingFileLockException
at 
org.springframework.beans.factory.support.FactoryBeanRegistrySupport.doGetObjectFromFactoryBean(FactoryBeanRegistrySupport.java:175)
 ~[na:na]
at 
org.springframework.beans.factory.support.FactoryBeanRegistrySupport.getObjectFromFactoryBean(FactoryBeanRegistrySupport.java:103)
 ~[na:na]
at 
org.springframework.beans.factory.support.AbstractBeanFactory.getObjectForBeanInstance(AbstractBeanFactory.java:1585)
 ~[na:na]
at 
org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:254)
 ~[na:na]
at 
org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202)
 ~[na:na]
at 
org.springframework.context.support.AbstractApplicationContext.getBean(AbstractApplicationContext.java:1060)
 ~[na:na]
at 
org.apache.nifi.web.contextlistener.ApplicationStartupContextListener.contextDestroyed(ApplicationStartupContextListener.java:103)
 ~[na:na]
at 
org.eclipse.jetty.server.handler.ContextHandler.callContextDestroyed(ContextHandler.java:845)
 ~[na:na]
at 
org.eclipse.jetty.servlet.ServletContextHandler.callContextDestroyed(ServletContextHandler.java:546)
 ~[na:na]
at 
org.eclipse.jetty.server.handler.ContextHandler.stopContext(ContextHandler.java:826)
 ~[na:na]
at 
org.eclipse.jetty.servlet.ServletContextHandler.stopContext(ServletContextHandler.java:356)
 ~[na:na]
at 
org.eclipse.jetty.webapp.WebAppContext.stopWebapp(WebAppContext.java:1410) 
~[na:na]
at 
org.eclipse.jetty.webapp.WebAppContext.stopContext(WebAppContext.java:1374) 
~[na:na]
at 
org.eclipse.jetty.server.handler.ContextHandler.doStop(ContextHandler.java:874) 
~[na:na]
at 
org.eclipse.jetty.servlet.ServletContextHandler.doStop(ServletContextHandler.java:272)
 ~[na:na]
at 
org.eclipse.jetty.webapp.WebAppContext.doStop(WebAppContext.java:544) ~[na:na]
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
 ~[na:na]
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.stop(ContainerLifeCycle.java:143)
 ~[na:na]
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.doStop(ContainerLifeCycle.java:161)
 ~[na:na]
at 
org.eclipse.jetty.server.handler.AbstractHandler.doStop(AbstractHandler.java:73)
 ~[na:na]
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
 ~[na:na]
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.stop(ContainerLifeCycle.java:143)
 ~[na:na]
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.doStop(ContainerLifeCycle.java:161)
 ~[na:na]
at 
org.eclipse.jetty.server.handler.AbstractHandler.doStop(AbstractHandler.java:73)
 ~[na:na]
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
 ~[na:na]
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.stop(ContainerLifeCycle.java:143)
 ~[na:na]
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.doStop(ContainerLifeCycle.java:161)
 ~[na:na]
at 
org.eclipse.jetty.server.handler.AbstractHandler.doStop(AbstractHandler.java:73)
 ~[na:na]
at org.eclipse.jetty.server.Server.doStop(Server.java:482) ~[na:na]
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
 ~[na:na]
at org.apache.nifi.web.server.JettyServer.stop(JettyServer.java:854) 
~[na:na]
at org.apache.nifi.NiFi.shutdownHook(NiFi.java:187) 
[nifi-runtime-1.1.0.jar:1.1.0]
at org.apache.nifi.NiFi$2.run(NiFi.java:88) 
[nifi-runtime-1.1.0.jar:1.1.0]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
Caused by: org.springframework.beans.factory.BeanCreationException: Error 
creating bean with name 'flowController': FactoryBean 

RE: Failing to Start NiFi 1.1.0, OverlappingFileLockException

2016-12-21 Thread Peter Wicks (pwicks)
Mark,

We just got this working this morning. During a configuration file review late 
yesterday (we used a new config file and copied over settings) it was 
discovered that I hadn’t put in a Port number for Jetty to run on… 
nifi.web.https.port was left blank instead of providing a port number, and the 
regular http port number was also blank.  For some reason this led to this 
error, and I have no idea why.  I updated this property to 8443 and now it’s 
running fine.

This seemed really weird to me, so I tried to reproduce with my local build from 
source, but I only got helpful error messages like, “Remote input HTTPS is enabled 
but nifi.web.https.port is not specified”. So I’m not really sure what combination 
of factors led to this vague error.

From: Mark Payne [mailto:marka...@hotmail.com]
Sent: Wednesday, December 21, 2016 6:34 AM
To: users@nifi.apache.org
Subject: Re: Failing to Start NiFi 1.1.0, OverlappingFileLockException

Hey Peter,

The FlowFile repository obtains a lock to ensure that no other process is using 
that directory.
Getting an OverlappingFileLockException means that there is actually another 
process that
has a lock. Can you verify that no other instance of NiFi is running on the 
node? If possible
would recommend running "ps -ef | grep nifi" as root (assuming that you're 
running on Linux).
This would ensure that no other user has started the nifi process.

Thanks
-Mark


On Dec 20, 2016, at 4:05 PM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:

I’ve successfully upgraded my DEV and TEST environments from NiFi 1.0.0 to NiFi 
1.1.0. So I felt comfortable upgrading PROD until…
I completed all the work and went to start the server, but am receiving the 
below stack trace.  I dug through the code a bit and found that it’s locking a 
file named wali.lock, so I tried deleting that file and starting NiFi up again 
but got the same stack dump.  The lock file did get recreated on the next run.

Our version of NiFi 1.1.0 is a couple of commits newer than official due to a 
merged branch (rather than a cherry pick). I don’t have the exact commit we are 
running, but we haven’t had this issue in our other environments using the same 
code.

---BEGIN STACK TRACE---
org.springframework.beans.factory.BeanCreationException: Error creating bean 
with name 'flowService': FactoryBean threw exception on object creation; nested 
exception is org.springframework.beans.factory.BeanCreationException: Error 
creating bean with name 'flowController': FactoryBean threw exception on object 
creation; nested exception is java.lang.RuntimeException: 
java.nio.channels.OverlappingFileLockException
at 
org.springframework.beans.factory.support.FactoryBeanRegistrySupport.doGetObjectFromFactoryBean(FactoryBeanRegistrySupport.java:175)
 ~[na:na]
at 
org.springframework.beans.factory.support.FactoryBeanRegistrySupport.getObjectFromFactoryBean(FactoryBeanRegistrySupport.java:103)
 ~[na:na]
at 
org.springframework.beans.factory.support.AbstractBeanFactory.getObjectForBeanInstance(AbstractBeanFactory.java:1585)
 ~[na:na]
at 
org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:254)
 ~[na:na]
at 
org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202)
 ~[na:na]
at 
org.springframework.context.support.AbstractApplicationContext.getBean(AbstractApplicationContext.java:1060)
 ~[na:na]
at 
org.apache.nifi.web.contextlistener.ApplicationStartupContextListener.contextDestroyed(ApplicationStartupContextListener.java:103)
 ~[na:na]
at 
org.eclipse.jetty.server.handler.ContextHandler.callContextDestroyed(ContextHandler.java:845)
 ~[na:na]
at 
org.eclipse.jetty.servlet.ServletContextHandler.callContextDestroyed(ServletContextHandler.java:546)
 ~[na:na]
at 
org.eclipse.jetty.server.handler.ContextHandler.stopContext(ContextHandler.java:826)
 ~[na:na]
at 
org.eclipse.jetty.servlet.ServletContextHandler.stopContext(ServletContextHandler.java:356)
 ~[na:na]
at 
org.eclipse.jetty.webapp.WebAppContext.stopWebapp(WebAppContext.java:1410) 
~[na:na]
at 
org.eclipse.jetty.webapp.WebAppContext.stopContext(WebAppContext.java:1374) 
~[na:na]
at 
org.eclipse.jetty.server.handler.ContextHandler.doStop(ContextHandler.java:874) 
~[na:na]
at 
org.eclipse.jetty.servlet.ServletContextHandler.doStop(ServletContextHandler.java:272)
 ~[na:na]
at 
org.eclipse.jetty.webapp.WebAppContext.doStop(WebAppContext.java:544) ~[na:na]
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
 ~[na:na]
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.stop(ContainerLifeCycle.java:143)
 ~[na:na]
at 
org.eclipse.jetty.util.component.ContainerLifeCycle.doStop(ContainerLifeCycle.java:161)
 ~[na:na]
at 
org.eclipse.jetty.server.handler.

RE: Visual Indicator for "Can't run because there are no threads"?

2017-03-06 Thread Peter Wicks (pwicks)
Joe,

In my case I had not seen the issue until I added 7 new QueryDatabaseTable 
processors. All seven of them kicked off against the same SQL database on 
restart and took 10 to 15 minutes to come back.  During that time my default 10 
threads were running with only 3 to spare, which were being shared across a lot 
of other jobs.  I bumped the thread count up considerably and have not had 
issues since then.

--Peter

-Original Message-
From: Joe Witt [mailto:joe.w...@gmail.com] 
Sent: Friday, March 03, 2017 3:02 PM
To: users@nifi.apache.org
Subject: Re: Visual Indicator for "Can't run because there are no threads"?

Peter,

That is a good idea and I don't believe there are any existing JIRAs to do so.  
But the idea makes a lot of sense.  Being so thread-starved that processors do 
not get to run for extended periods of time is pretty unique.  Makes me think 
that the flow has processors which are not honoring the model but are rather 
acting more like greedy thread daemons.  That should also be considered.  But 
even with that said I could certainly see how it would be helpful to know that 
a processor is running less often than it would like due to lack of available 
threads rather than just backpressure.

Thanks
Joe

On Fri, Mar 3, 2017 at 4:57 PM, Peter Wicks (pwicks) <pwi...@micron.com> wrote:
> I think everyone was really happy when backpressure finally got super 
> great indicators.  Backpressure used to be my #1, “Why isn’t stuff moving?”
> problem.  My latest issue is there are no free threads, sometimes for 
> hours, and I don’t notice and start wondering what’s going on.
>
>
>
> Is there anything under consideration for an indicator to show how 
> many processors can’t run because there aren’t enough threads 
> available? I can create a ticket, wasn’t sure if there was one floating 
> around.


RE: [EXT] NiFi 1.3: Simplest way possible of creating CSV files from SQL queries

2017-08-01 Thread Peter Wicks (pwicks)
I hate to respond with “me too”, but I haven’t seen a response and this kind of 
simplification is of interest to me.

The PutDatabaseRecord processor already does something similar, and I have only 
needed the AvroReader controller service without a schema registry.


From: Márcio Faria [mailto:faria.mar...@ymail.com]
Sent: Tuesday, July 25, 2017 11:09 AM
To: Users 
Subject: [EXT] NiFi 1.3: Simplest way possible of creating CSV files from SQL 
queries

Hi,

I'm looking for the simplest way possible of creating CSV files from SQL 
queries using Apache NiFi 1.3.

The flow I currently have (the files are to be SFTP'ed to a remote server):

ExecuteSQL -> UpdateAttribute -> ConversionRecord [3 CSs] -> PutSFTP

The concept of SchemaRegistry is new to me, but if I understood it correctly in 
order for the ConversionRecord to work properly is necessary to have 3 
Controller Services ([3 CSs]) associated with it:

  *   AvroSchemaRegistry, with the schema defined in Avro Schema (JSON);
  *   AvroReader, referring to the above schema;
  *   CSVRecordSetWriter, also referring to the same schema.

It seems there are many benefits in using the schema registry, including 
versioning, validation, etc, but in my example a simpler configuration would be 
welcome.

Isn't the schema already defined by ExecuteSQL? Can I have the ConversionRecord 
alone with no dedicated SchemaRegistry (property), AvroReader,(instance), or 
CSVRecordSetWriter (instance)? Of course, we'd still need to specify the output 
is a CSV, so perhaps a shared CSVRecordSetWriter that also gets its schema from 
the flow file would still be useful.

By the way, would the Schema Access Strategy named "Use Embedded Avro Schema" 
be part of a simpler solution? How?

In the same vein, what about having the schema-name property optionally defined 
by the ExecuteSQL itself, so we don't have to depend on the UpdateAttribute 
component?

In summary, I'm wondering if it's possible to have 3 (+ 1 generic) components 
instead of 6 per query:

ExecuteSQL -> ConversionRecord [CSVRecordSetWriter] -> PutSFTP

That would make a difference when defining multiple conversions from SQL to 
CSV, or other equivalent flows.

In addition, consider that someone might want to have maximum flexibility, 
meaning that it would be totally acceptable to change the query and get a 
different layout for the resulting CSV file, without having to change any 
SchemaRegistry, Reader, or Writer.

I've found a few tickets out there covering a similar topic. In particular, [1] 
mentions the difficulty with more complex Avro data types. But I don't see that 
being a blocker when the data source is an old-fashioned SQL query.

Recommendations?

P.S.1 Maybe templates would save the effort, but since Controller Services are 
"global", I'm still wondering if having too many parts would make it more 
difficult to manage lots of flows than it could be.

P.S.2 Will my 1st flow have a good performance? I'm wondering if another 
advantage of using SchemaRegistry etc is that it prevents the creation of too 
many records at once.

Thank you,

Marcio

[1] NIFI-1372 Create ConvertAvroToCSV







RE: [EXT] High-performance Nifi decoder

2017-08-23 Thread Peter Wicks (pwicks)
Hi Ali,

How many FlowFile’s are you expecting per minute? My experience with NiFi has 
been that if you have 1 FlowFile that is several GB’s in size you can 
frequently process it faster than 100k FlowFile’s at a fraction the size. There 
is a lot of overhead in managing the life cycle of a FlowFile when you start to 
have lots of them.

--Peter

From: Ali Nazemian [mailto:alinazem...@gmail.com]
Sent: Wednesday, August 23, 2017 2:49 PM
To: users@nifi.apache.org
Subject: [EXT] High-performance Nifi decoder

Hi all,


I am investigating the right approach for implementing a high-performance 
network packet decoder in Nifi. I am trying to achieve about 1 Gbps throughput 
on a specific decoder. (with 4 Nifi instances) I was wondering which option 
would perform better. 1) Implementing my decoder in a separate Nifi processor 
using NAR mechanism; 2) Implementing my decoder in C++ and use 
"ExecuteStreamCommand" to run that decoder. Basically, if ExecuteStreamCommand 
involves Disk for the sake of sending messages to the written script, it 
doesn't matter how fast the decoder is, disk would be the main bottleneck.


Regards,
Ali


RE: [EXT] Parsing Email Attachments

2017-05-17 Thread Peter Wicks (pwicks)
Nick,

Try escaping your \n’s, see if that helps.

(?s)(.*\\n\\n${boundary}\\nContent-Type: text\/plain; 
charset="UTF-8"\\n\\n)(.*?)(\\n\\n${boundary}.*)

From: Nick Carenza [mailto:nick.care...@thecontrolgroup.com]
Sent: Thursday, May 18, 2017 11:27 AM
To: users@nifi.apache.org
Subject: [EXT] Parsing Email Attachments

Hey Nifi-ers,

I haven't been having any luck trying to parse email after consuming them with 
pop3.

I am composing a simple message with gmail with just plain text and it comes 
out like this (with many headers removed):

Delivered-To: sl...@company.com
Return-Path: >
MIME-Version: 1.0
Received: by 0.0.0.0 with HTTP; Tue, 16 May 2017 17:54:04 -0700 (PDT)
From: User >
Date: Tue, 16 May 2017 17:54:04 -0700
Subject: test subject
To: em...@company.com
Content-Type: multipart/alternative; boundary="f403045f83d499711a054fadb980"

--f403045f83d499711a054fadb980
Content-Type: text/plain; charset="UTF-8"

test email body

--f403045f83d499711a054fadb980
Content-Type: text/html; charset="UTF-8"

test email body

--f403045f83d499711a054fadb980--

I just want the email body and ExtractEmailAttachments doesn't seem to extract 
the parts between the boundaries like I hoped it would.

So instead I use ExtractEmailHeaders to additionally extract the Content-Type 
header, from which I then pull out just the boundary value with an UpdateAttribute 
processor configured like:

boundary: 
${email.headers.content-type:substringAfter('boundary="'):substringBefore('"'):prepend('--')}
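(With the Content-Type header shown above, that expression evaluates to 
--f403045f83d499711a054fadb980, i.e. the boundary value prefixed with two dashes, which 
matches the delimiter lines in the body.)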

Then I wrote a sweet regex for ReplaceText to clean this up:

(?s)(.*\n\n${boundary}\nContent-Type: text\/plain; 
charset="UTF-8"\n\n)(.*?)(\n\n${boundary}.*)


... but even though this works in regex testers and Sublime Text, it seems to 
have no effect in my flow.

Anyone have any insight on this?

Thanks,
Nick


RE: [EXT] New to Nifi - Failed to update database due to a failed batch update

2017-09-24 Thread Peter Wicks (pwicks)
Hi Aruna,

Since you are using ReplaceText, you can view the contents of the FlowFile and 
check that you can copy/paste the SQL and execute it by hand in Postgres.
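For example, the FlowFile content should read like a complete statement you could paste 
straight into psql, something along the lines of INSERT INTO some_table (col_a, col_b) 
VALUES ('a', 'b'); (the table and column names here are just placeholders).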

If all that works try setting the batch size on PutSQL to 1 record. This will 
help check if it's all records that are having trouble, or just a few bad 
records.

--Peter

From: Aruna Sankaralingam [mailto:aruna.sankaralin...@cormac-corp.com]
Sent: Saturday, September 23, 2017 2:57 AM
To: users@nifi.apache.org
Subject: [EXT] New to Nifi - Failed to update database due to a failed batch 
update

Hi,

I am new to Nifi. I am trying to load a CSV file into S3 bucket and then load 
into postgres database. Please see screenshots below. This is what I have done. 
I am successful till "Replace Text". But I am not sure if the replace text is 
creating the insert query properly. When I start the PutSQL, it fails with this 
error "Failed to update database due to a failed batch update. There were a 
total of 30 FlowFiles that failed, 0 that succeeded, and 0 that were not 
execute and will be routed to retry"

I tried to see if I can find something in the failure flow file but when I 
click on View or Download, nothing is happening. I would really appreciate any 
kind of guidance to make this work.

[flow screenshots omitted]



RE: [EXT] New to Nifi - Failed to update database due to a failed batch update

2017-09-25 Thread Peter Wicks (pwicks)
Use the Download button right next to View, then open it in a text editor.

From: Aruna Sankaralingam [mailto:aruna.sankaralin...@cormac-corp.com]
Sent: Monday, September 25, 2017 9:54 AM
To: users@nifi.apache.org
Subject: Re: [EXT] New to Nifi - Failed to update database due to a failed 
batch update

Hi, thank you for getting back. Could you please let me know how I can see the 
contents of the flow file ? The view option doesn't seem to work for me. Please 
see my last screenshot in my first email.

Thanks
Aruna

On Sep 24, 2017, at 8:52 PM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
Hi Aruna,

Since you are using ReplaceText, you can view the contents of the FlowFile and 
check that you can copy/paste the SQL and execute it by hand in Postgres.

If all that works try setting the batch size on PutSQL to 1 record. This will 
help check if it’s all records that are having trouble, or just a few bad 
records.

--Peter

From: Aruna Sankaralingam [mailto:aruna.sankaralin...@cormac-corp.com]
Sent: Saturday, September 23, 2017 2:57 AM
To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: [EXT] New to Nifi - Failed to update database due to a failed batch 
update

Hi,

I am new to Nifi. I am trying to load a CSV file into S3 bucket and then load 
into postgres database. Please see screenshots below. This is what I have done. 
I am successful till “Replace Text”. But I am not sure if the replace text is 
creating the insert query properly. When I start the PutSQL, it fails with this 
error “Failed to update database due to a failed batch update. There were a 
total of 30 FlowFiles that failed, 0 that succeeded, and 0 that were not 
execute and will be routed to retry”

I tried to see if I can find something in the failure flow file but when I 
click on View or Download, nothing is happening. I would really appreciate any 
kind of guidance to make this work.










java.lang.OutOfMemoryError: unable to create new native thread

2017-11-14 Thread Peter Wicks (pwicks)
I've been getting the error:

2017-11-15 09:22:24,959 ERROR [NiFi Web Server-566674] org.apache.nifi.NiFi
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.startThreads(QueuedThreadPool.java:476)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.access$200(QueuedThreadPool.java:49)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:657)
at java.lang.Thread.run(Thread.java:745)

My research online says this isn't an out of memory error, but an out of 
resources error. The system can't support making new threads.

This really sucks, because I can't su to my service account, because that 
requires a new thread... and if I have a bash session still open I can't kill 
NiFi because that requires creating a new thread... last time this happened my 
Unix admin had to restart my server...

The first time this happened we changed the limits for our service account, 
here is my ulimit statement.

# ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 96297
max locked memory   (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files  (-n) 5
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) unlimited
max user processes  (-u) 5
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

It's been about two weeks, and now the issue has come up again. Is this an 
actual hardware limitation?

If I run "ps huH p | wc -l" I can get a thread count, right now it's about 
9900 threads.
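(On Linux the ceiling is typically the smaller of the "max user processes" value from 
ulimit -u, which counts threads, and kernel settings such as kernel.threads-max and 
kernel.pid_max, so the limit can be hit even with plenty of free memory.)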

Thanks,
  Peter



RE: [EXT] Re: java.lang.OutOfMemoryError: unable to create new native thread

2017-11-14 Thread Peter Wicks (pwicks)
Thanks Joe. It will definitely have to be something I do periodically, 
unfortunately. Calling nifi.sh dump right now fails since it can’t create any 
new threads…

NiFi actually came back up for a little while shortly after this email, maybe 
some threads closed out. But I didn’t take the opportunity to do a dump or a 
restart… and now it’s all locked up again.

I’ll run a dump after a couple days run time and see if I can figure it out. We 
have a custom processor for working with Teradata, which uses a vendor provided 
JDBC driver. It might be related.

Thanks,
  Peter

From: Joe Witt [mailto:joe.w...@gmail.com]
Sent: Wednesday, November 15, 2017 11:49
To: users@nifi.apache.org
Subject: [EXT] Re: java.lang.OutOfMemoryError: unable to create new native 
thread

You’ll want to get thread dumps during the life of nifi to figure out the 
pattern of what is leaking threads.  Often it will be around some tcp socket 
handling thread in something like sftp for example.  Can be a config issue or a 
bug.

Thanks
Joe

On Tue, Nov 14, 2017 at 8:32 PM Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:
I’ve been getting the error:

2017-11-15 09:22:24,959 ERROR [NiFi Web Server-566674] org.apache.nifi.NiFi
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.startThreads(QueuedThreadPool.java:476)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.access$200(QueuedThreadPool.java:49)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:657)
at java.lang.Thread.run(Thread.java:745)

My research online says this isn’t an out of memory error, but an out of 
resources error. The system can’t support making new threads.

This really sucks, because I can’t su to my service account, because that 
requires a new thread… and if I have a bash session still open I can’t kill 
NiFi because that requires creating a new thread… last time this happened my 
Unix admin had to restart my server…

The first time this happened we changed the limits for our service account, 
here is my ulimit statement.

# ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 96297
max locked memory   (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files  (-n) 5
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) unlimited
max user processes  (-u) 5
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

It’s been about two weeks, and now the issue has come up again. Is this an 
actual hardware limitation?

If I run “ps huH p | wc -l” I can get a thread count, right now it’s about 
9900 threads.

Thanks,
  Peter



Wait only if flagged?

2017-11-13 Thread Peter Wicks (pwicks)
I have a database flow, which is a sequence of 4 processors. For database 
performance reasons I need to make sure only one file is in this section of the 
flow at a time. Not just one file per queue/processor, but for the whole 
section.

I feel like I should be able to use Wait/Notify to do this, but Wait/Notify 
seem to do the opposite. I want to allow a file to go into the flow unless 
there is a flag. If there is a flag I want the FlowFile to wait until the flag 
is cleared.

Thoughts?

Thanks,
 Peter


RE: [EXT] Re: Calling NIFI job from any Enterprise Scheduler

2017-11-13 Thread Peter Wicks (pwicks)
A lot of enterprise schedulers have an option to post to an HTTP endpoint. In 
the past I’ve used NiFi’s ListenHTTP processor to allow for remote triggering 
of a flow.

But really any of the Listen/Consume processors could potentially be used for 
remote triggering:


  *   ListenHTTP or ListenWebSocket (might be a good way to trigger from some 
kind of Javascript front end, less helpful from an enterprise scheduler)
  *   ConsumeJMSQueue/Topic
  *   ConsumeMQTT
  *   ListenTCP/ListenUDP

You could also use a file as a trigger by writing an empty file to a directory 
and then using FetchFile to trigger the flow. You could even use this as a way 
to pass in a configuration file to a flow… ☺ all sorts of options.
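For example, a scheduler that can run a shell command could kick the flow off with 
something like curl -X POST -d '' http://nifi-host:8080/contentListener, where the host, 
port, and base path are placeholders for whatever your ListenHTTP processor is configured 
with.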

--Peter

From: Jeremy Dyer [mailto:jdy...@gmail.com]
Sent: Tuesday, November 14, 2017 02:24
To: users@nifi.apache.org
Subject: [EXT] Re: Calling NIFI job from any Enterprise Scheduler

Siva - I find the best way to trigger a NiFi workload from any sort of 
enterprise scheduler is to start the workflow you wish to trigger with a 
ConsumeJMS processor and then have the enterprise scheduler fire an event at 
that configured endpoint to start the job. In this manner you can use your 
enterprise scheduler to invoke the flow without any major modifications.

- Jeremy Dyer

On Mon, Nov 13, 2017 at 1:20 PM, Sivakumar, S 
> wrote:
Hi Team,
I have a requirement to call a NiFi job from an Enterprise Scheduler (for example, 
Tidal Enterprise Scheduler). What is the way to call a NiFi job from a command 
line so that it can be triggered from an external program?

-Siva



RE: [EXT] Re: Wait only if flagged?

2017-11-13 Thread Peter Wicks (pwicks)
Matt,

I played around with your idea. I haven't been able to get it to work.

The first FlowFile comes in and goes out the wait relationship. Now how do we stop 
the next FlowFile from going out the wait relationship and entering the flow? 
We'd have to use Notify along with some scheduling on Wait so multiple 
FlowFiles don't come through right away, so I put a Notify processor right 
after Wait. Now the second FlowFile enters, goes out the success relationship 
because there is a signal count, then immediately goes down wait since the 
signal is clear...

If this use case seems common enough, I was thinking that the solution might be 
to add an option on Wait that lets you choose whether you want to "Wait 
for signal" or "Wait only if signaled". This would restore the relationships 
back to their proper uses. Then you'd use the Notify property 'Signal Counter 
Delta' mode of 0 at the end of the flow. The flow would be something like Wait 
(Wait only if signaled) -> Notify (Signal to 1) -> ... Processors ... -> Notify 
(Signal to 0). Or perhaps, as part of the feature, "Wait only if signaled" mode 
would do the notify step internally.

Would still like to find a clean/easy way to do this though.

Thanks,
 Peter

-Original Message-
From: Matt Burgess [mailto:mattyb...@apache.org] 
Sent: Tuesday, November 14, 2017 08:53
To: users@nifi.apache.org
Subject: [EXT] Re: Wait only if flagged?

Peter,

I haven't tried this, but my knee-jerk reaction is to switch the roles
of the "wait" and "success" relationships. Maybe you can send the
"wait" relationship downstream and route the "success" one back to
Wait. Then when the flag is "cleared", the flow files will start going
to the "success" relationship which is routed back to wait.

Regards,
Matt

On Mon, Nov 13, 2017 at 7:21 PM, Peter Wicks (pwicks) <pwi...@micron.com> wrote:
> I have a database flow, which is a sequence of 4 processors. For database
> performance reasons I need to make sure only one file is in this section of
> the flow at a time. Not just one file per queue/processor, but for the whole
> section.
>
>
>
> I feel like I should be able to use Wait/Notify to do this, but Wait/Notify
> seem to do the opposite. I want to allow a file to go into the flow unless
> there is a flag. If there is a flag I want the FlowFile to wait until the
> flag is cleared.
>
>
>
> Thoughts?
>
>
>
> Thanks,
>
>  Peter


RE: [EXT] Re: Metrics on Lineage Duration

2017-11-07 Thread Peter Wicks (pwicks)
Mark,

While Pierre’s solution will probably be more helpful in the long run, this 
provides an easy way for me to examine the data by hand right now. I didn’t 
know that metric was on the Status History window, good to know!

Thanks,
  Peter

From: Mark Payne [mailto:marka...@hotmail.com]
Sent: Tuesday, November 07, 2017 9:26 PM
To: users@nifi.apache.org
Subject: [EXT] Re: Metrics on Lineage Duration

Peter,

In the UI you can right-click on a Processor and go to "View Status History" to 
see a chart of different metrics
over time. One of those metrics is the Average Lineage Duration (averaged over 
a 5-minute period).
Of course, you are asking for a reporting task/processor that does this. 
There's nothing that provides this info
directly, but you could always use InvokeHTTP, or something like that to poll 
the REST API, using the path
/nifi-api/flow/processors//status/history

Or, another approach that may be valuable would be to use the 
SiteToSiteProvenanceReportingTask to send
Provenance Events via Site-to-Site back to the NiFi instance. For a given 
Provenance Event, you can get Event Time
and Lineage Start Date, so to calculate the Lineage Duration / latency for the 
FlowFile, you can just use
EventTime - LineageStartDate. This approach would be a little different, as it 
provides lineage duration of each event.
So this could be used, for instance, to detect any particular FlowFiles that 
exceed some SLA or calculate a 90th percentile
type of thing, rather than getting an average.

Does this help at all?

Thanks
-Mark


On Nov 7, 2017, at 6:55 AM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:

Is there a reporting task/processor that will let me aggregate Lineage 
Duration’s for a point in time so I can monitor my flow using this metric?

Thanks,
  Peter



RE: [EXT] Re: Metrics on Lineage Duration

2017-11-07 Thread Peter Wicks (pwicks)
Thanks Pierre, though I think you meant to send me: 
https://pierrevillard.com/2017/05/15/monitoring-nifi-workflow-sla/, which was 
much more helpful ☺.

From: Pierre Villard [mailto:pierre.villard...@gmail.com]
Sent: Tuesday, November 07, 2017 8:35 PM
To: users@nifi.apache.org
Subject: [EXT] Re: Metrics on Lineage Duration

Hi Peter,
There is not as far as I know. What I usually do is to use ExecuteScript 
processor to extract the lineage duration of the flow file into an attribute of 
the flow file where I feel this information is useful for my workflow and I 
send it to whatever monitoring destination I have. This way I can ensure that 
mean processing time of a flow file is constant and I can be alerted in case of 
spurious events. I wrote something about this approach here [1].

[1] https://pierrevillard.com/2017/05/16/monitoring-nifi-ambari-grafana/

Hope this helps,
Pierre


2017-11-07 12:55 GMT+01:00 Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>>:
Is there a reporting task/processor that will let me aggregate Lineage 
Duration’s for a point in time so I can monitor my flow using this metric?

Thanks,
  Peter



RE: [EXT] Re: Polling Processors impact on Latency

2017-11-07 Thread Peter Wicks (pwicks)
If you schedule the processor to run every 0 sec (the default) then in my 
experience you won’t notice latency from polling at all. But I guess this 
depends on your expectations, volume, and over all Flow processing time.

Yes, event driven may help, but from what I’ve read it’s more about reducing 
server resource consumption than latency (could be wrong).

As for a hard set limit, there is a configuration entry in nifi.properties that 
seems relevant:

# If a component has no work to do (is "bored"), how long should we wait before 
checking again for work?
nifi.bored.yield.duration=10 millis

Thanks,
  Peter

From: Chirag Dewan [mailto:chirag.dewa...@yahoo.in]
Sent: Tuesday, November 07, 2017 8:02 PM
To: apere...@gmail.com; users@nifi.apache.org
Subject: [EXT] Re: Polling Processors impact on Latency

Thanks Andrew for the quick response.

I am more concerned about the processors polling for flow files on the 
connections between the processors.

Thanks,

Chirag

On Tue, 7 Nov 2017 at 5:24 PM, Andrew Grande
> wrote:

Yes, polling increases latency in some cases. But no, NiFi is not just polling. 
It has all kinds of sources, and listening vs polling vs subscribing purely 
depends on the protocol of that given processor.

Hope this helps,
Andrew

On Tue, Nov 7, 2017, 1:39 AM Chirag Dewan 
> wrote:
Hi All,

I am a layman to NiFi. I am exploring NiFi as a data flow engine to be 
integrated with my Flink processing engine. A brief history of our approach :

We are trying to build a Streaming Data processing engine. We started off with 
Flink as the sole core engine, which is responsible for collection(through 
Flink Sources) as well as processing the data.

Soon we fumbled onto NiFi and the data flow world.

So far, my understanding is that the NiFi processors are polling processors and 
not pub-sub processors. That makes me wonder, what's the impact of polling on 
latency? I know I can configure my processors to trade off latency with 
throughput, but is there a hard limit on the latency I can achieve using 
NiFi?

As I said, I am layman as yet. Perhaps my understanding is short here. Any 
leads would be much appreciated.

P.S - Not diving much into Event Driven Processors. They look like something 
which might clear my thoughts. But since they are marked experimental, would be 
more interested in understanding the timer driven processors.

Thanks,

Chirag



Metrics on Lineage Duration

2017-11-07 Thread Peter Wicks (pwicks)
Is there a reporting task/processor that will let me aggregate Lineage 
Duration's for a point in time so I can monitor my flow using this metric?

Thanks,
  Peter


RE: [EXT] CDC like updates on Nifi

2017-12-06 Thread Peter Wicks (pwicks)
Alberto,

You probably just need to try out the options and see what works best (Avro or 
ORC, etc…).

With the Avro option, you wouldn’t need to change the type of your main HIVE 
table, keep that as ORC.
Only the staging table would use Avro. Then call Hive QL to merge the data from 
your staging table into your main table. Let your clusters CPU power crunch 
through the data to do the merge.

If you split the data using SplitRecord into individual rows then you could 
probably route on the transaction type. But working with individual rows in 
NiFi adds a lot of overhead, and just imagine executing 10k Hive QL SQL 
statements instead of 1 big one… If you have ACID enabled I guess it would all 
get recombined, but the overhead of calling that many statements would be 
really high.

--Peter

From: Alberto Bengoa [mailto:albe...@propus.com.br]
Sent: Thursday, December 07, 2017 02:27
To: users@nifi.apache.org
Subject: Re: [EXT] CDC like updates on Nifi

On Tue, Dec 5, 2017 at 11:55 PM, Peter Wicks (pwicks) 
<pwi...@micron.com<mailto:pwi...@micron.com>> wrote:

Alberto,
Hello Peter,

Thanks for your answer.



Since it sounds like you have control over the structure of the tables, this 
should be doable.



If you have a changelog table for each table this will probably be easier, and 
in your changelog table you’ll need to make sure you have a good transaction 
timestamp column and a change type column (I/U/D). Then use QueryDatabaseTable 
to tail your change log table, one copy of QueryDatabaseTable for each change 
table.

Yes. This is the way that I'm trying to do. I have the TimeStamp and Operation 
type columns as "metadata columns" and all the other "data columns" of each 
table.



Now your changes are in easy to ingest Avro files. For HIVE I’d probably use an 
external table with the Avro schema, this makes it easy to use PutHDFS to load 
the file and make it accessible from HIVE. I haven’t used Phoenix, sorry.

Hmm. Sounds interesting.

I was planning to use ORC because it allows transactions (to make updates / 
deletes). Avro does not allow transactions, but changing data using HDFS instead 
of HiveQL would be an option.

Would be possible to update fields of specific records using PutHDFS?

On my changelog table I do not have the entire row data when triggered by an 
update. I just have the values of the changed fields (fields that did not change 
have empty values in the changelog tables).

_TimeStamp                       _Operation  Column_A  Column_B  Column_C
2017-12-01 14:35:56:204 - 02:00  3           7501
2017-12-01 14:35:56:211 - 02:00  4           7501      1234
2017-12-01 15:25:35:945 - 02:00  3           7503
2017-12-01 15:25:35:945 - 02:00  4           7503      5678

In the example above, we had two update operations (_Operation = 4). Column_B 
was changed, Column_C was not, so Column_C does not carry any prior value.


If you have a single change table for all tables, then you can still use the 
above pattern, but you'll need a middle step where you extract and rebuild the 
changes. Maybe if you store the changes in JSON you could extract them using 
one of the Record parsers and then rebuild the data row. Much harder though.

I have one changelog table for each table.

Considering that I would use HiveQL to update tables on the Datalake, could I 
use a RouteOnContent processor to create SQL Queries according to the 
_Operation type?






Thanks,

  Peter



Thanks you!

Alberto


From: Alberto Bengoa 
[mailto:albe...@propus.com.br<mailto:albe...@propus.com.br>]
Sent: Wednesday, December 06, 2017 06:24
To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: [EXT] CDC like updates on Nifi



Hey folks,



I read about Nifi CDC processor for MySQL and other CDC "solutions" with Nifi 
found on Google, like these:



https://community.hortonworks.com/idea/53420/apache-nifi-processor-to-address-cdc-use-cases-for.html

https://community.hortonworks.com/questions/88686/change-data-capture-using-nifi-1.html

https://community.hortonworks.com/articles/113941/change-data-capture-cdc-with-apache-nifi-version-1-1.html



I'm trying a different approach to acquire fresh information from tables, using 
triggers on source database's tables to write changes to a "changelog table".



This is done, but my questions are:



Would NiFi be capable of reading these tables and transforming the data to generate 
equivalent SQL queries (insert/update/delete) to send to Hive and/or Phoenix with 
the currently available processors?



Which would be the best / suggested flow?



The objective is to keep tables on the Data Lake as up-to-date as possible for 
real time analyses.



Cheers,

Alberto



RE: [EXT] CDC like updates on Nifi

2017-12-05 Thread Peter Wicks (pwicks)
Alberto,

Since it sounds like you have control over the structure of the tables, this 
should be doable.

If you have a changelog table for each table this will probably be easier, and 
in your changelog table you’ll need to make sure you have a good transaction 
timestamp column and a change type column (I/U/D). Then use QueryDatabaseTable 
to tail your change log table, one copy of QueryDatabaseTable for each change 
table.
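(In QueryDatabaseTable, point the Maximum-value Columns property at that transaction 
timestamp column so each run only picks up change rows added since the previous run.)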

Now your changes are in easy to ingest Avro files. For HIVE I’d probably use an 
external table with the Avro schema, this makes it easy to use PutHDFS to load 
the file and make it accessible from HIVE. I haven’t used Phoenix, sorry.

If you have a single change table for all tables, then you can still use the 
above pattern, but you’ll need a middle step where you extract and rebuild the 
changes. Maybe if you store the changes in JSON you could extract them using 
one of the Record parsers and then rebuild the data row. Much harder though.

Thanks,
  Peter

From: Alberto Bengoa [mailto:albe...@propus.com.br]
Sent: Wednesday, December 06, 2017 06:24
To: users@nifi.apache.org
Subject: [EXT] CDC like updates on Nifi

Hey folks,

I read about Nifi CDC processor for MySQL and other CDC "solutions" with Nifi 
found on Google, like these:

https://community.hortonworks.com/idea/53420/apache-nifi-processor-to-address-cdc-use-cases-for.html
https://community.hortonworks.com/questions/88686/change-data-capture-using-nifi-1.html
https://community.hortonworks.com/articles/113941/change-data-capture-cdc-with-apache-nifi-version-1-1.html

I'm trying a different approach to acquire fresh information from tables, using 
triggers on source database's tables to write changes to a "changelog table".

This is done, but my questions are:

Would NiFi be capable of reading these tables and transforming the data to generate 
equivalent SQL queries (insert/update/delete) to send to Hive and/or Phoenix with 
the currently available processors?

Which would be the best / suggested flow?

The objective is to keep tables on the Data Lake as up-to-date as possible for 
real time analyses.

Cheers,
Alberto


Reading Email Message Body

2017-10-30 Thread Peter Wicks (pwicks)
A coworker and I were troubleshooting a bug in the ConsumeEWS processor where 
Unicode characters were being read as ASCII.
I figured out there was a bug in my code for ConsumeEWS and plan to fix it, but 
as part of the research I found that the way Unicode text in the email is 
outputted to the FlowFile is not easy to work with; in general the whole email 
body is hard to work with. If there are attachments in there and all you want 
is the body it's even more of a mess.

How are other users reading the email message body? Has anyone else run into 
the issue with Unicode characters?

In my scenario, we see the auto-quotes/semicolons from Outlook's Word interface 
becoming '?' characters, and with my fix in place they are written to the flow 
file using some kind of serialization format:

"Where there's NiFi there is Happiness" becomes:

=E2=80=9CWhere there=E2=80=99s NiFi there is Happiness=E2=80=9D.
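(Those =E2=80=XX sequences are the MIME quoted-printable encoding of the UTF-8 bytes for 
the curly quote characters.)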

Is there a need for a new Email processor that extracts the message body by 
deserializing the FlowFile and reading out the body?


RE: [EXT] Issue with ResizeImage processor for PNG files

2018-06-29 Thread Peter Wicks (pwicks)
Raman,

I created a ticket for this issue: 
https://issues.apache.org/jira/browse/NIFI-5355.

I read through the links you provided, thanks for researching the problem so 
thoroughly.
Found some others as well, such as 
https://stackoverflow.com/questions/4391814/why-would-java-awt-image-bufferedimagegettype-return-different-value-mac-cent.

In short, I think this will require a code change, or an OS change. From what 
I've read it works on some OSes but not others, depending on the Java 
implementation on that OS.

Also, in reading up on this, using the method NiFi is using, getScaledInstance, 
apparently isn't the best option anyway; there appear to be faster/higher 
quality options out there, as outlined here: 
https://stackoverflow.com/a/19507160/328968

So if changes were going to be made, it could conceivably cover both.
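For anyone hitting this before a fix lands, a common workaround in custom code (a rough 
sketch only, not the actual ResizeImage implementation) is to fall back to a known image 
type when getType() returns 0 (BufferedImage.TYPE_CUSTOM) and to scale by drawing through 
Graphics2D instead of getScaledInstance:

import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class SafeResize {
    // Resizes an image, tolerating PNGs whose BufferedImage type is 0 (TYPE_CUSTOM)
    public static BufferedImage resize(BufferedImage src, int width, int height) {
        int type = src.getType() == BufferedImage.TYPE_CUSTOM
                ? BufferedImage.TYPE_INT_ARGB : src.getType();
        BufferedImage dst = new BufferedImage(width, height, type);
        Graphics2D g = dst.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(src, 0, 0, width, height, null);
        g.dispose();
        return dst;
    }
}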

Thanks,
  Peter

From: Ramaninder Singh Jhajj [mailto:jhajj.raman...@gmail.com]
Sent: Friday, June 29, 2018 12:12 PM
To: users@nifi.apache.org
Subject: [EXT] Issue with ResizeImage processor for PNG files

Hello Everyone,

I am facing another issue. I am fetching some images from a source and storing them 
at a destination, and just before the destination processor I have a ResizeImage 
processor.

Now my images can be JPEG or PNG. The processor documentation mentions that it can 
handle both, but it is failing for PNG files with the following error:


ResizeImage[id=2b2bf869-0dcb-351d-857b-bbeef6f27fcf] 
ResizeImage[id=2b2bf869-0dcb-351d-857b-bbeef6f27fcf] failed to process due to 
java.lang.IllegalArgumentException: Unknown image type 0; rolling back session: 
Unknown image type 0

I did some digging and it seems like the issue is with ImageIO library being 
used for resizing.
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/image/ResizeImage.java#L177

As per this thread, the "image.getType()" method is the one causing the problem.
https://www.thecodingforums.com/threads/imageio-have-problem-of-reading-png.141174/

https://stackoverflow.com/questions/5836128/how-do-i-make-javas-imagebuffer-to-read-a-png-file-correctly

So it seems like for PNG files image.getType returns 0 but the code is expecting 5.

Is there any workaround for this without changing the code?

If there is none, I can fix the code and use it (I can also submit a pull request 
with the fix, but I am not sure how to do that).

Kind Regards,
Raman




Hive w/ Kerberos Authentication starts failing after a week

2018-07-26 Thread Peter Wicks (pwicks)
We are seeing frequent failures of our Hive DBCP connections after a week of 
use when using Kerberos with Principal/Keytab. We've tried with both the 
Credential Service and without (though in looking at the code, there should be 
no difference).

It looks like the tickets are expiring and renewal is not happening?

javax.security.sasl.SaslException: GSS initiate failed
at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at 
org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
at 
org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
at 
org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:204)
at org.apache.hive.jdbc.HiveConnection.(HiveConnection.java:176)
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
at 
org.apache.commons.dbcp.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:38)
at 
org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:582)
at 
org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1148)
at 
org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
at 
org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)
at 
org.apache.nifi.dbcp.hive.HiveConnectionPool.lambda$getConnection$0(HiveConnectionPool.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at 
org.apache.nifi.dbcp.hive.HiveConnectionPool.getConnection(HiveConnectionPool.java:355)
at sun.reflect.GeneratedMethodAccessor515.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)

Thanks,
Peter


RE: [EXT] Re: Hive w/ Kerberos Authentication starts failing after a week

2018-07-27 Thread Peter Wicks (pwicks)
I don’t believe that is how this code works. Not to say that might not work, 
but I don’t believe that the Kerberos authentication used by NiFi processors 
relies in any way on the tickets that appear in klist.

While we are only using a single account on this particular server, many of our 
servers use several Kerberos principals/keytab’s. I don’t think that doing 
kinit’s for all of them would work either.

Thanks,
  Peter

From: Sivaprasanna [mailto:sivaprasanna...@gmail.com]
Sent: Friday, July 27, 2018 3:12 AM
To: users@nifi.apache.org
Subject: [EXT] Re: Hive w/ Kerberos Authentication starts failing after a week

Did you try executing 'klist' to see if the tickets are there and renewed? If 
expired, try manual kinit and see if that fixes.

On Fri, Jul 27, 2018 at 1:51 AM Peter Wicks (pwicks) 
mailto:pwi...@micron.com>> wrote:
We are seeing frequent failures of our Hive DBCP connections after a week of 
use when using Kerberos with Principal/Keytab. We’ve tried with both the 
Credential Service and without (though in looking at the code, there should be 
no difference).

It looks like the tickets are expiring and renewal is not happening?

javax.security.sasl.SaslException: GSS initiate failed
at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at 
org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
at 
org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
at 
org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:204)
at org.apache.hive.jdbc.HiveConnection.(HiveConnection.java:176)
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
at 
org.apache.commons.dbcp.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:38)
at 
org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:582)
at 
org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1148)
at 
org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
at 
org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)
at 
org.apache.nifi.dbcp.hive.HiveConnectionPool.lambda$getConnection$0(HiveConnectionPool.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at 
org.apache.nifi.dbcp.hive.HiveConnectionPool.getConnection(HiveConnectionPool.java:355)
at sun.reflect.GeneratedMethodAccessor515.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)

Thanks,
Peter


RE: [EXT] Re: Hive w/ Kerberos Authentication starts failing after a week

2018-07-27 Thread Peter Wicks (pwicks)
As an aside, while digging around in the code, I noticed that the Atlas 
Reporting Task has its own Hadoop Kerberos authentication logic 
(org.apache.nifi.atlas.security.Kerberos). I'm not using this, but it made me 
wonder if this could cause trouble if Hive (synchronized) and Atlas (separate, 
unsynchronized) were both trying to login from Keytab at the same time.

--Peter

From: Shawn Weeks [mailto:swe...@weeksconsulting.us]
Sent: Friday, July 27, 2018 10:29 AM
To: users@nifi.apache.org
Subject: Re: [EXT] Re: Hive w/ Kerberos Authentication starts failing after a 
week


If you're using the Hortonworks distribution it's fixed in the latest HDF 3.x 
release I think.



Thanks

Shawn


From: Peter Wicks (pwicks) 
Sent: Friday, July 27, 2018 10:58 AM
To: users@nifi.apache.org
Subject: RE: [EXT] Re: Hive w/ Kerberos Authentication starts failing after a 
week


Thanks Shawn. Looks like this was fixed in 1.7.0. Will have to upgrade.



From: Shawn Weeks [mailto:swe...@weeksconsulting.us]
Sent: Friday, July 27, 2018 8:07 AM
To: users@nifi.apache.org
Subject: Re: [EXT] Re: Hive w/ Kerberos Authentication starts failing after a 
week



See NIFI-5134 as there was a known bug with the Hive Connection Pool that made 
it fail once the Kerberos Tickets expired and you lost your connection from 
Hive. If you don't have this patch in your version once the Kerberos Tickets 
reaches the end of it's lifetime the connection pool won't work till you 
restart NiFi.



Thanks

Shawn



From: Peter Wicks (pwicks) mailto:pwi...@micron.com>>
Sent: Friday, July 27, 2018 8:51:54 AM
To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: RE: [EXT] Re: Hive w/ Kerberos Authentication starts failing after a 
week



I don't believe that is how this code works. Not to say that might not work, 
but I don't believe that the Kerberos authentication used by NiFi processors 
relies in any way on the tickets that appear in klist.



While we are only using a single account on this particular server, many of our 
servers use several Kerberos principals/keytab's. I don't think that doing 
kinit's for all of them would work either.



Thanks,

  Peter



From: Sivaprasanna [mailto:sivaprasanna...@gmail.com]
Sent: Friday, July 27, 2018 3:12 AM
To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: [EXT] Re: Hive w/ Kerberos Authentication starts failing after a week



Did you try executing 'klist' to see if the tickets are there and renewed? If 
expired, try manual kinit and see if that fixes.



On Fri, Jul 27, 2018 at 1:51 AM Peter Wicks (pwicks) 
mailto:pwi...@micron.com>> wrote:

We are seeing frequent failures of our Hive DBCP connections after a week of 
use when using Kerberos with Principal/Keytab. We've tried with both the 
Credential Service and without (though in looking at the code, there should be 
no difference).



It looks like the tickets are expiring and renewal is not happening?



javax.security.sasl.SaslException: GSS initiate failed

at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)

at 
org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)

at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)

at 
org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)

at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)

at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)

at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)

at 
org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:204)

at org.apache.hive.jdbc.HiveConnection.(HiveConnection.java:176)

at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)

at 
org.apache.commons.dbcp.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:38)

at 
org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:582)

at 
org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1148)

at 
org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)

at 
org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)

at 
org.apache.nifi.dbcp.hive.HiveConnectionPool.lambda$getConnection$0(HiveConnectionPool.java:355)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.S

After 1.7.1 upgrade, no Provenance data is visible

2018-08-10 Thread Peter Wicks (pwicks)
After upgrading our NiFi instances to 1.7.1 we are not able to see Provenance 
data anymore in the UI. We see this across about a dozen instances.
In the UI it tells me provenance is available for about the last 24 hours, and 
I can see that files have moved in and out of the processor in the last 5 min. 
In the logs, I can see it query provenance, and that the query returns 0 
results.

Thoughts? I saw a few tickets related to Provenance in 1.7, but not sure if 
they have an impact.

Here are our properties:

# Provenance Repository Properties
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
nifi.provenance.repository.debug.frequency=1_000_000
nifi.provenance.repository.encryption.key.provider.implementation=
nifi.provenance.repository.encryption.key.provider.location=
nifi.provenance.repository.encryption.key.id=
nifi.provenance.repository.encryption.key=

# Persistent Provenance Repository Properties
nifi.provenance.repository.directory.default=/data/nifi/repositories/provenance_repository
nifi.provenance.repository.max.storage.time=24 hours
nifi.provenance.repository.max.storage.size=1 GB
nifi.provenance.repository.rollover.time=30 secs
nifi.provenance.repository.rollover.size=100 MB
nifi.provenance.repository.query.threads=2
nifi.provenance.repository.index.threads=2
nifi.provenance.repository.compress.on.rollover=true
nifi.provenance.repository.always.sync=false
nifi.provenance.repository.journal.count=16
# Comma-separated list of fields. Fields that are not indexed will not be 
searchable. Valid fields are:
# EventType, FlowFileUUID, Filename, TransitURI, ProcessorID, 
AlternateIdentifierURI, Relationship, Details
nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, 
ProcessorID, Relationship
# FlowFile Attributes that should be indexed and made searchable.  Some 
examples to consider are filename, uuid, mime.type
nifi.provenance.repository.indexed.attributes=
# Large values for the shard size will result in more Java heap usage when 
searching the Provenance Repository
# but should provide better performance
nifi.provenance.repository.index.shard.size=500 MB
# Indicates the maximum length that a FlowFile attribute can be when retrieving 
a Provenance Event from
# the repository. If the length of any attribute exceeds this value, it will be 
truncated when the event is retrieved.
nifi.provenance.repository.max.attribute.length=65536
nifi.provenance.repository.concurrent.merge.threads=2
nifi.provenance.repository.warm.cache.frequency=1 hour

Thanks,
  Peter


RE: [EXT] Re: After 1.7.1 upgrade, no Provenance data is visible

2018-08-13 Thread Peter Wicks (pwicks)
Thanks Mike, that fixed it.

--Peter

From: Michael Moser [mailto:moser...@gmail.com]
Sent: Friday, August 10, 2018 3:49 PM
To: users@nifi.apache.org
Subject: [EXT] Re: After 1.7.1 upgrade, no Provenance data is visible

Hi Peter,

There was a change to provenance related access policies in 1.7.0.  Check out 
the Migration Guide [1] for 1.7.0.  It talks about what you'll need to do.

[1] - https://cwiki.apache.org/confluence/display/NIFI/Migration+Guidance

-- Mike


On Fri, Aug 10, 2018 at 5:39 PM Peter Wicks (pwicks) 
mailto:pwi...@micron.com>> wrote:
After upgrading our NiFi instances to 1.7.1 we are not able to see Provenance 
data anymore in the UI. We see this across about a dozen instances.
In the UI it tells me provenance is available for about the last 24 hours, and 
I can see that files have moved in and out of the processor in the last 5 min. 
In the logs, I can see it query provenance, and that the query returns 0 
results.

Thoughts? I saw a few tickets related to Provenance in 1.7, but not sure if 
they have an impact.

Here are our properties:

# Provenance Repository Properties
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
nifi.provenance.repository.debug.frequency=1_000_000
nifi.provenance.repository.encryption.key.provider.implementation=
nifi.provenance.repository.encryption.key.provider.location=
nifi.provenance.repository.encryption.key.id<http://nifi.provenance.repository.encryption.key.id>=
nifi.provenance.repository.encryption.key=

# Persistent Provenance Repository Properties
nifi.provenance.repository.directory.default=/data/nifi/repositories/provenance_repository
nifi.provenance.repository.max.storage.time=24 hours
nifi.provenance.repository.max.storage.size=1 GB
nifi.provenance.repository.rollover.time=30 secs
nifi.provenance.repository.rollover.size=100 MB
nifi.provenance.repository.query.threads=2
nifi.provenance.repository.index.threads=2
nifi.provenance.repository.compress.on.rollover=true
nifi.provenance.repository.always.sync=false
nifi.provenance.repository.journal.count=16
# Comma-separated list of fields. Fields that are not indexed will not be 
searchable. Valid fields are:
# EventType, FlowFileUUID, Filename, TransitURI, ProcessorID, 
AlternateIdentifierURI, Relationship, Details
nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, 
ProcessorID, Relationship
# FlowFile Attributes that should be indexed and made searchable.  Some 
examples to consider are filename, uuid, mime.type
nifi.provenance.repository.indexed.attributes=
# Large values for the shard size will result in more Java heap usage when 
searching the Provenance Repository
# but should provide better performance
nifi.provenance.repository.index.shard.size=500 MB
# Indicates the maximum length that a FlowFile attribute can be when retrieving 
a Provenance Event from
# the repository. If the length of any attribute exceeds this value, it will be 
truncated when the event is retrieved.
nifi.provenance.repository.max.attribute.length=65536
nifi.provenance.repository.concurrent.merge.threads=2
nifi.provenance.repository.warm.cache.frequency=1 hour

Thanks,
  Peter


How many threads does Jetty use?

2018-07-18 Thread Peter Wicks (pwicks)
I know the default thread count for Jetty is 200, but is there a way to tell 
how many are actually being used and if I need to make adjustments?

nifi.web.jetty.threads=200

Thanks,
  Peter


RE: [EXT] Adding a file to a zip file

2018-07-11 Thread Peter Wicks (pwicks)
Hi Kiran,

In your flow, how do you avoid duplicate files going into MergeContent?

For example:

  1.  file1.zip goes into Unpack zip file, it contains 5 files.
  2.  These 5 files are sent down both success paths (AttributeToJSON and 
increment fragment index and count)
  3.  5 files show up at Merge Content, and are waiting for that 1 file.
  4.  Meanwhile 5 files show up to AttributesToJSON, and then have their 
fragment.index set to 1…
  5.  10 files are now available to MergeContent with the same Fragment 
Identifier and covering all required indexes…

Is this not happening?

Thanks,
  Peter

From: Kiran [mailto:kiran@protonmail.com]
Sent: Tuesday, July 10, 2018 2:36 PM
To: users 
Subject: [EXT] Adding a file to a zip file

Hello,

I've got a requirement to add a JSON file to an existing zip file.

I'm doing this by:

  1.  Unpacking the ZIP file
  2.  Increment the fragment.index and fragment.count of the original files
  3.  Create the JSON file and set the fragment.index to 1 and set the 
fragment.count
  4.  Merge the contents of the files to create the resulting ZIP file
I've attached an image of the data flow and the settings for the MergeContent 
processor.

When I process the ZIP files one by one this works fine, but when I process the 
ZIP files in bulk some work and others fail at the MergeContent processor. I'm 
guessing that it's to do with the settings of the MergeContent processor. Can 
anyone provide me with insight on what I'm doing wrong here?

Thanks

Kiran



Cluster Peer Lists

2018-09-27 Thread Peter Wicks (pwicks)
Hi NiFi team,

We had one of the nodes in our cluster go offline today. We eventually resolved 
the issue, but it exposed some issues in our configuration across our edge NiFi 
instances.

Right now we have non-clustered instances of NiFi distributed around the world, 
pushing data back to a three node cluster via Site-to-Site. All of these 
instances use the name of the first node (node01), and pull back the peer list 
and weights from it. But node01 is the node that went offline today, and while 
some site-to-site connections appeared to use cached data and continued 
uploading data to node02 and node03, many of the site-to-site connections went 
down because they were not able to pull the peer list from the cluster, which 
makes perfect sense to me.

One question that I was curious about, how long is a peer list cached for if an 
updated list can't be retrieved  from the cluster?

What are the best practices for fixing this? We were throwing around ideas of 
using a load balancer or round robin DNS name as the entry point for 
site-to-site, but I figured others have probably already tackled this problem 
before and could share some ideas.

Thanks,
  Peter


Delay a FlowFile for a specific amount of time

2018-10-15 Thread Peter Wicks (pwicks)
A coworker and I were working on a problem where we needed to delay a group of 
FlowFiles for 60 minutes. Our first attempt of course used ControlRate, but 
with ControlRate the first file is let through immediately, and only after that 
are the rest of the files delayed.

We got it working by using FlowFile penalization on a broken PutFile processor 
to penalize all files for 60 minutes, and all files fail due to the bad config. 
In testing this looks good, but... well it's pretty ugly. Is there an easy fix 
for this? My first thought was we need a "Delay" processor.

--Peter


RE: [EXT] Re: Delay a FlowFile for a specific amount of time

2018-10-15 Thread Peter Wicks (pwicks)
Bryan/Pierre, both good ideas. Thanks for the input!

-Original Message-
From: Bryan Bende [mailto:bbe...@gmail.com] 
Sent: Monday, October 15, 2018 2:14 PM
To: users@nifi.apache.org
Subject: [EXT] Re: Delay a FlowFile for a specific amount of time

Maybe a Wait processor with an expiration of 60 mins? If you never use a Notify 
processor then its basically just going to wait til the expiration.
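(Concretely: set the Wait processor's Expiration Duration to 60 mins and route its 
'expired' relationship downstream; with no matching Notify signal, every FlowFile simply 
ages out after the hour.)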
On Mon, Oct 15, 2018 at 4:10 PM Peter Wicks (pwicks)  wrote:
>
> A coworker and I were working on a problem where we needed to delay a group 
> of FlowFile’s for 60 minutes. Our first attempt of course used ControlRate, 
> but with ControlRate the first file is let through immediately, and only 
> after that are the rest of the files delayed.
>
>
>
> We got it working by using FlowFile penalization on a broken PutFile 
> processor to penalize all files for 60 minutes, and all files fail due to the 
> bad config. In testing this looks good, but… well it’s pretty ugly. Is there 
> an easy fix for this? My first thought was we need a “Delay” processor.
>
>
>
> --Peter


RE: [EXT] ReplaceText cannot consume messages if Regex does not match

2018-10-18 Thread Peter Wicks (pwicks)
Hi Juan,

What version of NiFi are you running on?
What mode are you running ReplaceText in, all text or line by line?
Any other settings that might be important? What does your regex look like (if you're 
able to share)?

--Peter


From: Juan Pablo Gardella [mailto:gardellajuanpa...@gmail.com]
Sent: Thursday, October 18, 2018 8:53 AM
To: users@nifi.apache.org
Subject: [EXT] ReplaceText cannot consume messages if Regex does not match

Hi all,

I'm seeing that ReplaceText is not able to consume messages that do not match the 
regex. It keeps all the messages in the input queue instead of sending them to the 
failure relationship. Is this the intended behavior, or do I have to file a ticket 
to get it fixed? As it stands, the processor is not able to process bad 
messages and becomes the bottleneck of the flow.

Juan


Reverting position only versioned changes

2018-10-22 Thread Peter Wicks (pwicks)
In NiFi 1.7.1, I tried to revert some changes to a versioned processor group 
where the only changes were Processor position. The processor that was moved 
was running for a very long time, and the revert would not complete until it 
stopped the affected processor, even though a stop is not actually necessary to 
revert the change.

I eventually cancelled the revert request. An unintended side effect is that 
the processor is still in a Stopping state, and I have to make sure and go back 
and start it back up once it completes.

Is this an issue that's already been fixed in 1.8? "Do not stop processors 
during a version revert if only position has changed", or something like that.

--Peter


Maximum Memory for NiFi?

2018-10-04 Thread Peter Wicks (pwicks)
We've had some more clustering issues, and found that some nodes are running 
out of memory when we have unexpected spikes in data, then we run into a GC 
stop-the-world event... We lowered our thread count, and that has allowed the 
cluster to stabilize for the time being.

Our hardware is pretty robust; we usually have 1000+ threads running on each 
node in the cluster (cumulative ~4,000 threads). Each node has about 500 GB of 
RAM, but we've only been running NiFi with a 70 GB heap, and it usually uses 
only 50 GB.

I enabled GC logging and after analyzing the data we decided to increase the 
heap size. We are experimenting with upping the max to 200G of heap to better 
absorb spikes in data. We are using the default G1GC.
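(On JDK 8 this kind of GC logging is typically enabled by adding extra java.arg lines to 
bootstrap.conf with flags such as -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
-Xloggc:/path/to/gc.log; newer JVMs use the unified -Xlog:gc* option instead.)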

Also, how much impact is there from doing GC logging all the time? The metrics 
we are getting are really helpful for debugging/analyzing, but we don't want to 
slowdown the cluster too much.

Thoughts on issues we might encounter? Things we should consider?

--Peter


RE: [EXT] Re: Maximum Memory for NiFi?

2018-10-08 Thread Peter Wicks (pwicks)
Bryan,

Our min heap is set to 32 GB. Under normal conditions the heap does not exceed 
roughly 50% usage (out of 70 GB), and is often lower. We collect and 
track these metrics, and in the last 30 days it's been closer to 35% usage.

But during database maintenance we have to shut down a lot of processors. 
Flowfile's start to backup in the system across lots of different feeds. Then, 
when the database comes back online, the combined processing of all these 
separate feeds catching up on backlog (lots of different processors, not a 
single processor), causes the heap usage to spike. What we saw in GC logging 
was we would reach 70 GB's, then GC would do a stop the world pause and bring 
us down to about 65 GB's, then we'd reach 70 GB's, and GC would get us down to 
68 GB's. This would repeat until GC was only trimming off a few MB's and having 
to run full cleanups every few seconds; thus leaving the system inoperable.

We brought our cluster back online by:
 1. Shutting everything down
 2. Going into a single node and setting NiFi to not auto-resume state; we also 
set the maximum thread count to 10.
 3. We turned on a single node and verified we could process a single feed 
without crashing. We then synchronized the flow to the rest of the nodes and 
brought them back online.
 4. We then manually turned feeds on to flush out backlogged data, of course 
more data was backlogging on our edge servers while we did this.
 5. We decided to set threads to 140 per node (significantly lower than the 
1500 threads we used to have), and Heap to 200 GB's. We did 2x threads per 
virtual core, plus enough threads to cover all of the site-to-site input ports. 
It's weird, because NiFi used to happily run 1000+ threads per node all the 
time, but is able to keep up just as well now with 140 threads...
 6. With these settings in place we caught up on our backlog without running 
out of heap. We maxed out around 100 GB's of Heap usage per node.
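
For step 2, the flag we flipped is in conf/nifi.properties (a one-line sketch below); 
we set it back to true once the backlog was cleared:

# conf/nifi.properties
nifi.flowcontroller.autoResumeState=false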

--Peter

-Original Message-
From: Bryan Bende [mailto:bbe...@gmail.com] 
Sent: Friday, October 5, 2018 7:26 AM
To: users@nifi.apache.org
Subject: [EXT] Re: Maximum Memory for NiFi?

Generally the larger the heap, the more likely to have long GC pauses.

I'm surprised that you would need a 70GB heap given NiFi's design where the 
content of the flow files is generally not held in memory, unless many of the 
processors you are using are not written in an optimal way to process the 
content in a streaming fashion.

Did you initially start out lower than 70GB and have to increase it to that 
point? Just wondering what happens at lower levels like maybe 32GB.

On Thu, Oct 4, 2018 at 4:20 PM Peter Wicks (pwicks)  wrote:
>
> We’ve had some more clustering issues, and found that some nodes are running 
> out of memory when we have unexpected spikes in data, then we run into a GC 
> stop-the-world event... We lowered our thread count, and that has allowed the 
> cluster to stabilize for the time being.
>
>
>
> Our hardware is pretty robust, we usually have 1000+ threads running on each 
> node in the cluster (cumulative ~4,000 threads). Each node has about 500G’s 
> of RAM. But we’ve only been running NiFi with 70G’s of RAM, and it usually 
> uses only 50G’s.
>
>
>
> I enabled GC logging and after analyzing the data we decided to increase the 
> heap size. We are experimenting with upping the max to 200G of heap to better 
> absorb spikes in data. We are using the default G1GC.
>
>
>
> Also, how much impact is there from doing GC logging all the time? The 
> metrics we are getting are really helpful for debugging/analyzing, but we 
> don’t want to slowdown the cluster too much.
>
>
>
> Thoughts on issues we might encounter? Things we should consider?
>
>
>
> --Peter


RE: [EXT] Re: Cluster Peer Lists

2018-09-28 Thread Peter Wicks (pwicks)
Thanks Koji, exactly the answers I was looking for.

I've switched over all the Remote Processor Group's to use the list of nodes.

--Peter

-Original Message-
From: Koji Kawamura [mailto:ijokaruma...@gmail.com] 
Sent: Thursday, September 27, 2018 7:08 PM
To: users@nifi.apache.org
Subject: [EXT] Re: Cluster Peer Lists

Hi Peter,

Site-to-Site client refreshes remote peer list per 60 secs.
https://github.com/apache/nifi/blob/master/nifi-commons/nifi-site-to-site-client/src/main/java/org/apache/nifi/remote/client/PeerSelector.java#L60

The address configured to setup a S2S client is used to get remote peer list 
initially.
After that, the client knows node01, 02 and 03 are available peers, then when 
it refreshes peer list, even if it fails to access node01, it should retrieve 
the updated peer list from node02 or 03. However, if node01 stays in the remote 
cluster (until it is removed from the cluster, node02 and 03 still think it's a 
part of the cluster), the returned peer list contains node01.
https://github.com/apache/nifi/blob/master/nifi-commons/nifi-site-to-site-client/src/main/java/org/apache/nifi/remote/client/PeerSelector.java#L383

Another thing to note is that the S2S client calculates destinations for the next 
128 transactions in advance.
So, if your client does not make transactions often, it may take longer to 
re-calculate the next destination.
https://github.com/apache/nifi/blob/master/nifi-commons/nifi-site-to-site-client/src/main/java/org/apache/nifi/remote/client/PeerSelector.java#L159

To avoid having a single host address at S2S client configuration, you can use 
multiple ones delimited by commas.
With this, S2S client can connect when it's restarted even if node01 is down.
E.g. http://node01:8080,http://node02:8080,http://node03:8080
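
If you are using the raw Site-to-Site client library rather than a Remote Process 
Group, the same list can be handed to the builder. A rough sketch, assuming the 
nifi-site-to-site-client Builder exposes the urls(...) overload (host names and the 
port name are placeholders):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;

import org.apache.nifi.remote.Transaction;
import org.apache.nifi.remote.TransferDirection;
import org.apache.nifi.remote.client.SiteToSiteClient;
import org.apache.nifi.remote.protocol.SiteToSiteTransportProtocol;

public class MultiNodeS2SExample {
    public static void main(String[] args) throws IOException {
        // Bootstrap with every node so losing node01 does not break the initial peer lookup
        SiteToSiteClient client = new SiteToSiteClient.Builder()
                .urls(new HashSet<>(Arrays.asList(
                        "http://node01:8080",
                        "http://node02:8080",
                        "http://node03:8080")))
                .portName("From Edge")                               // remote input port name (placeholder)
                .transportProtocol(SiteToSiteTransportProtocol.HTTP)
                .build();

        Transaction transaction = client.createTransaction(TransferDirection.SEND);
        transaction.send("hello".getBytes(StandardCharsets.UTF_8),
                Collections.singletonMap("source", "edge01"));
        transaction.confirm();
        transaction.complete();
        client.close();
    }
}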

Alternatively, round robin DNS name or Reverse Proxy for the bootstrap node 
address can be used similarly.

Thanks,
Koji


On Fri, Sep 28, 2018 at 4:30 AM Peter Wicks (pwicks)  wrote:
>
> Hi NiFi team,
>
>
>
> We had one of the nodes in our cluster go offline today. We eventually 
> resolved the issue, but it exposed some issues in our configuration across 
> our edge NiFi instances.
>
>
>
> Right now we have non-clustered instances of NiFi distributed around the 
> world, pushing data back to a three node cluster via Site-to-Site. All of 
> these instances use the name of the first node (node01), and pull back the 
> peer list and weights from it. But node01 is the node that went offline 
> today, and while some site-to-site connections appeared to use cached data 
> and continued uploading data to node02 and node03, many of the site-to-site 
> connections went down because they were not able to pull the peer list from 
> the cluster, which makes perfect sense to me.
>
>
>
> One question that I was curious about, how long is a peer list cached for if 
> an updated list can’t be retrieved  from the cluster?
>
>
>
> What are the best practices for fixing this? We were throwing around ideas of 
> using a load balancer or round robin DNS name as the entry point for 
> site-to-site, but I figured others have probably already tackled this problem 
> before and could share some ideas.
>
>
>
> Thanks,
>
>   Peter


Problems with NiFi Registry Conflicts after Processor Upgrades

2018-11-29 Thread Peter Wicks (pwicks)
Ran into a NiFi Registry issue while upgrading our instances to NiFi 1.8.0. 
ExecuteSQL had a number of new properties added to it in 1.8.0, so after 
upgrading, our versioned processor groups show as having local changes, which is 
good. We went ahead and checked the changes into the registry.

Enter the second instance... we upgraded a second instance. It also sees local 
changes, but now the processor group is in conflict, because we have local 
(identical) changes and a newer version checked in. If you try to revert the 
local changes so you can sync things up... you can't, because these are 
properties on the Processor, and the default values automatically come back. So 
our second processor group is in conflict and we haven't found a way to bring it 
back in sync without deleting it and re-loading it from the registry. Help would 
be appreciated.

Thanks,
  Peter


Cluster Response Times Intermittently Slow, CPU Spikes

2019-01-09 Thread Peter Wicks (pwicks)
For a few weeks, one of our clusters has been having large spikes in CPU across 
most or all nodes, at the same time. Beginning at the same time, our metrics 
collection job has intermittently started to see long response times when 
calling the following api end-points:

/nifi-api/process-groups/root
/nifi-api/flow/process-groups/root/status?recursive=False=True
/nifi-api/system-diagnostics?nodewise=true

There has been no change in response time for these end-points:

/nifi-api/access/token
/nifi-api/flow/cluster/summary
/nifi-api/tenants/users/
/nifi-api/flow/process-groups/root/controller-services?includeAncestorGroups=False=True

All calls are node local, so usually response times are sub-second. But when 
the slowness sets in, some problem calls may take as long as 45 to 60 seconds. 
Sometimes it's just one call, sometimes it's more than one that is slow (they 
are called synchronously back to back).
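
For context, the collection job just makes node-local REST calls and times them, 
roughly like this (sketch; host, port, and token handling are placeholders):

# $TOKEN previously obtained from POST /nifi-api/access/token
curl -s -o /dev/null -w '%{time_total}\n' \
  -H "Authorization: Bearer $TOKEN" \
  'https://node01:8443/nifi-api/system-diagnostics?nodewise=true'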

Any thoughts on properties/settings that we can look at, or component logging 
that we can enable to help troubleshoot the slowness?

We have a second cluster, geographically isolated, that runs (to the best of 
our knowledge) the exact same jobs with the exact same data. This cluster has 
no issues.

Thanks,
  Peter


RE: [EXT] Telnet login and data capture

2019-04-23 Thread Peter Wicks (pwicks)
You should look at the ExecuteStreamCommand processor. Write up a script that 
does what you want in Bash or Python, pass any variables you need either 
through command line arguments or STDIN (the contents of the FlowFile are 
passed as STDIN; I've written up whole Python scripts using ReplaceText and 
passed them straight into Python), and finally write out the responses you want 
to keep to STDOUT. All text sent to STDOUT will be saved as the contents of the 
outgoing FlowFile from this processor.
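
If you go that route, here is a rough sketch of the idea as a tiny standalone 
program that ExecuteStreamCommand could invoke (for example as 
java TelnetCapture host port). Treat it only as a starting point under stated 
assumptions: it handles a plain line-based text service, ignores real telnet 
option negotiation (IAC bytes), and reads the login string from STDIN so you can 
feed it from the FlowFile content:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class TelnetCapture {
    public static void main(String[] args) throws IOException {
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        // The login string arrives on STDIN (i.e. the incoming FlowFile content)
        String login = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8)).readLine();

        try (Socket socket = new Socket(host, port);
             OutputStream out = socket.getOutputStream();
             InputStream in = socket.getInputStream()) {
            socket.setSoTimeout(30_000); // give up after 30 seconds of silence from the server
            out.write((login + "\r\n").getBytes(StandardCharsets.UTF_8));
            out.flush();

            byte[] buffer = new byte[8192];
            int read;
            try {
                while ((read = in.read(buffer)) != -1) {
                    System.out.write(buffer, 0, read); // everything on STDOUT becomes the outgoing FlowFile
                }
            } catch (SocketTimeoutException ignored) {
                // no data for 30 seconds; treat it as the end of the capture
            }
            System.out.flush();
        }
    }
}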

--Peter

From: Luis Carmona 
Sent: Monday, April 22, 2019 4:35 PM
To: users 
Subject: [EXT] Telnet login and data capture

Hi,

has anyone used NiFi to read data from a Telnet connection? What I'm trying to 
do is open the telnet connection, send a login string, and then receive all the 
traffic that the server sends over that connection into NiFi.



Any clues ?


Thanks in advance


RE: [EXT] Re: FlowFile Repository can't checkpoint, out of heap space.

2019-08-15 Thread Peter Wicks (pwicks)
We were able to recover this morning, in the end we deleted the queues that 
were causing trouble from the Flow, and when the problem node came online it 
deleted the FlowFile’s all on its own, since the queue did not exist. Since 
this is done during the FlowFile Repository load into memory, it didn’t run out 
of heap.

But before we got to that point we had maxed out the heap at 500 GB's, all our 
server had to offer. I also tried scripting a cleanup of the journals' overflow 
files, which failed because the journal keeps track of those files and won't 
restore if some are missing. I'm thinking of building some nifi-utility functions 
for doing emergency cleanup of the FlowFile repository, where you can specify a 
Queue ID and it removes those files, or maybe doing an offline compaction.

Thanks,
  Peter


From: Brandon DeVries 
Sent: Thursday, August 15, 2019 9:53 AM
To: users@nifi.apache.org
Subject: [EXT] Re: FlowFile Repository can't checkpoint, out of heap space.


Peter,

Unfortunately, I don't have a perfect solution for your current problem.  I 
would try starting with autoResume=false, just to try to limit what's going on 
in the system.  If possible, you can also try temporarily giving the JVM more 
heap.

This is, however, the use case that led to the idea of "recovery mode" in the 
new RocksDBFlowFileRepository[1] that should be in nifi 1.10.0 (the 
documentation[2] is attached to the ticket):

"[Recovery mode] limits the number of FlowFiles loaded into the graph at a 
time, while not actually removing any FlowFiles (or content) from the system. 
This allows for the recovery of a system that is encountering OutOfMemory 
errors or similar on startup..."

[1] https://issues.apache.org/jira/browse/NIFI-4775
[2] https://issues.apache.org/jira/secure/attachment/12976954/RocksDBFlowFileRepo.html
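
As a rough sketch of what turning that on might look like in conf/nifi.properties 
once 1.10.0 is out (the property keys below are approximate, so please verify 
them against the documentation in [2]):

# conf/nifi.properties (sketch -- verify key names against the docs in [2])
nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.RocksDBFlowFileRepository
nifi.flowfile.repository.rocksdb.enable.recovery.mode=true
nifi.flowfile.repository.rocksdb.recovery.mode.flowfile.count=5000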

On Wed, Aug 14, 2019 at 12:12 PM Peter Wicks (pwicks) 
mailto:pwi...@micron.com>> wrote:
I have a node in a cluster whose FlowFile repository grew so fast that it 
exceeded the amount of available heap space and now can't checkpoint. Or that 
is my interpretation of the error.

"Cannot update journal file flowfile_repository/journals/.journal because 
this journal  has already encountered a failure when attempting to write to the 
file."
Additionally, on restart, we see NiFi failed to restart because it ran out of 
heap space while doing a SchemaRecordReader.readFieldValue.  Feeling a bit 
stuck on where to go from here.

Based on metrics we collect, we see a large increase in FlowFile's on that node 
right before it crashed, and in linux we see the following:
94G ./journals/overflow-569618072
356G./journals/overflow-569892338

Oh, and a 280 GB checkpoint file

There are a few queues/known FlowFile’s that are probably the problem, and I’m 
OK with dropping them, but there is plenty of other data in there too that I 
don’t want to lose…

Thanks,
  Peter


RE: [EXT] Re: FlowFile Repository can't checkpoint, out of heap space.

2019-08-15 Thread Peter Wicks (pwicks)
 serde.deserializeRecord(dataInputStream, serdeVersion);
}

outStream.close();
dataInputStream.close();

System.out.println(file + " - " + saved + " / " + total);
}

From: Joe Witt 
Sent: Thursday, August 15, 2019 10:58 AM
To: users@nifi.apache.org
Subject: Re: [EXT] Re: FlowFile Repository can't checkpoint, out of heap space.

Peter

All the details you can share on this would be good.  First, we should be 
resilient to any sort of repo corruption in the event of heap issues.  While 
obviously the flow isn't in a good state at that point the saved state should 
be reliable/recoverable.  Second, how the repo/journals got that large itself 
should be evaluated/considered/determined.  A full JIRA/description of the 
situation/logs/known state would be worthy of further resolution.

Thanks

On Thu, Aug 15, 2019 at 12:50 PM Peter Wicks (pwicks) 
mailto:pwi...@micron.com>> wrote:
We were able to recover this morning, in the end we deleted the queues that 
were causing trouble from the Flow, and when the problem node came online it 
deleted the FlowFile’s all on its own, since the queue did not exist. Since 
this is done during the FlowFile Repository load into memory, it didn’t run out 
of heap.

But before we go to that point we maxed out heap, 500GB’s!  All our server had 
to offer. I also tried scripting a cleanup of the journals overflow files. 
Which failed, because the journal keeps track of those files, and won’t restore 
if some are missing.  I’m thinking of building some nifi-utility functions for 
doing emergency cleanup of the FlowFile repository where you can specify a 
Queue ID and it removes those files, or maybe doing an offline compaction.

Thanks,
  Peter


From: Brandon DeVries mailto:b...@jhu.edu>>
Sent: Thursday, August 15, 2019 9:53 AM
To: users@nifi.apache.org<mailto:users@nifi.apache.org>
Subject: [EXT] Re: FlowFile Repository can't checkpoint, out of heap space.


Peter,

Unfortunately, I don't have a perfect solution for your current problem.  I 
would try starting with autoResume=false, just to try to limit what's going on 
in the system.  If possible, you can also try temporarily giving the JVM more 
heap.

This is, however, the use case that led to the idea of "recovery mode" in the 
new RocksDBFlowFileRepository[1] that should be in nifi 1.10.0 (the 
documentation[2] is attached to the ticket):

"[Recovery mode] limits the number of FlowFiles loaded into the graph at a 
time, while not actually removing any FlowFiles (or content) from the system. 
This allows for the recovery of a system that is encountering OutOfMemory 
errors or similar on startup..."

[1] https://issues.apache.org/jira/browse/NIFI-4775
[2] https://issues.apache.org/jira/secure/attachment/12976954/RocksDBFlowFileRepo.html

On Wed, Aug 14, 2019 at 12:12 PM Peter Wicks (pwicks) 
mailto:pwi...@micron.com>> wrote:
I have a node in a cluster whose FlowFile repository grew so fast that it 
exceeded the amount of available heap space and now can't checkpoint. Or that 
is my interpretation of the error.

"Cannot update journal file flowfile_repository/journals/.journal because 
this journal  has already encountered a failure when attempting to write to the 
file."
Additionally, on restart, we see NiFi failed to restart because it ran out of 
heap space while doing a SchemaRecordReader.readFieldValue.  Feeling a bit 
stuck on where to go from here.

Based on metrics we collect, we see a large increase in FlowFile's on that node 
right before it crashed, and in linux we see the following:
94G ./journals/overflow-569618072
356G./journals/overflow-569892338

Oh, and a 280 GB checkpoint file

There are a few queues/known FlowFile’s that are probably the problem, and I’m 
OK with dropping them, but there is plenty of other data in there too that I 
don’t want to lose…

Thanks,
  Peter


FlowFile Repository can't checkpoint, out of heap space.

2019-08-14 Thread Peter Wicks (pwicks)
I have a node in a cluster whose FlowFile repository grew so fast that it 
exceeded the amount of available heap space and now can't checkpoint. Or that 
is my interpretation of the error.

"Cannot update journal file flowfile_repository/journals/.journal because 
this journal  has already encountered a failure when attempting to write to the 
file."
Additionally, on restart, we see NiFi failed to restart because it ran out of 
heap space while doing a SchemaRecordReader.readFieldValue.  Feeling a bit 
stuck on where to go from here.

Based on metrics we collect, we see a large increase in FlowFile's on that node 
right before it crashed, and in linux we see the following:
94G ./journals/overflow-569618072
356G./journals/overflow-569892338

Oh, and a 280 GB checkpoint file

There are a few queues/known FlowFile's that are probably the problem, and I'm 
OK with dropping them, but there is plenty of other data in there too that I 
don't want to lose...

Thanks,
  Peter


RE: [EXT] Specifying formatters at a record field level

2019-08-14 Thread Peter Wicks (pwicks)
Not that I'm aware of. We implemented something custom that lets you specify it 
with attributes on the FlowFile (something like data.field.#.format=…), and we do 
the same thing for binary/hex fields. But we didn't contribute it, as it's part 
of a custom record-processing processor that's application specific.
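
Purely to illustrate the convention (the attribute names below are specific to 
our in-house processor, so treat them as hypothetical), the FlowFile attributes 
end up looking something like:

data.field.2.format = yyyy-MM-dd HH:mm:ss
data.field.5.format = MM/dd/yyyy HH:mm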

Thanks,
  Peter

From: Mike Thomsen 
Sent: Wednesday, August 14, 2019 8:35 AM
To: users@nifi.apache.org
Subject: [EXT] Specifying formatters at a record field level

Is there any way to specify a timestamp format string on each field that is a 
TIMESTAMP (long, logical type timestamp-millis)? We have a case where we would 
need at least three, possibly half a dozen, timestamp formats to read a record 
set.

Thanks,

Mike


RE: [EXT] Re: How to replace multi character delimiter with ASCII 001

2019-11-06 Thread Peter Wicks (pwicks)
Shawn,

We had the same issue, and use special and multi character delimiters here. I 
have not been able to find a CSV library that supports multi-character 
delimiters, otherwise I would have updated the CSV Record Reader to support it. 
I created a special Record Reader that supports multi-character delimiters. We 
use this in Convert Record to convert to a different format as soon as possible 
.  I don’t know if your up for using custom code… But just in case you are, 
here is my personal implementation that we use in house.

Thanks,
  Peter

--Class 1--

public class CSVReader extends AbstractControllerService implements RecordReaderFactory {
    static final PropertyDescriptor COLUMN_DELIMITER = new PropertyDescriptor.Builder()
            .name("pt-column-delimiter")
            .displayName("Column Delimiter")
            .description("The character(s) to use to separate columns of data. Special characters like metacharacters should use the '\\u' notation. If not specified, the Ctrl+A delimiter is used.")
            .required(false)
            .defaultValue("\\u0001")
            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
            .build();
    static final PropertyDescriptor RECORD_DELIMITER = new PropertyDescriptor.Builder()
            .name("pt-record-delimiter")
            .displayName("Record Delimiter")
            .description("The character(s) to use to separate rows of data. For a line return press 'Shift+Enter' in this field. Special characters should use the '\\u' notation.")
            .required(false)
            .defaultValue("\n")
            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
            .build();
    static final PropertyDescriptor SKIP_HEADER_ROW = new PropertyDescriptor.Builder()
            .name("pt-skip-header")
            .displayName("Skip First Row")
            .description("Specifies whether or not the first row of data will be skipped.")
            .allowableValues("true", "false")
            .defaultValue("true")
            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
            .build();

    private volatile String colDelimiter;
    private volatile String recDelimiter;
    private volatile boolean skipHeader;

    @OnEnabled
    public void storeCsvFormat(final ConfigurationContext context) {
        // Unescape so users can type sequences like \u0001 or \t in the property values
        this.colDelimiter = StringEscapeUtils.unescapeJava(context.getProperty(COLUMN_DELIMITER).getValue());
        this.recDelimiter = StringEscapeUtils.unescapeJava(context.getProperty(RECORD_DELIMITER).getValue());
        this.skipHeader = context.getProperty(SKIP_HEADER_ROW).asBoolean();
    }

    @Override
    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
        List<PropertyDescriptor> propertyDescriptors = new ArrayList<>();
        propertyDescriptors.add(COLUMN_DELIMITER);
        propertyDescriptors.add(RECORD_DELIMITER);
        propertyDescriptors.add(SKIP_HEADER_ROW);

        return propertyDescriptors;
    }

    @Override
    public RecordReader createRecordReader(Map<String, String> variables, InputStream inputStream, ComponentLog componentLog)
            throws MalformedRecordException, IOException, SchemaNotFoundException {
        return new CSVRecordReader(inputStream, componentLog, this.skipHeader, this.colDelimiter, this.recDelimiter);
    }
}


--- Class 2 ---

public class CSVRecordReader implements RecordReader {
    private final PeekableScanner s;   // PeekableScanner is a small in-house wrapper around Scanner (not included here)
    private final RecordSchema schema;
    private final String colDelimiter;
    private final String recordDelimiter;

    public CSVRecordReader(final InputStream in, final ComponentLog logger, final boolean hasHeader,
                           final String colDelimiter, final String recordDelimiter) throws IOException {
        this.recordDelimiter = recordDelimiter;
        this.colDelimiter = colDelimiter;

        s = new PeekableScanner(new Scanner(in, "UTF-8").useDelimiter(recordDelimiter));

        // Build a basic schema (Column_0, Column_1, ...) from the column count of the first row
        final String forRowCount = s.peek();
        final List<RecordField> fields = new ArrayList<>();

        if (forRowCount != null) {
            final String[] columns = forRowCount.split(colDelimiter, -1);
            for (int nColumnIndex = 0; nColumnIndex < columns.length; nColumnIndex++) {
                fields.add(new RecordField("Column_" + String.valueOf(nColumnIndex), RecordFieldType.STRING.getDataType(), true));
            }

            schema = new SimpleRecordSchema(fields);
        } else {
            schema = null;
        }

        // Skip the header line, if there is one
        if (hasHeader && s.hasNext()) s.next();
    }

    @Override
    public Record nextRecord(boolean b, boolean b1) throws IOException, MalformedRecordException {
        if (!s.hasNext()) return null;

        final String row = s.next();
        final List<RecordField> recordFields = getSchema().getFields();

        final Map<String, Object> values = new LinkedHashMap<>(recordFields.size() * 2);
        final String[] columns = row.split(colDelimiter, -1);

        for (int i = 0; i < columns.length; i++) {
            values.put(recordFields.get(i).getFieldName(), columns[i]);
        }

        return new MapRecord(schema, values);
    }

    // The archived message is cut off above; the remaining RecordReader methods are
    // reconstructed here as the obvious minimal implementations.
    @Override
    public RecordSchema getSchema() {
        return schema;
    }

    @Override
    public void close() throws IOException {
        s.close(); // assumes the PeekableScanner wrapper exposes close() on the underlying Scanner
    }
}

Re: [EXT] Re: sslcontext certs

2020-10-14 Thread Peter Wicks (pwicks)

I agree Nathan.  I believe the situation I ran into came about due to bad 
planning.  Users started independently hosting services, and it was only later 
that we realized that a centralized service or variables would be a better 
solution.

It would probably be easier to just go the direction you suggested 

From: Nathan Gough 
Reply-To: "users@nifi.apache.org" 
Date: Wednesday, October 14, 2020 at 1:59 PM
To: "users@nifi.apache.org" 
Subject: Re: [EXT] Re: sslcontext certs

Is there a reason each ListenHTTP has a unique SSLContextService if they're all 
using the same certificates?

If it were me, I'd use a single shared SSLContextService, and when I needed to 
update the certificate in the keystore/truststore, I would change it on disk by 
renaming the old file and putting the new file in place with the original name. 
Now NiFi and the context service refers to the updated certificates and no NiFi 
configuration changed. Does this work for you?

Nathan

On Wed, Oct 14, 2020 at 3:29 PM Peter Wicks (pwicks) 
mailto:pwi...@micron.com>> wrote:

I've found this annoying in the past as well. I would not be opposed to an 
additional implementation of the SSLContext that uses the NiFi certs by 
default, though... if it uses the client certificate as well you'd have to make 
it restricted, so as to prevent users from impersonating the servers identity 
when communicating with external services. (A restricted Controller Service?)

--Peter

On 10/14/20, 12:44 PM, "Michael Di Domenico" 
mailto:mdidomeni...@gmail.com>> wrote:

ah, okay that sounds like maybe a step in a good direction, but
doesn't necessarily solve my problem.  What I'm trying to alleviate is
the need to go into nifi to change the certs when they expire.

i'll have to look up parameter contexts, that should at least make it
so there's only one place to make the change.

thanks

On Wed, Oct 14, 2020 at 2:40 PM Joe Witt 
mailto:joe.w...@gmail.com>> wrote:
>
> Michael,
>
> There is not any specific way supported or intended to combine the 
context used by NiFi's own HTTP server with those that would be used by 
processors within the flow.
>
> However, using parameter contexts here is a great way to ensure you have 
only a single place to update for flow internals.  If those values are 
parameterized it should work out nicely.
>
> Thanks
>
> On Wed, Oct 14, 2020 at 11:34 AM Michael Di Domenico 
mailto:mdidomeni...@gmail.com>> wrote:
>>
>> i have a nifi server with several listenhttp modules on different
>> ports.  each one has an sslcontext within it that uses the same certs
>> as the main 443 instance.
>>
>> sadly i changed the cert when expired on the 443 port, but failed to
>> change the sslcontext on the ports.  is there a way to tell the
>> sslcontext on the other ports to just use the same cert that's on the
>> 443 port?
>>
>> what i'm trying to avoid having to do is change the filename in all
>> the contexts to point to the new cert, i'd rather change it in one
>> place and have everything else pick it up
>>
>> using a symlink on the filesystem seemed like one way, but i thought
>> there might be a way to do it in nifi




Re: [EXT] Re: sslcontext certs

2020-10-14 Thread Peter Wicks (pwicks)

I've found this annoying in the past as well. I would not be opposed to an 
additional implementation of the SSLContext that uses the NiFi certs by 
default, though... if it uses the client certificate as well, you'd have to make 
it restricted, so as to prevent users from impersonating the server's identity 
when communicating with external services. (A restricted Controller Service?)

--Peter

On 10/14/20, 12:44 PM, "Michael Di Domenico"  wrote:

ah, okay that sounds like maybe a step in a good direction, but
doesn't necessarily solve my problem.  What I'm trying to alleviate is
the need to go into nifi to change the certs when they expire.

i'll have to look up parameter contexts, that should at least make it
so there's only one place to make the change.

thanks

On Wed, Oct 14, 2020 at 2:40 PM Joe Witt  wrote:
>
> Michael,
>
> There is not any specific way supported or intended to combine the 
context used by NiFi's own HTTP server with those that would be used by 
processors within the flow.
>
> However, using parameter contexts here is a great way to ensure you have 
only a single place to update for flow internals.  If those values are 
parameterized it should work out nicely.
>
> Thanks
>
> On Wed, Oct 14, 2020 at 11:34 AM Michael Di Domenico 
 wrote:
>>
>> i have a nifi server with several listenhttp modules on different
>> ports.  each one has an sslcontext within it that uses the same certs
>> as the main 443 instance.
>>
>> sadly i changed the cert when expired on the 443 port, but failed to
>> change the sslcontext on the ports.  is there a way to tell the
>> sslcontext on the other ports to just use the same cert that's on the
>> 443 port?
>>
>> what i'm trying to avoid having to do is change the filename in all
>> the contexts to point to the new cert, i'd rather change it in one
>> place and have everything else pick it up
>>
>> using a symlink on the filesystem seemed like one way, but i thought
>> there might be a way to do it in nifi




Re: [EXT] AW: ExecuteSQL and Teradata

2021-01-12 Thread Peter Wicks (pwicks)

Can you try setting `FINALIZE_AUTO_CLOSE=ON` in your connection string to 
Teradata? It’s not a best practice, but based on what the docs say, I think it 
might work.
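
Something along these lines in the connection pool's Database Connection URL 
(host and database name are placeholders):

jdbc:teradata://td-host/DATABASE=mydb,LOB_SUPPORT=ON,FINALIZE_AUTO_CLOSE=ON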

From the docs:


FINALIZE_AUTO_CLOSE values are OFF (default) or ON:

·  When set to OFF (the default), the Teradata JDBC Driver provides the JDBC 
4.0 API Specification behavior such that JDBC objects are not closed 
automatically during finalize. The application is responsible for closing or 
freeing JDBC objects.

·  When set to ON, the Teradata JDBC Driver provides the JDBC 3.0 API 
Specification behavior to close JDBC objects during finalize. This will have a 
performance impact on garbage collection, and is not recommended.

Java programming best practice is to avoid finalize methods altogether. If a 
finalize method is used, best practice is to minimize its processing time, and 
to avoid operations that can take a long time, such as network communications. 
The JDBC 3.0 API Specification contradicted these best practices by requiring a 
JDBC Driver to close JDBC objects automatically during garbage collection. The 
JDBC 4.0 API Specification dropped the requirement for automatic closing of 
JDBC objects during garbage collection, so the JDBC 4.0 API Specification is in 
agreement with these best practices.

Garbage collection can be blocked indefinitely when FINALIZE_AUTO_CLOSE is set 
to ON, and the Teradata JDBC Driver does not receive a response from the 
database after sending a message to the database to close the response spool.

This parameter is available for SQL connections beginning with Teradata JDBC 
Driver 14.00.00.08.


From: christian.gump...@ezv.admin.ch 
Date: Monday, January 11, 2021 at 2:05 PM
To: users@nifi.apache.org 
Subject: [EXT] AW: ExecuteSQL and Teradata

Hi Phil,

we are facing exactly the same issue (using NiFi 1.11.2) and I opened a ticket for 
this:
https://issues.apache.org/jira/browse/NIFI-8119#
I already have a patch for that locally and will submit a PR shortly.

Cheers,
Christian

Von: Toivo Adams 
Gesendet: Montag, 11. Januar 2021 21:24
An: users@nifi.apache.org
Betreff: Re: ExecuteSQL and Teradata

Hi,

And you are able to recognize you have received all data?
(My Teradata knowledge is limited, sorry.)
One solution is to create a customized version of ExecuteSQL.
And either close connection (return to pool) or send signal to Teradata.
This is not that hard.

You can also try to set QUERY_TIMEOUT for standard ExecuteSQL,
does Teradata JDBC support this?

BR
Toivo

Kontakt mailto:gravity.nif...@mailnull.com>> 
kirjutas kuupäeval R, 18. detsember 2020 kell 20:24:
Dear all

In the ExecuteSQL processor, I'm facing the following problem:

I want to execute a stored procedure in a Teradata database. This stored
procedure returns LOB data. Since I receive LOB data, I have LOB_SUPPORT on in
the JDBC driver (it's the default anyway). Since LOB data is not stored inline
in the database, Teradata expects a signal from the receiver that all data has
been received (in Teradata lingo that means KeepResp is on).

The problem now is, that NiFi does not send this signal. It keeps the connection
open. After NiFi has 16 connections to Teradata open, Teradata refuses to open
another connection and the following error is thrown:

[Teradata Database] : Response limit exceeded.

There's even a nice explanation about this error here [1].

I have set the maximum number of connections to 8, as is the default in the
controller. But that does not seem to prevent my issue. If I set a max timer for
the connections, it works, but of course I do not know how long I would need to
keep the connections open.

My question now is: How can I tell NiFi to close the connection to the Teradata
database, once it has received all data?

I appreciate all the help. Thanks,
Phil

[1] 
https://teradata-docs.s3.amazonaws.com/doc/connectivity/jdbc/reference/current/jdbcug_chapter_5.html#CHDGCHBB


--
This message was sent from a MailNull