Creating user certificates

2022-05-12 Thread Jean-Sebastien Vachon
Hi again,

So I've been able to set up my NiFi server with client certificates for the 
admin user, set up a few policies, and create some groups and users within the 
NiFi UI...

but how will these new users authenticate to NiFi? I guess I will have to 
generate certificates for each of them, but how do I create them?

Thanks


Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com
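
(For reference, a minimal sketch of one common approach, assuming the NiFi TLS 
Toolkit is used in standalone mode and is run from the directory holding the 
nifi-cert.pem / nifi-key.key CA produced during the initial server setup, so 
the new client certificate chains to the CA the server already trusts; the 
user DN is illustrative:)

# Reuses the existing nifi-cert.pem / nifi-key.key CA in the working
# directory and generates a client certificate for the given DN.
./bin/tls-toolkit.sh standalone -C 'CN=alice, OU=NiFi'

# This writes a PKCS12 keystore named after the DN (e.g. CN=alice_OU=NiFi.p12)
# plus a matching .password file. Import the .p12 into the user's browser,
# then map "CN=alice, OU=NiFi" to the user created in the NiFi UI and grant
# policies as usual.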


Re: Where are my processes?

2022-03-29 Thread Jean-Sebastien Vachon
Hi,

I did check if there was any GC issue going on using VisualVM and didn't see 
anything obvious.
I will try to take a dump and dig for information

Thanks for the suggestion. I really appreciate it.

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com

From: Mark Payne 
Sent: Monday, March 28, 2022 7:56 PM
To: users@nifi.apache.org 
Subject: Re: Where are my processes?

Jean-Sebastien,

I’d recommend grabbing a thread dump (bin/nifi.sh dump dump1.txt) and checking 
it to see what the processors are doing.

I would also grep the logs for “waiting for” to see if it shows anything.

Thanks
-Mark
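
(Concretely, the two checks above might look like this, assuming a default 
layout where the application log lives under logs/ in the NiFi home directory:)

bin/nifi.sh dump dump1.txt                 # write a thread dump to dump1.txt
grep -i "waiting for" logs/nifi-app.log    # look for blocked processor threads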



Sent from my iPhone

On Mar 28, 2022, at 7:27 PM, Jean-Sebastien Vachon  
wrote:


Some additional information as I haven't found what's happening.

The average task time is about 15 seconds. Is there any reason why the 
processes wouldn't show up in top or ps?
I set up a watch updating every 1s and I can see at most a few instances of my 
process.
I ramped up to 35 concurrent tasks and I saw a few more processes but, with an 
average of 15 s each, I was expecting to see way more than a few.

Any thoughts?

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com
____
From: Jean-Sebastien Vachon 
Sent: Friday, March 25, 2022 9:40 AM
To: users@nifi.apache.org 
Subject: Where are my processes?

Hi all,

A strange thing seems to be happening on my server this morning... I can't 
find the processes reported by NiFi.
NiFi shows 15 running processes for one of my processors (ExecuteStreamCommand 
with a Python script) but when I look at the OS, I can see at most 2 of them.
I understand that very short-lived processes will be harder to spot at the OS 
level, but the discrepancy seems too large to make any sense.
If I stop the processor, it takes a good minute for any activity to stop, but I 
can't see anything in the list of processes.

My throughput with that processor is also 4-5 times slower than usual. There 
does not seem to be any issue with GC, and the global setting for maximum thread 
count is set to 600 (I went from 500 to 600 with no effect). There is nothing 
else running in NiFi on this server at the moment.

Any idea? I'm using NiFi 1.13.2

Thanks



Re: Where are my processes?

2022-03-28 Thread Jean-Sebastien Vachon
Some additional information as I haven't found what's happening.

The average task time is about 15 seconds. Is there any reason why the 
processes wouldn't show up in top or ps?
I set up a watch updating every 1s and I can see at most a few instances of my 
process.
I ramped up to 35 concurrent tasks and I saw a few more processes but, with an 
average of 15 s each, I was expecting to see way more than a few.

Any thoughts?

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com
____
From: Jean-Sebastien Vachon 
Sent: Friday, March 25, 2022 9:40 AM
To: users@nifi.apache.org 
Subject: Where are my processes?

Hi all,

A strange thing seems to be happening on my server this morning... I can't 
find the processes reported by NiFi.
NiFi shows 15 running processes for one of my processors (ExecuteStreamCommand 
with a Python script) but when I look at the OS, I can see at most 2 of them.
I understand that very short-lived processes will be harder to spot at the OS 
level, but the discrepancy seems too large to make any sense.
If I stop the processor, it takes a good minute for any activity to stop, but I 
can't see anything in the list of processes.

My throughput with that processor is also 4-5 times slower than usual. There 
does not seem to be any issue with GC, and the global setting for maximum thread 
count is set to 600 (I went from 500 to 600 with no effect). There is nothing 
else running in NiFi on this server at the moment.

Any idea? I'm using NiFi 1.13.2

Thanks



Where are my processes?

2022-03-25 Thread Jean-Sebastien Vachon
Hi all,

A strange thing seems to be happening on my server this morning... I can't 
find the processes reported by NiFi.
NiFi shows 15 running processes for one of my processors (ExecuteStreamCommand 
with a Python script) but when I look at the OS, I can see at most 2 of them.
I understand that very short-lived processes will be harder to spot at the OS 
level, but the discrepancy seems too large to make any sense.
If I stop the processor, it takes a good minute for any activity to stop, but I 
can't see anything in the list of processes.

My throughput with that processor is also 4-5 times slower than usual. There 
does not seem to be any issue with GC, and the global setting for maximum thread 
count is set to 600 (I went from 500 to 600 with no effect). There is nothing 
else running in NiFi on this server at the moment.

Any idea? I'm using NiFi 1.13.2

Thanks



Re: InvokeHTTP vs invalid SSL certificates

2022-03-04 Thread Jean-Sebastien Vachon
Thanks David for the information.

My main issue is that we are doing massive web scraping (over 400k websites and 
growing) and I cannot add each certificate manually.
I can probably automate most of it, but I wanted to see what options were 
available to me.

Thanks again. I will look into this.

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com

From: David Handermann 
Sent: Friday, March 4, 2022 9:16 AM
To: users@nifi.apache.org 
Subject: Re: InvokeHTTP vs invalid SSL certificates

Thanks for raising this question.  The InvokeHTTP processor relies on the 
OkHttp client library, which implements standard TLS handshaking and hostname 
verification as described in their documentation:

https://square.github.io/okhttp/features/https/

There are many things that could make a certificate invalid for a specific 
connection.  If the remote certificate is self-signed, it is possible to 
configure a NiFi SSL Context Service with a trust store that includes the 
self-signed certificate.

If the remote certificate is expired, the remote server must be updated with a 
new certificate.  If the remote certificate does not include a DNS Subject 
Alternative Name (SAN) matching the domain name that InvokeHTTP uses for the 
connection, the best solution is for the remote server to be updated with a new 
certificate containing a matching SAN.

It is possible to configure OkHttp with a custom hostname verifier or trust 
manager that ignores some of these attributes, but this would require custom 
code that overrides the default behavior of InvokeHTTP.  There have been some 
requests in the past for NiFi to implement support for a custom hostname 
verifier, but this approach breaks one of the fundamental aspects of TLS 
communication security.

With that background, the potential solution depends on why InvokeHTTP 
considers the certificate invalid.

Regards,
David Handermann
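
(To illustrate the self-signed case above: a hedged sketch of fetching a remote 
server's certificate and importing it into a trust store that a NiFi SSL 
Context Service could then reference; the host and passwords are illustrative:)

# Grab the server certificate and add it to a JKS trust store
openssl s_client -connect example.com:443 -servername example.com </dev/null \
  | openssl x509 -outform PEM > example.pem
keytool -importcert -noprompt -alias example.com -file example.pem \
  -keystore truststore.jks -storepass changeit
# Point an SSL Context Service at truststore.jks and select it in the
# InvokeHTTP "SSL Context Service" property.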

On Fri, Mar 4, 2022 at 6:59 AM Jean-Sebastien Vachon 
<jsvac...@brizodata.com> wrote:
Hi all,

what is the best way to deal with invalid SSL certificates when trying to open 
a URL using InvokeHTTP?


Thanks

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com


InvokeHTTP vs invalid SSL certificates

2022-03-04 Thread Jean-Sebastien Vachon
Hi all,

what is the best way to deal with invalid SSL certificates when trying to open 
a URL using InvokeHTTP?


Thanks

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com


Re: Running unsecured Nifi in Docker

2022-02-16 Thread Jean-Sebastien Vachon
Ok my bad... it does work after all.

I have no excuse. Thanks for your help

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com

From: Kevin Doran 
Sent: Wednesday, February 16, 2022 10:29 AM
To: users@nifi.apache.org 
Subject: Re: Running unsecured Nifi in Docker

I’ve tried the docker compose yaml config you provided as well as the one I 
sent you, and both are working for me with the latest nifi image. Is there any 
other relevant part of your config that could be causing this (e.g., changes 
since your initial email?) Have you started over from a clean state by running 
`docker compose down`?

On Feb 16, 2022 at 09:45:27, Jean-Sebastien Vachon 
<jsvac...@brizodata.com> wrote:
Here you go... thanks for the fast response.
I've looked at the start script to see what is being done, and I set the 
different environment variables so that execution goes through the proper 
sections of the script.


...
/extensions/nifi-server-nar-1.15.3.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-ui-1.15.3.war}
stack-nifi2-1| 2022-02-16 14:23:46,854 INFO [main] 
o.e.j.a.AnnotationConfiguration Scanning elapsed time=158ms
stack-nifi2-1| 2022-02-16 14:23:46,917 INFO [main] 
o.e.j.s.handler.ContextHandler._nifi_api No Spring WebApplicationInitializer 
types detected on classpath
stack-nifi2-1| 2022-02-16 14:23:46,995 INFO [main] 
o.e.j.s.handler.ContextHandler._nifi_api Initializing Spring root 
WebApplicationContext
stack-nifi2-1| 2022-02-16 14:23:51,275 INFO [main] 
o.a.nifi.properties.NiFiPropertiesLoader Loaded 198 properties from 
/opt/nifi/nifi-current/./conf/nifi.properties
stack-nifi2-1| 2022-02-16 14:23:55,907 INFO [main] 
o.a.n.r.v.FileBasedVariableRegistry Loaded 93 properties from system properties 
and environment variables
stack-nifi2-1| 2022-02-16 14:23:55,908 INFO [main] 
o.a.n.r.v.FileBasedVariableRegistry Loaded a total of 93 properties.  Including 
precedence overrides effective accessible registry key size is 93
stack-nifi2-1| 2022-02-16 14:23:56,178 WARN [main] 
o.a.nifi.security.util.SslContextFactory Some keystore properties are populated 
(, , , null) but not valid
stack-nifi2-1| 2022-02-16 14:23:56,179 ERROR [main] 
o.apache.nifi.controller.FlowController Unable to start the flow controller 
because the TLS configuration was invalid: The keystore properties are not valid
stack-nifi2-1| 2022-02-16 14:23:56,657 ERROR [main] 
o.s.web.context.ContextLoader Context initialization failed
stack-nifi2-1| 
org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
creating bean with name 
'org.springframework.security.config.annotation.web.configuration.WebSecurityConfiguration':
 Unsatisfied dependency expressed through method 
'setFilterChainProxySecurityConfigurer' parameter 1; nested exception is 
org.springframework.beans.factory.BeanExpressionException: Expression parsing 
failed; nested exception is 
org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
creating bean with name 'org.apache.nifi.web.NiFiWebApiSecurityConfiguration': 
Unsatisfied dependency expressed through method 'setJwtAuthenticationProvider' 
parameter 0; nested exception is 
org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
creating bean with name 
'org.apache.nifi.web.security.configuration.JwtAuthenticationSecurityConfiguration':
 Unsatisfied dependency expressed through constructor parameter 3; nested 
exception is org.springframework.beans.factory.BeanCreationException: Error 
creating bean with name 'flowController': FactoryBean threw exception on object 
creation; nested exception is java.lang.IllegalStateException: Flow controller 
TLS configuration is invalid



Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com

From: Kevin Doran <kdo...@apache.org>
Sent: Wednesday, February 16, 2022 9:39 AM
To: users@nifi.apache.org
Subject: Re: Running unsecured Nifi in Docker

There have been some changes recently, and NiFi is now secure by default with a 
self-signed cert I believe. It could be that NIFI_WEB_HTTP_PORT conflicts with 
the expected NIFI_WEB_HTTPS_PORT.

Try this:

  nifi:
    image: apache/nifi:latest
    ports:
      - "8443:8443" # UI
      - "1" # Site-to-Site Input Port
    environment:
      SINGLE_USER_CREDENTIALS_USERNAME: admin
      SINGLE_USER_CREDENTIALS_PASSWORD: some_password
      NIFI_SENSITIVE_PROPS_KEY: some_other_password

If that does not work, can you please share the exact startup error?

On Feb 16, 2022 at 09:28:55, Jean-Sebastien Vachon 
<jsvac...@brizodata.com>

Re: Running unsecured Nifi in Docker

2022-02-16 Thread Jean-Sebastien Vachon
Here you go... thanks for the fast response.
I've looked at the start script to see what is being done, and I set the 
different environment variables so that execution goes through the proper 
sections of the script.


...
/extensions/nifi-server-nar-1.15.3.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-ui-1.15.3.war}
stack-nifi2-1| 2022-02-16 14:23:46,854 INFO [main] 
o.e.j.a.AnnotationConfiguration Scanning elapsed time=158ms
stack-nifi2-1| 2022-02-16 14:23:46,917 INFO [main] 
o.e.j.s.handler.ContextHandler._nifi_api No Spring WebApplicationInitializer 
types detected on classpath
stack-nifi2-1| 2022-02-16 14:23:46,995 INFO [main] 
o.e.j.s.handler.ContextHandler._nifi_api Initializing Spring root 
WebApplicationContext
stack-nifi2-1| 2022-02-16 14:23:51,275 INFO [main] 
o.a.nifi.properties.NiFiPropertiesLoader Loaded 198 properties from 
/opt/nifi/nifi-current/./conf/nifi.properties
stack-nifi2-1| 2022-02-16 14:23:55,907 INFO [main] 
o.a.n.r.v.FileBasedVariableRegistry Loaded 93 properties from system properties 
and environment variables
stack-nifi2-1| 2022-02-16 14:23:55,908 INFO [main] 
o.a.n.r.v.FileBasedVariableRegistry Loaded a total of 93 properties.  Including 
precedence overrides effective accessible registry key size is 93
stack-nifi2-1| 2022-02-16 14:23:56,178 WARN [main] 
o.a.nifi.security.util.SslContextFactory Some keystore properties are populated 
(, , , null) but not valid
stack-nifi2-1| 2022-02-16 14:23:56,179 ERROR [main] 
o.apache.nifi.controller.FlowController Unable to start the flow controller 
because the TLS configuration was invalid: The keystore properties are not valid
stack-nifi2-1| 2022-02-16 14:23:56,657 ERROR [main] 
o.s.web.context.ContextLoader Context initialization failed
stack-nifi2-1| 
org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
creating bean with name 
'org.springframework.security.config.annotation.web.configuration.WebSecurityConfiguration':
 Unsatisfied dependency expressed through method 
'setFilterChainProxySecurityConfigurer' parameter 1; nested exception is 
org.springframework.beans.factory.BeanExpressionException: Expression parsing 
failed; nested exception is 
org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
creating bean with name 'org.apache.nifi.web.NiFiWebApiSecurityConfiguration': 
Unsatisfied dependency expressed through method 'setJwtAuthenticationProvider' 
parameter 0; nested exception is 
org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
creating bean with name 
'org.apache.nifi.web.security.configuration.JwtAuthenticationSecurityConfiguration':
 Unsatisfied dependency expressed through constructor parameter 3; nested 
exception is org.springframework.beans.factory.BeanCreationException: Error 
creating bean with name 'flowController': FactoryBean threw exception on object 
creation; nested exception is java.lang.IllegalStateException: Flow controller 
TLS configuration is invalid



Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com

From: Kevin Doran 
Sent: Wednesday, February 16, 2022 9:39 AM
To: users@nifi.apache.org 
Subject: Re: Running unsecured Nifi in Docker

There have been some changes recently, and NiFi is now secure by default with a 
self-signed cert I believe. It could be that NIFI_WEB_HTTP_PORT conflicts with 
the expected NIFI_WEB_HTTPS_PORT.

Try this:

  nifi:
    image: apache/nifi:latest
    ports:
      - "8443:8443" # UI
      - "1" # Site-to-Site Input Port
    environment:
      SINGLE_USER_CREDENTIALS_USERNAME: admin
      SINGLE_USER_CREDENTIALS_PASSWORD: some_password
      NIFI_SENSITIVE_PROPS_KEY: some_other_password

If that does not work, can you please share the exact startup error?

On Feb 16, 2022 at 09:28:55, Jean-Sebastien Vachon 
<jsvac...@brizodata.com> wrote:
Hi all,

I'm trying to start a simple unsecured NiFi instance in a container as part of 
a larger docker compose stack, and I'm stuck with an error regarding the TLS 
configuration and/or keystore properties. Here is the relevant part of my 
docker-compose file... what am I missing? Please make me feel stupid 

  nifi:
    image: apache/nifi:latest
    # command:
    ports:
      - "8080:8080"
      - "1:1"

    restart: always
    command:
      /bin/bash
    environment:
      NIFI_REMOTE_INPUT_HOST: 0.0.0.0
      NIFI_WEB_HTTP_HOST: 0.0.0.0
      SINGLE_USER_CREDENTIALS_USERNAME: admin
      SINGLE_USER_CREDENTIALS_PASSWORD: some_password
      NIFI_WEB_HTTP_PORT: 8080
      AUTH: none




Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com


Running unsecured Nifi in Docker

2022-02-16 Thread Jean-Sebastien Vachon
Hi all,

I'm trying to start a simple unsecured NiFi instance in a container as part of 
a larger docker compose stack, and I'm stuck with an error regarding the TLS 
configuration and/or keystore properties. Here is the relevant part of my 
docker-compose file... what am I missing? Please make me feel stupid 

  nifi:
    image: apache/nifi:latest
    # command:
    ports:
      - "8080:8080"
      - "1:1"

    restart: always
    command:
      /bin/bash
    environment:
      NIFI_REMOTE_INPUT_HOST: 0.0.0.0
      NIFI_WEB_HTTP_HOST: 0.0.0.0
      SINGLE_USER_CREDENTIALS_USERNAME: admin
      SINGLE_USER_CREDENTIALS_PASSWORD: some_password
      NIFI_WEB_HTTP_PORT: 8080
      AUTH: none




Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com


Passing single and double-quotes as part of parameters

2021-12-21 Thread Jean-Sebastien Vachon
Hi all,

I'm trying to configure an ExecuteStreamCommand processor that will simply 
receive a JSON array such as ["1 2", "3 4"]
and merge everything into a single line.

I can do it easily with jq from the command line

echo '["1 2", "4 5"]' | jq 'join(" ")'
"1 2 4 5"

I am running into issues when specifying the parameters. NiFi is complaining 
about "Unix shell encoding issues"... I have tried escaping with \ and doubling 
the quotes, but nothing seems to work.

I checked the documentation and couldn't find any reference to escaping, 
except for variables.

Any idea?

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com
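
(One hedged workaround, assuming jq is available on the NiFi host: keep the 
filter out of the argument list entirely by loading it from a file with jq's 
-f option, so no quotes have to survive the argument parsing; the file path is 
illustrative:)

# join.jq contains the single line:  join(" ")
echo '["1 2", "3 4"]' | jq -f join.jq

# In ExecuteStreamCommand this would be Command Path = jq and
# Command Arguments = -f;/path/to/join.jq (";" being the default
# Argument Delimiter), avoiding embedded quotes altogether.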


Re: Trouble starting docker container

2021-10-18 Thread Jean-Sebastien Vachon
I fixed my problem... it was related to this

https://stackoverflow.com/questions/69081508/nifi-migration-required-for-blank-sensitive-properties-key

Once I restored the value of nifi.sensitive.props.key, everything is fine
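
(For reference: NiFi 1.14+ ships a helper for exactly this migration; a 
minimal sketch, assuming NiFi is stopped and the command is run from the NiFi 
home directory, with the key value illustrative:)

# Encrypts the sensitive values in flow.xml.gz under the given key
# (12+ characters) and records it as nifi.sensitive.props.key
./bin/nifi.sh set-sensitive-properties-key aKeyOfTwelveOrMoreChars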



Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com
____
From: Jean-Sebastien Vachon 
Sent: Monday, October 18, 2021 10:54 AM
To: users@nifi.apache.org 
Subject: Re: Trouble starting docker container

I was able to recover the flow.xml.gz file from the container using this 
command:

sudo docker cp nifi:/opt/nifi/nifi-current/conf/flow.xml.gz .

But I can't seem to use it in another container by mounting it as a volume.

A new NiFi 1.14-based container gives me roughly the same error:

2021-10-18 14:43:15,660 ERROR [main] o.a.nifi.properties.NiFiPropertiesLoader 
Flow Configuration [./conf/flow.xml.gz] Found: Migration Required for blank 
Sensitive Properties Key [nifi.sensitive.props.key]
2021-10-18 14:43:15,662 ERROR [main] org.apache.nifi.NiFi Failure to launch 
NiFi due to java.lang.IllegalArgumentException: There was an issue decrypting 
protected properties
java.lang.IllegalArgumentException: There was an issue decrypting protected 
properties
at org.apache.nifi.NiFi.initializeProperties(NiFi.java:346)
at 
org.apache.nifi.NiFi.convertArgumentsToValidatedNiFiProperties(NiFi.java:314)
at 
org.apache.nifi.NiFi.convertArgumentsToValidatedNiFiProperties(NiFi.java:310)
at org.apache.nifi.NiFi.main(NiFi.java:302)
Caused by: org.apache.nifi.properties.SensitivePropertyProtectionException: 
Sensitive Properties Key [nifi.sensitive.props.key] not found: See Admin Guide 
section [Updating the Sensitive Properties Key]
at 
org.apache.nifi.properties.NiFiPropertiesLoader.getDefaultProperties(NiFiPropertiesLoader.java:226)
at 
org.apache.nifi.properties.NiFiPropertiesLoader.get(NiFiPropertiesLoader.java:209)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.nifi.NiFi.initializeProperties(NiFi.java:341)
... 3 common frames omitted

I also tried using the flow file with a previous version of NiFi... NiFi tries 
to start but then complains about this:

2021-10-18 14:53:09,727 INFO [main] org.eclipse.jetty.server.Server Started 
@27643ms
2021-10-18 14:53:09,727 WARN [main] org.apache.nifi.web.server.JettyServer 
Failed to start web server... shutting down.
...
org.apache.nifi.web.NiFiCoreException: Unable to start Flow Controller.
Caused by: java.nio.file.FileSystemException: ./conf/flow.xml.gz: Device or 
resource busy


Is there any way to move a flow.xml.gz from one machine to another?


Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com
________
From: Jean-Sebastien Vachon 
Sent: Monday, October 18, 2021 9:24 AM
To: users@nifi.apache.org 
Subject: Trouble starting docker container

Hi all,

I used an instance of NiFi 1.14 last Friday (single user with password) and 
everything was fine until this morning.
My PC was rebooted over the weekend and now I can't restart the container at 
all.


Java home: /usr/local/openjdk-8
NiFi home: /opt/nifi/nifi-current

Bootstrap Config File: /opt/nifi/nifi-current/conf/bootstrap.conf

2021-10-18 13:17:22,856 INFO [main] org.apache.nifi.bootstrap.Command Starting 
Apache NiFi...
2021-10-18 13:17:22,856 INFO [main] org.apache.nifi.bootstrap.Command Working 
Directory: /opt/nifi/nifi-current
2021-10-18 13:17:22,856 INFO [main] org.apache.nifi.bootstrap.Command Command: 
/usr/local/openjdk-8/bin/java -classpath 
/opt/nifi/nifi-current/./conf:/opt/nifi/nifi-current/./lib/logback-core-1.2.3.jar:/opt/nifi/nifi-current/./lib/nifi-framework-api-1.14.0.jar:/opt/nifi/nifi-current/./lib/nifi-runtime-1.14.0.jar:/opt/nifi/nifi-current/./lib/javax.servlet-api-3.1.0.jar:/opt/nifi/nifi-current/./lib/log4j-over-slf4j-1.7.30.jar:/

Re: Trouble starting docker container

2021-10-18 Thread Jean-Sebastien Vachon
I was able to recover the flow.xml.gz file from the container using this 
command:

sudo docker cp nifi:/opt/nifi/nifi-current/conf/flow.xml.gz .

But I can't seem to use it in another container by mounting it as a volume.

A new NiFi 1.14-based container gives me roughly the same error:

2021-10-18 14:43:15,660 ERROR [main] o.a.nifi.properties.NiFiPropertiesLoader 
Flow Configuration [./conf/flow.xml.gz] Found: Migration Required for blank 
Sensitive Properties Key [nifi.sensitive.props.key]
2021-10-18 14:43:15,662 ERROR [main] org.apache.nifi.NiFi Failure to launch 
NiFi due to java.lang.IllegalArgumentException: There was an issue decrypting 
protected properties
java.lang.IllegalArgumentException: There was an issue decrypting protected 
properties
at org.apache.nifi.NiFi.initializeProperties(NiFi.java:346)
at 
org.apache.nifi.NiFi.convertArgumentsToValidatedNiFiProperties(NiFi.java:314)
at 
org.apache.nifi.NiFi.convertArgumentsToValidatedNiFiProperties(NiFi.java:310)
at org.apache.nifi.NiFi.main(NiFi.java:302)
Caused by: org.apache.nifi.properties.SensitivePropertyProtectionException: 
Sensitive Properties Key [nifi.sensitive.props.key] not found: See Admin Guide 
section [Updating the Sensitive Properties Key]
at 
org.apache.nifi.properties.NiFiPropertiesLoader.getDefaultProperties(NiFiPropertiesLoader.java:226)
at 
org.apache.nifi.properties.NiFiPropertiesLoader.get(NiFiPropertiesLoader.java:209)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.nifi.NiFi.initializeProperties(NiFi.java:341)
... 3 common frames omitted

I also tried using the flow file with a previous version of NiFi... NiFi tries 
to start but then complains about this:

2021-10-18 14:53:09,727 INFO [main] org.eclipse.jetty.server.Server Started 
@27643ms
2021-10-18 14:53:09,727 WARN [main] org.apache.nifi.web.server.JettyServer 
Failed to start web server... shutting down.
...
org.apache.nifi.web.NiFiCoreException: Unable to start Flow Controller.
Caused by: java.nio.file.FileSystemException: ./conf/flow.xml.gz: Device or 
resource busy


Is there any way to move a flow.xml.gz from one machine to another?


Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com
____
From: Jean-Sebastien Vachon 
Sent: Monday, October 18, 2021 9:24 AM
To: users@nifi.apache.org 
Subject: Trouble starting docker container

Hi all,

I used an instance of NiFi 1.14 last Friday (single user with password) and 
everything was fine until this morning.
My PC was rebooted over the weekend and now I can't restart the container at 
all.


Java home: /usr/local/openjdk-8
NiFi home: /opt/nifi/nifi-current

Bootstrap Config File: /opt/nifi/nifi-current/conf/bootstrap.conf

2021-10-18 13:17:22,856 INFO [main] org.apache.nifi.bootstrap.Command Starting 
Apache NiFi...
2021-10-18 13:17:22,856 INFO [main] org.apache.nifi.bootstrap.Command Working 
Directory: /opt/nifi/nifi-current
2021-10-18 13:17:22,856 INFO [main] org.apache.nifi.bootstrap.Command Command: 
/usr/local/openjdk-8/bin/java -classpath 
/opt/nifi/nifi-current/./conf:/opt/nifi/nifi-current/./lib/logback-core-1.2.3.jar:/opt/nifi/nifi-current/./lib/nifi-framework-api-1.14.0.jar:/opt/nifi/nifi-current/./lib/nifi-runtime-1.14.0.jar:/opt/nifi/nifi-current/./lib/javax.servlet-api-3.1.0.jar:/opt/nifi/nifi-current/./lib/log4j-over-slf4j-1.7.30.jar:/opt/nifi/nifi-current/./lib/nifi-property-utils-1.14.0.jar:/opt/nifi/nifi-current/./lib/nifi-api-1.14.0.jar:/opt/nifi/nifi-current/./lib/jul-to-slf4j-1.7.30.jar:/opt/nifi/nifi-current/./lib/nifi-nar-utils-1.14.0.jar:/opt/nifi/nifi-current/./lib/jcl-over-slf4j-1.7.30.jar:/opt/nifi/nifi-current/./lib/nifi-stateless-bootstrap-1.14.0.jar:/opt/nifi/nifi-current/./lib/logback-classic-1.2.3.jar:/opt/nifi/nifi-current/./lib/nifi-server-api-1.14.0.jar:/opt/nifi/nifi-current/./lib/jetty-schemas-3.1.jar:/opt/nifi/nifi-current/./lib/slf4j-api-1.7.30.jar:/opt/nifi/nifi-current/./lib/nifi-stateless-api-1.14.0.jar:/opt/nifi/nifi-current/./lib/nifi-properties-1.14.0.jar
 -Dorg.apache.jasper.compiler.disablejsr199=true -Xmx512m -Xms512m 
-Dcurator-log-only-first-connection-issue-as-error-level=true 
-Djavax.security.auth.useSubjectCredsOnly=true 
-Djava.security.egd=file:/dev/urandom -Dzookeeper.admin.enableServer=false 
-Dsun.net.http.allowRestrictedHeaders=true -Djava.net.preferIPv4Stack=true 
-Djava.awt.headless=true -Djava.protocol.handler.pkgs=sun.net.www.protocol 
-Dnifi.properties.file.path=/opt/nifi/nifi-current/./conf/nifi.properties 
-Dnifi.bootstrap.listen.port=40717 

Trouble starting docker container

2021-10-18 Thread Jean-Sebastien Vachon
Hi all,

I used an instance of NiFi 1.14 last Friday (single user with password) and 
everything was fine until this morning.
My PC was rebooted over the weekend and now I can't restart the container at 
all.


Java home: /usr/local/openjdk-8
NiFi home: /opt/nifi/nifi-current

Bootstrap Config File: /opt/nifi/nifi-current/conf/bootstrap.conf

2021-10-18 13:17:22,856 INFO [main] org.apache.nifi.bootstrap.Command Starting 
Apache NiFi...
2021-10-18 13:17:22,856 INFO [main] org.apache.nifi.bootstrap.Command Working 
Directory: /opt/nifi/nifi-current
2021-10-18 13:17:22,856 INFO [main] org.apache.nifi.bootstrap.Command Command: 
/usr/local/openjdk-8/bin/java -classpath 
/opt/nifi/nifi-current/./conf:/opt/nifi/nifi-current/./lib/logback-core-1.2.3.jar:/opt/nifi/nifi-current/./lib/nifi-framework-api-1.14.0.jar:/opt/nifi/nifi-current/./lib/nifi-runtime-1.14.0.jar:/opt/nifi/nifi-current/./lib/javax.servlet-api-3.1.0.jar:/opt/nifi/nifi-current/./lib/log4j-over-slf4j-1.7.30.jar:/opt/nifi/nifi-current/./lib/nifi-property-utils-1.14.0.jar:/opt/nifi/nifi-current/./lib/nifi-api-1.14.0.jar:/opt/nifi/nifi-current/./lib/jul-to-slf4j-1.7.30.jar:/opt/nifi/nifi-current/./lib/nifi-nar-utils-1.14.0.jar:/opt/nifi/nifi-current/./lib/jcl-over-slf4j-1.7.30.jar:/opt/nifi/nifi-current/./lib/nifi-stateless-bootstrap-1.14.0.jar:/opt/nifi/nifi-current/./lib/logback-classic-1.2.3.jar:/opt/nifi/nifi-current/./lib/nifi-server-api-1.14.0.jar:/opt/nifi/nifi-current/./lib/jetty-schemas-3.1.jar:/opt/nifi/nifi-current/./lib/slf4j-api-1.7.30.jar:/opt/nifi/nifi-current/./lib/nifi-stateless-api-1.14.0.jar:/opt/nifi/nifi-current/./lib/nifi-properties-1.14.0.jar
 -Dorg.apache.jasper.compiler.disablejsr199=true -Xmx512m -Xms512m 
-Dcurator-log-only-first-connection-issue-as-error-level=true 
-Djavax.security.auth.useSubjectCredsOnly=true 
-Djava.security.egd=file:/dev/urandom -Dzookeeper.admin.enableServer=false 
-Dsun.net.http.allowRestrictedHeaders=true -Djava.net.preferIPv4Stack=true 
-Djava.awt.headless=true -Djava.protocol.handler.pkgs=sun.net.www.protocol 
-Dnifi.properties.file.path=/opt/nifi/nifi-current/./conf/nifi.properties 
-Dnifi.bootstrap.listen.port=40717 -Dapp=NiFi 
-Dorg.apache.nifi.bootstrap.config.log.dir=/opt/nifi/nifi-current/logs 
org.apache.nifi.NiFi
2021-10-18 13:17:22,879 INFO [main] org.apache.nifi.bootstrap.Command Launched 
Apache NiFi with Process ID 95
2021-10-18 13:17:23,095 INFO [main] org.apache.nifi.NiFi Launching NiFi...
2021-10-18 13:17:23,356 INFO [main] o.a.n.p.AbstractBootstrapPropertiesLoader 
Determined default application properties path to be 
'/opt/nifi/nifi-current/./conf/nifi.properties'
2021-10-18 13:17:23,361 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader 
Loaded 202 properties from /opt/nifi/nifi-current/./conf/nifi.properties
2021-10-18 13:17:23,368 ERROR [main] o.a.nifi.properties.NiFiPropertiesLoader 
Flow Configuration [./conf/flow.xml.gz] Found: Migration Required for blank 
Sensitive Properties Key [nifi.sensitive.props.key]
2021-10-18 13:17:23,369 ERROR [main] org.apache.nifi.NiFi Failure to launch 
NiFi due to java.lang.IllegalArgumentException: There was an issue decrypting 
protected properties
java.lang.IllegalArgumentException: There was an issue decrypting protected 
properties
at org.apache.nifi.NiFi.initializeProperties(NiFi.java:346)
at 
org.apache.nifi.NiFi.convertArgumentsToValidatedNiFiProperties(NiFi.java:314)
at 
org.apache.nifi.NiFi.convertArgumentsToValidatedNiFiProperties(NiFi.java:310)
at org.apache.nifi.NiFi.main(NiFi.java:302)
Caused by: org.apache.nifi.properties.SensitivePropertyProtectionException: 
Sensitive Properties Key [nifi.sensitive.props.key] not found: See Admin Guide 
section [Updating the Sensitive Properties Key]
at 
org.apache.nifi.properties.NiFiPropertiesLoader.getDefaultProperties(NiFiPropertiesLoader.java:226)
at 
org.apache.nifi.properties.NiFiPropertiesLoader.get(NiFiPropertiesLoader.java:209)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.nifi.NiFi.initializeProperties(NiFi.java:341)
... 3 common frames omitted

Does anybody understand what's going on?

Is there anything I can do to recover my flow file?

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com


Nifi 1.14 vs ActiveDirectory authentication

2021-09-22 Thread Jean-Sebastien Vachon
Hi all,

Is it possible to use our Office 365 Active Directory as an identity provider 
for NiFi 1.14+?
We are a small team right now, but we're expecting to grow considerably in the 
coming months, so I'm looking for a long-term solution
to control who can access our NiFi clusters.

Thanks

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com
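
(One hedged option: NiFi supports OpenID Connect, and Azure AD exposes an OIDC 
endpoint, so the relevant nifi.properties entries would look roughly like the 
following, with the tenant ID, client ID, and secret being placeholders for 
values from an app registration in Azure AD:)

nifi.security.user.oidc.discovery.url=https://login.microsoftonline.com/<tenant-id>/v2.0/.well-known/openid-configuration
nifi.security.user.oidc.client.id=<application-client-id>
nifi.security.user.oidc.client.secret=<client-secret>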


Re: UI is not as responsive...

2021-09-08 Thread Jean-Sebastien Vachon
Thanks. I'll give Firefox a try and see how it goes

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com

From: Lucas 4MDG 
Sent: Wednesday, September 8, 2021 8:49 AM
To: users@nifi.apache.org 
Subject: Re: UI is not as responsive...

Hi!

I found that Firefox performs much better with the NiFi UI than Google 
Chrome.

Best Regards,


On 7 Sep 2021, at 09:59, Matt Gilman 
<matt.c.gil...@gmail.com> wrote:

Josef,

There was a regression in Chrome 92.x that affects SVG heavy web applications 
like NiFi. Here is the Chrome issues tracking this [1]. And here is a Chrome 
Help thread discussing the matter [2].

[1] https://bugs.chromium.org/p/chromium/issues/detail?id=1235045
[2] 
https://support.google.com/chrome/thread/118284571/any-one-suffers-from-the-newest-ver-92-rendering-some-heavy-svg-jobs?hl=en

On Mon, Sep 6, 2021 at 10:12 AM 
<josef.zahn...@swisscom.com> wrote:

Hi guys



We can confirm the slow browser behavior as well and it’s very annoying. We 
have single-node NiFis as well as multiple NiFi clusters of different sizes. It 
happens everywhere and is definitely browser-specific. We’ve also tried to 
restart NiFi, but no change at all. It is so slow that, in a 2-node cluster and 
a PG with 300 processors, it sometimes takes longer than the browser timeout 
just to ENTER the PG.



It happens with Chrome 92.x as well as with Edge 93.x (both based on 
Chromium?). Firefox is way faster -> we switched over to Firefox. We don’t 
know exactly when the issue started, but we have only slightly modified our 
workflows in the last 2-3 months, and we have been using NiFi 1.13.2 and the 
same Java version for several months. We have been working with NiFi since 
1.4.x, so we are not new to NiFi.



We see that memory goes up fast when we try to open a PG with Chrome, but we 
don’t know what’s normal.



To answer Marks questions:

  1.  It’s faster when we zoom in/out to the point where NiFi stops rendering 
the stats.
  2.  A GUI refresh for a single NiFi PG with 30 processors takes 2-3 s while 
the logs show at most 40-50 millis for the GET.
  3.  The network is fast as hell, no change there. As Firefox is way faster 
than Chrome/Edge I don’t think it’s a connectivity issue.
- NiFi 1.13.2
- Java 1.8.0_282
- In “bigger” PGs with 300 processors it takes more than 10 s to open the flow. 
Most of the time the browser window crashes due to the long timeout.



Hope this helps.



Could it be that Pierre referred to this issue/improvement for NiFi 1.15.0?

https://issues.apache.org/jira/browse/NIFI-9061



For us this is a major issue, but as we have a working alternative (Firefox) we 
haven’t raised a Jira ticket yet.



Cheers Josef





From: Mark Payne <marka...@hotmail.com>
Reply to: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Saturday, 4 September 2021 at 15:55
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: UI is not as responsive...



OK so there are really 3 parts to consider here in order to understand what is 
making things sluggish:



- Front-end rendering

- Backend processing

- Network between your browser and the back end



So a few things to consider here:



- If you’re seeing the sluggishness in a Process Group with only a few 
elements, that leads me to believe it’s probably NOT the browser rendering 
that’s the issue. But another thing to check, to help verify: zoom out using 
your mouse wheel to the point where NiFi no longer renders the stats on the 
processors. Once you reach this level of zoom, the rendering is much cheaper. 
Do you still see the same lag, or is the lag less at this point?



- To understand how long the backend is taking to process the request, you can 
add the following to your conf/logback.xml file:





   This will cause nifi to log in the nifi-app.log file something like:

GET /flow/1234 from localhost duration for Request ID 4567: 102 millis



   So watch the logs here. Are you seeing that the request times in the logs are 
consistently very short while the UI takes a long time to render the response?



- Do you have any idea what kind of latency and throughput you expect between 
the machine running the browser and the machine running nifi?



Also, a few other things to understand:

- What version of NiFi are you running?

- What version of Java?

- When you say the UI is not as responsive, what kind of delay are you seeing? 
1 second to refresh the UI? 10 seconds?



Thanks

-Mark







On Sep 3, 2021, at 1:42 PM, Jean-Sebastien Vachon 
<jsvac...@brizodata.com> wrote:



Hi Mark,



Thanks for the quick response. I am running a single stand-alone NiFi instance 
(1.13.2).

I tried with a smaller group (1 input port and 8 processors)

Re: UI is not as responsive...

2021-09-03 Thread Jean-Sebastien Vachon
Hi Mark,

Thanks for the quick response. I am running a single stand-alone NiFi instance 
(1.13.2).
I tried with a smaller group (1 input port and 8 processors), and I still 
experience slowdowns.

I've looked at the timing of the backend calls and everything seems in order.

I am using Edge but some of my colleagues are using Firefox/Chrome and 
experienced the same.

One of the flows we are dealing with is relatively complex and involves about 
50 processors.
I will try to split it into smaller groups and see how it goes.

Thanks

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com

From: Mark Payne 
Sent: Friday, September 3, 2021 1:19 PM
To: users@nifi.apache.org 
Subject: Re: UI is not as responsive...

Jean-Sebastien,

Are you running a cluster or a single, stand-alone nifi instance? The slowness 
could be either on the backend (performing the action and formulating the 
response to the UI) or on the UI end, where it has to render everything.

One thing you can do to help understand which is causing the slowness is to 
create a new, empty process group and then step into it. Is the UI still 
sluggish when you’re in that process group, or is the UI faster there? Also, 
which browser are you using?

Thanks
-Mark

Sent from my iPhone

On Sep 3, 2021, at 1:03 PM, Jean-Sebastien Vachon  
wrote:


Hi all,

The UI has been slowing down considerably over the last few days/weeks. I tried 
restarting NiFi, but it does not really make any difference.
I tuned the JVM and there is no sign of heavy GC going on.

What other things should I investigate? There is currently around 4.2 MB of 
data in all my flows... so not much going on, and it is still slow.

My server has 128 CPUs and 512 GB of RAM, of which 15 GB are allocated to NiFi.
I do have other processes running, but nothing that would cause any slowdown.
The load on the server is around 25 and it is 98.5% idle. There is nothing going 
on regarding storage either.

Thanks

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com


UI is not as responsive...

2021-09-03 Thread Jean-Sebastien Vachon
Hi all,

The UI has been slowing down considerably over the last few days/weeks. I tried 
restarting NiFi, but it does not really make any difference.
I tuned the JVM and there is no sign of heavy GC going on.

What other things should I investigate? There is currently around 4.2 MB of 
data in all my flows... so not much going on, and it is still slow.

My server has 128 CPUs and 512 GB of RAM, of which 15 GB are allocated to NiFi.
I do have other processes running, but nothing that would cause any slowdown.
The load on the server is around 25 and it is 98.5% idle. There is nothing going 
on regarding storage either.

Thanks

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com


Re: Questions about the GetFile processor

2021-02-26 Thread Jean-Sebastien Vachon
Thanks for the hint

Get Outlook for Android


From: Joe Witt 
Sent: Friday, February 26, 2021 10:13:20 PM
To: users@nifi.apache.org 
Subject: Re: Questions about the GetFile processor

Hello

Yeah when there are a ton (50k or more) of files in a directory performance is 
*horrible*.   If you can put them into some subdirs to divide it up then it 
will go a lot faster.

Thanks
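
(A hedged sketch of the subdirectory split Joe describes, with the path and the 
two-character bucketing scheme being illustrative:)

# Move a flat directory of JSON files into subdirs keyed on the first two
# characters of each filename, so no directory holds millions of entries.
cd /data/json
find . -maxdepth 1 -name '*.json' | while read -r f; do
  f=${f#./}                                  # strip the leading ./
  mkdir -p "${f:0:2}" && mv "$f" "${f:0:2}/"
done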

On Fri, Feb 26, 2021 at 7:30 PM Jean-Sebastien Vachon 
<jsvac...@brizodata.com> wrote:
Hi again,

I need to reprocess all my files after we discovered a problem. My folder 
contains 3,906,135 JSON files (590 GB total size).
I tried the ListFile strategy, and it works fine on a small subset, but on the 
whole dataset not a single flow file was queued after many hours of waiting.

Is it normal that it takes so long to do anything?

I am using the following settings:

  Tracking Timestamps,
  no recursion,
  file filter set to the default ([^\.].*),
  minimum size 0 B and minimum age 0 s,
  track performance off,
  max number of files set to 5,000,000,
  max disk operation time 10 s,
  max directory listing time 3 hours

Am I doing something wrong? My server is quite capable, with 512 GB of RAM and 
128 cores.

Thanks

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com
____
From: Jean-Sebastien Vachon <jsvac...@brizodata.com>
Sent: Thursday, February 18, 2021 8:59 AM

To: users@nifi.apache.org
Subject: Re: Questions about the GetFile processor

OK thanks

I missed that part of the documentation. Stupid me

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com

From: Arpad Boda <ab...@apache.org>
Sent: Thursday, February 18, 2021 8:46 AM
To: users@nifi.apache.org
Subject: Re: Questions about the GetFile processor

GetFile has no persistence.
Actually it has, but it's called your hard drive. :)

If you take a look at the documentation:
Keep Source File - "If true, the file is not deleted after it has been copied 
to the Content Repository; this causes the file to be picked up continually and 
is useful for testing purposes. If not keeping original NiFi will need write 
permissions on the directory it is pulling from otherwise it will ignore the 
file."

You can see that it's going to get the same files over and over again unless 
you configure it to delete the already processed ones.

The reason I suggested the combination above is that listfile can be triggered 
once, the metadata (filenames) are stored in your queue and fetchfile can 
process them later.

On Thu, Feb 18, 2021 at 2:39 PM Jean-Sebastien Vachon 
<jsvac...@brizodata.com> wrote:
OK, I understand your point... sorry (early morning) 

I am kind of stuck with the GetFile processor for now. Is there a way to know 
how many files are left to process?

Will it go on forever, or will it stop streaming once all files have been 
processed? (There are no new files in the folder... everything was there at the 
beginning.)

Thanks

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com

From: Jean-Sebastien Vachon <jsvac...@brizodata.com>
Sent: Thursday, February 18, 2021 8:34 AM
To: users@nifi.apache.org
Subject: Re: Questions about the GetFile processor

Thanks for your comment. However, I can't queue everything, as the total size of 
the data is around 560 GB.
Right now, I am using a GetFile processor and it has been running for a few 
days. If I look at my endpoint, it looks like it should be done pretty soon, 
but data is still
streaming in at the same rate, so I was wondering if the processor remembers 
every single file it has already processed, or if it is simply going through all 
the files alphabetically or in whatever order it decides.

Thanks

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com

From: Arpad Boda <ab...@apache.org>
Sent: Thursday, February 18, 2021 8:29 AM
To: users@nifi.apache.org
Subject: Re: Questions about the GetFile processor

You can use the combination of listfile and fetchfile.
In the queue between the two you are going to see the number of (flow)files 
left to be processed.

Re: Questions about the GetFile processor

2021-02-26 Thread Jean-Sebastien Vachon
Hi again,

I need to reprocess all my files after we discovered a problem. My folder 
contains 3,906,135 JSON files (590 GB total size).
I tried the ListFile strategy, and it works fine on a small subset, but on the 
whole dataset not a single flow file was queued after many hours of waiting.

Is it normal that it takes so long to do anything?

I am using the following settings:

  Tracking Timestamps,
  no recursion,
  file filter set to the default ([^\.].*),
  minimum size 0 B and minimum age 0 s,
  track performance off,
  max number of files set to 5,000,000,
  max disk operation time 10 s,
  max directory listing time 3 hours

Am I doing something wrong? My server is quite capable, with 512 GB of RAM and 
128 cores.

Thanks

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com
____
From: Jean-Sebastien Vachon 
Sent: Thursday, February 18, 2021 8:59 AM
To: users@nifi.apache.org 
Subject: Re: Questions about the GetFile processor

OK thanks

I missed that part of the documentation. Stupid me

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com

From: Arpad Boda 
Sent: Thursday, February 18, 2021 8:46 AM
To: users@nifi.apache.org 
Subject: Re: Questions about the GetFile processor

GetFile has no persistence.
Actually it has, but it's called your hard drive. :)

If you take a look at the documentation:
Keep Source File - "If true, the file is not deleted after it has been copied 
to the Content Repository; this causes the file to be picked up continually and 
is useful for testing purposes. If not keeping original NiFi will need write 
permissions on the directory it is pulling from otherwise it will ignore the 
file."

You can see that it's going to get the same files over and over again unless 
you configure it to delete the already processed ones.

The reason I suggested the combination above is that listfile can be triggered 
once, the metadata (filenames) are stored in your queue and fetchfile can 
process them later.

On Thu, Feb 18, 2021 at 2:39 PM Jean-Sebastien Vachon 
<jsvac...@brizodata.com> wrote:
OK, I understand your point... sorry (early morning) 

I am kind of stuck with the GetFile processor for now. Is there a way to know 
how many files are left to process?

Will it go on forever, or will it stop streaming once all files have been 
processed? (There are no new files in the folder... everything was there at the 
beginning.)

Thanks

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com
________
From: Jean-Sebastien Vachon <jsvac...@brizodata.com>
Sent: Thursday, February 18, 2021 8:34 AM
To: users@nifi.apache.org
Subject: Re: Questions about the GetFile processor

Thanks for your comment. However, I can't queue everything, as the total size of 
the data is around 560 GB.
Right now, I am using a GetFile processor and it has been running for a few 
days. If I look at my endpoint, it looks like it should be done pretty soon, 
but data is still
streaming in at the same rate, so I was wondering if the processor remembers 
every single file it has already processed, or if it is simply going through all 
the files alphabetically or in whatever order it decides.

Thanks

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com

From: Arpad Boda <ab...@apache.org>
Sent: Thursday, February 18, 2021 8:29 AM
To: users@nifi.apache.org
Subject: Re: Questions about the GetFile processor

You can use the combination of listfile and fetchfile.
In the queue between the two you are going to see the number of (flow)files 
left to be processed.

On Thu, Feb 18, 2021 at 2:14 PM Jean-Sebastien Vachon 
<jsvac...@brizodata.com> wrote:
Hi all,

If I configure a GetFile processor to list all JSON files under a given folder, 
will it stop sending flows once it has processed all the files?
My folder contains thousands of files and the processor reads them in small 
batches (10) every 30s.

Is there a way to know how many files are left to process?

Thanks

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com


Re: Questions about the GetFile processor

2021-02-18 Thread Jean-Sebastien Vachon
OK thanks

I missed that part of the documentation. Stupid me

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com

From: Arpad Boda 
Sent: Thursday, February 18, 2021 8:46 AM
To: users@nifi.apache.org 
Subject: Re: Questions about the GetFile processor

GetFile has no persistence.
Actually it has, but it's called your hard drive. :)

If you take a look at the documentation:
Keep Source File - "If true, the file is not deleted after it has been copied 
to the Content Repository; this causes the file to be picked up continually and 
is useful for testing purposes. If not keeping original NiFi will need write 
permissions on the directory it is pulling from otherwise it will ignore the 
file."

You can see that it's going to get the same files over and over again unless 
you configure it to delete the already processed ones.

The reason I suggested the combination above is that listfile can be triggered 
once, the metadata (filenames) are stored in your queue and fetchfile can 
process them later.

On Thu, Feb 18, 2021 at 2:39 PM Jean-Sebastien Vachon 
<jsvac...@brizodata.com> wrote:
OK, I understand your point... sorry (early morning) 

I am kind of stuck with the GetFile processor for now. Is there a way to know 
how many files are left to process?

Will it go on forever, or will it stop streaming once all files have been 
processed? (There are no new files in the folder... everything was there at the 
beginning.)

Thanks

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com
________
From: Jean-Sebastien Vachon <jsvac...@brizodata.com>
Sent: Thursday, February 18, 2021 8:34 AM
To: users@nifi.apache.org
Subject: Re: Questions about the GetFile processor

Thanks for your comment. However, I can't queue everything, as the total size of 
the data is around 560 GB.
Right now, I am using a GetFile processor and it has been running for a few 
days. If I look at my endpoint, it looks like it should be done pretty soon, 
but data is still
streaming in at the same rate, so I was wondering if the processor remembers 
every single file it has already processed, or if it is simply going through all 
the files alphabetically or in whatever order it decides.

Thanks

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com

From: Arpad Boda <ab...@apache.org>
Sent: Thursday, February 18, 2021 8:29 AM
To: users@nifi.apache.org
Subject: Re: Questions about the GetFile processor

You can use the combination of listfile and fetchfile.
In the queue between the two you are going to see the number of (flow)files 
left to be processed.

On Thu, Feb 18, 2021 at 2:14 PM Jean-Sebastien Vachon 
<jsvac...@brizodata.com> wrote:
Hi all,

If I configure a GetFile processor to list all JSON files under a given folder, 
will it stop sending flows once it has processed all the files?
My folder contains thousands of files and the processor reads them in small 
batches (10) every 30s.

Is there a way to know how many files are left to process?

Thanks

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com


Re: Questions about the GetFile processor

2021-02-18 Thread Jean-Sebastien Vachon
OK, I understand your point... sorry (early morning) 

I am kind of stuck with the GetFile processor for now. Is there a way to know 
how many files are left to process?

Will it go on forever, or will it stop streaming once all files have been 
processed? (There are no new files in the folder... everything was there at the 
beginning.)

Thanks

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com
____
From: Jean-Sebastien Vachon 
Sent: Thursday, February 18, 2021 8:34 AM
To: users@nifi.apache.org 
Subject: Re: Questions about the GetFile processor

Thanks for your comment. However, I can't queue everything, as the total size of 
the data is around 560 GB.
Right now, I am using a GetFile processor and it has been running for a few 
days. If I look at my endpoint, it looks like it should be done pretty soon, 
but data is still
streaming in at the same rate, so I was wondering if the processor remembers 
every single file it has already processed, or if it is simply going through all 
the files alphabetically or in whatever order it decides.

Thanks

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com

From: Arpad Boda 
Sent: Thursday, February 18, 2021 8:29 AM
To: users@nifi.apache.org 
Subject: Re: Questions about the GetFile processor

You can use the combination of ListFile and FetchFile.
In the queue between the two you are going to see the number of (flow)files 
left to be processed.

On Thu, Feb 18, 2021 at 2:14 PM Jean-Sebastien Vachon wrote:
Hi all,

If I configure a GetFile processor to list all JSON files under a given folder, 
will it stop sending flows once it has processed all files?
My folder contains thousands of files and the processor reads them in small 
batches (10) every 30s.

Is there a way to know how many files are left to process?

Thanks

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com


Re: Questions about the GetFile processor

2021-02-18 Thread Jean-Sebastien Vachon
Thanks for your comment. However, I can't queue everything as the total size of 
the data is around 560GB.
Right now, I am using a GetFile processor and it has been running for a few 
days. If I look at my end point, it looks like it should be done pretty soon 
but data is still
streaming in at the same rate so I was wondering if the processor remembers 
every single file it has already processed or if it is simply going through all 
the files alphabetically or in whatever order it decides.

Thanks

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com

From: Arpad Boda 
Sent: Thursday, February 18, 2021 8:29 AM
To: users@nifi.apache.org 
Subject: Re: Questions about the GetFile processor

You can use the combination of ListFile and FetchFile.
In the queue between the two you are going to see the number of (flow)files 
left to be processed.

On Thu, Feb 18, 2021 at 2:14 PM Jean-Sebastien Vachon wrote:
Hi all,

If I configure a GetFile processor to list all JSON files under a given folder, 
will it stop sending flows once it has processed all files?
My folder contains thousands of files and the processor reads them in small 
batches (10) every 30s.

Is there a way to know how many files are left to process?

Thanks

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com


Questions about the GetFile processor

2021-02-18 Thread Jean-Sebastien Vachon
Hi all,

If I configure a GetFile processor to list all JSON files under a given folder, 
will it stop sending flows once it has processed all files?
My folder contains thousands of files and the processor reads them in small 
batches (10) every 30s.

Is there a way to know how many files are left to process?

Thanks

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com


Question on ExecuteProcess processor

2021-01-14 Thread Jean-Sebastien Vachon
Hi all,

I just started using an ExecuteProcess processor to run a python3 process and 
everything seems to work fine.
But if I stop the processor while it is running, the process will complete but 
the resulting flow is lost due to a Broken Pipe.
I used ExecuteStreamCommand in the past and it gracefully completed the 
process without losing anything.
I was expecting the same from the ExecuteProcess processor.

Is that the intended behaviour? I am using Nifi 1.11.4

Thanks

Jean-Sébastien Vachon
Co-Founder & Architect
Brizo Data, Inc.
www.brizodata.com


Merging back flows

2020-07-10 Thread Jean-Sebastien Vachon
Hi all,

A quick question regarding merging back flows into a single flow.

I have a processor that outputs a JSON formatted flow containing information 
such as the following:

{
   "organization": {
      "id": 1234,
      "name": "demo"
   },
   … some other fields …
}

This is sent to multiple processors that will compute and add different fields 
to the flow… which will need to be merged back together into a single JSON 
formatted flow.
The problem is that the "organization" section above will appear in each of the 
individual flows, so if I simply use a MergeContent processor, this section will 
appear more than once (I am using BinaryConcatenation).

What would be the best strategy to accomplish this? I was thinking about adding 
EvaluateJsonPath or JOLTTransformJson after each processor to filter the flows 
before sending everything to the MergeContent.
Are there any other ways of performing this?
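
One way to handle it is to strip the shared section on each branch
(EvaluateJsonPath or JOLT would do it in-flow) and re-attach it once after the
merge. A minimal sketch of the idea in plain Python, assuming each enriched
fragment is a standalone JSON object:

import json

# Hypothetical enriched fragments coming back from the parallel branches;
# each one still carries the shared "organization" section.
fragments = [
    '{"organization": {"id": 1234, "name": "demo"}, "score": 0.9}',
    '{"organization": {"id": 1234, "name": "demo"}, "tags": ["a", "b"]}',
]

merged = {}
for raw in fragments:
    obj = json.loads(raw)
    org = obj.pop("organization", None)  # strip the duplicated section
    merged.update(obj)                   # keep only the branch-specific fields

merged["organization"] = org  # re-attach the shared section once
print(json.dumps(merged))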

Thanks



RE: Accessing flow attributes from ExecuteStreamCommand

2020-05-28 Thread Jean-Sebastien Vachon
Thanks Mike

That was my fallback in case it was not supported out of the box.
I guess another alternative would be to create a custom processor or use some 
other engine/cache such as Redis to store the data if it becomes too large.

Thanks

Sent from Mail for Windows 10

From: Mike Thomsen
Sent: May 28, 2020 10:57 AM
To: users@nifi.apache.org
Subject: Re: Accessing flow attributes from ExecuteStreamCommand

There's no way at the moment to interact with the NiFi API from that 
processor. The closest workaround would be to pass in flowfile attributes as 
parameters using the parameter configuration field and expression language.
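
A minimal sketch of that workaround: with Command Arguments set to, say,
${uuid};${filename} (the default Argument Delimiter is ';'), the script sees
the attribute values as argv while the content still flows through
stdin/stdout. The attribute choices here are just examples:

import sys

# ExecuteStreamCommand fills argv from the Command Arguments property,
# e.g. ${uuid};${filename} (the default Argument Delimiter is ';').
flow_uuid, filename = sys.argv[1], sys.argv[2]

# The FlowFile content arrives on stdin; whatever is written to stdout
# becomes the content of the outgoing FlowFile.
content = sys.stdin.buffer.read()
sys.stdout.write("processed %s (%s): %d bytes\n" % (filename, flow_uuid, len(content)))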

On Thu, May 28, 2020 at 10:28 AM Jean-Sebastien Vachon wrote:
Hi all,

I am using the ExecuteStreamCommand processor to run a python script to crunch 
different data and I was curious
to know if such a processor could both read and/or write from/to the flow 
attributes.

Can someone point me to the documentation if this is possible? I could not find 
it by myself.

Thanks

Sent from Mail for Windows 10




Accessing flow attributes from ExecuteStreamCommand

2020-05-28 Thread Jean-Sebastien Vachon
Hi all,

I am using the ExecuteStreamCommand processor to run a python script to crunch 
different data and I was curious
to know if such a processor could both read and/or write from/to the flow 
attributes.

Can someone point me to the documentation if this is possible? I could not find 
it by myself.

Thanks

Sent from Mail for Windows 10



Re: Problem processing "huge" json objects

2020-02-17 Thread Jean-Sebastien Vachon
Hi

sorry for the late response. The JSON looks like this:

{
  "x": [ {}, {}, {} ]
}



From: Mike Thomsen 
Sent: Saturday, February 15, 2020 2:54 PM
To: users@nifi.apache.org 
Subject: Re: Problem processing "huge" json objects

> JSON contains an array of object

Like this:

[ { }, {} ]

Or like this?

{
  "x": [ {}, {}, {} ]
}

Because if the latter, I might have a custom NAR file I can share that I had to 
use for a similar situation.

On Fri, Feb 14, 2020 at 3:58 PM Jean-Sebastien Vachon wrote:
The JSON contains an array of objects that are to be inserted into a DB (and 
copied over to S3 for archival)...
I used a Split processor to cut them down to smaller chunks and it worked.

Thanks anyhow

From: Pierre Villard
Sent: Friday, February 14, 2020 3:28 PM
To: users@nifi.apache.org
Subject: Re: Problem processing "huge" json objects

Hi,

Can't you use the Record processors? What are you trying to achieve?

Thanks,
Pierre

On Fri, Feb 14, 2020 at 12:01, Jean-Sebastien Vachon wrote:
Hi all,

I am having some trouble processing 17 objects (total size 20.6GB) through an 
EvaluateJsonPath processor.
Originally, the JVM had only 6GB and I progressively upgraded the amount of RAM 
and it still fails with the following settings:

 -Xms16g -Xmx40g -XX:MaxPermSize=6G -XX:PermSize=4G

The exact error message is:

2020-02-14 19:55:53,799 ERROR [Timer-Driven Process Thread-316] 
o.a.n.p.standard.EvaluateJsonPath 
EvaluateJsonPath[id=01701031-6cff-1beb-b313-ab0531781357] 
EvaluateJsonPath[id=01701031-6cff-1beb-b313-ab0531781357] failed to process 
session due to java.lang.OutOfMemoryError: Requested array size exceeds VM 
limit; Processor Administratively Yielded for 1 sec: java.lang.OutOfMemoryError: 
Requested array size exceeds VM limit
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at java.lang.StringCoding.encode(StringCoding.java:350)
at java.lang.String.getBytes(String.java:941)
at 
org.apache.nifi.processors.standard.EvaluateJsonPath.lambda$onTrigger$3(EvaluateJsonPath.java:331)
at 
org.apache.nifi.processors.standard.EvaluateJsonPath$$Lambda$840/682258977.process(Unknown
 Source)
at 
org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2665)
at 
org.apache.nifi.processors.standard.EvaluateJsonPath.onTrigger(EvaluateJsonPath.java:329)
at 
org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at 
org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1162)
at 
org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:205)
at 
org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)


Are there any other settings I can tune? If not, what are my options?

Thanks


Re: Problem processing "huge" json objects

2020-02-14 Thread Jean-Sebastien Vachon
The JSON contains an array of objects that are to be inserted into a DB (and 
copied over to S3 for archival)...
I used a Split processor to cut them down to smaller chunks and it worked.
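
For anyone hitting the same wall: the split can also be done in a streaming
fashion outside NiFi. A rough sketch with the third-party ijson library,
assuming the {"x": [ ... ]} layout Mike described and placeholder file names:

import json
import ijson  # third-party streaming JSON parser

CHUNK = 1000
n, batch = 0, []
with open("huge.json", "rb") as f:
    # Iterate the objects under the top-level "x" array one at a time,
    # without ever materializing the whole document in memory.
    for obj in ijson.items(f, "x.item"):
        batch.append(obj)
        if len(batch) == CHUNK:
            with open("chunk-%05d.json" % n, "w") as out:
                # ijson yields Decimal for numbers; default=float keeps json.dump happy
                json.dump(batch, out, default=float)
            n, batch = n + 1, []
if batch:
    with open("chunk-%05d.json" % n, "w") as out:
        json.dump(batch, out, default=float)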

Thanks anyhow

From: Pierre Villard 
Sent: Friday, February 14, 2020 3:28 PM
To: users@nifi.apache.org 
Subject: Re: Problem processing "huge" json objects

Hi,

Can't you use the Record processors? What are you trying to achieve?

Thanks,
Pierre

On Fri, Feb 14, 2020 at 12:01, Jean-Sebastien Vachon wrote:
Hi all,

I am having some trouble processing 17 objects (total size 20.6GB) through an 
EvaluateJsonPath processor.
Originally, the JVM had only 6GB and I progressively upgraded the amount of RAM 
and it still fails with the following settings:

 -Xms16g -Xmx40g -XX:MaxPermSize=6G -XX:PermSize=4G

The exact error message is:

2020-02-14 19:55:53,799 ERROR [Timer-Driven Process Thread-316] 
o.a.n.p.standard.EvaluateJsonPath 
EvaluateJsonPath[id=01701031-6cff-1beb-b313-ab0531781357] 
EvaluateJsonPath[id=01701031-6cff-1beb-b313-ab0531781357] failed to process 
session due to java.lang.OutOfMemoryError: Requested array size exceeds VM 
limit; Processor Administratively Yielded for 1 sec: java.lang.OutOfMemoryError: 
Requested array size exceeds VM limit
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at java.lang.StringCoding.encode(StringCoding.java:350)
at java.lang.String.getBytes(String.java:941)
at 
org.apache.nifi.processors.standard.EvaluateJsonPath.lambda$onTrigger$3(EvaluateJsonPath.java:331)
at 
org.apache.nifi.processors.standard.EvaluateJsonPath$$Lambda$840/682258977.process(Unknown
 Source)
at 
org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2665)
at 
org.apache.nifi.processors.standard.EvaluateJsonPath.onTrigger(EvaluateJsonPath.java:329)
at 
org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at 
org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1162)
at 
org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:205)
at 
org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)


Are there any other settings I can tune? If not, what are my options?

Thanks


Problem processing "huge" json objects

2020-02-14 Thread Jean-Sebastien Vachon
Hi all,

I am having some trouble processing 17 objects (total size 20.6GB) through an 
EvaluateJsonPath processor.
Originally, the JVM had only 6GB and I progressively upgraded the amount of RAM 
and it still fails with the following settings:

 -Xms16g -Xmx40g -XX:MaxPermSize=6G -XX:PermSize=4G

The exact error message is:

2020-02-14 19:55:53,799 ERROR [Timer-Driven Process Thread-316] 
o.a.n.p.standard.EvaluateJsonPath 
EvaluateJsonPath[id=01701031-6cff-1beb-b313-ab0531781357] 
EvaluateJsonPath[id=01701031-6cff-1beb-b313-ab0531781357] failed to process 
session due to java.lang.OutOfMemoryError: Requested array size exceeds VM 
limit; Processor Administratively Yielded for 1 sec: java.lang.OutOfMemoryError: 
Requested array size exceeds VM limit
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at java.lang.StringCoding.encode(StringCoding.java:350)
at java.lang.String.getBytes(String.java:941)
at 
org.apache.nifi.processors.standard.EvaluateJsonPath.lambda$onTrigger$3(EvaluateJsonPath.java:331)
at 
org.apache.nifi.processors.standard.EvaluateJsonPath$$Lambda$840/682258977.process(Unknown
 Source)
at 
org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2665)
at 
org.apache.nifi.processors.standard.EvaluateJsonPath.onTrigger(EvaluateJsonPath.java:329)
at 
org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at 
org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1162)
at 
org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:205)
at 
org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)


Are there any other settings I can tune? If not, what are my options?

Thanks


Content repo

2020-01-17 Thread Jean-Sebastien Vachon
Hi all,

a quick question regarding the content repository. I've got some intense 
processing going on on my instance and the size of the repo is being reported 
as over 880GB (with 'du -csh /opt/nifi/content_repository') but when I check my 
flow, I've got at most 5-6 GB of data queued.
I've restarted Nifi but the size stays about the same.

Any idea how I can recover some space?

I am using Nifi 1.9.1 on an AWS Linux instance with Java 1.8.0_201

Thanks


Re: Weird behaviour

2019-10-03 Thread Jean-Sebastien Vachon
Hi Joe,

I read a thread about this new provenance repo yesterday night 
(https://www.mail-archive.com/users@nifi.apache.org/msg07179.html) and tried it.
It did a very big difference and I was able to process my entire queue during 
the night.

I never had any issues with G1GC so far. What GC engine do you recommend? CMS?

thanks

From: Joe Witt 
Sent: Wednesday, October 2, 2019 10:30 PM
To: users@nifi.apache.org 
Subject: Re: Weird behaviour

Jean

I'd recommend switching to the new provenance repo called 
WriteAheadProvenanceRepository.  Look at a new NiFi download's nifi.properties as 
it has been the default for a while.  This will help the prov stuff.  You may 
also want to stop using G1GC if on Java 8.
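
For reference, that switch is a one-line change in nifi.properties:

nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository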

I can't explain the status history finding.

Thanks

On Wed, Oct 2, 2019 at 8:32 PM Jean-Sebastien Vachon wrote:
Just did a thread dump using bin/nifi.sh dump and saw that a lot of threads are 
waiting on a lock.



"Provenance Maintenance Thread-3" Id=99 RUNNABLE
at java.io.UnixFileSystem.getLength(Native Method)
at java.io.File.length(File.java:974)
at 
org.apache.nifi.provenance.IndexConfiguration.getSize(IndexConfiguration.java:341)
at 
org.apache.nifi.provenance.IndexConfiguration.getIndexSize(IndexConfiguration.java:355)
at 
org.apache.nifi.provenance.PersistentProvenanceRepository.getSize(PersistentProvenanceRepository.java:892)
at 
org.apache.nifi.provenance.PersistentProvenanceRepository.purgeOldEvents(PersistentProvenanceRepository.java:913)
- waiting on 
org.apache.nifi.provenance.PersistentProvenanceRepository@3cee277e
at 
org.apache.nifi.provenance.PersistentProvenanceRepository.rollover(PersistentProvenanceRepository.java:1416)
at 
org.apache.nifi.provenance.PersistentProvenanceRepository.access$300(PersistentProvenanceRepository.java:131)
at 
org.apache.nifi.provenance.PersistentProvenanceRepository$1.run(PersistentProvenanceRepository.java:302)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Number of Locked Synchronizers: 3
- java.util.concurrent.locks.ReentrantReadWriteLock$FairSync@46683a80
- java.util.concurrent.locks.ReentrantLock$NonfairSync@3973f842
- java.util.concurrent.ThreadPoolExecutor$Worker@3d2af8d7

"Provenance Query Thread-1" Id=1185 WAITING  on 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@232695c5
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

____
From: Jean-Sebastien Vachon
Sent: Wednesday, October 2, 2019 9:25 PM
To: users@nifi.apache.org
Subject: Weird behaviour

Hi all,

First thing first... sorry for the long post but my Nifi instance (1.9.1) 
started to behave weird (according to my experience at least) this afternoon 
and I'm struggling to determine the cause of the problem (and fix it)
All processors are stopped and I have about 205k flows queued in a single queue 
as illustrated below.


[inline image: screenshot of the flow showing the queued connection]

The process "SaveToDB" seems to be causing some issues. It actually does a few 
more things than saving to the DB so this module was implemented as an 
ExecuteStreamCommand processor using Python3.
This module hasn't changed in quite some time and this is the first time I'm 
having issues around it. Whenever I start it (even with only 1 thread), Nifi 
starts to show the following message...

[inline image: the warning message shown by Nifi]

Another weird issue is that if I try to open the "View Status History" on this 
processor, Nifi says: "Insufficient history, please try again later"

Re: Weird behaviour

2019-10-02 Thread Jean-Sebastien Vachon
Just did a thread dump using bin/nifi.sh dump and saw that a lot of threads are 
waiting on a lock.



"Provenance Maintenance Thread-3" Id=99 RUNNABLE
at java.io.UnixFileSystem.getLength(Native Method)
at java.io.File.length(File.java:974)
at 
org.apache.nifi.provenance.IndexConfiguration.getSize(IndexConfiguration.java:341)
at 
org.apache.nifi.provenance.IndexConfiguration.getIndexSize(IndexConfiguration.java:355)
at 
org.apache.nifi.provenance.PersistentProvenanceRepository.getSize(PersistentProvenanceRepository.java:892)
at 
org.apache.nifi.provenance.PersistentProvenanceRepository.purgeOldEvents(PersistentProvenanceRepository.java:913)
- waiting on 
org.apache.nifi.provenance.PersistentProvenanceRepository@3cee277e
at 
org.apache.nifi.provenance.PersistentProvenanceRepository.rollover(PersistentProvenanceRepository.java:1416)
at 
org.apache.nifi.provenance.PersistentProvenanceRepository.access$300(PersistentProvenanceRepository.java:131)
at 
org.apache.nifi.provenance.PersistentProvenanceRepository$1.run(PersistentProvenanceRepository.java:302)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Number of Locked Synchronizers: 3
- java.util.concurrent.locks.ReentrantReadWriteLock$FairSync@46683a80
- java.util.concurrent.locks.ReentrantLock$NonfairSync@3973f842
- java.util.concurrent.ThreadPoolExecutor$Worker@3d2af8d7

"Provenance Query Thread-1" Id=1185 WAITING  on 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@232695c5
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

____
From: Jean-Sebastien Vachon 
Sent: Wednesday, October 2, 2019 9:25 PM
To: users@nifi.apache.org 
Subject: Weird behaviour

Hi all,

First thing first... sorry for the long post but my Nifi instance (1.9.1) 
started to behave weird (according to my experience at least) this afternoon 
and I'm struggling to determine the cause of the problem (and fix it)
All processors are stopped and I have about 205k flows queued in a single queue 
as illustrated below.


[inline image: screenshot of the flow showing the queued connection]

The process "SaveToDB" seems to be causing some issues. It actually does a few 
more things than saving to the DB so this module was implemented as an 
ExecuteStreamCommand processor using Python3.
This module hasn't changed in quite some time and this is the first time I'm 
having issues around it. Whenever I start it (even with only 1 thread), Nifi 
starts to show the following message...

[inline image: the warning message shown by Nifi]

Another weird issue is that if I try to open the "View Status History" on this 
processor, Nifi says: "Insufficient history, please try again later".
I tried many things, like restarting Nifi, and noticed that the CPU runs at 
150% on startup with no indication that it will ever stop. I read about these 
warning/error messages and tuned a few settings to increase the number of 
threads to index and restarted Nifi but
the CPU is still over 100% when no actual process is running in Nifi. Looking 
at /nifi-api/system-diagnostics shows the following:

{"systemDiagnostics": {"aggregateSnapshot": {"totalNonHeap": "235.4 
MB","totalNonHeapBytes": 246833152,"usedNonHeap": "222.25 
MB","usedNonHeapBytes": 233046120,"freeNonHeap": "13.15 MB","freeNonHeapBytes": 
13787032,"maxNonHeap": "-1 bytes","maxNonHeapBytes": -1,"totalHeap": "12 
GB","totalHeapBytes": 12884901888,"usedHeap": "7.14 GB","usedHeapBytes": 
7670910608

Weird behaviour

2019-10-02 Thread Jean-Sebastien Vachon
Hi all,

First thing first... sorry for the long post but my Nifi instance (1.9.1) 
started to behave weird (according to my experience at least) this afternoon 
and I'm struggling to determine the cause of the problem (and fix it)
All processors are stopped and I have about 205k flows queued in a single queue 
as illustrated below.


[inline image: screenshot of the flow showing the queued connection]

The process "SaveToDB" seems to be causing some issues. It actually does a few 
more things than saving to the DB so this module was implemented as an 
ExecuteStreamCommand processor using Python3.
This module hasn't changed in quite some time and this is the first time I'm 
having issues around it. Whenever I start it (even with only 1 thread), Nifi 
starts to show the following message...

[inline image: the warning message shown by Nifi]

Another weird issue is that if I try to open the "View Status History" on this 
processor, Nifi says: "Insufficient history, please try again later".
I tried many things, like restarting Nifi, and noticed that the CPU runs at 
150% on startup with no indication that it will ever stop. I read about these 
warning/error messages and tuned a few settings to increase the number of 
threads to index and restarted Nifi but
the CPU is still over 100% when no actual process is running in Nifi. Looking 
at /nifi-api/system-diagnostics shows the following:

{"systemDiagnostics": {"aggregateSnapshot": {"totalNonHeap": "235.4 
MB","totalNonHeapBytes": 246833152,"usedNonHeap": "222.25 
MB","usedNonHeapBytes": 233046120,"freeNonHeap": "13.15 MB","freeNonHeapBytes": 
13787032,"maxNonHeap": "-1 bytes","maxNonHeapBytes": -1,"totalHeap": "12 
GB","totalHeapBytes": 12884901888,"usedHeap": "7.14 GB","usedHeapBytes": 
7670910608,"freeHeap": "4.86 GB","freeHeapBytes": 5213991280,"maxHeap": "12 
GB","maxHeapBytes": 12884901888,"heapUtilization": 
"60.0%","availableProcessors": 72,"processorLoadAverage": 1.06,"totalThreads": 
916,"daemonThreads": 53,"uptime": "00:58:48.996",
I was wondering if the high number of threads is expected or not... 916 threads 
seems a lot to me. Especially since my max threads count is set to 400 right 
now.
The machine has 72 CPUs and is quite capable but the CPU has been running at 
over 100% for hours now.

Do you have any recommendation?

thanks


Re: node died unexpectedly

2019-09-25 Thread Jean-Sebastien Vachon
I could if Nifi was up and running but when I lose the node, the Nifi 
process itself dies so I can hardly take a heap dump of something that is not 
running 

It happened twice before I sent my first email...nothing since then.

From: DEHAY Aurelien 
Sent: Wednesday, September 25, 2019 10:29 AM
To: users@nifi.apache.org 
Subject: RE: node died unexpectedly


Hello.



By any chance, did you take thread dumps?

If it happens again, take thread dumps of the java nifi process (with jstack: 
jstack -l <pid> >> threaddumps.log). Take 3 or 4 every 10/15 seconds.

Regards.



From: Jean-Sebastien Vachon 
Sent: lundi 23 septembre 2019 15:08
To: users@nifi.apache.org
Subject: node died unexpectedly



Hi,



I am running a cluster of five nodes and one of them just quit for no apparent 
reason... at least I could not find anything wrong on the system.

Here is a section of the logs. You can clearly see that there is an 8 minute 
gap before I manually had to restart Nifi.  I've looked in /var/log/messages 
and Nifi's log files but could

not find any reason for this to happen...



any thoughts/ideas?



---



2019-09-23 12:51:54,567 ERROR [Timer-Driven Process Thread-40] 
o.a.n.p.standard.ExecuteStreamCommand 
ExecuteStreamCommand[id=8b748c9a-7d7d-30cd-86a3-4ea3c4ccd3b4] Transferring flow 
file 
StandardFlowFileRecord[uuid=60d29ce2-088a-4829-a0ae-17672f1d86e6,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id

=1569243036019-4497, container=default, section=401], offset=591157, 
length=-1],offset=0,name=186373,size=0] to nonzero status. Executable command 
/usr/bin/python3 ended in an error:

2019-09-23 12:51:54,570 ERROR [Timer-Driven Process Thread-4] 
o.a.n.p.standard.ExecuteStreamCommand 
ExecuteStreamCommand[id=272623bc-a371-3ff9-adaf-d8363b91a5ed] Transferring flow 
file 
StandardFlowFileRecord[uuid=a18485c8-9b03-43a8-8c13-e76803df9290,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1569242553862-4458, container=default, 
section=362], offset=780733, length=21229],offset=0,name=33287,size=21229] to 
nonzero status. Executable command /usr/bin/python3 ended in an error: 
/usr/local/lib/python3.7/site-packages/requests/__init__.py:80: 
RequestsDependencyWarning: urllib3 (1.23) or chardet (3.0.4) doesn't match a 
supported version!

  RequestsDependencyWarning)



2019-09-23 12:51:54,570 ERROR [Thread-130947] 
o.a.n.p.standard.ExecuteStreamCommand 
ExecuteStreamCommand[id=a7cb604a-465d-3440-beab-b9554e1ba4bb] Failed to write 
flow file to stdin due to java.io.IOException: Broken pipe: 
java.io.IOException: Broken pipe

java.io.IOException: Broken pipe

at java.io.FileOutputStream.writeBytes(Native Method)

at java.io.FileOutputStream.write(FileOutputStream.java:326)

at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)

at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)

at org.apache.nifi.stream.io.StreamUtils.copy(StreamUtils.java:36)

at 
org.apache.nifi.processors.standard.ExecuteStreamCommand$2.run(ExecuteStreamCommand.java:519)

at java.lang.Thread.run(Thread.java:748)

2019-09-23 12:51:54,570 ERROR [Timer-Driven Process Thread-27] 
o.a.n.p.standard.ExecuteStreamCommand 
ExecuteStreamCommand[id=a7cb604a-465d-3440-beab-b9554e1ba4bb] Transferring flow 
file 
StandardFlowFileRecord[uuid=e5a82bb6-23a9-4a16-825a-1d976cd114cc,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1569242062433-4406, container=default, 
section=310], offset=680454, length=-1],offset=0,name=164088,size=0] to nonzero 
status. Executable command /usr/bin/python3 ended in an error:



2019-09-23 12:59:56,012 INFO [main] org.apache.nifi.NiFi Launching NiFi...

2019-09-23 12:59:56,179 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader 
Determined default nifi.properties path to be '/opt/nifi/./conf/nifi.properties'

2019-09-23 12:59:56,181 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader 
Loaded 149 properties from /opt/nifi/./conf/nifi.properties

2019-09-23 12:59:56,186 INFO [main] org.apache.nifi.NiFi Loaded 149 properties



node died unexpectedly

2019-09-23 Thread Jean-Sebastien Vachon
Hi,

I am running a cluster of five nodes and one of them just quit for no apparent 
reason... at least I could not find anything wrong on the system.
Here is a section of the logs. You can clearly see that there is an 8 minute 
gap before I manually had to restart Nifi.  I've looked in /var/log/messages 
and Nifi's log files but could
not find any reason for this to happen...

any thoughts/ideas?

---

2019-09-23 12:51:54,567 ERROR [Timer-Driven Process Thread-40] 
o.a.n.p.standard.ExecuteStreamCommand 
ExecuteStreamCommand[id=8b748c9a-7d7d-30cd-86a3-4ea3c4ccd3b4] Transferring flow 
file 
StandardFlowFileRecord[uuid=60d29ce2-088a-4829-a0ae-17672f1d86e6,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id
=1569243036019-4497, container=default, section=401], offset=591157, 
length=-1],offset=0,name=186373,size=0] to nonzero status. Executable command 
/usr/bin/python3 ended in an error:
2019-09-23 12:51:54,570 ERROR [Timer-Driven Process Thread-4] 
o.a.n.p.standard.ExecuteStreamCommand 
ExecuteStreamCommand[id=272623bc-a371-3ff9-adaf-d8363b91a5ed] Transferring flow 
file 
StandardFlowFileRecord[uuid=a18485c8-9b03-43a8-8c13-e76803df9290,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1569242553862-4458, container=default, 
section=362], offset=780733, length=21229],offset=0,name=33287,size=21229] to 
nonzero status. Executable command /usr/bin/python3 ended in an error: 
/usr/local/lib/python3.7/site-packages/requests/__init__.py:80: 
RequestsDependencyWarning: urllib3 (1.23) or chardet (3.0.4) doesn't match a 
supported version!
  RequestsDependencyWarning)

2019-09-23 12:51:54,570 ERROR [Thread-130947] 
o.a.n.p.standard.ExecuteStreamCommand 
ExecuteStreamCommand[id=a7cb604a-465d-3440-beab-b9554e1ba4bb] Failed to write 
flow file to stdin due to java.io.IOException: Broken pipe: 
java.io.IOException: Broken pipe
java.io.IOException: Broken pipe
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at org.apache.nifi.stream.io.StreamUtils.copy(StreamUtils.java:36)
at 
org.apache.nifi.processors.standard.ExecuteStreamCommand$2.run(ExecuteStreamCommand.java:519)
at java.lang.Thread.run(Thread.java:748)
2019-09-23 12:51:54,570 ERROR [Timer-Driven Process Thread-27] 
o.a.n.p.standard.ExecuteStreamCommand 
ExecuteStreamCommand[id=a7cb604a-465d-3440-beab-b9554e1ba4bb] Transferring flow 
file 
StandardFlowFileRecord[uuid=e5a82bb6-23a9-4a16-825a-1d976cd114cc,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1569242062433-4406, container=default, 
section=310], offset=680454, length=-1],offset=0,name=164088,size=0] to nonzero 
status. Executable command /usr/bin/python3 ended in an error:

2019-09-23 12:59:56,012 INFO [main] org.apache.nifi.NiFi Launching NiFi...
2019-09-23 12:59:56,179 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader 
Determined default nifi.properties path to be '/opt/nifi/./conf/nifi.properties'
2019-09-23 12:59:56,181 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader 
Loaded 149 properties from /opt/nifi/./conf/nifi.properties
2019-09-23 12:59:56,186 INFO [main] org.apache.nifi.NiFi Loaded 149 properties


Re: Too many open files

2019-09-18 Thread Jean-Sebastien Vachon
I managed to find the culprit.. it was the init script that I was using that 
was doing something weird.

I added MAX_FD=50000 to my nifi-env.sh file and everything seems to be fine now

thanks

From: Abdou B 
Sent: Wednesday, September 18, 2019 10:42 AM
To: users@nifi.apache.org 
Subject: Re: Too many open files

Hello,

It seems to me that for some distributions, you should modify those values in 
the cluster management tool.
For example in HDF, with Ambari, you should change the parameter 
nifi_user_nofile_limit for the change to take effect.

Best regards
Abdou

On Wed, Sep 18, 2019 at 16:34, Jean-Sebastien Vachon wrote:
Does not seem to help... processes are still limited to 4096 fds

From: Jean-Sebastien Vachon
Sent: Wednesday, September 18, 2019 10:31 AM
To: users@nifi.apache.org
Subject: Re: Too many open files

Oops.. just saw the following:

Your distribution may require an edit to /etc/security/limits.d/90-nproc.conf 
by adding:
* soft nproc 10000

I will try this
________
From: Jean-Sebastien Vachon
Sent: Wednesday, September 18, 2019 10:30 AM
To: users@nifi.apache.org
Subject: Too many open files

Hi all,

I've started to see "Too many open files" error messages in Nifi. I checked 
https://nifi.apache.org/quickstart.html to see the recommended values to fix 
this
and made the required changes to /etc/security/limits.conf, exited my shell and 
restarted Nifi. When I check the limits of the Java processes I can still see 
the limits to be at 4096

I've added the following to /etc/security/limits.conf
*  hard  nofile  50000
*  soft  nofile  50000

but the processes show this:

 cat /proc/26861/limits
...
Max open files            4096                 4096                 files
...

Any idea where this 4096 comes from? I tried grepping in the init scripts, nifi 
configuration and nifi-env.sh but could not find this anywhere

thanks


Re: Too many open files

2019-09-18 Thread Jean-Sebastien Vachon
Does not seem to help... processes are still limited to 4096 fds

From: Jean-Sebastien Vachon 
Sent: Wednesday, September 18, 2019 10:31 AM
To: users@nifi.apache.org 
Subject: Re: Too many open files

Oops.. just saw the following:

Your distribution may require an edit to /etc/security/limits.d/90-nproc.conf 
by adding:
* soft nproc 10000

I will try this

From: Jean-Sebastien Vachon 
Sent: Wednesday, September 18, 2019 10:30 AM
To: users@nifi.apache.org 
Subject: Too many open files

Hi all,

I've started to see "Too many open files" error messages in Nifi. I checked 
https://nifi.apache.org/quickstart.html to see the recommended values to fix 
this
and made the required changes to /etc/security/limits.conf, exited my shell and 
restarted Nifi. When I check the limits of the Java processes I can still see 
the limits to be at 4096

I've added the following to /etc/security/limits.conf
*  hard  nofile  50000
*  soft  nofile  50000

but the processes show this:

 cat /proc/26861/limits
...
Max open files            4096                 4096                 files
...

Any idea where this 4096 comes from? I tried grepping in the init scripts, nifi 
configuration and nifi-env.sh but could not find this anywhere

thanks


Re: Too many open files

2019-09-18 Thread Jean-Sebastien Vachon
Oops.. just saw the following:

Your distribution may require an edit to /etc/security/limits.d/90-nproc.conf 
by adding:
* soft nproc 10000

I will try this

From: Jean-Sebastien Vachon 
Sent: Wednesday, September 18, 2019 10:30 AM
To: users@nifi.apache.org 
Subject: Too many open files

Hi all,

I've started to see "Too many open files" error messages in Nifi. I checked 
https://nifi.apache.org/quickstart.html to see the recommended values to fix 
this
and made the required changes to /etc/security/limits.conf, exited my shell and 
restarted Nifi. When I check the limits of the Java processes I can still see 
the limits to be at 4096

I've added the following to /etc/security/limits.conf
*  hard  nofile  50000
*  soft  nofile  50000

but the processes show this:

 cat /proc/26861/limits
...
Max open files            4096                 4096                 files
...

Any idea where this 4096 comes from? I tried grepping in the init scripts, nifi 
configuration and nifi-env.sh but could not find this anywhere

thanks


Too many open files

2019-09-18 Thread Jean-Sebastien Vachon
Hi all,

I've started to see "Too many open files" error messages in Nifi. I checked 
https://nifi.apache.org/quickstart.html to see the recommended values to fix 
this
and made the required changes to /etc/security/limits.conf, exited my shell and 
restarted Nifi. When I check the limits of the Java processes I can still see 
the limits to be at 4096

I've added the following to /etc/security/limits.conf
*  hard  nofile  50000
*  soft  nofile  50000

but the processes show this:

 cat /proc/26861/limits
...
Max open files            4096                 4096                 files
...

Any idea where this 4096 comes from? I tried grepping in the init scripts, nifi 
configuration and nifi-env.sh but could not find this anywhere

thanks


Re: clean shutdown

2019-08-29 Thread Jean-Sebastien Vachon
Thanks to both of you for getting back to me...

I didn't know about offloading a node. I will certainly look into this. I 
quickly looked through the API and saw no mention of the offload word.
Does that mean there is no equivalent function in the API?
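
For what it's worth, offload does not appear as a dedicated endpoint; per the
decommission steps in the admin guide it is a status change on the cluster
node resource. A rough sketch against an unsecured cluster (placeholder node
id; field names per the cluster API's NodeDTO, worth double-checking on your
version):

import requests

NIFI = "http://localhost:8088/nifi-api"           # placeholder; one of the remaining nodes
NODE_ID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"  # from GET /controller/cluster

def set_node_status(status):
    # Decommission sequence: DISCONNECTING, then OFFLOADING, then DELETE
    # once the node reports that offloading has finished.
    body = {"node": {"nodeId": NODE_ID, "status": status}}
    r = requests.put(f"{NIFI}/controller/cluster/nodes/{NODE_ID}", json=body)
    r.raise_for_status()

set_node_status("DISCONNECTING")
set_node_status("OFFLOADING")
# ...poll GET /controller/cluster/nodes/{id} until the offload completes, then:
# requests.delete(f"{NIFI}/controller/cluster/nodes/{NODE_ID}")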


Thanks



From: Pierre Villard 
Sent: Thursday, August 29, 2019 3:41 AM
To: users@nifi.apache.org 
Subject: Re: clean shutdown

OK, I didn't understand when you initially said "prevent data loss". I thought 
you meant gracefully stop the instance to avoid data corruption of some sort.

Now I better understand your situation, I see two options:
- the one mentioned by Jon: decommission the node [1] with the REST API and 
hope for the best but with no guarantee
- have the data on attached disks that you could re-attach to a new node at a 
later time (I don't know the AWS specifics around that)

[1] 
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#decommission-nodes

On Thu, Aug 29, 2019 at 03:32, Jon Logan wrote:
Remember that spot instances are given shutdown notifications on a best-effort 
basis[1]. You would have to disconnect the node, drain it, then shut it down 
after draining, and hope you do so before you get killed. You could also 
consider the new hibernation feature -- it'll hibernate your node instead of 
terminating, and then rehydrate it at a later time. Your cluster would have a 
disconnected node in the meantime though. All of these scenarios introduce a 
significant potential of data loss, you should be sure you could reproduce the 
data from a durable source if needed (ex. Kafka, etc), or be accepting of the 
data loss.


[1]  
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html 
While we make every effort to provide this warning as soon as possible, it is 
possible that your Spot Instance is terminated before the warning can be made 
available. Test your application to ensure that it handles an unexpected 
instance termination gracefully, even if you are testing for interruption 
notices. You can do so by running the application using an On-Demand Instance 
and then terminating the On-Demand Instance yourself.

On Wed, Aug 28, 2019 at 8:57 PM Jean-Sebastien Vachon wrote:
Hi Craig,

I made some additional tests and I am afraid I lost flows... I used the same 
flow I described earlier, generated around 30k flows and load balanced them on 
the three nodes forming my cluster.
I then shut down one of the machines. The result is that I lost 10k flows that 
were scheduled to be processed on this machine. This is a problem I need to 
address and I'll be looking for ideas shortly.

For those interested in automating the removal of a spot instance from a 
cluster... here is something to get you started.
AWS recommends monitoring the URL found in the if statement every 5s (or so)... 
Since cron only supports 1 minute intervals and nothing smaller,
I accomplished what I wanted by adding multiple cron entries and sleeping for a 
variable amount of time.

You will need jq and curl to be installed on your machine for this to work.
The basic idea is to wait until the web page appears to exist and then trigger 
a series of actions.

---

#!/bin/bash
sleep $1

# This node's private IP, its Nifi node id, and the address of another
# cluster member (the removal call must go to a node that stays up).
NODE_IP=`curl -s http://169.254.169.254/latest/meta-data/local-ipv4`
NODE_ID=`curl -s "http://${NODE_IP}:8088/nifi-api/controller/cluster" | jq --arg IP "${NODE_IP}" -r '.cluster.nodes[] | select(.address == $IP).nodeId'`
OTHER_NODE=`curl -s "http://${NODE_IP}:8088/nifi-api/controller/cluster" | jq --arg IP "${NODE_IP}" -r '.cluster.nodes[] | select(.address != $IP).address' | head -1`

# The termination-time URL returns 404 until AWS schedules the interruption.
if [ -z $(curl -Is http://169.254.169.254/latest/meta-data/spot/termination-time | head -1 | grep 404 | cut -d' ' -f 2) ]
then
echo "Running shutdown hook."
systemctl stop nifi
sleep 5
curl -s -X DELETE "http://${OTHER_NODE}:8088/nifi-api/controller/cluster/nodes/$NODE_ID"
fi


From: Jean-Sebastien Vachon
Sent: Wednesday, August 28, 2019 7:39 PM
To: users@nifi.apache.org
Subject: Re: clean shutdown

Hi Craig,

First the generic stuff...

according to the tests I made, no flows are lost when a machine is removed from 
the cluster.  They seem to be requeued.
However, I only tested with a very basic flow and not with my whole flow which 
involves a lot of things.
Basically, I used a GenerateFlowFile to generate some data and a dummy Python 
process to do something with it. The queue between the two
processors was configured to do load balancing using a round robin. I must 
admit that I haven't looked if the item was requeued and dispatched to another 
node.
The output of the python module was split between success and failure and no 
single flow reached the failure state.

Re: clean shutdown

2019-08-28 Thread Jean-Sebastien Vachon
Hi Craig,

I made some additional tests and I am afraid I lost flows... I used the same 
flow I described earlier, generated around 30k flows and load balanced them on 
the three nodes forming my cluster.
I then shut down one of the machines. The result is that I lost 10k flows that 
were scheduled to be processed on this machine. This is a problem I need to 
address and I'll be looking for ideas shortly.

For those interested in automating the removal of a spot instance from a 
cluster... here is something to get you started.
AWS recommends monitoring the URL found in the if statement every 5s (or so)... 
Since cron only supports 1 minute intervals and nothing smaller,
I accomplished what I wanted by adding multiple cron entries and sleeping for a 
variable amount of time.

You will need jq and curl to be installed on your machine for this to work.
The basic idea is to wait until the web page appears to exist and then trigger 
a series of actions.

---

#!/bin/bash
sleep $1

# This node's private IP, its Nifi node id, and the address of another
# cluster member (the removal call must go to a node that stays up).
NODE_IP=`curl -s http://169.254.169.254/latest/meta-data/local-ipv4`
NODE_ID=`curl -s "http://${NODE_IP}:8088/nifi-api/controller/cluster" | jq --arg IP "${NODE_IP}" -r '.cluster.nodes[] | select(.address == $IP).nodeId'`
OTHER_NODE=`curl -s "http://${NODE_IP}:8088/nifi-api/controller/cluster" | jq --arg IP "${NODE_IP}" -r '.cluster.nodes[] | select(.address != $IP).address' | head -1`

# The termination-time URL returns 404 until AWS schedules the interruption.
if [ -z $(curl -Is http://169.254.169.254/latest/meta-data/spot/termination-time | head -1 | grep 404 | cut -d' ' -f 2) ]
then
echo "Running shutdown hook."
systemctl stop nifi
sleep 5
curl -s -X DELETE "http://${OTHER_NODE}:8088/nifi-api/controller/cluster/nodes/$NODE_ID"
fi

____
From: Jean-Sebastien Vachon 
Sent: Wednesday, August 28, 2019 7:39 PM
To: users@nifi.apache.org 
Subject: Re: clean shutdown

Hi Craig,

First the generic stuff...

according to the tests I made, no flows are lost when a machine is removed from 
the cluster.  They seem to be requeued.
However, I only tested with a very basic flow and not with my whole flow which 
involves a lot of things.
Basically, I used a GenerateFlowFile to generate some data and a dummy Python 
process to do something with it. The queue between the two
processors was configured to do load balancing using a round robin. I must 
admit that I haven't looked if the item was requeued and dispatched to another 
node.
The output of the python module was split between success and failure and no 
single flow reached the failure state.

then to AWS specific stuff...

I had to script a few things to clean up within the two-minute warning AWS is 
giving me.
Since I am using spot instances, I know the instance will not come back so I 
had to automate the clean up of the cluster by
using an API call to remove the machine from the cluster. In order to remove 
the machine from the cluster, I need to stop Nifi first and then remove the 
machine through
a call to the API on a second node. I am still polishing the script to 
accomplish this. I may share it once it is working as expected in case someone 
else has this issue.

Let me know if you need more details about anything...

From: Craig Knell 
Sent: Wednesday, August 28, 2019 6:52 PM
To: users@nifi.apache.org 
Subject: Re: clean shutdown

Hi Jean-Sebastien,

I’d be interested to hear how this performs

Best regards

Craig

On 28 Aug 2019, at 22:28, Jean-Sebastien Vachon wrote:

Hi Pierre,

thanks for your input.

I am already intercepting AWS termination notification so I will add a few 
steps and see how it reacts

Thanks again

From: Pierre Villard
Sent: Wednesday, August 28, 2019 4:17 AM
To: users@nifi.apache.org
Subject: Re: clean shutdown

Hi Jean-Sebastien,

When you stop NiFi, by default, it will try to gracefully stop everything in 10 
seconds, and if not all components are nicely stopped after that, it will force 
shut down the NiFi process. This is configured with 
"nifi.flowcontroller.graceful.shutdown.period" in nifi.properties file. If you 
have processors/CS that might take longer to stop gracefully (because of 
connections to external systems for instance), you could increase this value.
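
As a concrete example, raising that default in nifi.properties might look 
like this (the 60 sec value is purely illustrative):

nifi.flowcontroller.graceful.shutdown.period=60 sec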

I'm not very familiar with AWS spot instances but I'd try to catch the spot 
notification event to stop the NiFi service on the host before the instance is 
stopped/killed.

Pierre



On Tue, Aug 27, 2019 at 20:05, Jean-Sebastien Vachon wrote:
Hi everybody,

I am working with AWS spot instances and one thing that is giving me a hard 
time is to perform a clean (and quick) shutdown of Nifi in order to prevent 
data loss.

AWS will give you about two minutes to clean up everything before the machine 
is actually shutdown.

Re: clean shutdown

2019-08-28 Thread Jean-Sebastien Vachon
Hi Craig,

First the generic stuff...

according to the tests I made, no flows are lost when a machine is removed from 
the cluster.  They seem to be requeued.
However, I only tested with a very basic flow and not with my whole flow which 
involves a lot of things.
Basically, I used a GenerateFlowFile to generate some data and a dummy Python 
process to do something with it. The queue between the two
processors was configured to do load balancing using a round robin. I must 
admit that I haven't looked if the item was requeued and dispatched to another 
node.
The output of the python module was split between success and failure and no 
single flow reached the failure state.

then to AWS specific stuff...

I had to script a few things to clean up within the two-minute warning AWS is 
giving me.
Since I am using spot instances, I know the instance will not come back so I 
had to automate the clean up of the cluster by
using an API call to remove the machine from the cluster. In order to remove 
the machine from the cluster, I need to stop Nifi first and then remove the 
machine through
a call to the API on a second node. I am still polishing the script to 
accomplish this. I may share it once it is working as expected in case someone 
else has this issue.

Let me know if you need more details about anything...

From: Craig Knell 
Sent: Wednesday, August 28, 2019 6:52 PM
To: users@nifi.apache.org 
Subject: Re: clean shutdown

Hi Jean-Sebastien,

I’d be interested to hear how this performs

Best regards

Craig

On 28 Aug 2019, at 22:28, Jean-Sebastien Vachon wrote:

Hi Pierre,

thanks for your input.

I am already intercepting AWS termination notification so I will add a few 
steps and see how it reacts

Thanks again

From: Pierre Villard
Sent: Wednesday, August 28, 2019 4:17 AM
To: users@nifi.apache.org
Subject: Re: clean shutdown

Hi Jean-Sebastien,

When you stop NiFi, by default, it will try to gracefully stop everything in 10 
seconds, and if not all components are nicely stopped after that, it will force 
shut down the NiFi process. This is configured with 
"nifi.flowcontroller.graceful.shutdown.period" in nifi.properties file. If you 
have processors/CS that might take longer to stop gracefully (because of 
connections to external systems for instance), you could increase this value.

I'm not very familiar with AWS spot instances but I'd try to catch the spot 
notification event to stop the NiFi service on the host before the instance is 
stopped/killed.

Pierre



On Tue, Aug 27, 2019 at 20:05, Jean-Sebastien Vachon wrote:
Hi everybody,

I am working with AWS spot instances and one thing that is giving me a hard 
time is to perform a clean (and quick) shutdown of Nifi in order to prevent 
data loss.

AWS will give you about two minutes to clean up everything before the machine 
is actually shutdown.
Is there a way to stop/kill all processes running on the host without losing 
anything? It is fine if all the flowfiles being processed are simply requeued.

Would simply killing the processes achieve this? (I doubt so)... would it be 
better to fetch a list of running processors and terminate them using Nifi's 
API?

All ideas and thoughts are welcome

thanks


Re: clean shutdown

2019-08-28 Thread Jean-Sebastien Vachon
Hi Pierre,

thanks for your input.

I am already intercepting AWS termination notification so I will add a few 
steps and see how it reacts

Thanks again

From: Pierre Villard 
Sent: Wednesday, August 28, 2019 4:17 AM
To: users@nifi.apache.org 
Subject: Re: clean shutdown

Hi Jean-Sebastien,

When you stop NiFi, by default, it will try to gracefully stop everything in 10 
seconds, and if not all components are nicely stopped after that, it will force 
shut down the NiFi process. This is configured with 
"nifi.flowcontroller.graceful.shutdown.period" in nifi.properties file. If you 
have processors/CS that might take longer to stop gracefully (because of 
connections to external systems for instance), you could increase this value.

I'm not very familiar with AWS spot instances but I'd try to catch the spot 
notification event to stop the NiFi service on the host before the instance is 
stopped/killed.

Pierre



On Tue, Aug 27, 2019 at 20:05, Jean-Sebastien Vachon wrote:
Hi everybody,

I am working with AWS spot instances and one thing that is giving me a hard 
time is to perform a clean (and quick) shutdown of Nifi in order to prevent 
data loss.

AWS will give you about two minutes to clean up everything before the machine 
is actually shutdown.
Is there a way to stop/kill all processes running on the host without losing 
anything? It is fine if all the flowfiles being processed are simply requeued.

Would simply killing the processes achieve this? (I doubt so)... would it be 
better to fetch a list of running processors and terminate them using Nifi's 
API?

All ideas and thoughts are welcome

thanks


clean shutdown

2019-08-27 Thread Jean-Sebastien Vachon
Hi everybody,

I am working with AWS spot instances and one thing that is giving me a hard 
time is to perform a clean (and quick) shutdown of Nifi in order to prevent 
data loss.

AWS will give you about two minutes to clean up everything before the machine 
is actually shutdown.
Is there a way to stop/kill all processes running on the host without losing 
anything? It is fine if all the flowfiles being processed are simply requeued.

Would simply killing the processes achieve this? (I doubt so)... would it be 
better to fetch a list of running processors and terminate them using Nifi's 
API?

All ideas and thoughts are welcome

thanks


Re: Question on MergeContent "Max bin age"

2019-08-11 Thread Jean-Sebastien Vachon
Hi,

just found out that there is an attribute called "merge.count" giving the 
information I wanted.
Not sure why I didn't see this in the first place...
____
From: Jean-Sebastien Vachon 
Sent: Friday, August 9, 2019 1:23 PM
To: users@nifi.apache.org 
Subject: Re: Question on MergeContent "Max bin age"

I created https://issues.apache.org/jira/browse/NIFI-6537 for this.

Let me know if you need more information
____
From: Jean-Sebastien Vachon 
Sent: Friday, August 9, 2019 1:11 PM
To: users@nifi.apache.org 
Subject: Re: Question on MergeContent "Max bin age"

I like the idea of adding an attribute telling you what happened with your bin. 
People could then RouteOnAttribute and handle different scenarios according 
to their own situation.
However, I believe the reason should be logged somewhere as it could help 
people figure out what is happening to their flow.

From: Jeff 
Sent: Friday, August 9, 2019 11:43 AM
To: users@nifi.apache.org 
Subject: Re: Question on MergeContent "Max bin age"

I like the idea of seeing the details of the reason for eviction/merge in the 
details of a provenance event.  Those same details could be provided in an 
attribute as well.  If a log statement was also created, it should probably be 
at the DEBUG level.

On Fri, Aug 9, 2019 at 9:57 AM Mark Payne <marka...@hotmail.com> wrote:
I don’t believe this information is made available. It would certainly be a 
useful improvement to include the reason that the “bin” was evicted and merged 
- due to timeout, minimum threshold reached, maximum threshold reached, or due 
to running out of space for a new bin. Please do file a jira for that 
improvement.

What do you think is the most useful way to relay this information? Logs? 
Attribute on the merged flowfile? Details of the provenance event?

Thanks
-Mark

Sent from my iPhone

On Aug 9, 2019, at 8:20 AM, Jean-Sebastien Vachon 
<jsvac...@brizodata.com> wrote:

Hi all,

Is there a way to know if a MergeContent module has timed out because it 
reached the "Max bin age" setting?

Thanks


Re: Question on MergeContent "Max bin age"

2019-08-09 Thread Jean-Sebastien Vachon
I created https://issues.apache.org/jira/browse/NIFI-6537 for this.

Let me know if you need more information

From: Jean-Sebastien Vachon 
Sent: Friday, August 9, 2019 1:11 PM
To: users@nifi.apache.org 
Subject: Re: Question on MergeContent "Max bin age"

I like the idea of adding an attribute telling you what happened with your bin. 
People could then RouteOnAttribute and handle different scenarios according 
to their own situation.
However, I believe the reason should be logged somewhere as it could help 
people figure out what is happening to their flow.

From: Jeff 
Sent: Friday, August 9, 2019 11:43 AM
To: users@nifi.apache.org 
Subject: Re: Question on MergeContent "Max bin age"

I like the idea of seeing the details of the reason for eviction/merge in the 
details of a provenance event.  Those same details could be provided in an 
attribute as well.  If a log statement was also created, it should probably be 
at the DEBUG level.

On Fri, Aug 9, 2019 at 9:57 AM Mark Payne <marka...@hotmail.com> wrote:
I don’t believe this information is made available. It would certainly be a 
useful improvement to include the reason that the “bin” was evicted and merged 
- due to timeout, minimum threshold reached, maximum threshold reached, or due 
to running out of space for a new bin. Please do file a jira for that 
improvement.

What do you think is the most useful way to relay this information? Logs? 
Attribute on the merged flowfile? Details of the provenance event?

Thanks
-Mark

Sent from my iPhone

On Aug 9, 2019, at 8:20 AM, Jean-Sebastien Vachon 
<jsvac...@brizodata.com> wrote:

Hi all,

Is there a way to know if a MergeContent module has timed out because it 
reached the "Max bin age" setting?

Thanks


Re: Question on MergeContent "Max bin age"

2019-08-09 Thread Jean-Sebastien Vachon
I like the idea of adding an attribute telling you what happened with your bin. 
People could then RouteOnAttribute and handle different scenarios according 
to their own situation.
However, I believe the reason should be logged somewhere as it could help 
people figure out what is happening to their flow.

From: Jeff 
Sent: Friday, August 9, 2019 11:43 AM
To: users@nifi.apache.org 
Subject: Re: Question on MergeContent "Max bin age"

I like the idea of seeing the details of the reason for eviction/merge in the 
details of a provenance event.  Those same details could be provided in an 
attribute as well.  If a log statement was also created, it should probably be 
at the DEBUG level.

On Fri, Aug 9, 2019 at 9:57 AM Mark Payne <marka...@hotmail.com> wrote:
I don’t believe this information is made available. It would certainly be a 
useful improvement to include the reason that the “bin” was evicted and merged 
- due to timeout, minimum threshold reached, maximum threshold reached, or due 
to running out of space for a new bin. Please do file a jira for that 
improvement.

What do you think is the most useful way to relay this information? Logs? 
Attribute on the merged flowfile? Details of the provenance event?

Thanks
-Mark

Sent from my iPhone

On Aug 9, 2019, at 8:20 AM, Jean-Sebastien Vachon 
<jsvac...@brizodata.com> wrote:

Hi all,

Is there a way to know if a MergeContent module has timed out because it 
reached the "Max bin age" setting?

Thanks


Question on MergeContent "Max bin age"

2019-08-09 Thread Jean-Sebastien Vachon
Hi all,

Is there a way to know if a MergeContent module has timed out because it 
reached the "Max bin age" setting?

Thanks


Running Nifi on AWS spot instances

2019-08-06 Thread Jean-Sebastien Vachon
Hi all,

I've been running Nifi on AWS for quite some time but I only recently tried to 
move everything to spot instances to reduce costs.

I'd like to know what happens with data being processed whenever a node 
disappears from the cluster.
AWS will give me a 2-5 minutes warning but that's still pretty short as I have 
processors that can run much longer than that.

So what's happening to the data being processed on the node when it dies? What 
is the best strategy to recover and prevent data loss?

Thanks for any advice


Re: Best way to wait until everything is finished

2019-04-05 Thread Jean-Sebastien Vachon
I will give it a shot.

Thanks for the hint

From: Bryan Bende 
Sent: Friday, April 5, 2019 2:46 PM
To: users@nifi.apache.org
Subject: Re: Best way to wait until everything is finished

Since you are using SplitJson, it should be adding the standard
"fragment" attributes to each flow file.

You can then use MergeContent in Defragment mode, which uses those
attributes to merge all the fragments back together.
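For reference, the relevant settings would look roughly like this (a sketch; Defragment mode uses fragment.identifier to group the splits and fragment.count to know when a bin is complete, so no correlation attribute is needed):

MergeContent
  Merge Strategy: Defragment
  Max Bin Age: 10 min  (bins that never receive all of their fragments
                        are routed to failure once this age is reached)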

On Fri, Apr 5, 2019 at 2:39 PM Jean-Sebastien Vachon
 wrote:
>
> Hi again,
>
> one of the last issues I am facing with my flow right now is to make one of my 
> MergeContent processors wait until everything has been processed before 
> merging everything into a CSV.
>
> What would be the best way to implement this? is this through Wait/Notify? a 
> counter?
>
> My whole process starts from a queue that receives an array of JSON objects 
> containing all the documents to be processed. This goes through a SplitJson 
> processor to generate a flowfile for each JSON object received. Each 
> object/record is processed individually and once they are all processed, the 
> MergeContent processor will put everything back together to start the final 
> CSV generation but I need it to wait until all objects have been processed.
>
> Any recommendations to implement this behaviour?
>
> Thanks


Best way to wait until everything is finished

2019-04-05 Thread Jean-Sebastien Vachon
Hi again,

one of the last issues I am facing with my flow right now is to make one of my 
MergeContent processors wait until everything has been processed before merging 
everything into a CSV.

What would be the best way to implement this? is this through Wait/Notify? a 
counter?

My whole process starts from a queue that receives an array of JSON objects 
containing all the documents to be processed. This goes through a SplitJson 
processor to generate a flowfile for each JSON object received. Each object/record 
is processed individually and once they are all processed, the MergeContent 
processor will put everything back together to start the final CSV generation 
but I need it to wait until all objects have been processed.

Any recommendations to implement this behaviour?

Thanks


Re: insufficient content written

2019-04-05 Thread Jean-Sebastien Vachon
Never mind... I had an extra carriage-return in my new filename attribute 
definition. Removing it fixed the problem.


From: Joe Witt 
Sent: Friday, April 5, 2019 2:15 PM
To: users@nifi.apache.org
Subject: Re: insufficient content written

Hello

Can you share logs or screenshots or anymore details to illustrate what you're 
seeing?

Thanks

On Fri, Apr 5, 2019 at 2:14 PM Jean-Sebastien Vachon 
<jsvac...@brizodata.com> wrote:
Hi all,

I've added an UpdateAttribute processor to change the filename attribute and 
since then I am not able to either view or download the file even if NiFi 
reports its size correctly.

My first thought was a disk full but that is not the case.

the filename after the updateAttribute looks like this:

20190405-180910.269-xxx-1.0.csv

where xxx is the name of the customer which does not contain any illegal 
character.

Any idea?


insufficient content written

2019-04-05 Thread Jean-Sebastien Vachon
Hi all,

I've added an UpdateAttribute processor to change the filename attribute and 
since then I am not able to either view or download the file even if NiFi 
reports its size correctly.

My first thought was a disk full but that is not the case.

the filename after the updateAttribute looks like this:

20190405-180910.269-xxx-1.0.csv

where xxx is the name of the customer which does not contain any illegal 
character.

Any idea?


Re: Load balancing strategy/thoughts

2019-04-01 Thread Jean-Sebastien Vachon
Hi,

The few final steps/processes of my flow are to produce a final file containing 
the information that was computed in the various processors and save it into 
a single file to be delivered to a customer. I would rather send them one 
larger file instead of many small ones.

Thanks for your comments, I will configure the MergeContent processor's queue 
and see how it goes

From: Bryan Bende 
Sent: Monday, April 1, 2019 2:52 PM
To: users@nifi.apache.org
Subject: Re: Load balancing strategy/thoughts

Hello,

Is there a reason why it has to be brought back to the same original node?

As long as MergeContent is scheduled on all nodes, then you can choose
"Single node" strategy for the queue leading into MergeContent, and
one of the nodes will get all the pieces and can do the
merge/defragment.
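Concretely, that is just two settings (a sketch): on the connection feeding
MergeContent, set Load Balance Strategy to "Single node"; on the MergeContent
itself, set Merge Strategy to "Defragment" (or Bin-Packing with a Correlation
Attribute) so the pieces are reassembled on whichever node receives them.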

-Bryan

On Mon, Apr 1, 2019 at 2:29 PM Jean-Sebastien Vachon
 wrote:
>
> Hi all,
>
> over the last couple of days, I've been playing with the different load 
> balancing options.
> They all seem to  do what they are designed for but I have a small issue and 
> I am not sure how to deal with this...
>
> Let's say I have a processor A whose output is load balanced on all the nodes 
> within my cluster to a processor B. Once everything has been processed on 
> each node, I want to bring back everything to the same node to merge the 
> results together and perform additional processing.
>
> Should I use a MergeContent on the main/primary node? or configure the output 
> queue to use the "Single node" load balancing  strategy? The documentation 
> for the latest says that " the node the flows will be sent to is not defined".
>
> What is the recommended approach for this?
>
> Thanks


Load balancing strategy/thoughts

2019-04-01 Thread Jean-Sebastien Vachon
Hi all,

over the last couple of days, I've been playing with the different load 
balancing options.
They all seem to  do what they are designed for but I have a small issue and I 
am not sure how to deal with this...

Let's say I have a processor A whose output is load balanced on all the nodes 
within my cluster to a processor B. Once everything has been processed on each 
node, I want to bring back everything to the same node to merge the results 
together and perform additional processing.

Should I use a MergeContent on the main/primary node? Or configure the output 
queue to use the "Single node" load balancing strategy? The documentation for 
the latter says that "the node the flows will be sent to is not defined".

What is the recommended approach for this?

Thanks


Re: Problem with load balancing option

2019-03-25 Thread Jean-Sebastien Vachon
Hi,

I saw that bug report and I will upgrade to the latest version ASAP. But my 
main problem was the missing section needed to configure the load balancer 
correctly. Once I added the section and opened the required ports in my 
infrastructure, everything started to work as expected, and it is a life changer.


The load is now properly balanced between all nodes and the performance boost I 
got is outstanding.

One note, however: I've checked the migration guide from 1.8 to 1.9 and didn't 
see any mention of this new section within nifi.properties. It might be a good 
idea to add a section about this so that people upgrading their clusters have 
all the information at hand. This might save them some time.

Thanks all for your outstanding work

From: Koji Kawamura 
Sent: Sunday, March 24, 2019 10:39 PM
To: users@nifi.apache.org
Cc: Jean-Sebastien Vachon
Subject: Re: Problem with load balancing option

Hi,

That looks similar to this one:
Occasionally FlowFiles appear to get "stuck" in a Load-Balanced Connection
https://issues.apache.org/jira/browse/NIFI-5919

If you're using NiFi 1.8.0, I recommend trying the latest 1.9.1 which
has the fix for the above issue.

Hope this helps.

Koji

On Sat, Mar 23, 2019 at 12:15 AM Jean-Sebastien Vachon
 wrote:
>
> Hi,
>
> FYI, I managed to get my node back by removing the node from the cluster, 
> deleting the local flow and restarting Nifi.
>
> Hope this helps identify the issue
> ________
> From: Jean-Sebastien Vachon 
> Sent: Friday, March 22, 2019 10:56 AM
> To: users@nifi.apache.org
> Subject: Re: Problem with load balancing option
>
> Hi again,
>
> I thought everything was fine but one of my nodes cannot start...
>
> 2019-03-22 14:51:27,811 INFO [main] o.a.n.wali.SequentialAccessWriteAheadLog 
> Successfully recovered 10396 records in 367 milliseconds. Now checkpointing 
> to ensure that Write-Ahead Log is in a consistent state
> 2019-03-22 14:51:28,046 INFO [main] o.a.n.wali.SequentialAccessWriteAheadLog 
> Checkpointed Write-Ahead Log with 10396 Records and 0 Swap Files in 235 
> milliseconds (Stop-the-world time = 6 milliseconds), max Transaction ID 24370
> 2019-03-22 14:51:28,065 ERROR [main] o.a.nifi.controller.StandardFlowService 
> Failed to load flow from cluster due to: 
> org.apache.nifi.cluster.ConnectionExcepti
> on: Failed to connect node to cluster due to: 
> java.lang.ArrayIndexOutOfBoundsException: -1
> org.apache.nifi.cluster.ConnectionException: Failed to connect node to 
> cluster due to: java.lang.ArrayIndexOutOfBoundsException: -1
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:1009)
> at 
> org.apache.nifi.controller.StandardFlowService.load(StandardFlowService.java:539)
> at org.apache.nifi.web.server.JettyServer.start(JettyServer.java:939)
> at org.apache.nifi.NiFi.(NiFi.java:157)
> at org.apache.nifi.NiFi.(NiFi.java:71)
> at org.apache.nifi.NiFi.main(NiFi.java:296)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
> at 
> org.apache.nifi.controller.queue.clustered.partition.CorrelationAttributePartitioner.getPartition(CorrelationAttributePartitioner.java:44)
> at 
> org.apache.nifi.controller.queue.clustered.SocketLoadBalancedFlowFileQueue.getPartition(SocketLoadBalancedFlowFileQueue.java:611)
> at 
> org.apache.nifi.controller.queue.clustered.SocketLoadBalancedFlowFileQueue.putAndGetPartition(SocketLoadBalancedFlowFileQueue.java:749)
> at 
> org.apache.nifi.controller.queue.clustered.SocketLoadBalancedFlowFileQueue.put(SocketLoadBalancedFlowFileQueue.java:739)
> at 
> org.apache.nifi.controller.repository.WriteAheadFlowFileRepository.loadFlowFiles(WriteAheadFlowFileRepository.java:587)
> at 
> org.apache.nifi.controller.FlowController.initializeFlow(FlowController.java:818)
> at 
> org.apache.nifi.controller.StandardFlowService.initializeController(StandardFlowService.java:1019)
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:991)
> ... 5 common frames omitted
>
> Any idea?
> 
> From: Jean-Sebastien Vachon
> Sent: Friday, March 22, 2019 10:34 AM
> To: Jean-Sebastien Vachon; users@nifi.apache.org
> Subject: Re: Problem with load balancing option
>
> Hi,
>
> I stopped each node one by one and the queue is now empty. Not sure if this 
> is a bug or intended but it does look strange from a user's point of view.
>
> Thanks
> 
> From: Jean-Sebastien Vachon 
> Sent: Friday, March 22, 2019 10:28 AM
> To: users@nifi.apache.org
> Subject: Problem with load balancing op

Re: No load on second node

2019-03-22 Thread Jean-Sebastien Vachon
Ah that's what is missing from my configuration. Like I said, I've upgraded 
from 1.7 and didn't check for new configuration options.

Thanks, that will make it a lot easier to fix

From: Bryan Bende 
Sent: Friday, March 22, 2019 1:38 PM
To: users@nifi.apache.org
Subject: Re: No load on second node

Hello,

The host and port used for load balancing are defined in
nifi.properties of each node with the following properties:

# cluster load balancing properties #
nifi.cluster.load.balance.host=
nifi.cluster.load.balance.port=6342
nifi.cluster.load.balance.connections.per.node=4
nifi.cluster.load.balance.max.thread.count=8
nifi.cluster.load.balance.comms.timeout=30 sec
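(If the nodes sit behind security groups or a firewall, the load balance port,
6342 by default, also has to be open between all nodes, and
nifi.cluster.load.balance.host should be set to an address the other nodes can
reach.)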

Thanks,

Bryan

On Fri, Mar 22, 2019 at 1:37 PM Jean-Sebastien Vachon
 wrote:
>
> I am living in a very strict environment...  is a new port being used for 
> load balancing?
>
> The logs show 8088 (which is my Jetty port) which is open and available for 
> the node not receiving anything.
>
> I've read these two articles:
> https://blogs.apache.org/nifi/entry/load-balancing-across-the-cluster
>
> and
>
> https://pierrevillard.com/2018/10/29/nifi-1-8-revolutionizing-the-list-fetch-pattern-and-more/
>
> and I don't see any mention in this regard...
>
> My last hope will be to upgrade to the latest version (1.9.1)
>
> Thanks
>
>
> 
> From: Jean-Sebastien Vachon 
> Sent: Friday, March 22, 2019 11:41 AM
> To: users@nifi.apache.org
> Subject: No load on second node
>
> Hi all,
>
> What's the best strategy to get load balancing working properly? I've 
> configured one of the very first connections of my flow to use one of the load 
> balancing options so that flows are processed on both machines.
>
> However, one of my two nodes is not doing anything. The load on the first 
> server is way higher than the load on the second (60 vs 1).
>
> I tailed the logs on both servers and there is not much information except 
> for the following:
>
> 2019-03-22 15:40:17,579 ERROR [Load-Balanced Client Thread-7] 
> o.a.n.c.q.c.c.a.n.NioAsyncLoadBalanceClient Unable to connect to 
> 10.0.2.132:8088 for load balancing
> java.net.ConnectException: Connection timed out
>
> I used telnet to connect and everything seems fine. I recently upgraded from 
> Nifi 1.7.. could it be that I'm missing some configuration ?
>
> Thanks


Re: No load on second node

2019-03-22 Thread Jean-Sebastien Vachon
I am living in a very strict environment...  is a new port being used for load 
balancing?

The logs show 8088 (which is my Jetty port) which is open and available for the 
node not receiving anything.

I've read these two articles:
https://blogs.apache.org/nifi/entry/load-balancing-across-the-cluster

and

https://pierrevillard.com/2018/10/29/nifi-1-8-revolutionizing-the-list-fetch-pattern-and-more/

and I don't see any mention in this regard...

My last hope will be to upgrade to the latest version (1.9.1)

Thanks



From: Jean-Sebastien Vachon 
Sent: Friday, March 22, 2019 11:41 AM
To: users@nifi.apache.org
Subject: No load on second node

Hi all,

What's the best strategy to get load balancing working properly? I've configured 
one of the very first connections of my flow to use one of the load balancing 
options so that flows are processed on both machines.

However, one of my two nodes is not doing anything. The load on the first 
server is way higher than the load on the second (60 vs 1).

I tailed the logs on both servers and there is not much information except for 
the following:

2019-03-22 15:40:17,579 ERROR [Load-Balanced Client Thread-7] 
o.a.n.c.q.c.c.a.n.NioAsyncLoadBalanceClient Unable to connect to 
10.0.2.132:8088 for load balancing
java.net.ConnectException: Connection timed out

I used telnet to connect and everything seems fine. I recently upgraded from 
Nifi 1.7.. could it be that I'm missing some configuration ?

Thanks


No load on second node

2019-03-22 Thread Jean-Sebastien Vachon
Hi all,

What's the best strategy to get load balancing working properly? I've configured 
one of the very first connections of my flow to use one of the load balancing 
options so that flows are processed on both machines.

However, one of my two nodes is not doing anything. The load on the first 
server is way higher than the load on the second (60 vs 1).

I tailed the logs on both servers and there is not much information except for 
the following:

2019-03-22 15:40:17,579 ERROR [Load-Balanced Client Thread-7] 
o.a.n.c.q.c.c.a.n.NioAsyncLoadBalanceClient Unable to connect to 
10.0.2.132:8088 for load balancing
java.net.ConnectException: Connection timed out

I used telnet to connect and everything seems fine. I recently upgraded from 
Nifi 1.7.. could it be that I'm missing some configuration ?

Thanks


Re: Problem with load balancing option

2019-03-22 Thread Jean-Sebastien Vachon
Hi,

FYI, I managed to get my node back by removing the node from the cluster, 
deleting the local flow and restarting Nifi.

Hope this helps identify the issue

From: Jean-Sebastien Vachon 
Sent: Friday, March 22, 2019 10:56 AM
To: users@nifi.apache.org
Subject: Re: Problem with load balancing option

Hi again,

I thought everything was fine but one of my nodes cannot start...

2019-03-22 14:51:27,811 INFO [main] o.a.n.wali.SequentialAccessWriteAheadLog 
Successfully recovered 10396 records in 367 milliseconds. Now checkpointing to 
ensure that Write-Ahead Log is in a consistent state
2019-03-22 14:51:28,046 INFO [main] o.a.n.wali.SequentialAccessWriteAheadLog 
Checkpointed Write-Ahead Log with 10396 Records and 0 Swap Files in 235 
milliseconds (Stop-the-world time = 6 milliseconds), max Transaction ID 24370
2019-03-22 14:51:28,065 ERROR [main] o.a.nifi.controller.StandardFlowService 
Failed to load flow from cluster due to: 
org.apache.nifi.cluster.ConnectionExcepti
on: Failed to connect node to cluster due to: 
java.lang.ArrayIndexOutOfBoundsException: -1
org.apache.nifi.cluster.ConnectionException: Failed to connect node to cluster 
due to: java.lang.ArrayIndexOutOfBoundsException: -1
at 
org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:1009)
at 
org.apache.nifi.controller.StandardFlowService.load(StandardFlowService.java:539)
at org.apache.nifi.web.server.JettyServer.start(JettyServer.java:939)
at org.apache.nifi.NiFi.(NiFi.java:157)
at org.apache.nifi.NiFi.(NiFi.java:71)
at org.apache.nifi.NiFi.main(NiFi.java:296)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at 
org.apache.nifi.controller.queue.clustered.partition.CorrelationAttributePartitioner.getPartition(CorrelationAttributePartitioner.java:44)
at 
org.apache.nifi.controller.queue.clustered.SocketLoadBalancedFlowFileQueue.getPartition(SocketLoadBalancedFlowFileQueue.java:611)
at 
org.apache.nifi.controller.queue.clustered.SocketLoadBalancedFlowFileQueue.putAndGetPartition(SocketLoadBalancedFlowFileQueue.java:749)
at 
org.apache.nifi.controller.queue.clustered.SocketLoadBalancedFlowFileQueue.put(SocketLoadBalancedFlowFileQueue.java:739)
at 
org.apache.nifi.controller.repository.WriteAheadFlowFileRepository.loadFlowFiles(WriteAheadFlowFileRepository.java:587)
at 
org.apache.nifi.controller.FlowController.initializeFlow(FlowController.java:818)
at 
org.apache.nifi.controller.StandardFlowService.initializeController(StandardFlowService.java:1019)
at 
org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:991)
... 5 common frames omitted

Any idea?

From: Jean-Sebastien Vachon
Sent: Friday, March 22, 2019 10:34 AM
To: Jean-Sebastien Vachon; users@nifi.apache.org
Subject: Re: Problem with load balancing option

Hi,

I stopped each node one by one and the queue is now empty. Not sure if this is 
a bug or intended but it does look strange from a user's point of view.

Thanks

From: Jean-Sebastien Vachon 
Sent: Friday, March 22, 2019 10:28 AM
To: users@nifi.apache.org
Subject: Problem with load balancing option

Hi all,

I've configured one of my connections to use the "partition by attribute" load 
balancing option.
It was not working as expected, and after a few tests I realized I was missing 
some dependencies on the cluster nodes, so I stopped everything (not related to 
the load balancing or Nifi at all).

Now, I stopped everything before fixing my dependency issues, and the UI 
shows 1906 items in the queue for that connection but I can't list them or 
empty the queue.
Nifi tells me that there are no flow files in the queue when I try to list them 
and that 0 flowfiles out of 1906 were removed from the queue.

I tried connecting the destination to some other process like a LogMessage 
processor but nothing is happening. The 1906 items are stuck and I cannot 
delete the connection because it's not empty.

Any recommendations to fix this?

thanks



Re: Problem with load balancing option

2019-03-22 Thread Jean-Sebastien Vachon
Hi again,

I thought everything was fine but one of my nodes cannot start...

2019-03-22 14:51:27,811 INFO [main] o.a.n.wali.SequentialAccessWriteAheadLog 
Successfully recovered 10396 records in 367 milliseconds. Now checkpointing to 
ensure that Write-Ahead Log is in a consistent state
2019-03-22 14:51:28,046 INFO [main] o.a.n.wali.SequentialAccessWriteAheadLog 
Checkpointed Write-Ahead Log with 10396 Records and 0 Swap Files in 235 
milliseconds (Stop-the-world time = 6 milliseconds), max Transaction ID 24370
2019-03-22 14:51:28,065 ERROR [main] o.a.nifi.controller.StandardFlowService 
Failed to load flow from cluster due to: 
org.apache.nifi.cluster.ConnectionExcepti
on: Failed to connect node to cluster due to: 
java.lang.ArrayIndexOutOfBoundsException: -1
org.apache.nifi.cluster.ConnectionException: Failed to connect node to cluster 
due to: java.lang.ArrayIndexOutOfBoundsException: -1
at 
org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:1009)
at 
org.apache.nifi.controller.StandardFlowService.load(StandardFlowService.java:539)
at org.apache.nifi.web.server.JettyServer.start(JettyServer.java:939)
at org.apache.nifi.NiFi.(NiFi.java:157)
at org.apache.nifi.NiFi.(NiFi.java:71)
at org.apache.nifi.NiFi.main(NiFi.java:296)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at 
org.apache.nifi.controller.queue.clustered.partition.CorrelationAttributePartitioner.getPartition(CorrelationAttributePartitioner.java:44)
at 
org.apache.nifi.controller.queue.clustered.SocketLoadBalancedFlowFileQueue.getPartition(SocketLoadBalancedFlowFileQueue.java:611)
at 
org.apache.nifi.controller.queue.clustered.SocketLoadBalancedFlowFileQueue.putAndGetPartition(SocketLoadBalancedFlowFileQueue.java:749)
at 
org.apache.nifi.controller.queue.clustered.SocketLoadBalancedFlowFileQueue.put(SocketLoadBalancedFlowFileQueue.java:739)
at 
org.apache.nifi.controller.repository.WriteAheadFlowFileRepository.loadFlowFiles(WriteAheadFlowFileRepository.java:587)
at 
org.apache.nifi.controller.FlowController.initializeFlow(FlowController.java:818)
at 
org.apache.nifi.controller.StandardFlowService.initializeController(StandardFlowService.java:1019)
at 
org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:991)
... 5 common frames omitted

Any idea?

From: Jean-Sebastien Vachon
Sent: Friday, March 22, 2019 10:34 AM
To: Jean-Sebastien Vachon; users@nifi.apache.org
Subject: Re: Problem with load balancing option

Hi,

I stopped each node one by one and the queue is now empty. Not sure if this is 
a bug or intended but it does look strange from a user's point of view.

Thanks

From: Jean-Sebastien Vachon 
Sent: Friday, March 22, 2019 10:28 AM
To: users@nifi.apache.org
Subject: Problem with load balancing option

Hi all,

I've configured one of my connections to use the "partition by attribute" load 
balancing option.
It was not working as expected, and after a few tests I realized I was missing 
some dependencies on the cluster nodes, so I stopped everything (not related to 
the load balancing or Nifi at all).

Now, I stopped everything before fixing my dependency issues, and the UI 
shows 1906 items in the queue for that connection but I can't list them or 
empty the queue.
Nifi tells me that there are no flow files in the queue when I try to list them 
and that 0 flowfiles out of 1906 were removed from the queue.

I tried connecting the destination to some other process like a LogMessage 
processor but nothing is happening. The 1906 items are stuck and I cannot 
delete the connection because it's not empty.

Any recommendations to fix this?

thanks



Re: Problem with load balancing option

2019-03-22 Thread Jean-Sebastien Vachon
Hi,

I stopped each node one by one and the queue is now empty. Not sure if this is 
a bug or intended but it does look strange from a user's point of view.

Thanks

From: Jean-Sebastien Vachon 
Sent: Friday, March 22, 2019 10:28 AM
To: users@nifi.apache.org
Subject: Problem with load balancing option

Hi all,

I've configured one of my connections to use the "partition by attribute" load 
balancing option.
It was not working as expected, and after a few tests I realized I was missing 
some dependencies on the cluster nodes, so I stopped everything (not related to 
the load balancing or Nifi at all).

Now, I stopped everything before fixing my dependency issues, and the UI 
shows 1906 items in the queue for that connection but I can't list them or 
empty the queue.
Nifi tells me that there are no flow files in the queue when I try to list them 
and that 0 flowfiles out of 1906 were removed from the queue.

I tried connecting the destination to some other process like a LogMessage 
processor but nothing is happening. The 1906 items are stuck and I cannot 
delete the connection because it's not empty.

Any recommendations to fix this?

thanks



Problem with load balancing option

2019-03-22 Thread Jean-Sebastien Vachon
Hi all,

I've configured one of my connections to use the "partition by attribute" load 
balancing option.
It was not working as expected, and after a few tests I realized I was missing 
some dependencies on the cluster nodes, so I stopped everything (not related to 
the load balancing or Nifi at all).

Now, I stopped everything before fixing my dependency issues, and the UI 
shows 1906 items in the queue for that connection but I can't list them or 
empty the queue.
Nifi tells me that there are no flow files in the queue when I try to list them 
and that 0 flowfiles out of 1906 were removed from the queue.

I tried connecting the destination to some other process like a LogMessage 
processor but nothing is happening. The 1906 items are stuck and I cannot 
delete the connection because it's not empty.

Any recommendations to fix this?

thanks



Re: Problem with inferred schema

2019-03-11 Thread Jean-Sebastien Vachon
Oh sorry.. the \r\n is due to bad copy/paste and it's not part of my schema. I 
will have a look at 1.9

Thanks

From: Matt Burgess 
Sent: Monday, March 11, 2019 2:09 PM
To: users@nifi.apache.org
Subject: Re: Problem with inferred schema

It might be because you have inferred an array whose items are null
(due to the fact that the incoming data had empty arrays), also I'm
not sure if this was a result of your shortening up the schema, but
you have a field called "filterstatus" whose type shows up (in the
email anyway) as "strin\r\ng". Is that carriage return character in
your actual schema? If so, that's not a valid type name which seems
consistent with the Avro exception. If not, please feel free to
provide the whole schema (either in the email or in a Gist or
whatever) and I'll try to reproduce.

Also in NiFi 1.9.0 the RecordReaders have schema inference capability,
so you may have more luck with those than the InferAvroSchema
processor.

Regards,
Matt

On Mon, Mar 11, 2019 at 2:03 PM Jean-Sebastien Vachon
 wrote:
>
> Hi all,
>
> I am trying to use a MergeRecord processor to convert JSON data into its Avro 
> equivalent and I am getting the following exception:
>
> Caused by: org.apache.nifi.schema.access.SchemaNotFoundException: Failed to 
> create schema from the Schema Text after evaluating FlowFile Attributes
> at 
> org.apache.nifi.schema.access.AvroSchemaTextStrategy.getSchema(AvroSchemaTextStrategy.java:56)
> at 
> org.apache.nifi.serialization.SchemaRegistryService.getSchema(SchemaRegistryService.java:125)
> at 
> org.apache.nifi.json.JsonTreeReader.createRecordReader(JsonTreeReader.java:73)
> at 
> org.apache.nifi.serialization.RecordReaderFactory.createRecordReader(RecordReaderFactory.java:46)
> at sun.reflect.GeneratedMethodAccessor516.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.invoke(StandardControllerServiceInvocationHandler.java:84)
> at com.sun.proxy.$Proxy89.createRecordReader(Unknown Source)
> at 
> org.apache.nifi.processors.standard.MergeRecord.binFlowFile(MergeRecord.java:365)
> ... 11 common frames omitted
> Caused by: org.apache.avro.AvroRuntimeException: Not a named type:
>
> I am using an InferAvroSchema to compute the schema from the data as I want my 
> process to be as generic as possible.
> The data is then streamed into the MergeRecord processor which is configured 
> to use a JsonTreeReader with the "Use schema Text" strategy and the "Schema 
> text" set to "${inferred.avro.schema}".
> My schema (below) is a bit long so I tried to shorten it a little to make it 
> easier to read.
>
> Any idea what could cause the exception? all fields seem to be named 
> correctly but I am suspecting that the inner array "urls" might be causing 
> this.
>
> Thanks
>
> {
>   "type": "array",
>   "items": {
> "type": "record",
> "name": "toto",
> "fields": [
>   {
> "name": "body",
> "type": "string",
> "doc": "Type inferred from '\"some html\\nthis is some text\"'"
>   },
>   {
> "name": "takeout",
> "type": "boolean",
> "doc": "Type inferred from 'false'"
>   },
>   {
> "name": "redirected\r\nTo",
> "type": "string",
> "doc": "Type inferred from '\"\"'"
>   },
>   {
> "name": "delivery",
> "type": "boolean",
> "doc": "Type inferred from 'false'"
>   },
>   {
> "name": "pageHistoryharvestTS",
> "type": "string",
> "doc": "Type inferred from '\"2019/01/16 15:19:48\"'"
>   },
>   {
> "name": "pageHistoryorganizationName",
> "type": "string",
> "doc": "Type inferred from '\"xxx\"'"
>   },
>   {
> "name": "pageHistoryhtmlDigest",
> "type": "string",
> "doc": "Type inferred from '\"76439be59e8ee584099f102a3a7f4f

Problem with inferred schema

2019-03-11 Thread Jean-Sebastien Vachon
Hi all,

I am trying to use a MergeRecord processor to convert JSON data into its Avro 
equivalent and I am getting the following exception:

Caused by: org.apache.nifi.schema.access.SchemaNotFoundException: Failed to 
create schema from the Schema Text after evaluating FlowFile Attributes
at 
org.apache.nifi.schema.access.AvroSchemaTextStrategy.getSchema(AvroSchemaTextStrategy.java:56)
at 
org.apache.nifi.serialization.SchemaRegistryService.getSchema(SchemaRegistryService.java:125)
at 
org.apache.nifi.json.JsonTreeReader.createRecordReader(JsonTreeReader.java:73)
at 
org.apache.nifi.serialization.RecordReaderFactory.createRecordReader(RecordReaderFactory.java:46)
at sun.reflect.GeneratedMethodAccessor516.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.invoke(StandardControllerServiceInvocationHandler.java:84)
at com.sun.proxy.$Proxy89.createRecordReader(Unknown Source)
at 
org.apache.nifi.processors.standard.MergeRecord.binFlowFile(MergeRecord.java:365)
... 11 common frames omitted
Caused by: org.apache.avro.AvroRuntimeException: Not a named type:

I am using an InferAvroSchema to compute the schema from the data as I want my 
process to be as generic as possible.
The data is then streamed into the MergeRecord processor which is configured to 
use a JsonTreeReader with the "Use schema Text" strategy and the "Schema text" 
set to "${inferred.avro.schema}".
My schema (below) is a bit long so I tried to shorten it a little to make it 
easier to read.

Any idea what could cause the exception? all fields seem to be named correctly 
but I am suspecting that the inner array "urls" might be causing this.

Thanks

{
  "type": "array",
  "items": {
"type": "record",
"name": "toto",
"fields": [
  {
"name": "body",
"type": "string",
"doc": "Type inferred from '\"some html\\nthis is some text\"'"
  },
  {
"name": "takeout",
"type": "boolean",
"doc": "Type inferred from 'false'"
  },
  {
"name": "redirected\r\nTo",
"type": "string",
"doc": "Type inferred from '\"\"'"
  },
  {
"name": "delivery",
"type": "boolean",
"doc": "Type inferred from 'false'"
  },
  {
"name": "pageHistoryharvestTS",
"type": "string",
"doc": "Type inferred from '\"2019/01/16 15:19:48\"'"
  },
  {
"name": "pageHistoryorganizationName",
"type": "string",
"doc": "Type inferred from '\"xxx\"'"
  },
  {
"name": "pageHistoryhtmlDigest",
"type": "string",
"doc": "Type inferred from '\"76439be59e8ee584099f102a3a7f4f79\"'"
  },
  {
"name": "pageHistoryorganizationId",
"type": "int",
"doc": "Type inferred from '12'"
  },
  {
"name": "pageHistorywebsite_id",
"type": "int",
"doc": "Type inferred from '12'"
  },
  {
"name": "pageHistoryqueryTime",
"type": "long",
"doc": "Type inferred from '9450795888900757'"
  },
  {
"name": "pageHistoryhttpCode",
"type": "int",
"doc": "Type inferred from '200'"
  },
  {
"name": "pageHistorypriority",
"type": "int",
"doc": "Type inferred from '1000'"
  },
  {
"name": "pageHistoryrunId",
"type": "int",
"doc": "Type inferred from '2'"
  },
  {
"name": "pageHistoryid",
"type\r\n": "int",
"doc": "Type inferred from '116'"
  },
  {
"name": "pageHistoryurlId",
"type": "int",
"doc": "Type inferred from '12'"
  },
  {
"name": "pageHistoryurlLevel",
"type": "int",
"doc": "Type inferred from '0'"
  },
  {
"name": "pageHistoryspiderStatus",
"type": "string",
"doc": "Type inferred from '\"done\"'"
  },
  {
"name": "pageHistoryurlAddress",
"type": "string",
"doc": "Type inferred from '\"http://somesite.com\;'"
  },
  {
"name": "urlCount",
"type": "int",
"doc": "Type inferred from '1'"
  },
  {
"name": "html",
"type": "string",
"doc": "Type inferred from '\"some htmlthis is some 
text\"'"
  },
  {
"name": "urls",
"type": {
  "type": "array",
  "items": {
"type": "record",
"name": "urls",
"fields": [
  {
"name": "url",
"type": "string",
"doc": "Type inferred from 
'\"http://somesite.com/thismenudoestnotexistpdf\;'"
  },
  {
"name": "text",
   

Re: PutElasticsearchHttp can not use Flowfile attribute for ES_URL

2019-02-05 Thread Jean-Sebastien Vachon
Thanks all for the feedback.

From: Joe Percivall 
Sent: Monday, February 4, 2019 9:59 PM
To: users@nifi.apache.org
Subject: Re: PutElasticsearchHttp can not use Flowfile attribute for ES_URL

I believe also one of the reasons this was done is because PutElasticsearchHttp 
takes in batches of FlowFiles and does a bulk insert. In order to support 
FlowFile attribute expression on the URL, we would have to either only act on 
one FlowFile at a time or determine another mechanism for handling that 
ambiguity.

PutElasticsearchHttpRecord on the other hand only takes in a single FlowFile 
with each onTrigger and could be more easily updated to support that use-case.

Cheers,
Joe

On Mon, Feb 4, 2019 at 5:05 PM Matt Burgess <mattyb...@apache.org> wrote:
The restriction to using the variable registry only has always been
there AFAIK, but as of 1.6 we made the distinction in documentation on
how expression language would be evaluated for each property. The
choice was so that we weren't constantly recreating connections for
each flow file, in fact all concurrent tasks share the same underlying
OkHttpClient.

We could probably do something fancier where we allow flowfile
attributes to be evaluated as well, but have a modestly-sized
least-recently-used (LRU) cache of clients, keeping them open until
they are evicted (and closing them all when stopped). Please feel free
to file an improvement Jira and we can discuss further there.

Regards,
Matt

On Mon, Feb 4, 2019 at 4:16 PM Jean-Sebastien Vachon 
<jsvac...@brizodata.com> wrote:
>
> Hi all,
>
> I was just finishing modifying my flow to make it more reusable by having my 
> source document contain information about where to store the final 
> document (some Elasticsearch index)
> Everything was fine until I found out that the PutElasticsearchHttp's 
> documentation was saying this...
>
> Supports Expression Language: true (will be evaluated using variable registry 
> only)
>
>
> It looks like this restriction appeared around Nifi 1.6 (as per the 
> documentation)... is there a reason for such a limitation?
>
> My current flow was extracting the information from the input JSON document 
> and saving the information inside a Flow attribute.
>
> What can I do about this?  I don't like monkey patching.. is there any other 
> way to get around this?
>
> Thanks


--
Joe Percivall
linkedin.com/in/Percivall
e: jperciv...@apache.com


Re: PutElasticsearchHttp can not use Flowfile attribute for ES_URL

2019-02-04 Thread Jean-Sebastien Vachon
Hi Luis,

thanks for the hint... this is indeed a good workaround. So simple that I 
should have thought about it.

Regards

From: Luis Carmona 
Sent: Monday, February 4, 2019 4:23 PM
To: users
Subject: Re: PutElasticsearchHttp can not use Flowfile attribute for ES_URL

Hi Jean,

I'm not even close to being an expert on NIFI, but I did manage to get a 
scenario similar to the one you describe working.

I was able to read some data, process it and store the JSON payload in ES. I 
used InvokeHTTP instead of the ES processors.

Hope it helps you.

Regards,

LC
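For anyone wanting the same workaround, an InvokeHTTP configuration along these lines should do it (a sketch; es.url and es.index are made-up attribute names set earlier in the flow, and the /_doc endpoint assumes a recent Elasticsearch, as older versions take a type name in the path):

InvokeHTTP
  HTTP Method: POST
  Remote URL: ${es.url}/${es.index}/_doc
  Content-Type: application/json

Unlike PutElasticsearchHttp, the Remote URL of InvokeHTTP is evaluated against each flowfile's attributes, which is what makes per-document routing possible.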



De: "Jean-Sebastien Vachon" 
Para: "users" 
Enviados: Lunes, 4 de Febrero 2019 18:16:23
Asunto: PutElasticsearchHttp can not use Flowfile attribute for ES_URL

Hi all,

I was just finishing modifying my flow to make it more reusable by having my 
source document contain information about where to store the final document 
(some Elasticsearch index)
Everything was fine until I found out that the PutElasticsearchHttp's 
documentation was saying this...

Supports Expression Language: true (will be evaluated using variable registry 
only)

It looks like this restriction appeared around Nifi 1.6 (as per the 
documentation)... is there a reason for such a limitation?

My current flow was extracting the information from the input JSON document and 
saving the information inside a Flow attribute.

What can I do about this?  I don't like monkey patching.. is there any other 
way to get around this?

Thanks



PutElasticsearchHttp can not use Flowfile attribute for ES_URL

2019-02-04 Thread Jean-Sebastien Vachon
Hi all,

I was just finishing modifying my flow to make it more reusable by having my 
source document contain information about where to store the final document 
(some Elasticsearch index)
Everything was fine until I found out that the PutElasticsearchHttp's 
documentation was saying this...

Supports Expression Language: true (will be evaluated using variable registry 
only)

It looks like this restriction appeared around Nifi 1.6 (as per the 
documentation)... is there a reason for such a limitation?

My current flow was extracting the information from the input JSON document and 
saving the information inside a Flow attribute.

What can I do about this?  I don't like monkey patching.. is there any other 
way to get around this?

Thanks


ExecuteStreamCommand vs variables

2018-12-21 Thread Jean-Sebastien Vachon
Hi all,


I have a few Python modules based on ExecuteStreamCommand and I need them to 
access process group variables.

How should I access them? Is it through the environment? That's what I used to 
do in Nifi 1.7, but it does not seem to work under Nifi 1.8 as I am getting 
KeyErrors.


Thanks
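One workaround that sidesteps the environment entirely (a sketch, with my_var as a made-up variable name): the Command Arguments property of ExecuteStreamCommand supports expression language, so the variable can be handed to the script on its command line.

Command Path: python
Command Arguments: /path/to/script.py;${my_var}
Argument Delimiter: ;

and in the script:

import sys

my_var = sys.argv[1]  # value of the process group variable, passed as an argument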


Re: NIfi cluster on Docker

2018-12-18 Thread Jean-Sebastien Vachon
Thanks for the advice.

What about Zookeeper? are you running it within ECS as well? or are you 
managing it outside of ECS?

Are there other alternatives?


Thanks


From: Austin Heyne 
Sent: Tuesday, December 18, 2018 2:17:24 PM
To: users@nifi.apache.org
Subject: Re: NIfi cluster on Docker

We're using Nifi in ECS and the only gotchas that come to mind are
making sure you have a big enough EBS drive on the instance to handle
all your flow files. You might get better performance with local storage
but EBS has been good enough for us and we push a lot of data through
it. You might also need to up your ulimits in the task definition,
and finally I would look into Nifi Registry since getting the
flow.xml.gz out is a pain when you need to save it after making changes.

-Austin


On 12/18/2018 10:33 AM, Joe Witt wrote:
> Hello
>
> Not aware of folks working with AWS/Fargate but I am aware of a slew
> of NiFi on K8S work including NiFi on EKS...so I know that works well.
>
> Curious to hear your progress.
>
> Thanks
> Joe
> On Tue, Dec 18, 2018 at 7:57 AM Jean-Sebastien Vachon
>  wrote:
>> Hi all,
>>
>>
>> Has anyone tried running a Nifi cluster on AWS ECS/Fargate?
>>
>> Just curious as I will probably give it a try shortly and it would be nice 
>> to have some advice before getting into this.
>>
>>
>> Thanks

--
Austin L. Heyne



NIfi cluster on Docker

2018-12-18 Thread Jean-Sebastien Vachon
Hi all,


Has anyone tried running a Nifi cluster on AWS ECS/Fargate?

Just curious as I will probably give it a try shortly and it would be nice to 
have some advice before getting into this.


Thanks


exporting a template vs Variables vs other settings

2018-09-13 Thread Jean-Sebastien Vachon
Hi,

When I export/create a template based on my current flow, is there a way to 
extract the current variables/settings at the same time?

Is there a way of setting these settings (max threads)  and variables through 
the nifi.properties (or other configuration file)?

I'm looking to fully automate the creation of a cluster through Terraform and 
that's about the only missing piece.

Thanks


RE: Running Nifi in cluster mode

2018-09-12 Thread Jean-Sebastien Vachon
Thanks Bryan,

This really helps.

-Original Message-
From: Bryan Bende  
Sent: September 12, 2018 10:23 AM
To: users@nifi.apache.org
Subject: Re: Running Nifi in cluster mode

Hello,

There is a graph of processors often referred to as the "flow" and each node in 
the cluster runs a copy of the flow. So all nodes are running the same 
components, with the exception of source processors that happen to be scheduled 
as primary node only.

The data must be divided across the cluster to make use of the cluster 
appropriately, and this depends on the source of your data [1].

Since all nodes are generally doing the same thing, it probably makes the most 
sense for them to similar in terms of hardware. They don't have to be, but NiFi 
is not making any decisions based on the hardware.

-Bryan

[1] 
https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html


On Wed, Sep 12, 2018 at 9:58 AM, Jean-Sebastien Vachon  
wrote:
> Hi all,
>
>
>
> Can someone tell me how Nifi manages/dispatches jobs to nodes in a cluster?
> Right now, I have a cluster of only three identical machines running 
> on AWS but I would like to be able to extend my cluster by adding spot 
> instances of different types and capacity. Will Nifi be aware that 
> some machines do not have the same capacity ? or should I try to keep 
> the capacity (CPU, RAM) the same across the cluster?
>
>
>
> Also, is Nifi looking at some metrics to determine where a given 
> processor should be executed? Does it have any load balancing 
> algorithm to spread the load as evenly as possible?
>
>
>
> Thanks


Running Nifi in cluster mode

2018-09-12 Thread Jean-Sebastien Vachon
Hi all,

Can someone tell me how Nifi manages/dispatches jobs to nodes in a cluster? 
Right now, I have a cluster of only three identical machines running on AWS but 
I would like to be able to extend my cluster by adding spot instances of 
different types and capacity. Will Nifi be aware that some machines do not have 
the same capacity ? or should I try to keep the capacity (CPU, RAM) the same 
across the cluster?

Also, is Nifi looking at some metrics to determine where a given processor 
should be executed? Does it have any load balancing algorithm to spread the 
load as evenly as possible?

Thanks


ListS3 processor

2018-09-10 Thread Jean-Sebastien Vachon
Hi all,

I am using a ListS3 processor to process a large number of files stored in S3, 
but this processor only runs on the primary node. Could this be the cause of 
the heavily unbalanced distribution of the load amongst the three identical 
nodes I have?

Is there any way of distributing the load to all nodes? Or should I simply 
replace ListS3 with something else?

Thanks
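(For the archives: ListS3 is deliberately primary-node-only so the listing is 
not duplicated across the cluster. The usual pattern is to follow it with a 
FetchS3Object and distribute the connection between the two across the cluster, 
either with a load-balanced connection on NiFi 1.8+ or with a remote process 
group pointing back at the cluster on earlier versions, so the listing stays on 
one node while the downloads are spread out.)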


RE: Attributes vs JOLTTransformJSON

2018-07-18 Thread Jean-Sebastien Vachon
Hi,

Thanks for the advice. I don’t really like Javadoc but it did the job 

Regards

From: Juan Pablo Gardella 
Sent: July 18, 2018 5:46 PM
To: users@nifi.apache.org
Subject: Re: Attributes vs JOLTTransformJSON

The best docs are the javadoc for Jolt. I suggest checking out the code and 
reading from there. It also has examples.

On Wed, 18 Jul 2018 at 18:41 Jean-Sebastien Vachon 
<jsvac...@brizodata.com> wrote:
Hi all,

I’m using a JOLT transformation at the very end of my flow to filter out some 
attributes that I don’t want to send to ElasticSearch for Indexing. So far, it 
is working great but I’d like to include the value of an attribute (docId) into 
the transformation as well.

My JOLT specs are:
[{
"operation": "shift",
"spec": {
"companyId": "&",
"companyName": "&",
"s3Key": "&",
"runId": "&",
"urls": "&",
"urlId": "&",
"urlLevel": "&",
"urlAddress": "&",
 "docId": "${docId}"
}
}]

When I run my flow through this processor, the result is (check the last field):

{
  "companyId" : 1,
  "companyName" : "some company",
  "s3Key" : "1.9fe1cf4d384cd0a4cec3d97f54ae5a8d.json",
  "runId" : 1,
  "urls" : [ {
"url" : "http://www.somecompany.com;,
"id" : 0,
"filter_status" : "ok"
  }, {
"url" : "http://www. 
somecompany.com/contact<http://somecompany.com/contact>",
"id" : 0,
"filter_status" : "ok"
  }, {
"url" : "http://www. somecompany.com/#nav<http://somecompany.com/#nav>",
"id" : 0,
"filter_status" : "ok"
  }, {
"url" : "http://www. somecompany.com#top<http://somecompany.com#top>",
"id" : 0,
"filter_status" : "ok"
  } ],
  "urlId" : 1,
  "urlLevel" : 0,
  "urlAddress" : "http://www. somecompany.com<http://somecompany.com>",
  "1001" : "1001"
}

I was expecting the last field to read like “docId”: “1001”…
Now, I’m pretty sure this is obvious to someone experienced with JOLT but I 
googled a bit and could not find good documentation about JOLT’s syntax.

Thanks
--
Jean-Sébastien Vachon
vacho...@gmail.com
jsvac...@brizodata.com
www.brizodata.com



Attributes vs JOLTTransformJSON

2018-07-18 Thread Jean-Sebastien Vachon
Hi all,

I'm using a JOLT transformation at the very end of my flow to filter out some 
attributes that I don't want to send to ElasticSearch for Indexing. So far, it 
is working great but I'd like to include the value of an attribute (docId) into 
the transformation as well.

My JOLT specs are:
[{
"operation": "shift",
"spec": {
"companyId": "&",
"companyName": "&",
"s3Key": "&",
"runId": "&",
"urls": "&",
"urlId": "&",
"urlLevel": "&",
"urlAddress": "&",
 "docId": "${docId}"
}
}]

When I run my flow through this processor, the result is (check the last field):

{
  "companyId" : 1,
  "companyName" : "some company",
  "s3Key" : "1.9fe1cf4d384cd0a4cec3d97f54ae5a8d.json",
  "runId" : 1,
  "urls" : [ {
"url" : "http://www.somecompany.com;,
"id" : 0,
"filter_status" : "ok"
  }, {
"url" : "http://www. somecompany.com/contact",
"id" : 0,
"filter_status" : "ok"
  }, {
"url" : "http://www. somecompany.com/#nav",
"id" : 0,
"filter_status" : "ok"
  }, {
"url" : "http://www. somecompany.com#top",
"id" : 0,
"filter_status" : "ok"
  } ],
  "urlId" : 1,
  "urlLevel" : 0,
  "urlAddress" : "http://www. somecompany.com",
  "1001" : "1001"
}

I was expecting the last field to read like "docId": "1001"...
Now, I'm pretty sure this is obvious to someone experienced with JOLT but I 
googled a bit and could not find good documentation about JOLT's syntax.
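From the little I did find: in a shift spec the right-hand side is an output path, so "docId": "${docId}" matches an input field named docId and writes its value under a key named 1001 (the evaluated ${docId}), which is exactly the "1001" : "1001" above. Injecting the attribute as a value looks like a job for a separate "default" operation instead (an untested sketch; the spec is run through expression language before Jolt sees it):

[{
  "operation": "shift",
  "spec": { ... same as above, without the docId line ... }
}, {
  "operation": "default",
  "spec": {
    "docId": "${docId}"
  }
}]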

Thanks
--
Jean-Sébastien Vachon

vacho...@gmail.com
jsvac...@brizodata.com
www.brizodata.com



RE: Send data to ElasticSearch

2018-07-16 Thread Jean-Sebastien Vachon
Thanks all. I will look into these


From: Matt Burgess 
Sent: July 16, 2018 3:42 PM
To: users@nifi.apache.org
Subject: Re: Send data to ElasticSearch

There’s a PutElasticsearchHttpRecord that should give you the best of both 
worlds, the Record API to convert data types and using the REST API.

On Jul 16, 2018, at 2:59 PM, Mike Thomsen <mikerthom...@gmail.com> wrote:
With PutElasticsearchHttp, you'll have to define the field in advance as a date 
one because it's not a record-aware processor (the Record API has data types 
that can solve this problem). Once you've defined your date fields, an ISO8601 
date string will suffice. If you have more than one index you'll need to write 
to, and the date field names are the same, I'd recommend writing an 
ElasticSearch index template to apply the rule to all indexes that match the 
supplied name pattern.
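For example, something along these lines (a sketch; the index pattern, template name and field name are placeholders, and the exact mapping syntax varies between Elasticsearch versions):

PUT _template/nifi_dates
{
  "index_patterns": ["my_index*"],
  "mappings": {
    "properties": {
      "timestamp": { "type": "date", "format": "yyyy/MM/dd HH:mm:ss" }
    }
  }
}

Any new index whose name matches the pattern then gets the date mapping automatically, so the flow never has to check for a 404 first.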

On Mon, Jul 16, 2018 at 2:36 PM Jean-Sebastien Vachon 
<jsvac...@brizodata.com> wrote:
Hi all,

I’m sending the result of my flow to ElasticSearch for indexing but ES fails to 
recognize my timestamp field as a date-based field.
What is the recommended approach to fix this? I thought about adding some steps 
to my flow that would check if a mapping exists for my index and then send the 
mappings to ES if required. In cases where the mapping does not exist, ES will 
return something like:

{
  "error": {
"root_cause": [
  {
"type": "index_not_found_exception",
"reason": "no such index",
"resource.type": "index_or_alias",
"resource.id<http://resource.id>": "my_index",
"index_uuid": "_na_",
"index": "my_index"
  }
],
"type": "index_not_found_exception",
"reason": "no such index",
"resource.type": "index_or_alias",
"resource.id<http://resource.id>": "my_index",
"index_uuid": "_na_",
"index": "my_index"
  },
  "status": 404
}

So, if I fetch the response and look out for “status:404”, I would know whether 
or not I need to create the mappings. In this case, I would call a 
PutElasticSearchHttp with the proper mappings and everything should be fine. 
However, this seems like a lot of trouble for an operation that will most 
likely fail only once.

Are there any other ways of dealing with this?

Thanks
--
Jean-Sébastien Vachon
vacho...@gmail.com
jsvac...@brizodata.com
www.brizodata.com



Send data to ElasticSearch

2018-07-16 Thread Jean-Sebastien Vachon
Hi all,

I'm sending the result of my flow to ElasticSearch for indexing but ES fails to 
recognize my timestamp field as a date-based field.
What is the recommended approach to fix this? I thought about adding some steps 
to my flow that would check if a mapping exists for my index and then send the 
mappings to ES if required. In cases where the mapping does not exist, ES will 
return something like:

{
  "error": {
    "root_cause": [
      {
        "type": "index_not_found_exception",
        "reason": "no such index",
        "resource.type": "index_or_alias",
        "resource.id": "my_index",
        "index_uuid": "_na_",
        "index": "my_index"
      }
    ],
    "type": "index_not_found_exception",
    "reason": "no such index",
    "resource.type": "index_or_alias",
    "resource.id": "my_index",
    "index_uuid": "_na_",
    "index": "my_index"
  },
  "status": 404
}

So, if I fetch the response and check for "status": 404, I would know whether 
or not I need to create the mappings. In this case, I would call a 
PutElasticSearchHttp with the proper mappings and everything should be fine. 
However, this seems like a lot of trouble for an operation that will most 
likely fail only once.

Are there any other ways of dealing with this?

Thanks
--
Jean-Sébastien Vachon

vacho...@gmail.com
jsvac...@brizodata.com
www.brizodata.com



RE: Top right menu won't open after upgrade

2018-07-16 Thread Jean-Sebastien Vachon
That was indeed the problem

thanks

-Original Message-
From: Joe Witt  
Sent: July 16, 2018 2:22 PM
To: users@nifi.apache.org
Subject: Re: Top right menu won't open after upgrade

Hello

Sounds like possibly a caching issue.  Try doing a hard cache clear/refresh in 
your web browser.

Thanks

On Mon, Jul 16, 2018 at 2:20 PM, Jean-Sebastien Vachon  
wrote:
> Hi all,
>
>
>
> I just upgraded my 1.4.1 cluster to 1.7.1 and discovered that the top 
> right menu is not opening when clicking on it.
>
> Anyone got this problem before? I can see that my three servers have 
> formed a cluster but can’t access any of the templates/cluster menu anymore.
>
>
>
> Thanks
>
>
>
> --
>
> Jean-Sébastien Vachon
>
> vacho...@gmail.com
>
> jsvac...@brizodata.com
>
> www.brizodata.com
>
>


RE: Top right menu won't open after upgrade

2018-07-16 Thread Jean-Sebastien Vachon
I've already checked the console for errors when clicking on the menu and there 
are none. I do, however, see two errors when loading the page:

Failed to load resource: the server responded with a status of 409 (Conflict)
Failed to load resource: the server responded with a status of 409 (Conflict)

Thanks

From: Jean-Sebastien Vachon 
Sent: July 16, 2018 2:20 PM
To: users@nifi.apache.org
Subject: Top right menu won't open after upgrade

Hi all,

I just upgraded my 1.4.1 cluster to 1.7.1 and discovered that the top right 
menu is not opening when clicking on it.
Anyone got this problem before? I can see that my three servers have formed a 
cluster but can't access any of the templates/cluster menu anymore.

Thanks

--
Jean-Sébastien Vachon
vacho...@gmail.com
jsvac...@brizodata.com
www.brizodata.com



Top right menu won't open after upgrade

2018-07-16 Thread Jean-Sebastien Vachon
Hi all,

I just upgraded my 1.4.1 cluster to 1.7.1 and discovered that the top right 
menu is not opening when clicking on it.
Anyone got this problem before? I can see that my three servers have formed a 
cluster but can't access any of the templates/cluster menu anymore.

Thanks

--
Jean-Sébastien Vachon

vacho...@gmail.com
jsvac...@brizodata.com
www.brizodata.com



Question about how Nifi is handling ExecuteStreamCommand

2018-07-13 Thread Jean-Sebastien Vachon
Hi

Let's say I have an external process that I am running using 
ExecuteStreamCommand. Will Nifi keep the process running until there is nothing 
left to process in the queue, or will it instantiate a new process for every 
element in the queue (up to the setting for Concurrent Tasks)?

Thanks
--
Jean-Sébastien Vachon

vacho...@gmail.com
jsvac...@brizodata.com
www.brizodata.com



RE: Merging output of multiple processors

2018-07-12 Thread Jean-Sebastien Vachon
Thanks Kevin,

Knowing that I don’t have to care about the amount of data helps a lot. This 
will (might) be even easier than I thought.

Thanks for your time and support.

From: Kevin Doran 
Sent: July 12, 2018 12:40 PM
To: users@nifi.apache.org
Subject: Re: Merging output of multiple processors

Hi Jean-Sébastien,

Sorry you’re running into trouble. NiFi can have a bit of a learning curve at 
first, but once you are comfortable with the components it comes with and how 
to use them effectively, it gets much faster to accomplish tasks such as your 
example.

In general, don’t worry too much about reducing the amount of data going 
around. Unless you modify the flow file contents, all content data is passed by 
reference in NiFi, meaning there is only one physical copy of it (stored in the 
flow file content repository) in NiFi’s storage at any time. In most cases, 
this is very efficient and you do not need to optimize the data for NiFi, it 
will try to do the intelligent thing for you. If you start to experience system 
resource exhaustion (e.g., run out of storage in content repository), or if 
copying data becomes a performance bottleneck in your flow, then take the time 
to optimize that aspect or tune the system to meet the demands of your data 
flow.

Keeping that in mind, here are a few pointers to help you get started:


  1.  Rather than split parts of your JSON object apart into separate flow 
files, keep the entire object together. From there, take advantage of 
processors that are designed to interpret (and manipulate) the flow file 
contents as JSON. Look at the JSON processors (you can filter processors by 
name, e.g., “Json” when adding one to the canvas), such as JoltTransformJSON, 
EvaluateJsonPath, FlattenJson, SplitJson, JsonPathReader.
  2.  Processing multiple JSON records at a time can often be more efficient 
than a single JSON object per flow file. If you have or can construct flow 
files in this manner (e.g., a single flow file contains an array of JSON 
elements), then from that point on you can use the record-oriented processors. 
Record processors (again, you can find them by searching/filtering for “Record” 
when adding processors or controller services) require defining a schema to 
apply to your data, but once that is done, you can read/write/modify various 
record formats, including Json. Here is an example of doing JSON record 
enrichment in this style [1].

[1] 
https://community.hortonworks.com/articles/138632/data-flow-enrichment-with-nifi-lookuprecord-proces.html
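
(As a sketch of the record approach from point 2: a minimal Avro schema for the 
enriched records might look like the following. The record name and fields are 
hypothetical, and the schema would typically be supplied to the record 
reader/writer through a schema registry controller service or an inline schema 
property:)

{
  "type" : "record",
  "name" : "EnrichedDocument",
  "fields" : [
    { "name" : "language", "type" : [ "null", "string" ] },
    { "name" : "urlAddress", "type" : [ "null", "string" ] }
  ]
}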

Hope this helps!
Kevin

From: Jean-Sebastien Vachon <jsvac...@brizodata.com>
Reply-To: <users@nifi.apache.org>
Date: Thursday, July 12, 2018 at 12:12
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Merging output of multiple processors

Hi,

I am pretty new to Nifi and I’m struggling with something that (in my mind) 
should be very easy to do.
My flow consists of a Json file being processed by different processors to 
extract different information and enrich the data. Each processor has been 
implemented as an ExecuteStreamCommand and will output the information 
extracted as a JSON-like element. As an example, one of the modules determines 
the language of one of the fields in the original JSON and will output 
something like:

{ “language” : “en” }

Every module extracts a different piece of information, and my goal was to 
reduce the amount of data going around.

What would be the best way of merging the responses of all the modules into my 
JSON once everything has been processed? The resulting JSON will then continue 
through the flow for further processing.

I tried using the MergeContent processor but the output format cannot be JSON, 
so I’m a bit stuck. Right now, the merge strategy is set to “Bin-Packing 
algorithm” with the Correlation attribute set to ${filename}. The min and max 
entries are set to the expected number of elements to merge (6 in this case).

I tried the “Defragment” strategy as well, but the system was complaining about 
a missing fragment.index attribute (which I tried to provide through an 
UpdateAttribute processor, but that does not seem to work either).

Thanks
--
Jean-Sébastien Vachon

Merging output of multiple processors

2018-07-12 Thread Jean-Sebastien Vachon
Hi,

I am pretty new to Nifi and I’m struggling with something that (in my mind) 
should be very easy to do.
My flow consists of a Json file being processed by different processors to 
extract different information and enrich the data. Each processor has been 
implemented as an ExecuteStreamCommand and will output the information 
extracted as a JSON-like element. As an example, one of the modules determines 
the language of one of the fields in the original JSON and will output 
something like:

{ “language” : “en” }

Every module extracts a different piece of information, and my goal was to 
reduce the amount of data going around.

What would be the best way of merging the responses of all the modules into my 
JSON once everything has been processed? The resulting JSON will then continue 
through the flow for further processing.

I tried using the MergeContent processor but the output format cannot be JSON, 
so I’m a bit stuck. Right now, the merge strategy is set to “Bin-Packing 
algorithm” with the Correlation attribute set to ${filename}. The min and max 
entries are set to the expected number of elements to merge (6 in this case).

I tried the “Defragment” strategy as well, but the system was complaining about 
a missing fragment.index attribute (which I tried to provide through an 
UpdateAttribute processor, but that does not seem to work either).
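
(A note for anyone hitting the same wall: as I understand it, the Defragment 
strategy needs every fragment to carry matching fragment.identifier and 
fragment.index attributes, plus a fragment.count, before MergeContent will 
reassemble them. For example, set in an UpdateAttribute processor along these 
lines, with the values here being purely illustrative:)

fragment.identifier = ${filename}
fragment.index = <1..6, the position of this fragment>
fragment.count = 6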

Thanks
--
Jean-Sébastien Vachon

vacho...@gmail.com
jsvac...@brizodata.com
www.brizodata.com



Versioning

2018-07-10 Thread Jean-Sebastien Vachon
Hi,

Is there any way to use Git to keep track of changes made to the flow?

What are the best practices for having multiple people work together on a 
project?
We are planning to implement our flow using groups (local and remote). Is that 
a good approach?

I am looking for any advice on this.

thanks

--
Jean-Sébastien Vachon

vacho...@gmail.com
jsvac...@brizodata.com
www.brizodata.com



RE: Problem with embedded Zookeeper

2018-07-09 Thread Jean-Sebastien Vachon
I finally managed to get it up and running but it required a few additional 
steps.

First, I had a few permission/ownership problems. I had to make a few changes 
so that the ‘nifi’ user was the owner of everything. This was most likely the 
main cause of all my issues.

Then, when I started all three nodes at the same time, they were taking forever 
to complete the leader election.
They were all saying “still voting on which Flow is the correct flow for the 
cluster”. I stopped everything and restarted only one node; after some time, it 
came up fine, and then I started the two other nodes and they were finally able 
to join the cluster.

I am not sure if this was the correct way of bringing everything up, but I can 
finally proceed with my tests.

Thanks all

From: Andy LoPresto 
Sent: July 9, 2018 1:42 PM
To: users@nifi.apache.org
Subject: Re: Problem with embedded Zookeeper

You should not have to modify the state-management.xml file. Here is an example 
of cluster configuration (the certificates here use wildcard CNs, which is an 
active issue I’m working to resolve, but the ZK settings are all correct). I am 
currently running this cluster using embedded ZK on 3 NiFi instances running on 
the same local machine. You’ll have to change the DNS entries to your nodes.

https://github.com/alopresto/nifi_secure_cluster_wildcard

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

On Jul 9, 2018, at 10:07 AM, Jean-Sebastien Vachon <jsvac...@brizodata.com> wrote:

Hi Andy,

Thanks for the suggestion, but I did spot that typo earlier. One quick 
question: do we need to make any change to the state-management.xml file, like 
filling in the “Connect String” property with something like:


<cluster-provider>
    <id>zk-provider</id>
    <class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
    <property name="Connect String">nifi01.brizo.com:2181,nifi02.brizo.com:2181,nifi03.brizo.com:2181</property>
    <property name="Root Node">/nifi</property>
    <property name="Session Timeout">10 seconds</property>
    <property name="Access Control">Open</property>
</cluster-provider>


Actually, I tried this and it didn’t help.

I might have to try without the embedded ZK like someone else suggested, but I 
don’t see any reason why the embedded version would not work.

Thanks

From: Andy LoPresto <alopre...@apache.org>
Sent: July 9, 2018 12:29 PM
To: users@nifi.apache.org
Subject: Re: Problem with embedded Zookeeper

Jean-Sebastien,

Pierre’s guide has a typo in it — make sure that you configured the Zookeeper 
nodes with *./conf/zookeeper.properties*, not *./conf/zookeep.properties*. The 
location of the ZK configuration file needs to match the property defined in 
nifi.properties as 
nifi.state.management.embedded.zookeeper.properties=./conf/zookeeper.properties.

I just ran through this the other day and realized there was a typo there. 
Hopefully this helps. Let us know if you have any other issues.


Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

On Jul 9, 2018, at 8:47 AM, Jean-Sebastien Vachon <jsvac...@brizodata.com> wrote:

Hi all,

I am following this guide 
(https://pierrevillard.com/2016/08/13/apache-nifi-1-0-0-cluster-setup/) in 
order to set up a Nifi cluster containing 3 nodes for testing purposes. The 
three nodes are up and running but, somehow, they have not formed a cluster.

I’ve checked my configs over and over and cannot figure out why they are not 
clustered. Here are portions of my configuration file.

nifi.state.management.embedded.zookeeper.start=true
nifi.state.management.embedded.zookeeper.properties=./conf/zookeeper.properties

nifi.cluster.is.node=true
nifi.cluster.node.address=nifi02.brizo.com
nifi.cluster.node.protocol.port=

nifi.zookeeper.connect.string=nifi01.brizo.com:2181,nifi02.brizo.com:2181,nifi03.brizo.com:2181
nifi.remote.input.host=nifi02.brizo.com
nifi.remote.input.secure=false
nifi.remote.input.socket.port=9998

I’ve created the myid file under $NIFI_HOME/state/zookeeper for each of the 
nodes.

When I start a node and check the logs (I raised the level to DEBUG), I can see 
very little content referring to Zookeeper. I also tried to telnet port 2181 
and nothing is listening on that port.

I am obviously missing something. Any idea what it could be?

Thanks
--
Jean-Sébastien Vachon

phone: (418) 655-6661
vacho...@gmail.com
jsvac...@brizodata.com
www.brizodata.com



RE: Problem with embedded Zookeeper

2018-07-09 Thread Jean-Sebastien Vachon
Hi Andy,

Thanks for the suggestion, but I did spot that typo earlier. One quick 
question: do we need to make any change to the state-management.xml file, like 
filling in the “Connect String” property with something like:


<cluster-provider>
    <id>zk-provider</id>
    <class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
    <property name="Connect String">nifi01.brizo.com:2181,nifi02.brizo.com:2181,nifi03.brizo.com:2181</property>
    <property name="Root Node">/nifi</property>
    <property name="Session Timeout">10 seconds</property>
    <property name="Access Control">Open</property>
</cluster-provider>


Actually, I tried this and it didn’t help.

I might have to try without the embedded ZK like someone else suggested, but I 
don’t see any reason why the embedded version would not work.

Thanks

From: Andy LoPresto 
Sent: July 9, 2018 12:29 PM
To: users@nifi.apache.org
Subject: Re: Problem with embedded Zookeeper

Jean-Sebastien,

Pierre’s guide has a typo in it — make sure that you configured the Zookeeper 
nodes with *./conf/zookeeper.properties*, not *./conf/zookeep.properties*. The 
location of the ZK configuration file needs to match the property defined in 
nifi.properties as 
nifi.state.management.embedded.zookeeper.properties=./conf/zookeeper.properties.

I just ran through this the other day and realized there was a typo there. 
Hopefully this helps. Let us know if you have any other issues.


Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

On Jul 9, 2018, at 8:47 AM, Jean-Sebastien Vachon <jsvac...@brizodata.com> wrote:

Hi all,

I am following this guide 
(https://pierrevillard.com/2016/08/13/apache-nifi-1-0-0-cluster-setup/) in 
order to set up a Nifi cluster containing 3 nodes for testing purposes. The 
three nodes are up and running but, somehow, they have not formed a cluster.

I’ve checked my configs over and over and cannot figure out why they are not 
clustered. Here are portions of my configuration file.

nifi.state.management.embedded.zookeeper.start=true
nifi.state.management.embedded.zookeeper.properties=./conf/zookeeper.properties

nifi.cluster.is.node=true
nifi.cluster.node.address=nifi02.brizo.com
nifi.cluster.node.protocol.port=

nifi.zookeeper.connect.string=nifi01.brizo.com:2181,nifi02.brizo.com:2181,nifi03.brizo.com:2181
nifi.remote.input.host=nifi02.brizo.com
nifi.remote.input.secure=false
nifi.remote.input.socket.port=9998

I’ve created the myid file under $NIFI_HOME/state/zookeeper for each of the 
nodes.

When I start a node and check the logs (I raised the level to DEBUG), I can see 
very little content referring to Zookeeper. I also tried to telnet port 2181 
and nothing is listening on that port.

I am obviously missing something. Any idea what it could be?

Thanks
--
Jean-Sébastien Vachon

phone: (418) 655-6661
vacho...@gmail.com
jsvac...@brizodata.com
www.brizodata.com



Problem with embedded Zookeeper

2018-07-09 Thread Jean-Sebastien Vachon
Hi all,

I am following this guide 
(https://pierrevillard.com/2016/08/13/apache-nifi-1-0-0-cluster-setup/) in 
order to set up a Nifi cluster containing 3 nodes for testing purposes. The 
three nodes are up and running but, somehow, they have not formed a cluster.

I've checked my configs over and over and cannot figure out why they are not 
clustered. Here are portions of my configuration file.

nifi.state.management.embedded.zookeeper.start=true
nifi.state.management.embedded.zookeeper.properties=./conf/zookeeper.properties

nifi.cluster.is.node=true
nifi.cluster.node.address=nifi02.brizo.com
nifi.cluster.node.protocol.port=

nifi.zookeeper.connect.string=nifi01.brizo.com:2181,nifi02.brizo.com:2181,nifi03.brizo.com:2181
nifi.remote.input.host=nifi02.brizo.com
nifi.remote.input.secure=false
nifi.remote.input.socket.port=9998

I've created the myid file under $NIFI_HOME/state/zookeeper for each of the 
nodes.
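
(For context, a minimal conf/zookeeper.properties for a three-node embedded 
ensemble would look roughly like this. The hostnames are taken from the connect 
string above and everything else is the stock default from the file NiFi ships 
with, so treat it as a sketch rather than a verified config:)

server.1=nifi01.brizo.com:2888:3888
server.2=nifi02.brizo.com:2888:3888
server.3=nifi03.brizo.com:2888:3888
clientPort=2181
initLimit=10
syncLimit=5
tickTime=2000
dataDir=./state/zookeeper

(Each node's myid file then contains just its own number: 1, 2, or 3.)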

When I start a node and check the logs (I raised the level to DEBUG), I can see 
very little content referring to Zookeeper. I also tried to telnet port 2181 
and nothing is listening on that port.

I am obviously missing something. Any idea what it could be?

Thanks
--
Jean-Sébastien Vachon

phone: (418) 655-6661
vacho...@gmail.com
jsvac...@brizodata.com
www.brizodata.com