NPE at JMSConsumer processor

2019-11-12 Thread Juan Pablo Gardella
Hello all,

I found the following NPE in NiFi 1.5.0:

2019-11-13 04:19:11,031 ERROR [Timer-Driven Process Thread-5]
o.apache.nifi.jms.processors.ConsumeJMS ConsumeJMS -
JMSConsumer[destination:null; pub-sub:true;] failed to process session due
to java.lang.NullPointerException:
java.lang.NullPointerException: null
    at org.apache.nifi.jms.processors.MessageBodyToBytesConverter.toBytes(MessageBodyToBytesConverter.java:40)
    at org.apache.nifi.jms.processors.JMSConsumer$1.doInJms(JMSConsumer.java:84)
    at org.apache.nifi.jms.processors.JMSConsumer$1.doInJms(JMSConsumer.java:65)
    at org.springframework.jms.core.JmsTemplate.execute(JmsTemplate.java:494)
    at org.apache.nifi.jms.processors.JMSConsumer.consume(JMSConsumer.java:65)
    at org.apache.nifi.jms.processors.ConsumeJMS.rendezvousWithJms(ConsumeJMS.java:144)
    at org.apache.nifi.jms.processors.AbstractJMSProcessor.onTrigger(AbstractJMSProcessor.java:139)
    at org.apache.nifi.jms.processors.ConsumeJMS.onTrigger(ConsumeJMS.java:56)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)

Basically, TextMessage.getText() comes back null, which the javadoc says is
possible.

The fix I made consists of logging a WARN and writing an empty byte array as
the output. The null is not handled in the latest version either. Is writing
an empty array to the FlowFile content a valid solution in this scenario?
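
For illustration, the change I have in mind looks roughly like this (a minimal
sketch only; the class and logger names are placeholders, not the actual NiFi
source):

import java.nio.charset.StandardCharsets;
import javax.jms.JMSException;
import javax.jms.TextMessage;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MessageBodyToBytesConverterSketch {
    private static final Logger LOGGER = LoggerFactory.getLogger(MessageBodyToBytesConverterSketch.class);

    // TextMessage.getText() may legally return null per the JMS javadoc, so guard against it.
    public static byte[] toBytes(final TextMessage message) throws JMSException {
        final String text = message.getText();
        if (text == null) {
            LOGGER.warn("TextMessage {} has a null body; writing an empty byte array", message.getJMSMessageID());
            return new byte[0];
        }
        return text.getBytes(StandardCharsets.UTF_8);
    }
}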

Juan


Re: Supporting Elasticsearch scrolling with an input flow file

2019-11-12 Thread Koji Kawamura
Hi Tim,

Sorry for the late reply.
It seems the ScrollElasticsearchHttp processor is designed to run a
one-shot query to import query results from Elasticsearch.
The description says "The state must be cleared before another query
can be run."
It tracks progress using managed state, not via incoming FlowFiles.
This processor is a source processor, similar to processors such as ListFile.

If a large number of documents needs to be ingested with pagination, driven
by a query built from incoming FlowFile attributes, then I'd enhance the
QueryElasticsearchHttp processor so that it can route the original incoming
FlowFile to a new relationship such as 'next page' while incrementing a
page-number attribute. The next time that FlowFile is passed to the same
QueryElasticsearchHttp, the query results for the next page would be used to
populate FlowFiles onto the 'success' relationship.
QueryElasticsearchHttp currently simply removes incoming FlowFiles.
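
Very roughly, the onTrigger logic could look like the sketch below (this is
not actual NiFi source; the 'next page' relationship and the 'es.page'
attribute name are made up for illustration):

import java.util.Optional;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

public class PagedQueryElasticsearchSketch extends AbstractProcessor {

    static final Relationship REL_SUCCESS = new Relationship.Builder().name("success").build();
    static final Relationship REL_NEXT_PAGE = new Relationship.Builder().name("next page").build();

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
        final FlowFile incoming = session.get();
        if (incoming == null) {
            return;
        }
        final int page = Integer.parseInt(
                Optional.ofNullable(incoming.getAttribute("es.page")).orElse("0"));

        // ... run the Elasticsearch query for 'page' here and emit hit FlowFiles to REL_SUCCESS ...
        final boolean moreResultsAvailable = false; // would come from the query response

        if (moreResultsAvailable) {
            // Hand the original FlowFile back with the page counter bumped, so the next
            // trigger of this same processor fetches the following page.
            final FlowFile next = session.putAttribute(incoming, "es.page", String.valueOf(page + 1));
            session.transfer(next, REL_NEXT_PAGE);
        } else {
            // Current QueryElasticsearchHttp behaviour: drop the incoming FlowFile.
            session.remove(incoming);
        }
    }
}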

Thanks,
Koji

On Fri, Nov 1, 2019 at 5:05 AM Tim Dean  wrote:
>
> Hello,
>
> I would like to use the existing ScrollElasticsearchHttp to perform a search 
> that returns a potentially large number of hits. The parameters of the search 
> need to reference one or more flow file attributes.
>
> Looking at the source code for this processor it appears that the QUERY 
> property supports EL with flow file attributes. Furthermore, the 
> documentation for the FAILURE relationship notes that only incoming flow 
> files will be routed to failure. So it seems clear that this processor was 
> designed to allow input flow files. Unfortunately though, the processor also 
> has been annotated with INPUT_FORBIDDEN so I can’t use it as is.
>
> I assume that there is a good reason for forbidding input here. Before I go 
> and try to implement a custom processor that does what I want, I’d like to 
> know if some hidden problem awaits me.
>
> Can someone clarify why this processor forbids input, and what problems I 
> might expect if I try to circumvent this limitation?
>
> Thanks
>
> - Tim
>
> Sent from my iPhone


Re: Influence about removing RequiresInstanceClassLoading from AbstractHadoopProcessor processor

2019-11-12 Thread abellnotring
Hi Jeff,
There is no Kerberos authentication in my Hadoop clusters, but I find that UGI
is initialized with an ExtendedConfiguration (extending Hadoop's Configuration)
when those processor instances are first scheduled. I want to use NiFi to
connect to different Hadoop clusters; will that run into any issues? (I'm
running tests for this.)

Thanks,
By Hai Luo

On 11/13/2019 07:01, Jeff wrote:

If you remove @RequiresInstanceClassLoading, the UserGroupInformation class
from Hadoop (hadoop-common, if I remember correctly) will be shared across all
instances that come from a particular NAR (such as PutHDFS, ListHDFS,
FetchHDFS, etc., from nifi-hadoop-nar-x.y.z.nar). If you are using Kerberos in
those processors and have configured different principals across them, you
could run into issues when the processors attempt to acquire new TGTs, most
likely the first time a relogin is attempted. UGI has some static state, and
@RequiresInstanceClassLoading makes sure each instance of a processor with
that annotation has its own classloader, to keep that kind of state from being
shared across instances.
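
For reference, per-instance classloading is enabled just by placing the
annotation on the processor class; a minimal sketch (the class name below is
made up, only the annotation and base class are the real NiFi API):

import org.apache.nifi.annotation.behavior.RequiresInstanceClassLoading;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.exception.ProcessException;

// Each instance of this processor gets its own copy of the NAR's classes (and of
// anything on "Additional Classpath Resources"), so static state such as UGI's
// login information is not shared between instances -- at the cost of extra
// Compressed Class Space per instance.
@RequiresInstanceClassLoading
public class MyHdfsLikeProcessor extends AbstractProcessor {
    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
        // processor logic would go here
    }
}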
On Mon, Nov 11, 2019 at 9:41 PM abellnotring wrote:

Hi Peter & all,
I'm using Kylo to manage the NiFi flow (called a feed in Kylo), and there are
4200 processor instances (600+ of them extending AbstractHadoopProcessor) on
my NiFi canvas. The NiFi non-heap memory has increased by more than 6 GB after
some days of running, which is extremely abnormal. I have analyzed the classes
loaded into the Compressed Class Space and found that most of the CCS is used
by classes related to AbstractHadoopProcessor. So I think removing
RequiresInstanceClassLoading from AbstractHadoopProcessor may be a solution
for reducing the CCS usage. Do you have any ideas about this?

Thanks,
By Hai Luo

On 11/12/2019 02:17, Shawn Weeks wrote:

I'm assuming you're talking about the Snappy problem. If you use CompressContent
prior to PutHDFS you can compress with Snappy, as it uses the Java native Snappy
lib. The HDFS processors are limited to the actual Hadoop libraries, so they'd
have to change from the native library to get around this. I'm pretty sure we
need instance class loading to handle the other issues mentioned.

Thanks
Shawn

From: Joe Witt
Reply-To: "users@nifi.apache.org"
Date: Monday, November 11, 2019 at 8:56 AM
To: "users@nifi.apache.org"
Subject: Re: Influence about removing RequiresInstanceClassLoading from AbstractHadoopProcessor processor

Peter

The most common challenge is if two isolated instances both want to use a
native lib. No two native libs with the same name can be in the same JVM. We
need to solve that for sure.

Thanks

On Mon, Nov 11, 2019 at 9:53 AM Peter Turcsanyi wrote:

Hi Hai Luo,

@RequiresInstanceClassLoading makes it possible to configure separate /
isolated "Additional Classpath Resources" settings on your HDFS processors
(e.g. an S3 storage driver on one of your PutHDFS processors and Azure Blob on
another).

Is there any specific reason / use case why you are considering removing it?

Regards,
Peter Turcsanyi

On Mon, Nov 11, 2019 at 3:30 PM abellnotring wrote:

Hi all,

I'm considering removing the RequiresInstanceClassLoading annotation from
class AbstractHadoopProcessor.
Does anybody know the potential influence?

Thanks
By Hai Luo


Usage of EvaluateJsonPath

2019-11-12 Thread Kayak28
Hello, NiFi community members,
I am very new to NiFi, and this is actually my first time using it.

I am using NiFi 1.10 now, and I want to gather data from Twitter and import
it into Apache Solr (a search engine).
I followed this guide:
https://blogs.apache.org/nifi/entry/indexing_tweets_with_nifi_and
According to the guide, the data flow looks like this:
1. GetTwitter processor
2. EvaluateJsonPath processor, via the success relationship
3. RouteOnAttribute processor, via the matched relationship


Even though the EvaluateJsonPath processor does not show any warning message,
when I run the NiFi process group, the data does not flow into the
EvaluateJsonPath processor. (The data seems to stay queued between GetTwitter
and EvaluateJsonPath.)

What can be the cause of this problem?

Any clue would be very much appreciated.

Sincerely,
Kaya Ota


Re: NiFi Upgrade 1.9.2 to 1.10.0 - LDAP Failure

2019-11-12 Thread Josef.Zahner1
Update: we found out that the described issue (fallback to “simple” mode) only
occurs with Java 11 (and of course only with LDAP using START_TLS). The error
message is gone with Java 1.8.0, so in our case we will use Java 1.8.0 for now.
As already mentioned earlier, another option would be to use Java 11 but with
LDAPS instead of START_TLS, but we decided against that.

I’ve updated https://issues.apache.org/jira/browse/NIFI-6860 with this 
information.

Hopefully one of the devs can fix this in future releases.

Cheers Josef

From: "Zahner Josef, GSB-LR-TRW-LI" 
Date: Monday, 11 November 2019 at 11:16
To: "users@nifi.apache.org" 
Subject: Re: NiFi Upgrade 1.9.2 to 1.10.0 - LDAP Failure

And additionally, below is the output of the tcpdump captured on the NiFi node
during startup of NiFi 1.10.0. We use the standard LDAP port (389). And you
were right: I can see in the dump that NiFi tries to authenticate with “simple”
authentication over START_TLS…

[screenshot of the tcpdump capture was attached here]


From: "Zahner Josef, GSB-LR-TRW-LI" 
Date: Monday, 11 November 2019 at 11:06
To: "users@nifi.apache.org" 
Subject: Re: NiFi Upgrade 1.9.2 to 1.10.0 - LDAP Failure

Hi Andy,

I’ve just opened a Jira bug report:
https://issues.apache.org/jira/projects/NIFI/issues/NIFI-6860

We changed nothing on the LDAP side. The whole setup still works for our
production nodes with NiFi 1.9.2; we have multiple clusters and single-node
NiFi instances running. As we use Ansible, I removed NiFi 1.10.0 from the test
node again and reinstalled NiFi 1.9.2, and it worked without any issues. The
only difference between the NiFi 1.9.2 and 1.10.0 deployments is the new
config parameters.

As you can see in the bug report, I have now switched to LDAPS and this is
working… Users are visible in the “Users” window and I can log in with an LDAP
user. I just switched to LDAPS instead of START_TLS and added an “S” to the
URL of the LDAP server.

Cheers Josef



From: Andy LoPresto 
Reply to: "users@nifi.apache.org" 
Date: Monday, 11 November 2019 at 10:46
To: "users@nifi.apache.org" 
Subject: Re: NiFi Upgrade 1.9.2 to 1.10.0 - LDAP Failure

Hi Josef,

My inclination is that somehow the password NiFi is trying to send to the LDAP 
service is no longer sufficiently protected? The only other change I am aware 
of that could influence this is the Spring Security upgrade from 4.2.8 to 
4.2.13 (NIFI-6412) [1]; the new version of Spring Security might enforce a new 
restriction on how the password is sent that LDAP doesn’t like. The LDAP error 
code 13 refers to the password being sent in plaintext [2]. As you are using 
StartTLS, I am assuming the LDAP port you’re connecting to is still 389? Did 
anything change on the LDAP server? Can you verify a simple lookup using 
ldapsearch still works? If you get the same error code, you may need to add -Z 
to the command to initialize a secure TLS channel.

[1] https://issues.apache.org/jira/browse/NIFI-6412
[2] 
https://ldap.com/ldap-result-code-reference-core-ldapv3-result-codes/#rc-confidentialityRequired
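
For what it's worth, with START_TLS the simple bind has to happen only after
the TLS layer has been negotiated on port 389; binding before that is exactly
the kind of thing a server that requires confidentiality rejects with error
code 13. A minimal standalone JNDI sketch of that order of operations (the
host, DN and password below are placeholders, not your real values):

import java.io.IOException;
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.ldap.InitialLdapContext;
import javax.naming.ldap.LdapContext;
import javax.naming.ldap.StartTlsRequest;
import javax.naming.ldap.StartTlsResponse;

public class StartTlsLdapCheck {
    public static void main(final String[] args) throws NamingException, IOException {
        final Hashtable<String, Object> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://ldap.example.com:389"); // placeholder host

        final LdapContext ctx = new InitialLdapContext(env, null);

        // Negotiate TLS on the plain LDAP port first ...
        final StartTlsResponse tls = (StartTlsResponse) ctx.extendedOperation(new StartTlsRequest());
        tls.negotiate();

        // ... and only then perform the simple bind over the now-encrypted channel.
        ctx.addToEnvironment(Context.SECURITY_AUTHENTICATION, "simple");
        ctx.addToEnvironment(Context.SECURITY_PRINCIPAL, "cn=manager,dc=example,dc=com"); // placeholder DN
        ctx.addToEnvironment(Context.SECURITY_CREDENTIALS, "secret");                     // placeholder password
        ctx.reconnect(ctx.getConnectControls());

        System.out.println("Bind succeeded");
        tls.close();
        ctx.close();
    }
}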


Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69




On Nov 11, 2019, at 4:59 PM, josef.zahn...@swisscom.com wrote:

Hi guys

We would like to upgrade from NiFi 1.9.2 to 1.10.0, and we have HTTPS with LDAP
(START_TLS) authentication successfully enabled on 1.9.2. Now after upgrading,
we have an issue which prevents NiFi from starting up:


2019-11-11 08:29:30,447 ERROR [main] o.s.web.context.ContextLoader Context 
initialization failed
org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
creating bean with name 
'org.springframework.security.config.annotation.web.configuration.WebSecurityConfiguration':
 Unsatisfied dependency expressed through method 
'setFilterChainProxySecurityConfigurer' parameter 1; nested exception is 
org.springframework.beans.factory.BeanExpressionException: Expression parsing 
failed; nested exception is 
org.springframework.beans.factory.UnsatisfiedDependencyException: Error 
creating bean with name 'org.apache.nifi.web.NiFiWebApiSecurityConfiguration': 
Unsatisfied dependency expressed through method 'setJwtAuthenticationProvider' 
parameter 0; nested exception is 
org.springframework.beans.factory.BeanCreationException: Error creating bean 
with name 'jwtAuthenticationProvider' defined in class path resource 
[nifi-web-security-context.xml]: Cannot resolve reference to bean 'authorizer' 
while setting constructor argument; nested exception is 
org.springframework.beans.factory.BeanCreationException: Error creating bean 
with name 'authorizer': FactoryBean threw exception on object creation; nested 
exception is org.springframework.ldap.AuthenticationNotSupportedException: 
[LDAP: error code 13 - confidentiality required]; nested exception is 
javax.naming.AuthenticationNotSupportedException: [LDAP: error code 13 - 
confidentiality required]
 

Re: Default Retry mechanism for NIFI puts3Object processor

2019-11-12 Thread sanjeet rath
Thanks Peter for helping me out.

Regards,
Sanjeet

On Tue, 12 Nov, 2019, 6:49 PM Peter Turcsanyi,  wrote:

> Hi Sanjeet,
>
> There is an open issue [1] about retry handling in AWS processors with a
> pull request available [2] that might be interesting for you / solve your
> problem. Unfortunately it has not been merged yet.
>
> This would be a more generic solution for all AWS processors which also
> adds an option to configure the retry policy.
>
> Regards,
> Peter
>
> [1] https://issues.apache.org/jira/browse/NIFI-6486
> [2] https://github.com/apache/nifi/pull/3612
>
> On Mon, Nov 11, 2019 at 6:15 PM sanjeet rath 
> wrote:
>
>> Hi Team,
>>
>> I am using the PutS3Object processor of NiFi to upload objects from
>> on-prem to an AWS S3 bucket. I believe there are 2 types of upload, single
>> part upload and multipart upload, as per the threshold value defined for
>> multipart.
>>
>> For multipart, 3 steps are followed:
>> 1) s3.initiateMultipartUpload, 2) s3.uploadPart, 3) s3.completeMultipartUpload
>>
>> While checking the code I found that in the s3.completeMultipartUpload method,
>> if there is any server-side exception (5xx), it retries 3 times (in the
>> CompleteMultipartUploadRetryCondition class of the AWS SDK, MAX_RETRY_ATTEMPTS
>> is a constant with value 3) using a do-while loop.
>>
>> I have a few questions:
>>
>> a) Is this default retry mechanism (value 3) only used in the
>> s3.completeMultipartUpload method? I don't find any code for retry used
>> in single object upload.
>>
>> b) If I change the MaxErrorRetry value in the AWS ClientConfiguration,
>> will the retry count for an S3 exception (5xx) change to the value I have
>> set, given that MAX_RETRY_ATTEMPTS is a constant value of 3? Please confirm.
>>
>> c) If the answer to b) is yes, then is
>> ClientConfiguration.MaxErrorRetry(myCustomValue) alone enough, or
>>
>> do I have to add the below code for the retry policy as well?
>>
>> ClientConfiguration.setRetryPolicy(new
>> RetryPolicy(config.getRetryPolicy().getRetryCondition(), config.getRetryPolicy().getBackoffStrategy(),
>> myCustomValue, true))
>>
>>
>> Thanks ,
>>
>> Sanjeet
>>
>>
>>
>>


Re: Influence about removing RequiresInstanceClassLoading from AbstractHadoopProcessor processor

2019-11-12 Thread Shawn Weeks
I've uploaded lots of files to HDFS that were bigger and smaller than a block.
The last time I tried compression with PutHDFS I didn't notice any block
splitting happening either. It still showed up as a single file.

Thanks

From: Matt Burgess 
Reply-To: "users@nifi.apache.org" 
Date: Monday, November 11, 2019 at 10:13 AM
To: "users@nifi.apache.org" 
Subject: Re: Influence about removing RequiresInstanceClassLoading from 
AbstractHadoopProcessor processor

I can’t remember for sure but I think if you use CompressContent the compressed 
file has to fit in a single HDFS file block in order to work. IIRC 
Hadoop-Snappy is different from regular Snappy in the sense that it puts the 
compression header in each block so the file can be reassembled and 
decompressed correctly.



On Nov 11, 2019, at 10:30 AM, Shawn Weeks  wrote:
I'm assuming you're talking about the Snappy problem. If you use CompressContent
prior to PutHDFS you can compress with Snappy, as it uses the Java native Snappy
lib. The HDFS processors are limited to the actual Hadoop libraries, so they'd
have to change from the native library to get around this. I'm pretty sure we
need instance class loading to handle the other issues mentioned.

Thanks
Shawn

From: Joe Witt 
Reply-To: "users@nifi.apache.org" 
Date: Monday, November 11, 2019 at 8:56 AM
To: "users@nifi.apache.org" 
Subject: Re: Influence about removing RequiresInstanceClassLoading from 
AbstractHadoopProcessor processor

Peter

The most common challenge is if two isolated instances both want to use a 
native lib.  No two native libs with the same name can be in the same jvm.  We 
need to solve that for sure.

Thanks

On Mon, Nov 11, 2019 at 9:53 AM Peter Turcsanyi <turcsa...@apache.org> wrote:
Hi Hai Luo,

@RequiresInstanceClassLoading makes it possible to configure separate / isolated
"Additional Classpath Resources" settings on your HDFS processors (e.g. an S3
storage driver on one of your PutHDFS processors and Azure Blob on another).

Is there any specific reason / use case why you are considering removing it?

Regards,
Peter Turcsanyi

On Mon, Nov 11, 2019 at 3:30 PM abellnotring <abellnotr...@sina.com> wrote:
Hi all,
 I'm considering removing the RequiresInstanceClassLoading annotation from
class AbstractHadoopProcessor.
 Does anybody know the potential influence?

Thanks
By Hai Luo


Re: Influence about removing RequiresInstanceClassLoading from AbstractHadoopProcessor processor

2019-11-12 Thread Bryan Bende
What version of NiFi is Kylo based on?

There was a memory leak related to HDFS processors that was fixed back in 1.7.0:

https://issues.apache.org/jira/browse/NIFI-5136

On Mon, Nov 11, 2019 at 11:14 PM Jeff  wrote:
>
> If you remove the @RequiresInstanceClassloading, the UserGroupInformation 
> class from Hadoop (hadoop-common, if I remember correctly) will be shared 
> across all instances that come from a particular NAR (such as PutHDFS, 
> ListHDFS, FetchHDFS, etc, from nifi-hadoop-nar-x.y.z.nar).  If you are using 
> Kerberos in those processors and configured different principals across the 
> various processors, you could run into issues when the processors attempt to 
> acquire new TGTs, most likely the first time a relogin is attempted.  UGI has 
> some static state and @RequiresInstanceClassloading makes sure each instance 
> of a processor with that annotation has its own classloader to keep that kind 
> of state from being shared across instances.
>
> On Mon, Nov 11, 2019 at 9:41 PM abellnotring  wrote:
>>
>> Hi Peter & all,
>> I'm using Kylo to manage the NiFi flow (called a feed in Kylo), and there
>> are 4200 processor instances (600+ of them extending AbstractHadoopProcessor)
>> on my NiFi canvas. The NiFi non-heap memory has increased by more than 6 GB
>> after some days of running, which is extremely abnormal. I have analyzed the
>> classes loaded into the Compressed Class Space and found that most of the CCS
>> is used by classes related to AbstractHadoopProcessor.
>> So I think removing RequiresInstanceClassLoading from
>> AbstractHadoopProcessor may be a solution for reducing the CCS
>> used.
>> Do you have any ideas about this?
>>
>>
>>
>> Thanks
>>
>>
>> By Hai Luo
>>
>> On 11/12/2019 02:17,Shawn Weeks wrote:
>>
>> I'm assuming you're talking about the Snappy problem. If you use
>> CompressContent prior to PutHDFS you can compress with Snappy, as it uses the
>> Java native Snappy lib. The HDFS processors are limited to the actual Hadoop
>> libraries, so they'd have to change from the native library to get around
>> this. I'm pretty sure we need instance class loading to handle the other
>> issues mentioned.
>>
>>
>>
>> Thanks
>>
>> Shawn
>>
>>
>>
>> From: Joe Witt 
>> Reply-To: "users@nifi.apache.org" 
>> Date: Monday, November 11, 2019 at 8:56 AM
>> To: "users@nifi.apache.org" 
>> Subject: Re: Influence about removing RequiresInstanceClassLoading from 
>> AbstractHadoopProcessor processor
>>
>>
>>
>> Peter
>>
>>
>>
>> The most common challenge is if two isolated instances both want to use a 
>> native lib.  No two native libs with the same name can be in the same jvm.  
>> We need to solve that for sure.
>>
>>
>>
>> Thanks
>>
>>
>>
>> On Mon, Nov 11, 2019 at 9:53 AM Peter Turcsanyi  wrote:
>>
>> Hi Hai Luo,
>>
>>
>>
>> @RequiresInstanceClassLoading makes it possible to configure separate /
>> isolated "Additional Classpath Resources" settings on your HDFS processors
>> (e.g. an S3 storage driver on one of your PutHDFS processors and Azure Blob on another).
>>
>>
>>
>> Is there any specific reason / use case why you are considering removing it?
>>
>>
>>
>> Regards,
>>
>> Peter Turcsanyi
>>
>>
>>
>> On Mon, Nov 11, 2019 at 3:30 PM abellnotring  wrote:
>>
>> Hi all,
>>
>>  I'm considering removing the RequiresInstanceClassLoading annotation
>> from class AbstractHadoopProcessor.
>>
>>  Does anybody know the potential influence?
>>
>>
>>
>> Thanks
>>
>> By Hai Luo


Re: Default Retry mechanism for NIFI puts3Object processor

2019-11-12 Thread Peter Turcsanyi
Hi Sanjeet,

There is an open issue [1] about retry handling in AWS processors with a
pull request available [2] that might be interesting for you / solve your
problem. Unfortunately it has not been merged yet.

This would be a more generic solution for all AWS processors which also
adds an option to configure the retry policy.

Regards,
Peter

[1] https://issues.apache.org/jira/browse/NIFI-6486
[2] https://github.com/apache/nifi/pull/3612
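
As a side note, if you want to raise the SDK-side retry count yourself today,
something along these lines on the AWS Java SDK v1 ClientConfiguration should
do it (the retry count of 5 is just an example; and note that, as far as I can
tell, the hard-coded MAX_RETRY_ATTEMPTS inside
CompleteMultipartUploadRetryCondition is a separate mechanism from this
setting):

import com.amazonaws.ClientConfiguration;
import com.amazonaws.retry.PredefinedRetryPolicies;
import com.amazonaws.retry.RetryPolicy;

public class S3RetryConfigSketch {
    public static ClientConfiguration buildClientConfiguration() {
        final ClientConfiguration config = new ClientConfiguration();

        // Simple route: bump the SDK's generic retry count for throttled / 5xx errors.
        config.setMaxErrorRetry(5);

        // Explicit route: install a custom RetryPolicy; the last flag tells the SDK
        // to honour maxErrorRetry from the ClientConfiguration.
        config.setRetryPolicy(new RetryPolicy(
                PredefinedRetryPolicies.DEFAULT_RETRY_CONDITION,
                PredefinedRetryPolicies.DEFAULT_BACKOFF_STRATEGY,
                5,       // maxErrorRetry
                true));  // honor maxErrorRetry from the ClientConfiguration
        return config;
    }
}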

On Mon, Nov 11, 2019 at 6:15 PM sanjeet rath  wrote:

> Hi Team,
>
> I am using the PutS3Object processor of NiFi to upload objects from
> on-prem to an AWS S3 bucket. I believe there are 2 types of upload, single
> part upload and multipart upload, as per the threshold value defined for
> multipart.
>
> For multipart, 3 steps are followed:
> 1) s3.initiateMultipartUpload, 2) s3.uploadPart, 3) s3.completeMultipartUpload
>
> While checking the code I found that in the s3.completeMultipartUpload method,
> if there is any server-side exception (5xx), it retries 3 times (in the
> CompleteMultipartUploadRetryCondition class of the AWS SDK, MAX_RETRY_ATTEMPTS
> is a constant with value 3) using a do-while loop.
>
> I have a few questions:
>
> a) Is this default retry mechanism (value 3) only used in the
> s3.completeMultipartUpload method? I don't find any code for retry used
> in single object upload.
>
> b) If I change the MaxErrorRetry value in the AWS ClientConfiguration,
> will the retry count for an S3 exception (5xx) change to the value I have
> set, given that MAX_RETRY_ATTEMPTS is a constant value of 3? Please confirm.
>
> c) If the answer to b) is yes, then is
> ClientConfiguration.MaxErrorRetry(myCustomValue) alone enough, or
>
> do I have to add the below code for the retry policy as well?
>
> ClientConfiguration.setRetryPolicy(new
> RetryPolicy(config.getRetryPolicy().getRetryCondition(), config.getRetryPolicy().getBackoffStrategy(),
> myCustomValue, true))
>
>
> Thanks ,
>
> Sanjeet
>
>
>
>