Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-20 Thread Joe Witt
Jens,

Also, what type of file system/storage system are you running NiFi on
in this case? We need to know this for the NiFi
content/flowfile/provenance repositories. Is it NFS?

Thanks

On Wed, Oct 20, 2021 at 11:14 AM Joe Witt  wrote:
>
> Jens,
>
> And to further narrow this down
>
> "I have a test flow, where a GenerateFlowfile has created 6x 1GB files
> (2 files per node) and next process was a hashcontent before it run
> into a test loop. Where files are uploaded via PutSFTP to a test
> server, and downloaded again and recalculated the hash. I have had one
> issue after 3 days of running."
>
> So, to be clear: with GenerateFlowFile making these files and you then
> looping them, the content is wholly and exclusively within the control
> of NiFi, with no Get/Fetch/Put-SFTP of any kind at all. By looping the
> same files over and over within NiFi itself, can you make this happen
> or not?
>
> Thanks

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-20 Thread Joe Witt
Jens,

And to further narrow this down

"I have a test flow, where a GenerateFlowfile has created 6x 1GB files
(2 files per node) and next process was a hashcontent before it run
into a test loop. Where files are uploaded via PutSFTP to a test
server, and downloaded again and recalculated the hash. I have had one
issue after 3 days of running."

So, to be clear: with GenerateFlowFile making these files and you then
looping them, the content is wholly and exclusively within the control
of NiFi, with no Get/Fetch/Put-SFTP of any kind at all. By looping the
same files over and over within NiFi itself, can you make this happen
or not?

Thanks

On Wed, Oct 20, 2021 at 11:08 AM Joe Witt  wrote:
>
> Jens,
>
> "After fetching a FlowFile-stream file and unpacked it back into NiFi
> I calculate a sha256. 1 minutes later I recalculate the sha256 on the
> exact same file. And got a new hash. That is what worry’s me.
> The fact that the same file can be recalculated and produce two
> different hashes, is very strange, but it happens. "
>
> OK, so to confirm: you are saying that in each case where this happens,
> you see it first compute the wrong hash, but if you then retry the same
> flowfile it produces the correct hash?
>
> Can you please also show/share the lineage history for such a flow
> file then?  It should have events for the initial hash, second hash,
> the unpacking, trace to the original stream, etc...
>
> Thanks

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-20 Thread Joe Witt
Jens,

"After fetching a FlowFile-stream file and unpacked it back into NiFi
I calculate a sha256. 1 minutes later I recalculate the sha256 on the
exact same file. And got a new hash. That is what worry’s me.
The fact that the same file can be recalculated and produce two
different hashes, is very strange, but it happens. "

OK, so to confirm: you are saying that in each case where this happens,
you see it first compute the wrong hash, but if you then retry the same
flowfile it produces the correct hash?

Can you please also show/share the lineage history for such a flow
file then?  It should have events for the initial hash, second hash,
the unpacking, trace to the original stream, etc...

Thanks

On Wed, Oct 20, 2021 at 11:00 AM Jens M. Kofoed  wrote:
>
> Dear Mark and Joe
>
> I know my setup isn’t typical. But if we look only at my receive side,
> which is what the last mails are about, everything is happening in the
> same NiFi instance: the same 3-node NiFi cluster.
> After fetching a FlowFile-stream file and unpacking it back into NiFi, I
> calculate a SHA-256. One minute later I recalculate the SHA-256 on the
> exact same file and get a different hash. That is what worries me.
> The fact that the same file can be rehashed and produce two different
> hashes is very strange, but it happens. Over the last 5 months it has
> happened only 35-40 times.
>
> I could understand it if the file were not completely loaded and saved
> into the content repository before the hashing starts. But I believe that
> the unpack processor doesn’t forward the flow file to the next processor
> until it has completely finished unpacking and saving the new content to
> the repository.
>
> I have a test flow where a GenerateFlowFile has created 6x 1GB files (2
> files per node), and the next processor was a hashcontent processor before
> the files ran into a test loop, where they are uploaded via PutSFTP to a
> test server, downloaded again, and the hash recalculated. I have had one
> issue after 3 days of running.
> Now the test flow is running without the Put/Fetch SFTP processors.
>
> Another problem is that I can’t find any correlation with other events,
> neither within NiFi nor on the server itself or in VMware. If I could just
> find any other event that happens at the same time, I might be able to
> force some kind of event to trigger the issue.
> I have tried forcing VMware to migrate a NiFi node to another host, and
> forcing it to take and delete snapshots, but nothing triggers an error.
>
> I know it will be very, very difficult to reproduce. But I will set up
> multiple NiFi instances running different test flows to see if I can find
> any reason why it behaves as it does.
>
> Kind Regards
> Jens M. Kofoed

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-20 Thread Jens M. Kofoed
Dear Mark and Joe

I know my setup isn’t typical. But if we look only at my receive side,
which is what the last mails are about, everything is happening in the
same NiFi instance: the same 3-node NiFi cluster.
After fetching a FlowFile-stream file and unpacking it back into NiFi, I
calculate a SHA-256. One minute later I recalculate the SHA-256 on the
exact same file and get a different hash. That is what worries me.
The fact that the same file can be rehashed and produce two different
hashes is very strange, but it happens. Over the last 5 months it has
happened only 35-40 times.

I could understand it if the file were not completely loaded and saved
into the content repository before the hashing starts. But I believe that
the unpack processor doesn’t forward the flow file to the next processor
until it has completely finished unpacking and saving the new content to
the repository.

I have a test flow where a GenerateFlowFile has created 6x 1GB files (2
files per node), and the next processor was a hashcontent processor before
the files ran into a test loop, where they are uploaded via PutSFTP to a
test server, downloaded again, and the hash recalculated. I have had one
issue after 3 days of running.
Now the test flow is running without the Put/Fetch SFTP processors.

Another problem is that I can’t find any correlation with other events,
neither within NiFi nor on the server itself or in VMware. If I could just
find any other event that happens at the same time, I might be able to
force some kind of event to trigger the issue.
I have tried forcing VMware to migrate a NiFi node to another host, and
forcing it to take and delete snapshots, but nothing triggers an error.

I know it will be very, very difficult to reproduce. But I will set up
multiple NiFi instances running different test flows to see if I can find
any reason why it behaves as it does.

Kind Regards
Jens M. Kofoed

> On Oct 20, 2021, at 16:39, Mark Payne wrote:
> 
> Jens,
> 
> Thanks for sharing the images.
> 
> I tried to set up a test to reproduce the issue. I’ve had it running for
> quite some time, through millions of iterations.
>
> I’ve used 5 KB files, 50 KB files, 50 MB files, and larger (to the tune of
> hundreds of MB). I’ve been unable to reproduce an issue after millions of
> iterations.
>
> So far I cannot replicate it. And since you’re pulling the data via SFTP
> and then unpacking, which preserves all original attributes from a
> different system, this can easily become confusing.
>
> I recommend trying to reproduce with the SFTP-related processors out of the
> picture, as Joe is mentioning, using either GetFile/FetchFile or
> GenerateFlowFile. Then immediately use CryptographicHashContent to generate
> an ‘initial hash’, copy that value to another attribute, and then loop,
> generating the hash and comparing it against the original one. I’ll attach
> a flow that does this, but I’m not sure whether the email server will strip
> out the attachment.
>
> This way we remove any possibility of actual corruption between the two
> NiFi instances. If we can still see corruption / different hashes within a
> single NiFi instance, then it certainly warrants further investigation, but
> I can’t see any issues so far.
> 
> Thanks
> -Mark

Re: Nifi and Registry behind Citrix ADC.

2021-10-20 Thread Bryan Bende
Yes, you can think of it the same as how NiFi -> NiFi Registry works...

The user accesses NiFi and authenticates in some way (it could be a
client cert), then performs an action that calls the registry. NiFi makes
a 2-way TLS connection to the registry using its own server cert and sends
the end-user identity to NiFi Registry in the X-ProxiedEntitiesChain
header.

NiFi Registry then sees the client certificate of the NiFi server, sees
that there is an X-ProxiedEntitiesChain header, authorizes that the NiFi
service is allowed to proxy (as well as any other identities in the chain
besides the top entry for the user), and if so proceeds to authorize the
rest of the request as the end-user identity.
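
[Editor's illustration] The proxy check described above can be sketched
roughly as follows. This is a minimal Python sketch, not NiFi Registry's
actual code: the function names are hypothetical, the header value is
assumed to be a sequence of identities each wrapped in angle brackets with
the end user first, and escaping rules and anonymous users are ignored.

```python
import re


def parse_proxied_entities_chain(header_value):
    """Split '<user><proxy1><proxy2>' into ['user', 'proxy1', 'proxy2'].

    Assumption: identities are wrapped in angle brackets, end user first.
    """
    return re.findall(r"<([^<>]*)>", header_value or "")


def resolve_end_user(client_cert_dn, header_value, allowed_proxies):
    """Return the identity the request should be authorized as.

    client_cert_dn  -- identity from the 2-way TLS client certificate
    header_value    -- raw X-ProxiedEntitiesChain header, or None
    allowed_proxies -- hypothetical set of identities permitted to proxy
    """
    chain = parse_proxied_entities_chain(header_value)
    if not chain:
        # No proxied identity was sent: the certificate holder is the user.
        return client_cert_dn

    end_user, intermediate_proxies = chain[0], chain[1:]
    # Every identity except the end user must be allowed to proxy,
    # including the service that actually opened the TLS connection.
    for proxy in [*intermediate_proxies, client_cert_dn]:
        if proxy not in allowed_proxies:
            raise PermissionError(f"{proxy} is not authorized to proxy requests")
    return end_user
```

With allowed_proxies = {"CN=nifi-node1"}, a request arriving over TLS as
"CN=nifi-node1" with the header "<CN=alice>" would be authorized as
"CN=alice", while the same header sent by an unlisted identity is rejected.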

On Wed, Oct 20, 2021 at 10:10 AM Shawn Weeks  wrote:
>
> I didn't know that was supported. Does this require the Proxy to do 2-way ssl 
> back to NiFi?
>
> Thanks
> Shawn
>
> -Original Message-
> From: Bryan Bende 
> Sent: Wednesday, October 20, 2021 9:02 AM
> To: users@nifi.apache.org
> Subject: Re: Nifi and Registry behind Citrix ADC.
>
> If the load balancer can pass the client cert DN in the 
> X-ProxiedEntitiesChain header, then it doesn't have to be a straight pass 
> through. The load balancer identity would need to be authorized as a proxy in 
> NiFi or NiFi Registry.
>
> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#proxy_configuration
>
> On Tue, Oct 19, 2021 at 8:43 PM Shawn Weeks  wrote:
> >
> > If you’re authenticating with 2-way SSL you’ll have to set up your load
> > balancer to pass the TCP traffic straight through. Otherwise NiFi
> > doesn’t see the user’s cert. NiFi doesn’t currently support getting the
> > SSL cert name from an HTTP header like some other systems do. Usually
> > if you’re using an HTTP load balancer you’d authenticate with SSO (SAML
> > or OIDC) or LDAP (username/password).
> >
> >
> >
> > Thanks
> >
> > Shawn
> >
> >
> >
> > From: Jens M. Kofoed 
> > Sent: Tuesday, October 19, 2021 1:16 AM
> > To: users@nifi.apache.org
> > Subject: Re: Nifi and Registry behind Citrix ADC.
> >
> >
> >
> > Only if you want other ways to authenticate users. I have set up our
> > NiFi systems to talk to our MS AD via LDAPS, and defined different AD
> > groups which have different policy rules in NiFi. Some people can manage
> > everything; others can only start/stop specific processors in specific
> > process groups.
> >
> > Using personal certificates is no problem; I have some admins who also
> > use their personal certificates. But with certificates you would have to
> > add and manage users manually in NiFi. Users can of course be added to
> > internal groups in NiFi, with policies configured on the groups.
> >
> >
> >
> > Regards
> >
> > Jens
> >
> >
> >
> > On Tue, Oct 19, 2021 at 07:43, Jakobsson Stefan wrote:
> >
> > We are currently authenticating with personal certificates, should we 
> > change that then?
> >
> >
> >
> > Stefan Jakobsson
> >
> >
> > Systems Manager  |  Scania IT, IKCA |  Scania CV AB
> >
> > Phone: +46 8 553 527 27 Mobile: +46 7 008 834 76
> >
> > Forskargatan 20, SE-151 87 Södertälje, Sweden
> >
> > stefan.jakobs...@scania.com
> >
> >
> >
> > From: Shawn Weeks 
> > Sent: October 18, 2021 21:35
> > To: users@nifi.apache.org
> > Subject: RE: Nifi and Registry behind Citrix ADC.
> >
> >
> >
> > Unless you’re operating the LB in TCP Mode you’ll need to configure NiFi to 
> > use an alternative authentication method like SAML, LDAP, OIDC, etc. You’ll 
> > also need to make sure that your proxy is passing the various HTTP headers 
> > through to NiFi and that NiFi is expecting traffic from a proxy. If you 
> > look in the nifi-user.log and nifi-app.log there might be some hints about 
> > what it didn’t like.
> >
> >
> >
> > Thanks
> >
> > Shawn
> >
> >
> >
> > From: Jakobsson Stefan 
> > Sent: Monday, October 18, 2021 2:26 PM
> > To: users@nifi.apache.org
> > Subject: RE: Nifi and Registry behind Citrix ADC.
> >
> >
> >
> > Ahh, no, ADC as in application delivery and load balancing.
> >
> >
> >
> > Stefan Jakobsson
> >
> >
> > Systems Manager  |  Scania IT, IKCA |  Scania CV AB
> >
> > Phone: +46 8 553 527 27 Mobile: +46 7 008 834 76
> >
> > Forskargatan 20, SE-151 87 Södertälje, Sweden
> >
> > stefan.jakobs...@scania.com
> >
> >
> >
> > From: Lehel Boér 
> > Sent: October 18, 2021 15:03
> > To: users@nifi.apache.org
> > Subject: Re: Nifi and Registry behind Citrix ADC.
> >
> >
> >
> > Hi Stefan,
> >
> >
> >
> > Please disregard my prior response. The name misled me; I discovered
> > that ADC is not the same as Active Directory.
> >
> >
> >
> > Kind Regards,
> >
> > Lehel Boér
> >
> >
> >
> > On Mon, Oct 18, 2021 at 14:54, Lehel Boér wrote:
> >
> > Hi Stefan,
> >
> >
> >
> > Have you tried setting up NiFi with an LDAP provider? Here are a few useful 
> > links.
> >
> >
> >
> > - https://docs.cloudera.com/HDPDocuments/HDF3/HDF-3.4.1.1/nifi-security/content/ldap_login_identity_provider.html
> >
> > - 

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-20 Thread Mark Payne
Jens,

Thanks for sharing the images.

I tried to set up a test to reproduce the issue. I’ve had it running for
quite some time, through millions of iterations.

I’ve used 5 KB files, 50 KB files, 50 MB files, and larger (to the tune of
hundreds of MB). I’ve been unable to reproduce an issue after millions of
iterations.

So far I cannot replicate it. And since you’re pulling the data via SFTP and
then unpacking, which preserves all original attributes from a different
system, this can easily become confusing.

I recommend trying to reproduce with the SFTP-related processors out of the
picture, as Joe is mentioning, using either GetFile/FetchFile or
GenerateFlowFile. Then immediately use CryptographicHashContent to generate
an ‘initial hash’, copy that value to another attribute, and then loop,
generating the hash and comparing it against the original one. I’ll attach a
flow that does this, but I’m not sure whether the email server will strip out
the attachment.
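
[Editor's illustration] Outside NiFi, the same "hash once, then loop and
compare" check can be approximated with a short Python sketch. It is not the
attached flow: the function names are hypothetical, a temporary file of
random bytes stands in for GenerateFlowFile, and reading a local file says
nothing about how the NiFi content repository behaves.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Stream the file through SHA-256 so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_hash_mismatches(path, iterations):
    """Compute an initial hash, then re-hash the same file repeatedly.

    Returns (initial_hash, mismatches), where mismatches is a list of
    (iteration, observed_hash) for every pass that differed.
    """
    initial_hash = sha256_of_file(path)
    mismatches = []
    for iteration in range(1, iterations + 1):
        observed = sha256_of_file(path)
        if observed != initial_hash:
            mismatches.append((iteration, observed))
    return initial_hash, mismatches


def make_test_file(size_bytes):
    """Create a temporary file of random bytes and return its path."""
    handle = tempfile.NamedTemporaryFile(delete=False)
    with handle:
        handle.write(os.urandom(size_bytes))
    return handle.name
```

On healthy storage the mismatch list should stay empty no matter how many
iterations are run; any entry in it would point at the read path (disk,
file system, memory) rather than at the hash algorithm.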

This way we remove any possibility of actual corruption between the two NiFi
instances. If we can still see corruption / different hashes within a single
NiFi instance, then it certainly warrants further investigation, but I can’t
see any issues so far.

Thanks
-Mark





On Oct 20, 2021, at 10:21 AM, Joe Witt wrote:

Jens

Actually, is this current loop test contained within a single NiFi, and do
you see the corruption happen there?

Joe

On Wed, Oct 20, 2021 at 7:14 AM Joe Witt wrote:
Jens,

You have a very involved setup including other systems (non NiFi).  Have you 
removed those systems from the equation so you have more evidence to support 
your expectation that NiFi is doing something other than you expect?

Joe

On Wed, Oct 20, 2021 at 7:10 AM Jens M. Kofoed wrote:
Hi

Today I have another file which has been through the retry loop once. To
test the processors and the algorithm, I added the HashContent processor and
also added hashing by SHA-1.
The file has gone through the system, and both the SHA-1 and the SHA-256 are
different than expected. After a 1-minute delay the file goes back into the
hashing content flow, and this time it calculates both hashes fine.

I don't believe that the hashing is buggy, but something is very, very
strange. What can influence the processors/algorithm to calculate a
different hash?
All the input/output claim information is exactly the same. It is the same
flow/content file going in a loop. It happens on all 3 nodes.

Any suggestions for where to dig?

Regards
Jens M. Kofoed



On Wed, Oct 20, 2021 at 06:34, Jens M. Kofoed wrote:
Hi Mark

Thanks for replying and for the suggestion to look at the content claim.
These 3 pictures are from the first attempt:


Yesterday I realized that the content was still in the archive, so I could
replay the file.

So here are the same pictures but for the replay, and as you can see the
Identifier, Offset and Size are all the same.


In my flow, if the hash does not match my original, first-calculated hash,
the file goes into a retry loop. Here are the pictures for the 4th time the
file went through:

Here the content claim is all the same.

It is very rare that we see these issues (fewer than 1 in 1,000,000 files),
and only with large files. Only once have I seen the error with a 110MB
file; the other times the file sizes were above 800MB.
This time it was a NiFi "flowfile-stream-v3" file, which had been exported
from one system and imported into another. But once the file has been
imported, it is the same file inside NiFi and it stays on the same node. It
goes through the same loop of processors multiple times, and in the end
CryptographicHashContent calculates a different SHA256 than it did earlier.
This should not be possible! And that is what concerns me the most.
What can influence the same processor to calculate 2 different SHA256 hashes
on the exact same content?

Regards
Jens M. Kofoed


On Tue, Oct 19, 2021 at 16:51, Mark Payne wrote:
Jens,

For the two provenance events, one showing a hash of dd4cc… and the other
showing f6f0…: if you go to the Content tab, do they both show the same
Content Claim? I.e., do the Input Claim / Output Claim show the same values
for Container, Section, Identifier, Offset, and Size?
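
[Editor's illustration] The comparison being asked for can be written down
as a small Python sketch. The ContentClaim class and differing_fields helper
are hypothetical stand-ins for the five values shown on the provenance
event's Content tab, not NiFi classes, and any values passed in are made up.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ContentClaim:
    """The five values that locate a piece of content (illustrative only)."""
    container: str
    section: str
    identifier: str
    offset: int
    size: int


def differing_fields(first, second):
    """Return the names of the claim fields whose values differ."""
    return [
        field.name
        for field in fields(ContentClaim)
        if getattr(first, field.name) != getattr(second, field.name)
    ]
```

An empty result means both events point at the same bytes in the content
repository, so differing hashes would have to come from how those bytes
were read rather than from which bytes were selected.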

Thanks
-Mark

On Oct 19, 2021, at 1:22 AM, Jens M. Kofoed wrote:

Dear NIFI Users

I have posted this mail on the developers mailing list and just want to
inform all of you about a very odd behavior we are facing.
The background:
We have data going between 2 different NiFi systems which have no direct
network access to each other. Therefore we calculate a SHA256 hash value of
the content at system 1, before the flowfile and data are combined and saved
as a "flowfile-stream-v3" pkg file. The file is then transported to system 2,
where the pkg file

RE: Nifi and Registry behind Citrix ADC.

2021-10-20 Thread Shawn Weeks
I didn't know that was supported. Does this require the Proxy to do 2-way ssl 
back to NiFi?

Thanks
Shawn

-Original Message-
From: Bryan Bende  
Sent: Wednesday, October 20, 2021 9:02 AM
To: users@nifi.apache.org
Subject: Re: Nifi and Registry behind Citrix ADC.

If the load balancer can pass the client cert DN in the X-ProxiedEntitiesChain 
header, then it doesn't have to be a straight pass through. The load balancer 
identity would need to be authorized as a proxy in NiFi or NiFi Registry.

https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#proxy_configuration

On Tue, Oct 19, 2021 at 8:43 PM Shawn Weeks  wrote:
>
> If you’re authenticating with 2-way ssl you’ll have to setup your load 
> balancer to directly pass the TCP traffic through. Otherwise NiFi 
> doesn’t see the users cert. NiFi doesn’t currently support getting the 
> SSL Cert name from an HTTP Header like some other systems do. Usually 
> if your using an HTTP Load Balancer you’d authenticate with SSO(SAML 
> or OIDC) or LDAP(Username/Password)
>
>
>
> Thanks
>
> Shawn
>
>
>
> From: Jens M. Kofoed 
> Sent: Tuesday, October 19, 2021 1:16 AM
> To: users@nifi.apache.org
> Subject: Re: Nifi and Registry behind Citrix ADC.
>
>
>
> Only if you want other ways to authenticate users. I have set up our NiFi 
> systems to talk with our MS AD via LDAPS, and defined different AD groups 
> which have different policy rules in NiFi. Some people can manage everything, 
> others can only start/stop specific processors in specific process groups.
>
> Using personal certificates is no problem; I have some admins who also use 
> their personal certificates. But with certificates you would have to add and 
> manage users manually in NiFi. Users can of course be added to internal 
> groups in NiFi, with policies configured per group.
>
>
>
> Regards
>
> Jens
>
>
>
> On Tue, Oct 19, 2021 at 07:43, Jakobsson Stefan wrote:
>
> We are currently authenticating with personal certificates, should we change 
> that then?
>
>
>
> Stefan Jakobsson
>
>
> Systems Manager  |  Scania IT, IKCA |  Scania CV AB
>
> Phone: +46 8 553 527 27 Mobile: +46 7 008 834 76
>
> Forskargatan 20, SE-151 87 Södertälje, Sweden
>
> stefan.jakobs...@scania.com
>
>
>
> From: Shawn Weeks 
> Sent: October 18, 2021 21:35
> To: users@nifi.apache.org
> Subject: RE: Nifi and Registry behind Citrix ADC.
>
>
>
> Unless you’re operating the LB in TCP Mode you’ll need to configure NiFi to 
> use an alternative authentication method like SAML, LDAP, OIDC, etc. You’ll 
> also need to make sure that your proxy is passing the various HTTP headers 
> through to NiFi and that NiFi is expecting traffic from a proxy. If you look 
> in the nifi-user.log and nifi-app.log there might be some hints about what it 
> didn’t like.
>
>
>
> Thanks
>
> Shawn
>
>
>
> From: Jakobsson Stefan 
> Sent: Monday, October 18, 2021 2:26 PM
> To: users@nifi.apache.org
> Subject: RE: Nifi and Registry behind Citrix ADC.
>
>
>
> Ahh, no, ADC as in application delivery and load balancing.
>
>
>
> Stefan Jakobsson
>
>
>
> From: Lehel Boér 
> Sent: October 18, 2021 15:03
> To: users@nifi.apache.org
> Subject: Re: Nifi and Registry behind Citrix ADC.
>
>
>
> Hi Stefan,
>
>
>
> Please disregard my prior response. The name misled me; I have since 
> discovered that ADC is not the same as Active Directory.
>
>
>
> Kind Regards,
>
> Lehel Boér
>
>
>
> On Mon, Oct 18, 2021 at 14:54, Lehel Boér wrote:
>
> Hi Stefan,
>
>
>
> Have you tried setting up NiFi with an LDAP provider? Here are a few useful 
> links.
>
>
>
> - https://docs.cloudera.com/HDPDocuments/HDF3/HDF-3.4.1.1/nifi-security/content/ldap_login_identity_provider.html
>
> - https://pierrevillard.com/2017/01/24/integration-of-nifi-with-ldap
>
>
>
> Kind Regards,
>
> Lehel Boér
>
>
>
> On Mon, Oct 18, 2021 at 13:02, Jakobsson Stefan wrote:
>
> Hello,
>
>
>
> I have some issues trying to run NiFi and NiFi Registry behind an ADC. 
> The reason is that our on-prem NiFi installation needs to be accessible 
> from AWS, due to demands from our IT security department.
>
>
>
> Anyhow, I can connect to NiFi Registry on the server's IP (i.e. 
> x.x.x.x:9443/nifi-registry) without problems, but if I try to use the URL 
> set up in the ADC, with port 9443 redirected to the NiFi server's IP, we get 
> an error saying:
>
>
>
> This page isn’t working
>
> nifiprod.oururl.com didn’t send any data.
>
> ERR_EMPTY_RESPONSE
>
>
>
> Does anyone have any ideas about what I should start looking at? I set the 
> https.host to 0.0.0.0 in nifi-registry.conf.
>
>
>
> Stefan Jakobsson
>
>


Re: Nifi and Registry behind Citrix ADC.

2021-10-20 Thread Bryan Bende
If the load balancer can pass the client cert DN in the
X-ProxiedEntitiesChain header, then it doesn't have to be a straight
pass through. The load balancer identity would need to be authorized
as a proxy in NiFi or NiFi Registry.

https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#proxy_configuration
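
As a rough illustration of the header mentioned above, here is a minimal sketch of how a proxy might assemble the X-ProxiedEntitiesChain value, with each identity wrapped in angle brackets and the end user listed first. The function names, the example DN, and the backslash escaping of angle brackets are assumptions for illustration; check the linked proxy configuration section for the exact rules of your NiFi version.

```python
def format_proxied_entity(identity: str) -> str:
    """Wrap one identity in angle brackets (escaping rule is an assumption)."""
    # Escape angle brackets inside the identity so they are not read as delimiters
    escaped = identity.replace("<", "\\<").replace(">", "\\>")
    return f"<{escaped}>"


def build_proxied_entities_chain(end_user: str, *intermediate_proxies: str) -> str:
    """Concatenate the end user and any earlier proxies into one header value."""
    identities = (end_user, *intermediate_proxies)
    return "".join(format_proxied_entity(identity) for identity in identities)


# Hypothetical DN taken from the client certificate the load balancer validated
headers = {
    "X-ProxiedEntitiesChain": build_proxied_entities_chain("CN=alice, OU=NiFi"),
}
```

The load balancer would send this header on a request that it authenticates with its own certificate, and that certificate's identity is the one that needs the proxy authorization in NiFi or NiFi Registry.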
