Jens,

In the two provenance events, one showing a hash of dd4cc… and the other showing f6f0…, if you go to the Content tab, do they both show the same Content Claim? I.e., do the Input Claim / Output Claim show the same values for Container, Section, Identifier, Offset, and Size?
Thanks
-Mark

On Oct 19, 2021, at 1:22 AM, Jens M. Kofoed <[email protected]> wrote:

Dear NIFI Users

I have posted this mail in the developers mailing list and just want to inform all of you about a very odd behavior we are facing.

The background:
We have data going between 2 different NIFI systems which have no direct network access to each other. Therefore we calculate a SHA256 hash value of the content at system 1, before the flowfile and data are combined and saved as a "flowfile-stream-v3" pkg file. The file is then transported to system 2, where the pkg file is unpacked and the flow can continue. To be sure about file integrity we calculate a new sha256 at system 2. But sometimes we see that the sha256 gets another value, which might suggest the file was corrupted. But recalculating the sha256 again gives a new hash value.

----

Tonight I had yet another file which didn't match the expected sha256 hash value. The content is a 1.7GB file and the Event Duration was "00:00:17.539" to calculate the hash.

I have created a Retry loop, where the file goes to a Wait process that delays it for 1 minute and then sends it back to the CryptographicHashContent for a new calculation. After 3 retries the file goes to retries_exceeded and on to a disabled process, just so it sits in a queue where I can look at it manually.

This morning I rerouted the file from my retries_exceeded queue back to the CryptographicHashContent for a new calculation and this time it calculated the correct hash value.

THIS CAN'T BE TRUE :-( :-( But it is. - Something very very strange is happening.

<image.png>

We are running NiFi 1.13.2 in a 3 node cluster on Ubuntu 20.04.02 with openjdk version "1.8.0_292", OpenJDK Runtime Environment (build 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10), OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode). Each server is a VM with 4 CPU, 8GB RAM on VMware ESXi 7.0.2. Each NIFI node runs on a different physical VM host.
I have inspected different logs to see if I can find any correlation with what happened at the same time as the file was going through my loop, but there is no event/task at that exact time.

System 1:
At 10/19/2021 00:15:11.247 CEST my file is going through a CryptographicHashContent:
SHA256 value: dd4cc7ef8dbc8d70528e8aa788581f0ab88d297c9c9f39b6b542df68952efd20
The file is exported as a "FlowFile Stream, v3" to System 2

System 2:
At 10/19/2021 00:18:10.528 CEST the file is going through a CryptographicHashContent:
SHA256 value: f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819

<image.png>

At 10/19/2021 00:19:08.996 CEST the file is going through the same CryptographicHashContent at system 2:
SHA256 value: f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819

At 10/19/2021 00:20:04.376 CEST the file is going through the same CryptographicHashContent at system 2:
SHA256 value: f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819

At 10/19/2021 00:21:01.711 CEST the file is going through the same CryptographicHashContent at system 2:
SHA256 value: f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819

At 10/19/2021 06:07:43.376 CEST the file is going through the same CryptographicHashContent at system 2:
SHA256 value: dd4cc7ef8dbc8d70528e8aa788581f0ab88d297c9c9f39b6b542df68952efd20

<image.png>

How on earth can this happen???

Kind Regards
Jens M. Kofoed
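The repeated-hash check described in this thread can also be reproduced outside NiFi, for example against a copy of the unpacked content on disk, to see whether re-reading the same bytes ever yields a different SHA-256. What follows is only a minimal Python sketch under that assumption. The function names are hypothetical and it does not use any NiFi API or reflect how CryptographicHashContent is implemented.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Stream the file in chunks so a multi-GB file need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repeated_hashes(path, attempts=3):
    """Hash the same file several times and return every digest, in order."""
    return [sha256_of_file(path) for _ in range(attempts)]


def is_stable(digests):
    """True when every attempt produced the same digest."""
    return len(set(digests)) == 1
```

If `is_stable(repeated_hashes(path))` is ever false for a file that nothing is writing to, the difference is coming from the read path (storage, cache, memory) and not from the hashing itself.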
