Jens,

Regarding the two provenance events, one showing a hash of dd4cc… and the other 
showing f6f0…:
If you go to the Content tab, do they both show the same Content Claim? I.e., 
do the Input Claim / Output Claim show the same values for Container, Section, 
Identifier, Offset, and Size?
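
As a rough illustration of that comparison, here is a minimal Python sketch that records the five values shown on the Content tab for each event and reports which ones differ. The ContentClaim class, the differing_fields helper, and the sample values are all hypothetical and made up for illustration; they are not part of NiFi's API and are not taken from the events in question.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class ContentClaim:
    # Field names mirror the labels shown on the provenance event's Content tab.
    container: str
    section: str
    identifier: str
    offset: int
    size: int


def differing_fields(a: ContentClaim, b: ContentClaim) -> List[str]:
    """Return the names of the fields whose values differ between two claims."""
    return [f.name for f in fields(ContentClaim)
            if getattr(a, f.name) != getattr(b, f.name)]


# Placeholder values, copied by hand from two events in a real comparison.
first = ContentClaim("default", "1", "example-claim-id", 0, 100)
second = ContentClaim("default", "1", "example-claim-id", 100, 100)
print(differing_fields(first, second))
```

An empty list would mean both events point at exactly the same bytes in the content repository, so a differing hash could not be explained by the events reading different claims.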

Thanks
-Mark

On Oct 19, 2021, at 1:22 AM, Jens M. Kofoed <[email protected]> wrote:

Dear NIFI Users

I have posted this mail to the developers mailing list and just want to inform 
all of you about a very odd behavior we are facing.
The background:
We have data going between 2 different NIFI systems which have no direct network 
access to each other. Therefore we calculate a SHA256 hash value of the content 
at system 1, before the flowfile and data are combined and saved as a 
"flowfile-stream-v3" pkg file. The file is then transported to system 2, where 
the pkg file is unpacked and the flow can continue. To be sure about file 
integrity we calculate a new sha256 at system 2. But sometimes we see that the 
sha256 gets another value, which might suggest the file was corrupted. But 
recalculating the sha256 again gives a new hash value.
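
For anyone who wants to cross-check a file outside NiFi, the following is a minimal Python sketch of the same kind of integrity check: it streams the file through SHA-256 in chunks and compares the result with the value recorded on the sending side. The function names, the chunk size, and the idea of running this against an exported copy of the content are assumptions for illustration; this is not how CryptographicHashContent is implemented.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Stream a file through SHA-256 so large files are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with the hex value computed at system 1."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Reading in fixed-size chunks keeps memory use flat regardless of file size, which matters for content in the gigabyte range.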

----

Tonight I had yet another file which didn't match the expected sha256 hash 
value. The content is a 1.7GB file and the Event Duration was "00:00:17.539" to 
calculate the hash.
I have created a retry loop, where the file goes to a Wait processor that 
delays it for 1 minute and then goes back to CryptographicHashContent for a 
new calculation. After 3 retries the file goes to retries_exceeded and on to a 
disabled processor, just so it sits in a queue where I can look at it manually. 
This morning I rerouted the file from my retries_exceeded queue back to 
CryptographicHashContent for a new calculation, and this time it calculated the 
correct hash value.
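
The logic of that retry loop can be sketched in Python roughly as follows. This is only an illustration of the control flow (one initial attempt, up to 3 retries, a 1 minute delay between attempts), not the actual NiFi flow; verify_with_retries and its parameters are hypothetical names, and compute_digest stands in for whatever recalculates the hash.

```python
import time
from typing import Callable, List, Tuple


def verify_with_retries(
    compute_digest: Callable[[], str],
    expected_hex: str,
    max_retries: int = 3,
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, List[str]]:
    """Recompute a digest until it matches or the retries are used up.

    Returns (matched, digests_seen). Keeping every digest makes it possible
    to see whether a mismatching value is stable across attempts.
    """
    digests_seen: List[str] = []
    # One initial attempt plus max_retries further attempts.
    for attempt in range(max_retries + 1):
        digest = compute_digest()
        digests_seen.append(digest)
        if digest == expected_hex:
            return True, digests_seen
        if attempt < max_retries:
            sleep(delay_seconds)  # plays the role of the 1 minute wait
    # Corresponds to routing the file to retries_exceeded.
    return False, digests_seen
```

Passing sleep as a parameter is just a convenience so the loop can be exercised without real delays.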

THIS CAN'T BE TRUE :-( :-( But it is. - Something very very strange is 
happening.
<image.png>

We are running NiFi 1.13.2 in a 3 node cluster on Ubuntu 20.04.02 with openjdk 
version "1.8.0_292", OpenJDK Runtime Environment (build 
1.8.0_292-8u292-b10-0ubuntu1~20.04-b10), OpenJDK 64-Bit Server VM (build 
25.292-b10, mixed mode). Each server is a VM with 4 CPU, 8GB Ram on VMware 
ESXi, 7.0.2. Each NIFI node is running on a different physical VM host.
I have inspected different logs to see if I can find any correlation with what 
happened at the same time as the file was going through my loop, but there are 
no events/tasks at that exact time.

System 1:
At 10/19/2021 00:15:11.247 CEST my file is going through a 
CryptographicHashContent: SHA256 value: 
dd4cc7ef8dbc8d70528e8aa788581f0ab88d297c9c9f39b6b542df68952efd20
The file is exported as a "FlowFile Stream, v3" to System 2

SYSTEM 2:
At 10/19/2021 00:18:10.528 CEST the file is going through a 
CryptographicHashContent: SHA256 value: 
f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
<image.png>
At 10/19/2021 00:19:08.996 CEST the file is going through the same 
CryptographicHashContent at system 2: SHA256 value: 
f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
At 10/19/2021 00:20:04.376 CEST the file is going through the same 
CryptographicHashContent at system 2: SHA256 value: 
f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819
At 10/19/2021 00:21:01.711 CEST the file is going through the same 
CryptographicHashContent at system 2: SHA256 value: 
f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819

At 10/19/2021 06:07:43.376 CEST the file is going through the same 
CryptographicHashContent at system 2: SHA256 value: 
dd4cc7ef8dbc8d70528e8aa788581f0ab88d297c9c9f39b6b542df68952efd20
<image.png>

How on earth can this happen???

Kind Regards
Jens M. Kofoed

