Jens Update from hour 67. Still lookin' good.
Will advise. Thanks On Thu, Oct 28, 2021 at 8:08 AM Jens M. Kofoed <[email protected]> wrote: > > Many many thanks đ Joe for looking into this. My test flow was running for 6 > days before the first error occurred > > Thanks > > > Den 28. okt. 2021 kl. 16.57 skrev Joe Witt <[email protected]>: > > > > Jens, > > > > Am 40+ hours in running both your flow and mine to reproduce. So far > > neither have shown any sign of trouble. Will keep running for another > > week or so if I can. > > > > Thanks > > > >> On Wed, Oct 27, 2021 at 12:42 PM Jens M. Kofoed <[email protected]> > >> wrote: > >> > >> The Physical hosts with VMWare is using the vmfs but the vm machines > >> running at hosts canât see that. > >> But you asked about the underlying file system đ and since my first answer > >> with the copy from the fstab file wasnât enough I just wanted to give all > >> the details đ. > >> > >> If you create a vm for windows you would probably use NTFS (on top of > >> vmfs). For Linux EXT3, EXT4, BTRFS, XFS and so on. > >> > >> All the partitions at my nifi nodes, are local devices (sda, sdb, sdc and > >> sdd) for each Linux machine. I donât use nfs > >> > >> Kind regards > >> Jens > >> > >> > >> > >> Den 27. okt. 2021 kl. 17.47 skrev Joe Witt <[email protected]>: > >> > >> Jens, > >> > >> I don't quite follow the EXT4 usage on top of VMFS but the point here > >> is you'll ultimately need to truly understand your underlying storage > >> system and what sorts of guarantees it is giving you. If linux/the > >> jvm/nifi think it has a typical EXT4 type block storage system to work > >> with it can only be safe/operate within those constraints. I have no > >> idea about what VMFS brings to the table or the settings for it. > >> > >> The sync properties I shared previously might help force the issue of > >> ensuring a formal sync/flush cycle all the way through the disk has > >> occurred which we'd normally not do or need to do but again in some > >> cases offers a stronger guarantee in exchange for performance. > >> > >> In any case...Mark's path for you here will help identify what we're > >> dealing with and we can go from there. > >> > >> I am aware of significant usage of NiFi on VMWare configurations > >> without issue at high rates for many years so whatever it is here is > >> likely solvable. > >> > >> Thanks > >> > >> On Wed, Oct 27, 2021 at 7:28 AM Jens M. Kofoed <[email protected]> > >> wrote: > >> > >> > >> Hi Mark > >> > >> > >> Thanks for the clarification. I will implement the script when I return to > >> the office at Monday next week ( November 1st). > >> > >> I donât use NFS, but ext4. But I will implement the script so we can check > >> if itâs the case here. But I think the issue might be after the processors > >> writing content to the repository. > >> > >> I have a test flow running for more than 2 weeks without any errors. But > >> this flow only calculate hash and comparing. > >> > >> > >> Two other flows both create errors. One flow use > >> PutSFTP->FetchSFTP->CryptographicHashContent->compares. The other flow use > >> MergeContent->UnpackContent->CryptographicHashContent->compares. The last > >> flow is totally inside nifi, excluding other network/server issues. > >> > >> > >> In both cases the CryptographicHashContent is right after a process which > >> writes new content to the repository. But in one case a file in our > >> production flow did calculate a wrong hash 4 times with a 1 minutes delay > >> between each calculation. A few hours later I looped the file back and > >> this time it was OK. > >> > >> Just like the case in step 5 and 12 in the pdf file > >> > >> > >> I will let you all know more later next week > >> > >> > >> Kind regards > >> > >> Jens > >> > >> > >> > >> > >> Den 27. okt. 2021 kl. 15.43 skrev Mark Payne <[email protected]>: > >> > >> > >> And the actual script: > >> > >> > >> > >> import org.apache.nifi.flowfile.FlowFile > >> > >> > >> import java.util.stream.Collectors > >> > >> > >> Map<String, String> getPreviousHistogram(final FlowFile flowFile) { > >> > >> final Map<String, String> histogram = > >> flowFile.getAttributes().entrySet().stream() > >> > >> .filter({ entry -> entry.getKey().startsWith("histogram.") }) > >> > >> .collect(Collectors.toMap({ entry -> entry.key}, { entry -> > >> entry.value })) > >> > >> return histogram; > >> > >> } > >> > >> > >> Map<String, String> createHistogram(final FlowFile flowFile, final > >> InputStream inStream) { > >> > >> final Map<String, String> histogram = new HashMap<>(); > >> > >> final int[] distribution = new int[256]; > >> > >> Arrays.fill(distribution, 0); > >> > >> > >> long total = 0L; > >> > >> final byte[] buffer = new byte[8192]; > >> > >> int len; > >> > >> while ((len = inStream.read(buffer)) > 0) { > >> > >> for (int i=0; i < len; i++) { > >> > >> final int val = buffer[i]; > >> > >> distribution[val]++; > >> > >> total++; > >> > >> } > >> > >> } > >> > >> > >> for (int i=0; i < 256; i++) { > >> > >> histogram.put("histogram." + i, String.valueOf(distribution[i])); > >> > >> } > >> > >> histogram.put("histogram.totalBytes", String.valueOf(total)); > >> > >> > >> return histogram; > >> > >> } > >> > >> > >> void logHistogramDifferences(final Map<String, String> previous, final > >> Map<String, String> updated) { > >> > >> final StringBuilder sb = new StringBuilder("There are differences in the > >> histogram\n"); > >> > >> final Map<String, String> sorted = new TreeMap<>(previous) > >> > >> for (final Map.Entry<String, String> entry : sorted.entrySet()) { > >> > >> final String key = entry.getKey(); > >> > >> final String previousValue = entry.getValue(); > >> > >> final String updatedValue = updated.get(entry.getKey()) > >> > >> > >> if (!Objects.equals(previousValue, updatedValue)) { > >> > >> sb.append("Byte Value: ").append(key).append(", Previous Count: > >> ").append(previousValue).append(", New Count: > >> ").append(updatedValue).append("\n"); > >> > >> } > >> > >> } > >> > >> > >> log.error(sb.toString()); > >> > >> } > >> > >> > >> > >> def flowFile = session.get() > >> > >> if (flowFile == null) { > >> > >> return > >> > >> } > >> > >> > >> final Map<String, String> previousHistogram = > >> getPreviousHistogram(flowFile) > >> > >> Map<String, String> histogram = null; > >> > >> > >> final InputStream inStream = session.read(flowFile); > >> > >> try { > >> > >> histogram = createHistogram(flowFile, inStream); > >> > >> } finally { > >> > >> inStream.close() > >> > >> } > >> > >> > >> if (!previousHistogram.isEmpty()) { > >> > >> if (previousHistogram.equals(histogram)) { > >> > >> log.info("Histograms match") > >> > >> } else { > >> > >> logHistogramDifferences(previousHistogram, histogram) > >> > >> session.transfer(flowFile, REL_FAILURE) > >> > >> return; > >> > >> } > >> > >> } > >> > >> > >> flowFile = session.putAllAttributes(flowFile, histogram) > >> > >> session.transfer(flowFile, REL_SUCCESS) > >> > >> > >> > >> > >> > >> > >> > >> On Oct 27, 2021, at 9:43 AM, Mark Payne <[email protected]> wrote: > >> > >> > >> Jens, > >> > >> > >> For a bit of background here, the reason that Joe and I have expressed > >> interest in NFS file systems is that the way the protocol works, it is > >> allowed to receive packets/chunks of the file out-of-order. So, what > >> happens is letâs say a 1 MB file is being written. The first 500 KB are > >> received. Then instead of the the 501st KB it receives the 503rd KB. What > >> happens is that the size of the file on the file system becomes 503 KB. > >> But what about 501 & 502? Well when you read the data, the file system > >> just returns ASCII NUL characters (byte 0) for those bytes. Once the NFS > >> server receives those bytes, it then goes back and fills in the proper > >> bytes. So if youâre running on NFS, it is possible for the contents of the > >> file on the underlying file system to change out from under you. Itâs not > >> clear to me what other types of file system might do something similar. > >> > >> > >> So, one thing that we can do is to find out whether or not the contents of > >> the underlying file have changed in some way, or if thereâs something else > >> happening that could perhaps result in the hashes being wrong. Iâve put > >> together a script that should help diagnose this. > >> > >> > >> Can you insert an ExecuteScript processor either just before or just after > >> your CryptographicHashContent processor? Doesnât really matter whether > >> itâs run just before or just after. Iâll attach the script here. Itâs a > >> Groovy Script so you should be able to use ExecuteScript with Script > >> Engine = Groovy and the following script as the Script Body. No other > >> changes needed. > >> > >> > >> The way the script works, it reads in the contents of the FlowFile, and > >> then it builds up a histogram of all byte values (0-255) that it sees in > >> the contents, and then adds that as attributes. So it adds attributes such > >> as: > >> > >> histogram.0 = 280273 > >> > >> histogram.1 = 2820 > >> > >> histogram.2 = 48202 > >> > >> histogram.3 = 3820 > >> > >> ⌠> >> > >> histogram.totalBytes = 1780928732 > >> > >> > >> It then checks if those attributes have already been added. If so, after > >> calculating that histogram, it checks against the previous values (in the > >> attributes). If they are the same, the FlowFile goes to âsuccessâ. If they > >> are different, it logs an error indicating the before/after value for any > >> byte whose distribution was different, and it routes to failure. > >> > >> > >> So, if for example, the first time through it sees 280,273 bytes with a > >> value of â0â, and the second times it only sees 12,001 then we know there > >> were a bunch of 0âs previously that were updated to be some other value. > >> And it includes the total number of bytes in case somehow we find that > >> weâre reading too many bytes or not enough bytes or something like that. > >> This should help narrow down whatâs happening. > >> > >> > >> Thanks > >> > >> -Mark > >> > >> > >> > >> > >> On Oct 26, 2021, at 6:25 PM, Joe Witt <[email protected]> wrote: > >> > >> > >> Jens > >> > >> > >> Attached is the flow I was using (now running yours and this one). > >> Curious if that one reproduces the issue for you as well. > >> > >> > >> Thanks > >> > >> > >> On Tue, Oct 26, 2021 at 3:09 PM Joe Witt <[email protected]> wrote: > >> > >> > >> Jens > >> > >> > >> I have your flow running and will keep it running for several days/week to > >> see if I can reproduce. Also of note please use your same test flow but > >> use HashContent instead of crypto hash. Curious if that matters for any > >> reason... > >> > >> > >> Still want to know more about your underlying storage system. > >> > >> > >> You could also try updating nifi.properties and changing the following > >> lines: > >> > >> nifi.flowfile.repository.always.sync=true > >> > >> nifi.content.repository.always.sync=true > >> > >> nifi.provenance.repository.always.sync=true > >> > >> > >> It will hurt performance but can be useful/necessary on certain storage > >> subsystems. > >> > >> > >> Thanks > >> > >> > >> On Tue, Oct 26, 2021 at 12:05 PM Joe Witt <[email protected]> wrote: > >> > >> > >> Ignore "For the scenario where you can replicate this please share the > >> flow.xml.gz for which it is reproducible." I see the uploaded JSON > >> > >> > >> On Tue, Oct 26, 2021 at 12:04 PM Joe Witt <[email protected]> wrote: > >> > >> > >> Jens, > >> > >> > >> We asked about the underlying storage system. You replied with some info > >> but not the specifics. Do you know precisely what the underlying storage > >> is and how it is presented to the operating system? For instance is it > >> NFS or something similar? > >> > >> > >> I've setup a very similar flow at extremely high rates running for the > >> past several days with no issue. In my case though I know precisely what > >> the config is and the disk setup is. Didn't do anything special to be > >> clear but still it is important to know. > >> > >> > >> For the scenario where you can replicate this please share the flow.xml.gz > >> for which it is reproducible. > >> > >> > >> Thanks > >> > >> Joe > >> > >> > >> On Sun, Oct 24, 2021 at 9:53 PM Jens M. Kofoed <[email protected]> > >> wrote: > >> > >> > >> Dear Joe and Mark > >> > >> > >> I have created a test flow without the sftp processors, which don't create > >> any errors. Therefore I created a new test flow where I use a MergeContent > >> and UnpackContent instead of the sftp processors. This keeps all data > >> internal in NIFI, but force NIFI to write and read new files totally local. > >> > >> My flow have been running for 7 days and this morning there where 2 files > >> where the sha256 has been given another has value than original. I have > >> set this flow up in another nifi cluster only for testing, and the cluster > >> is not doing anything else. It is using Nifi 1.14.0 > >> > >> So I can reproduce issues at different nifi clusters and versions (1.13.2 > >> and 1.14.0) where the calculation of a hash on content can give different > >> outputs. Is doesn't make any sense, but it happens. In all my cases the > >> issues happens where the calculations of the hashcontent happens right > >> after NIFI writes the content to the content repository. I don't know if > >> there cut be some kind of delay writing the content 100% before the next > >> processors begin reading the content??? > >> > >> > >> Please see attach test flow, and the previous mail with a pdf showing the > >> lineage of a production file which also had issues. In the pdf check step > >> 5 and 12. > >> > >> > >> Kind regards > >> > >> Jens M. Kofoed > >> > >> > >> > >> Den tor. 21. okt. 2021 kl. 08.28 skrev Jens M. Kofoed > >> <[email protected]>: > >> > >> > >> Joe, > >> > >> > >> To start from the last mail :-) > >> > >> All the repositories has it's own disk, and I'm using ext4 > >> > >> /dev/VG_b/LV_b /nifiRepo ext4 defaults,noatime 0 0 > >> > >> /dev/VG_c/LV_c /provRepo01 ext4 defaults,noatime 0 0 > >> > >> /dev/VG_d/LV_d /contRepo01 ext4 defaults,noatime 0 0 > >> > >> > >> My test flow WITH sftp looks like this: > >> > >> <image.png> > >> > >> And this flow has produced 1 error within 3 days. After many many loops > >> the file fails and went out via the "unmatched" output to the disabled > >> UpdateAttribute, which is doing nothing. Just for keeping the failed > >> flowfile in a queue. I enabled the UpdateAttribute and looped the file > >> back to the CryptographicHashContent and now it calculated the hash > >> correct again. But in this flow I have a FetchSFTP Process right before > >> the Hashing. > >> > >> Right now my flow is running without the 2 sftp processors, and the last > >> 24hours there has been no errors. > >> > >> > >> About the Lineage: > >> > >> Are there a way to export all the lineage data? The export only generate a > >> svg file. > >> > >> This is only for the receiving nifi which is internally calculate 2 > >> different hashes on the same content with ca. 1 minutes delay. Attached is > >> a pdf-document with the lineage, the flow and all the relevant Provenance > >> information's for each step in the lineage. > >> > >> The interesting steps are step 5 and 12. > >> > >> > >> Can the issues be that data is not written 100% to disk between step 4 and > >> 5 in the flow? > >> > >> > >> Kind regards > >> > >> Jens M. Kofoed > >> > >> > >> > >> > >> Den ons. 20. okt. 2021 kl. 23.49 skrev Joe Witt <[email protected]>: > >> > >> > >> Jens, > >> > >> > >> Also what type of file system/storage system are you running NiFi on > >> > >> in this case? We'll need to know this for the NiFi > >> > >> content/flowfile/provenance repositories? Is it NFS? > >> > >> > >> Thanks > >> > >> > >> On Wed, Oct 20, 2021 at 11:14 AM Joe Witt <[email protected]> wrote: > >> > >> > >> Jens, > >> > >> > >> And to further narrow this down > >> > >> > >> "I have a test flow, where a GenerateFlowfile has created 6x 1GB files > >> > >> (2 files per node) and next process was a hashcontent before it run > >> > >> into a test loop. Where files are uploaded via PutSFTP to a test > >> > >> server, and downloaded again and recalculated the hash. I have had one > >> > >> issue after 3 days of running." > >> > >> > >> So to be clear with GenerateFlowFile making these files and then you > >> > >> looping the content is wholly and fully exclusively within the control > >> > >> of NiFI. No Get/Fetch/Put-SFTP of any kind at all. In by looping the > >> > >> same files over and over in nifi itself you can make this happen or > >> > >> cannot? > >> > >> > >> Thanks > >> > >> > >> On Wed, Oct 20, 2021 at 11:08 AM Joe Witt <[email protected]> wrote: > >> > >> > >> Jens, > >> > >> > >> "After fetching a FlowFile-stream file and unpacked it back into NiFi > >> > >> I calculate a sha256. 1 minutes later I recalculate the sha256 on the > >> > >> exact same file. And got a new hash. That is what worryâs me. > >> > >> The fact that the same file can be recalculated and produce two > >> > >> different hashes, is very strange, but it happens. " > >> > >> > >> Ok so to confirm you are saying that in each case this happens you see > >> > >> it first compute the wrong hash, but then if you retry the same > >> > >> flowfile it then provides the correct hash? > >> > >> > >> Can you please also show/share the lineage history for such a flow > >> > >> file then? It should have events for the initial hash, second hash, > >> > >> the unpacking, trace to the original stream, etc... > >> > >> > >> Thanks > >> > >> > >> On Wed, Oct 20, 2021 at 11:00 AM Jens M. Kofoed <[email protected]> > >> wrote: > >> > >> > >> Dear Mark and Joe > >> > >> > >> I know my setup isnât normal for many people. But if we only looks at my > >> receive side, which the last mails is about. Every thing is happening at > >> the same NIFI instance. It is the same 3 node NIFI cluster. > >> > >> After fetching a FlowFile-stream file and unpacked it back into NiFi I > >> calculate a sha256. 1 minutes later I recalculate the sha256 on the exact > >> same file. And got a new hash. That is what worryâs me. > >> > >> The fact that the same file can be recalculated and produce two different > >> hashes, is very strange, but it happens. Over the last 5 months it have > >> only happen 35-40 times. > >> > >> > >> I can understand if the file is not completely loaded and saved into the > >> content repository before the hashing starts. But I believe that the > >> unpack process donât forward the flow file to the next process before it > >> is 100% finish unpacking and saving the new content to the repository. > >> > >> > >> I have a test flow, where a GenerateFlowfile has created 6x 1GB files (2 > >> files per node) and next process was a hashcontent before it run into a > >> test loop. Where files are uploaded via PutSFTP to a test server, and > >> downloaded again and recalculated the hash. I have had one issue after 3 > >> days of running. > >> > >> Now the test flow is running without the Put/Fetch sftp processors. > >> > >> > >> Another problem is that I canât find any correlation to other events. Not > >> within NIFI, nor the server itself or VMWare. If I just could find any > >> other event which happens at the same time, I might be able to force some > >> kind of event to trigger the issue. > >> > >> I have tried to force VMware to migrate a NiFi node to another host. > >> Forcing it to do a snapshot and deleting snapshots, but nothing can > >> trigger and error. > >> > >> > >> I know it will be very very difficult to reproduce. But I will setup > >> multiple NiFi instances running different test flows to see if I can find > >> any reason why it behaves as it does. > >> > >> > >> Kind Regards > >> > >> Jens M. Kofoed > >> > >> > >> Den 20. okt. 2021 kl. 16.39 skrev Mark Payne <[email protected]>: > >> > >> > >> Jens, > >> > >> > >> Thanks for sharing the images. > >> > >> > >> I tried to setup a test to reproduce the issue. Iâve had it running for > >> quite some time. Running through millions of iterations. > >> > >> > >> Iâve used 5 KB files, 50 KB files, 50 MB files, and larger (to the tune of > >> hundreds of MB). Iâve been unable to reproduce an issue after millions of > >> iterations. > >> > >> > >> So far I cannot replicate. And since youâre pulling the data via SFTP and > >> then unpacking, which preserves all original attributes from a different > >> system, this can easily become confusing. > >> > >> > >> Recommend trying to reproduce with SFTP-related processors out of the > >> picture, as Joe is mentioning. Either using GetFile/FetchFile or > >> GenerateFlowFile. Then immediately use CryptographicHashContent to > >> generate an âinitial hashâ, copy that value to another attribute, and then > >> loop, generating the hash and comparing against the original one. Iâll > >> attach a flow that does this, but not sure if the email server will strip > >> out the attachment or not. > >> > >> > >> This way we remove any possibility of actual corruption between the two > >> nifi instances. If we can still see corruption / different hashes within a > >> single nifi instance, then it certainly warrants further investigation but > >> i canât see any issues so far. > >> > >> > >> Thanks > >> > >> -Mark > >> > >> > >> > >> > >> > >> > >> On Oct 20, 2021, at 10:21 AM, Joe Witt <[email protected]> wrote: > >> > >> > >> Jens > >> > >> > >> Actually is this current loop test contained within a single nifi and > >> there you see corruption happen? > >> > >> > >> Joe > >> > >> > >> On Wed, Oct 20, 2021 at 7:14 AM Joe Witt <[email protected]> wrote: > >> > >> > >> Jens, > >> > >> > >> You have a very involved setup including other systems (non NiFi). Have > >> you removed those systems from the equation so you have more evidence to > >> support your expectation that NiFi is doing something other than you > >> expect? > >> > >> > >> Joe > >> > >> > >> On Wed, Oct 20, 2021 at 7:10 AM Jens M. Kofoed <[email protected]> > >> wrote: > >> > >> > >> Hi > >> > >> > >> Today I have another file which have been running through the retry loop > >> one time. To test the processors and the algorithm I added the HashContent > >> processor and also added hashing by SHA-1. > >> > >> I file have been going through the system, and both the SHA-1 and SHA-256 > >> are both different than expected. with a 1 minutes delay the file is going > >> back into the hashing content flow and this time it calculates both hashes > >> fine. > >> > >> > >> I don't believe that the hashing is buggy, but something is very very > >> strange. What can influence the processors/algorithm to calculate a > >> different hash??? > >> > >> All the input/output claim information is exactly the same. It is the same > >> flow/content file going in a loop. It happens on all 3 nodes. > >> > >> > >> Any suggestions for where to dig ? > >> > >> > >> Regards > >> > >> Jens M. Kofoed > >> > >> > >> > >> > >> Den ons. 20. okt. 2021 kl. 06.34 skrev Jens M. Kofoed > >> <[email protected]>: > >> > >> > >> Hi Mark > >> > >> > >> Thanks for replaying and the suggestion to look at the content Claim. > >> > >> These 3 pictures is from the first attempt: > >> > >> <image.png> <image.png> <image.png> > >> > >> > >> Yesterday I realized that the content was still in the archive, so I could > >> Replay the file. > >> > >> <image.png> > >> > >> So here are the same pictures but for the replay and as you can see the > >> Identifier, offset and Size are all the same. > >> > >> <image.png> <image.png> <image.png> > >> > >> > >> In my flow if the hash does not match my original first calculated hash, > >> it goes into a retry loop. Here are the pictures for the 4th time the file > >> went through: > >> > >> <image.png> <image.png> <image.png> > >> > >> Here the content Claim is all the same. > >> > >> > >> It is very rare that we see these issues <1 : 1.000.000 files and only > >> with large files. Only once have I seen the error with a 110MB file, the > >> other times the files size are above 800MB. > >> > >> This time it was a Nifi-Flowstream v3 file, which has been exported from > >> one system and imported in another. But while the file has been imported > >> it is the same file inside NIFI and it stays at the same node. Going > >> through the same loop of processors multiple times and in the end the > >> CryptographicHashContent calculate a different SHA256 than it did earlier. > >> This should not be possible!!! And that is what concern my the most. > >> > >> What can influence the same processor to calculate 2 different sha256 on > >> the exact same content??? > >> > >> > >> Regards > >> > >> Jens M. Kofoed > >> > >> > >> > >> Den tir. 19. okt. 2021 kl. 16.51 skrev Mark Payne <[email protected]>: > >> > >> > >> Jens, > >> > >> > >> In the two provenance events - one showing a hash of dd4cc⌠and the other > >> showing f6f0âŚ. > >> > >> If you go to the Content tab, do they both show the same Content Claim? > >> I.e., do the Input Claim / Output Claim show the same values for > >> Container, Section, Identifier, Offset, and Size? > >> > >> > >> Thanks > >> > >> -Mark > >> > >> > >> On Oct 19, 2021, at 1:22 AM, Jens M. Kofoed <[email protected]> wrote: > >> > >> > >> Dear NIFI Users > >> > >> > >> I have posted this mail in the developers mailing list and just want to > >> inform all of our about a very odd behavior we are facing. > >> > >> The background: > >> > >> We have data going between 2 different NIFI systems which has no direct > >> network access to each other. Therefore we calculate a SHA256 hash value > >> of the content at system 1, before the flowfile and data are combined and > >> saved as a "flowfile-stream-v3" pkg file. The file is then transported to > >> system 2, where the pkg file is unpacked and the flow can continue. To be > >> sure about file integrity we calculate a new sha256 at system 2. But > >> sometimes we see that the sha256 gets another value, which might suggest > >> the file was corrupted. But recalculating the sha256 again gives a new > >> hash value. > >> > >> > >> ---- > >> > >> > >> Tonight I had yet another file which didn't match the expected sha256 hash > >> value. The content is a 1.7GB file and the Event Duration was > >> "00:00:17.539" to calculate the hash. > >> > >> I have created a Retry loop, where the file will go to a Wait process for > >> delaying the file 1 minute and going back to the CryptographicHashContent > >> for a new calculation. After 3 retries the file goes to the > >> retries_exceeded and goes to a disabled process just to be in a queue so I > >> manually can look at it. This morning I rerouted the file from my > >> retries_exceeded queue back to the CryptographicHashContent for a new > >> calculation and this time it calculated the correct hash value. > >> > >> > >> THIS CAN'T BE TRUE :-( :-( But it is. - Something very very strange is > >> happening. > >> > >> <image.png> > >> > >> > >> We are running NiFi 1.13.2 in a 3 node cluster at Ubuntu 20.04.02 with > >> openjdk version "1.8.0_292", OpenJDK Runtime Environment (build > >> 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10), OpenJDK 64-Bit Server VM (build > >> 25.292-b10, mixed mode). Each server is a VM with 4 CPU, 8GB Ram on VMware > >> ESXi, 7.0.2. Each NIFI node is running at different vm physical hosts. > >> > >> I have inspected different logs to see if I can find any correlation what > >> happened at the same time as the file is going through my loop, but there > >> are no event/task at that exact time. > >> > >> > >> System 1: > >> > >> At 10/19/2021 00:15:11.247 CEST my file is going through a > >> CryptographicHashContent: SHA256 value: > >> dd4cc7ef8dbc8d70528e8aa788581f0ab88d297c9c9f39b6b542df68952efd20 > >> > >> The file is exported as a "FlowFile Stream, v3" to System 2 > >> > >> > >> SYSTEM 2: > >> > >> At 10/19/2021 00:18:10.528 CEST the file is going through a > >> CryptographicHashContent: SHA256 value: > >> f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819 > >> > >> <image.png> > >> > >> At 10/19/2021 00:19:08.996 CEST the file is going through the same > >> CryptographicHashContent at system 2: SHA256 value: > >> f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819 > >> > >> At 10/19/2021 00:20:04.376 CEST the file is going through the same a > >> CryptographicHashContent at system 2: SHA256 value: > >> f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819 > >> > >> At 10/19/2021 00:21:01.711 CEST the file is going through the same a > >> CryptographicHashContent at system 2: SHA256 value: > >> f6f0909aacae4952f10f6fa7704f3e55d0481ec211d495993550aedbb3fe0819 > >> > >> > >> At 10/19/2021 06:07:43.376 CEST the file is going through the same a > >> CryptographicHashContent at system 2: SHA256 value: > >> dd4cc7ef8dbc8d70528e8aa788581f0ab88d297c9c9f39b6b542df68952efd20 > >> > >> <image.png> > >> > >> > >> How on earth can this happen??? > >> > >> > >> Kind Regards > >> > >> Jens M. Kofoed > >> > >> > >> > >> > >> <Repro.json> > >> > >> > >> <Try_to_recreate_Jens_Challenge.json> > >> > >> > >> > >>
