Hi, I'm receiving multi-part CSV files with related records that I need to translate into custom JSON to match our API.
For example, if a user purchased 3 items as part of order #123, then order #123 will occupy 3 rows. The same order can also be split across multiple files: order #123 may occupy the last row of file 1 and the first 2 rows of file 2. The records are guaranteed to be in order, so as soon as I receive a row for order #124, I know that order #123 is complete.

The files are delivered daily in a batch, and a 'success' file is uploaded (to S3) to tell me when the last part file is in place. A coworker recommended this flow:

1. Wait for the success file.
2. Download everything (so that a delay in downloading files doesn't push us past the time MergeContent is set to wait).
3. Run SplitRecord to break the files into individual rows.
4. Run MergeContent, joining on the order number.
5. Convert each merged order to JSON.
6. Batch the individual JSON orders (the API accepts multiple orders per payload).
7. Send them on.

This makes sense, but I'd like to know whether I can use something more explicit than a wait time on MergeContent. Because the files are ordered, there is no need to keep waiting once we've moved on to the next order. Conversely, if we were more explicit, we could begin processing as soon as the first file is available for download (no need to wait for the success file).

With the current flow I'm forced to wait for all files to download, because a delay in receiving file_2 could result in two order transactions for order #123: one with the first item (from the first file) and one with the last 2 items (from the second file). Our destination system doesn't accept duplicate transactions and would end up ignoring those last two items.

I'm happy to continue with the flow above, and maybe that's the best solution, but I'm new to NiFi, so I would be very interested in what others consider best practice.

Thanks in advance,
Eric

--
Sent from: http://apache-nifi-users-list.2361937.n4.nabble.com/
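To make the "more explicit" completion rule concrete, here is a minimal sketch in plain Python (it uses no NiFi API). It assumes each part file repeats the same header line and that the columns are named `order_id` and `item`; those column names, the JSON shape in `to_order_json`, and all function names are hypothetical. Rows are read part by part in delivery order, and an order is emitted as soon as a row with a different order number arrives, so an order spanning file 1 and file 2 comes out as a single group. One consequence of this rule: the last order is only known to be complete after the final part has been read, so some end-of-batch signal (such as the success file) is still needed for that one order.

```python
import csv
import io
import json
from itertools import islice
from typing import Dict, Iterable, Iterator, List, TextIO

Row = Dict[str, str]


def rows_from_parts(parts: Iterable[TextIO]) -> Iterator[Row]:
    """Yield rows from each part file in delivery order.
    Assumes every part repeats the same header line."""
    for part in parts:
        yield from csv.DictReader(part)


def group_orders(rows: Iterable[Row], key: str = "order_id") -> Iterator[List[Row]]:
    """Emit an order as soon as a row with a different key arrives.
    Relies on rows for one order being contiguous, as the post guarantees."""
    current: List[Row] = []
    for row in rows:
        if current and row[key] != current[0][key]:
            yield current  # previous order is complete
            current = []
        current.append(row)
    if current:
        yield current  # last order: complete only once the final part is read


def to_order_json(rows: List[Row]) -> dict:
    # Hypothetical target shape; real API field names will differ.
    return {"order_id": rows[0]["order_id"], "items": [r["item"] for r in rows]}


def batches(items: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group finished orders into payloads of at most `size` orders."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # Order 123 is split across the two parts, as in the example above.
    part1 = io.StringIO("order_id,item\n122,pen\n123,book\n")
    part2 = io.StringIO("order_id,item\n123,lamp\n123,mug\n124,desk\n")
    orders = (to_order_json(g) for g in group_orders(rows_from_parts([part1, part2])))
    for payload in batches(orders, size=2):
        print(json.dumps(payload))
    # [{"order_id": "122", "items": ["pen"]}, {"order_id": "123", "items": ["book", "lamp", "mug"]}]
    # [{"order_id": "124", "items": ["desk"]}]
```

Because `group_orders` is a generator, it holds only the rows of the order currently being assembled, so processing can start with the first part instead of after every file has been downloaded.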
