Hello
I'm the "BUG" :)
My mind was stuck on how I had used MergeContent, which merges based on the number of FlowFiles :)
In the case of MergeRecord, the merge is based on records, not FlowFiles :)
Thanks for the response
Regards
Minh
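To make the difference concrete, here is a minimal Python sketch (not NiFi code) comparing the two ways of counting. The figure of 50 records per FlowFile and the helper names are made up for illustration; only the threshold of 20 comes from this thread.

```python
# Illustrative model only: this is not NiFi code.
# Assumption: ExecuteSQL produced 4 FlowFiles with 50 records each (made-up numbers).
records_per_flowfile = [50, 50, 50, 50]
MINIMUM = 20  # the threshold configured in this thread


def flowfiles_needed_by_entries(flowfiles, minimum_entries):
    """MergeContent-style counting: every FlowFile counts as one entry."""
    count = 0
    for _ in flowfiles:
        count += 1
        if count >= minimum_entries:
            return count
    return None  # the minimum is never reached with the available FlowFiles


def flowfiles_needed_by_records(flowfiles, minimum_records):
    """MergeRecord-style counting: every record inside a FlowFile counts."""
    total = 0
    for index, records in enumerate(flowfiles, start=1):
        total += records
        if total >= minimum_records:
            return index
    return None


by_entries = flowfiles_needed_by_entries(records_per_flowfile, MINIMUM)
by_records = flowfiles_needed_by_records(records_per_flowfile, MINIMUM)
print(by_entries)  # None: 4 FlowFiles never reach 20 entries
print(by_records)  # 1: the first FlowFile alone already holds at least 20 records
```

With these assumed numbers, a minimum of 20 is already satisfied by a single FlowFile when counting records, which is why the same value behaves so differently in the two processors.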
Sent: Tuesday, 18 March 2025 at 19:12
From: "Rafael Fracasso" <rafaelfraca...@gmail.com>
To: users@nifi.apache.org
Subject: Re: NIFI mergeRecord bug
Based on the configuration of your MergeRecord processor and the limited information you provided, there are a few potential reasons why it might not be merging all 4 files into a single file.
1. Minimum Number of Records
You have set Minimum Number of Records to 20. This means that a bin only becomes eligible for merging once it holds at least 20 records. If the FlowFiles in a bin together contain fewer than 20 records, the bin waits until enough records accumulate or the Max Bin Age is reached.
2. Maximum Number of Records
You have set Maximum Number of Records to 1000. This is fine unless you have more than 1000 records in total across the 4 files, in which case the records would end up in more than one merged file.
3. Minimum Bin Size
You have set Minimum Bin Size to 0 B, which is fine: no minimum size is required before a bin can be merged, even if it is very small.
4. Maximum Bin Size
You have not set a value for Maximum Bin Size. This means there is no upper limit on the size of the merged file, which is good if you want to merge everything into one file.
5. Max Bin Age
You have set Max Bin Age to 20 minutes. This means that if a bin (a group of records waiting to be merged) has existed for more than 20 minutes without reaching the minimum number of records, it will be merged anyway. If your flow is processing quickly, this shouldn't be an issue.
6. Correlation Attribute Name
You have set Correlation Attribute Name to filename. This means that only FlowFiles with the same value for the filename attribute will be grouped together. If your 4 files have different filenames, they will not be merged together. Ensure that the filename attribute has the same value on all the FlowFiles you want to merge.
7. Merge Strategy
You are using the Bin-Packing Algorithm, which fills each bin with incoming FlowFiles until the thresholds you've set are reached. This is generally a good choice.
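How these thresholds interact can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not NiFi's implementation: the class `BinSettings`, the function `bin_status`, and the four status names are hypothetical, and the real processor also takes scheduling and the number of available bins into account.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BinSettings:
    # Values taken from the configuration discussed above.
    min_records: int = 20
    max_records: int = 1000
    min_size_bytes: int = 0
    max_size_bytes: Optional[int] = None  # None models "not set"
    max_age_seconds: int = 20 * 60


def bin_status(records: int, size_bytes: int, age_seconds: int,
               settings: BinSettings) -> str:
    """Classify a bin in this simplified model of the merge thresholds."""
    size_limit_hit = (settings.max_size_bytes is not None
                      and size_bytes >= settings.max_size_bytes)
    if records >= settings.max_records or size_limit_hit:
        return "full"      # a maximum was reached
    if age_seconds >= settings.max_age_seconds:
        return "expired"   # Max Bin Age reached, merged regardless of minimums
    if (records >= settings.min_records
            and size_bytes >= settings.min_size_bytes):
        return "eligible"  # both minimums are satisfied
    return "waiting"       # still below a minimum and not yet too old


settings = BinSettings()
print(bin_status(12, 4_096, 60, settings))       # waiting
print(bin_status(12, 4_096, 25 * 60, settings))  # expired
print(bin_status(200, 65_536, 60, settings))     # eligible
print(bin_status(1000, 300_000, 60, settings))   # full
```

In this model a bin with 12 records simply waits until the 20-minute age limit releases it, while a bin with 200 records can already be merged well before the maximum of 1000 is reached.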
Recommendations:
Check Record Count: Verify the number of records in each of the 4 files. If the files together hold fewer than 20 records, they won't be merged until more records are available or the Max Bin Age is reached.
Consistent Correlation Attribute: Ensure that the filename attribute has the same value on all the FlowFiles you want to merge. If you don't need this correlation, you can leave the Correlation Attribute Name field empty.
Adjust Minimum Number of Records: If you want to ensure that all 4 files are merged regardless of the number of records, you can lower the Minimum Number of Records setting, for example to 1.
If you adjust these settings and still encounter issues, you may want to monitor the queue sizes and the attributes of the incoming FlowFiles to understand why they are not being merged as expected.
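As an illustration of the correlation point, the following Python sketch groups some made-up FlowFile attributes the way a correlation attribute would. The filenames and record counts are invented, and `group_into_bins` is a hypothetical helper rather than a NiFi API; the real processor also requires the records to have compatible schemas, which this sketch ignores.

```python
from collections import defaultdict

# Hypothetical attributes of the 4 FlowFiles (values invented for illustration).
flowfiles = [
    {"filename": "result-1", "record.count": 10},
    {"filename": "result-2", "record.count": 10},
    {"filename": "result-3", "record.count": 10},
    {"filename": "result-4", "record.count": 10},
]


def group_into_bins(flowfiles, correlation_attribute=None):
    """Group FlowFiles by the value of the correlation attribute.

    With no correlation attribute, every FlowFile lands in the same group.
    """
    bins = defaultdict(list)
    for flowfile in flowfiles:
        key = flowfile.get(correlation_attribute) if correlation_attribute else None
        bins[key].append(flowfile)
    return dict(bins)


with_filename = group_into_bins(flowfiles, "filename")
without_correlation = group_into_bins(flowfiles)

print(len(with_filename))        # 4: one group per distinct filename
print(len(without_correlation))  # 1: all FlowFiles can share a bin
```

Listing the queue in front of MergeRecord and comparing the filename values of the waiting FlowFiles shows quickly which of the two cases applies.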
On Tue, 18 Mar 2025 at 09:25, <e-soci...@gmx.fr> wrote:
Hello,
ExecuteSQL produces 4 files, and we expected MergeRecord to merge them into a single file.
But I don't know why the MergeRecord processor doesn't merge all 4 files into a single file.
Here is the config.
Could you tell me what is wrong?
Thanks
Minh