Re: [Data Flow] File content not read completely
Hi Mark,

Surely, PutFile is not the component to merge the content. I plan to use some other component, and you have given me good pointers. Thank you!

Regards,
Valencia

From: Mark Payne <marka...@hotmail.com>
To: Valencia Serrao <vser...@us.ibm.com>, "users@nifi.apache.org" <users@nifi.apache.org>
Date: 02/21/2018 06:37 PM
Subject: Re: [Data Flow] File content not read completely

Hey Valencia,

I don't believe that PutFile allows you to append to a file, because doing so is rife with problems if you encounter any kind of error (IOException, for example) or if NiFi restarts in between. Instead, you should take a look at MergeContent. You can set the "Merge Strategy" to "Defragment" in order to re-assemble the FlowFiles that were split apart via SplitText.

That being said, splitting the data apart, then using ExtractText, and merging back together can be quite expensive. If your data is JSON or CSV, then you should probably look into using the Record-Based Processors (PublishKafkaRecord, QueryRecord/PartitionRecord). This allows you to avoid ever splitting the data apart to begin with and as a result can perform dramatically better.

Thanks
-Mark
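Mark's point about record-based processing can be pictured with a small plain-Python analogue. This is not NiFi code and does not use any NiFi API; the function name, the sample data, and the `^ERROR` pattern are all made up for illustration. It only shows the idea of routing lines into matched and unmatched groups in a single pass over the content, without first creating one object per line.

```python
import re


def partition_lines(content: str, pattern: str) -> tuple[str, str]:
    """Return (matched, unmatched) text, keeping the original line order.

    Hypothetical helper: a rough stand-in for what a record-oriented
    processor does conceptually, not a model of NiFi internals.
    """
    regex = re.compile(pattern)
    matched, unmatched = [], []
    for line in content.splitlines():
        # Each line is examined once and appended to exactly one group.
        (matched if regex.search(line) else unmatched).append(line)
    return "\n".join(matched), "\n".join(unmatched)


# Sample input and pattern are assumptions, not taken from the thread.
content = "ERROR disk full\nINFO started\nERROR timeout"
matched, unmatched = partition_lines(content, r"^ERROR")

print(matched)
print(unmatched)
```

Each group ends up as one piece of content, so there is nothing to merge back together afterwards.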
Re: [Data Flow] File content not read completely
Hey Valencia,

I don't believe that PutFile allows you to append to a file, because doing so is rife with problems if you encounter any kind of error (IOException, for example) or if NiFi restarts in between. Instead, you should take a look at MergeContent. You can set the "Merge Strategy" to "Defragment" in order to re-assemble the FlowFiles that were split apart via SplitText.

That being said, splitting the data apart, then using ExtractText, and merging back together can be quite expensive. If your data is JSON or CSV, then you should probably look into using the Record-Based Processors (PublishKafkaRecord, QueryRecord/PartitionRecord). This allows you to avoid ever splitting the data apart to begin with and as a result can perform dramatically better.

Thanks
-Mark

On Feb 21, 2018, at 7:27 AM, Valencia Serrao <vser...@us.ibm.com> wrote:

Hi Mark,

Yes! I could get all the required entries, with the matched and unmatched ones segregated in different folders. Thanks a lot, Mark!!

My next plan is to see how to append all the FlowFiles with matched entries into one file.

Regards,
Valencia
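To make the "Defragment" idea concrete, here is a minimal Python sketch of the reassembly logic. It is a simulation under stated assumptions, not NiFi code: the attribute names `fragment.identifier`, `fragment.index`, and `fragment.count` mirror the ones SplitText writes on its output FlowFiles, but the dictionary layout, the helper names, and the use of the filename as the identifier are simplifications made up for this example.

```python
from collections import defaultdict


def split_text(name: str, content: str) -> list[dict]:
    """Split content into one fragment per line and tag each fragment.

    Assumption: the identifier is simply the filename here; NiFi uses
    a generated identifier instead.
    """
    lines = content.splitlines()
    return [
        {
            "attributes": {
                "filename": name,
                "fragment.identifier": name,
                "fragment.index": index,
                "fragment.count": len(lines),
            },
            "content": line,
        }
        for index, line in enumerate(lines, start=1)
    ]


def defragment(fragments: list[dict]) -> dict[str, str]:
    """Reassemble complete bundles; incomplete bundles are left out."""
    groups = defaultdict(list)
    for fragment in fragments:
        groups[fragment["attributes"]["fragment.identifier"]].append(fragment)

    merged = {}
    for identifier, group in groups.items():
        expected = group[0]["attributes"]["fragment.count"]
        if len(group) != expected:
            continue  # some fragments are still missing
        # Restore the original order regardless of arrival order.
        group.sort(key=lambda f: f["attributes"]["fragment.index"])
        merged[identifier] = "\n".join(f["content"] for f in group)
    return merged


original = "a,1\nb,2\nc,3"
fragments = split_text("input.txt", original)
fragments.reverse()  # fragments may arrive out of order

merged = defragment(fragments)
incomplete = defragment(fragments[:2])

print(merged["input.txt"] == original)
print(incomplete)
```

Note that in this sketch a bundle is only merged once every fragment has arrived. If only the matched lines are sent to the merge step, the bundle stays incomplete, so it is worth checking how the real processor behaves in that case before relying on it.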
Re: [Data Flow] File content not read completely
Hi Mark,

Yes! I could get all the required entries, with the matched and unmatched ones segregated in different folders. Thanks a lot, Mark!!

My next plan is to see how to append all the FlowFiles with matched entries into one file.

Regards,
Valencia

From: Valencia Serrao/Austin/Contr/IBM
To: marka...@hotmail.com
Cc: users@nifi.apache.org
Date: 02/16/2018 06:00 PM
Subject: Re: [Data Flow] File content not read completely

Hi Mark,

Thanks for looking into this. I am trying to put in the components you have suggested. I'll update.

Regards,
Valencia
Re: [Data Flow] File content not read completely
Hi Mark,

Thanks for looking into this. I am trying to put in the components you have suggested. I'll update.

Regards,
Valencia

From: Mark Payne <marka...@hotmail.com>
To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: 02/15/2018 07:09 PM
Subject: Re: [Data Flow] File content not read completely

Valencia,

The SplitText processor does not change the 'filename' attribute of the FlowFile. So you will end up with multiple FlowFiles having the same name. PutFile may well be overwriting the same file many times - or failing to write the files due to filename conflicts.

You can resolve this, if it's your problem, by adding an UpdateAttribute to your flow just before PutFile and changing the filename to something unique like ${UUID()} or ${filename}.${nextInt()}

Hope this helps!

-Mark
Re: [Data Flow] File content not read completely
Valencia,

The SplitText processor does not change the 'filename' attribute of the FlowFile. So you will end up with multiple FlowFiles having the same name. PutFile may well be overwriting the same file many times - or failing to write the files due to filename conflicts.

You can resolve this, if it's your problem, by adding an UpdateAttribute to your flow just before PutFile and changing the filename to something unique like ${UUID()} or ${filename}.${nextInt()}

Hope this helps!

-Mark

On Feb 15, 2018, at 4:59 AM, Valencia Serrao <vser...@us.ibm.com> wrote:

Hi All,

I've started hands-on with NiFi. I was able to build basic flows without any issues, but I have now tried adding more steps to the flow.

Flow intent: Get a local file, split the text on new lines, extract text based on a regex, put matched/unmatched data on the respective Kafka topics, and finally write the Kafka contents to the local targets set in PutFile.

Current flow steps: GetFile, SplitText, ExtractText, PutKafka (2 of them, one each for matched and unmatched), and 2 PutFile components.

The issue I'm facing is that after the flow executes I see only one entry in each of the 2 PutFile targets; the rest of the content is not written to them even if the criteria are matched. I feel it's not looping through the whole file or something like that. But I had read that a NiFi flow is executed for all contents in the source files. Maybe I've missed some config somewhere.

It would be really helpful if anyone could help with this issue.

Regards,
Valencia
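The symptom described above (only one entry left in each target directory) can be reproduced outside NiFi with a short Python sketch. This is only an analogy under stated assumptions: `put_file` and `write_splits` are hypothetical helpers, the writer is assumed to replace an existing file of the same name, and `uuid.uuid4()` stands in for the `${UUID()}` expression Mark suggests setting via UpdateAttribute.

```python
import tempfile
import uuid
from pathlib import Path


def put_file(directory: Path, filename: str, content: str) -> None:
    """Write content to directory/filename, replacing any existing file.

    Assumption: models a writer that overwrites on a name conflict.
    """
    (directory / filename).write_text(content)


def write_splits(lines, directory: Path, original_name: str, unique: bool):
    """Write one file per split line and return the resulting file names."""
    for line in lines:
        # Every split inherits the parent's filename unless it is renamed.
        name = f"{original_name}.{uuid.uuid4()}" if unique else original_name
        put_file(directory, name, line)
    return sorted(path.name for path in directory.iterdir())


lines = ["alpha", "beta", "gamma"]  # made-up sample data
with tempfile.TemporaryDirectory() as d1, tempfile.TemporaryDirectory() as d2:
    same_name = write_splits(lines, Path(d1), "input.txt", unique=False)
    unique_names = write_splits(lines, Path(d2), "input.txt", unique=True)

print(len(same_name))
print(len(unique_names))
```

With a shared name, each write replaces the previous one and a single file survives. With a unique suffix per split, every line ends up in its own file.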