Re: [Data Flow] File content not read completely

2018-02-22 Thread Valencia Serrao
Hi Mark,

Agreed, PutFile is not the right component for merging the content. I plan
to use another component, and you have given me good pointers. Thank you!

Regards,
Valencia




Re: [Data Flow] File content not read completely

2018-02-21 Thread Mark Payne
Hey Valencia,

I don't believe that PutFile allows you to append to a file, because doing so
is rife with problems if you encounter any kind of error (IOException, for
example) or if NiFi restarts in between. Instead, you should take a look at
MergeContent. You can set the "Merge Strategy" to "Defragment" in order to
re-assemble the FlowFiles that were split apart via SplitText.
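
As a rough illustration of what "Defragment" means conceptually, here is a small Python sketch. It is not NiFi code and not how MergeContent is implemented; it only models the idea of regrouping pieces by a shared identifier and restoring their order. The attribute names (fragment.identifier, fragment.index, fragment.count) are assumed to mirror the ones SplitText writes on each split, and the dictionaries standing in for FlowFiles are hypothetical.

```python
from collections import defaultdict


def defragment(flowfiles):
    """Group fragments by identifier, order them by index, and join the
    content of every bundle whose fragments are all present."""
    groups = defaultdict(list)
    for ff in flowfiles:
        groups[ff["fragment.identifier"]].append(ff)

    merged = {}
    for ident, parts in groups.items():
        expected = int(parts[0]["fragment.count"])
        if len(parts) != expected:
            # Incomplete bundle: this sketch simply emits nothing for it.
            continue
        parts.sort(key=lambda ff: int(ff["fragment.index"]))
        merged[ident] = "".join(ff["content"] for ff in parts)
    return merged


# Hypothetical fragments, arriving out of order.
splits = [
    {"fragment.identifier": "abc", "fragment.index": "2",
     "fragment.count": "3", "content": "line 2\n"},
    {"fragment.identifier": "abc", "fragment.index": "3",
     "fragment.count": "3", "content": "line 3\n"},
    {"fragment.identifier": "abc", "fragment.index": "1",
     "fragment.count": "3", "content": "line 1\n"},
]

print(defragment(splits)["abc"], end="")
```

Note that in this sketch a bundle is merged only when every fragment is present. If matched and unmatched lines are routed down different branches, each branch sees an incomplete bundle, so it is worth checking the MergeContent documentation for how that case is handled.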

That being said, splitting the data apart, then using ExtractText, and merging
back together can be quite expensive. If your data is JSON or CSV, then you
should probably look into using the Record-Based Processors
(PublishKafkaRecord, QueryRecord/PartitionRecord). This allows you to avoid
ever splitting the data apart to begin with and as a result can perform
dramatically better.
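
To make the "avoid splitting" idea concrete, here is a minimal Python sketch, assuming CSV input. It does not use or imitate the NiFi record processors; it only shows the general shape of reading one input once and routing each record to a matched or unmatched group, rather than creating one object per line and merging later. The function name, column name, and regex are hypothetical.

```python
import csv
import io
import re


def partition_records(csv_text, column, pattern):
    """Read the whole input in one pass and route each record to
    'matched' or 'unmatched' based on a regex applied to one column."""
    regex = re.compile(pattern)
    reader = csv.DictReader(io.StringIO(csv_text))
    out = {"matched": [], "unmatched": []}
    for record in reader:
        key = "matched" if regex.search(record[column]) else "unmatched"
        out[key].append(record)
    return out


# Hypothetical sample data with a header row.
data = "id,level\n1,ERROR\n2,INFO\n3,ERROR\n"
result = partition_records(data, "level", r"^ERROR$")
print(len(result["matched"]), len(result["unmatched"]))
```

The input stays a single unit from start to finish, so there is nothing to re-assemble afterwards.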

Thanks
-Mark



Re: [Data Flow] File content not read completely

2018-02-21 Thread Valencia Serrao
Hi Mark,

Yes! I now get all the required entries, with the matched and unmatched
ones segregated into different folders. Thanks a lot, Mark!!
My next step is to see how to append all the FlowFiles with matched
entries into one file.

Regards,
Valencia




Re: [Data Flow] File content not read completely

2018-02-16 Thread Valencia Serrao
Hi Mark,

Thanks for looking into this. I am trying to put in the components you have
suggested. I'll update.

Regards,
Valencia




Re: [Data Flow] File content not read completely

2018-02-15 Thread Mark Payne
Valencia,

The SplitText processor does not change the ‘filename’ attribute of the
FlowFile. So you will end up with multiple FlowFiles having the same name.
PutFile may well be overwriting the same file many times - or failing to
write the files due to filename conflicts. You can resolve this, if it’s your
problem, by adding an UpdateAttribute to your flow just before PutFile and
changing the filename to something unique like ${UUID()} or
${filename}.${nextInt()}
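
The effect described above can be modelled with a small Python sketch. This is a toy simulation, not NiFi code: a dictionary stands in for the target directory, put_file and split_lines are hypothetical helpers, and only the overwrite case is modelled (not the case where the write fails on a conflict). A UUID suffix plays the role of the unique filename.

```python
import uuid


def put_file(directory, flowfile):
    """Toy stand-in for a write that replaces any existing file of the same name."""
    directory[flowfile["filename"]] = flowfile["content"]


def split_lines(filename, text):
    """Each split keeps the parent's filename, as described above for SplitText."""
    return [{"filename": filename, "content": line} for line in text.splitlines()]


splits = split_lines("input.txt", "a\nb\nc")

# Same filename on every split: each write replaces the previous one.
colliding = {}
for ff in splits:
    put_file(colliding, ff)

# Unique filename per split: every line survives as its own file.
unique = {}
for ff in splits:
    renamed = dict(ff, filename=f"{ff['filename']}.{uuid.uuid4()}")
    put_file(unique, renamed)

print(len(colliding), len(unique))
```

With identical names only the last split is left in the target, which matches the symptom of seeing a single entry per PutFile directory.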

Hope this helps!

-Mark


On Feb 15, 2018, at 4:59 AM, Valencia Serrao <vser...@us.ibm.com> wrote:


Hi All,

I've started working hands-on with NiFi. I was able to build basic flows
without any issues, but now I've tried adding more steps to the flow.

Flow intent: get a local file, split the text on newlines, extract text
based on a regex, put matched/unmatched data on their respective Kafka
topics, and finally write the Kafka contents to the local targets set in
PutFile.
Current flow steps: GetFile, SplitText, ExtractText, PutKafka (two of them,
one for matched and one for unmatched), and two PutFile components.

The issue I'm facing is that after the flow executes I see only one entry
in each of the two PutFile targets; the rest of the content is not written
to them even when the criteria are matched. It feels as if it's not looping
through the whole file or something like that. But I had read that a NiFi
flow is executed for all contents in the source files. Maybe I've missed
some config somewhere.

It would be really helpful if anyone could help on this issue.

Regards,
Valencia