UpdateRecord and arrays

2021-08-11 Thread Robert R. Bruno
Is there a way to use UpdateRecord to update the value of a field in a
record that is an array? In the example we have, we'd like to take a string
from one field in the record and copy it into another field that is an
array field but is currently null in the record.  So something like:

Input:

{ "test": "test1", "test_array": null }

Desired output:

{ "test": "test1", "test_array": ["test1"] }

We were going to look into whether JoltTransformJSON can do it if
UpdateRecord can't.
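For reference, a minimal JoltTransformJSON shift spec can produce the desired output for this exact example. This is a sketch assuming only the two fields shown above ("test" and "test_array"); real records with other fields would need their own entries or a wildcard match in the spec:

```json
[
  {
    "operation": "shift",
    "spec": {
      "test": ["test", "test_array[0]"]
    }
  }
]
```

The right-hand list sends the value of "test" both back to its original location and into the first element of "test_array", so the null array field comes out as a one-element array.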

Thanks!

Robert


Re: Broken pipe write failed errors

2021-06-01 Thread Robert R. Bruno
We wanted to give you another data point.  We tried 3.14.9 and 1.20 for the
two libraries and saw no errors.  Also, our services are HTTP, not HTTPS.
Hope that helps.  Thanks for mentioning the OkHttp client change.  Any
chance a future version of NiFi could move back to the 3.x client series?

On Tue, Jun 1, 2021 at 2:50 PM David Handermann 
wrote:

> Robert,
>
> Thanks for the update, that's very interesting.  Version 3.8.1 is several
> years old and is missing a number of updates related to HTTP/2 and TLS.
> OkHttp 3.12.0 introduced support for TLS 1.3 when running on a supported
> JVM, so if the connection is occurring over HTTPS, that may be part of the
> equation.  Java 11 supports TLS 1.3, whereas Java 8 did not support TLS 1.3
> until more recent updates.  It would be interesting to know if your
> configuration still works with a more recent version of OkHttp in the 3.x
> series.  Thanks again for providing the feedback.
>
> Regards,
> David Handermann
>
> On Tue, Jun 1, 2021 at 1:31 PM Robert R. Bruno  wrote:
>
>> David,
>>
>> Quick update for you.  We decided after a bit of troubleshooting with
>> zero luck to just downgrade the OKHttp to 3.8.1 and the okhttp-digest to
>> 1.18, and no more errors.  Not sure what to say.
>>
>> Robert
>>
>> On Mon, May 31, 2021 at 8:41 AM David Handermann <
>> exceptionfact...@gmail.com> wrote:
>>
>>> Hi Robert,
>>>
>>> Thanks for providing the additional details.  It should be possible to
>>> replace the current version of OkHttp 4.9.1 with an older version to see if
>>> that makes a difference.  It would also be helpful to know whether the
>>> remote server supports HTTP/2.  Newer versions of OkHttp have improved
>>> support for HTTP/2, but it also has different connection characteristics.
>>> Setting the Disable HTTP/2 property to True in InvokeHTTP would force the
>>> use of HTTP/1.1.  I would not necessarily expect to see errors on the
>>> server side, but knowing whether the remote server has a connection or
>>> write timeout property would be useful.
>>>
>>> Regards,
>>> David Handermann
>>>
>>> On Sun, May 30, 2021 at 4:54 AM Robert R. Bruno 
>>> wrote:
>>>
>>>> When seeing the error we put our timeouts values in the processor both
>>>> to 5 mins as a test and still saw the errors and well before 5 minutes.  We
>>>> also slowed the processor down a lot and still were seeing the error.
>>>> Failed attempts will often succeed just fine but not always.
>>>>
>>>> As an easy test could we just rebuild with the older http client
>>>> library or did a lot more change with the processor?
>>>>
>>>> We do have access to both endpoints and plan to dig deeper there as
>>>> well, but initial looking did not show errors on server side.
>>>>
>>>> On Sat, May 29, 2021, 23:26 David Handermann <
>>>> exceptionfact...@apache.org> wrote:
>>>>
>>>>> Hi Robert,
>>>>>
>>>>> It would be helpful to know the settings for the Read Timeout and Idle
>>>>> Timeout properties on the InvokeHTTP processors.  If you have access to
>>>>> the remote service being called, it would also be interesting to know if
>>>>> there are timeouts on that side of the connection.  NiFi 1.13.2 includes
>>>>> a much newer version of the OkHttp client library, which InvokeHTTP
>>>>> uses, so the internal connection handling has gone through some changes.
>>>>> In general, broken pipe errors suggest that the connection is timing out
>>>>> at some point, which may be related to a variety of factors such as the
>>>>> number of connections, payload sizes, network latency, or local resource
>>>>> consumption.
>>>>>
>>>>> Regards,
>>>>> David Handermann
>>>>>
>>>>> On Sat, May 29, 2021 at 2:08 PM Joe Witt  wrote:
>>>>>
>>>>>> K. We have seen specific jvm versions causing issues with socket
>>>>>> handling.  But had not seen it on Java 11 though may be possible.   Is
>>>>>> there a full stack trace?
>>>>>>
>>>>>> On Sat, May 29, 2021 at 12:00 PM Robert R. Bruno 
>>>>>> wrote:
>>>>>>
>>>>>>> We upgraded to java 11 when we upgrade to 1.13.2 we were on java 8
>>>>>>>

Re: Broken pipe write failed errors

2021-06-01 Thread Robert R. Bruno
David,

Quick update for you.  We decided, after a bit of troubleshooting with zero
luck, to just downgrade OkHttp to 3.8.1 and okhttp-digest to 1.18, and saw
no more errors.  Not sure what to say.

Robert

On Mon, May 31, 2021 at 8:41 AM David Handermann 
wrote:

> Hi Robert,
>
> Thanks for providing the additional details.  It should be possible to
> replace the current version of OkHttp 4.9.1 with an older version to see if
> that makes a difference.  It would also be helpful to know whether the
> remote server supports HTTP/2.  Newer versions of OkHttp have improved
> support for HTTP/2, but it also has different connection characteristics.
> Setting the Disable HTTP/2 property to True in InvokeHTTP would force the
> use of HTTP/1.1.  I would not necessarily expect to see errors on the
> server side, but knowing whether the remote server has a connection or
> write timeout property would be useful.
>
> Regards,
> David Handermann
>
> On Sun, May 30, 2021 at 4:54 AM Robert R. Bruno  wrote:
>
>> When seeing the error we put our timeouts values in the processor both to
>> 5 mins as a test and still saw the errors and well before 5 minutes.  We
>> also slowed the processor down a lot and still were seeing the error.
>> Failed attempts will often succeed just fine but not always.
>>
>> As an easy test could we just rebuild with the older http client library
>> or did a lot more change with the processor?
>>
>> We do have access to both endpoints and plan to dig deeper there as well,
>> but initial looking did not show errors on server side.
>>
>> On Sat, May 29, 2021, 23:26 David Handermann 
>> wrote:
>>
>>> Hi Robert,
>>>
>>> It would be helpful to know the settings for the Read Timeout and Idle
>>> Timeout properties on the InvokeHTTP processors.  If you have access to the
>>> remote service being called, it would also be interesting to know if there
>>> are timeouts on that side of the connection.  NiFi 1.13.2 includes a much
>>> newer version of the OkHttp client library, which InvokeHTTP uses, so the
>>> internal connection handling has gone through some changes.  In general,
>>> broken pipe errors suggest that the connection is timing out at some point,
> which may be related to a variety of factors such as the number
>>> of connections, payload sizes, network latency, or local resource
>>> consumption.
>>>
>>> Regards,
>>> David Handermann
>>>
>>> On Sat, May 29, 2021 at 2:08 PM Joe Witt  wrote:
>>>
>>>> K. We have seen specific jvm versions causing issues with socket
>>>> handling.  But had not seen it on Java 11 though may be possible.   Is
>>>> there a full stack trace?
>>>>
>>>> On Sat, May 29, 2021 at 12:00 PM Robert R. Bruno 
>>>> wrote:
>>>>
>>>>> We upgraded to java 11 when we upgrade to 1.13.2 we were on java 8
>>>>> with 1.9.2.
>>>>>
>>>>> On Sat, May 29, 2021, 14:21 Joe Witt  wrote:
>>>>>
>>>>>> What JVM are you using?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Sat, May 29, 2021 at 11:16 AM Juan Pablo Gardella <
>>>>>> gardellajuanpa...@gmail.com> wrote:
>>>>>>
>>>>>>> Not related to Nifi, but I faced the same type of issue for
>>>>>>> endpoints behind a proxy which takes more than 30 seconds to answer. 
>>>>>>> Fixed
>>>>>>> by replacing Apache Http client by OkHttp. I did not investigate 
>>>>>>> further,
>>>>>>> just simply replaced one library by another and the error was fixed.
>>>>>>>
>>>>>>>
>>>>>>> Juan
>>>>>>>
>>>>>>> On Sat, 29 May 2021 at 15:08, Robert R. Bruno 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I wanted to see if anyone has any ideas on this one.  Since
>>>>>>>> upgrading to 1.13.2 from 1.9.2 we are starting to see broken pipe 
>>>>>>>> (write
>>>>>>>> failed) errors from a few invokeHttp processers.
>>>>>>>>
>>>>>>>> It is happening to processors talking to different endpoints, so I
>>>>>>>> am suspecting it is on the nifi side.  We are now using load balanced
>>>>>>>> queues throughout our flow.  Is it possible we are hitting a http
>>>>>>>> connection resource issue or something like that? A total guess I'll 
>>>>>>>> admit.
>>>>>>>>
>>>>>>>> If this could be it, does anyone know which parameter(s) to play
>>>>>>>> with in the properties file?  I know there is one setting for jetty 
>>>>>>>> threads
>>>>>>>> and another for max concurrent requests, but it isn't quite clear to 
>>>>>>>> me of
>>>>>>>> they are at all involved with invokeHttp calls.
>>>>>>>>
>>>>>>>> Thanks in advance!
>>>>>>>>
>>>>>>>> Robert
>>>>>>>>
>>>>>>>


Re: Broken pipe write failed errors

2021-05-30 Thread Robert R. Bruno
When seeing the error, we set both timeout values in the processor to 5
minutes as a test and still saw the errors, well before 5 minutes.  We also
slowed the processor down a lot and still saw the error.  Failed attempts
will often succeed on a later try, but not always.

As an easy test, could we just rebuild with the older HTTP client library,
or did a lot more change in the processor?

We do have access to both endpoints and plan to dig deeper there as well,
but an initial look did not show errors on the server side.

On Sat, May 29, 2021, 23:26 David Handermann 
wrote:

> Hi Robert,
>
> It would be helpful to know the settings for the Read Timeout and Idle
> Timeout properties on the InvokeHTTP processors.  If you have access to the
> remote service being called, it would also be interesting to know if there
> are timeouts on that side of the connection.  NiFi 1.13.2 includes a much
> newer version of the OkHttp client library, which InvokeHTTP uses, so the
> internal connection handling has gone through some changes.  In general,
> broken pipe errors suggest that the connection is timing out at some point,
> which may be related to a variety of factors such as the number
> of connections, payload sizes, network latency, or local resource
> consumption.
>
> Regards,
> David Handermann
>
> On Sat, May 29, 2021 at 2:08 PM Joe Witt  wrote:
>
>> K. We have seen specific jvm versions causing issues with socket
>> handling.  But had not seen it on Java 11 though may be possible.   Is
>> there a full stack trace?
>>
>> On Sat, May 29, 2021 at 12:00 PM Robert R. Bruno 
>> wrote:
>>
>>> We upgraded to java 11 when we upgrade to 1.13.2 we were on java 8 with
>>> 1.9.2.
>>>
>>> On Sat, May 29, 2021, 14:21 Joe Witt  wrote:
>>>
>>>> What JVM are you using?
>>>>
>>>> Thanks
>>>>
>>>> On Sat, May 29, 2021 at 11:16 AM Juan Pablo Gardella <
>>>> gardellajuanpa...@gmail.com> wrote:
>>>>
>>>>> Not related to Nifi, but I faced the same type of issue for endpoints
>>>>> behind a proxy which takes more than 30 seconds to answer. Fixed by
>>>>> replacing Apache Http client by OkHttp. I did not investigate further, 
>>>>> just
>>>>> simply replaced one library by another and the error was fixed.
>>>>>
>>>>>
>>>>> Juan
>>>>>
>>>>> On Sat, 29 May 2021 at 15:08, Robert R. Bruno 
>>>>> wrote:
>>>>>
>>>>>> I wanted to see if anyone has any ideas on this one.  Since upgrading
>>>>>> to 1.13.2 from 1.9.2 we are starting to see broken pipe (write failed)
>>>>>> errors from a few invokeHttp processers.
>>>>>>
>>>>>> It is happening to processors talking to different endpoints, so I am
>>>>>> suspecting it is on the nifi side.  We are now using load balanced queues
>>>>>> throughout our flow.  Is it possible we are hitting a http connection
>>>>>> resource issue or something like that? A total guess I'll admit.
>>>>>>
>>>>>> If this could be it, does anyone know which parameter(s) to play with
>>>>>> in the properties file?  I know there is one setting for jetty threads 
>>>>>> and
>>>>>> another for max concurrent requests, but it isn't quite clear to me of 
>>>>>> they
>>>>>> are at all involved with invokeHttp calls.
>>>>>>
>>>>>> Thanks in advance!
>>>>>>
>>>>>> Robert
>>>>>>
>>>>>


Re: Broken pipe write failed errors

2021-05-29 Thread Robert R. Bruno
We upgraded to Java 11 when we upgraded to 1.13.2; we were on Java 8 with
1.9.2.

On Sat, May 29, 2021, 14:21 Joe Witt  wrote:

> What JVM are you using?
>
> Thanks
>
> On Sat, May 29, 2021 at 11:16 AM Juan Pablo Gardella <
> gardellajuanpa...@gmail.com> wrote:
>
>> Not related to Nifi, but I faced the same type of issue for endpoints
>> behind a proxy which takes more than 30 seconds to answer. Fixed by
>> replacing Apache Http client by OkHttp. I did not investigate further, just
>> simply replaced one library by another and the error was fixed.
>>
>>
>> Juan
>>
>> On Sat, 29 May 2021 at 15:08, Robert R. Bruno  wrote:
>>
>>> I wanted to see if anyone has any ideas on this one.  Since upgrading to
>>> 1.13.2 from 1.9.2 we are starting to see broken pipe (write failed) errors
>>> from a few invokeHttp processers.
>>>
>>> It is happening to processors talking to different endpoints, so I am
>>> suspecting it is on the nifi side.  We are now using load balanced queues
>>> throughout our flow.  Is it possible we are hitting a http connection
>>> resource issue or something like that? A total guess I'll admit.
>>>
>>> If this could be it, does anyone know which parameter(s) to play with in
>>> the properties file?  I know there is one setting for jetty threads and
>>> another for max concurrent requests, but it isn't quite clear to me of they
>>> are at all involved with invokeHttp calls.
>>>
>>> Thanks in advance!
>>>
>>> Robert
>>>
>>


Broken pipe write failed errors

2021-05-29 Thread Robert R. Bruno
I wanted to see if anyone has any ideas on this one.  Since upgrading to
1.13.2 from 1.9.2 we are starting to see broken pipe (write failed) errors
from a few InvokeHTTP processors.

It is happening to processors talking to different endpoints, so I suspect
it is on the NiFi side.  We are now using load-balanced queues throughout
our flow.  Is it possible we are hitting an HTTP connection resource issue
or something like that?  A total guess, I'll admit.

If this could be it, does anyone know which parameter(s) to play with in
the properties file?  I know there is one setting for Jetty threads and
another for max concurrent requests, but it isn't quite clear to me whether
they are involved with InvokeHTTP calls at all.

Thanks in advance!

Robert


A few quick questions

2021-05-28 Thread Robert R. Bruno
We recently moved to version 1.13.2 and are finally using the registry in
earnest along with parameter contexts.  Being able to store sensitive
values is amazing!

Had two quick questions:

1.  Any good way to turn on/off all controller services in a process group
from the UI?

2. Since you can only have one parameter context per process group, how are
you all handling a parameter that is used across many process groups?  I am
trying to avoid having one global parameter context, but perhaps that is
the best practice?

Any thoughts on eventually being able to select more than one parameter
context for a process group?  Perhaps the parameter context would then act
as a namespace.  Just a thought.

Thanks,
Robert


Re: Detect duplicate records

2020-08-15 Thread Robert R. Bruno
Yep, we were leaning towards offloading it to an external program and then
putting the data back into NiFi for final delivery.  Looks like that will
be best, from the sounds of it.  Again, thanks all!

On Sat, Aug 15, 2020, 16:24 Josh Friberg-Wyckoff 
wrote:

> If that is the case and this is high volume like you say, I would think it
> would be more efficient to offload the task to a separate program than
> having a processor for NiFi doing it.
>
> On Sat, Aug 15, 2020, 2:52 PM Otto Fowler  wrote:
>
>> I was working on something for this, but in discussion with some of sme’s
>> on the project, decided to shelve it.  I don’t think I had gotten to the
>> point of a jira.
>>
>> https://apachenifi.slack.com/archives/C0L9S92JY/p1589911056303500
>>
>> On August 15, 2020 at 14:12:07, Robert R. Bruno (rbru...@gmail.com)
>> wrote:
>>
>> Sorry I should have been more clear.  My need is to detect if each record
>> has been seen in the past.  So I need a solution that would be able to go
>> record by record against something like a redis cache that would tell me
>> either first time the record was seen or not and update the cache
>> accordingly.  Guessing nothing like that for records exists at this point?
>>
>> We've used DetectDuplicate to do this for entire flow files, but have the
>> need to do this per record with a preference of not splitting the flow
>> files.
>>
>> Thanks all!
>>
>> On Sat, Aug 15, 2020, 13:38 Jens M. Kofoed 
>> wrote:
>>
>>> Just some info about DISTINCT. In MySQL a union is much much faster than
>>> a DISTINCT. DISTINCT creates a new temp table with the result of the
>>> query, sorting it and removing duplicates.
>>> If you make a union with a select id=-1, the result is exactly the same.
>>> All duplicates are removed. A DISTINCT which takes 2 min. and 45 sec. only
>>> takes about 15 sec with a union.
>>> kind regards.
>>>
>>> I don't know which engine is in NIFI.
>>> Jens M. Kofoed
>>>
>>> Den lør. 15. aug. 2020 kl. 18.08 skrev Matt Burgess <
>>> mattyb...@apache.org>:
>>>
>>>> In addition to the SO answer, if you know all the fields in the
>>>> record, you can use QueryRecord with SELECT DISTINCT field1,field2...
>>>> FROM FLOWFILE. The SO answer might be more performant but is more
>>>> complex, and QueryRecord will do the operations in-memory so it might
>>>> not handle very large flowfiles.
>>>>
>>>> The current pull request for the Jira has not been active and is not
>>>> in mergeable shape, perhaps I'll get some time to pick it up and get
>>>> it across the finish line :)
>>>>
>>>> Regards,
>>>> Matt
>>>>
>>>> On Sat, Aug 15, 2020 at 11:47 AM Josh Friberg-Wyckoff
>>>>  wrote:
>>>> >
>>>> > Gosh, I should search the NiFi resources first.  They have current
>>>> JIRA for what you are wanting.
>>>> > https://issues.apache.org/jira/browse/NIFI-6047
>>>> >
>>>> > On Sat, Aug 15, 2020 at 10:35 AM Josh Friberg-Wyckoff <
>>>> j...@thefribergs.com> wrote:
>>>> >>
>>>> >> This looks interesting as well.
>>>> >>
>>>> https://stackoverflow.com/questions/52674532/remove-duplicates-in-nifi
>>>> >>
>>>> >> On Sat, Aug 15, 2020 at 10:23 AM Josh Friberg-Wyckoff <
>>>> j...@thefribergs.com> wrote:
>>>> >>>
>>>> >>> In theory I would think you could use the ExecuteStreamCommand to
>>>> use the builtin Operating System sort commands to grab unique records.  The
>>>> Windows Sort command has an undocumented unique option.  The sort command
>>>> on Linux distros also has a unique option as well.
>>>> >>>
>>>> >>> On Sat, Aug 15, 2020 at 5:53 AM Robert R. Bruno 
>>>> wrote:
>>>> >>>>
>>>> >>>> I wanted to see if anyone knew is there a clever way to detect
>>>> duplicate records much like you can with entire flow files with
>>>> DetectDuplicate?  I'd really rather not have to split my records into
>>>> individual flow files since this flow is such high volume.
>>>> >>>>
>>>> >>>> Thanks so much in advance.
>>>>
>>>


Re: Detect duplicate records

2020-08-15 Thread Robert R. Bruno
Sorry, I should have been clearer.  My need is to detect whether each record
has been seen in the past.  So I need a solution that can go record by
record against something like a Redis cache, tell me whether each record
has been seen before, and update the cache accordingly.  Guessing nothing
like that for records exists at this point?

We've used DetectDuplicate to do this for entire flow files, but have the
need to do this per record, with a preference for not splitting the flow
files.

Thanks all!
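For illustration, a sketch of the per-record duplicate detection described above. A plain Python set stands in for Redis here; in a real deployment the membership test and insert would be Redis calls (e.g. SETNX with a TTL), but the record-by-record logic would be the same:

```python
# Hypothetical sketch: per-record duplicate detection against a shared
# cache.  The "seen" set is a stand-in for an external cache like Redis.

def partition_records(records, seen, key_field):
    """Split a batch into (first_seen, duplicates), updating the cache."""
    first_seen, duplicates = [], []
    for record in records:
        key = record[key_field]
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)          # mark the record as seen for next time
            first_seen.append(record)
    return first_seen, duplicates

seen = set()
batch = [{"id": "a"}, {"id": "b"}, {"id": "a"}]
fresh, dupes = partition_records(batch, seen, "id")
# fresh -> [{"id": "a"}, {"id": "b"}], dupes -> [{"id": "a"}]
```

This keeps the flow file intact: records are partitioned in one pass rather than split into individual flow files.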

On Sat, Aug 15, 2020, 13:38 Jens M. Kofoed  wrote:

> Just some info about DISTINCT. In MySQL a union is much much faster than a
> DISTINCT. DISTINCT creates a new temp table with the result of the
> query, sorting it and removing duplicates.
> If you make a union with a select id=-1, the result is exactly the same.
> All duplicates are removed. A DISTINCT which takes 2 min. and 45 sec. only
> takes about 15 sec with a union.
> kind regards.
>
> I don't know which engine is in NIFI.
> Jens M. Kofoed
>
> Den lør. 15. aug. 2020 kl. 18.08 skrev Matt Burgess  >:
>
>> In addition to the SO answer, if you know all the fields in the
>> record, you can use QueryRecord with SELECT DISTINCT field1,field2...
>> FROM FLOWFILE. The SO answer might be more performant but is more
>> complex, and QueryRecord will do the operations in-memory so it might
>> not handle very large flowfiles.
>>
>> The current pull request for the Jira has not been active and is not
>> in mergeable shape, perhaps I'll get some time to pick it up and get
>> it across the finish line :)
>>
>> Regards,
>> Matt
>>
>> On Sat, Aug 15, 2020 at 11:47 AM Josh Friberg-Wyckoff
>>  wrote:
>> >
>> > Gosh, I should search the NiFi resources first.  They have current JIRA
>> for what you are wanting.
>> > https://issues.apache.org/jira/browse/NIFI-6047
>> >
>> > On Sat, Aug 15, 2020 at 10:35 AM Josh Friberg-Wyckoff <
>> j...@thefribergs.com> wrote:
>> >>
>> >> This looks interesting as well.
>> >> https://stackoverflow.com/questions/52674532/remove-duplicates-in-nifi
>> >>
>> >> On Sat, Aug 15, 2020 at 10:23 AM Josh Friberg-Wyckoff <
>> j...@thefribergs.com> wrote:
>> >>>
>> >>> In theory I would think you could use the ExecuteStreamCommand to use
>> the builtin Operating System sort commands to grab unique records.  The
>> Windows Sort command has an undocumented unique option.  The sort command
>> on Linux distros also has a unique option as well.
>> >>>
>> >>> On Sat, Aug 15, 2020 at 5:53 AM Robert R. Bruno 
>> wrote:
>> >>>>
>> >>>> I wanted to see if anyone knew is there a clever way to detect
>> duplicate records much like you can with entire flow files with
>> DetectDuplicate?  I'd really rather not have to split my records into
>> individual flow files since this flow is such high volume.
>> >>>>
>> >>>> Thanks so much in advance.
>>
>


Detect duplicate records

2020-08-15 Thread Robert R. Bruno
I wanted to see if anyone knew of a clever way to detect duplicate records,
much like you can with entire flow files using DetectDuplicate?  I'd really
rather not have to split my records into individual flow files since this
flow is such high volume.

Thanks so much in advance.
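The within-flowfile "SELECT DISTINCT" approach suggested in the replies can be sketched with Python's stdlib sqlite3 as a stand-in SQL engine (NiFi's QueryRecord actually runs on Apache Calcite; the SQL is analogous, and the field names here are made up):

```python
# Hypothetical sketch of deduplicating records within one flow file via
# SELECT DISTINCT, using sqlite3 in place of QueryRecord's SQL engine.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flowfile (field1 TEXT, field2 TEXT)")
conn.executemany(
    "INSERT INTO flowfile VALUES (?, ?)",
    [("a", "1"), ("a", "1"), ("b", "2")],   # one duplicate row
)
rows = conn.execute(
    "SELECT DISTINCT field1, field2 FROM flowfile"
).fetchall()
# sorted(rows) -> [('a', '1'), ('b', '2')]
```

Note this only removes duplicates inside a single flow file; cross-flowfile detection still needs an external cache as discussed above.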


Re: MergeRecord performance

2020-06-01 Thread Robert R. Bruno
I have the back pressure object threshold set to 10 on that queue and my
swap threshold is 20.  I don't think the number of flow files in the queue
in question was very high when I had the issue, though, since the issue was
now at updaterecord after I did a mergecontent that greatly reduced the
number of flow files.

On Mon, Jun 1, 2020, 16:02 Mark Payne  wrote:

> Hey Robert,
>
> How big are the FlowFile queues that you have in front of your
> MergeContent/MergeRecord processors? Or, more specifically, what do you
> have configured for the back pressure threshold? I ask because there was a
> fix in 1.11.0 [1] that had to do with ordering when swapping and ensuring
> that data remains in the same order after being swapped out and swapped
> back in when using the FIFO prioritizer.
>
> Some of the changes there can actually change the thresholds when we
> perform swapping. So I’m curious if you’re seeing a lot of swapping of
> FlowFiles to/from disk when running in 1.11.4 that you didn’t have in
> 1.9.2. Are you seeing logs about swapping occurring? And of note, when I
> talk about swapping, I’m talking about NiFi-level FlowFile swapping, not
> OS-level swapping.
>
> Thanks
> -Mark
>
> [1] https://issues.apache.org/jira/browse/NIFI-7011
>
>
> On May 22, 2020, at 10:35 AM, Robert R. Bruno  wrote:
>
> Sorry one other thing I thought of that may help.  I noticed on 1.11.4
> when I would stop the updaterecord processor it would take a long period of
> time for the processor to stop (threads were hanging), but when I went back
> to 1.9.2 the processor would stop in a very timely manner.  Not sure if
> that helps, but just another data point.
>
> On Fri, May 22, 2020 at 9:22 AM Robert R. Bruno  wrote:
>
>> I had more updates on this.
>>
>> Yesterday I again attempted to upgrade one of our 1.9.2 clusters that is
>> now using mergecontent vs mergerecord.  The flow had been running on 1.9.2
>> for about a week with no issue.  I did the upgrade to 1.11.4, and saw about
>> 3 of 10 nodes not being able to keep up.  The load on these 3 nodes became
>> very high.  For perspective, a load of 80 is about as high as we like to
>> see these boxes, and some were getting as high as 120.  I saw one
>> bottleneck forming at an updaterecord.  I tried giving that processor a few
>> more threads to see if it would help work off the backlog.  No matter what
>> I tried (lowering thread, changing mergecontent sizes, etc) the load
>> wouldn't go down on those 3 boxes and they had either a slowing growing
>> backlog or would maintain the backlog they had.
>>
>> I then decided to downgrade the nifi back to 1.9.2 without rebooting the
>> boxes.  I kept all flow files and content as they were.  Upon downgrading
>> no loads were above 50 and this was only on the boxes that had the backlog
>> that formed when we did the upgrade.  The backlog on the 3 boxes worked off
>> with no issue at all, and without me having to make changes to the flow.
>> Once backlogs were worked off then our loads all sat around 20.
>>
>> This is a similar behavior from what we saw before, but just in another
>> part of the flow.  Has anyone else seen anything like this on 1.11.4?
>> Unfortunately for now we can't upgrade due to this problem.  Any thoughts
>> from anyone would be greatly appreciated.
>>
>> Thanks,
>> Robert
>>
>> On Fri, May 8, 2020 at 4:47 PM Robert R. Bruno  wrote:
>>
>>> Sorry for the delayed answer, but was doing some testing this week and
>>> found a few more things out.
>>>
>>> First to answer some of your questions.
>>>
>>> I would say with no actual raw numbers, it was worse than a 10%
>>> degradation.  I say this since the flow was badly backing up, and a 10%
>>> decrease in performance should not have caused this since normally we can
>>> work off a backlog of data with no issues.  I looked at my mergerecord
>>> settings, and I am largely using size as the limiting factor.  I have a max
>>> size of 4MB and a max bin age of 1 minute followed by a second mergerecord
>>> with a max size of 32MB and a max bin age of 5 minutes.
>>>
>>> I changed our flow a bit on a test system that was running 1.11.4, and
>>> discovered the following:
>>>
>>> I changed mergerecords to mergecontents.  I used pretty much all of the
>>> same settings in the mergecontent but had the mergecontent deal with the
>>> avro natively.  In this flow, it currently seems like I don't need to chain
>>> multiple mergecontents together like I did with mergerecords.
>>>
>>> I then fed the merged avro f

Re: MergeRecord performance

2020-05-08 Thread Robert R. Bruno
Sorry for the delayed answer, but was doing some testing this week and
found a few more things out.

First to answer some of your questions.

I would say with no actual raw numbers, it was worse than a 10%
degradation.  I say this since the flow was badly backing up, and a 10%
decrease in performance should not have caused this since normally we can
work off a backlog of data with no issues.  I looked at my mergerecord
settings, and I am largely using size as the limiting factor.  I have a max
size of 4MB and a max bin age of 1 minute followed by a second mergerecord
with a max size of 32MB and a max bin age of 5 minutes.
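The two-stage thresholds above amount to a size-or-age flush rule. A simplified illustration (this is not NiFi's implementation, just the described limits expressed as a predicate):

```python
# Simplified sketch of MergeRecord's "flush when a bin hits max size OR
# max bin age" behavior, with the two stages from the flow above:
# stage 1 at 4 MB / 60 s, stage 2 at 32 MB / 300 s.

MB = 1024 * 1024

def should_flush(bin_bytes, bin_age_secs, max_bytes, max_age_secs):
    """A bin becomes eligible to merge once it hits either limit."""
    return bin_bytes >= max_bytes or bin_age_secs >= max_age_secs

stage1 = should_flush(4 * MB, 5, 4 * MB, 60)      # size limit reached
stage2 = should_flush(8 * MB, 300, 32 * MB, 300)  # age limit reached
```

The max bin age acts as a latency bound: a bin that never fills still flushes once it ages out, which is why small trickles of records eventually merge.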

I changed our flow a bit on a test system that was running 1.11.4, and
discovered the following:

I changed mergerecords to mergecontents.  I used pretty much all of the
same settings in the mergecontent but had the mergecontent deal with the
avro natively.  In this flow, it currently seems like I don't need to chain
multiple mergecontents together like I did with mergerecords.

I then fed the merged avro from the mergecontent to a convertrecord to
convert the data to parquet.  The convertrecord was tremendously slower
than the mergecontent and became a bottleneck.  I then switched the
convertrecord to the convertavrotoparquet processor.  Convertavrotoparquet
can easily handle the output speed of the mergecontent and then some.

My hope is to make these changes to our actual flow soon, and then upgrade
to 1.11.4 again.  I'll let you know how that goes.

Thanks,
Robert

On Mon, Apr 27, 2020 at 9:26 AM Mark Payne  wrote:

> Robert,
>
> What kind of performance degradation were you seeing here? I put together
> some simple flows to see if I could reproduce using 1.9.2 and current
> master.
> My flow consisted of GenerateFlowFile (generating 2 CSV rows per FlowFile)
> -> ConvertRecord (to Avro) -> MergeRecord (read Avro, write Avro) ->
> UpdateAttribute to try to mimic what you’ve got, given the details that I
> have.
>
> I did see a performance degradation on the order of about 10%. So on my
> laptop I went from processing 2.49 MM FlowFiles in 1.9.2 in 5 mins to 2.25
> MM on the master branch. Interestingly, I saw no real change when I enabled
> Snappy compression.
>
> For a point of reference, I also tried removing MergeRecord and just
> Generate -> Convert -> UpdateAttribute. I saw the same roughly 10%
> performance degradation.
>
> I’m curious if you’re seeing more than that. If so, I think a template
> would be helpful to understand what’s different.
>
> Thanks
> -Mark
>
>
> On Apr 24, 2020, at 4:50 PM, Robert R. Bruno  wrote:
>
> Joe,
>
> In that part of the flow, we are using avro readers and writers.  We are
> using snappy compression (which could be part of the problem).  Since we
> are using avro at that point the embedded schema is being used by the
> reader and the writer is using the schema name property along with an
> internal schema registry in nifi.
>
> I can see what could potentially be shared.
>
> Thanks
>
> On Fri, Apr 24, 2020 at 4:41 PM Joe Witt  wrote:
>
>> Robert,
>>
>> Can you please detail the record readers and writers involved and how
>> schemas are accessed?  There can be very important performance related
>> changes in the parsers/serializers of the given formats.  And we've added a
>> lot to make schema caching really capable but you have to opt into it.  It
>> is of course possible MergeRecord itself is the culprit for performance
>> reduction but lets get a more full picture here.
>>
>> Are you able to share a template and sample data which we can use to
>> replicate?
>>
>> Thanks
>>
>> On Fri, Apr 24, 2020 at 4:38 PM Robert R. Bruno 
>> wrote:
>>
>>> I wanted to see if anyone else has experienced performance issues with
>>> the newest version of nifi and MergeRecord?  We have been running on nifi
>>> 1.9.2 for awhile now, and recently upgraded to nifi 1.11.4.  Once upgraded,
>>> our identical flows were no longer able to keep up with our data mainly at
>>> MergeRecord processors.
>>>
>>> We ended up downgrading back to nifi 1.9.2.  Once we downgraded, all was
>>> keeping up again.  There were no errors to speak of when we were running
>>> the flow with 1.11.4.  We did see higher load on the OS, but this may have
>>> been caused by the fact there was such a tremendous backlog built up in the
>>> flow.
>>>
>>> Another side note, we saw one UpdateRecord processor producing errors
>>> when I tested the flow with nifi 1.11.4 with a small test flow.  I was able
>>> to fix this issue by changing some parameters in my RecordWriter.  So
>>> perhaps some underlying ways records are being handled since 1.9.2 caused
>>> the performance issue we saw?

Re: MergeRecord performance

2020-04-24 Thread Robert R. Bruno
Joe,

In that part of the flow, we are using avro readers and writers.  We are
using snappy compression (which could be part of the problem).  Since we
are using avro at that point the embedded schema is being used by the
reader and the writer is using the schema name property along with an
internal schema registry in nifi.

I can see what could potentially be shared.

Thanks

On Fri, Apr 24, 2020 at 4:41 PM Joe Witt  wrote:

> Robert,
>
> Can you please detail the record readers and writers involved and how
> schemas are accessed?  There can be very important performance related
> changes in the parsers/serializers of the given formats.  And we've added a
> lot to make schema caching really capable but you have to opt into it.  It
> is of course possible MergeRecord itself is the culprit for performance
> reduction but lets get a more full picture here.
>
> Are you able to share a template and sample data which we can use to
> replicate?
>
> Thanks
>
> On Fri, Apr 24, 2020 at 4:38 PM Robert R. Bruno  wrote:
>
>> I wanted to see if anyone else has experienced performance issues with
>> the newest version of nifi and MergeRecord?  We have been running on nifi
>> 1.9.2 for awhile now, and recently upgraded to nifi 1.11.4.  Once upgraded,
>> our identical flows were no longer able to keep up with our data mainly at
>> MergeRecord processors.
>>
>> We ended up downgrading back to nifi 1.9.2.  Once we downgraded, all was
>> keeping up again.  There were no errors to speak of when we were running
>> the flow with 1.11.4.  We did see higher load on the OS, but this may have
>> been caused by the fact there was such a tremendous backlog built up in the
>> flow.
>>
>> Another side note, we saw one UpdateRecord processor producing errors
>> when I tested the flow with nifi 1.11.4 with a small test flow.  I was able
>> to fix this issue by changing some parameters in my RecordWriter.  So
>> perhaps some underlying ways records are being handled since 1.9.2 caused
>> the performance issue we saw?
>>
>> Any insight anyone has would be greatly appreciated, as we very much
>> would like to upgrade to nifi 1.11.4.  One thought was switching the
>> MergeRecord processors to MergeContent since I've been told MergeContent
>> seems to perform better, but not sure if this is actually true.  We are
>> using the pattern of chaining a few MergeRecord processors together to help
>> with performance.
>>
>> Thanks in advance!
>>
>


MergeRecord performance

2020-04-24 Thread Robert R. Bruno
I wanted to see if anyone else has experienced performance issues with the
newest version of nifi and MergeRecord?  We have been running on nifi 1.9.2
for awhile now, and recently upgraded to nifi 1.11.4.  Once upgraded, our
identical flows were no longer able to keep up with our data mainly at
MergeRecord processors.

We ended up downgrading back to nifi 1.9.2.  Once we downgraded, all was
keeping up again.  There were no errors to speak of when we were running
the flow with 1.11.4.  We did see higher load on the OS, but this may have
been caused by the fact there was such a tremendous backlog built up in the
flow.

Another side note, we saw one UpdateRecord processor producing errors when
I tested the flow with nifi 1.11.4 with a small test flow.  I was able to
fix this issue by changing some parameters in my RecordWriter.  So perhaps
some underlying ways records are being handled since 1.9.2 caused the
performance issue we saw?

Any insight anyone has would be greatly appreciated, as we very much would
like to upgrade to nifi 1.11.4.  One thought was switching the MergeRecord
processors to MergeContent since I've been told MergeContent seems to
perform better, but not sure if this is actually true.  We are using the
pattern of chaining a few MergeRecord processors together to help with
performance.

Thanks in advance!


Re: Who uses NiFi Cluster in Docker ?

2018-10-19 Thread Robert R. Bruno
Been running a nifi cluster in an on-prem kubernetes cluster with a lot of
success.  We found using local disk volumes helped performance.

On Fri, Oct 19, 2018, 03:21 Mike Thomsen  wrote:

> Guillaume,
>
> We also have a patch coming in 1.8 that exposes the clustering settings
> through Docker, so that should make it a lot easier for you to set up a
> test cluster.
>
> On Fri, Oct 19, 2018 at 3:49 AM Asanka Sanjaya  wrote:
>
>> Hi Guillaume,
>> I'm using nifi in our production kubernetes cluster on Google cloud for
>> about a year now and didn't run into any trouble. One thing you need to be
>> aware of is to have a persistent disk attached to your container in
>> kubernetes. Otherwise, when the pod gets restarted you will lose queued
>> flow files.
>>
>> On Thu, Oct 18, 2018 at 9:10 PM PICHARD, Guillaume <
>> guillaume.pich...@sogeti.com> wrote:
>>
>>> Hi,
>>>
>>>
>>>
>>> I’m looking for feedback and lessons learned from running a Nifi
>>> Cluster in production using docker/kubernetes/mesos. Is it working well?
>>> Is it stable? Does it handle a high workload well?
>>>
>>>
>>>
>>> Thanks for your feedback,
>>>
>>> Guillaume.
>>>
>>>
>>>
>>
>>
>> --
>>
>> *Thanks,*
>>
>> Asanka Sanjaya Herath
>>
>> Senior Software Engineer | Zone24x7
>>
>


Re: Invalid header error

2018-01-11 Thread Robert R. Bruno
Thanks for getting to this so soon.  As soon as the ticket is closed, I'll
be sure to give it a quick test with the proxy I am using.  Thanks again.

On Wed, Jan 10, 2018 at 1:40 PM Andy LoPresto <alopre...@apache.org> wrote:

> Matt meant to link to this Jira [1]. We will be writing a blog and
> updating the documentation guides in addition to the new property.
>
>
> [1] https://issues.apache.org/jira/browse/NIFI-4761
>
> Andy LoPresto
> alopre...@apache.org
> *alopresto.apa...@gmail.com <alopresto.apa...@gmail.com>*
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Jan 10, 2018, at 5:54 AM, Matt Gilman <matt.c.gil...@gmail.com> wrote:
>
> Robert, James, All,
>
> NiFi has been updated to be a little more strict regarding incoming HTTP
> requests. If the Host header does not comply with an expected value, the
> request is rejected. Currently, the expected value comes from those .host
> properties. What's happening is the proxy is likely passing through all
> incoming header values. When NiFi sees the request, it appears as though it
> was not meant for it, so it's rejected. I believe there are two valid
> options here:
>
> 1) Remove the Host header at the proxy. This should allow it to explicitly
> set it to the NiFi Host when issuing the request instead of passing through
> the incoming value.
> 2) Update NiFi to allow whitelisting of expected Host values like we did
> for context paths. I've created a JIRA for this option [1].
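>
> The whitelist idea in option 2 can be sketched as follows. This is a
> hedged, stand-alone illustration, not NiFi's actual implementation: a
> request passes only if its Host header, with any port stripped, names
> the configured host or an allow-listed proxy.
>
> ```python
> def host_allowed(host_header, configured_host, proxy_whitelist=()):
>     """Accept a request only when its Host header names this server
>     or a whitelisted proxy (ports are ignored for the comparison)."""
>     host = host_header.rsplit(":", 1)[0].lower()
>     allowed = {configured_host.lower(),
>                *(p.lower() for p in proxy_whitelist)}
>     return host in allowed
>
> # Direct request to the configured host: accepted.
> print(host_allowed("nifi.example.com:8080", "nifi.example.com"))  # True
> # Proxied request that passes through the proxy's own Host header:
> # rejected unless the proxy name is whitelisted.
> print(host_allowed("proxy.example.com:443", "nifi.example.com"))  # False
> print(host_allowed("proxy.example.com:443", "nifi.example.com",
>                    proxy_whitelist=["proxy.example.com"]))        # True
> ```
>
> The host names here are placeholders; the point is only that option 1
> normalizes the header at the proxy, while option 2 teaches the server to
> accept a known second name.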
>
> We'll make sure these get appropriately documented for folks running
> behind a proxy.
>
> Thanks
>
> Matt
>
> [1] https://issues.apache.org/jira/browse/NIFI-4501
>
> On Wed, Jan 10, 2018 at 5:00 AM, Robert R. Bruno <rbru...@gmail.com>
> wrote:
>
>> James,
>>
>> Funny enough I was thinking of the same hack, but as you said sounds a
>> bit nasty.  Hopefully there is a better solution.  Also for me, I may not
>> always have local admin rights on my client machine which I believe is
>> required to change the hosts file.
>>
>> Thanks,
>> Robert
>>
>> On Wed, Jan 10, 2018, 00:18 James Wing <jvw...@gmail.com> wrote:
>>
>>> Robert,
>>>
>>> I had the same problem.  One workaround I have used was to add the DNS
>>> name to the /etc/hosts file with a local IP address, so that I could
>>> configure that name in nifi.web.http.host and NiFi would still bind to the
>>> right IP.  It sounds like a nasty hack now that I describe it, but it
>>> worked.
>>>
>>> Perhaps someone else knows a more elegant configuration?
>>>
>>> Thanks,
>>>
>>> James
>>>
>>> On Tue, Jan 9, 2018 at 7:33 AM, Robert R. Bruno <rbru...@gmail.com>
>>> wrote:
>>>
>>>> I just ran into this as well while trying out 1.5.0-SNAPSHOT.
>>>>
>>>> What is the solution where you are running nifi behind a proxy?  I
>>>> tried setting nifi.web.http.host to my proxy ip but then nifi attempted to
>>>> bind to this ip address.
>>>>
>>>> Hopefully I am missing something.  If not any chance a config value for
>>>> allowed proxies before the release?
>>>>
>>>>
>>>>
>>>> On Fri, Dec 15, 2017, 19:26 Mike Thomsen <mikerthom...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks. Is that documented?
>>>>>
>>>>> On Fri, Dec 15, 2017 at 7:02 PM, Andy LoPresto <alopre...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hi Mike,
>>>>>>
>>>>>> This is a recent change introduced in 1.5.0-SNAPSHOT (master). You
>>>>>> can resolve this by setting nifi.web.http.host in nifi.properties to the
>>>>>> value of SERVER_HERE.
>>>>>>
>>>>>>
>>>>>> Andy LoPresto
>>>>>> alopre...@apache.org
>>>>>> *alopresto.apa...@gmail.com <alopresto.apa...@gmail.com>*
>>>>>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>>>>>
>>>>>> On Dec 15, 2017, at 3:32 PM, Mike Thomsen <mikerthom...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> I get this error after I installed a new build:
>>>>>>
>>>>>> The request contained an invalid host header [SERVER_IP:8080] in the
>>>>>> request [/]. Check for request manipulation or third-party intercept.
>>>>>>
>>>>>

Re: Invalid header error

2018-01-09 Thread Robert R. Bruno
I just ran into this as well while trying out 1.5.0-SNAPSHOT.

What is the solution where you are running nifi behind a proxy?  I tried
setting nifi.web.http.host to my proxy ip but then nifi attempted to bind
to this ip address.
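
The bind failure above can be reproduced with a plain socket. This is a
hedged sketch of the general OS behavior, not of NiFi's code: a process can
only bind a listening socket to an address assigned to the local machine,
which is why pointing the host property at the proxy's address fails.

```python
import socket

def try_bind(host, port=0):
    """Return True if this machine can bind a listening socket to host."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))  # port 0 lets the OS pick a free port
        return True
    except OSError:
        return False  # e.g. EADDRNOTAVAIL for a non-local address
    finally:
        s.close()

print(try_bind("127.0.0.1"))    # True: a local address
print(try_bind("203.0.113.7"))  # False: a remote (documentation-range) address
```

The second address is from the TEST-NET-3 documentation range, standing in
for a proxy's IP that the nifi host does not own.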

Hopefully I am missing something.  If not, any chance of a config value for
allowed proxies before the release?



On Fri, Dec 15, 2017, 19:26 Mike Thomsen  wrote:

> Thanks. Is that documented?
>
> On Fri, Dec 15, 2017 at 7:02 PM, Andy LoPresto 
> wrote:
>
>> Hi Mike,
>>
>> This is a recent change introduced in 1.5.0-SNAPSHOT (master). You can
>> resolve this by setting nifi.web.http.host in nifi.properties to the value
>> of SERVER_HERE.
>>
>>
>> Andy LoPresto
>> alopre...@apache.org
>> *alopresto.apa...@gmail.com *
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>
>> On Dec 15, 2017, at 3:32 PM, Mike Thomsen  wrote:
>>
>> I get this error after I installed a new build:
>>
>> The request contained an invalid host header [SERVER_IP:8080] in the
>> request [/]. Check for request manipulation or third-party intercept.
>>
>> In the logs it says:
>>
>> 2017-12-15 18:34:59,937 WARN [NiFi Web Server-66]
>> o.a.n.w.s.HostHeaderSanitizationCustomizer Request host header
>> [SERVER_HERE:8080] different from web hostname [(:8080)]. Overriding to
>> [:8080/nifi/]
>> 2017-12-15 18:34:59,938 WARN [NiFi Web Server-66]
>> o.a.nifi.web.server.HostHeaderHandler Request host header
>> [SERVER_HERE:8080] different from web hostname [localhost(:8080)].
>> Overriding to [localhost:8080/nifi/]
>> 2017-12-15 18:35:00,059 WARN [NiFi Web Server-59]
>> o.a.n.w.s.HostHeaderSanitizationCustomizer Request host header
>> [SERVER_HERE:8080] different from web hostname [(:8080)]. Overriding to
>> [:8080/favicon.ico]
>> 2017-12-15 18:35:00,059 WARN [NiFi Web Server-59]
>> o.a.nifi.web.server.HostHeaderHandler Request host header
>> [SERVER_HERE:8080] different from web hostname [localhost(:8080)].
>> Overriding to [localhost:8080/favicon.ico]
>>
>> Never saw this with 1.4 and earlier. Any ideas?
>>
>> Thanks,
>>
>> Mike
>>
>>
>>
>