[jira] [Commented] (NIFI-7716) Received error HTTP_ERROR: HTTP/1.1 420 Enhance Your Calm. Will attempt to reconnect

2020-11-10 Thread Kourge (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-7716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229264#comment-17229264
 ] 

Kourge commented on NIFI-7716:
--

Hi [~V1ncent24],

Are you sure that the same Twitter Consumer Key/Secret and Access Token/Secret 
are not used by another GetTwitter processor or by another application?
In my experience the "420 Enhance Your Calm" exception often occurs when the 
very same Twitter API credentials are used simultaneously.

> Received error HTTP_ERROR: HTTP/1.1 420 Enhance Your Calm. Will attempt to 
> reconnect
> 
>
> Key: NIFI-7716
> URL: https://issues.apache.org/jira/browse/NIFI-7716
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: NiFi Stateless
>Affects Versions: 1.11.3
> Environment: Nifi  on AWS EC2 instance
>Reporter: Vincent Naveen
>Priority: Critical
> Attachments: ErrorOnGetTwitter.JPG
>
>
> We are using a Python script in the "ExecuteStreamCommand" processor, which 
> takes the Twitter user from the database via the incoming flow file 
> generated by the "QueryTableRecord" processor. We are able to successfully 
> stop the GetTwitter processor, update IDs_To_Follow via the parameter 
> context, and start the GetTwitter processor again.
> The issue is that once the GetTwitter processor starts, it throws the errors 
> below:
> Received error HTTP_ERROR: HTTP/1.1 420 Enhance Your Calm. Will attempt to 
> reconnect
> Received error STOPPED_BY_ERROR: Retries exhausted due to null. Will not 
> attempt to reconnect
> 12:42:46 IST ERROR be2a0f28-0173-1000-0bbc-4401fbf6d330
> We learned that a wait time needs to be added when making REST API calls, 
> which we have implemented, but we still get the same issue. We also changed 
> the Max Client Retry count to 50, and the issue is still not fixed.
> From the links below, it appears that something has to be done in Java, but 
> we are using a Python script in ExecuteStreamCommand.
> https://issues.apache.org/jira/browse/NIFI-5953
> [https://github.com/apache/nifi/pull/3276]
> Could you please help us understand how to handle this scenario? Thanks in 
> advance!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (NIFI-6905) GetTwitter processor, configured to run on primary node only, initializes connection to Twitter API from every NiFi cluster node, even on non-primary nodes

2019-11-29 Thread Kourge (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-6905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985090#comment-16985090
 ] 

Kourge commented on NIFI-6905:
--

I have implemented a solution (#2 in the ticket description).

I updated the *GetTwitter* processor's `*onScheduled()*` method to only create a 
`*clientBuilder*`, without connecting it to the Twitter API.
 The connection is now initialized by the `*onTrigger()*` method when it is 
needed (in primary-node-only mode, `*onTrigger()*` never runs on non-primary 
nodes).
 I also added `*onPrimaryNodeChange()*` to close the connection on 
`*PRIMARY_NODE_REVOKED*` events.

Please review the pull request.
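
As a rough illustration of the change described above, the following standalone Java sketch defers the connection until the first `onTrigger()` call and releases it when the primary role is revoked. `StreamClient` and `LazyTwitterConnection` are hypothetical names used only for this example; they are not the NiFi or HBC API, and the actual change is the one in the pull request.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the streaming client; not the real HBC interface.
interface StreamClient {
    void connect();
    void stop();
}

// Sketch of the lazy-connection pattern: the builder is prepared up front
// (the role of onScheduled()), the connection is opened only in onTrigger().
final class LazyTwitterConnection {
    private final Supplier<StreamClient> clientBuilder;
    private StreamClient client; // stays null until onTrigger() runs on this node

    LazyTwitterConnection(Supplier<StreamClient> clientBuilder) {
        this.clientBuilder = clientBuilder;
    }

    // Runs only on the node that executes the processor, so only that node connects.
    synchronized void onTrigger() {
        if (client == null) {
            client = clientBuilder.get();
            client.connect();
        }
        // ... poll messages from the client here ...
    }

    // Called when this node loses the primary role: release the connection.
    synchronized void onPrimaryNodeRevoked() {
        if (client != null) {
            client.stop();
            client = null;
        }
    }

    synchronized boolean isConnected() {
        return client != null;
    }

    public static void main(String[] args) {
        LazyTwitterConnection processor = new LazyTwitterConnection(() -> new StreamClient() {
            public void connect() { System.out.println("connect"); }
            public void stop() { System.out.println("stop"); }
        });
        processor.onTrigger();            // first trigger opens the connection
        processor.onTrigger();            // later triggers reuse it
        processor.onPrimaryNodeRevoked(); // connection is closed
    }
}
```

A node that never runs `onTrigger()` never opens a connection in this sketch, which is the property the fix relies on.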

> GetTwitter processor, configured to run on primary node only, initializes 
> connection to Twitter API from every NiFi cluster node, even on non-primary 
> nodes
> ---
>
> Key: NIFI-6905
> URL: https://issues.apache.org/jira/browse/NIFI-6905
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Affects Versions: 1.0.0
>Reporter: Kourge
>Assignee: Kourge
>Priority: Major
>  Labels: getTwitter
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I have a *GetTwitter* processor running on a 3-node NiFi cluster and 
> configured to be executed on the primary node only.
> The symptom is an unusually high frequency of HTTP 420 ("Enhance Your 
> Calm") exceptions on GetTwitter processor start.
> I ran the following tests:
>  * With only 1 NiFi node, I was able to start/stop the GetTwitter processor 
> 10 times in a row without any errors.
>  * With 2 NiFi nodes running, HTTP 420 errors occurred after a few start/stop 
> cycles (sometimes even after a single start).
> After analyzing the source code and knowing about 
> https://issues.apache.org/jira/browse/NIFI-2592 I came to the conclusion that 
> the GetTwitter processor initializes the connection to the Twitter API on 
> each node of the cluster, even on non-primary nodes.
> The `*onScheduled()*` method runs on every node (see: NIFI-2592), making 
> connections to Twitter with `*client.connect()*`. Then the `*onTrigger()*` 
> method consumes the tweets normally from the primary node.
> The issue is that having more than one node initializing connections makes 
> the Twitter API raise HTTP 420 errors.
> {code:java}
> ERROR
> org.apache.nifi.processors.twitter.GetTwitter
> GetTwitter[id=XYZ] Received error HTTP_ERROR: HTTP/1.1 420 Enhance Your Calm. 
> Will attempt to reconnect
> {code}
> +*Proposed solutions:*+
>  # Change the behavior of the `*onScheduled()*` method to run only on the 
> primary node (as proposed in NIFI-2592).
>  # Update the GetTwitter processor implementation to no longer call 
> `*client.connect()*` from the `*onScheduled()*` method, but only when 
> *PrimaryNodeState* changes to *ELECTED_PRIMARY_NODE* (and perform a 
> `*client.stop()*` when *PrimaryNodeState* changes to 
> *PRIMARY_NODE_REVOKED*).





[jira] [Assigned] (NIFI-6905) GetTwitter processor, configured to run on primary node only, initializes connection to Twitter API from every NiFi cluster node, even on non-primary nodes

2019-11-29 Thread Kourge (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-6905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kourge reassigned NIFI-6905:


Assignee: Kourge






[jira] [Created] (NIFI-6906) GetTwitter processor throws "invalid cookie header" exceptions when connecting to Twitter API

2019-11-26 Thread Kourge (Jira)
Kourge created NIFI-6906:


 Summary: GetTwitter processor throws "invalid cookie header" 
exceptions when connecting to Twitter API
 Key: NIFI-6906
 URL: https://issues.apache.org/jira/browse/NIFI-6906
 Project: Apache NiFi
  Issue Type: Bug
  Components: Extensions
Affects Versions: 1.9.2
Reporter: Kourge


GetTwitter processor throws "invalid cookie header" exceptions (WARN) when 
connecting to Twitter API:
{code:java}
org.apache.http.client.protocol.ResponseProcessCookies
Invalid cookie header: "set-cookie: guest_id=XYZ; Max-Age=63072000; 
Expires=Thu, 25 Nov 2021 11:08:52 GMT; Path=/; Domain=.twitter.com". Invalid 
'expires' attribute: Thu, 25 Nov 2021 11:08:52 GMT
{code}
HttpClient's CookieSpec may need to be configured.
[https://stackoverflow.com/questions/36473478/fixing-httpclient-warning-invalid-expires-attribute-using-fluent-api/40697322]
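
To illustrate why an `expires` value like the one in the warning can be rejected, here is a small standalone Java sketch. It assumes, without having verified it against the HttpClient version GetTwitter uses, that the default cookie handling expects an older Netscape-style date with dashes and a two-digit year. The class name and the `LEGACY` pattern are illustrative only; the sketch uses `java.time`, not HttpClient itself.

```java
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

final class ExpiresDateCheck {
    // Assumed legacy Netscape-style pattern, e.g. "Thu, 25-Nov-21 11:08:52 GMT".
    static final DateTimeFormatter LEGACY =
            DateTimeFormatter.ofPattern("EEE, dd-MMM-yy HH:mm:ss z", Locale.US);

    // RFC 1123 style, e.g. "Thu, 25 Nov 2021 11:08:52 GMT".
    static final DateTimeFormatter RFC_1123 = DateTimeFormatter.RFC_1123_DATE_TIME;

    // Returns true when the formatter accepts the given date string.
    static boolean parses(DateTimeFormatter formatter, String value) {
        try {
            formatter.parse(value);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String expires = "Thu, 25 Nov 2021 11:08:52 GMT"; // value from the warning above
        System.out.println("legacy pattern accepts it:   " + parses(LEGACY, expires));   // false
        System.out.println("RFC 1123 pattern accepts it: " + parses(RFC_1123, expires)); // true
    }
}
```

If that assumption holds, configuring the client with a cookie spec that accepts RFC-style dates would make the warning go away, which is the direction the linked thread points to.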
 





[jira] [Created] (NIFI-6905) GetTwitter processor, configured to run on primary node only, initializes connection to Twitter API from every NiFi cluster node, even on non-primary nodes

2019-11-26 Thread Kourge (Jira)
Kourge created NIFI-6905:


 Summary: GetTwitter processor, configured to run on primary node 
only, initializes connection to Twitter API from every NiFi cluster node, even 
on non-primary nodes
 Key: NIFI-6905
 URL: https://issues.apache.org/jira/browse/NIFI-6905
 Project: Apache NiFi
  Issue Type: Bug
  Components: Extensions
Affects Versions: 1.0.0
Reporter: Kourge


I have a *GetTwitter* processor running on a 3-node NiFi cluster and 
configured to be executed on the primary node only.
The symptom is an unusually high frequency of HTTP 420 ("Enhance Your 
Calm") exceptions on GetTwitter processor start.

I ran the following tests:
 * With only 1 NiFi node, I was able to start/stop the GetTwitter processor 10 
times in a row without any errors.
 * With 2 NiFi nodes running, HTTP 420 errors occurred after a few start/stop 
cycles (sometimes even after a single start).

After analyzing the source code and knowing about 
https://issues.apache.org/jira/browse/NIFI-2592 I came to the conclusion that 
the GetTwitter processor initializes the connection to the Twitter API on each 
node of the cluster, even on non-primary nodes.

The `*onScheduled()*` method runs on every node (see: NIFI-2592), making 
connections to Twitter with `*client.connect()*`. Then the `*onTrigger()*` 
method consumes the tweets normally from the primary node.
The issue is that having more than one node initializing connections makes the 
Twitter API raise HTTP 420 errors.
{code:java}
ERROR
org.apache.nifi.processors.twitter.GetTwitter
GetTwitter[id=XYZ] Received error HTTP_ERROR: HTTP/1.1 420 Enhance Your Calm. 
Will attempt to reconnect
{code}
+*Proposed solutions:*+
 # Change the behavior of the `*onScheduled()*` method to run only on the 
primary node (as proposed in NIFI-2592).
 # Update the GetTwitter processor implementation to no longer call 
`*client.connect()*` from the `*onScheduled()*` method, but only when 
*PrimaryNodeState* changes to *ELECTED_PRIMARY_NODE* (and perform a 
`*client.stop()*` when *PrimaryNodeState* changes to *PRIMARY_NODE_REVOKED*).





[jira] [Updated] (NIFI-5953) GetTwitter processor throws Enhance Your Calm exceptions then fails with Retries exhausted

2019-02-12 Thread Kourge (JIRA)


 [ 
https://issues.apache.org/jira/browse/NIFI-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kourge updated NIFI-5953:
-
Fix Version/s: 1.9.0

> GetTwitter processor throws Enhance Your Calm exceptions then fails with 
> Retries exhausted
> --
>
> Key: NIFI-5953
> URL: https://issues.apache.org/jira/browse/NIFI-5953
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Affects Versions: 1.7.0
>Reporter: Kourge
>Priority: Major
> Fix For: 1.9.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hi,
> I am using the GetTwitter processor with the Filter Endpoint.
>  The issue is that I often get a series of `*Received error HTTP_ERROR: 
> HTTP/1.1 420 Enhance Your Calm. Will attempt to reconnect*` exceptions.
>  These are followed by one `*Received error STOPPED_BY_ERROR: Retries 
> exhausted due to null. Will not attempt to reconnect*` exception, after which 
> the processor no longer receives any tweets from the Twitter endpoint.
> I am getting rate limited by the Twitter API. I am running a NiFi cluster, so 
> I run the GetTwitter processor on the Primary Node only to avoid using the 
> same credentials several times in parallel.
> I tried to apply the configuration recommendation from this mailing list 
> thread:
>  
> [https://lists.apache.org/thread.html/ed397f42a26760280363e9cc1f64f6654c635110005e24ab9486bf19@%3Cdev.nifi.apache.org%3E]
> But raising the "run schedule" parameter to 60 seconds does not help in my 
> case, since I aim to read between 100 and 200 tweets per minute. With "run 
> schedule" set to 60 seconds, NiFi polls only 1 tweet per minute and cannot 
> keep up with the Twitter API tweet queue.
> +*Proposed solution*+
> I analyzed the `*GetTwitter.java*` implementation and noticed that the 
> `*onTrigger()*` method reconnects (`*client.reconnect();*`) to the Twitter 
> endpoint on `*HTTP_ERROR*`.
>  The issue here is that `*HTTP/1.1 420 Enhance Your Calm*` messages are 
> `*HTTP_ERROR*` events, but the Twitter HBC library client (com.twitter.hbc) 
> already manages reconnection.
>  The Twitter HBC library client retries on its own with an increasing wait 
> delay, with 5 retries by default.
> Moreover, it seems that `*client.reconnect();*` does not work in my case, and 
> because that method is called too often, the processor gets kicked off the 
> Twitter API even earlier.
> My proposed solution is the following (tested in my local development 
> environment):
> *1. Let the Twitter HBC library client handle the connection retries on 
> `HTTP/1.1 420 Enhance Your Calm` messages.*
> The `*onTrigger()*` method should be updated to not try to reconnect in case 
> of an `*HTTP_ERROR*` with a message equal to `*HTTP/1.1 420 Enhance Your 
> Calm*`:
> {code:java}
> case HTTP_ERROR:
>     // Skip the manual reconnect on 420 and let the HBC client retry on its own.
>     if (!event.getMessage().equals("HTTP/1.1 420 Enhance Your Calm")) {
>         getLogger().error("Received error {}: {}. Will attempt to reconnect",
>                 new Object[]{event.getEventType(), event.getMessage()});
>         client.reconnect();
>     } else {
>         getLogger().error("Received error {}: {}. Will not attempt to reconnect",
>                 new Object[]{event.getEventType(), event.getMessage()});
>     }
>     break;
> {code}
> *2. Parameterize maximum number of connection retries*
> I also noticed that the default number of retries in the Twitter HBC library 
> (5) is sometimes too low.
>  So it would be useful to add a GetTwitter processor property named `*Max 
> Connection Retries*`. In my usage I found that `*10*` is a good value.
> Then update the `*onScheduled()*` method with this line (replacing `*10*` 
> with the value of `*Max Connection Retries*`):
> {code:java}
> clientBuilder.retries(10); // default value is 5
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-5953) GetTwitter processor throws Enhance Your Calm exceptions then fails with Retries exhausted

2019-02-12 Thread Kourge (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765887#comment-16765887
 ] 

Kourge commented on NIFI-5953:
--

Hello,

Is anyone available to review this PR? https://github.com/apache/nifi/pull/3276






[jira] [Updated] (NIFI-5953) GetTwitter processor throws Enhance Your Calm exceptions then fails with Retries exhausted

2019-01-11 Thread Kourge (JIRA)


 [ 
https://issues.apache.org/jira/browse/NIFI-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kourge updated NIFI-5953:
-
Description: 
Hi,

I am using the GetTwitter processor with the Filter Endpoint.
 The issue is that I often get a series of `*Received error HTTP_ERROR: 
HTTP/1.1 420 Enhance Your Calm. Will attempt to reconnect*` exceptions.
 These are followed by one `*Received error STOPPED_BY_ERROR: Retries exhausted 
due to null. Will not attempt to reconnect*` exception, after which the 
processor no longer receives any tweets from the Twitter endpoint.

I am getting rate limited by the Twitter API. I am running a NiFi cluster, so I 
run the GetTwitter processor on the Primary Node only to avoid using the same 
credentials several times in parallel.

I tried to apply the configuration recommendation from this mailing list thread:
 
[https://lists.apache.org/thread.html/ed397f42a26760280363e9cc1f64f6654c635110005e24ab9486bf19@%3Cdev.nifi.apache.org%3E]

But raising the "run schedule" parameter to 60 seconds does not help in my case, 
since I aim to read between 100 and 200 tweets per minute. With "run 
schedule" set to 60 seconds, NiFi polls only 1 tweet per minute and cannot keep 
up with the Twitter API tweet queue.
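
The arithmetic behind this can be sketched in a few lines of Java. It assumes, as the paragraph above states, that each trigger yields at most one tweet; the class and method names are made up for this example.

```java
final class RunScheduleThroughput {
    // Upper bound on tweets per minute when each trigger yields at most one tweet.
    static double maxTweetsPerMinute(double runScheduleSeconds) {
        return 60.0 / runScheduleSeconds;
    }

    // Longest run schedule (in seconds) that still sustains the target rate.
    static double maxRunScheduleSeconds(double targetTweetsPerMinute) {
        return 60.0 / targetTweetsPerMinute;
    }

    public static void main(String[] args) {
        System.out.println(maxTweetsPerMinute(60));     // 1.0 tweet per minute
        System.out.println(maxRunScheduleSeconds(100)); // 0.6 seconds
        System.out.println(maxRunScheduleSeconds(200)); // 0.3 seconds
    }
}
```

Under that assumption, a target of 100 to 200 tweets per minute needs a run schedule between 0.3 and 0.6 seconds, far below the recommended 60 seconds.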

+*Proposed solution*+

I analyzed the `*GetTwitter.java*` implementation and noticed that the 
`*onTrigger()*` method reconnects (`*client.reconnect();*`) to the Twitter 
endpoint on `*HTTP_ERROR*`.
 The issue here is that `*HTTP/1.1 420 Enhance Your Calm*` messages are 
`*HTTP_ERROR*` events, but the Twitter HBC library client (com.twitter.hbc) 
already manages reconnection.
 The Twitter HBC library client retries on its own with an increasing wait 
delay, with 5 retries by default.

Moreover, it seems that `*client.reconnect();*` does not work in my case, and 
because that method is called too often, the processor gets kicked off the 
Twitter API even earlier.

My proposed solution is the following (tested in my local development 
environment):

*1. Let the Twitter HBC library client handle the connection retries on 
`HTTP/1.1 420 Enhance Your Calm` messages.*

The `*onTrigger()*` method should be updated to not try to reconnect in case of 
an `*HTTP_ERROR*` with a message equal to `*HTTP/1.1 420 Enhance Your Calm*`:
{code:java}
case HTTP_ERROR:
    // Skip the manual reconnect on 420 and let the HBC client retry on its own.
    if (!event.getMessage().equals("HTTP/1.1 420 Enhance Your Calm")) {
        getLogger().error("Received error {}: {}. Will attempt to reconnect",
                new Object[]{event.getEventType(), event.getMessage()});
        client.reconnect();
    } else {
        getLogger().error("Received error {}: {}. Will not attempt to reconnect",
                new Object[]{event.getEventType(), event.getMessage()});
    }
    break;
{code}
*2. Parameterize maximum number of connection retries*

I also noticed that the default number of retries in the Twitter HBC library 
(5) is sometimes too low.
 So it would be useful to add a GetTwitter processor property named `*Max 
Connection Retries*`. In my usage I found that `*10*` is a good value.

Then update the `*onScheduled()*` method with this line (replacing `*10*` with 
the value of `*Max Connection Retries*`):
{code:java}
clientBuilder.retries(10); // default value is 5
{code}


[jira] [Created] (NIFI-5953) GetTwitter processor throws Enhance Your Calm exceptions then fails with Retries exhausted

2019-01-11 Thread Kourge (JIRA)
Kourge created NIFI-5953:


 Summary: GetTwitter processor throws Enhance Your Calm exceptions 
then fails with Retries exhausted
 Key: NIFI-5953
 URL: https://issues.apache.org/jira/browse/NIFI-5953
 Project: Apache NiFi
  Issue Type: Bug
  Components: Extensions
Affects Versions: 1.7.0
Reporter: Kourge


Hi,

I am using the GetTwitter processor with the Filter Endpoint.
 The issue is that I often get a series of `*Received error HTTP_ERROR: 
HTTP/1.1 420 Enhance Your Calm. Will attempt to reconnect*` exceptions.
 These are followed by one `*Received error STOPPED_BY_ERROR: Retries exhausted 
due to null. Will not attempt to reconnect*` exception, after which the 
processor no longer receives any tweets from the Twitter endpoint.

I am getting rate limited by the Twitter API. I am running a NiFi cluster, so I 
run the GetTwitter processor on the Primary Node only to avoid using the same 
credentials several times in parallel.

I tried to apply the configuration recommendation from this mailing list thread:
 
[https://lists.apache.org/thread.html/ed397f42a26760280363e9cc1f64f6654c635110005e24ab9486bf19@%3Cdev.nifi.apache.org%3E]

But raising the "run schedule" parameter to 60 seconds does not help in my case, 
since I aim to read between 100 and 200 tweets per minute. With "run 
schedule" set to 60 seconds, NiFi polls only 1 tweet per minute and cannot keep 
up with the Twitter API tweet queue.

+*Proposed solution*+

I analyzed the `*GetTwitter.java*` implementation and noticed that the 
`*onTrigger()*` method reconnects (`*client.reconnect();*`) to the Twitter 
endpoint on `*HTTP_ERROR*`.
 The issue here is that `*HTTP/1.1 420 Enhance Your Calm*` messages are 
`*HTTP_ERROR*` events, but the Twitter HBC library client (com.twitter.hbc) 
already manages reconnection.
 The Twitter HBC library client retries on its own with an increasing wait 
delay, with 5 retries by default.

Moreover, it seems that `*client.reconnect();*` does not work in my case, and 
because that method is called too often, the processor gets kicked off the 
Twitter API even earlier.

My proposed solution is the following (tested in my local development 
environment):

*1. Let the Twitter HBC library client handle the connection retries on 
`HTTP/1.1 420 Enhance Your Calm` messages.*

The `*onTrigger()*` method should be updated to not try to reconnect in case of 
an `*HTTP_ERROR*` with a message equal to `*HTTP/1.1 420 Enhance Your Calm*`:
{code:java}
case HTTP_ERROR:
    // Skip the manual reconnect on 420 and let the HBC client retry on its own.
    if (!event.getMessage().equals("HTTP/1.1 420 Enhance Your Calm")) {
        getLogger().error("Received error {}: {}. Will attempt to reconnect",
                new Object[]{event.getEventType(), event.getMessage()});
        client.reconnect();
    } else {
        getLogger().error("Received error {}: {}. Will not attempt to reconnect",
                new Object[]{event.getEventType(), event.getMessage()});
    }
    break;
{code}
*2. Parameterize maximum number of connection retries*

I also noticed that the default number of retries in the Twitter HBC library 
(5) is sometimes too low.
 So it would be useful to add a GetTwitter processor property named `*Max 
Connection Retries*`. In my usage I found that `*10*` is a good value.

Then update the `*onScheduled()*` method with this line (replacing `*10*` with 
the value of `*Max Connection Retries*`):
{code:java}
clientBuilder.retries(10); // default value is 5
{code}
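
To give a feel for why a higher retry count can help, the sketch below computes the waits of a retry loop whose delay grows with each attempt. It assumes a simple doubling schedule with a cap; the initial delay of 1000 ms and the cap of 60000 ms are made-up values, and the real backoff policy of the HBC client may differ.

```java
import java.util.ArrayList;
import java.util.List;

final class RetryBackoffSketch {
    // Delays (ms) before each retry: doubled on every attempt, capped at maxDelayMs.
    static List<Long> retryDelays(int maxRetries, long initialDelayMs, long maxDelayMs) {
        List<Long> delays = new ArrayList<>();
        long delay = Math.min(initialDelayMs, maxDelayMs);
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            delays.add(delay);
            delay = Math.min(delay * 2, maxDelayMs);
        }
        return delays;
    }

    // Total time spent waiting across all retries.
    static long totalWaitMs(List<Long> delays) {
        long total = 0;
        for (long d : delays) {
            total += d;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Long> five = retryDelays(5, 1000, 60000);
        List<Long> ten = retryDelays(10, 1000, 60000);
        System.out.println(five + " total=" + totalWaitMs(five)); // total=31000
        System.out.println(ten + " total=" + totalWaitMs(ten));   // total=303000
    }
}
```

With these illustrative numbers, 5 retries give up after about half a minute while 10 retries keep trying for about five minutes, which leaves more time for a rate limit to clear before the client stops with `STOPPED_BY_ERROR`.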


