[
https://issues.apache.org/jira/browse/YARN-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009119#comment-17009119
]
Anand Srinivasan commented on YARN-10068:
-----------------------------------------
Hi Adam Antal,
Thanks for reviewing the patch.
Regarding your comments:
1. Can we make this ERROR level, since it's causing serious issues?
I kept it at WARN level because, in this case, the HTTP response itself has
already been processed successfully, so TimelineV2ClientImpl#putObjects only
logs a message when ClientResponse#close fails.
If you think ERROR level is more appropriate even in that case, I can change
the level accordingly.
2. We will override the msg {{String}} in the finally part.
The msg is not overridden: the finally part appends the existing msg to the
end of the new msg string.
} finally {
  msg = "Response from the timeline server is not successful"
      + ", HTTP error code: " + resp.getStatus()
      + ", "
      + msg;   // <==== existing msg is appended here
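To make point 2 concrete, here is a minimal standalone sketch of that concatenation. The class name MsgAppendSketch, the method buildMessage, and the sample values are hypothetical and only for illustration; just the string expression mirrors the snippet above.

```java
// Hypothetical standalone sketch; not the actual TimelineV2ClientImpl code.
final class MsgAppendSketch {

  // Mirrors the concatenation in the finally part: the previous msg is
  // appended after the HTTP status, so its content is preserved.
  static String buildMessage(int status, String msg) {
    return "Response from the timeline server is not successful"
        + ", HTTP error code: " + status
        + ", "
        + msg;
  }

  public static void main(String[] args) {
    // Sample values are made up for illustration.
    String msg = buildMessage(500, "entity could not be read");
    System.out.println(msg);
  }
}
```

The earlier text ends up as the tail of the new string, so nothing that was stored in msg before the finally part is lost.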
3. I suggest to add a {{Throwable}} case
Good point. I added Throwable to the list of exceptions.
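As a rough sketch of how points 1 and 3 fit together, assuming a generic AutoCloseable as a stand-in for Jersey's ClientResponse and java.util.logging as a stand-in for the logger used in Hadoop (the class and method names below are hypothetical and not taken from the patch):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; the real patch works on Jersey's ClientResponse and
// Hadoop's own logger. AutoCloseable and java.util.logging are stand-ins.
final class CloseResponseSketch {
  private static final Logger LOG =
      Logger.getLogger(CloseResponseSketch.class.getName());

  // Closes the response and reports whether the close succeeded. Any
  // Throwable from close() is logged at WARNING level and not rethrown,
  // because the HTTP response has already been processed at this point.
  static boolean closeQuietly(AutoCloseable resp) {
    if (resp == null) {
      return true;
    }
    try {
      resp.close();
      return true;
    } catch (Throwable t) {
      LOG.log(Level.WARNING, "Error closing the HTTP response", t);
      return false;
    }
  }

  public static void main(String[] args) {
    // A response whose close() fails; the caller still continues normally.
    AutoCloseable failing = () -> {
      throw new IllegalStateException("simulated close failure");
    };
    System.out.println(closeQuietly(failing));
    System.out.println(closeQuietly(() -> { }));
  }
}
```

Calling such a helper on both the success and the error path is what releases the underlying connection; whether the log call should be WARN or ERROR is the open question from point 1.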
Thanks and kind regards.
> TimelineV2Client may leak file descriptors creating ClientResponse objects.
> ---------------------------------------------------------------------------
>
> Key: YARN-10068
> URL: https://issues.apache.org/jira/browse/YARN-10068
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: ATSv2
> Affects Versions: 3.0.0
> Environment: HDP VERSION 3.1.4
> AMBARI VERSION 2.7.4.0
> Reporter: Anand Srinivasan
> Assignee: Anand Srinivasan
> Priority: Critical
> Attachments: YARN-10068.001.patch, YARN-10068.002.patch,
> YARN-10068.003.patch, image-2020-01-02-14-58-12-773.png
>
>
> Hi team,
> A code walkthrough comparing v1 and v2 of the TimelineClient API revealed
> that the v2 method TimelineV2ClientImpl#putObjects doesn't close
> ClientResponse objects when the Timeline Server returns a success status.
> ClientResponse is closed only on an error response from the server, via
> ClientResponse#getEntity.
> We also noticed that TimelineClient (v1) closes the ClientResponse object in
> TimelineWriter#putEntities by calling ClientResponse#getEntity in both
> success and error conditions from the server thereby avoiding this file
> descriptor leak.
> The customer's original issue and symptom was that the NodeManager went down
> because of a 'Too many open files' condition, with many sockets in the
> CLOSE_WAIT state observed between the timeline client (from the NM) and the
> timeline server hosts.
> Could you please help resolve this issue? Thanks.
>