[ 
https://issues.apache.org/jira/browse/YARN-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967126#comment-15967126
 ] 

Haibo Chen edited comment on YARN-5269 at 4/13/17 4:46 AM:
-----------------------------------------------------------

bq.  let's restrict the focus for this jira to showing any exception back to 
the client as and when we determine that a problem occurred writing to HBase
One the client side, any error in the response will be wrapped in an exception 
(after retries) and thrown correctly for TImelineV2Client.putEntities() calls. 
On the server side, we wrap timelinecollector.putEntities() and 
timelinecollector.putEntitiesAsyn() calls with try-catch clause in 
TimelineCollectorWebService, so any problem writing to HBase, that 
TimelineCollector indicates in the form of exceptions thrown by 
putEntities()/putEntitiesAsync(), will be returned to client as 500 error 
response. 

What we are missing though is checking TimelineWriteResponse returned by 
TimelineCollector.putEntities(). Per discussion above with Varun and its 
javadoc, TimelineWriteResponse allows details indications of errors and 
individual entities that they are associated with. To comply with current 
TimelineCollector API, it seems that we should inspect TimelineWriteResponse 
and return all found errors to clients.  

However, TimelineWriteResponse is hardly useful given our current 
implementation for two reasons: 

1) TimelineWriteResponse returned by TimelineCollector.putEntities() is what is 
returned by TimelineWriter.write().  HBaseTImelineWriter.write() is essentially 
an async operation, the returned TimelineResponse never contains any error. 
Problems are solely indicated as IOException
{code}
public class HBaseTimelineWriterImpl extends AbstractService implements
    TimelineWriter {
 public TimelineWriteResponse write(String clusterId, String userId,
      String flowName, String flowVersion, long flowRunId, String appId,
      TimelineEntities data) throws IOException {

    TimelineWriteResponse putStatus = new TimelineWriteResponse();
    // defensive coding to avoid NPE during row key construction
    if ((flowName == null) || (appId == null) || (clusterId == null)
        || (userId == null)) {
      LOG.warn("Found null for one of: flowName=" + flowName + " appId=" + appId
          + " userId=" + userId + " clusterId=" + clusterId
          + " . Not proceeding with writing to hbase");
      return putStatus;
    }
    ...
    return putStatus;
}
{code}
In cases where flowName/appId/userId/clusterId/ is null, for example, no error 
is included in the response.

2) We can no longer just return 500 with an exception message to clients. More 
involved structure is needed to return errors from TimelineCollector to 
TimelineV2Client, and callers of timelinev2Client.putEntity(TimelineEntity... 
entities) will now be expected to handle all cases of errors unless we change 
timelinev2Client.putEntity() to only accept one entity per call

It is OK for now to not handle TimelineWriteResponse given that 
HBaseTimelineWriter always return a dummy one and it is the only backend 
implementation. Thus, I am removing the yarn5355-merge-blocker label. 

But once we expect alternative backend/TimelineWriter implementations, we need 
to either add support in TimelineCollectorWebService and TimelineV2Client to 
handle detailed error response, or remove TimelineWriteResponse from 
TimelineWriter.write() signature completely if we cannot really make use of it 
like we desire to.


was (Author: haibochen):
bq.  let's restrict the focus for this jira to showing any exception back to 
the client as and when we determine that a problem occurred writing to HBase
One the client side, any error in the response will be wrapped in an exception 
(after retries) and thrown correctly for TImelineV2Client.putEntities() calls. 
On the server side, we wrap timelinecollector.putEntities() and 
timelinecollector.putEntitiesAsyn() calls with try-catch clause in 
TimelineCollectorWebService, so any problem writing to HBase, that 
TimelineCollector indicates in the form of exceptions thrown by 
putEntities()/putEntitiesAsync(), will be returned to client as 500 error 
response. 

What we are missing though is checking TimelineWriteResponse returned by 
TimelineCollector.putEntities(). Per discussion above with Varun and its 
javadoc, TimelineWriteResponse allows details indications of errors and 
individual entities that they are associated with. To comply with current 
TimelineCollector API, it seems that we should inspect TimelineWriteResponse 
and return all found errors to clients.  

However, TimelineWriteResponse is hardly useful given our current 
implementation for two reasons: 
1) TimelineWriteResponse returned by TimelineCollector.putEntities() is what is 
returned by TimelineWriter.write().  HBaseTImelineWriter.write() is essentially 
an async operation, the returned TimelineResponse never contains any error. 
Problems are solely indicated as IOException
{code}
public class HBaseTimelineWriterImpl extends AbstractService implements
    TimelineWriter {
 public TimelineWriteResponse write(String clusterId, String userId,
      String flowName, String flowVersion, long flowRunId, String appId,
      TimelineEntities data) throws IOException {

    TimelineWriteResponse putStatus = new TimelineWriteResponse();
    // defensive coding to avoid NPE during row key construction
    if ((flowName == null) || (appId == null) || (clusterId == null)
        || (userId == null)) {
      LOG.warn("Found null for one of: flowName=" + flowName + " appId=" + appId
          + " userId=" + userId + " clusterId=" + clusterId
          + " . Not proceeding with writing to hbase");
      return putStatus;
    }
    ...
    return putStatus;
}
{code}
In cases where flowName/appId/userId/clusterId/ is null, for example, no error 
is included in the response.
2) We can no longer just return 500 with an exception message to clients. More 
involved structure is needed to return errors from TimelineCollector to 
TimelineV2Client, and callers of timelinev2Client.putEntity(TimelineEntity... 
entities) will now be expected to handle all cases of errors unless we change 
timelinev2Client.putEntity() to only accept one entity per call

It is OK for now to not handle TimelineWriteResponse given that 
HBaseTimelineWriter always return a dummy one and it is the only backend 
implementation. Thus, I am removing the yarn5355-merge-blocker label. 
But once we expect alternative backend/TimelineWriter implementations, we need 
to either add support in TimelineCollectorWebService and TimelineV2Client to 
handle detailed error response, or remove TimelineWriteResponse from 
TimelineWriter.write() signature completely if we cannot really make use of it 
like we desire to.

> Bubble exceptions and errors all the way up the calls, including to clients.
> ----------------------------------------------------------------------------
>
>                 Key: YARN-5269
>                 URL: https://issues.apache.org/jira/browse/YARN-5269
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Joep Rottinghuis
>            Assignee: Haibo Chen
>              Labels: YARN-5355
>
> Currently we ignore (swallow) exception from the HBase side in many cases 
> (reads and writes).
> Also, on the client side, neither TimelineClient#putEntities (the v2 flavor) 
> nor the #putEntitiesAsync method return any value.
> For the second drop we may want to consider how we properly bubble up 
> exceptions throughout the write and reader call paths and if we want to 
> return a response in putEntities and some future kind of result for 
> putEntitiesAsync.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to