[jira] [Commented] (HDDS-2175) Propagate System Exceptions from the OzoneManager

2019-10-01 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942183#comment-16942183
 ] 

Arpit Agarwal commented on HDDS-2175:
-

Thank you for the link to the paper. It looks like a great weekend read.

This quote from chapter 1 stands out:
bq. While it is widely accepted that exception handling has a number of 
problems, it is the best we currently have available[38, 72].

> Propagate System Exceptions from the OzoneManager
> -
>
> Key: HDDS-2175
> URL: https://issues.apache.org/jira/browse/HDDS-2175
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: Ozone Manager
> Reporter: Supratim Deka
> Assignee: Supratim Deka
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Exceptions encountered while processing requests on the OM are categorized as 
> business exceptions and system exceptions. All of the business exceptions are 
> captured as OMException and have an associated status code which is returned 
> to the client. The handling of these is not going to be changed.
> Currently, system exceptions are returned as INTERNAL ERROR to the client with 
> a one-line message string from the exception. The scope of this jira is to 
> capture system exceptions and propagate the related information (including the 
> complete stack trace) back to the client.
> There are 3 sub-tasks required to achieve this:
> 1. Separate capture and handling for OMException and the other 
> exceptions (IOException). For system exceptions, use the Hadoop IPC 
> ServiceException mechanism to send the stack trace to the client.
> 2. Track exceptions inside the Ratis OzoneManagerStateMachine and 
> propagate them up to the OzoneManager layer (on the leader). Currently, these 
> exceptions are not being tracked.
> 3. Handle and propagate exceptions from Ratis.
> Will raise a jira for each sub-task.
>   
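
The split described in sub-task 1 can be sketched in plain Java. This is only an illustration under stated assumptions: BusinessException, Response, Request and RequestDispatcher are hypothetical stand-ins invented here, not the real OMException or the Hadoop IPC ServiceException mechanism, and the stack trace is simply carried in the message string.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical stand-in for a business exception that carries a status code.
class BusinessException extends IOException {
  final String status;

  BusinessException(String status, String message) {
    super(message);
    this.status = status;
  }
}

// Hypothetical response returned to the client.
class Response {
  final String status;
  final String message;

  Response(String status, String message) {
    this.status = status;
    this.message = message;
  }
}

// A request body that may fail with either kind of exception.
interface Request {
  String run() throws IOException;
}

class RequestDispatcher {
  static Response handle(Request request) {
    try {
      return new Response("OK", request.run());
    } catch (BusinessException e) {
      // Business exceptions keep their status code and one-line message.
      return new Response(e.status, e.getMessage());
    } catch (IOException e) {
      // System exceptions: return the complete stack trace, not one line.
      StringWriter trace = new StringWriter();
      e.printStackTrace(new PrintWriter(trace, true));
      return new Response("INTERNAL_ERROR", trace.toString());
    }
  }

  public static void main(String[] args) {
    Response r = handle(() -> { throw new IOException("disk failure"); });
    System.out.println(r.status);
    System.out.println(r.message);
  }
}
```

The subclass is caught first, so business exceptions never reach the system-exception branch.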



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2175) Propagate System Exceptions from the OzoneManager

2019-10-01 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942165#comment-16942165
 ] 

Arpit Agarwal commented on HDDS-2175:
-

C++ exceptions are [widely considered 
broken|http://yosefk.com/c++fqa/defective.html#defect-10] so we can't directly 
compare C++ best practices with Java. Golang not having exceptions is a step 
backwards for debuggability. Perhaps it works well for Google, but for mere 
mortals like me exceptions are a boon. :) They are especially valuable in this 
phase of Ozone where we are stabilizing it.

bq. But as I said; I think the disagreement is a question of taste; so I do not 
want perfect to be the enemy of good
Thanks for giving the option to go ahead. One thing we can do is make this 
behavior configurable. In the future we can turn it off entirely if it turns 
out not to be useful.
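
A minimal sketch of what "configurable" could look like, assuming a plain java.util.Properties configuration. The key name and the ErrorMessageBuilder class are hypothetical, invented for this example; they are not real Ozone settings or classes. The default of "true" is also an assumption, chosen so that the behavior can later be turned off.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Properties;

class ErrorMessageBuilder {
  // Hypothetical key, invented for this sketch; not a real Ozone setting.
  static final String INCLUDE_TRACE_KEY = "example.om.error.include-stack-trace";

  private final boolean includeTrace;

  ErrorMessageBuilder(Properties conf) {
    // Assumed default: traces are included unless explicitly disabled.
    this.includeTrace =
        Boolean.parseBoolean(conf.getProperty(INCLUDE_TRACE_KEY, "true"));
  }

  // Builds the error text that would be returned to the client.
  String build(Throwable t) {
    if (!includeTrace) {
      return String.valueOf(t.getMessage());
    }
    StringWriter out = new StringWriter();
    t.printStackTrace(new PrintWriter(out, true));
    return out.toString();
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(INCLUDE_TRACE_KEY, "false");
    System.out.println(new ErrorMessageBuilder(conf).build(new IOException("boom")));
  }
}
```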



[jira] [Commented] (HDDS-2175) Propagate System Exceptions from the OzoneManager

2019-09-30 Thread Anu Engineer (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941476#comment-16941476
 ] 

Anu Engineer commented on HDDS-2175:


It is something that I disagree with. But if you feel strongly about this, 
please go ahead.



[jira] [Commented] (HDDS-2175) Propagate System Exceptions from the OzoneManager

2019-09-30 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941456#comment-16941456
 ] 

Arpit Agarwal commented on HDDS-2175:
-

bq. it is hard to parse these exceptions even when they are part of normal log 
files.
And yet these exceptions are a godsend. I would rather see one exception than 
10 obscure log messages since it tells me exactly when something 'exceptional' 
happened and the code path leading to the occurrence.

bq. If we add exceptions to those strings, the human readability of those error 
messages goes down.
The readability goes up. You now get a sense of what actually went wrong 
instead of some generic message.

bq. I had a chat with Supratim Deka and I said that I am all for increasing the 
fidelity of the error codes, that is we can add more error codes if we want to 
fine tune these messages. 
That is a lot more work with inferior results. Error codes are terrible in 
layered systems [since multiple layers will often wind up translating 
codes|https://twitter.com/Obdurodon/status/1161700056740876289]. The only way 
to maintain full fidelity is to add a new error code for every single failure 
path, an impossible task. Instead, just present the original exception as it 
happened. This is friendlier for your end users and painless for developers.
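
The fidelity argument can be illustrated with a small Java sketch. All names here (LayeredErrors, StorageCode, ServiceCode, writeBlock, putKey) are hypothetical and exist only for this example. With codes, two different storage failures collapse into one service-level code; with a wrapped exception, the original failure stays reachable through the cause chain.

```java
import java.io.IOException;

class LayeredErrors {
  // Hypothetical codes for a lower (storage) and an upper (service) layer.
  enum StorageCode { OK, DISK_FULL, CHECKSUM_MISMATCH }
  enum ServiceCode { OK, INTERNAL_ERROR }

  // Error-code style: the upper layer has to translate every lower-layer code.
  static ServiceCode translate(StorageCode code) {
    return code == StorageCode.OK ? ServiceCode.OK : ServiceCode.INTERNAL_ERROR;
  }

  // Exception style: the lower layer fails with a specific message.
  static void writeBlock() throws IOException {
    throw new IOException("checksum mismatch in block 7");
  }

  // The upper layer wraps the failure and keeps the original as the cause.
  static void putKey() throws IOException {
    try {
      writeBlock();
    } catch (IOException e) {
      throw new IOException("putKey failed", e);
    }
  }

  // Walks the cause chain down to the original failure.
  static Throwable rootCause(Throwable t) {
    while (t.getCause() != null) {
      t = t.getCause();
    }
    return t;
  }

  public static void main(String[] args) {
    System.out.println(translate(StorageCode.DISK_FULL));
    System.out.println(translate(StorageCode.CHECKSUM_MISMATCH));
    try {
      putKey();
    } catch (IOException e) {
      System.out.println(rootCause(e).getMessage());
    }
  }
}
```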

bq. I prefer a clear, simple contract between the server and client, I think it 
makes it easier for future clients to be developed more easily.
Exceptions as added here will make development of future clients super easy. 
Since the exception is stringified and propagated over the wire, all the client 
has to do is print the string without any interpretation. The fears seem 
unfounded to me.



[jira] [Commented] (HDDS-2175) Propagate System Exceptions from the OzoneManager

2019-09-30 Thread Anu Engineer (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941392#comment-16941392
 ] 

Anu Engineer commented on HDDS-2175:


bq. I feel that call stacks are invaluable when included in the bug report to 
the developer.

I completely agree. As I mentioned in my comment on GitHub, they are very 
useful tools for debugging. But we have to weigh the pros and cons of the 
approach. Here are some of the downsides:

1. Code and Style Consistency - Generally, errors are propagated via error code 
and message (Golang, C, etc.) or exceptions (Java, C++, etc.). When we developed 
this interface, we chose to go with the error code and message approach instead 
of exceptions. Mixing these different approaches creates very inconsistent code 
flows.

2. Prevent Java server abstractions from leaking to client side - Java 
exceptions are very Java-specific; it is hard to parse these exceptions even 
when they are part of normal log files. It is difficult to read through a 
printed stack to even understand the issue. This gets compounded when exceptions 
stack. When we were writing this client interface, we wanted to make sure it is 
easy to write clients in other languages. A simple error code and a message is 
universal: all languages understand it, and it is easy to write clients in other 
languages that can speak this protocol.

3. The current code experience - There are several parts of this code where 
the clients print out these messages to the users. If we add exceptions to 
those strings, the human readability of those error messages goes down.

4. If we want to move to exceptions instead of error codes, it is possible 
(even though I think our future clients will suffer), but we need to move away 
from the error/message model. That is a lot of work with very little benefit, 
other than the fact that we will have a consistent experience and exceptions 
will flow to the client side.

I had a chat with [~sdeka] and I said that I am all for increasing the fidelity 
of the error codes; that is, we can add more error codes if we want to fine-tune 
these messages. I am also all for logging more on the server side. So I am not 
against the patch, I just want to avoid *server side Java exceptions crossing 
over to the client side*. I prefer a clear, simple contract between the server 
and client; I think it makes it easier to develop future clients.
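
For comparison, the error code and message contract with server-side logging might look roughly like the sketch below. The CodeAndMessage class, its Code values and the tab-separated wire format are all hypothetical, made up for this illustration; java.util.logging stands in for whatever logger the server really uses.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class CodeAndMessage {
  // Hypothetical error codes; a real protocol would define its own set.
  enum Code { OK, KEY_NOT_FOUND, INTERNAL_ERROR }

  static final Logger LOG = Logger.getLogger("example.server");

  final Code code;
  final String message;

  CodeAndMessage(Code code, String message) {
    this.code = code;
    this.message = message;
  }

  // The full exception, stack trace included, is logged on the server only.
  static CodeAndMessage fromFailure(Code code, Exception e) {
    LOG.log(Level.SEVERE, "request failed", e);
    return new CodeAndMessage(code, e.getMessage());
  }

  // A plain-text encoding that a client in any language can split on the tab.
  String toWire() {
    return code.name() + "\t" + message;
  }

  public static void main(String[] args) {
    Exception failure = new IllegalStateException("db closed");
    System.out.println(fromFailure(Code.INTERNAL_ERROR, failure).toWire());
  }
}
```

The client sees only the code and the one-line message; anyone who needs the call stack has to look in the server log.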



[jira] [Commented] (HDDS-2175) Propagate System Exceptions from the OzoneManager

2019-09-28 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939880#comment-16939880
 ] 

Arpit Agarwal commented on HDDS-2175:
-

I feel that call stacks are invaluable when included in the bug report to the 
developer.



[jira] [Commented] (HDDS-2175) Propagate System Exceptions from the OzoneManager

2019-09-27 Thread Supratim Deka (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939841#comment-16939841
 ] 

Supratim Deka commented on HDDS-2175:
-

Note from [~aengineer] posted on the github PR:

Also are these call stacks something that the end user should ever see? As a 
user, I have always found a call stack useless. It might be useful for the 
developer for debugging purposes, but clients are generally things used by real 
users. Maybe if these stacks are not logged in the ozone.log, we can log them, 
provided we guard that with a config key and do not do it by default.

