[GitHub] [hbase] ndimiduk commented on issue #754: HBASE-22978 : Online slow response log

2020-02-13 Thread GitBox
ndimiduk commented on issue #754: HBASE-22978 : Online slow response log
URL: https://github.com/apache/hbase/pull/754#issuecomment-585986284
 
 
   > I have tried to cover these points in my recent commit except for `need 
separation of concerns between SlowLog.toString and formatting of output in the 
shell`. The reason being shell is not able to use Gson as efficiently as 
keeping `Gson.toJson()` in toString of SlowLogRecord because we have some 
serialization filters (neither does shell have any good pretty print library 
that works with our POJO - AFAIK). shell or any client would receive 
`List<SlowLogRecord>` as output of `Admin.getSlowLogResponses()`. In a way, 
keeping `Gson.toJson()` within `toString()` would ensure all clients can print 
output in pretty print format. Do you think this is good to maintain uniform 
print format across all clients?
   
   No, I do not think it's good to maintain a uniform print format across all 
clients. Anything that provides display functionality for an object will need 
control over how the content is formatted.
   
   I maintain that pretty-printed JSON does not belong in the object's 
`toString` method. The overwhelmingly common use case for `toString` is the 
logging infrastructure, which has different requirements from the shell's user 
interface. Please just use the Apache Commons Lang `ToStringBuilder` (I prefer 
its `SHORT_PREFIX_STYLE`) for that purpose.
   
   If you would like the object to control its own formatting as a convenience 
to a variety of clients (something I find dubious, but why not), I have no 
problem with putting this concern behind a separate method in `SlowLogRecord`, 
such as `toFormattedJson`. My main objection to this approach is that it 
introduces a dependency into `SlowLogRecord`, which is otherwise a simple POJO, 
on an unrelated provider of JSON representation.
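   For concreteness, a dependency-free sketch of that separation (the field 
names and exact output are invented for illustration; the real `SlowLogRecord` 
differs, and its JSON method might delegate to Gson rather than hand-roll the 
output):

```java
// Sketch only: a hypothetical SlowLogRecord-like POJO illustrating the
// separation of a compact toString from opt-in pretty formatting.
class SlowLogRecordSketch {
    private final String clientAddress;
    private final long processingTimeMs;

    SlowLogRecordSketch(String clientAddress, long processingTimeMs) {
        this.clientAddress = clientAddress;
        this.processingTimeMs = processingTimeMs;
    }

    // Compact, log-friendly representation, in the spirit of what
    // ToStringBuilder's SHORT_PREFIX_STYLE produces.
    @Override
    public String toString() {
        return "SlowLogRecordSketch[clientAddress=" + clientAddress
                + ",processingTimeMs=" + processingTimeMs + "]";
    }

    // Display formatting kept behind a separate method, so callers that
    // want pretty output opt in explicitly; the logging path never pays
    // for it and never depends on a JSON library.
    public String toFormattedJson() {
        return "{\n"
                + "  \"clientAddress\": \"" + clientAddress + "\",\n"
                + "  \"processingTimeMs\": " + processingTimeMs + "\n"
                + "}";
    }
}
```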
   
   Also, you may not have noticed that the shell has its own class for 
handling display, called `Formatter`, under 
`hbase-shell/src/main/ruby/shell/formatter.rb`. I haven't used it myself, but 
it seems purpose-built to handle the use case you have.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hbase] ndimiduk commented on issue #754: HBASE-22978 : Online slow response log

2020-02-12 Thread GitBox
ndimiduk commented on issue #754: HBASE-22978 : Online slow response log
URL: https://github.com/apache/hbase/pull/754#issuecomment-585382957
 
 
   Thanks for addressing all my concerns so far @virajjasani . I've tried to 
resolve all the inline discussions. Here's a summary of what I think remains 
outstanding.
   
   - log large responses (follow-on task)?
   - don’t assume “tooLarge” and “tooSlow” are mutually exclusive.
   - replicate log entries into a file on HDFS or a new system table (follow-on 
task)?
   - expose slow/large responses as a metric (follow-on task)?
   - do we need to ship with thrift api parity?
   - push filter and limit logic into the server, so all clients have the same experience
     - I think this requires adding a `TooSlowRequest` protobuf message with filter and limit parameters
     - Maybe then `SlowLogPayload` is renamed to `SlowLogRecord` or something like that
   - need separation of concerns between `SlowLog.toString` and formatting of output in the shell
   - avoid adding to the surface area of `HConstants`
   - proper interrupt handling within `SlowLogEventHandler`
   - update the configuration adoc to reference the feature adoc
   - test cleanup
     - avoid sleeping for fixed time units; instead use `waitFor`
     - maybe combine some methods containing repeated logic; introduce a helper method or two
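   On the `waitFor` point: the idea is to poll a condition with a bounded 
timeout rather than sleep for a fixed duration. HBase's test utilities provide 
this via `Waiter`; the following is only a standalone sketch of the pattern 
(names are illustrative, not the actual HBase API), which also happens to show 
the kind of interrupt handling mentioned above:

```java
import java.util.function.BooleanSupplier;

// Sketch of the waitFor pattern: poll a condition up to a deadline instead
// of sleeping for a fixed time, so tests finish as soon as the condition
// holds and fail promptly when it never does.
class WaitForSketch {

    /** Returns true if the condition became true within timeoutMs. */
    static boolean waitFor(long timeoutMs, long pollIntervalMs,
            BooleanSupplier condition) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException e) {
                // Restore the interrupt status rather than swallow it.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```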




[GitHub] [hbase] ndimiduk commented on issue #754: HBASE-22978 : Online slow response log

2020-02-12 Thread GitBox
ndimiduk commented on issue #754: HBASE-22978 : Online slow response log
URL: https://github.com/apache/hbase/pull/754#issuecomment-585377300
 
 
   > I think you misunderstood me here...
   > 
   > "With the patch we would run at RS entrance only and for any slow running 
query it will go to this ring buffer running at RS. Just that this patch 
doesn't include storing slow logs in HDFS, which is also a good idea and 
probably up for a subtask given the priority is storing slow logs in ring 
buffer first. Storing them in HDFS would be just an option as part of another 
Jira(sub-task)."
   > 
   > What I was saying is that your ringbuffer only keeps dfsclient ops. I was 
wondering if we could put actual requests into a system like this. If they are 
taking a long time, we could inspect the ringbuffer to see which type of 
queries are taking a while. It would be coarser grained than the dfs accesses 
only. Was just wondering if you think we could do this in a follow-on based on 
work here.
   
   You sure about that @saintstack ? My read here is that we're logging the 
HBase client RPCs, sitting at the "front door" inasmuch as it's being done in 
`RpcServer#call`. I don't see HDFS client ops involved here.




[GitHub] [hbase] ndimiduk commented on issue #754: HBASE-22978 : Online slow response log

2020-02-12 Thread GitBox
ndimiduk commented on issue #754: HBASE-22978 : Online slow response log
URL: https://github.com/apache/hbase/pull/754#issuecomment-585372373
 
 
   > 2. IMHO, if we are going to achieve significant resource savings with a 
TTL on ring buffer entries, we can have a sub-task under this Jira, but do we 
really need it? I feel it's fine to let them stay in the buffer. Anyway, the 
size of the buffer, and whether to even use the ring buffer, is up to the user. 
However, if required, I am fine with a sub-task.
   
   It is a fixed size of memory, and a relatively small amount, so I guess it's 
not a big deal. I just thought it would be good to free memory when we can. I 
see why this would be a bit of a hassle -- you'd need a cleaner chore and the 
Disruptor API is probably not designed for ad-hoc removal. I guess just omit it 
for now.
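   For reference, the reason the footprint stays fixed: a bounded ring buffer 
overwrites its oldest slot once it is full, so it never grows past its 
capacity. A much-simplified stand-in for the Disruptor-backed buffer in the 
patch (names invented for illustration):

```java
// Sketch of a fixed-capacity ring buffer: publishing past capacity
// overwrites the oldest entry, so memory use is bounded by construction
// and no cleaner chore is needed.
class SlowLogRingBufferSketch {
    private final String[] slots;
    private long nextSeq; // total records ever published

    SlowLogRingBufferSketch(int capacity) {
        this.slots = new String[capacity];
    }

    void publish(String record) {
        slots[(int) (nextSeq % slots.length)] = record;
        nextSeq++;
    }

    /** Number of records currently retained (at most capacity). */
    int size() {
        return (int) Math.min(nextSeq, slots.length);
    }
}
```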




[GitHub] [hbase] ndimiduk commented on issue #754: HBASE-22978 : Online slow response log

2020-02-10 Thread GitBox
ndimiduk commented on issue #754: HBASE-22978 : Online slow response log
URL: https://github.com/apache/hbase/pull/754#issuecomment-584229446
 
 
   >> 1. Since the detailed info is going through this ring buffer subsystem, 
why do we continue to log at all? Why not replace the logging with a metric?
   > 1. The ring buffer is not a replacement for the slow logs we write to 
file; it's a complete representation of the slow logs, whereas the log files 
may contain trimmed entries. We also plan to write all slow logs to HDFS 
(optionally) and provide the HDFS file path to the user; at that point we can 
think about removing the file logging. However, I am not sure about the 
metric. Which metric are you suggesting we expose? You mean slow logs with 
some parameters?
   
   I'm suggesting we do away with file logging and instead expose the presence 
of slow activity as a (new?) metric.
   
   By the time this patch lands, we'll have the logged information in two 
places, neither of which, on its own, allows an operator to easily see how 
prevalent the problem is. I'm suggesting that we could instead provide a metric 
that says "this region server is experiencing N slow requests per time interval 
T", as well as providing detailed incident fidelity via this new mechanism.
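   A sketch of what such a metric could look like, as a sliding-window counter 
(purely illustrative; a real implementation would hook into the existing 
metrics system, and all names here are invented):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a "N slow requests per time interval T" metric: record the
// timestamp of each slow request, evict timestamps older than the window,
// and report the number remaining.
class SlowRequestRateSketch {
    private final long windowMs;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlowRequestRateSketch(long windowMs) {
        this.windowMs = windowMs;
    }

    synchronized void recordSlowRequest(long nowMs) {
        timestamps.addLast(nowMs);
        evictOld(nowMs);
    }

    /** Slow requests observed within the last windowMs milliseconds. */
    synchronized int countInWindow(long nowMs) {
        evictOld(nowMs);
        return timestamps.size();
    }

    private void evictOld(long nowMs) {
        while (!timestamps.isEmpty()
                && timestamps.peekFirst() < nowMs - windowMs) {
            timestamps.removeFirst();
        }
    }
}
```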




[GitHub] [hbase] ndimiduk commented on issue #754: HBASE-22978 : Online slow response log

2020-01-27 Thread GitBox
ndimiduk commented on issue #754: HBASE-22978 : Online slow response log
URL: https://github.com/apache/hbase/pull/754#issuecomment-579042263
 
 
   I sent a note to dev@ giving a status update. If you don’t find there what
   you need, or have some comments, please reply on that thread.
   
   On Mon, Jan 27, 2020 at 16:42 Viraj Jasani  wrote:
   
   > Thanks for the review @ndimiduk 
   > I will try to be on top of this. Could you please let me know if you have
   > any estimate of 2.3.0 release? May be by end of next week?
   >
   




[GitHub] [hbase] ndimiduk commented on issue #754: HBASE-22978 : Online slow response log

2020-01-27 Thread GitBox
ndimiduk commented on issue #754: HBASE-22978 : Online slow response log
URL: https://github.com/apache/hbase/pull/754#issuecomment-578952421
 
 
   A couple high-level questions:
   
   1. Since the detailed info is going through this ring buffer subsystem, why 
do we continue to log at all? Why not replace the logging with a metric?
   2. Should entries of the ring buffer expire after a period of time?

