[jira] [Commented] (HBASE-11747) ClusterStatus is too bulky
[ https://issues.apache.org/jira/browse/HBASE-11747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705451#comment-14705451 ]

Andrew Purtell commented on HBASE-11747:

bq. Do we want to just bump CodedInputStream#limit to higher numbers and see if that addresses the problem at hand

I did this on HBASE-13825.

ClusterStatus is too bulky
--------------------------
Key: HBASE-11747
URL: https://issues.apache.org/jira/browse/HBASE-11747
Project: HBase
Issue Type: Sub-task
Reporter: Virag Kothari
Attachments: exceptiontrace

Following exception on 0.98 with 1M regions on cluster with 160 region servers
{code}
Caused by: java.io.IOException: Call to regionserverhost:port failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
    at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1482)
    at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1454)
    at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654)
    at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712)
    at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.getClusterStatus(MasterProtos.java:42555)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$5.getClusterStatus(HConnectionManager.java:2132)
    at org.apache.hadoop.hbase.client.HBaseAdmin$16.call(HBaseAdmin.java:2166)
    at org.apache.hadoop.hbase.client.HBaseAdmin$16.call(HBaseAdmin.java:2162)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
    ... 43 more
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
    at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11747) ClusterStatus is too bulky
[ https://issues.apache.org/jira/browse/HBASE-11747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610932#comment-14610932 ]

Thiruvel Thirumoolan commented on HBASE-11747:

We are also exploring the option of compressing the status and sending from the server.
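To get a rough feel for why compressing the status might pay off, here is a minimal stdlib-only sketch. It is not HBase code: the class `StatusCompressionSketch`, its methods, and the region-name-like strings are all made up for illustration, and real ClusterStatus payloads are protobuf-encoded and may compress differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical illustration only; not an HBase class. */
final class StatusCompressionSketch {

    /** Builds a synthetic payload of region-name-like strings (made-up format). */
    static byte[] syntheticRegionNames(int regions) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < regions; i++) {
            sb.append("usertable,row-").append(String.format("%08d", i))
              .append(",1400000000000.").append(Integer.toHexString(i * 31 + 7)).append(".\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Compresses the payload with gzip, as a server might before sending. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Decompresses the payload, as a client would on receipt. */
    static byte[] gunzip(byte[] packed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = syntheticRegionNames(100_000);
        byte[] packed = gzip(raw);
        System.out.printf("raw=%d bytes, gzip=%d bytes%n", raw.length, packed.length);
    }
}
```

Note that compression only shrinks the bytes on the wire. Whether it avoids the size check depends on where the check is applied relative to decompression, which this sketch does not model.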
[jira] [Commented] (HBASE-11747) ClusterStatus is too bulky
[ https://issues.apache.org/jira/browse/HBASE-11747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610877#comment-14610877 ]

Mikhail Antonov commented on HBASE-11747:

Wondering about next steps/directions here. Do we want to just bump CodedInputStream#limit to higher numbers and see if that addresses the problem at hand (I think it should), or do we want to optimize the protocol? I've seen 3 options here:
1) streaming instead of a single message
2) decouple region/RS load info from cluster status itself
3) try to make the data pieces themselves more compact (region names etc.)
[jira] [Commented] (HBASE-11747) ClusterStatus is too bulky
[ https://issues.apache.org/jira/browse/HBASE-11747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610905#comment-14610905 ]

stack commented on HBASE-11747:

Let's up CIS#limit for sure. In a new JIRA, optimize the protocol. There are a few already if you search 'hbase clusterstatus'. I like #2 and #3 from your list.

For #2, was looking at exporting jmx so, say, the Master could read cluster metrics instead of getting metrics recast and served on the heartbeat. Was looking at https://jolokia.org/ Seems more sensible than JMX federation (seems like it's possible to hook up as src for D3 graphing). Do we poll rather than have the stuff pushed? What happens in a big cluster?

Good on you [~mantonov]
[jira] [Commented] (HBASE-11747) ClusterStatus is too bulky
[ https://issues.apache.org/jira/browse/HBASE-11747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611054#comment-14611054 ]

Mikhail Antonov commented on HBASE-11747:

[~thiruvel]

bq. We are also exploring the option of compressing the status and sending from the server.

Could you please describe your case a little more? Are you facing this error, or just trying to optimize the traffic, or something else? Would be interested to know the size of the cluster / # of regions you're serving.
[jira] [Commented] (HBASE-11747) ClusterStatus is too bulky
[ https://issues.apache.org/jira/browse/HBASE-11747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611051#comment-14611051 ]

Mikhail Antonov commented on HBASE-11747:

[~stack]

bq. For #2, was looking at exporting jmx so say the Master could read cluster metrics instead of getting metrics recast and served on the heartbeat

Did you mean rpc, not jmx? I briefly looked at where it's actually used, and unless I'm missing something, we don't really use it in any heartbeats. Cluster status is used:
- for subscribers on (multicast) publishing (that's the only push as far as I can tell?)
- in the separate MasterRpcServices#GetClusterStatus rpc call, and accordingly in the Admin interface wrapping it (which is in the log posted in the jira)
- in REST messages

For regular heartbeats we just use the MRS#regionServerReport rpc call, which only pushes the RS server name/load (including region load) to the master. So as far as I can tell, those are already mostly decoupled.

So I think the options (aside from bumping the size of the message) drift to something like this: check on the server side whether the monolithic cluster status is looking too big (over a defined limit), and in that case return it with empty load, setting a flag indicating that the message is partially constructed, so that it does not fail at the transport level and the client knows to use a separate call to request server/region load for the list of RSs it's interested in.

In other words, I guess I see 2 basic options:
- bump the size of the message in this jira (trivial patch)
- leave the current ClusterStatus format as is for compatibility, but add handling to return an empty LiveServerInfo list if it's coming up too big, and add a new rpc call to retrieve the list of LiveServerInfo for a list (range?) of region servers. Here's where RS groups would be handy.

What do you think?

bq. Seems like it's possible to hook up as src for D3 graphing

Hmm, that's something different, drawing metrics in the UI?
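The second option above (return the status without load when it would be too big, flag it as partial, and let the client fetch load per server) could be modeled roughly as follows. This is a sketch under stated assumptions, not a patch: `PartialStatusSketch`, `ServerLoadInfo`, `StatusReply`, `MasterModel`, and the `loadOmitted` flag are hypothetical names, and sizes are plain byte counts standing in for serialized protobuf sizes.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the "partial ClusterStatus" idea; none of these types are HBase classes. */
final class PartialStatusSketch {

    /** Stand-in for per-server load; serializedBytes approximates its size on the wire. */
    static final class ServerLoadInfo {
        final String serverName;
        final long serializedBytes;

        ServerLoadInfo(String serverName, long serializedBytes) {
            this.serverName = serverName;
            this.serializedBytes = serializedBytes;
        }
    }

    /** Stand-in for the status reply; loadOmitted plays the role of the "partially constructed" flag. */
    static final class StatusReply {
        final List<String> liveServers;
        final List<ServerLoadInfo> loads;
        final boolean loadOmitted;

        StatusReply(List<String> liveServers, List<ServerLoadInfo> loads, boolean loadOmitted) {
            this.liveServers = liveServers;
            this.loads = loads;
            this.loadOmitted = loadOmitted;
        }
    }

    /** Stand-in for the master side of the two calls. */
    static final class MasterModel {
        private final Map<String, ServerLoadInfo> loadByServer = new LinkedHashMap<>();
        private final long maxReplyBytes;

        MasterModel(long maxReplyBytes) {
            this.maxReplyBytes = maxReplyBytes;
        }

        /** Records the latest load reported by a region server. */
        void report(ServerLoadInfo load) {
            loadByServer.put(load.serverName, load);
        }

        /** Returns the full status, or server names only with loadOmitted=true when the load exceeds the limit. */
        StatusReply getClusterStatus() {
            long total = 0;
            for (ServerLoadInfo load : loadByServer.values()) {
                total += load.serializedBytes;
            }
            List<String> names = new ArrayList<>(loadByServer.keySet());
            if (total > maxReplyBytes) {
                return new StatusReply(names, Collections.<ServerLoadInfo>emptyList(), true);
            }
            return new StatusReply(names, new ArrayList<>(loadByServer.values()), false);
        }

        /** The separate call: load for just the servers the client asks about. */
        List<ServerLoadInfo> getServerLoads(Collection<String> serverNames) {
            List<ServerLoadInfo> out = new ArrayList<>();
            for (String name : serverNames) {
                ServerLoadInfo load = loadByServer.get(name);
                if (load != null) {
                    out.add(load);
                }
            }
            return out;
        }
    }
}
```

A client in this model would call `getClusterStatus()`, check `loadOmitted`, and only then issue `getServerLoads(...)` for the subset of servers it cares about, so no single reply has to carry every region's load.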
[jira] [Commented] (HBASE-11747) ClusterStatus is too bulky
[ https://issues.apache.org/jira/browse/HBASE-11747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611166#comment-14611166 ]

Mikhail Antonov commented on HBASE-11747:

(also curious to hear more opinions?)
[jira] [Commented] (HBASE-11747) ClusterStatus is too bulky
[ https://issues.apache.org/jira/browse/HBASE-11747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611159#comment-14611159 ]

Mikhail Antonov commented on HBASE-11747:

bq. JMX. Idea is to help shrink ClusterStatus by moving metrics out.

Hmm. JMX isn't a transport for messages, is it? I think I'm missing something here. I thought only of an RPC messaging overhaul here. Could you describe the JMX approach?

bq. Rather than have a protocol that cuts in only when we are too big, could we not slim ClusterStatus so vitals only and always require client use a separate call for detail (or go to metrics system if it is counts, etc., that it is interested in). I like your suggestion of adding a new call for doing new protocol. That'd be best.

So you think, just modify the ClusterStatus proto server-side wiring, so we just never include load info in the message (we can avoid completely removing this field to maintain wire compatibility?), and add a new rpc method? That's what I'm thinking now too. Question: how would this new RPC overlap with metrics functionality? Let me walk through the users of ClusterStatus and see which of them actually use load info and for what (balancer, what else).

bq. Yes. Pardon my conflation. Will restrain myself in future.

Oh, I just meant, are there more aspects of this problem than what I see now, which should be considered while deciding what approach to take?
[jira] [Commented] (HBASE-11747) ClusterStatus is too bulky
[ https://issues.apache.org/jira/browse/HBASE-11747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611125#comment-14611125 ]

stack commented on HBASE-11747:

bq. Did you mean rpc, not jmx?

JMX. Idea is to help shrink ClusterStatus by moving metrics out.

bq. I'm missing something, we don't really use it in any heartbeats

Right (I thought we did but it is just a sub-element, the ServerLoad, that is passed on the heartbeat -- pardon me).

bq. check if monolithic cluster status is looking too big (over defined limit) on server side, and return it with empty load in this case, setting some flag indicating that message is partially constructed to not fail at transport level, and that client should use separate call to request server/region load for the list of RSs it's interested to know about?

Rather than have a protocol that cuts in only when we are too big, could we not slim ClusterStatus so it carries vitals only and always require the client to use a separate call for detail (or go to the metrics system if it is counts, etc., that it is interested in)? I like your suggestion of adding a new call for doing the new protocol. That'd be best.

bq. Hmm, that's something different, drawing metrics in the UI?

Yes. Pardon my conflation. Will restrain myself in future.
[jira] [Commented] (HBASE-11747) ClusterStatus is too bulky
[ https://issues.apache.org/jira/browse/HBASE-11747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611189#comment-14611189 ]

stack commented on HBASE-11747:

bq. JMX isn't a transport for messages, is it?

No. Generally JMX is for management. HBase uses it to publish server attributes and metrics. HBase also puts up a JMX Bean Server so you can query the beans over the net. This mechanism uses java's crazy RMI, which is mostly unusable by systems other than java and, even then, has a ping-pong random port mechanism that requires open port ranges. The nice thing about https://jolokia.org/ is that it REST/JSON-ifies our JMX, making it more palatable to more systems.

bq. Could you describe JMX approach?

ClusterStatus is made of various attributes including ServerLoad for every node in the cluster. ServerLoad is not actually server load. Rather, it is a dumping ground for all and sundry including server attributes, configuration, and metrics. Redoing ServerLoad so it is just load vitals would be a nice-to-have so we don't flood the master once a second as all report in with fat messages on their heartbeats.

Server metrics are also available published out of our metrics system. Metrics are published variously -- as text in a servlet and as jmx beans available on each server (jmx is on a period IIRC, servlet is poll). That we are dumping out our metrics on a period via JMX and that we then go and collect them all again to put on a heartbeat is silly. Would be nice to refactor. If ServerLoad is slimmed, then it would help here given we do one up for each server and insert in ClusterStatus.

That was high-level what I was thinking. Separate issue I'd say, a background consideration when addressing this one.

bq. Question - how would this new RPC overlap with metrics functionality?

Was thinking they'd be distinct. If you want metrics, use our metrics system; we are publishing our metrics per server anyways.
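As a rough illustration of the point that a metric published as a JMX bean can be read directly, without being recast onto a heartbeat, here is a minimal in-process sketch using only the JDK's `javax.management` API. The bean, its `RegionCount` attribute, and the `sketch.hbase` object name are hypothetical; HBase's real bean names and attributes are different, and remote access (RMI or a bridge such as Jolokia) is not shown.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical illustration of publishing and reading one metric over JMX; not HBase code. */
final class JmxMetricsSketch {

    /** Standard MBean interface: by JMX convention its name is the implementing class name plus "MBean". */
    public interface RegionServerStatsMBean {
        int getRegionCount();
    }

    /** Made-up bean exposing a single made-up metric. */
    public static final class RegionServerStats implements RegionServerStatsMBean {
        private final int regionCount;

        public RegionServerStats(int regionCount) {
            this.regionCount = regionCount;
        }

        @Override
        public int getRegionCount() {
            return regionCount;
        }
    }

    /** Registers the bean on the platform MBean server, reads the attribute back, then unregisters it. */
    static int publishAndRead(int regionCount) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            // Unique id so repeated calls do not collide on the same object name.
            ObjectName name = new ObjectName("sketch.hbase:type=RegionServerStats,id=" + System.nanoTime());
            server.registerMBean(new RegionServerStats(regionCount), name);
            try {
                // The getter getRegionCount() surfaces as the JMX attribute "RegionCount".
                return (Integer) server.getAttribute(name, "RegionCount");
            } finally {
                server.unregisterMBean(name);
            }
        } catch (Exception e) {
            throw new IllegalStateException("JMX round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("regionCount read via JMX: " + publishAndRead(42));
    }
}
```

The same attribute read works from any JMX client that can reach the bean server, which is the property the comment relies on when it suggests getting counts from the metrics system rather than from ClusterStatus.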
[jira] [Commented] (HBASE-11747) ClusterStatus is too bulky
[ https://issues.apache.org/jira/browse/HBASE-11747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570287#comment-14570287 ]

Andrew Purtell commented on HBASE-11747:

One option is to use CodedInputStream#setSizeLimit in the client to effectively disable this check by setting it to Integer.MAX_VALUE.
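To make the effect of raising the limit concrete, the following is a stdlib-only model of a size check. It is deliberately not protobuf's `CodedInputStream`; the class `SizeLimitSketch` and its methods are hypothetical. The 64MB threshold and the roughly 100MB payload at 1M regions are the figures reported on this issue, and the linear-scaling estimate is an assumption.

```java
/** Stdlib-only model of a parser size limit; it is not protobuf's CodedInputStream. */
final class SizeLimitSketch {

    /** 64MB, the threshold cited on this issue for the exception. */
    static final int DEFAULT_SIZE_LIMIT = 64 << 20;

    private int sizeLimit = DEFAULT_SIZE_LIMIT;

    /** Sets a new limit and returns the previous one. */
    int setSizeLimit(int limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("Size limit cannot be negative: " + limit);
        }
        int previous = sizeLimit;
        sizeLimit = limit;
        return previous;
    }

    /** True when a message of this many bytes would pass the check. */
    boolean accepts(long messageBytes) {
        return messageBytes <= sizeLimit;
    }

    /** Regions that fit under limitBytes, assuming size grows linearly with region count. */
    static long regionsUnderLimit(long limitBytes, long observedBytes, long observedRegions) {
        return limitBytes * observedRegions / observedBytes;
    }

    public static void main(String[] args) {
        long reported = 100L << 20; // ~100MB at 1M regions, as reported on this issue
        SizeLimitSketch check = new SizeLimitSketch();
        System.out.println(check.accepts(reported));   // false: over the 64MB default
        check.setSizeLimit(Integer.MAX_VALUE);
        System.out.println(check.accepts(reported));   // true: the check is effectively disabled
        System.out.println(regionsUnderLimit(DEFAULT_SIZE_LIMIT, reported, 1_000_000L)); // 640000
    }
}
```

Under these figures the default limit would be reached at roughly 640K regions, so raising it buys headroom but still caps out at what an `int` limit allows, about 2GB.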
[jira] [Commented] (HBASE-11747) ClusterStatus is too bulky
[ https://issues.apache.org/jira/browse/HBASE-11747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514224#comment-14514224 ]

Dev Lakhani commented on HBASE-11747:

Is there any progress on this, or a workaround we can make use of? The comments above by [~virag] mention setting CodedInputStream.setSizeLimit(). Where can we do this? Is it possible to do this in the application code? Or is it possible to set any other config param? For example, will replication.source.size.capacity help with a workaround until a fix is implemented?

Thanks
[jira] [Commented] (HBASE-11747) ClusterStatus is too bulky
[ https://issues.apache.org/jira/browse/HBASE-11747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101323#comment-14101323 ]

stack commented on HBASE-11747:

Good one. Every RS sending 100MB of 'status' to the master every second or so is just obnoxious, especially so when much of this info is being duplicated on our metrics 'channel'. Thanks for bringing this one up Virag. We need a bit of fixup in here.
[jira] [Commented] (HBASE-11747) ClusterStatus is too bulky
[ https://issues.apache.org/jira/browse/HBASE-11747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097370#comment-14097370 ]

Virag Kothari commented on HBASE-11747:
---

This exception is thrown when the message size exceeds 64MB. With 1M regions (open only, no data) on 160 servers, the size is around 100MB. For now, I worked around it by setting CodedInputStream.setSizeLimit() to a very high value. Do we need thinner APIs? I assume RegionLoad is quite heavy.

ClusterStatus is too bulky
---
Key: HBASE-11747
URL: https://issues.apache.org/jira/browse/HBASE-11747
Project: HBase
Issue Type: Bug
Reporter: Virag Kothari
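A rough check of the numbers in the comment above. This is only a back-of-the-envelope sketch: it assumes the serialized ClusterStatus grows linearly with the region count, which the thread does not confirm, and the class and method names are made up for illustration.

```java
/**
 * Hypothetical estimate, not HBase code. Assumes the ClusterStatus message
 * size is proportional to the number of regions.
 */
final class ClusterStatusSizeEstimate {
    static final long MB = 1024L * 1024L;

    /** Average serialized bytes per region (integer division, rounds down). */
    static long approxBytesPerRegion(long observedBytes, long observedRegions) {
        return observedBytes / observedRegions;
    }

    /** Largest region count whose estimated message still fits within the limit. */
    static long maxRegionsUnderLimit(long limitBytes, long observedBytes, long observedRegions) {
        // Multiply first so the result is not distorted by rounding the per-region size.
        return Math.multiplyExact(limitBytes, observedRegions) / observedBytes;
    }

    public static void main(String[] args) {
        long observedBytes = 100 * MB;      // "around 100MB" from the comment
        long observedRegions = 1_000_000L;  // 1M regions
        long limitBytes = 64 * MB;          // size limit cited in the comment

        System.out.println("approx bytes per region: "
                + approxBytesPerRegion(observedBytes, observedRegions));
        System.out.println("regions that fit under the limit: "
                + maxRegionsUnderLimit(limitBytes, observedBytes, observedRegions));
    }
}
```

Under that linearity assumption, raising the limit only moves the threshold; the per-region cost stays the same, which is why the question about thinner APIs still matters.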
[jira] [Commented] (HBASE-11747) ClusterStatus is too bulky
[ https://issues.apache.org/jira/browse/HBASE-11747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097387#comment-14097387 ]

Andrew Purtell commented on HBASE-11747:

bq. Do we need thinner APIs? I assume RegionLoad is quite heavy.

Yes, but we have to be careful in 0.98 not to change APIs in a breaking way. I think increasing the message size limit to work around the problem is fine given that consideration. How often do you plan to call HBaseAdmin#getClusterStatus?

ClusterStatus is too bulky
---
Key: HBASE-11747
URL: https://issues.apache.org/jira/browse/HBASE-11747
Project: HBase
Issue Type: Bug
Reporter: Virag Kothari
Attachments: exceptiontrace
[jira] [Commented] (HBASE-11747) ClusterStatus is too bulky
[ https://issues.apache.org/jira/browse/HBASE-11747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097392#comment-14097392 ]

Andrew Purtell commented on HBASE-11747:

We can add *new* APIs. I wonder if it would be workable to introduce a streaming status API where the client uses a cursor to iterate over the master's picture of the cluster. Might be tricky wherever regions have migrated or servers have come and gone. The master would have to provide either a consistent snapshot of state or track changes since the client opened the cursor and mix in change deltas with iteration results.

ClusterStatus is too bulky
---
Key: HBASE-11747
URL: https://issues.apache.org/jira/browse/HBASE-11747
Project: HBase
Issue Type: Bug
Reporter: Virag Kothari
Attachments: exceptiontrace
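To make the "consistent snapshot" variant of the cursor idea above concrete, here is a minimal sketch. Nothing in it is an existing HBase API: `Snapshot`, `Page`, `fetch`, and `readAll` are hypothetical names, region entries are plain strings, and the change-delta variant is not covered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of cursor-style paging over a frozen view of region entries. */
final class StatusCursorSketch {

    /** One page of results plus the position to resume from (-1 when exhausted). */
    static final class Page {
        final List<String> entries;
        final int nextPosition;

        Page(List<String> entries, int nextPosition) {
            this.entries = entries;
            this.nextPosition = nextPosition;
        }
    }

    /** Stands in for the master's view of the cluster, copied when the cursor is opened. */
    static final class Snapshot {
        private final List<String> regions;

        Snapshot(List<String> liveRegions) {
            // Defensive copy: later changes to the live list do not affect open cursors.
            this.regions = Collections.unmodifiableList(new ArrayList<>(liveRegions));
        }

        Page fetch(int position, int pageSize) {
            if (pageSize <= 0) {
                throw new IllegalArgumentException("pageSize must be positive");
            }
            int end = Math.min(position + pageSize, regions.size());
            List<String> slice = regions.subList(position, end);
            return new Page(slice, end < regions.size() ? end : -1);
        }
    }

    /** Client side: drains the cursor one bounded page at a time. */
    static List<String> readAll(Snapshot snapshot, int pageSize) {
        List<String> out = new ArrayList<>();
        int position = 0;
        while (position >= 0) {
            Page page = snapshot.fetch(position, pageSize);
            out.addAll(page.entries);
            position = page.nextPosition;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> live = new ArrayList<>(Arrays.asList("r1", "r2", "r3", "r4", "r5"));
        Snapshot snapshot = new Snapshot(live);
        live.remove("r3"); // a region goes away after the cursor was opened
        System.out.println(readAll(snapshot, 2));
    }
}
```

Each response is bounded by the page size rather than by the cluster size, at the cost of the master holding the copied state for as long as a cursor is open.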
[jira] [Commented] (HBASE-11747) ClusterStatus is too bulky
[ https://issues.apache.org/jira/browse/HBASE-11747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097506#comment-14097506 ]

Elliott Clark commented on HBASE-11747:
---

We should look at using smaller region names as well. There's no need to send the whole region name across.

ClusterStatus is too bulky
---
Key: HBASE-11747
URL: https://issues.apache.org/jira/browse/HBASE-11747
Project: HBase
Issue Type: Bug
Reporter: Virag Kothari
Attachments: exceptiontrace
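One way to picture the saving from shorter region names is to replace each full name with a fixed-length identifier. The sketch below uses a 16-byte MD5 digest purely as an illustration; the thread does not say which short form would be used, and `shortId` and `bytesSaved` are hypothetical helpers, not HBase methods.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch: fixed-length identifiers in place of full region names. */
final class ShortRegionIdSketch {

    /** 16-byte digest of the full region name (illustrative choice of short form). */
    static byte[] shortId(String fullRegionName) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(fullRegionName.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    /** Bytes saved across regionCount regions if each name has this length; negative if the name is already shorter than the id. */
    static long bytesSaved(String typicalRegionName, long regionCount) {
        int fullLength = typicalRegionName.getBytes(StandardCharsets.UTF_8).length;
        return (long) (fullLength - shortId(typicalRegionName).length) * regionCount;
    }

    public static void main(String[] args) {
        // Made-up region name, only to have something of realistic length.
        String name = "usertable,row-key-prefix-0000000000012345,1408000000000.0123456789abcdef.";
        System.out.println("full name bytes: " + name.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("short id bytes:  " + shortId(name).length);
        System.out.println("saved over 1M regions: " + bytesSaved(name, 1_000_000L));
    }
}
```

The saving scales with both name length and region count, so it matters most on exactly the kind of cluster described in this issue.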