I saw this error once on a colleague's server. It looked like an environment issue: he later restarted HBase and the problem disappeared.
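If it helps to compare kylin.log across the nodes, here is a minimal sketch that counts log lines per query ID and pulls out the HBase RPC retry records. The two regular expressions are assumptions based only on the log lines quoted in this thread, and `summarize` is a hypothetical helper, not part of Kylin; the patterns may need adjusting for other log layouts.

```python
import re
from collections import Counter

# Assumed thread-name format, taken from the quoted log: "[Query <uuid>-<n>]".
QUERY_THREAD = re.compile(r"\[Query ([0-9a-f-]{36})-\d+\]")
# Assumed retry-line format, taken from the quoted RpcRetryingCaller message.
RETRY = re.compile(r"Call exception, tries=(\d+), retries=(\d+), started=(\d+) ms ago")


def summarize(lines):
    """Return (log lines per query ID, list of (tries, retries, started_ms))."""
    per_query = Counter()
    retries = []
    for line in lines:
        query = QUERY_THREAD.search(line)
        if query:
            per_query[query.group(1)] += 1
        retry = RETRY.search(line)
        if retry:
            retries.append(tuple(int(value) for value in retry.groups()))
    return per_query, retries


# Two lines copied from the log excerpt quoted in this thread.
sample = [
    "2018-07-02 16:38:25,629 DEBUG [Query eaf48991-94fd-40cd-9834-1097e79c6840-74] "
    "enumerator.OLAPEnumerator:120 : return TupleIterator...",
    "2018-07-02 16:39:05,911 INFO [kylin-coproc--pool3-t1] client.RpcRetryingCaller:146 : "
    "Call exception, tries=10, retries=35, started=68253 ms ago, cancelled=false, "
    "msg=java.io.IOException: Message missing required fields: compressedRows, stats",
]

per_query, retries = summarize(sample)
print(dict(per_query))
print(retries)
```

To run it against a real log, pass an open file instead of the sample list, for example `summarize(open("kylin.log"))` (the path is an assumption). A node that shows retry records but few or no lines for the query ID would be the one to look at first.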

2018-07-12 21:53 GMT+08:00 Ge Silas <[email protected]>:

> This really sounds like an environmental issue. kylin.log in node 2 should
> have logs about a query ID started and completed. Does kylin.log on node 3
> have any logs like that?
>
> And from your log, this is already the 35th retry so I think long before
> the query fails, the kylin's status was abnormal already on node 3.
>
> Thanks,
> Silas
> ------------------------------
> *From:* Phil Scott <[email protected]>
> *Sent:* July 6, 2018 8:24
> *To:* [email protected]
> *Subject:* Fwd: Kylin 2.3.1 cluster - some nodes fail to query against cube
>
> > *Problem*:
> >
> > When I perform a simple SUM() query on my built cube, it runs sub-second
> > on 1 cluster node, but the other 2 cluster nodes don't recognize a cube for
> > that query and they run forever (or fail silently without telling the UI
> > that execution has halted).
> >
> > *Context*:
> >
> > Version: Kylin 2.3.1
> > Mode: Clustered
> >
> > I have created a Kylin Cube on top of a Fact table in Hive, and Built a
> > data segment using a sample date range. My Kylin configuration is running
> > as a 3 node cluster.
> >
> > Node 1 is configured as a job & query server (in conf/kylin.properties the
> > setting is: *kylin.server.mode=all*).
> >
> > Nodes 2 and 3 are configured as query-only servers (in
> > conf/kylin.properties the setting is: *kylin.server.mode=query*)
> >
> > Once I have successfully built my cube with a data segment, I try to run a
> > query like this in the Kylin UI Insight tab:
> >
> > SELECT SUM(some_metric) AS value FROM my_fact_table
> >
> > If I execute this query from the web UI on node 1 or node 3, the query
> > goes into [executing] status forever.
> >
> > If I execute the exact same query from the web UI on node 2, the query
> > returns in 0.02 seconds.
> >
> > So, my nodes 1 and 3 are rendered useless as end-points for querying.
> >
> > See picture of results on node 2 and 3:
> >
> > [image: https://i.stack.imgur.com/V8Yvs.png]
> >
> > I've compared the kylin/logs/kylin.log files for node 1 (failing) and node
> > 2 (working). Both logs matched each other message for message up until the
> > following spot where node 1 fails... See below:
> >
> > 2018-07-02 16:38:25,629 DEBUG [Query eaf48991-94fd-40cd-9834-1097e79c6840-74] enumerator.OLAPEnumerator:120 : return TupleIterator...
> >
> > 2018-07-02 16:38:46,337 INFO [Scheduler 256150323 FetcherRunner-69] threadpool.DefaultScheduler:268 : Job Fetcher: 0 should running, 0 actual running, 0 stopped, 0 ready, 588 already succeed, 43 error, 49 discarded, 0 others
> >
> > 2018-07-02 16:39:05,911 INFO [kylin-coproc--pool3-t1] client.RpcRetryingCaller:146 : Call exception, tries=10, retries=35, started=68253 ms ago, cancelled=false, msg=*java.io.IOException: Message missing required fields: compressedRows, stats*
> >     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2195)
> >     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
> >     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
> >     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> > Caused by: com.google.protobuf.UninitializedMessageException: Message missing required fields: compressedRows, stats
> >     at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
> >     at org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.generated.CubeVisitProtos$CubeVisitResponse$Builder.build(CubeVisitProtos.java:5019)
> >     at org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.generated.CubeVisitProtos$CubeVisitResponse$Builder.build(CubeVisitProtos.java:4949)
> >     at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7866)
> >     at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1980)
> >     at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1962)
> >     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32389)
> >     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
> >
> > So, the *client.RpcRetryingCaller* (I assume that's the Kylin server
> > making an RPC call to HBase) is failing. The error message is:
> >
> > java.io.IOException: Message missing required fields: compressedRows, stats
> >
> > *Questions*
> >
> > 1. What might cause this?
> >
> > 2. Is there a way that I can make nodes 1 & 3 "sync up" or
> > clear/reload from built cube data so that they respond (without having to
> > rebuild my cube)? Or is this an issue with Nodes 1 & 3 failing to
> > communicate with HBase? I’ve run command-line hbase queries on all 3 nodes
> > to make sure they can all communicate with hbase…
> >
> > 3. How can I diagnose whether a cube is being recognized by a
> > particular cluster node?
> >
> > -Phil Scott

--
Best regards,

Shaofeng Shi 史少锋
