Problem:
When I run a simple SUM() query against my built cube, it returns in under a 
second on 1 cluster node, but the other 2 cluster nodes don't seem to recognize 
a cube for that query: there the query runs forever (or fails silently without 
telling the UI that execution has halted).

Context:
Version: Kylin 2.3.1
Mode: Clustered

I have created a Kylin cube on top of a fact table in Hive, and built a data 
segment using a sample date range. My Kylin deployment is running as a 3-node 
cluster.
Node 1 is configured as a job & query server (in conf/kylin.properties the 
setting is: kylin.server.mode=all).
Nodes 2 and 3 are configured as query-only servers (in conf/kylin.properties 
the setting is: kylin.server.mode=query).
Once I have successfully built my cube with a data segment, I try to run a 
query like this in the Kylin UI Insight tab:
         SELECT SUM(some_metric) AS value FROM my_fact_table

If I execute this query from the web UI on node 1 or node 3, the query goes 
into [executing] status forever.
If I execute the exact same query from the web UI on node 2, the query returns 
in 0.02 seconds.
So, my nodes 1 and 3 are rendered useless as end-points for querying.
See picture of results on node 2 and 3:
[https://i.stack.imgur.com/V8Yvs.png]
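
To take the web UI out of the picture, the same per-node comparison could be scripted. The following is only a minimal sketch: it assumes Kylin's REST query endpoint (POST /kylin/api/query with HTTP basic auth) and assumes the JSON response carries "cube", "duration", "isException" and "exceptionMessage" fields. The host names, port, project name and credentials are hypothetical placeholders, not values from my cluster.

```python
import base64
import json
import urllib.request

# Hypothetical node URLs; 7070 is assumed to be the Kylin web port.
NODES = ["http://node1:7070", "http://node2:7070", "http://node3:7070"]


def build_query_request(base_url, sql, project, user, password):
    """Build a POST request for the (assumed) /kylin/api/query endpoint."""
    payload = json.dumps({"sql": sql, "project": project}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + "/kylin/api/query",
        data=payload,
        headers={
            "Authorization": "Basic " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def summarize(response_body):
    """Pick out the fields that show whether a cube answered the query."""
    doc = json.loads(response_body)
    return {
        "cube": doc.get("cube"),
        "duration_ms": doc.get("duration"),
        "is_exception": doc.get("isException"),
        "message": doc.get("exceptionMessage"),
    }


def check_nodes(nodes, sql, project, user, password, timeout=30):
    """Send the same SQL to every node and collect one summary per node."""
    results = {}
    for base_url in nodes:
        request = build_query_request(base_url, sql, project, user, password)
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                results[base_url] = summarize(response.read().decode("utf-8"))
        except Exception as exc:  # timeouts, HTTP errors, refused connections
            results[base_url] = {"error": repr(exc)}
    return results

# Example call (placeholders throughout):
# check_nodes(NODES, "SELECT SUM(some_metric) AS value FROM my_fact_table",
#             "my_project", "my_user", "my_password")
```

The timeout is there so that a node stuck in [executing] shows up as an error entry instead of blocking the whole comparison.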

I've compared the kylin/logs/kylin.log files for node 1 (failing) and node 2 
(working). Both logs matched each other message for message up until the 
following spot where node 1 fails... See below:

2018-07-02 16:38:25,629 DEBUG [Query eaf48991-94fd-40cd-9834-1097e79c6840-74] enumerator.OLAPEnumerator:120 : return TupleIterator...

2018-07-02 16:38:46,337 INFO  [Scheduler 256150323 FetcherRunner-69] threadpool.DefaultScheduler:268 : Job Fetcher: 0 should running, 0 actual running, 0 stopped, 0 ready, 588 already succeed, 43 error, 49 discarded, 0 others

2018-07-02 16:39:05,911 INFO  [kylin-coproc--pool3-t1] client.RpcRetryingCaller:146 : Call exception, tries=10, retries=35, started=68253 ms ago, cancelled=false, msg=java.io.IOException: Message missing required fields: compressedRows, stats
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2195)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
Caused by: com.google.protobuf.UninitializedMessageException: Message missing required fields: compressedRows, stats
        at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
        at org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.generated.CubeVisitProtos$CubeVisitResponse$Builder.build(CubeVisitProtos.java:5019)
        at org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.generated.CubeVisitProtos$CubeVisitResponse$Builder.build(CubeVisitProtos.java:4949)
        at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7866)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1980)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1962)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32389)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)

So, the client.RpcRetryingCaller (I assume that's the Kylin server making an 
RPC call to HBase) is failing. The error message is:

         java.io.IOException: Message missing required fields: compressedRows, stats

Questions
1. What might cause this?
2. Is there a way that I can make nodes 1 & 3 "sync up" or clear/reload 
from the built cube data so that they respond (without having to rebuild my 
cube)? Or is this an issue with nodes 1 & 3 failing to communicate with HBase? 
I’ve run command-line HBase queries on all 3 nodes to make sure they can all 
communicate with HBase.
3. How can I diagnose whether a cube is being recognized by a particular 
cluster node?



-Phil Scott