Hi,

It looks like this is caused by data skew, which often happens in big data scenarios. As far as I know, you should identify the high-cardinality column and use it as a "Shard By" column (in the "Advanced Setting" page of the cube design stage). You may also check the "Redistribute intermediate table" section of http://kylin.apache.org/docs20/howto/howto_optimize_build.html for more information.

If you find anything wrong or I have misunderstood anything, please let me know. Thank you.
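By the way, in case it helps, below is a rough sketch (not taken from your cube) of how the shard-by flag looks in the cube descriptor JSON once a column is marked as "Shard By" on the Advanced Setting page. The table and column names (FACT_TABLE.USER_DSID) are only placeholders, and I am writing the field names from memory, so please double check against your own cube desc:

    "rowkey": {
      "rowkey_columns": [
        {
          "column": "FACT_TABLE.USER_DSID",
          "encoding": "dict",
          "isShardBy": true
        }
      ]
    }

As far as I know, once a shard-by column is set, the redistribute step distributes the flat table rows by that column instead of randomly, which helps avoid one oversized reducer.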
-----------------
Best wishes to you!
From: Xiaoxiang Yu

At 2019-06-22 02:33:56, "Cinto Sunny" <[email protected]> wrote:

Thanks. We actually have 12 reducers. The problem is that one reducer gets stuck with a huge amount of data; the rest complete. We have ~1.8 billion dsids and are not sure if that is the problem. If so, how do we distribute the data?
- Cinto

On Fri, Jun 21, 2019 at 12:03 AM Chao Long <[email protected]> wrote:

Hi Cinto Sunny,
You can try setting "kylin.engine.mr.uhc-reducer-count" to a bigger value; the default is 1.

On Fri, Jun 21, 2019 at 2:44 PM Cinto Sunny <[email protected]> wrote:

Hi All,
I am building a cube with 10 dimensions and two measures. The total input size is 100 GB. I am trying to build using a Roaring Bitmap measure. One of the fact table columns is the user id, with ~1.8B userids. The build is getting stuck at the "Extract Fact Table Distinct Columns" step: one executor is stuck and is processing over 800M lines. I am using version 2.6. Any pointers would be appreciated. Let me know if any further information is required.
- Cinto
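P.S. For completeness, here is a rough sketch of the kylin.properties entries related to the suggestions in this thread. kylin.engine.mr.uhc-reducer-count is the setting Chao Long mentioned (default 1); kylin.source.hive.redistribute-flat-table is, as far as I remember, the switch behind the "Redistribute intermediate table" step. Please verify the exact key names and defaults against the 2.6 documentation before relying on them:

    # Reducers used per ultra-high-cardinality column in the
    # "Extract Fact Table Distinct Columns" step (default is 1).
    kylin.engine.mr.uhc-reducer-count=10

    # Redistribute the Hive flat table so rows are spread evenly
    # across reducers before the build steps run.
    kylin.source.hive.redistribute-flat-table=true

The value 10 is only an example; choose it according to your cluster capacity.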
