Hi,
   It looks like this is caused by data skew, which often happens in big data
scenarios. As far as I know, you should find the high-cardinality column and
use it as the "Shard By" column (in the "Advanced Setting" step of cube
design). See "Redistribute intermediate table" in
http://kylin.apache.org/docs20/howto/howto_optimize_build.html for more
information.
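To illustrate why the choice of distribution key matters, here is a minimal Python sketch. It is a hypothetical model of hash partitioning, not Kylin's implementation: `partition`, `reducer_loads`, and `skew_ratio` are illustrative names, MD5 is used only as a stable hash, and the data is synthetic. Routing rows by a key where one value dominates piles most rows onto one reducer, while routing by a high-cardinality column such as a user ID spreads them almost evenly.

```python
import hashlib
from collections import Counter

NUM_REDUCERS = 12  # the thread mentions 12 reducers

def partition(key: str, num_reducers: int = NUM_REDUCERS) -> int:
    """Hypothetical hash partitioner: pick a reducer from a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_reducers

def reducer_loads(keys, num_reducers: int = NUM_REDUCERS) -> list:
    """Count how many rows each reducer receives when rows are routed by `keys`."""
    counts = Counter(partition(k, num_reducers) for k in keys)
    return [counts.get(r, 0) for r in range(num_reducers)]

def skew_ratio(loads) -> float:
    """Busiest reducer's load divided by the mean load (1.0 means perfectly even)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean

if __name__ == "__main__":
    n_rows = 120_000
    # Skewed key: 80% of rows share one value, the rest spread over a few values.
    skewed = ["US" if i % 10 < 8 else f"C{i % 50}" for i in range(n_rows)]
    # High-cardinality key: every row carries a distinct user ID.
    user_ids = [f"user-{i}" for i in range(n_rows)]
    print("skewed key :", round(skew_ratio(reducer_loads(skewed)), 2))
    print("user ID key:", round(skew_ratio(reducer_loads(user_ids)), 2))
```

With the skewed key the busiest reducer receives roughly ten times the mean load, while the user ID key stays close to 1.0.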
   If you find anything wrong or I have misunderstood anything, please let me know.
Thank you.


-----------------
Best wishes to you!
From :Xiaoxiang Yu

At 2019-06-22 02:33:56, "Cinto Sunny" <[email protected]> wrote:

Thanks. We actually have 12 reducers. The problem is that one reducer gets
stuck with a huge amount of data while the rest complete. We have 1.8 billion
dsids and are not sure whether that is the problem. If it is, how do we
distribute the data?


- Cinto




On Fri, Jun 21, 2019 at 12:03 AM Chao Long <[email protected]> wrote:

Hi Cinto Sunny,
   You can try setting "kylin.engine.mr.uhc-reducer-count" to a larger value;
the default is 1.
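A rough model of why a larger value can help, assuming that with the default of 1 all values of an ultra-high-cardinality column go to a single reducer: splitting them across N reducers cuts the busiest reducer's share to about 1/N. This Python sketch is hypothetical and does not reproduce how Kylin routes values internally; `uhc_reducer_loads` is an illustrative name, MD5 is only a stable hash, and the user IDs are synthetic.

```python
import hashlib
from collections import Counter

def uhc_reducer_loads(values, uhc_reducer_count: int) -> list:
    """Hypothetical model: hash each value of one ultra-high-cardinality column
    into `uhc_reducer_count` reducers and return how many values each receives."""
    counts = Counter(
        int.from_bytes(hashlib.md5(v.encode("utf-8")).digest()[:8], "big") % uhc_reducer_count
        for v in values
    )
    return [counts.get(r, 0) for r in range(uhc_reducer_count)]

if __name__ == "__main__":
    # Synthetic, scaled-down stand-in for the ~1.8B user IDs.
    user_ids = [f"user-{i}" for i in range(100_000)]
    for n in (1, 4, 8):
        loads = uhc_reducer_loads(user_ids, n)
        print(f"reducers={n}: busiest reducer gets {max(loads)} of {len(user_ids)} values")
```

With one reducer, that reducer handles every value; with 4 or 8 it handles roughly a quarter or an eighth, which is the effect the setting is meant to achieve for the single stuck reducer.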


On Fri, Jun 21, 2019 at 2:44 PM Cinto Sunny <[email protected]> wrote:

Hi All,


I am building a cube with 10 dimensions and two measures. The total input size
is 100 GB.
I am trying to build it using Roaring Bitmap. One of the facts is user, which
has ~1.8B user IDs.


The build gets stuck at the "Extract Fact Table Distinct Columns" step. One
executor is stuck processing over 800M lines.


I am using version 2.6.


Any pointers would be appreciated. Let me know if any further information is
required.


- Cinto
