Re: build a cube with two ultra high cardinality columns

2016-01-08 Thread ShaoFeng Shi
Agree with feng yu, you need to think about whether you really need to build such a high-cardinality dimension into the Cube. For example, if the column is something like a free-text description or a timestamp column, it doesn't make sense to have it in the Cube, as Kylin is an OLAP engine, not a general-purpose database; you ...

Re: build a cube with two ultra high cardinality columns

2016-01-08 Thread yu feng
Assume the average size of this column is 32 bytes; a cardinality of 50 million then means about 1.5 GB of distinct values. In the 'Fact Table Distinct Columns' step, the mappers read from the intermediate table and remove duplicate values (done in the Combiner); however, this job starts more than one mapper but only one reducer, there ...
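
Since that single reducer has to hold all the distinct values in memory, one common workaround (a sketch of a possible setting, not something stated in this thread) is to raise the memory given to the reduce tasks of Kylin's Hadoop jobs, e.g. in conf/kylin_job_conf.xml. The property names below are standard MapReduce 2 settings; the concrete values are illustrative assumptions only:

    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>4096</value> <!-- reducer container size in MB; illustrative value -->
    </property>
    <property>
      <name>mapreduce.reduce.java.opts</name>
      <value>-Xmx3584m</value> <!-- reducer JVM heap, kept below the container size -->
    </property>

Note that this only buys headroom; as the replies in this thread point out, the real fix is to reconsider whether such ultra-high-cardinality columns belong in the cube as dimensions at all.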

build a cube with two ultra high cardinality columns

2016-01-08 Thread zhong zhang
Hi All, There are two ultra-high-cardinality columns in our cube. Both of them have a cardinality of over 50 million. When building the cube, it keeps giving us the error "Error: GC overhead limit exceeded" for the reduce jobs at the step Extract Fact Table Distinct Columns. We've just updated to version 1 ...