Hi,

I have to build a big cube: about 400 M rows of historical data, with many dimensions, on a small-to-mid-size cluster. To avoid one very long cube build, I divided the process into monthly periods (about 30-40 M rows per month). When this process finishes, an hourly load process will begin. We will then have several historical monthly segments, followed by new incremental hourly segments. This scenario raises the following questions:

 * Do you recommend merging all the historical segments?
     o Sometimes we will need to rebuild a month from the last six
       months. Due to the cube size, we thought it would be faster to
       rebuild just that month's segment.
 * I'm going to define the following auto merge thresholds for the
   hourly incremental load, once we have all the historical data:
     o 1 day
     o 7 days
     o 28 days
     o If I understand correctly, this means that:
         + Every day, all hourly segments will be merged.
         + Every 7 days, all daily segments will be merged.
         + Every 28 days, all 7-day segments will be merged.
     o This config raises two questions:
         + Will the 28-day segments ever be merged automatically?
         + Will our big historical segments be merged automatically?
 * I thought that maybe I need to develop a script that merges segments
   as I need (using the Kylin REST API), instead of using the Kylin cube
   auto merge option.
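
To make the last point concrete, here is a rough sketch of the kind of script I have in mind. It is not tested against our cluster. I am assuming that the cube descriptor stores the auto merge thresholds in milliseconds (auto_merge_time_ranges), and that a merge is requested with PUT /kylin/api/cubes/{cubeName}/rebuild and buildType "MERGE", which is how I read the Kylin REST API docs; please correct me if the endpoint or the body fields differ in your version. The host, cube name and credentials in the usage note are placeholders, and the helper names are my own.

```python
import base64
import json
import urllib.request
from datetime import datetime, timezone

MS_PER_DAY = 24 * 60 * 60 * 1000


def auto_merge_ranges_ms(days):
    """Convert merge thresholds given in days to milliseconds.

    Assumption: the cube descriptor expects milliseconds here.
    """
    return [d * MS_PER_DAY for d in days]


def to_epoch_ms(year, month, day):
    """Return midnight UTC of the given date as epoch milliseconds."""
    moment = datetime(year, month, day, tzinfo=timezone.utc)
    return int(moment.timestamp() * 1000)


def build_merge_request(base_url, cube, start_ms, end_ms, user, password):
    """Build (but do not send) a merge request for one time range.

    Assumption: endpoint path and body fields follow the Kylin REST API
    docs as I understand them; they may differ between versions.
    """
    body = json.dumps(
        {"startTime": start_ms, "endTime": end_ms, "buildType": "MERGE"}
    ).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/kylin/api/cubes/{cube}/rebuild",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )
```

For example, to merge the segments covering January to June 2017 of a cube called my_cube (a hypothetical name), I would build the request with build_merge_request("http://kylin-host:7070", "my_cube", to_epoch_ms(2017, 1, 1), to_epoch_ms(2017, 7, 1), user, password) and send it with urllib.request.urlopen(). The thresholds from my list would be auto_merge_ranges_ms([1, 7, 28]).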

Thanks in advance,

Roberto

--

*Roberto Tardío Olmos*

/Senior Big Data & Business Intelligence Consultant/
Avenida de Brasil, 17, Planta 16. 28020 Madrid
Landline: 91.788.34.10
