Hi J,

There is a slide deck about Spark tuning in Apache Kylin (author: shaofengshi): https://www.slideshare.net/ShiShaoFeng1/spark-tunning-in-apache-kylin
Regarding the OOM in Step 3 (Extract Fact Table Distinct Columns), you can try setting the parameter "kylin.engine.mr.uhc-reducer-count" to a larger value (default 1).

------------------
Best Regards,
Chao Long

------------------ Original Message ------------------
From: "Jon Shoberg"<[email protected]>;
Sent: 2018-12-18 (Tuesday) 11:16
To: "user"<[email protected]>;
Subject: Re: Spark tuning within Kylin? Article? Resource?

Greatly appreciate the response.

I started there, but after OOM errors I started to work on the settings for my test lab. After minimal success, I thought to ask if there was something more in-depth on tuning that other Kylin users have found successful.

Right now I've gone back to a very basic configuration with dynamic allocation to see if I can avoid the late-stage OOM errors.

J

On Mon, Dec 17, 2018 at 7:44 PM JiaTao Tao <[email protected]> wrote:

Hope this may help: http://kylin.apache.org/docs/tutorial/cube_spark.html

Jon Shoberg <[email protected]> wrote on Tue, Dec 18, 2018 at 2:34:

Is there a good/favorite article for tuning Spark settings within Kylin?

I finally have Spark (2.1.3, as distributed with Kylin 2.5.2) running on my systems. My small data set (35M records) runs well with the default settings. My medium data set (4B records, 40GB compressed source file, 5 measures, 6 dimensions with low cardinality) often dies at Step 3 (Extract Fact Table Distinct Columns) with out-of-memory errors.

After using exceptionally large memory settings the job completed, but I'm trying to see if there is an optimization possible.

Any suggestions or ideas? I've searched and read about Spark tuning in general, but otherwise I feel I'm not making much progress on optimizing with the settings I've tried.

Thanks!
J

--
Regards!
Aron Tao
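
The settings discussed in this thread might look roughly like the following in kylin.properties. This is only a sketch under assumptions: "kylin.engine.mr.uhc-reducer-count" and its default of 1 come from Chao Long's reply, but the value 5 is an arbitrary illustration. The "kylin.engine.spark-conf.*" lines are one assumed way to express the "very basic configuration with dynamic allocation" that Jon describes; their values are placeholders and have not been tested against his data set.

```properties
# From the reply above: more reducers for the distinct-columns step (default 1).
# The value 5 is illustrative only; tune it for your own cube.
kylin.engine.mr.uhc-reducer-count=5

# Assumed baseline for Spark cubing with dynamic allocation (placeholder values).
# Dynamic allocation on YARN also needs the external shuffle service enabled.
kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
kylin.engine.spark-conf.spark.shuffle.service.enabled=true
kylin.engine.spark-conf.spark.executor.memory=4G
```

Changing one setting at a time and rebuilding makes it easier to tell which change actually affects the Step 3 memory errors.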
