Hi J,
There is a slide deck about Spark tuning in Apache Kylin (author: shaofengshi):
https://www.slideshare.net/ShiShaoFeng1/spark-tunning-in-apache-kylin 


About the OOM at Step 3 (Extract Fact Table Distinct Columns): you can try 
setting the parameter "kylin.engine.mr.uhc-reducer-count" to a larger value 
(the default is 1).
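
As a rough sketch, assuming the setting is applied globally in 
conf/kylin.properties, the override might look like the line below. The value 5 
is only an illustration, not a tested recommendation; pick a value that suits 
your cluster and data.

```properties
# Hypothetical example value; the default noted above is 1.
kylin.engine.mr.uhc-reducer-count=5
```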



------------------
Best Regards,
Chao Long




------------------ Original Message ------------------
From: "Jon Shoberg"<[email protected]>;
Date: Dec 18, 2018 (Tuesday) 11:16
To: "user"<[email protected]>;

Subject: Re: Spark tuning within Kylin? Article? Resource?



Greatly appreciate the response.

I started there, but after OOM errors I began working on the settings for my 
test lab. After minimal success, I thought I'd ask whether there is something 
more in-depth on tuning that other Kylin users have found successful.


Right now I've gone back to a very basic configuration with dynamic allocation 
to see if I can avoid the late-stage OOM errors.


J


On Mon, Dec 17, 2018 at 7:44 PM JiaTao Tao <[email protected]> wrote:

Hope this may help: http://kylin.apache.org/docs/tutorial/cube_spark.html


Jon Shoberg <[email protected]> wrote on Tue, Dec 18, 2018 at 2:34:

Is there a good/favorite article for tuning spark settings within Kylin?

I finally have Spark (2.1.3 as distributed with Kylin 2.5.2) running on my 
systems.


My small data set (35M records) runs well with the default settings.


My medium data set (4B records, 40GB compressed source file, 5 measures, 6 
dimensions with low cardinality) often dies at Step 3 (Extract Fact Table 
Distinct Columns) with out-of-memory errors.


After using exceptionally large memory settings, the job completed, but I'm 
trying to see whether any optimization is possible.


Any suggestions or ideas? I've searched and read about Spark tuning in general, 
but I feel I'm not making much progress optimizing with the settings I've 
tried.


Thanks!
J

 



-- 




Regards!

Aron Tao
