Is there a good/favorite article on tuning Spark settings within Kylin?

I finally have Spark (2.1.3 as distributed with Kylin 2.5.2) running on my
systems.

My small data set (35M records) runs well with the default settings.

My medium data set (4B records, 40GB compressed source file, 5 measures, 6
dimensions with low cardinality) often dies at Step 3 (Extract Fact Table
Distinct Columns) with out-of-memory errors.

After using exceptionally large memory settings, the job completed, but I'm
trying to see whether further optimization is possible.
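For context, what I've been experimenting with are the kylin.engine.spark-conf.*
overrides in kylin.properties, roughly along these lines (the values below are
just placeholders to show the shape, not the exact numbers I used):

    # kylin.properties -- Kylin passes the spark.* suffix through to the Spark cubing job
    kylin.engine.spark-conf.spark.master=yarn
    kylin.engine.spark-conf.spark.submit.deployMode=cluster
    kylin.engine.spark-conf.spark.executor.instances=40
    kylin.engine.spark-conf.spark.executor.cores=2
    kylin.engine.spark-conf.spark.executor.memory=8G
    kylin.engine.spark-conf.spark.yarn.executor.memoryOverhead=2048
    kylin.engine.spark-conf.spark.driver.memory=4G
    kylin.engine.spark-conf.spark.shuffle.service.enabled=true

The same keys can also be set per cube via the cube's Configuration Overrides,
if that matters for the answer.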

Any suggestions or ideas? I've searched and read up on Spark tuning in general,
but I don't feel I'm making much progress with the settings I've tried so far.

Thanks!
