https://stackoverflow.com/questions/26562033/how-to-set-apache-spark-executor-memory
Regards,
Vaquar Khan

On Mon, Nov 13, 2017 at 6:22 PM, Alec Swan <alecs...@gmail.com> wrote:
> Hello,
>
> I am using the Spark library to convert JSON/Snappy files to ORC/ZLIB
> format. Effectively, my Java service starts up an embedded Spark cluster
> (master=local[*]) and uses Spark SQL to convert JSON to ORC. However, I
> keep getting OOM errors with large (~1 GB) files.
>
> I've tried different ways to reduce memory usage, e.g. by partitioning
> data with dataSet.partitionBy("customer").save(filePath), or by capping
> memory usage with spark.executor.memory=1G, but to no avail.
>
> I am wondering if there is a way to avoid OOM besides splitting the source
> JSON file into multiple smaller ones and processing the small ones
> individually. Does Spark SQL have to read the JSON/Snappy (row-based) file
> in its entirety before converting it to ORC (columnar)? If so, would it
> make sense to create a custom receiver that reads the Snappy file and use
> Spark Streaming for the ORC conversion?
>
> Thanks,
>
> Alec

--
Regards,
Vaquar Khan
+1 224-436-0783
Greater Chicago
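One point worth noting for the embedded master=local[*] setup described above: in local mode the executors run inside the same JVM as the driver (here, the Java service), so spark.executor.memory generally has no effect. The heap available to Spark is governed by the host JVM's -Xmx (or spark.driver.memory when launching via spark-submit), which must be set before the JVM starts. A minimal sketch; the jar name and heap sizes below are illustrative assumptions, not values from the thread:

```
# Illustrative: embedded local[*] Spark shares the service JVM's heap,
# so raise -Xmx on the service itself (sizes are examples only)
java -Xmx4g -jar my-spark-service.jar

# Equivalent when launching through spark-submit instead of an embedded service:
spark-submit --master "local[*]" --driver-memory 4g my-spark-job.jar
```

This matches the advice in the StackOverflow link above: in local mode it is the driver's memory setting, not the executor's, that bounds the job.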