Hello All,

I implemented an algorithm using both the RDD and the Dataset APIs (in Spark 1.6). The Dataset version uses a lot more memory than the RDD version. Is this normal? Even for very small input data it runs out of memory, and I get a Java heap space exception.
I tried the Kryo serializer by registering the classes, and I set spark.kryo.registrationRequired to true. Now I get the following exception:

com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Class is not registered: org.apache.spark.sql.types.StructField[]
Note: To register this class use: kryo.register(org.apache.spark.sql.types.StructField[].class);

I tried registering it with

conf.registerKryoClasses(Array(classOf[StructField[]]))

but StructField[] does not compile in Scala. Is there another way to register the array class? I have already registered StructField itself.

Regards,
Raghava.
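For context, a sketch of what I think the registration would have to look like: `StructField[]` is Java array syntax, and in Scala an array class is written `Array[T]`, so presumably `classOf[Array[StructField]]` is the way to name it (this is my assumption about the fix, not something I have confirmed works end to end):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.types.StructField

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrationRequired", "true")

// In Scala there is no `classOf[StructField[]]`; the array class
// corresponding to StructField[] in the error message is written
// classOf[Array[StructField]].
conf.registerKryoClasses(Array(
  classOf[StructField],
  classOf[Array[StructField]]
))
```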