Re: custom rdd - do I need a hadoop input format?

2019-09-18 Thread Marcelo Valle
To implement a custom RDD with getPartitions, do I have to extend `NewHadoopRDD` and supply a Hadoop input format class? If so, which input format could I supply so that the file isn't read all at once and my getPartitions method can split it by block?

On Tue, 17 Sep 2019 at 18:53, Arun Mahadevan
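One alternative the question hints at: a custom RDD does not have to go through `NewHadoopRDD` or a Hadoop InputFormat at all — extending `org.apache.spark.rdd.RDD` directly lets getPartitions do its own block-based splitting. Below is a minimal sketch of that approach; the class names (`BlockPartition`, `BlockFileRDD`) and the fixed-size-block strategy are illustrative assumptions, not part of the thread, and compute is left as a stub.

```scala
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition type carrying one byte range of the file.
case class BlockPartition(index: Int, start: Long, length: Long) extends Partition

// A sketch of a custom RDD that splits a file into fixed-size blocks itself,
// rather than delegating the splitting to a Hadoop InputFormat.
class BlockFileRDD(sc: SparkContext, path: String, fileSize: Long, blockSize: Long)
    extends RDD[String](sc, Nil) {

  override protected def getPartitions: Array[Partition] = {
    // One partition per block; the last block may be shorter.
    val numBlocks = ((fileSize + blockSize - 1) / blockSize).toInt
    (0 until numBlocks).map { i =>
      val start = i.toLong * blockSize
      BlockPartition(i, start, math.min(blockSize, fileSize - start))
    }.toArray
  }

  override def compute(split: Partition, context: TaskContext): Iterator[String] = {
    val p = split.asInstanceOf[BlockPartition]
    // Read only the bytes [p.start, p.start + p.length) here, e.g. via a
    // seekable stream; record-boundary handling is omitted in this sketch.
    Iterator.empty
  }
}
```

Splitting by block this way mirrors what a Hadoop FileInputFormat does internally, but without tying the RDD to the Hadoop input-format API.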

Re: intermittent Kryo serialization failures in Spark

2019-09-18 Thread Vadim Semenov
I remember it not working for us when we were setting it from the inside and needed to actually pass it

On Wed, Sep 18, 2019 at 10:38 AM Jerry Vinokurov wrote:
> Hi Vadim,
>
> Thanks for your suggestion. We do preregister the classes, like so:
>
> object KryoRegistrar {
>   val
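Vadim's point — setting the Kryo options from inside the application did not work, passing them in did — matches the fact that some serializer settings must be in place before the SparkContext (and its executors) start. A hedged sketch of passing them on the command line instead of in code (the registrator class name here is a placeholder, not one from the thread):

```shell
# Pass serializer settings at submit time so they apply before the
# SparkContext is created, rather than setting them inside the job.
spark-submit \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryo.registrator=com.example.MyKryoRegistrator \
  --conf spark.kryo.registrationRequired=true \
  my-app.jar
```

`spark.kryo.registrationRequired=true` makes unregistered classes fail loudly instead of intermittently, which can help surface the kind of failures this thread describes.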

Re: intermittent Kryo serialization failures in Spark

2019-09-18 Thread Jerry Vinokurov
Hi Vadim,

Thanks for your suggestion. We do preregister the classes, like so:

object KryoRegistrar {
  val classesToRegister: Array[Class[_]] = Array(
    classOf[MyModel],
    [etc]
  )
}

And then we do:

val sparkConf = new SparkConf()
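The snippet above is truncated before the SparkConf wiring. A sketch of how such a registrar is typically hooked up, assuming a `KryoRegistrator` subclass (the thread shows an `object KryoRegistrar` instead, and `MyModel` here is a stand-in case class, not the thread's actual model):

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical stand-in for the MyModel class mentioned in the thread.
case class MyModel(id: Long, name: String)

// Register every class explicitly so Kryo never has to fall back to
// writing full class names for unregistered classes.
class AppKryoRegistrator extends KryoRegistrator {
  val classesToRegister: Array[Class[_]] = Array(classOf[MyModel])
  override def registerClasses(kryo: Kryo): Unit =
    classesToRegister.foreach(c => kryo.register(c))
}

val sparkConf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", classOf[AppKryoRegistrator].getName)
  // Fail fast on any class that was not preregistered.
  .set("spark.kryo.registrationRequired", "true")
```

As Vadim notes earlier in the thread, setting these on a SparkConf from inside the job may be too late; passing the same keys via `--conf` at submit time is the variant that worked for him.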