Hi,
I have a non-serializable class, and as a workaround I'm trying to re-instantiate it at each deserialization. So I created a wrapper class and overrode the writeObject and readObject methods as follows:
private def writeObject(oos: ObjectOutputStream): Unit = {
  oos.defaultWriteObject()
}
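For clarity, this is roughly the whole wrapper I have in mind; Heavy, config and HeavyWrapper are made-up names for illustration:

import java.io.{IOException, ObjectInputStream, ObjectOutputStream}

class Heavy(val config: String) // the non-serializable class (hypothetical)

class HeavyWrapper(val config: String) extends Serializable {
  // not written during serialization; rebuilt on read
  @transient private var heavy: Heavy = new Heavy(config)

  @throws(classOf[IOException])
  private def writeObject(oos: ObjectOutputStream): Unit = {
    oos.defaultWriteObject() // writes only the serializable fields (config)
  }

  @throws(classOf[IOException])
  private def readObject(ois: ObjectInputStream): Unit = {
    ois.defaultReadObject()
    heavy = new Heavy(config) // re-instantiate at each deserialization
  }
}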
Hi,
I just migrated to 1.0 and I'm still having the same issue, either with or without the custom registrator. Just using the KryoSerializer triggers the exception immediately.
I set the Kryo settings through the property:

System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
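For reference, the equivalent SparkConf-based setup in 1.0 would look something like this (MyRegistrator is a placeholder for a custom registrator class):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("kryo-test")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "mypackage.MyRegistrator") // optional custom registrator
val sc = new SparkContext(conf)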
UP, does anyone know anything about this? ^^
2014-05-06 12:05 GMT+02:00 Andrea Esposito and1...@gmail.com:
Hi there,
sorry if I'm posting a lot lately.
I'm trying to add the KryoSerializer, but I receive this exception:
2014-05-06 11:45:23 WARN TaskSetManager:62 - Loss was due to java.io.EOFException
Thanks all for helping.
Following Earthson's tip, I resolved it. I have to report that if you materialize the RDD and only afterwards try to checkpoint it, the operation doesn't take effect.
val newRdd = oldRdd.map(myFun).persist(myStorageLevel)
newRdd.foreach(x => myFunLogic(x)) // materialized here for other purposes
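So the ordering that works is to mark the RDD for checkpointing before the first action materializes it; a minimal sketch with the same names as above:

val newRdd = oldRdd.map(myFun).persist(myStorageLevel)
newRdd.checkpoint()                // mark BEFORE any action runs on it
newRdd.foreach(x => myFunLogic(x)) // first action: computes, persists and checkpoints
assert(newRdd.isCheckpointed)      // now true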
Hi there,
sorry if I'm posting a lot lately.
I'm trying to add the KryoSerializer, but I receive this exception:

2014-05-06 11:45:23 WARN TaskSetManager:62 - Loss was due to java.io.EOFException
java.io.EOFException
at
Update: checkpointing doesn't take effect. I checked with the isCheckpointed method, but it always returns false. ???
2014-05-05 23:14 GMT+02:00 Andrea Esposito and1...@gmail.com:
Checkpoint doesn't seem to help. I do it at each iteration/superstep. Looking more deeply, the RDDs are recomputed just
Maybe your memory isn't enough to hold the current RDD together with all the past ones?
RDDs that are cached or persisted have to be unpersisted explicitly; there is no auto-unpersist (maybe that will change in the 1.0 release?).
Be careful: calling cache() or persist() doesn't imply the RDD will be materialized right away; that only happens when an action runs on it.
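A sketch of the explicit unpersisting I mean for an iterative job (initialRdd and numIterations are made-up names):

var current = initialRdd.persist(myStorageLevel)
for (i <- 1 to numIterations) {
  val next = current.map(myFun).persist(myStorageLevel)
  next.foreach(x => myFunLogic(x)) // action: only now is `next` materialized
  current.unpersist()              // no auto-unpersist: free the previous step by hand
  current = next
}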
Hi Sai,
I honestly can't figure out where you are using the RDDs (the split method isn't defined on them). In any case, you should use the map function instead of foreach, because foreach is NOT idempotent: some partitions could be recomputed, and the function would then be executed multiple times.
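Roughly what I mean, with myFunLogic standing in for your side-effecting logic:

// risky: if a partition is lost and recomputed, the side effects run again
rdd.foreach(x => myFunLogic(x))

// safer: capture the results as data in a new RDD; recomputation then just
// rebuilds the same data instead of repeating external side effects
val results = rdd.map(x => myFunLogic(x))
results.count()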
AFAIK cache() is just a shortcut for the persist method with MEMORY_ONLY as the storage level.
From the source code of RDD:

/** Persist this RDD with the default storage level (`MEMORY_ONLY`). */
def persist(): RDD[T] = persist(StorageLevel.MEMORY_ONLY)

/** Persist this RDD with the default storage level (`MEMORY_ONLY`). */
def cache(): RDD[T] = persist()