Custom Serialization

2014-07-02 Thread Andrea Esposito
Hi, I have a non-serializable class and, as a workaround, I'm trying to re-instantiate it at each deserialization. Thus, I created a wrapper class and overrode the writeObject and readObject methods as follows: private def writeObject(oos: ObjectOutputStream) { oos.defaultWriteObject()
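[Editor's note: for reference, a minimal sketch of such a wrapper; the class and field names below are hypothetical, not from the original message.]

    import java.io.{ObjectInputStream, ObjectOutputStream}

    class HeavyResource(val config: String)          // stands in for the non-serializable class

    class ResourceWrapper(val config: String) extends Serializable {
      @transient var resource: HeavyResource = new HeavyResource(config)

      private def writeObject(oos: ObjectOutputStream): Unit = {
        oos.defaultWriteObject()                     // write only the serializable fields
      }

      private def readObject(ois: ObjectInputStream): Unit = {
        ois.defaultReadObject()                      // restore the serializable fields
        resource = new HeavyResource(config)         // re-instantiate on deserialization
      }
    }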

Re: KryoSerializer Exception

2014-05-30 Thread Andrea Esposito
Hi, I just migrated to 1.0 and I'm still having the same issue, either with or without the custom registrator. Just using the KryoSerializer triggers the exception immediately. I set the Kryo settings through the property: System.setProperty(spark.serializer, org.apache.spark.serializer.
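[Editor's note: in Spark 1.0 the same settings can also be passed through SparkConf; a sketch, where the registrator class name is hypothetical.]

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("kryo-example")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "mypackage.MyKryoRegistrator")  // hypothetical registrator class

    val sc = new SparkContext(conf)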

Re: KryoSerializer Exception

2014-05-16 Thread Andrea Esposito
Up: doesn't anyone know anything about it? ^^ 2014-05-06 12:05 GMT+02:00 Andrea Esposito and1...@gmail.com: Hi there, sorry if I'm posting a lot lately. I'm trying to add the KryoSerializer but I receive this exception: 2014-05-06 11:45:23 WARN TaskSetManager: 62 - Loss was due

Re: Incredible slow iterative computation

2014-05-06 Thread Andrea Esposito
Thanks all for helping. Following Earthson's tip, I resolved it. I have to report that if you materialize the RDD and only afterwards try to checkpoint it, the operation doesn't happen. newRdd = oldRdd.map(myFun).persist(myStorageLevel) newRdd.foreach(x => myFunLogic(x)) // Here materialized for other
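[Editor's note: a minimal sketch of the ordering this message reports; the identifiers oldRdd, myFun, myStorageLevel and myFunLogic come from the snippet above, while the SparkContext and checkpoint directory are assumptions.]

    sc.setCheckpointDir("hdfs:///tmp/checkpoints")   // hypothetical directory

    val newRdd = oldRdd.map(myFun).persist(myStorageLevel)
    newRdd.checkpoint()                  // must be requested BEFORE any action runs
    newRdd.foreach(x => myFunLogic(x))   // first action materializes and checkpoints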

KryoSerializer Exception

2014-05-06 Thread Andrea Esposito
Hi there, sorry if I'm posting a lot lately. I'm trying to add the KryoSerializer but I receive this exception: 2014-05-06 11:45:23 WARN TaskSetManager: 62 - Loss was due to java.io.EOFException java.io.EOFException at
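[Editor's note: for context, a registrator of the kind discussed in this thread typically looks like the sketch below; the registered classes are hypothetical placeholders.]

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.serializer.KryoRegistrator

    case class Vertex(id: Long, value: Double)       // placeholder application classes
    case class Edge(src: Long, dst: Long)

    class MyKryoRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo): Unit = {
        kryo.register(classOf[Vertex])
        kryo.register(classOf[Edge])
      }
    }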

Re: Incredible slow iterative computation

2014-05-05 Thread Andrea Esposito
Update: the checkpointing doesn't happen. I checked with the isCheckpointed method but it always returns false. ??? 2014-05-05 23:14 GMT+02:00 Andrea Esposito and1...@gmail.com: Checkpointing doesn't seem to help. I do it at each iteration/superstep. Looking deeper, the RDDs are recomputed just
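[Editor's note: a small sketch of why isCheckpointed can read false: it only becomes true after an action runs on an RDD for which checkpoint() was requested beforehand, assuming a checkpoint directory is set on the SparkContext; rdd is an assumed existing RDD.]

    rdd.checkpoint()              // request checkpointing first
    println(rdd.isCheckpointed)   // still false: nothing has been computed yet
    rdd.count()                   // an action triggers computation and the checkpoint
    println(rdd.isCheckpointed)   // true once the checkpoint has completed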

Re: cache not work as expected for iteration?

2014-05-04 Thread Andrea Esposito
Maybe your memory isn't enough to hold the current RDD and also all the past ones? RDDs that are cached or persisted have to be unpersisted explicitly; no auto-unpersist exists (maybe that will change in version 1.0?). Be careful: calling cache() or persist() doesn't imply the RDD will be
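[Editor's note: a rough sketch of explicit unpersisting in an iterative loop; the loop structure and the names initialRdd, step and numIterations are illustrative, not from the thread.]

    import org.apache.spark.storage.StorageLevel

    var current = initialRdd.persist(StorageLevel.MEMORY_ONLY)
    for (i <- 1 to numIterations) {
      val next = current.map(step).persist(StorageLevel.MEMORY_ONLY)
      next.count()            // materialize the new RDD before dropping its parent
      current.unpersist()     // explicitly free the previous iteration's blocks
      current = next
    }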

Re: Efficient Aggregation over DB data

2014-05-01 Thread Andrea Esposito
Hi Sai, I honestly can't figure out where you are using the RDDs (the split method isn't defined on them). In any case, you should use the map function instead of foreach when your function is NOT idempotent, since some partitions could be recomputed, executing the function multiple times.
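[Editor's note: a minimal sketch of the recomputation risk described here; the case class and the DB helper are hypothetical stubs, and rdd is assumed to be an existing RDD[Row].]

    case class Row(key: String)

    def incrementCounterInDb(key: String): Unit = { /* non-idempotent write */ }

    // Risky: if a partition is recomputed (task retry, lost block),
    // the side effect runs again for every row in that partition.
    rdd.foreach(row => incrementCounterInDb(row.key))

    // Safer: aggregate with transformations, then do one write per key on the driver.
    rdd.map(row => (row.key, 1L))
       .reduceByKey(_ + _)
       .collect()
       .foreach { case (key, count) => /* idempotent upsert of count */ }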

Re: what is the difference between persist() and cache()?

2014-04-13 Thread Andrea Esposito
AFAIK cache() is just a shortcut for the persist method with MEMORY_ONLY as the storage level. From the source code of RDD: /** Persist this RDD with the default storage level (`MEMORY_ONLY`). */ def persist(): RDD[T] = persist(StorageLevel.MEMORY_ONLY) /** Persist this RDD with the default
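[Editor's note: a quick illustration of the equivalence, assuming some existing RDD named someRdd.]

    import org.apache.spark.storage.StorageLevel

    val a = someRdd.cache()                                  // same as the line below
    val b = someRdd.persist(StorageLevel.MEMORY_ONLY)        // explicit default level

    // persist() also accepts other levels that cache() cannot express:
    val c = someRdd.persist(StorageLevel.MEMORY_AND_DISK_SER)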