grows regularly throughout the execution
until no free space is available, despite the call to the GC.
Aurelien
On 9/8/15 6:22 PM, Aurélien Bellet wrote:
Hi,
This is what I tried:
for i in range(1000):
    print i
    data2=data.repartition(50).cache()
    if (i+1) % 10 == 0
Aurélien Bellet <aurelien.bel...@telecom-paristech.fr>:
Thanks a lot for the useful link and comments Alexis!
First of all, the problem occurs without doing anything else in the
code (except of course loading my data from HDFS at the beginning) -
so it definitely come
2015-09-01 22:48 GMT+08:00 Aurélien Bellet <aurelien.bel...@telecom-paristech.fr>:
Dear Alexis,
Thanks again for your reply. After reading about checkpointing I have
modified my sample code as follows:
for i in range(1000):
    print i
    data2=data.repartition(50).cache()
    if (i+1) % 10 == 0:
        data2.checkpoint()
    data2.first() # materialize rdd
    data.unpers
=
rdd.sample(true,0.01,42).mapPartitions(scala.util.Random.shuffle)
val sample2 =
rdd.sample(true,0.01,43).mapPartitions(scala.util.Random.shuffle)
...
On Fri, Apr 17, 2015 at 3:05 AM, Aurélien Bellet <aurelien.bel...@telecom-paristech.fr> wrote:
Hi Sean,
Thanks a lot for your reply. The problem is that I need to sample random
*independent* pairs. If I draw two samples and build all n*(n-1) pairs
then there is a lot of dependency. My current solution is also not
satisfying because some pairs (the closest ones in a partition) have a
mu
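
To make the "independent pairs" requirement above concrete, here is a minimal local sketch in plain Python (not Spark, and not the approach used in this thread). It draws each pair on its own, uniformly over the n*(n-1) ordered pairs with distinct elements, so no two drawn pairs are tied to a shared small sample. The function name sample_independent_pairs and its parameters are hypothetical, chosen only for illustration.

```python
import random

def sample_independent_pairs(n, k, seed=None):
    """Draw k ordered index pairs (i, j) with i != j.

    Each pair is uniform over the n*(n-1) possible ordered pairs and is
    drawn independently of the others (so repeats are possible).
    """
    if n < 2:
        raise ValueError("need at least two items to form a pair")
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        i = rng.randrange(n)
        # Draw j uniformly from the n-1 indices other than i:
        # pick from range(n-1), then skip over i.
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1
        pairs.append((i, j))
    return pairs

pairs = sample_independent_pairs(n=10, k=1000, seed=0)
```

In a distributed setting the drawn indices would still have to be matched back to records, for instance by indexing the RDD with zipWithIndex() and joining on the index; that step is an assumption on my side and is not shown here.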