Re: Pyspark 2.1.0 weird behavior with repartition

2017-03-11 Thread Olivier Girardot
I kind of reproduced that with PySpark 2.1, also for Hadoop 2.6 and with Python 3.x. I'll look into it a bit more after I've fixed a few other issues regarding the salting of strings on the cluster.

Pyspark 2.1.0 weird behavior with repartition

2017-01-30 Thread Blaž Šnuderl
I am loading a simple text file using pyspark. Repartitioning it seems to produce garbage data. I got these results using Spark 2.1 prebuilt for Hadoop 2.7, using the pyspark shell.

>>> sc.textFile("outc").collect()
[u'a', u'b', u'c', u'd', u'e', u'f', u'g', u'h', u'i', u'j', u'k', u'l']
>>>
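The reported bug is that repartitioning changes the *contents* of the RDD, when it should only change how the same elements are laid out across partitions. As a minimal pure-Python sketch of that invariant (this is an illustration of what `RDD.repartition` is expected to preserve, not Spark's actual shuffle implementation — the function name and round-robin scheme here are assumptions for demonstration):

```python
from collections import Counter

def repartition(partitions, num_partitions):
    """Hypothetical toy repartition: redistribute all elements
    round-robin into num_partitions new partitions. A correct
    repartition changes only the layout, never the data itself."""
    flat = [x for part in partitions for x in part]
    new_parts = [[] for _ in range(num_partitions)]
    for i, x in enumerate(flat):
        new_parts[i % num_partitions].append(x)
    return new_parts

# Two initial partitions holding the letters from the bug report.
before = [list("abcdef"), list("ghijkl")]
after = repartition(before, 4)

# The invariant the report shows being violated in PySpark 2.1:
# the multiset of elements must be identical before and after;
# only the partitioning may differ.
assert Counter(x for p in before for x in p) == \
       Counter(x for p in after for x in p)
```

In Spark terms, `rdd.repartition(n).collect()` should return the same elements (possibly in a different order) as `rdd.collect()`; the report suggests that equality breaks for this text file in PySpark 2.1.0.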