I can more or less reproduce this, with pyspark 2.1 (prebuilt for hadoop 2.6)
and with python 3.x.
I'll look into it a bit more after I've fixed a few other issues regarding
the salting of strings on the cluster.
2017-01-30 20:19 GMT+01:00 Blaž Šnuderl :
> I am loading a simple text file using pyspark. Repartitioning it seems to
> produce garbage data.
>
> I got these results using spark 2.1 prebuilt for hadoop 2.7, in the pyspark
> shell.
>
> >>> sc.textFile("outc").collect()
> [u'a', u'b', u'c', u'd', u'e', u'f', u'g', u'h', u'i', u'j', u'k', u'l']
> >>>