Re: Spark return key value pair

2015-08-19 Thread Robin East
Dawid is right: if you did words.count() it would be twice the number of input lines. You can use map like this:

    words = lines.map(mapper2)
    for i in words.take(10):
        msg = i[0] + ":" + i[1] + "\n"

--- Robin East
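The loop above can be tried without a cluster. The sketch below is plain Python, not Spark: a list stands in for the `lines` RDD, the built-in `map()` stands in for `lines.map()`, and slicing stands in for `take(10)`. The body of `mapper2` was never shown in the thread, so the version here is a hypothetical parser that assumes each line looks like `username#password`.

```python
# Plain-Python stand-in for the RDD calls (no Spark involved):
# a list plays `lines`, map() plays lines.map(), slicing plays take(10).

def mapper2(line):
    # Hypothetical parser: the original mapper2 was not posted in the thread.
    # Split on the first '#' only, so a '#' inside the password is kept.
    user, password = line.rstrip("\n").split("#", 1)
    return (user, password)

lines = ["alice#secret1\n", "bob#hunter2\n", "carol#pa#ss\n"]

words = list(map(mapper2, lines))  # one (user, password) tuple per input line

msgs = []
for i in words[:10]:                # analogous to words.take(10)
    msg = i[0] + ":" + i[1] + "\n"  # i is a tuple, so i[0] and i[1] are whole strings
    msgs.append(msg)

print("".join(msgs), end="")
```

Because `map` keeps one tuple per line, `i[0]` and `i[1]` index into the tuple rather than into single characters of a string.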

Re: Spark return key value pair

2015-08-19 Thread Dawid Wysakowicz
I am not 100% sure, but flatMap probably unwinds the tuples. Try map instead.

2015-08-19 13:10 GMT+02:00 Jerry OELoo :
> Hi.
> I want to parse a file and return a key-value pair with pySpark, but
> result is strange to me.
> the test.sql is a big fie and each line is usename and password, wit
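Dawid's guess matches how Spark describes `flatMap`: it applies the function and then flattens the results, so a returned 2-tuple is treated as an iterable of two elements. The plain-Python sketch below (no Spark, illustrative data, hypothetical `mapper2`) mimics both operations to show why the flatMap version has twice as many elements and no tuples.

```python
from itertools import chain

def mapper2(line):
    # Hypothetical mapper returning a (username, password) tuple.
    user, password = line.split("#", 1)
    return (user, password)

lines = ["alice#secret1", "bob#hunter2", "carol#pw3"]

# map-like: exactly one output element (a tuple) per input line
mapped = [mapper2(line) for line in lines]

# flatMap-like: each returned tuple is iterated and its elements are
# emitted one by one, so the tuple structure is lost
flat_mapped = list(chain.from_iterable(mapper2(line) for line in lines))

print(len(mapped), mapped[0])             # 3 ('alice', 'secret1')
print(len(flat_mapped), flat_mapped[:2])  # 6 ['alice', 'secret1']
```

With flatMap, `take(10)` would return usernames and passwords interleaved as separate strings, so `i[0]` and `i[1]` would pick out single characters instead of fields.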

Spark return key value pair

2015-08-19 Thread Jerry OELoo
Hi. I want to parse a file and return a key-value pair with PySpark, but the result is strange to me. The test.sql is a big file and each line is a username and password with # between them. I use the mapper2 below to map the data, and in my understanding, i in words.take(10) should be a tuple, but the result is
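For the parsing step itself, here is a minimal sketch of a mapper for `username#password` lines. The name `parse_credentials` and the choice to drop lines without a `#` are assumptions for illustration, not Jerry's actual code. It uses `str.partition`, which splits on the first `#` only and reports whether the separator was found.

```python
def parse_credentials(line):
    """Hypothetical mapper: turn 'username#password' into (username, password).

    Splits on the first '#' only, so a '#' inside the password is kept.
    Returns None for lines without a '#' so they can be filtered out.
    """
    user, sep, password = line.rstrip("\r\n").partition("#")
    if not sep:  # no '#' on this line: treat it as malformed
        return None
    return (user, password)

sample = ["jerry#s3cret\n", "no-separator-here\n", "dawid#a#b\n"]
pairs = [p for p in map(parse_credentials, sample) if p is not None]
print(pairs)  # [('jerry', 's3cret'), ('dawid', 'a#b')]
```

In PySpark the same idea would be `lines.map(parse_credentials).filter(lambda p: p is not None)`, which keeps one tuple per valid line.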