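I am not 100% sure, but flatMap probably flattens the tuples: it iterates over each tuple that mapper2 returns and emits the username and the password as separate elements. Try map instead, which keeps one tuple per line.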
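To illustrate the difference without a cluster, here is a rough plain-Python sketch. It only imitates what map and flatMap would do with your mapper2; the sample lines are made up, and the list comprehension and itertools.chain stand in for the RDD calls, so treat it as an approximation rather than actual Spark behaviour.

```python
from itertools import chain


def mapper2(line):
    # Same parsing as in the question: "username#password"
    words = line.split('#')
    return (words[0].strip(), words[1].strip())


# Hypothetical sample lines standing in for the contents of test.sql
lines = ["alice # secret1", "bob # secret2"]

# Imitates lines.map(mapper2): one tuple per input line
mapped = [mapper2(line) for line in lines]

# Imitates lines.flatMap(mapper2): every returned tuple is iterated,
# so its elements end up as separate items in the result
flat_mapped = list(chain.from_iterable(mapper2(line) for line in lines))

# With the map-style result, each item can be unpacked as a pair
messages = [user + ":" + password + "\n" for user, password in mapped]
```

In your job, the corresponding change would be words = lines.map(mapper2), after which each i from words.take(10) should be a (username, password) tuple.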
2015-08-19 13:10 GMT+02:00 Jerry OELoo <oylje...@gmail.com>:
> Hi.
> I want to parse a file and return key-value pairs with PySpark, but the
> result is strange to me.
> test.sql is a big file, and each line is a username and a password with
> # between them. I use the mapper2 below to map the data, and in my
> understanding, i in words.take(10) should be a tuple, but the result
> is that i is a username or a password. This is strange to me.
> Thanks for your help.
>
> def mapper2(line):
>     words = line.split('#')
>     return (words[0].strip(), words[1].strip())
>
> def main2(sc):
>     lines = sc.textFile("hdfs://master:9000/spark/test.sql")
>     words = lines.flatMap(mapper2)
>
>     for i in words.take(10):
>         msg = i + ":" + "\n"
>
> --
> Rejoice,I Desire!