Dawid is right: because flatMap unwinds the tuples, words.count() would be twice the number of input lines. Use map instead, like this:
    words = lines.map(mapper2)
    for i in words.take(10):
        msg = i[0] + ":" + i[1] + "\n"

-------------------------------------------------------------------------------
Robin East
Spark GraphX in Action
Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/malak/

> On 19 Aug 2015, at 12:19, Dawid Wysakowicz <wysakowicz.da...@gmail.com> wrote:
>
> I am not 100% sure, but flatMap probably unwinds the tuples. Try map
> instead.
>
> 2015-08-19 13:10 GMT+02:00 Jerry OELoo <oylje...@gmail.com>:
> Hi.
> I want to parse a file and return key-value pairs with PySpark, but
> the result is strange to me.
> test.sql is a big file in which each line is a username and a password
> separated by #. I use mapper2 below to map the data. In my
> understanding, each i in words.take(10) should be a tuple, but instead
> each i is a bare username or password. This is strange to me.
> Thanks for your help.
>
> def mapper2(line):
>     words = line.split('#')
>     return (words[0].strip(), words[1].strip())
>
> def main2(sc):
>     lines = sc.textFile("hdfs://master:9000/spark/test.sql")
>     words = lines.flatMap(mapper2)
>     for i in words.take(10):
>         msg = i + ":" + "\n"
>
> --
> Rejoice, I Desire!
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
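To see why flatMap doubles the element count, here is a pure-Python sketch of the two semantics (no Spark needed; the sample lines are made up for illustration). flatMap is map followed by flattening, so each (username, password) tuple returned by mapper2 is unwound into two separate elements, while map keeps one tuple per input line:

```python
def mapper2(line):
    # Split "user#password" into a (user, password) tuple.
    words = line.split('#')
    return (words[0].strip(), words[1].strip())

# Hypothetical sample input standing in for the HDFS file.
lines = ["alice#secret1", "bob#secret2"]

# map semantics: one output element per input line, tuples preserved.
mapped = [mapper2(line) for line in lines]
# -> [('alice', 'secret1'), ('bob', 'secret2')]

# flatMap semantics: each returned tuple is flattened into the output,
# so usernames and passwords become separate elements.
flat_mapped = [item for line in lines for item in mapper2(line)]
# -> ['alice', 'secret1', 'bob', 'secret2']

print(mapped)
print(flat_mapped)
```

This is why flatMap produces twice as many elements as there are input lines, and why iterating over its result yields bare strings rather than tuples.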