Dawid is right: if you called words.count() it would return twice the number
of input lines. You can use map like this:
words = lines.map(mapper2)
for i in words.take(10):
    msg = i[0] + ":" + i[1] + "\n"
---
Robin East
I am not 100% sure, but flatMap probably unwinds the tuples. Try map
instead.
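
To illustrate the difference, here is a minimal pure-Python sketch that mimics what map and flatMap do to the mapper's return value. It does not use Spark itself. The mapper2 shown here is a hypothetical stand-in (the real one is not included in this thread), assumed to split each line on "#" and return a (username, password) tuple. The sample lines are made up.

```python
from itertools import chain


def mapper2(line):
    # Hypothetical stand-in for the poster's mapper2:
    # split "username#password" into a 2-tuple.
    username, password = line.strip().split("#", 1)
    return (username, password)


# Made-up sample input, one "username#password" entry per line.
lines = ["alice#secret1", "bob#secret2"]

# map semantics: one output element per input line, each a tuple.
mapped = [mapper2(line) for line in lines]

# flatMap semantics: each returned tuple is iterated, and its items
# become separate output elements.
flat_mapped = list(chain.from_iterable(mapper2(line) for line in lines))

print(mapped)       # [('alice', 'secret1'), ('bob', 'secret2')]
print(flat_mapped)  # ['alice', 'secret1', 'bob', 'secret2']
print(len(mapped), len(flat_mapped))  # 2 4
```

With flatMap semantics the element count is twice the number of input lines, and each element is a plain string rather than a tuple, so i[0] and i[1] would index single characters.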
2015-08-19 13:10 GMT+02:00 Jerry OELoo :
Hi.
I want to parse a file and return key-value pairs with PySpark, but the
result looks strange to me.
test.sql is a big file in which each line is a username and a password with
# between them. I use the mapper2 below to map the data, and in my
understanding, i in words.take(10) should be a tuple, but the result
is