Dawid is right: because flatMap unwinds the tuples, words.count() would be twice the number of input lines. Use map instead, like this:
    words = lines.map(mapper2)
    for i in words.take(10):
        msg = i[0] + ":" + i[1] + "\n"

-------------------------------------------------------------------------------
Robin East
Spark GraphX in Action
Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/malak/

> On 19 Aug 2015, at 12:19, Dawid Wysakowicz <wysakowicz.da...@gmail.com> wrote:
>
> I am not 100% sure, but flatMap probably unwinds the tuples. Try map
> instead.
>
> 2015-08-19 13:10 GMT+02:00 Jerry OELoo <oylje...@gmail.com>:
> Hi.
> I want to parse a file and return key-value pairs with PySpark, but
> the result is strange to me.
> test.sql is a big file in which each line is a username and a password
> separated by #. I use mapper2 below to map the data. In my
> understanding, each i in words.take(10) should be a tuple, but instead
> each i is a bare username or password. This is strange to me.
> Thanks for your help.
>
> def mapper2(line):
>     words = line.split('#')
>     return (words[0].strip(), words[1].strip())
>
> def main2(sc):
>     lines = sc.textFile("hdfs://master:9000/spark/test.sql")
>     words = lines.flatMap(mapper2)
>     for i in words.take(10):
>         msg = i + ":" + "\n"
>
> --
> Rejoice, I Desire!
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
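To see why flatMap doubles the element count, here is a pure-Python sketch of the two semantics (no Spark needed; the sample lines are made up for illustration). flatMap is map followed by flattening, so each (username, password) tuple returned by mapper2 is unwound into two separate elements, while map keeps one tuple per input line:

```python
def mapper2(line):
    # Split "user#password" into a (user, password) tuple.
    words = line.split('#')
    return (words[0].strip(), words[1].strip())

# Hypothetical sample input standing in for the HDFS file.
lines = ["alice#secret1", "bob#secret2"]

# map semantics: one output element per input line, tuples preserved.
mapped = [mapper2(line) for line in lines]
# -> [('alice', 'secret1'), ('bob', 'secret2')]

# flatMap semantics: each returned tuple is flattened into the output,
# so usernames and passwords become separate elements.
flat_mapped = [item for line in lines for item in mapper2(line)]
# -> ['alice', 'secret1', 'bob', 'secret2']

print(mapped)
print(flat_mapped)
```

This is why flatMap produces twice as many elements as there are input lines, and why iterating over its result yields bare strings rather than tuples.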