How about making the range in the for loop parallelised? The driver will then kick off the word counts independently.
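A minimal sketch of that idea, under a few assumptions: Spark 1.3 or later (so `reduceByKey` is available on pair RDDs without an extra import), a standalone driver program, and a hypothetical output path `counts_col_$i` that is not from the original question. It uses Scala Futures rather than calling `.par` on the range, because parallel collections moved to a separate module in Scala 2.13. The principle is the same: each action is submitted from its own driver thread, and Spark's scheduler accepts jobs from multiple threads.

```scala
import org.apache.spark.{SparkConf, SparkContext}

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ParallelColumnCounts {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ParallelColumnCounts"))

    // Split each line once and cache, since all ten jobs read the same data.
    val rows = sc.textFile("hdfs://namenode/data.txt")
      .map(_.split("\t", -1))
      .cache()

    // One Future per column: each submits its own Spark job from a separate thread.
    val jobs = (0 until 10).map { i =>
      Future {
        rows.map(cols => (cols(i), 1))
          .reduceByKey(_ + _)
          .saveAsTextFile(s"counts_col_$i") // hypothetical output path
      }
    }

    // Block the driver until every column's job has finished (or one has failed).
    Await.result(Future.sequence(jobs), Duration.Inf)
    sc.stop()
  }
}
```

On Scala 2.12 and earlier, replacing the Futures with `(0 until 10).par.foreach { i => ... }` should give similar behaviour. Either way, the number of jobs in flight is bounded by the driver's thread pool, and how they share the cluster depends on the scheduler mode (FIFO by default).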
Regards,
Guy Needham | Data Discovery
Virgin Media | Technology and Transformation | Data

From: ayan guha
Sent: 18 May 2015 15:46
To: Laeeq Ahmed
Cc: [email protected]
Subject: Re: Processing multiple columns in parallel

My first thought would be to create 10 RDDs and run your word count on each of them. I think the Spark scheduler will resolve the dependencies in parallel and launch 10 jobs.

Best
Ayan

On 18 May 2015 23:41, "Laeeq Ahmed" <[email protected]> wrote:

Hi,

Consider a tab-delimited text file with 10 columns. Each column is a set of text. I would like to do a word count for each column. In Scala, I would do the following RDD transformation and action:

    val data = sc.textFile("hdfs://namenode/data.txt")
    for (i <- 0 until 10) {
      // Count occurrences of each value in column i and save them under a per-column path
      data.map(_.split("\t", -1)(i)).map((_, 1)).reduceByKey(_ + _).saveAsTextFile(i.toString)
    }

Within the for loop, each job runs as a parallel process, but the columns are processed sequentially, from 0 to 9. Is there any way to process multiple columns in parallel in Spark? I saw a post about using Akka, but RDD itself already uses Akka. Any pointers would be appreciated.

Regards,
Laeeq
