Re: Local file being referenced in mapper function

2014-05-30 Thread Rahul Bhojwani
Thanks Jey, it was helpful.

On Sat, May 31, 2014 at 12:45 AM, Rahul Bhojwani <rahulbhojwani2...@gmail.com> wrote:
> Thanks Marcelo,
>
> It actually made a few of my concepts clear. (y)
>
> On Fri, May 30, 2014 at 10:14 PM, Marcelo Vanzin wrote:
>> Hello there,
>>
>> On Fri, May 30, 2014 at 9

Re: Local file being referenced in mapper function

2014-05-30 Thread Rahul Bhojwani
Thanks Marcelo,

It actually made a few of my concepts clear. (y)

On Fri, May 30, 2014 at 10:14 PM, Marcelo Vanzin wrote:
> Hello there,
>
> On Fri, May 30, 2014 at 9:36 AM, Marcelo Vanzin wrote:
> > workbook = xlsxwriter.Workbook('output_excel.xlsx')
> > worksheet = workbook.add_worksheet()
> >

Re: Local file being referenced in mapper function

2014-05-30 Thread Jey Kottalam
Hi Rahul,

Marcelo's explanation is correct. Here's a possible approach to your program, in pseudo-Python:

    # connect to Spark cluster
    sc = SparkContext(...)

    # load input data
    input_data = load_xls(file("input.xls"))
    input_rows = input_data['Sheet1'].rows

    # create RDD on cluster
    input_rdd = sc.p
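As a rough illustration of the general pattern behind this kind of answer: functions passed to an RDD's map() run on the executors, so they cannot write to a file handle that was opened on the driver. A common workaround is to keep the mapper pure and write the output on the driver after collecting the results. The sketch below is only an assumption about how that could look, not the code from this thread. The names parse_line and write_rows are hypothetical, the comma delimiter is a placeholder (the delimiter in the original question is not shown), and the csv module stands in for an Excel writer so the example runs without extra packages or a cluster.

```python
import csv
import io


def parse_line(line, delimiter=","):
    """Pure per-record function: touches no local files, so it is safe to ship to workers.

    The delimiter is a placeholder assumption, not taken from the thread.
    """
    return [field.strip() for field in line.split(delimiter)]


def write_rows(rows, out):
    """Driver-side step: write rows that have already been collected to a local file object."""
    writer = csv.writer(out)
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


# With Spark, the rows would come from something like:
#   rows = sc.textFile("xyz.txt").map(parse_line).collect()
# A plain list stands in for the collected result here so the sketch runs without a cluster.
sample_lines = ["a, b, c", "1, 2, 3"]
rows = [parse_line(line) for line in sample_lines]

# An in-memory buffer stands in for the local output file.
buffer = io.StringIO()
written = write_rows(rows, buffer)
```

Note that collect() pulls the whole result into driver memory, so this only suits output small enough to fit there, which is usually the case for data destined for a spreadsheet.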

Re: Local file being referenced in mapper function

2014-05-30 Thread Marcelo Vanzin
Hello there,

On Fri, May 30, 2014 at 9:36 AM, Marcelo Vanzin wrote:
> workbook = xlsxwriter.Workbook('output_excel.xlsx')
> worksheet = workbook.add_worksheet()
>
> data = sc.textFile("xyz.txt")
> # xyz.txt is a file whose each line contains string delimited by
>
> row=0
>
> def mapperFunc(x):
>

Re: Local file being referenced in mapper function

2014-05-30 Thread Marcelo Vanzin
Hi Rahul,

I'll just copy & paste your question here to aid with context, and reply afterwards.

-

Can I write the RDD data to an Excel file along with the mapping in Apache Spark? Is that a correct way? Won't the writing be a local operation that can't be passed over the cluster?

Below is g

Local file being referenced in mapper function

2014-05-30 Thread Rahul Bhojwani
Hi,

I recently posted a question on Stack Overflow but didn't get any reply, so I have now joined the mailing list. Can any of you guide me on the problem described at http://stackoverflow.com/questions/23923966/writing-the-rdd-data-in-excel-file-along-mapping-in-apache-spark

Thanks in advance