Re: SparkR : lapplyPartition transforms the data in vertical format
Hello Shivaram,

Thanks for your reply. Here is a simple input data set. The data is in a file called /sparkdev/datafiles/covariance.txt:

    1,1
    2,2
    3,3
    4,4
    5,5
    6,6
    7,7
    8,8
    9,9
    10,10

The output I would like to see is the total of each column. It can be done with reduce, but I wanted to test lapply.

The output I want is the sum of the columns in a single row:

    55,55

But the output I get is in two rows:

    55, NA
    55, NA

Thanks
Pranay

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-lapplyPartition-transforms-the-data-in-vertical-format-tp11540p11617.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: SparkR : lapplyPartition transforms the data in vertical format
Hi Pranay,

If this data format is to be assumed, then I believe the issue starts at

    lines <- textFile(sc, "/sparkdev/datafiles/covariance.txt")
    totals <- lapply(lines, function(lines)

After the first line, `lines` becomes an RDD of strings, each of which is a line of the form `1,1`. Therefore, lapply() should be used to map over each line, like this:

    totals <- lapply(lines, function(line) ...  # modified logic that treats each line as having the form `x,x`

I only took a quick glance, so let me know if this method still doesn't work!

On Wed, Aug 6, 2014 at 11:29 PM, Pranay Dave pranay.da...@gmail.com wrote:
> Output I want to see is sum of columns in same row 55,55
> But output what I get is in two rows 55, NA 55, NA
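A minimal sketch of the per-line approach suggested above, written in base R so it runs without SparkR. The character vector `lines` is only a stand-in for the RDD of strings, and `parse_line` is a hypothetical helper name; `Reduce` plays the role that reduce would play on the RDD.

```r
# Base R stand-in for the RDD of lines (an assumption; no SparkR involved).
lines <- c("1,1", "2,2", "3,3", "4,4", "5,5",
           "6,6", "7,7", "8,8", "9,9", "10,10")

# Per-line function: split "x,y" on the comma and convert both fields to numbers.
parse_line <- function(line) {
  as.numeric(unlist(strsplit(line, ",")))
}

# Map over each line, one record at a time.
pairs <- lapply(lines, parse_line)

# Add the per-line pairs element-wise to get the two column totals.
totals <- Reduce(`+`, pairs)
cat(totals[1], ",", totals[2], "\n", sep = "")
```

With the ten-line input from the thread, this prints the two column totals on one row.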
Re: SparkR : lapplyPartition transforms the data in vertical format
Hello Zongheng,

In fact, the problem is in lapplyPartition. lapply gives output as

    1,1
    2,2
    3,3
    ...
    10,10

However, lapplyPartition gives output as

    55, NA
    55, NA

Why is the lapply output horizontal and the lapplyPartition output vertical?

Here is my code:

    library(SparkR)
    sc <- sparkR.init("local")
    lines <- textFile(sc, "/sparkdev/datafiles/covariance.txt")
    totals <- lapplyPartition(lines, function(lines) {
      sumx <- 0
      sumy <- 0
      totaln <- 0
      for (i in 1:length(lines)) {
        dataxy <- unlist(strsplit(lines[i], ","))
        sumx <- sumx + as.numeric(dataxy[1])
        sumy <- sumy + as.numeric(dataxy[2])
      }
      ## list(as.numeric(sumx), as.numeric(sumy), as.numeric(sumxy), as.numeric(totaln))
      ## list does same as below
      c(sumx, sumy)
    })
    output <- collect(totals)
    for (element in output) {
      cat(as.character(element[1]), as.character(element[2]), "\n")
    }

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-lapplyPartition-transforms-the-data-in-vertical-format-tp11540p11726.html
Re: SparkR : lapplyPartition transforms the data in vertical format
I tried this out. What is happening is that, because the input file is small, only one partition is created. lapplyPartition runs the given function on that partition and computes sumx as 55 and sumy as 55. The return value of the function passed to lapplyPartition is treated as a list by SparkR, and collect concatenates the lists from all partitions. The output in this case is therefore just a list with two values, each a single number, so accessing element[2] in the for loop gives NA. If you just use

    cat(as.character(element), "\n")

you should see 55 and 55.

Thanks
Shivaram

On Thu, Aug 7, 2014 at 3:21 PM, Pranay Dave pranay.da...@gmail.com wrote:
> Why lapply output is horizontal and lapplyPartition is vertical ?
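The flattening described above can be reproduced in base R without SparkR. This is only a sketch: `simulate_collect` and `sum_columns` are hypothetical names, and the simulation assumes nothing beyond what the explanation states (each partition's return value is treated as a list of elements, and collect concatenates those lists). Wrapping the pair in `list(...)` is one possible way to get a single row back; it is not something the thread itself proposes.

```r
# Hypothetical stand-in for lapplyPartition() followed by collect():
# treat each partition's return value as a list of elements, then concatenate.
simulate_collect <- function(partitions, func) {
  results <- lapply(partitions, function(part) as.list(func(part)))
  do.call(c, results)
}

# A single partition holding the whole file, as happens with a small input.
partition <- c("1,1", "2,2", "3,3", "4,4", "5,5",
               "6,6", "7,7", "8,8", "9,9", "10,10")

# Per-partition function: sum the first and second field over all lines.
sum_columns <- function(lines) {
  fields <- strsplit(lines, ",")
  sumx <- sum(as.numeric(sapply(fields, `[`, 1)))
  sumy <- sum(as.numeric(sapply(fields, `[`, 2)))
  c(sumx, sumy)
}

# Returning c(sumx, sumy) yields two separate elements of length one,
# so element[2] is NA for each of them.
flat <- simulate_collect(list(partition), sum_columns)
for (element in flat) {
  cat(as.character(element[1]), as.character(element[2]), "\n")
}

# Wrapping the pair in a list keeps it together as one element.
wrapped <- simulate_collect(list(partition),
                            function(lines) list(sum_columns(lines)))
for (element in wrapped) {
  cat(as.character(element[1]), as.character(element[2]), "\n")
}
```

The first loop iterates twice over length-one elements, while the second iterates once over a single element holding both totals.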
Re: SparkR : lapplyPartition transforms the data in vertical format
The output of lapply and lapplyPartition should be the same by design. The only difference is that in lapply the user-defined function returns a row, while in lapplyPartition it returns a list. Could you give an example of a small input and the output that you expect to see for the above program?

Shivaram

On Wed, Aug 6, 2014 at 5:47 AM, Pranay Dave pranay.da...@gmail.com wrote:
> Hello
>
> As per documentation, lapply works on single records and lapplyPartition works on partition
> However the format of output does not change
> When I use lapplyPartition, the data is converted to vertical format
>
> Here is my code
>
>     library(SparkR)
>     sc <- sparkR.init("local")
>     lines <- textFile(sc, "/sparkdev/datafiles/covariance.txt")
>     totals <- lapply(lines, function(lines) {
>       sumx <- 0
>       sumy <- 0
>       totaln <- 0
>       for (i in 1:length(lines)) {
>         dataxy <- unlist(strsplit(lines[i], ","))
>         sumx <- sumx + as.numeric(dataxy[1])
>         sumy <- sumy + as.numeric(dataxy[2])
>       }
>       ## list(as.numeric(sumx), as.numeric(sumy), as.numeric(sumxy), as.numeric(totaln))
>       ## list does same as below
>       c(sumx, sumy)
>     })
>     output <- collect(totals)
>     for (element in output) {
>       cat(as.character(element[1]), as.character(element[2]), "\n")
>     }
>
> I am expecting output as 55, 55
> However it is giving
> 55,NA
> 55,NA
>
> Where am I going wrong ?
>
> Thanks
> Pranay
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-lapplyPartition-transforms-the-data-in-vertical-format-tp11540.html
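The contract described above (a row per element for lapply, a list of rows per partition for lapplyPartition) can be sketched in base R. Everything here is an illustration rather than SparkR's implementation: `map_elements` and `map_partitions` are hypothetical stand-ins for the two calls, and the two-way split of the data is an assumed partitioning.

```r
# Assumed split of the ten input lines into two partitions.
partitions <- list(
  c("1,1", "2,2", "3,3", "4,4", "5,5"),
  c("6,6", "7,7", "8,8", "9,9", "10,10")
)

# Turn one "x,y" line into a numeric row.
parse_line <- function(line) as.numeric(unlist(strsplit(line, ",")))

# lapply-style: the user function sees one element and returns one row.
map_elements <- function(partitions, func) {
  do.call(c, lapply(partitions, function(part) lapply(part, func)))
}

# lapplyPartition-style: the user function sees a whole partition
# and returns a list of rows.
map_partitions <- function(partitions, func) {
  do.call(c, lapply(partitions, function(part) func(part)))
}

by_element <- map_elements(partitions, parse_line)
by_partition <- map_partitions(partitions,
                               function(part) lapply(part, parse_line))

# Both routes produce the same list of ten rows.
print(identical(by_element, by_partition))
```

When the per-partition function returns a list with one row per input line, the two results coincide, which matches the "same by design" statement.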