Re: SparkR : lapplyPartition transforms the data in vertical format

2014-08-07 Thread Pranay Dave
Hello Shivaram,
Thanks for your reply.

Here is a simple input data set. The data is in a file called
/sparkdev/datafiles/covariance.txt:
1,1
2,2
3,3
4,4
5,5
6,6
7,7
8,8
9,9
10,10

The output I would like to see is the total of each column. It can be done
with reduce, but I wanted to test lapply.

The output I want is the column sums in a single row:
55,55

But the output I get is in two rows:
55, NA
55, NA

Thanks 
Pranay




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-lapplyPartition-transforms-the-data-in-vertical-format-tp11540p11617.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: SparkR : lapplyPartition transforms the data in vertical format

2014-08-07 Thread Zongheng Yang
Hi Pranay,

If this data format is to be assumed, then I believe the issue starts at

lines <- textFile(sc, "/sparkdev/datafiles/covariance.txt")
totals <- lapply(lines, function(lines)

After the first line, `lines` becomes an RDD of strings, each of which
is a line of the form `1,1`. Therefore, lapply() should be used to
map over each line, like this:

totals <- lapply(lines, function(line) ... # modified logic that
treats each line as having the form `x,x`

I only took a quick glance, so let me know if this method still doesn't work!
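
A minimal sketch of the per-line approach, written in base R so it runs without a Spark context. The character vector `lines` and the base `lapply`/`Reduce` calls are stand-ins (assumptions for illustration) for the RDD and for SparkR's lapply and reduce; the helper name `parse_line` is hypothetical.

```r
# Stand-in for the RDD of strings read from covariance.txt ("1,1" ... "10,10").
lines <- paste(1:10, 1:10, sep = ",")

# Hypothetical helper: turn one line of the form "x,y" into c(x, y).
parse_line <- function(line) as.numeric(unlist(strsplit(line, ",")))

# Map over each line, as lapply() on the RDD would do per record.
pairs <- lapply(lines, parse_line)

# Add the pairs element-wise to get the column totals in one row.
totals <- Reduce(`+`, pairs)
cat(totals, sep = ",")
cat("\n")
```

Here each record is parsed on its own and the summing happens in a separate reduce step, which is the division of work that lapply suggests.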

On Wed, Aug 6, 2014 at 11:29 PM, Pranay Dave pranay.da...@gmail.com wrote:



Re: SparkR : lapplyPartition transforms the data in vertical format

2014-08-07 Thread Pranay Dave
Hello Zongheng,
In fact, the problem is in lapplyPartition.
lapply gives this output:
1,1
2,2
3,3
...
10,10

However, lapplyPartition gives this output:
55, NA
55, NA

Why is the lapply output horizontal and the lapplyPartition output vertical?

Here is my code:
library(SparkR)

sc <- sparkR.init("local")
lines <- textFile(sc, "/sparkdev/datafiles/covariance.txt")

totals <- lapplyPartition(lines, function(lines) {
  sumx <- 0
  sumy <- 0
  totaln <- 0
  for (i in 1:length(lines)) {
    dataxy <- unlist(strsplit(lines[i], ","))
    sumx <- sumx + as.numeric(dataxy[1])
    sumy <- sumy + as.numeric(dataxy[2])
  }

  ## list(as.numeric(sumx), as.numeric(sumy), as.numeric(sumxy), as.numeric(totaln))
  ## list does same as below
  c(sumx, sumy)
})

output <- collect(totals)
for (element in output) {
  cat(as.character(element[1]), as.character(element[2]), "\n")
}




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-lapplyPartition-transforms-the-data-in-vertical-format-tp11540p11726.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: SparkR : lapplyPartition transforms the data in vertical format

2014-08-07 Thread Shivaram Venkataraman
I tried this out. What is happening is that, because the input file is
small, only 1 partition is created. lapplyPartition runs the given function
on that partition and computes sumx as 55 and sumy as 55. The return
value from lapplyPartition is then treated as a list by SparkR, and collect
concatenates the lists from all partitions.

Thus the output in this case is just a list with two values, and trying to
access element[2] in the for loop gives NA. If you just use
cat(as.character(element), "\n"), you should see 55 and 55.
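
The behavior described above can be reproduced in base R without Spark. This is only a sketch: the flattening done with `as.list` and `do.call(c, ...)` is an assumed stand-in for how the partition results end up concatenated, not SparkR's actual implementation, and the variable names are hypothetical.

```r
# One partition whose function returned c(sumx, sumy).
partition_results <- list(c(55, 55))

# Assumed stand-in for collect(): treat each partition's return value as a
# list and concatenate the lists from all partitions.
collected <- do.call(c, lapply(partition_results, as.list))

# Each element is now a single number, so element[2] is NA.
for (element in collected) {
  cat(as.character(element[1]), as.character(element[2]), "\n")
}

# If the partition function instead returned list(c(sumx, sumy)), the same
# flattening would leave one element holding both sums.
wrapped_results <- list(list(c(55, 55)))
collected_wrapped <- do.call(c, lapply(wrapped_results, as.list))
for (element in collected_wrapped) {
  cat(as.character(element[1]), as.character(element[2]), "\n")
}
```

Under these assumptions the first loop prints two rows ending in NA and the second prints both sums in one row, which suggests that wrapping the result in a list may be another way to get the single-row output.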

Thanks
Shivaram


On Thu, Aug 7, 2014 at 3:21 PM, Pranay Dave pranay.da...@gmail.com wrote:





Re: SparkR : lapplyPartition transforms the data in vertical format

2014-08-06 Thread Shivaram Venkataraman
The output of lapply and lapplyPartition should be the same by design. The
only difference is that in lapply the user-defined function returns a row,
while it returns a list in lapplyPartition.

Could you give an example of a small input and the output that you expect to
see for the above program?
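
To illustrate the design described above, here is a small base-R sketch. The nested list `partitions` and the helper `parse_line` are hypothetical stand-ins, and the flattening with `do.call(c, ...)` is an assumption about how per-partition lists are combined, not SparkR's actual code.

```r
# Hypothetical data set split into two partitions of text lines.
partitions <- list(list("1,1", "2,2"), list("3,3"))

# Hypothetical helper: parse one line of the form "x,y" into c(x, y).
parse_line <- function(line) as.numeric(strsplit(line, ",")[[1]])

# lapply style: the function sees one record and returns one row.
per_record <- lapply(unlist(partitions), parse_line)

# lapplyPartition style: the function sees a whole partition and returns a
# list of rows; the lists from all partitions are then concatenated.
per_partition <- do.call(c, lapply(partitions, function(part) {
  lapply(part, parse_line)
}))

# Both routes give the same list of rows.
print(identical(per_record, per_partition))
```

The point of the sketch is that the two functions agree only when the partition-level function returns a list with one entry per output row.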

Shivaram


On Wed, Aug 6, 2014 at 5:47 AM, Pranay Dave pranay.da...@gmail.com wrote:

 Hello,
 As per the documentation, lapply works on single records and lapplyPartition
 works on a partition.
 However, the format of the output does not change.

 When I use lapplyPartition, the data is converted to vertical format.

 Here is my code:
 library(SparkR)

 sc <- sparkR.init("local")
 lines <- textFile(sc, "/sparkdev/datafiles/covariance.txt")

 totals <- lapply(lines, function(lines) {
   sumx <- 0
   sumy <- 0
   totaln <- 0
   for (i in 1:length(lines)) {
     dataxy <- unlist(strsplit(lines[i], ","))
     sumx <- sumx + as.numeric(dataxy[1])
     sumy <- sumy + as.numeric(dataxy[2])
   }

   ## list(as.numeric(sumx), as.numeric(sumy), as.numeric(sumxy), as.numeric(totaln))
   ## list does same as below
   c(sumx, sumy)
 })

 output <- collect(totals)
 for (element in output) {
   cat(as.character(element[1]), as.character(element[2]), "\n")
 }

 I am expecting the output to be 55, 55.
 However, it is giving:
 55,NA
 55,NA

 Where am I going wrong?
 Thanks
 Pranay



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-lapplyPartition-transforms-the-data-in-vertical-format-tp11540.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.
