You can always define an RDD transpose function yourself. This is what I use in PySpark to transpose an RDD of numpy vectors. It’s not optimal, and the vectors need to fit in memory on the worker nodes (groupByKey collects each column of the original on a single worker).

    import numpy as np

    def rddTranspose(rdd):
        # Add an index to the rows and the columns, giving (row, col, value) triplets
        dataT1 = rdd.zipWithIndex().flatMap(
            lambda xi: [(xi[1], j, e) for (j, e) in enumerate(xi[0])])
        # Use the column index from the original as key, then group and sort by it
        dataT2 = dataT1.map(lambda t: (t[1], (t[0], t[2]))) \
                       .groupByKey().sortByKey()
        # Sort the (row, value) pairs inside each new row by original row index
        dataT3 = dataT2.map(lambda kv: sorted(kv[1], key=lambda p: p[0]))
        # Remove the indices inside the rows
        dataT4 = dataT3.map(lambda row: [e for (_, e) in row])
        # Convert the rows to numpy arrays
        return dataT4.map(np.asarray)
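To see what the steps do without a cluster, here is a rough plain-Python mirror of the same index, regroup, sort, and strip sequence on a small list of lists. The name `transpose_local` is made up for this illustration, and ordinary list operations stand in for the RDD calls, so treat it as a sketch of the idea rather than Spark code.

```python
from itertools import groupby

def transpose_local(rows):
    """Plain-Python mirror of the RDD steps; `rows` is a list of equal-length lists."""
    # Step 1: build (row, col, value) triplets, as zipWithIndex + flatMap do on the RDD
    triplets = [(i, j, e) for i, row in enumerate(rows) for j, e in enumerate(row)]
    # Step 2: order by (col, row); this stands in for groupByKey/sortByKey
    # plus the sort inside each group
    keyed = sorted(((j, i, e) for (i, j, e) in triplets), key=lambda t: (t[0], t[1]))
    # Step 3: group by column index and drop the indices, keeping only the values
    return [[e for (_, _, e) in group]
            for _, group in groupby(keyed, key=lambda t: t[0])]

print(transpose_local([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
```

Each inner list of the result corresponds to one column of the input, which is why a whole column has to fit in memory in one place.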
Cheers,
Toni

On 12 Jan 2015 at 20:45:58, Alex Minnaar (aminn...@verticalscope.com) wrote:

That's not quite what I'm looking for. Let me provide an example. I have a RowMatrix A that is n x m, and I have two local matrices b and c, where b is m x 1 and c is n x 1. In my Spark job I wish to perform the following two computations:

A*b and A^T*c

I don't think this is possible without being able to transpose a RowMatrix. Am I correct?

Thanks,
Alex

From: Reza Zadeh <r...@databricks.com>
Sent: Monday, January 12, 2015 1:58 PM
To: Alex Minnaar
Cc: u...@spark.incubator.apache.org
Subject: Re: RowMatrix multiplication

As you mentioned, you can perform A * b, where A is a RowMatrix and b is a local matrix.

From your email, I figure you want to compute b * A^T. To do this, you can compute C = A b^T, whose result is the transpose of what you were looking for, i.e. C^T = b * A^T. To undo the transpose, you would have to transpose C manually yourself. Be careful though, because a row of the result might not fit in memory on a single machine, which is what RowMatrix requires. This danger is why we didn't provide a transpose operation in RowMatrix natively.

To address this and more, there is an effort to provide more comprehensive linear algebra through block matrices, which will likely make it to 1.3:

https://issues.apache.org/jira/browse/SPARK-3434

Best,
Reza

On Mon, Jan 12, 2015 at 6:33 AM, Alex Minnaar <aminn...@verticalscope.com> wrote:

I have a RowMatrix on which I want to perform two multiplications. The first is a right multiplication with a local matrix, which is fine. But after that I also wish to right multiply the transpose of my RowMatrix with a different local matrix. I understand that there is no functionality to transpose a RowMatrix at this time, but I was wondering if anyone could suggest any kind of workaround for this. I had thought that I might be able to initially create two RowMatrices, a normal version and a transposed version, and use either when appropriate.

Can anyone think of another alternative?

Thanks,
Alex
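
A note on the A^T*c product asked about in this thread: when c is a local n x 1 vector, A^T*c is the sum over the rows of A of c_i times row i, so it can be formed by scaling each row and summing, with no transposed matrix needed. The sketch below shows that identity in plain Python. The name `at_times_c` is hypothetical, and the inputs are ordinary lists rather than a RowMatrix. On an RDD of rows, the same shape would presumably be a map over `zipWithIndex()` followed by a `reduce`, but that is an assumption, not something tested here.

```python
from functools import reduce

def at_times_c(rows, c):
    """Compute A^T * c, where `rows` holds the n rows of A (each of length m)
    and `c` is a list of n numbers. Returns a list of length m."""
    # Map step: scale row i by c[i]
    scaled = ([c_i * a_ij for a_ij in row] for row, c_i in zip(rows, c))
    # Reduce step: element-wise sum of the scaled rows
    return reduce(lambda u, v: [x + y for x, y in zip(u, v)], scaled)

A = [[1, 2, 3],
     [4, 5, 6]]
c = [10, 1]
print(at_times_c(A, c))  # [14, 25, 36]
```

The result has length m, so it fits locally whenever a single row of A does. If c had several columns, the same idea would sum outer products of each row with the matching row of c.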