Repository: spark
Updated Branches:
  refs/heads/branch-1.0 35aa2448a -> 010040fd0


Use numpy directly for matrix multiply.

Using matrix multiply to compute XtX and XtY yields a 5-20x speedup depending 
on problem size.

For example - the following takes 19s locally after this change vs. 5m21s 
before the change. (16x speedup).
bin/pyspark examples/src/main/python/als.py local[8] 1000 1000 50 10 10

Author: Evan Sparks <evan.spa...@gmail.com>

Closes #687 from etrain/patch-1 and squashes the following commits:

e094dbc [Evan Sparks] Touching only diaganols on update.
d1ab9b6 [Evan Sparks] Use numpy directly for matrix multiply.

(cherry picked from commit 6ed7e2cd01955adfbb3960e2986b6d19eaee8717)
Signed-off-by: Reynold Xin <r...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/010040fd
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/010040fd
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/010040fd

Branch: refs/heads/branch-1.0
Commit: 010040fd0ddd38717e8747c884bc8b1cbf684d38
Parents: 35aa244
Author: Evan Sparks <evan.spa...@gmail.com>
Authored: Thu May 8 00:24:36 2014 -0400
Committer: Reynold Xin <r...@apache.org>
Committed: Thu May 8 00:24:44 2014 -0400

----------------------------------------------------------------------
 examples/src/main/python/als.py | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/010040fd/examples/src/main/python/als.py
----------------------------------------------------------------------
diff --git a/examples/src/main/python/als.py b/examples/src/main/python/als.py
index a77dfb2..33700ab 100755
--- a/examples/src/main/python/als.py
+++ b/examples/src/main/python/als.py
@@ -36,14 +36,13 @@ def rmse(R, ms, us):
 def update(i, vec, mat, ratings):
     uu = mat.shape[0]
     ff = mat.shape[1]
-    XtX = matrix(np.zeros((ff, ff)))
-    Xty = np.zeros((ff, 1))
-
-    for j in range(uu):
-        v = mat[j, :]
-        XtX += v.T * v
-        Xty += v.T * ratings[i, j]
-    XtX += np.eye(ff, ff) * LAMBDA * uu
+    
+    XtX = mat.T * mat
+    XtY = mat.T * ratings[i, :].T
+    
+    for j in range(ff):
+        XtX[j,j] += LAMBDA * uu
+    
     return np.linalg.solve(XtX, Xty)
 
 if __name__ == "__main__":

Reply via email to