Thanks for your replies.

1) > Can you describe your failure or give us a stack trace?
Here is the job log:

  12/11/19 09:54:07 INFO als.ParallelALSFactorizationJob: Recomputing U (iteration 0/15)
  ...
  12/11/19 10:03:31 INFO mapred.JobClient: Job complete: job_201211150152_1671
  12/11/19 10:03:31 INFO als.ParallelALSFactorizationJob: Recomputing M (iteration 0/15)
  ...
  12/11/19 10:10:04 INFO mapred.JobClient: Task Id : attempt_201211150152_<*ALL*>, Status : FAILED
  ...
  12/11/19 10:40:40 INFO mapred.JobClient: Failed map tasks=1

All of these mappers (recomputing M on the 1st iteration) fail with a "Java heap space" error. Here is the Hadoop job memory config:

  mapred.map.child.java.opts = -Xmx5024m -XX:-UseGCOverheadLimit
  mapred.child.java.opts = -Xmx200m
  mapred.job.reuse.jvm.num.tasks = -1
  mapred.cluster.reduce.memory.mb = -1
  mapred.cluster.map.memory.mb = -1
  mapred.cluster.max.reduce.memory.mb = -1
  mapred.job.reduce.memory.mb = -1
  mapred.job.map.memory.mb = -1
  mapred.cluster.max.map.memory.mb = -1

Are any tweaks possible? Is mapred.map.child.java.opts OK?

2) As far as I understand, ALS cannot load the U matrix into RAM (20M users), while M is fine (150k items). Can I split the input matrix R by user (keeping all items) into R1, R2, ..., Rn, then compute M and U1 on R1 (many iterations, after which M is fixed), and then compute U2, U3, ..., Un against the existing M (half an iteration each, without recomputing M)? I want to do this to avoid memory issues (train on a part of the data at a time). My question is: will all the users from U1, U2, ..., Un "exist" in the same feature space? Can I then compare users from U1 with users from U2 using their feature vectors? Is any tweak possible here?

3) How do I calculate the maximum matrix size for a given item count and memory limit? For example, my matrix has 20M users and I want to factorize it using 20 features: 20M * 20 * 8 bytes = 3.2 GB. On the one hand I want to avoid "Java heap space" errors; on the other hand I want to give my model the maximum amount of training data. I understand that minor changes to ParallelALSFactorizationJob are probably needed.

Have a nice day!

Regards,
Pavel
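A note on question 1, as a hedged sketch rather than a confirmed diagnosis: the per-phase property mapred.map.child.java.opts was only introduced in later Hadoop releases (0.21+/1.x); on versions that do not recognize it, only mapred.child.java.opts is honored, so the -Xmx200m set there could silently cap the mappers. If that is the situation, raising the generic property as well should rule it out:

```
# Sketch, assuming a Hadoop version where both properties exist.
# On older releases only mapred.child.java.opts takes effect, so the
# -Xmx200m currently set there may be the real mapper heap limit.
mapred.map.child.java.opts = -Xmx5024m -XX:-UseGCOverheadLimit
mapred.child.java.opts     = -Xmx5024m
```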
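To make the idea in question 2 concrete, here is a minimal sketch (plain Python, not Mahout code; the function names and the tiny Gaussian-elimination solver are my own). With M fixed, each user's feature vector is the solution of an independent regularized least-squares problem over that user's rated items, so the "half iteration" can be run per partition. Because every partition U1, ..., Un is regressed against the same fixed M, all resulting user vectors live in one feature space and are directly comparable:

```python
def solve(a, b):
    """Solve a small dense linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col]:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def user_vector(ratings, M, lam=0.065):
    """One ALS user update against a fixed item matrix.

    ratings: {item_id: rating} for one user
    M:       {item_id: list of f item features} (the fixed matrix)
    Solves the normal equations (Mi^T Mi + lam * n * I) u = Mi^T r,
    where Mi stacks the rows of M for this user's rated items.
    """
    f = len(next(iter(M.values())))
    A = [[0.0] * f for _ in range(f)]
    b = [0.0] * f
    n = len(ratings)
    for item, r in ratings.items():
        mi = M[item]
        for p in range(f):
            b[p] += mi[p] * r
            for q in range(f):
                A[p][q] += mi[p] * mi[q]
    for p in range(f):
        A[p][p] += lam * n
    return solve(A, b)

# Two users from different partitions, solved against the SAME fixed M,
# land in the same feature space and can be compared directly:
M = {1: [0.9, 0.1], 2: [0.2, 0.8], 3: [0.5, 0.5]}
u_a = user_vector({1: 5.0, 3: 3.0}, M)  # a user from "U1"
u_b = user_vector({2: 4.0, 3: 2.0}, M)  # a user from "U2"
similarity = sum(x * y for x, y in zip(u_a, u_b))
```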
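The arithmetic in question 3 can be written down as a quick back-of-the-envelope helper (the 8 bytes per double is the raw payload only; JVM object and array overhead in practice adds a sizable factor on top, so treat the result as a lower bound on memory use):

```python
def factor_matrix_bytes(rows, features, bytes_per_entry=8):
    """Raw payload size of a dense rows x features matrix of doubles."""
    return rows * features * bytes_per_entry

def max_rows_for_heap(heap_bytes, features, bytes_per_entry=8):
    """Largest row count whose dense factor matrix fits in heap_bytes."""
    return heap_bytes // (features * bytes_per_entry)

# 20M users x 20 features of 8-byte doubles:
payload = factor_matrix_bytes(20_000_000, 20)
print(payload)                 # 3_200_000_000 bytes, i.e. the 3.2 GB figure
print(payload / 2**30)         # ~2.98 GiB when counted in 2^30-byte units

# How many users fit in a 5 GiB heap at 20 features (ignoring overhead)?
print(max_rows_for_heap(5 * 2**30, 20))
```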
