About 20,000,000 users and 150,000 items, 0.03% non-zeros. 20 features required.
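
For reference, my own back-of-the-envelope sizing, assuming dense double-precision factors (the JVM overhead on top of this is a guess, not a measured number):

  U: 20,000,000 users x 20 features x 8 bytes ~= 3.2 GB
  M:    150,000 items x 20 features x 8 bytes ~=  24 MB

So M fits easily in a mapper's heap, while the raw 3.2 GB for U excludes the per-object overhead of the id -> vector map the mappers hold it in, which could easily double or triple the real footprint.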
Pavel

On 19.11.12 12:31, "Sebastian Schelter" <[email protected]> wrote:

>You need to give much more memory than 200 MB to your mappers. What are
>the dimensions of your input in terms of users and items?
>
>--sebastian
>
>On 19.11.2012 09:28, Abramov Pavel wrote:
>> Thanks for your replies.
>>
>> 1)
>>> Can you describe your failure or give us a stack trace?
>>
>> Here is the job log:
>>
>> 12/11/19 09:54:07 INFO als.ParallelALSFactorizationJob: Recomputing U
>> (iteration 0/15)
>> …
>> 12/11/19 10:03:31 INFO mapred.JobClient: Job complete:
>> job_201211150152_1671
>> 12/11/19 10:03:31 INFO als.ParallelALSFactorizationJob: Recomputing M
>> (iteration 0/15)
>> …
>> 12/11/19 10:10:04 INFO mapred.JobClient: Task Id :
>> attempt_201211150152_<*ALL*>, Status : FAILED
>> …
>> 12/11/19 10:40:40 INFO mapred.JobClient: Failed map tasks=1
>>
>> All of these mappers ("Recomputing M" on the 1st iteration) fail with a
>> "Java heap space" error.
>>
>> Here is the Hadoop job memory config:
>>
>> mapred.map.child.java.opts = -Xmx5024m -XX:-UseGCOverheadLimit
>> mapred.child.java.opts = -Xmx200m
>> mapred.job.reuse.jvm.num.tasks = -1
>>
>> mapred.cluster.reduce.memory.mb = -1
>> mapred.cluster.map.memory.mb = -1
>> mapred.cluster.max.reduce.memory.mb = -1
>> mapred.job.reduce.memory.mb = -1
>> mapred.job.map.memory.mb = -1
>> mapred.cluster.max.map.memory.mb = -1
>>
>> Are any tweaks possible? Is mapred.map.child.java.opts OK?
>>
>> 2) As far as I understand, ALS cannot load the U matrix into RAM (20M
>> users), while M is fine (150K items). Can I split the input matrix R
>> (keeping all items, splitting by user) into R1, R2, ..., Rn, then
>> compute M and U1 on R1 (many iterations, then fix M), then compute U2,
>> U3, ..., Un using the existing M (half an iteration, without recomputing
>> M)? I want to do this to avoid memory issues (training on part of the
>> data).
>> My question is: will all the users from U1, U2, ..., Un "exist" in the
>> same feature space? Can I then compare users from U1 with users from U2
>> using their features?
>> Are any tweaks possible here?
>>
>> 3) How do I calculate the maximum matrix size for a given item count
>> and memory limit? For example, my matrix has 20M users and I want to
>> factorize it using 20 features: 20M * 20 * 8 bytes = 3.2 GB. On the one
>> hand I want to avoid "Java heap space" errors; on the other hand I want
>> to give my model the maximum amount of training data. I understand that
>> minor changes to ParallelALS would be needed.
>>
>> Have a nice day!
>>
>> Regards,
>> Pavel
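
PS on 1): I notice mapred.child.java.opts is still -Xmx200m. If our Hadoop version does not honor the map-specific mapred.map.child.java.opts key (I am not sure it exists in all releases), the mappers would really be running with 200 MB. I will try raising the generic key as well, something like:

mapred.child.java.opts = -Xmx4096m -XX:-UseGCOverheadLimit
mapred.map.child.java.opts = -Xmx4096m -XX:-UseGCOverheadLimit

(-Xmx picked to fit our task slots; adjust as needed.)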

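PS on 2): to make the fold-in idea concrete, here is a sketch of the per-user solve with M fixed, written against Mahout's AlternatingLeastSquaresSolver. I am assuming its solve() signature from the current sources, so please correct me if it differs in your version:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.als.AlternatingLeastSquaresSolver;
import org.apache.mahout.math.map.OpenIntObjectHashMap;

// Recompute one user's feature vector u_i against a FIXED item matrix M.
// If the same M is used for every partition R1..Rn, the resulting user
// vectors should all live in the same feature space (this is what I want
// to confirm).
Vector foldInUser(Vector ratings,                  // sparse row of R for this user
                  OpenIntObjectHashMap<Vector> M,  // itemID -> item feature vector
                  double lambda, int numFeatures) {
  // Collect the feature vectors of the items this user actually rated,
  // in the order the non-zero ratings are iterated.
  List<Vector> featureVectors = new ArrayList<Vector>();
  Iterator<Vector.Element> it = ratings.iterateNonZero();
  while (it.hasNext()) {
    featureVectors.add(M.get(it.next().index()));
  }
  // Weighted ridge-regression solve, as in ALS-WR:
  //   (M_i^T M_i + lambda * n_i * I) u_i = M_i^T r_i
  return AlternatingLeastSquaresSolver.solve(featureVectors, ratings, lambda, numFeatures);
}

If my reading is right, each u_i is just a least-squares projection onto the same fixed M, so users from U1 and U2 would be directly comparable.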