I tried to rework and optimize the code a little in trunk today; see
https://issues.apache.org/jira/browse/MAHOUT-1151

You could use this as a basis for further optimization.
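
For illustration only, here is a minimal sketch of the kind of optimization discussed further down the thread: scoring every item for one user while reusing mutable buffers instead of allocating an object per item. It is not the MAHOUT-1151 patch. The class `TopNScorer`, its methods, and the dense `double[][]` layout are hypothetical names and assumptions made for this example.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: scores all items for one user and keeps the top N
 * in preallocated primitive arrays that are reused for every user.
 */
final class TopNScorer {
    private final double[][] itemFeatures; // itemFeatures[item] = latent factor vector (assumed dense)
    private final int[] topItems;          // reused buffer, ordered by descending score
    private final double[] topScores;      // scores matching topItems
    private int size;                      // number of valid entries in the buffers

    TopNScorer(double[][] itemFeatures, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.itemFeatures = itemFeatures;
        this.topItems = new int[n];
        this.topScores = new double[n];
    }

    /** Returns the indices of the best-scoring items for this user, best first. */
    int[] recommend(double[] userFeatures) {
        size = 0; // reset the buffers instead of allocating new ones
        for (int item = 0; item < itemFeatures.length; item++) {
            offer(item, dot(userFeatures, itemFeatures[item]));
        }
        return Arrays.copyOf(topItems, size); // the only allocation per user
    }

    private static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Inserts (item, score) into the sorted buffers if it belongs in the top N. */
    private void offer(int item, double score) {
        boolean full = size == topItems.length;
        if (full && score <= topScores[size - 1]) {
            return; // worse than (or tied with) the current worst entry
        }
        // Append if there is room, otherwise overwrite the worst entry.
        int pos = full ? size - 1 : size++;
        // Shift lower-scoring entries down by one to make room.
        while (pos > 0 && topScores[pos - 1] < score) {
            topItems[pos] = topItems[pos - 1];
            topScores[pos] = topScores[pos - 1];
            pos--;
        }
        topItems[pos] = item;
        topScores[pos] = score;
    }
}
```

A single `TopNScorer` instance would be created once per map task and `recommend` called once per user, so the inner loop over items does no allocation at all.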

Best,
Sebastian


On 06.03.2013 12:44, Josh Devins wrote:
> First bit of feedback. The `M.forEachPair` loop is about 1600-1800 millis
> per user (recall the size is ~2.6M users x ~2.8M items). There doesn't
> appear to be any out of the ordinary GC going on (yet). Going to look at
> optimising this loop a bit and see where I can get. Definitely time-boxing
> this though ;)
> 
> 
> On 6 March 2013 12:16, Sebastian Schelter <[email protected]> wrote:
> 
>> Btw, all important jobs in ALS are map-only, so it's the number of map
>> slots that counts.
>>
>> On 06.03.2013 12:11, Sean Owen wrote:
>>> OK, that's reasonable on 35 machines. (You can turn up to 70 reducers,
>>> probably, as most machines can handle 2 reducers at once).
>>> I think the recommendation step loads one whole matrix into memory.
>>> You're not running out of memory, but if you're turning up the heap size
>>> to accommodate, you might be hitting swapping, yes. I think (?) the
>>> conventional wisdom is to turn off swap for Hadoop.
>>>
>>> Sebastian yes that is probably a good optimization; I've had good results
>>> reusing a mutable object in this context.
>>>
>>>
>>> On Wed, Mar 6, 2013 at 10:54 AM, Josh Devins <[email protected]> wrote:
>>>
>>>> The factorization at 2 hours is kind of a non-issue (certainly fast
>>>> enough). It was run with (if I recall correctly) 30 reducers across a 35
>>>> node cluster, with 10 iterations.
>>>>
>>>> I was a bit shocked at how long the recommendation step took and will
>>>> throw some timing debug in to see where the problem lies exactly. There
>>>> were no other jobs running on the cluster during these attempts, but it's
>>>> certainly possible that something is swapping or the like. I'll be
>>>> looking more closely today before I start to consider other options for
>>>> calculating the recommendations.
>>>>
>>>>
>>>
>>
>>
> 
