So the good news is that the patch runs ;)  The bad news is that it's
slower, going from 1600-1800ms to ~2500ms to calculate a single user's topK
recommendations. For kicks, I ran a couple of other experiments, progressively
removing code to isolate the problem area. Results are detailed here:
https://gist.github.com/joshdevins/5106930

Conclusions thus far:
 * the patch is not helpful (for performance) and should be reverted or
fixed again (sorry Sebastian)
 * the dot product operation in `Vector` is not efficient enough for large
vectors/matrices, when used as it is in the ALS `RecommenderJob`, inside a
loop over `M`
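
To make the second point concrete: per user, the scoring step is essentially one
dot product per row of `M`, i.e. O(numItems * numFeatures) multiply-adds in a tight
loop. A minimal sketch of that hotspot (names are mine for illustration, not
Mahout's actual `Vector` API):

```java
public class DotProductHotspot {

    // Plain dense dot product: with f latent features this is f multiply-adds.
    // The RecommenderJob-style scoring loop below runs it once per item row,
    // which is why its per-call overhead dominates for large M.
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] userFeatures = {0.5, 1.0, -0.25};
        double[][] itemFeatures = {   // rows of M (toy data)
            {1.0, 0.0, 0.0},
            {0.0, 2.0, 4.0},
        };
        // Score every item for this user: the loop over M.
        for (double[] item : itemFeatures) {
            System.out.println(dot(userFeatures, item));
        }
    }
}
```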

I've tried a few other experiments (with Colt, for example) but saw no
noticeable gain. Parallelizing inside the map task (manually or with
Parallel Colt) is possible but obviously not ideal in an environment
like Hadoop -- it would save memory, since only a few map tasks need to
load the matrices, but it doesn't play very nicely within a shared cluster
:)

Next step at this point is to look at either reducing the number of items
to recommend over, LSH, or a third secret plan that "the PhDs" are thinking
about. Paper forthcoming, no doubt :D

@Sebastian, happy to run any patches on our cluster/dataset before making
more commits.
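
For reference, the allocation-avoiding top-K pattern that Sebastian and Ted
describe in the thread below boils down to: inspect the queue head first, and
only construct the (item, score) object when the new score actually beats the
current minimum. A rough sketch with `java.util.PriorityQueue` (class and method
names are mine, not Mahout's actual `FixedSizePriorityQueue` API):

```java
import java.util.PriorityQueue;

public class TopKThreshold {

    // Illustrative scored-item holder; in the real code this is whatever
    // Comparable the queue stores.
    static final class Scored {
        final int item;
        final double score;
        Scored(int item, double score) { this.item = item; this.score = score; }
    }

    // Offer a candidate to a size-k min-heap, allocating a Scored object
    // only when the candidate actually makes it into the top K.
    static void offer(PriorityQueue<Scored> topK, int k, int item, double score) {
        if (topK.size() < k) {
            topK.add(new Scored(item, score));
        } else if (score > topK.peek().score) { // compare against the head first
            topK.poll();
            topK.add(new Scored(item, score));  // allocate only on need
        } // otherwise: no allocation at all for this candidate
    }

    public static void main(String[] args) {
        // Min-heap ordered by score, so the head is the current threshold.
        PriorityQueue<Scored> topK =
            new PriorityQueue<>((a, b) -> Double.compare(a.score, b.score));
        int k = 2;
        double[] scores = {0.1, 0.9, 0.4, 0.8, 0.2};
        for (int i = 0; i < scores.length; i++) {
            offer(topK, k, i, scores[i]);
        }
        // Drain in ascending score order: prints item 3 (0.8) then item 1 (0.9).
        while (!topK.isEmpty()) {
            System.out.println(topK.poll().item);
        }
    }
}
```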



On 6 March 2013 20:58, Josh Devins <[email protected]> wrote:

> Got sidetracked today but I'll run Sebastian's version in trunk tomorrow
> and report back.
>
>
> On 6 March 2013 17:07, Sebastian Schelter <[email protected]> wrote:
>
>> I already committed a fix in that direction. I modified our
>> FixedSizePriorityQueue to allow inspection of its head for direct
>> comparison. This obviates the need to instantiate a Comparable and offer
>> it to the queue.
>>
>> /s
>>
>>
>> On 06.03.2013 17:01, Ted Dunning wrote:
>> > I would recommend against a mutable object on maintenance grounds.
>> >
>> > Better is to keep the threshold that a new score must meet and only
>> > construct the object on need.  That cuts the allocation down to
>> negligible
>> > levels.
>> >
>> > On Wed, Mar 6, 2013 at 6:11 AM, Sean Owen <[email protected]> wrote:
>> >
>> >> OK, that's reasonable on 35 machines. (You can turn up to 70 reducers,
>> >> probably, as most machines can handle 2 reducers at once).
>> >> I think the recommendation step loads one whole matrix into memory.
>> You're
>> >> not running out of memory but if you're turning up the heap size to
>> >> accommodate, you might be hitting swapping, yes. I think (?) the
>> >> conventional wisdom is to turn off swap for Hadoop.
>> >>
>> >> Sebastian yes that is probably a good optimization; I've had good
>> results
>> >> reusing a mutable object in this context.
>> >>
>> >>
>> >> On Wed, Mar 6, 2013 at 10:54 AM, Josh Devins <[email protected]>
>> wrote:
>> >>
>> >>> The factorization at 2-hours is kind of a non-issue (certainly fast
>> >>> enough). It was run with (if I recall correctly) 30 reducers across a
>> 35
>> >>> node cluster, with 10 iterations.
>> >>>
>> >>> I was a bit shocked at how long the recommendation step took and will
>> >> throw
>> >>> some timing debug in to see where the problem lies exactly. There
>> were no
>> >>> other jobs running on the cluster during these attempts, but it's
>> >> certainly
>> >>> possible that something is swapping or the like. I'll be looking more
>> >>> closely today before I start to consider other options for calculating
>> >> the
>> >>> recommendations.
>> >>>
>> >>>
>> >>
>> >
>>
>>
>
