Hi Josh,

Did you run the patch from the JIRA issue or did you run trunk? I
made some follow-up changes after uploading the patch. I can't imagine
why those small changes would lead to a 50% increase in runtime.

/s



On 07.03.2013 15:02, Josh Devins wrote:
> So the good news is that the patch runs ;)  The bad news is that it's
> slower, going from 1600-1800ms to ~2500ms to calculate a single user's top-K
> recommendations. For kicks, I ran a couple other experiments, progressively
> removing code to isolate the problem area. Results are detailed here:
> https://gist.github.com/joshdevins/5106930
> 
> Conclusions thus far:
>  * the patch is not helpful (for performance) and should be reverted or
> fixed again (sorry Sebastian)
>  * the dot product operation in `Vector` is not efficient enough for large
> vectors/matrices, when used as it is in the ALS `RecommenderJob`, inside a
> loop over `M`
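
For reference, a minimal sketch of the hot loop being described: for each user, every row of the item-factor matrix `M` is scored by a dot product against the user's factor vector. A primitive `double[]` loop like this is the baseline that object-based `Vector.dot` iteration is being compared against. The names here are illustrative, not Mahout's actual API.

```java
// Hypothetical sketch of the scoring loop inside the ALS RecommenderJob:
// one dot product per item, looping over all rows of M. With primitive
// arrays there is no per-element object or iterator allocation.
final class Scoring {

  static double[] scoreAll(double[][] M, double[] userFeatures) {
    double[] scores = new double[M.length];
    for (int i = 0; i < M.length; i++) {        // loop over all items in M
      double dot = 0.0;
      double[] row = M[i];
      for (int f = 0; f < row.length; f++) {    // numFeatures is small
        dot += row[f] * userFeatures[f];
      }
      scores[i] = dot;
    }
    return scores;
  }
}
```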
> 
> I've tried a few other experiments with Colt (for example) but there was no
> noticeable gain. Parallelizing inside the map task (manually or with
> Parallel Colt) is possible but obviously not ideal in an environment
> like Hadoop -- it would save memory, since only a few map tasks would need
> to load the matrices, but it doesn't play very nicely within a shared
> cluster :)
> 
> Next step at this point is to look at either reducing the number of items
> to recommend over, LSH or a third secret plan that "the PhD's" are thinking
> about. Paper forthcoming, no doubt :D
> 
> @Sebastian, happy to run any patches on our cluster/dataset before making
> more commits.
> 
> 
> 
> On 6 March 2013 20:58, Josh Devins <[email protected]> wrote:
> 
>> Got sidetracked today but I'll run Sebastian's version in trunk tomorrow
>> and report back.
>>
>>
>> On 6 March 2013 17:07, Sebastian Schelter <[email protected]> wrote:
>>
>>> I already committed a fix in that direction. I modified our
>>> FixedSizePriorityQueue to allow inspection of its head for direct
>>> comparison. This obviates the need to instantiate a Comparable and offer
>>> it to the queue.
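
A minimal sketch of the optimization described here, assuming a min-heap of size k: peek at the queue's head (the current smallest element) and only construct and offer a candidate object when the new score actually beats it. Class and method names are illustrative, not Mahout's `FixedSizePriorityQueue` API.

```java
import java.util.PriorityQueue;

// Hypothetical sketch: inspect the head of a bounded min-heap before
// allocating anything. A candidate pair is only created when the score
// beats the current worst element in the top-K.
final class TopK {
  private final int k;
  // min-heap ordered by score, stored as {itemID, score} pairs
  private final PriorityQueue<double[]> queue =
      new PriorityQueue<>((a, b) -> Double.compare(a[1], b[1]));

  TopK(int k) { this.k = k; }

  void offer(long itemID, double score) {
    if (queue.size() < k) {
      queue.add(new double[] { itemID, score });
    } else if (score > queue.peek()[1]) {  // direct head comparison
      queue.poll();
      queue.add(new double[] { itemID, score });
    }
    // else: score too low, no object was ever created
  }

  PriorityQueue<double[]> contents() { return queue; }
}
```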
>>>
>>> /s
>>>
>>>
>>> On 06.03.2013 17:01, Ted Dunning wrote:
>>>> I would recommend against a mutable object on maintenance grounds.
>>>>
>>>> Better is to keep the threshold that a new score must meet and only
>>>> construct the object on need.  That cuts the allocation down to
>>> negligible
>>>> levels.
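
A sketch of the lazy-allocation idea being suggested, with illustrative names: cache the current cutoff score as a primitive `double` and compare against it before creating any object, so allocation only happens when a candidate actually enters the top-K.

```java
import java.util.PriorityQueue;

// Hypothetical sketch: keep the threshold a new score must meet as a
// plain double. In the common case (score below threshold) nothing is
// allocated at all; objects are constructed only on need.
final class LazyTopK {
  private final int k;
  private double cutoff = Double.NEGATIVE_INFINITY;
  private final PriorityQueue<double[]> heap =
      new PriorityQueue<>((a, b) -> Double.compare(a[1], b[1]));

  LazyTopK(int k) { this.k = k; }

  void offer(long itemID, double score) {
    if (heap.size() == k && score <= cutoff) {
      return;  // common case: threshold not met, zero allocation
    }
    heap.add(new double[] { itemID, score });
    if (heap.size() > k) {
      heap.poll();  // evict the current worst
    }
    if (heap.size() == k) {
      cutoff = heap.peek()[1];  // new score a candidate must beat
    }
  }

  double cutoff() { return cutoff; }
  int size() { return heap.size(); }
}
```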
>>>>
>>>> On Wed, Mar 6, 2013 at 6:11 AM, Sean Owen <[email protected]> wrote:
>>>>
>>>>> OK, that's reasonable on 35 machines. (You can turn up to 70 reducers,
>>>>> probably, as most machines can handle 2 reducers at once).
>>>>> I think the recommendation step loads one whole matrix into memory.
>>> You're
>>>>> not running out of memory but if you're turning up the heap size to
>>>>> accommodate, you might be hitting swapping, yes. I think (?) the
>>>>> conventional wisdom is to turn off swap for Hadoop.
>>>>>
>>>>> Sebastian yes that is probably a good optimization; I've had good
>>> results
>>>>> reusing a mutable object in this context.
>>>>>
>>>>>
>>>>> On Wed, Mar 6, 2013 at 10:54 AM, Josh Devins <[email protected]>
>>> wrote:
>>>>>
>>>>>> The factorization at 2-hours is kind of a non-issue (certainly fast
>>>>>> enough). It was run with (if I recall correctly) 30 reducers across a
>>> 35
>>>>>> node cluster, with 10 iterations.
>>>>>>
>>>>>> I was a bit shocked at how long the recommendation step took and will
>>>>> throw
>>>>>> some timing debug in to see where the problem lies exactly. There
>>> were no
>>>>>> other jobs running on the cluster during these attempts, but it's
>>>>> certainly
>>>>>> possible that something is swapping or the like. I'll be looking more
>>>>>> closely today before I start to consider other options for calculating
>>>>> the
>>>>>> recommendations.
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
> 
