Re: Number of Clustering MR-Jobs

Sebastian Briesemeister Thu, 28 Mar 2013 09:12:00 -0700

I tried to increase the heap space, but it wasn't enough.

It seems the problem is not the number of mappers. I will start another
thread for this problem with some more details.


Cheers
Sebastian


On 28.03.2013 16:41, Dan Filimon wrote:
> From what I've seen, even if the mapper does throw an out-of-memory
> exception, Hadoop will restart it, increasing the memory.
>
> There are ways to configure the mapper/reducer JVMs to use more memory by
> default through the Configuration, although I don't recall the exact
> options. They are probably covered in your Hadoop distribution's
> documentation.
>
>
> On Thu, Mar 28, 2013 at 2:52 PM, Sebastian Briesemeister <
> [email protected]> wrote:
>
>> In my case, each map process requires a lot of memory and I would like
>> to distribute this consumption across multiple nodes.
>>
>> However, I still get out-of-memory exceptions even if I split the input
>> file into several very small input files. I thought the mapper would
>> consider only one file at a time and would, hence, have no problems with
>> heap space?
>>
>>
>>
>> On 28.03.2013 10:20, Ted Dunning wrote:
>>> This is a longstanding Hadoop issue.
>>>
>>> Your suggestion is interesting, but only a few cases would benefit.  The
>>> problem is that splitting involves reading from a very small number of
>>> nodes and thus is not much better than just running the program with few
>>> mappers.  If the data is large enough to make splitting fast, then Hadoop
>>> will just do it.
>>>
>>> The only win for splitting is when the cost per chunk is very high.  I
>>> think that only random forest might fit into that category.
>>>
>>> On Thu, Mar 28, 2013 at 10:04 AM, Sebastian Briesemeister <
>>> [email protected]> wrote:
>>>
>>>> Splitting the files leads to multiple MR-tasks!
>>>>
>>>> Changing only the MR settings of Hadoop did not help. In the future it
>>>> would be nice if the drivers would scale themselves and split the
>>>> data according to the dataset size and the number of available MR-slots.
>>>>
>>
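
The two knobs discussed in this thread (heap for the map/reduce child JVMs,
and how finely the input is split) can usually be passed to a job as generic
-D options. Below is a minimal sketch of how such options might be derived
from the dataset size and the number of available slots. It rests on several
assumptions: the property names are the Hadoop 1.x ones
(mapred.child.java.opts, mapred.max.split.size), which later releases renamed;
the driver is launched through Hadoop's ToolRunner so that -D options are
parsed; and the job's input format honors the maximum split size. The class
and method names are hypothetical and belong to neither Hadoop nor Mahout.

```java
import java.util.ArrayList;
import java.util.List;

public class JobOptionsSketch {

    // Assumed Hadoop 1.x property names; check your distribution's docs.
    static final String CHILD_JAVA_OPTS = "mapred.child.java.opts";
    static final String MAX_SPLIT_SIZE = "mapred.max.split.size";

    /**
     * Returns ceil(datasetBytes / slots): a maximum split size that yields
     * roughly one split (and hence one map task) per available slot.
     */
    static long maxSplitBytes(long datasetBytes, int slots) {
        if (datasetBytes <= 0 || slots <= 0) {
            throw new IllegalArgumentException("datasetBytes and slots must be positive");
        }
        return (datasetBytes + slots - 1) / slots;
    }

    /** Builds the generic -D options for the child JVM heap and the split size. */
    static List<String> genericOptions(int heapMb, long maxSplitBytes) {
        List<String> options = new ArrayList<>();
        options.add("-D" + CHILD_JAVA_OPTS + "=-Xmx" + heapMb + "m");
        options.add("-D" + MAX_SPLIT_SIZE + "=" + maxSplitBytes);
        return options;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: a 1 GiB input and 8 map slots.
        long split = maxSplitBytes(1024L * 1024L * 1024L, 8);
        for (String option : genericOptions(2048, split)) {
            System.out.println(option);
        }
    }
}
```

The printed options would go right after the driver name on the command
line, before the driver's own arguments. As Ted points out above, forcing
more splits only pays off when the cost per chunk is high, and a larger heap
does not help if a single record or model already exceeds it.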
