Thank you.

Splitting the files leads to multiple MR-tasks!
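
A minimal sketch of the re-splitting idea, for illustration only. It
assumes a simple round-robin distribution and uses in-memory lists as
stand-ins for SequenceFile readers and writers; the class and method
names are hypothetical, and this is not necessarily how the
ResplitSequenceFiles tool linked below works.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: re-split m input parts into n output parts.
// Plain lists stand in for SequenceFile.Reader / SequenceFile.Writer.
final class ResplitSketch {

    /** Distributes every record of every input part round-robin over n output parts. */
    static <T> List<List<T>> resplit(List<List<T>> inputs, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<List<T>> outputs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            outputs.add(new ArrayList<>());
        }
        int next = 0; // index of the output part that receives the next record
        for (List<T> input : inputs) {
            for (T record : input) {
                outputs.get(next).add(record);
                next = (next + 1) % n;
            }
        }
        return outputs;
    }

    public static void main(String[] args) {
        // One input "file" with ten records, re-split into four parts.
        List<Integer> single = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            single.add(i);
        }
        List<List<Integer>> parts = resplit(Collections.singletonList(single), 4);
        System.out.println(parts); // prints [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
    }
}
```

With one output file per desired mapper, each file yields at least one
input split, which is presumably why the job then ran several map tasks.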

Changing only the Hadoop MR settings did not help. In the future it
would be nice if the drivers scaled themselves and split the data
according to the dataset size and the number of available MR slots.
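
A rough sketch of how the number of map tasks follows from file size
and split size. It assumes the split rule of the new-API
FileInputFormat (split size = max(minSize, min(maxSize, blockSize)),
with a 10% slack on the last split). The class name and the file and
block sizes are hypothetical and are not taken from my cluster.

```java
// Hypothetical sketch of the split arithmetic; all sizes are made up.
final class SplitCountSketch {

    // Slack factor: the last split may be up to 10% larger than splitSize.
    private static final double SPLIT_SLOP = 1.1;

    /** Split size rule: max(minSize, min(maxSize, blockSize)). */
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Number of splits (one map task each) for a single file. */
    static int countSplits(long fileSize, long splitSize) {
        int splits = 0;
        long remaining = fileSize;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the final, possibly shorter or slightly longer, split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize = 40 * mb;  // assumed size of the single input file
        long blockSize = 64 * mb; // assumed HDFS block size

        // No limits set: the split size equals the block size, so one split.
        long defaultSize = splitSize(blockSize, 1L, Long.MAX_VALUE);
        System.out.println(countSplits(fileSize, defaultSize)); // prints 1

        // Maximum split size capped at 8 MB: five splits.
        long cappedSize = splitSize(blockSize, 1L, 8 * mb);
        System.out.println(countSplits(fileSize, cappedSize)); // prints 5
    }
}
```

Under these assumptions a file smaller than one block stays a single
split unless the maximum split size is lowered below the file size.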

Cheers
Sebastian

On 28.03.2013 07:25, Dan Filimon wrote:
> Yes, it does depend on the number of mappers, and what Ted suggested
> (splitting the input file) worked for me.
>
> Here's [1] the code I used to split a SequenceFile (I wrote it so that
> it re-splits m files into n files, hence the name).
>
> [1] 
> https://github.com/dfilimon/mahout/blob/skm/examples/src/main/java/org/apache/mahout/clustering/streaming/tools/ResplitSequenceFiles.java
>
> On Thu, Mar 28, 2013 at 2:26 AM, Ted Dunning <[email protected]> wrote:
>> Your idea that this is related to your single input file is the most likely
>> cause.
>>
>> If your input file is relatively small then splitting it up to force
>> multiple mappers is the easiest solution.
>>
>> If your input file is larger, then you might be able to convince the
>> map-reduce framework to use more mappers.
>>
>> On Wed, Mar 27, 2013 at 6:09 PM, Sebastian Briesemeister <
>> [email protected]> wrote:
>>
>>> Yes, correct. It currently starts a single Map task.
>>>
>>>
>>>
>>> Ted Dunning <[email protected]> wrote:
>>>
>>>> Do you mean that it starts a single map task?
>>>>
>>>> On Wed, Mar 27, 2013 at 5:10 PM, Sebastian Briesemeister <
>>>> [email protected]> wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> I am trying to start the FuzzyKMeansDriver on a hadoop cluster so that
>>>>> it starts multiple MapReduce-Jobs. However, it always starts just a
>>>>> single MR-Job?!
>>>>>
>>>>> I figured it might be caused by the fact that I generated my input data
>>>>> into a single file using SequenceFile.Writer???
>>>>> Or is there another way to influence the number of mapper tasks?
>>>>>
>>>>> Thanks in advance
>>>>> Sebastian
>>>>>
>>> --
>>> This message was sent from my Android mobile phone with K-9 Mail.
