I tried to increase the heap space, but it wasn't enough. It seems the problem is not the number of mappers. I will start another thread for this problem with some more details.
Cheers
Sebastian

On 28.03.2013 16:41, Dan Filimon wrote:
> From what I've seen, even if the mapper does throw an out of memory
> exception, Hadoop will restart it increasing the memory.
>
> There are ways to configure the mapper/reducer JVMs to use more memory by
> default through the Configuration although I don't recall the exact
> options. It's probably documented in your Hadoop distribution's
> documentation.
>
>
> On Thu, Mar 28, 2013 at 2:52 PM, Sebastian Briesemeister <
> [email protected]> wrote:
>
>> In my case, each map process requires a lot of memory and I would like
>> to distribute this consumption across multiple nodes.
>>
>> However, I still get out of memory exceptions even if I split the input
>> file into several very small input files??? I thought the mapper would
>> consider only one file at a time and would, hence, have no problems with
>> heap space?
>>
>>
>>
>> On 28.03.2013 10:20, Ted Dunning wrote:
>>> This is a longstanding Hadoop issue.
>>>
>>> Your suggestion is interesting, but only a few cases would benefit. The
>>> problem is that splitting involves reading from a very small number of
>>> nodes and thus is not much better than just running the program with few
>>> mappers. If the data is large enough to make splitting fast, then Hadoop
>>> will just do it.
>>>
>>> The only win for splitting is when the cost per chunk is very high. I
>>> think that only random forest might fit into that category.
>>>
>>> On Thu, Mar 28, 2013 at 10:04 AM, Sebastian Briesemeister <
>>> [email protected]> wrote:
>>>
>>>> Splitting the files leads to multiple MR-tasks!
>>>>
>>>> Only changing the MR settings of Hadoop did not help. In the future it
>>>> would be nice if the drivers would scale themselves and would split the
>>>> data according to the dataset size and the number of available MR-slots.
>>>>
>>
