From what I've seen, even if the mapper does throw an out of memory exception, Hadoop will restart it with more memory.
There are ways to configure the mapper/reducer JVMs to use more memory by default through the Configuration, although I don't recall the exact options. It's probably documented in your Hadoop distribution's documentation.

On Thu, Mar 28, 2013 at 2:52 PM, Sebastian Briesemeister <[email protected]> wrote:

> In my case, each map process requires a lot of memory and I would like
> to distribute this consumption across multiple nodes.
>
> However, I still get out of memory exceptions even if I split the input
> file into several very small input files. I thought the mapper would
> consider only one file at a time and would, hence, have no problems with
> heap space?
>
> On 28.03.2013 10:20, Ted Dunning wrote:
> > This is a longstanding Hadoop issue.
> >
> > Your suggestion is interesting, but only a few cases would benefit. The
> > problem is that splitting involves reading from a very small number of
> > nodes and thus is not much better than just running the program with few
> > mappers. If the data is large enough to make splitting fast, then Hadoop
> > will just do it.
> >
> > The only win for splitting is when the cost per chunk is very high. I
> > think that only random forest might fit into that category.
> >
> > On Thu, Mar 28, 2013 at 10:04 AM, Sebastian Briesemeister <
> > [email protected]> wrote:
> >
> >> Splitting the files leads to multiple MR-tasks!
> >>
> >> Only changing the MR settings of hadoop did not help. In the future it
> >> would be nice if the drivers would scale themselves and would split the
> >> data according to the dataset size and the number of available MR-slots.
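
For the memory settings mentioned at the top of this reply, here is a minimal sketch of the properties that usually control task JVM heap. The property names are the stock Hadoop ones (`mapred.child.java.opts` on Hadoop 1.x; `mapreduce.map.java.opts`, `mapreduce.reduce.java.opts`, `mapreduce.map.memory.mb` and `mapreduce.reduce.memory.mb` on Hadoop 2.x), but check them against your distribution. The class name `TaskMemoryOptions`, the helper methods, and the 25% container headroom are illustrative assumptions, not part of Hadoop. The sketch only builds the key/value pairs with the standard library; you would then pass them to `Configuration.set(key, value)` in your driver or as `-D` options.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: collects memory-related job properties for task JVMs. */
final class TaskMemoryOptions {

    /** Hadoop 1.x: one property carries the JVM flags for all child tasks. */
    static Map<String, String> forHadoop1(int heapMb) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapred.child.java.opts", "-Xmx" + heapMb + "m");
        return props;
    }

    /** Hadoop 2.x: separate heap and container sizes for map and reduce tasks. */
    static Map<String, String> forHadoop2(int mapHeapMb, int reduceHeapMb) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapreduce.map.java.opts", "-Xmx" + mapHeapMb + "m");
        props.put("mapreduce.map.memory.mb", String.valueOf(containerMb(mapHeapMb)));
        props.put("mapreduce.reduce.java.opts", "-Xmx" + reduceHeapMb + "m");
        props.put("mapreduce.reduce.memory.mb", String.valueOf(containerMb(reduceHeapMb)));
        return props;
    }

    /** Assumed rule of thumb: the container gets 25% more than the JVM heap. */
    static int containerMb(int heapMb) {
        return heapMb + heapMb / 4;
    }

    /** Renders the properties as -Dkey=value arguments for the job command line. */
    static String toCommandLine(Map<String, String> props) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Map.Entry<String, String> e : props.entrySet()) {
            joiner.add("-D" + e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCommandLine(forHadoop1(2048)));
        System.out.println(toCommandLine(forHadoop2(2048, 4096)));
    }
}
```

The `-D` form is only picked up if the driver parses generic options (for example through `ToolRunner`); otherwise the values have to be set on the job's Configuration in code.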
