Hi Koert,

On Sun, Aug 26, 2012 at 11:20 PM, Koert Kuipers <[email protected]> wrote:
> Hey Harsh,
> Thanks for responding!
> Would limiting the logging for each task via mapred.userlog.limit.kb be
> strictly enforced (while the job is running)? That would solve my issue of
> runaway logging on a job filling up the datanode disks. I would set the
> limit high since in general I do want to retain logs, just not in case a
> single rogue job starts producing many gigabytes of logs.
> Thanks!
It is not strictly enforced the way counter limits are. Exceeding it wouldn't fail the task; it only causes the extra logged events to not appear at all (thereby limiting the size).

> On Sun, Aug 26, 2012 at 1:44 PM, Harsh J <[email protected]> wrote:
>>
>> Hi Koert,
>>
>> To answer on point, there is no turning off this feature.
>>
>> Since you don't seem to care much for logs from tasks persisting,
>> perhaps consider lowering mapred.userlog.retain.hours to something
>> below 24 hours (such as 1h)? Or you may even limit the logging
>> from each task to a certain amount of KB via mapred.userlog.limit.kb,
>> which is unlimited by default.
>>
>> Would either of these work for you?
>>
>> On Sun, Aug 26, 2012 at 11:02 PM, Koert Kuipers <[email protected]> wrote:
>> > We have smaller nodes (4 to 6 disks), and we used to write logs to
>> > the same disk as where the OS is. So if that disk goes, then I don't
>> > really care about tasktrackers failing. Also, the fact that logs were
>> > written to a single partition meant that I could make sure they would
>> > not grow too large in case someone had too-verbose logging on a large
>> > job. With MAPREDUCE-2415, a job that does a massive amount of logging
>> > can fill up all of the mapred.local.dir, which in our case are on the
>> > same partition as the HDFS data dirs, so now faulty logging can fill
>> > up HDFS storage, which I really don't like. Any ideas?
>>
>> --
>> Harsh J

--
Harsh J
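[For readers following along: the two properties discussed in this thread are set in mapred-site.xml on the TaskTracker nodes. A rough sketch, with illustrative values only, not recommendations; check mapred-default.xml for your Hadoop version to confirm defaults.]

```xml
<!-- mapred-site.xml (illustrative values only) -->
<property>
  <name>mapred.userlog.retain.hours</name>
  <!-- How long to keep task userlogs; default is 24 hours. -->
  <value>1</value>
</property>
<property>
  <name>mapred.userlog.limit.kb</name>
  <!-- Per-task log size cap in KB; the default (0) means unlimited.
       As noted above, exceeding it does not fail the task; it only
       limits what is retained in the task log. -->
  <value>1048576</value>
</property>
```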
