Sorry, I had a misunderstanding of the MapReduce report output. I did reduce mapreduce.tasktracker.map.tasks.maximum (the number of concurrent map tasks per node) from the default of 2 to 1. If I want to do this on a per-job or per-user basis, I'll try out the Hadoop Fair Scheduler.
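For reference, a minimal sketch of the change described above, assuming a classic (pre-YARN) mapred-site.xml on each TaskTracker node; the property name is the one used in this thread, and the value is the reduced per-node map-slot count:

```xml
<!-- mapred-site.xml: cap concurrent map tasks per TaskTracker at 1 -->
<property>
  <name>mapreduce.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
```

Note this is a cluster-wide, per-node cap and requires a TaskTracker restart; per-job or per-user limits would come from a scheduler such as the Fair Scheduler instead.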
On Oct 19, 2010, at 1:27 PM, Jonathan Ellis wrote:

> (Moving to u...@.)
>
> Isn't reducing the number of map tasks the easiest way to tune this?
>
> Also: in 0.7 you can use NetworkTopologyStrategy to designate a group
> of nodes as your Hadoop "datacenter" so the workloads won't overlap.
>
> On Tue, Oct 19, 2010 at 3:22 PM, Michael Moores <mmoo...@real.com> wrote:
>> Does it make sense to add some kind of throttle capability on the
>> ColumnFamilyRecordReader for Hadoop?
>>
>> If I have 60 or so map tasks running at the same time when the cluster
>> is already heavily loaded with OLTP operations, I can see decreased
>> online performance that may not be acceptable. (I'm loading an 8-node
>> cluster at 2000 TPS.)
>>
>> By default my cluster of 8 nodes (which are also the Hadoop JobTracker
>> nodes) has 8 map tasks per node making the get_range_slices call, based
>> on what the ColumnFamilyInputFormat has calculated from my token ranges.
>> I can increase the input split size (ConfigHelper.setInputSplitSize())
>> so that there is only one map task per node, and this helps quite a bit.
>>
>> But is it reasonable to provide a configurable sleep between the
>> smaller range queries? That would stretch out the map time and let the
>> OLTP processing be less affected.
>>
>> --Michael
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
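The throttle proposed in the quoted message can be sketched as a batch loop with a sleep between sub-range requests. This is an illustrative, self-contained sketch only, not Cassandra's actual ColumnFamilyRecordReader; the class name, constructor parameters, and scan method are all hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed throttled range scan: instead of
// issuing one large get_range_slices per split, walk the split in smaller
// batches and sleep between them so the scan is stretched over time and
// competes less with OLTP traffic.
public class ThrottledScanner {
    private final int batchSize;    // rows fetched per (simulated) range query
    private final long sleepMillis; // pause between batches to spread load

    public ThrottledScanner(int batchSize, long sleepMillis) {
        this.batchSize = batchSize;
        this.sleepMillis = sleepMillis;
    }

    // Walks 'rows' in batches of batchSize, sleeping between batches;
    // returns the number of batches (i.e. smaller range queries) issued.
    public int scan(List<String> rows) throws InterruptedException {
        int batches = 0;
        for (int i = 0; i < rows.size(); i += batchSize) {
            List<String> batch =
                rows.subList(i, Math.min(i + batchSize, rows.size()));
            // In a real reader, each 'batch' would correspond to one
            // smaller get_range_slices request here.
            batches++;
            if (i + batchSize < rows.size()) {
                Thread.sleep(sleepMillis); // the configurable throttle
            }
        }
        return batches;
    }
}
```

The trade-off is simply longer job wall-clock time in exchange for a lower instantaneous read rate against the cluster, which is the effect the thread is asking for.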