how do i view the local file system output of a mapper on cygwin + windows?
i am currently testing my map reduce job on Windows + Cygwin + Hadoop v0.20.205. for some strange reason, the list of values (i.e. Iterable<T> values) going into the reducer looks all wrong. i have tracked the map reduce process with logging statements (i.e. logged the input to the map, the output from the map, the partitioner, and the input to the reducer). at all stages, everything looks correct except at the reducer. is there any way (using Windows + Cygwin) to view the local map outputs before they are shuffled/sorted to the reducer? i need to know why the values are incorrect.
Re: how do i view the local file system output of a mapper on cygwin + windows?
i found out what my problem was. apparently, when you iterate over Iterable<Type> values, the same instance of Type is being used over and over. for example, in my reducer,

    public void reduce(Key key, Iterable<Value> values, Context context)
            throws IOException, InterruptedException {
        Iterator<Value> it = values.iterator();
        Value a = it.next();
        Value b = it.next();
    }

the variables a and b, of type Value, will be the same object instance! i suppose this behavior of the iterator is an optimization that avoids allocating a new object for every value.

On Thu, Apr 5, 2012 at 4:55 PM, Jane Wayne jane.wayne2...@gmail.com wrote:
> is there any way (using Windows + Cygwin) to view the local map outputs before they are shuffled/sorted to the reducer? i need to know why the values are incorrect.
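The aliasing described in this message can be reproduced without Hadoop. The following is only a minimal simulation, not Hadoop's actual iterator: `ReuseDemo`, `Value`, and `ReusingIterable` are hypothetical names made up for illustration, and the iterable simply refills one shared object on every `next()` call, which is the behavior the message reports.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class ReuseDemo {
    /** Hypothetical mutable value type, standing in for a Hadoop value class. */
    static final class Value {
        int n;
    }

    /** Hypothetical iterable that refills ONE shared Value on every next() call. */
    static final class ReusingIterable implements Iterable<Value> {
        private final int[] data;
        private final Value shared = new Value();

        ReusingIterable(int... data) {
            this.data = data;
        }

        @Override
        public Iterator<Value> iterator() {
            return new Iterator<Value>() {
                private int i = 0;

                @Override
                public boolean hasNext() {
                    return i < data.length;
                }

                @Override
                public Value next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    shared.n = data[i++]; // overwrite the shared instance in place
                    return shared;        // hand out the same reference every time
                }
            };
        }
    }

    public static void main(String[] args) {
        Iterator<Value> it = new ReusingIterable(1, 2).iterator();
        Value a = it.next();
        Value b = it.next();
        System.out.println(a == b); // true: a and b are the same object
        System.out.println(a.n);    // 2: the first value has been overwritten
    }
}
```

Holding on to `a` after calling `next()` again therefore shows the latest value, which would explain values that "look all wrong" even though each individual log statement looked correct.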
Re: how do i view the local file system output of a mapper on cygwin + windows?
Jane,

Yes, and that's documented: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/Reducer.html#reduce(K2,%20java.util.Iterator,%20org.apache.hadoop.mapred.OutputCollector,%20org.apache.hadoop.mapred.Reporter)

The framework will reuse the key and value objects that are passed into the reduce, therefore the application should clone the objects they want to keep a copy of.

On Fri, Apr 6, 2012 at 6:26 AM, Jane Wayne jane.wayne2...@gmail.com wrote:
> i found out what my problem was. apparently, when you iterate over Iterable<Type> values, that instance of Type is being used over and over.

-- Harsh J
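The advice to clone the objects you want to keep can be sketched in plain Java. This is a simulation under stated assumptions rather than Hadoop code: `CopyDemo`, `Value`, its copy constructor, and `reusingIterator` are hypothetical and exist only to mimic an iterator that reuses one instance. In a real job the copy would be made with whatever the value type offers, for example a copy constructor or, for Writable types, something like `WritableUtils.clone(value, conf)`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class CopyDemo {
    /** Hypothetical mutable value type with a copy constructor. */
    static final class Value {
        int n;

        Value() {}

        Value(Value other) {
            this.n = other.n;
        }
    }

    /** Hypothetical iterator that refills one shared Value on each next() call. */
    static Iterator<Value> reusingIterator(final int... data) {
        final Value shared = new Value();
        return new Iterator<Value>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < data.length;
            }

            @Override
            public Value next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.n = data[i++];
                return shared;
            }
        };
    }

    /** Buggy pattern: stores the reused reference, so every entry aliases one object. */
    static List<Integer> keepWithoutCopy(int... data) {
        List<Value> kept = new ArrayList<>();
        for (Iterator<Value> it = reusingIterator(data); it.hasNext(); ) {
            kept.add(it.next());
        }
        return contents(kept);
    }

    /** Fixed pattern: copies each value before storing it. */
    static List<Integer> keepWithCopy(int... data) {
        List<Value> kept = new ArrayList<>();
        for (Iterator<Value> it = reusingIterator(data); it.hasNext(); ) {
            kept.add(new Value(it.next()));
        }
        return contents(kept);
    }

    private static List<Integer> contents(List<Value> values) {
        List<Integer> out = new ArrayList<>();
        for (Value v : values) {
            out.add(v.n);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(keepWithoutCopy(1, 2, 3)); // [3, 3, 3]
        System.out.println(keepWithCopy(1, 2, 3));    // [1, 2, 3]
    }
}
```

Copying costs one allocation per retained value, so it is only needed for values that must outlive the current iteration step; values that are consumed immediately can be read from the reused instance directly.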