how do i view the local file system output of a mapper on cygwin + windows?

2012-04-05 Thread Jane Wayne
i am currently testing my map reduce job on Windows + Cygwin + Hadoop
v0.20.205. for some strange reason, the list of values (i.e.
Iterable<T> values) going into the reducer looks all wrong. i have
traced the map reduce process with logging statements (i.e. logged
the input to the map, the output from the map, the partitioner, and
the input to the reducer). at every stage, everything looks correct
except at the reducer.

is there any way (using Windows + Cygwin) to view the local map
outputs before they are shuffled/sorted to the reducer? i need to know
why the values are incorrect.


Re: how do i view the local file system output of a mapper on cygwin + windows?

2012-04-05 Thread Jane Wayne
i found out what my problem was. apparently, when you iterate over
Iterable<Type> values, the same instance of Type is reused on every
iteration. for example, in my reducer,

public void reduce(Key key, Iterable<Value> values, Context context)
throws IOException, InterruptedException {
 Iterator<Value> it = values.iterator();
 Value a = it.next(); // first value
 Value b = it.next(); // same object as a, now holding the second value
}

the variables a and b, both of type Value, refer to the same object
instance! i suppose the iterator behaves this way as an optimization,
to avoid allocating a new object for every value.



Re: how do i view the local file system output of a mapper on cygwin + windows?

2012-04-05 Thread Harsh J
Jane,

Yes, and that's documented:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/Reducer.html#reduce(K2,%20java.util.Iterator,%20org.apache.hadoop.mapred.OutputCollector,%20org.apache.hadoop.mapred.Reporter)

The framework will reuse the key and value objects that are passed
into the reduce, therefore the application should clone the objects
they want to keep a copy of.
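
The reuse described above can be reproduced without a Hadoop installation. The following is only a sketch in plain Java that simulates the behavior; Value, ReusingIterable and ReuseDemo are hypothetical names made up for illustration and are not Hadoop classes. It assumes the values iterator refills a single instance on every next(), as the documentation describes, and shows why storing references goes wrong while storing copies does not.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a Writable value type; not a Hadoop class.
final class Value {
    int n;
    Value(int n) { this.n = n; }
    Value(Value other) { this.n = other.n; } // copy constructor
}

// Simulates the reuse: one shared Value is overwritten on every next().
final class ReusingIterable implements Iterable<Value> {
    private final int[] data;
    ReusingIterable(int... data) { this.data = data; }

    @Override
    public Iterator<Value> iterator() {
        return new Iterator<Value>() {
            private final Value shared = new Value(0);
            private int i = 0;

            @Override
            public boolean hasNext() { return i < data.length; }

            @Override
            public Value next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.n = data[i++]; // overwrite the shared instance in place
                return shared;        // always the same object
            }
        };
    }
}

final class ReuseDemo {
    // Buggy: every stored reference points at the one shared instance.
    static List<Integer> keepReferences(Iterable<Value> values) {
        List<Value> kept = new ArrayList<>();
        for (Value v : values) kept.add(v);
        List<Integer> out = new ArrayList<>();
        for (Value v : kept) out.add(v.n);
        return out;
    }

    // Fixed: copy each value before the iterator overwrites it.
    static List<Integer> keepCopies(Iterable<Value> values) {
        List<Value> kept = new ArrayList<>();
        for (Value v : values) kept.add(new Value(v));
        List<Integer> out = new ArrayList<>();
        for (Value v : kept) out.add(v.n);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(keepReferences(new ReusingIterable(1, 2, 3))); // [3, 3, 3]
        System.out.println(keepCopies(new ReusingIterable(1, 2, 3)));     // [1, 2, 3]
    }
}
```

In a real reducer the copy step would depend on the value type: for example a copy constructor such as new Text(value) for Text values, or WritableUtils.clone(value, conf) for other Writable types.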


-- 
Harsh J