End of block/file for Map

2008-12-09 Thread Songting Chen
Is there a way for the Map process to know it's the end of records?

I need to flush some additional data at the end of the Map process, but
I am wondering where I should put that code.

Thanks,
-Songting


Re: End of block/file for Map

2008-12-09 Thread Owen O'Malley


On Dec 9, 2008, at 11:35 AM, Songting Chen wrote:

> Is there a way for the Map process to know it's the end of records?
>
> I need to flush some additional data at the end of the Map process,
> but wondering where I should put that code.


The close() method is called at the end of the map.
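
A minimal sketch of that lifecycle, written against hypothetical stand-in
types rather than the real org.apache.hadoop.mapred classes so that it runs
without Hadoop on the classpath. SimpleMapper, BatchingMapper, and MapDriver
are names invented for illustration, and the driver only approximates the
framework's behavior: map() once per record, then close() once at the end.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the old-API mapper contract (not the Hadoop
// interface): map() is called per record, close() once when input ends.
interface SimpleMapper extends Closeable {
    void map(long offset, String line) throws IOException;
}

// Buffers records and writes them to a side sink in batches.
// close() flushes whatever is still buffered after the last record.
class BatchingMapper implements SimpleMapper {
    private final List<String> buffer = new ArrayList<>();
    private final List<String> sink; // stands in for a side file or external store
    private final int batchSize;

    BatchingMapper(List<String> sink, int batchSize) {
        this.sink = sink;
        this.batchSize = batchSize;
    }

    @Override
    public void map(long offset, String line) {
        buffer.add(line);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    @Override
    public void close() {
        flush(); // end of records: write out the partial batch
    }

    private void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.add(String.join(",", buffer));
        buffer.clear();
    }
}

// Hypothetical driver approximating what the framework does for one split.
class MapDriver {
    static void run(SimpleMapper mapper, List<String> records) throws IOException {
        try {
            long offset = 0;
            for (String record : records) {
                mapper.map(offset, record);
                offset += record.length() + 1; // pretend each record ends in '\n'
            }
        } finally {
            mapper.close(); // called exactly once, after the last record
        }
    }
}
```

In a real job the same flush would live in the close() method of your own
Mapper implementation; the sink here is just an in-memory list.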

-- Owen


Re: End of block/file for Map

2008-12-09 Thread Aaron Kimball
That's true, but you should be aware that close() is not passed an
OutputCollector.  So if you are planning to have each mapper emit some
sort of end record to the reducer, you can't do so there. In general,
there is no good solution to that; you should rethink your algorithm if
possible so that you don't need to do it.

(I am not sure what happens if you save the OutputCollector you got as a
parameter to your map() method and try to use it in close(). Probably
nothing good.)

- Aaron


On Tue, Dec 9, 2008 at 11:42 AM, Owen O'Malley [EMAIL PROTECTED] wrote:

> On Dec 9, 2008, at 11:35 AM, Songting Chen wrote:
>
>> Is there a way for the Map process to know it's the end of records?
>>
>> I need to flush some additional data at the end of the Map process, but
>> wondering where I should put that code.
>
> The close() method is called at the end of the map.
>
> -- Owen



Re: End of block/file for Map

2008-12-09 Thread Owen O'Malley


On Dec 9, 2008, at 7:34 PM, Aaron Kimball wrote:

> That's true, but you should be aware that you no longer have an
> OutputCollector available in the close() method.


True, but in practice you can keep a handle to it from the map method
and it will work perfectly. This is required for both streaming and
pipes to work. (Both of them do their processing asynchronously, so
the close needs to wait for the subprocess to finish. Because of this,
the contract with the Mapper and Reducer is very loose, and the
collect method may be called in between calls to the map method.)  In
the context object API (HADOOP-1230), cleanup will be passed the context
object, to make it clear that cleanup can also write records.
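
The handle-keeping approach can be sketched roughly as follows. This is an
illustration under assumptions, not the Hadoop API: Collector is a
hypothetical stand-in for OutputCollector, and SummingMapper is an invented
mapper that aggregates word counts in memory and emits the totals from
close(). Note the null check: if the split was empty, map() never ran and
there is no handle to write to.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for org.apache.hadoop.mapred.OutputCollector.
interface Collector<K, V> {
    void collect(K key, V value);
}

// Counts words across all map() calls and emits the totals in close(),
// using the collector handle remembered from the most recent map() call.
class SummingMapper {
    private final Map<String, Integer> counts = new TreeMap<>();
    private Collector<String, Integer> saved; // handle kept from map()

    void map(String line, Collector<String, Integer> output) {
        saved = output; // remember the collector for use in close()
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
    }

    void close() {
        if (saved == null) {
            return; // map() was never called, so nothing can be emitted
        }
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            saved.collect(entry.getKey(), entry.getValue());
        }
        counts.clear();
    }
}
```

Nothing is emitted while records are being mapped; all output appears when
close() runs, in key order because the sketch uses a TreeMap.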


-- Owen