End of block/file for Map
Is there a way for the Map process to know it's the end of records? I need to flush some additional data at the end of the Map process, but wondering where I should put that code. Thanks, -Songting
Re: End of block/file for Map
On Dec 9, 2008, at 11:35 AM, Songting Chen wrote:
> Is there a way for the Map process to know it's the end of records? I need to flush some additional data at the end of the Map process, but wondering where I should put that code.

The close() method is called at the end of the map.

-- Owen
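A minimal sketch of the lifecycle described above, assuming the framework calls map() once per record and close() once after the last record. The RecordMapper interface, the BufferingMapper class, and the runTask driver are hypothetical stand-ins written so the example compiles without Hadoop on the classpath; they are not the real org.apache.hadoop.mapred types. The flush here goes to an in-memory list standing in for a side file or other external sink.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the shape of the old mapred Mapper lifecycle:
// map() is called once per record, close() once at the end of the task.
interface RecordMapper {
    void map(String key, String value);
    void close();
}

// Buffers records during map() and flushes them when close() is called.
class BufferingMapper implements RecordMapper {
    private final List<String> buffer = new ArrayList<>();
    // Stand-in for an external sink such as a side file (an assumption).
    final List<String> flushed = new ArrayList<>();

    @Override
    public void map(String key, String value) {
        buffer.add(key + "=" + value);
    }

    @Override
    public void close() {
        // End of records: move everything still buffered to the sink.
        flushed.addAll(buffer);
        buffer.clear();
    }
}

class MapLifecycleDemo {
    // Mimics what the framework does for one input split.
    static List<String> runTask(BufferingMapper mapper, String[][] records) {
        for (String[] record : records) {
            mapper.map(record[0], record[1]);
        }
        mapper.close();
        return mapper.flushed;
    }

    public static void main(String[] args) {
        String[][] records = {{"a", "1"}, {"b", "2"}};
        System.out.println(runTask(new BufferingMapper(), records));
    }
}
```

Nothing reaches the sink until close() runs, so the mapper itself never needs to detect the last record.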
Re: End of block/file for Map
That's true, but you should be aware that you no longer have an OutputCollector available in the close() method. So if you are planning to have each mapper emit some sort of end record along to the reducer, you can't do so there. In general, there is not a good solution to that; you should rethink your algorithm if possible so that you don't need to do that.

(I am not sure what happens if you memoize the OutputCollector you got as a parameter to your map() method and try to use it. Probably nothing good.)

- Aaron

On Tue, Dec 9, 2008 at 11:42 AM, Owen O'Malley [EMAIL PROTECTED] wrote:
> On Dec 9, 2008, at 11:35 AM, Songting Chen wrote:
>> Is there a way for the Map process to know it's the end of records? I need to flush some additional data at the end of the Map process, but wondering where I should put that code.
>
> The close() method is called at the end of the map.
>
> -- Owen
Re: End of block/file for Map
On Dec 9, 2008, at 7:34 PM, Aaron Kimball wrote:
> That's true, but you should be aware that you no longer have an OutputCollector available in the close() method.

True, but in practice you can keep a handle to it from the map method and it will work perfectly. This is required for both streaming and pipes to work. (Both of them do their processing asynchronously, so the close needs to wait for the subprocess to finish. Because of this, the contract with the Mapper and Reducer is very loose, and the collect method may be called in between calls to the map method.)

In the context object API (HADOOP-1230), the API will include the context object in cleanup, to make it clear that cleanup can also write records.

-- Owen
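A minimal sketch of the "keep a handle" approach, under stated assumptions. RecordCollector is a hypothetical stand-in for org.apache.hadoop.mapred.OutputCollector, and CountingMapper only mimics the shape of the old map()/close() pair (the real map() also takes a Reporter, omitted here). SavedCollectorDemo plays the role of the framework so the example runs without Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.hadoop.mapred.OutputCollector<K, V>.
interface RecordCollector<K, V> {
    void collect(K key, V value);
}

// Counts records in map() and emits one summary record from close(),
// using the collector handle saved during map().
class CountingMapper {
    private RecordCollector<String, Long> saved; // handle kept from map()
    private long count = 0;

    public void map(String key, String value, RecordCollector<String, Long> output) {
        saved = output; // remember the collector for use in close()
        count++;
    }

    public void close() {
        // map() is never called for an empty split, so the handle may be null.
        if (saved != null) {
            saved.collect("records", count);
        }
    }
}

class SavedCollectorDemo {
    // Mimics the framework: feed every record to map(), then call close().
    static List<String> run(String[] values) {
        List<String> emitted = new ArrayList<>();
        RecordCollector<String, Long> out = (k, v) -> emitted.add(k + "=" + v);
        CountingMapper mapper = new CountingMapper();
        for (int i = 0; i < values.length; i++) {
            mapper.map("line" + i, values[i], out);
        }
        mapper.close();
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(run(new String[] {"x", "y", "z"}));
    }
}
```

The null check matters because a task with no input records never sees a collector, so close() has nothing to write to in that case.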