Hello, I have the following task:
An application stores files and lets a user add and delete them. Whenever such an event occurs, I append a record to a file in HDFS:

    when a file is added:    userid image-uuid size_in_bytes
    when a file is removed:  -userid image-uuid size_in_bytes

When calculating the average in the reducer, I have to subtract the size of each removed file and decrease the count, so that the average is computed without that file. Deletions are infrequent events.

My idea is to keep a hash map in memory in the reducer that tracks deletions while I iterate over the value list, so that I can correct the final total and count at the end of the iteration. This also reminds me that I will have only one reducer, for the single 'avg' key the mapper emits. I have pasted a rough sketch of the mapper and reducer below, to make the idea concrete.

What do you think?

Regards
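
The sketch below is only meant to illustrate the idea, not to be a final implementation. It assumes the record layout above (a leading '-' marks a deletion record), and the class and field names (AverageFileSize, EventMapper, AvgReducer) are placeholders I made up.

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AverageFileSize {

    // Mapper: every event record is funnelled to the single "avg" key,
    // so the whole value list ends up in one reduce() call.
    public static class EventMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        private static final Text AVG_KEY = new Text("avg");

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Pass the raw record through unchanged; the reducer parses it.
            context.write(AVG_KEY, line);
        }
    }

    // Reducer: additions go into the running total/count, deletions go into
    // an in-memory map keyed by image-uuid, and the totals are corrected
    // once the iteration is finished.
    public static class AvgReducer
            extends Reducer<Text, Text, Text, DoubleWritable> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            long total = 0;
            long count = 0;
            Map<String, Long> deletions = new HashMap<>();

            for (Text value : values) {
                String record = value.toString().trim();
                boolean isDeletion = record.startsWith("-");
                String[] fields =
                        (isDeletion ? record.substring(1) : record).split("\\s+");
                String imageUuid = fields[1];
                long sizeInBytes = Long.parseLong(fields[2]);

                if (isDeletion) {
                    deletions.put(imageUuid, sizeInBytes);
                } else {
                    total += sizeInBytes;
                    count++;
                }
            }

            // Correct the final total and count for the deleted files.
            for (long sizeInBytes : deletions.values()) {
                total -= sizeInBytes;
                count--;
            }

            if (count > 0) {
                context.write(new Text("avg"),
                        new DoubleWritable((double) total / count));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "average file size");
        job.setJarByClass(AverageFileSize.class);
        job.setMapperClass(EventMapper.class);
        job.setReducerClass(AvgReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        job.setNumReduceTasks(1);   // one reducer, for the single "avg" key
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The driver sets the number of reduce tasks to one, which matches the single 'avg' key, so that one reducer sees every record in the log.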