Hello, I have the following task:
An application stores files and lets a user add and delete them. Whenever such an event occurs, I append a record to a file in HDFS:

    when a file is added:    userid image-uuid size_in_bytes
    when a file is removed:  -userid image-uuid size_in_bytes

When calculating the average in the reducer, I have to subtract the size of each removed file and decrease the count, so that the average is computed without that file. Deletions are infrequent events.

My idea is to keep a hash map in memory in the reducer that tracks deletions while I iterate over the value list, so that I can correct the final total and count at the end of the iteration. This also reminds me that I will have only one reducer, for the single 'avg' key the mapper emits. I have pasted a rough sketch of the mapper and reducer below, to make the idea concrete.

What do you think?

Regards
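
The sketch below is only meant to illustrate the idea, not to be a final implementation. It assumes the record layout above (a leading '-' marks a deletion record), and the class and field names (AverageFileSize, EventMapper, AvgReducer) are placeholders I made up.

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AverageFileSize {

    // Mapper: every event record is funnelled to the single "avg" key,
    // so the whole value list ends up in one reduce() call.
    public static class EventMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        private static final Text AVG_KEY = new Text("avg");

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Pass the raw record through unchanged; the reducer parses it.
            context.write(AVG_KEY, line);
        }
    }

    // Reducer: additions go into the running total/count, deletions go into
    // an in-memory map keyed by image-uuid, and the totals are corrected
    // once the iteration is finished.
    public static class AvgReducer
            extends Reducer<Text, Text, Text, DoubleWritable> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            long total = 0;
            long count = 0;
            Map<String, Long> deletions = new HashMap<>();

            for (Text value : values) {
                String record = value.toString().trim();
                boolean isDeletion = record.startsWith("-");
                String[] fields =
                        (isDeletion ? record.substring(1) : record).split("\\s+");
                String imageUuid = fields[1];
                long sizeInBytes = Long.parseLong(fields[2]);

                if (isDeletion) {
                    deletions.put(imageUuid, sizeInBytes);
                } else {
                    total += sizeInBytes;
                    count++;
                }
            }

            // Correct the final total and count for the deleted files.
            for (long sizeInBytes : deletions.values()) {
                total -= sizeInBytes;
                count--;
            }

            if (count > 0) {
                context.write(new Text("avg"),
                        new DoubleWritable((double) total / count));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "average file size");
        job.setJarByClass(AverageFileSize.class);
        job.setMapperClass(EventMapper.class);
        job.setReducerClass(AvgReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        job.setNumReduceTasks(1);   // one reducer, for the single "avg" key
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The driver sets the number of reduce tasks to one, which matches the single 'avg' key, so that one reducer sees every record in the log.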