Hello,
I have time-series data in this form:
entity_id, timestamp, data

The resolution of the data is roughly 5 seconds.
I want to extract the data at 10-minute resolution.

So what I can do is:
Emit everything from the mapper, since the data is not sorted there.
Emit a record only every 10 minutes from the reducer. The reducer receives the data sorted by the (entity_id, timestamp) pair (secondary sort).
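
To make the reducer step concrete, here is a minimal sketch in plain Python rather than actual Hadoop code. It assumes timestamps are integer epoch seconds, that "every 10 minutes" means the first record in each 10-minute window, and that the records arrive already sorted by (entity_id, timestamp). The names `downsample` and `WINDOW_SECONDS` are made up for illustration.

```python
from typing import Iterable, Iterator, Tuple

# Assumption: timestamps are integer epoch seconds, so 10 minutes = 600.
WINDOW_SECONDS = 600

# (entity_id, timestamp, data)
Record = Tuple[str, int, str]


def downsample(records: Iterable[Record],
               window: int = WINDOW_SECONDS) -> Iterator[Record]:
    """Yield the first record of each (entity_id, window) bucket.

    Assumes the input is already sorted by (entity_id, timestamp),
    which is what the reducer sees after secondary sorting.
    """
    last_bucket = None
    for entity_id, timestamp, data in records:
        # Integer division maps every timestamp in the same
        # 10-minute window to the same bucket number.
        bucket = (entity_id, timestamp // window)
        if bucket != last_bucket:
            yield entity_id, timestamp, data
            last_bucket = bucket


if __name__ == "__main__":
    rows = [
        ("a", 0, "x0"),
        ("a", 5, "x1"),
        ("a", 600, "x2"),
        ("a", 605, "x3"),
        ("b", 10, "y0"),
    ]
    for row in downsample(rows):
        print(row)
```

Only one record per entity and window survives, but every input record still has to reach the reducer first.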

This will work, but it will take forever, since I have to process terabytes of data. Also, the amount of data sent to the reducers will be huge (as I am not filtering in the map phase at all), and the number of reducers is much smaller than the number of mappers.

Are there any better ideas for how to do this?

Georgi
