Amazon Elastic MapReduce

Cam Bazz Sun, 11 Dec 2011 16:48:36 -0800

Hello All,

I have had a single-node pseudo-cluster calculating some statistics
for me for about a year. It has finally outgrown being a
do-it-at-home task.


I have uploaded my data to S3 and configured everything so that I can
load my tables and their partitions, and the data is available to
Elastic MapReduce.

I have a number of problems I need to solve before I can use this in
any useful manner.

First: I load data and must then run a number of queries whose input
is a partition name, usually MMDDHH. So each time the script runs, I
must keep track of where I left off last time, and then process only
the partitions that have been newly loaded.
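The bookkeeping described above can be sketched roughly as follows.
This is only an illustration, assuming partition names are fixed-width
MMDDHH strings (so plain string comparison orders them within one
year) and that the saved state is simply the last processed name. The
function names are hypothetical, and how the list of loaded partitions
is obtained is left open.

```python
def new_partitions(all_partitions, last_done):
    """Return the partitions strictly newer than last_done, oldest first.

    Assumes fixed-width MMDDHH names, which sort correctly as strings
    within a single year. last_done is None on the very first run.
    """
    return sorted(p for p in all_partitions
                  if last_done is None or p > last_done)


def advance_state(pending, last_done):
    """Return the new checkpoint: the newest processed partition, if any."""
    return pending[-1] if pending else last_done


# Hypothetical example data: partitions currently loaded, in any order.
loaded = ["121114", "121116", "121113", "121115"]
checkpoint = "121114"

todo = new_partitions(loaded, checkpoint)
# ... run the per-partition queries for each name in todo here ...
checkpoint = advance_state(todo, checkpoint)
```

The checkpoint would still have to be persisted somewhere between
runs, which is the open question below.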

Considering I am using S3, how can I store this state? Perhaps in some
other table that is also stored in S3? Is it a good approach to keep
state and similar things in other tables, like in the old days of SQL?

Another problem I am having is how to implement a function that will
increment the partition. How will I know which partition is the most
recently loaded one?
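To make this question concrete, here is a minimal sketch of such a
function, assuming the partition name is MMDDHH and that one step
means one hour. Because the name carries no year, the year is passed
in as an extra (assumed) parameter so that leap days and the year
rollover are handled. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def next_partition(name, year):
    """Return the MMDDHH name one hour after the given MMDDHH name."""
    current = datetime.strptime(f"{year}{name}", "%Y%m%d%H")
    return (current + timedelta(hours=1)).strftime("%m%d%H")


def newest_partition(all_partitions):
    """Return the largest MMDDHH name, or None if nothing is loaded.

    String comparison is only valid within a single year.
    """
    return max(all_partitions, default=None)


following = next_partition("121123", 2011)   # last hour of 11 Dec
rollover = next_partition("123123", 2011)    # last hour of the year
latest = newest_partition(["121114", "121116", "121115"])
```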

Also, is there something like a cursor in HQL?

Best regards,
C.B.
