Hello all,

I have had a single-node pseudo-cluster running for a year, calculating some statistics for me. It has finally outgrown being a do-it-at-home task.
So I have uploaded my data to S3 and configured everything so that I can load my tables and their partitions, and the data is available to Elastic MapReduce. There are a number of problems I need to solve before I can use this in any useful manner.

First, after loading data I must run a number of queries whose input is a partition name, usually of the form MMDDHH. Each time the script runs, it must remember where it left off last time and then process only the partitions that have been newly loaded. Given that I am using S3, how can I store that state? Perhaps in another table that is also stored in S3? Is it a good approach to keep state and similar things in other tables, as in the old SQL days?

Another problem is how to implement a function that advances the partition. How will I know which partitions are the most recently loaded?

Also, is there anything like a cursor in HQL?

Best regards,
C.B.
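
To make the state-tracking question concrete, here is a minimal Python sketch of the bookkeeping described above. It is only an illustration under stated assumptions, not an EMR or Hive feature: the function names `new_partitions` and `advance_state` are hypothetical, the list of partition names is assumed to have been obtained separately (for example from the output of Hive's `SHOW PARTITIONS`), and the last processed partition is assumed to be kept as a single string somewhere durable, such as a small object in S3. It also assumes that MMDDHH names sort correctly as plain strings, which only holds within one calendar year.

```python
from typing import Iterable, List, Optional


def new_partitions(all_partitions: Iterable[str],
                   last_processed: Optional[str]) -> List[str]:
    """Return the partitions newer than last_processed, oldest first.

    Assumes fixed-width MMDDHH names, so string order equals time order
    (this breaks across a year boundary).
    """
    ordered = sorted(set(all_partitions))
    if last_processed is None:
        # No saved state yet: everything counts as new.
        return ordered
    return [p for p in ordered if p > last_processed]


def advance_state(processed: List[str],
                  last_processed: Optional[str]) -> Optional[str]:
    """Return the value to save as the new state after a run."""
    # If nothing was processed, the saved state stays unchanged.
    return processed[-1] if processed else last_processed
```

A run would read the saved value, call `new_partitions` to decide which partitions to pass to the queries, and then write back the result of `advance_state` only after those queries have succeeded, so that a failed run is retried from the same point.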