Thanks Bharath. Can you check if the logic I am implementing is correct or needed any modification in it as well? I am very new to Impala UDF & C++ and having some hard time figuring out the problems.
On 20 June 2017 at 14:27, Bharath Vissapragada <[email protected]> wrote: > You need to allocate memory for tsTemp, else it can segfault. That could > be the issue here. > > static TimestampVal* tsTemp; > tsTemp->date = 0; > tsTemp->time_of_day = 0; > > > On Tue, Jun 20, 2017 at 2:15 PM, Ravi Kanth <[email protected]> > wrote: > >> Hi All, >> >> We are using Impala to do various processings in our systems. We have a >> requirement recently, wherein we have to handle the updates on the events >> i.e, we have an 'e_update' table which has the partial updates received for >> various events. The fields that are not updated are being stored as NULL >> values. >> >> >> >> Ex: >> >> >> ID (Int) Date_Time (timestamp) A (Int) B (String) C (String) >> 1 0 1 NULL NULL >> 1 1 2 Hi NULL >> 1 3 4 Hello Hi >> 1 2 5 NULL NULL >> 1 4 NULL NULL Zero >> >> >> >> P.S: Please consider Date_time as valid timestamp type values. For easy >> understanding, mentioned them as 0,1,2,3,4,5 >> >> >> >> As seen in the above table, the events have a unique id and as we get >> an update to a particular event, we are storing the date_time at which >> update has happened and also storing the partial updated values. Apart from >> the updated values, the rest are stored as NULL values. >> >> >> >> We are planning to mimic inplace updates on the table, so that it would >> retrieve the resulting table as follows using the query below: We don't >> delete the data. >> >> >> >> > SELECT id, current_val(A,date_time) as A, current_val(B,date_time) as >> B, current_val(C,date_time) as C from e_update GROUP BY ID; >> >> >> >> where, current_val is a custom impala UDA we are planning to implement. >> i.e. get* latest non null value for the column.* >> >> >> ID (Int) A (Int) B (String) C (String) >> 1 4 Hello Zero >> >> >> >> >> >> Implemented current_val UDA: >> >> The below code is only for int type inputs: >> >> >> >> uda-currentval.h >> >> //This is a sample for retrieving the current value of e_update table >> // >> void CurrentValueInit(FunctionContext* context, IntVal* val); >> void CurrentValueUpdate(FunctionContext* context, const IntVal& input, const >> TimestampVal& ts, IntVal* val); >> void CurrentValueMerge(FunctionContext* context, const IntVal& src, IntVal* >> dst); >> IntVal CurrentValueFinalize(FunctionContext* context, const IntVal& val); >> >> uda-currentval.cc >> >> // >> ----------------------------------------------------------------------------------------------- >> // This is a sample for retrieving the current value of e_update table >> //----------------------------------------------------------------------------------------------- >> void CurrentValueInit(FunctionContext* context, IntVal* val) { >> val->is_null = false; >> val->val = 0; >> } >> >> void CurrentValueUpdate(FunctionContext* context, const IntVal& input, const >> TimestampVal& ts, IntVal* val) { >> static TimestampVal* tsTemp; >> tsTemp->date = 0; >> tsTemp->time_of_day = 0; >> if(tsTemp->date==0 && tsTemp->time_of_day==0){ >> tsTemp->date = ts.date; >> tsTemp->time_of_day = ts.time_of_day; >> val->val = input.val; >> return; >> } >> if(ts.date > tsTemp->date && ts.time_of_day > tsTemp->time_of_day){ >> tsTemp->date = ts.date; >> tsTemp->time_of_day = ts.time_of_day; >> val->val = input.val; >> return; >> } >> } >> >> void CurrentValueMerge(FunctionContext* context, const IntVal& src, IntVal* >> dst) { >> dst->val += src.val; >> } >> >> IntVal CurrentValueFinalize(FunctionContext* context, const IntVal& val) { >> return val; >> } >> >> >> >> We are able to build and create an aggregate function in impala, but when >> trying to run the select query similar to the one above, it is bringing >> down couple of impala deamons and throwing the error below and getting >> terminated. >> >> >> >> WARNINGS: Cancelled due to unreachable impalad(s): >> hadoop102.**.**.**.com:22000 >> >> >> >> >> >> We have impalad running on 14 instances. >> >> >> >> Can someone help resolve us this problem and a better way to achieve a >> solution for the scenario explained. >> > >
