Thanks. Knowing that Init() would be called many times per thread really helps.
Basically, I did something like this:

    void Init(FunctionContext* context, StringVal* result) {
      // Log whatever state this context already holds on entry.
      DebugPrint("Init: current state 0x%zx",
                 reinterpret_cast<std::size_t>(
                     context->GetFunctionState(FunctionContext::THREAD_LOCAL)));
      auto ptr = context->Allocate(4);
      // Log the freshly allocated pointer that is about to be stored.
      DebugPrint("Init: store 0x%zx", reinterpret_cast<std::size_t>(ptr));
      context->SetFunctionState(FunctionContext::THREAD_LOCAL, ptr);
    }

The update, serialize, merge, and finalize functions are all no-ops.
DebugPrint writes to a temporary file, prefixing each line with a timestamp and the thread ID obtained from syscall(SYS_gettid).

The output in the file looks like this:

    2020-06-05 16:13:07[26318]: Init: current state 0x0
    2020-06-05 16:13:07[26318]: Init: store 0xf474008
    2020-06-05 16:13:07[26318]: Init: current state 0x0
    2020-06-05 16:13:07[26318]: Init: store 0xea7d008
    2020-06-05 16:13:07[26318]: Init: current state 0x0
    2020-06-05 16:13:07[26318]: Init: store 0xea7c008
    2020-06-05 16:13:07[26318]: Init: current state 0x0
    2020-06-05 16:13:07[26318]: Init: store 0xe9f6008
    2020-06-05 16:13:07[26318]: Init: current state 0x0
    2020-06-05 16:13:07[26318]: Init: store 0xe9f7008
    2020-06-05 16:13:07[26318]: Init: current state 0xea7d008
    2020-06-05 16:13:07[26318]: Init: store 0xea7d0c0
    2020-06-05 16:13:07[26318]: Init: current state 0x0
    2020-06-05 16:13:07[26318]: Init: store 0xf475008
    2020-06-05 16:13:07[26318]: Init: current state 0xea7c008
    2020-06-05 16:13:07[26318]: Init: store 0xea7c0c0
    2020-06-05 16:13:07[26318]: Init: current state 0x0
    2020-06-05 16:13:07[26318]: Init: store 0xff28008
    2020-06-05 16:13:07[26318]: Init: current state 0xea7c0c0
    2020-06-05 16:13:07[26318]: Init: store 0xea7c178
    2020-06-05 16:13:07[26318]: Init: current state 0xff28008

So the same thread (26318) does call Init() many times. Some calls see a FunctionState stored by an earlier call, while others see an empty (0x0) state.
In other words, the Init() calls seem to fall into groups: calls within the same group share the same THREAD_LOCAL FunctionState storage, while different groups have different THREAD_LOCAL FunctionState storage. Does my observation make sense?

On Fri, Jun 5, 2020 at 2:55 PM Tim Armstrong <tarmstr...@cloudera.com> wrote:

> I think it would be easier to understand what you're seeing if you
> provided an example of what the code for your aggregate function looks
> like. If you call SetFunctionState(THREAD_LOCAL), I don't see a way that
> the pointer you set would be returned from GetFunctionState(THREAD_LOCAL)
> in a different thread.
>
> Init() is called for every aggregate tuple, so it can be called many times
> per thread for aggregations with a grouping key.
>
> Setting the FRAGMENT_LOCAL state only really makes sense for UDFs when
> Prepare(FRAGMENT_LOCAL) is called. After that the state is copied to any
> thread-local FunctionContexts. Calling SetFunctionState(FRAGMENT_LOCAL)
> later on is only going to modify the thread-local FunctionContext anyway.
>
> On Fri, Jun 5, 2020 at 11:06 AM Shuhao Tan <johnmave...@gmail.com> wrote:
>
>> Hi all,
>>
>> I recently wrote some UDA and I noticed that in be/src/udf/udf.h the
>> comments for SetFunctionState starts with
>> > Methods for maintaining state across UDF/UDA function calls.
>> I presume GetFunctionState/SetFunctionState should work for UDA as well.
>>
>> I first tried to find an example in the repo, but I found the function is
>> exclusively used by UDF.
>> I then implemented a simple UDA just to test its behaviour. My current
>> findings are:
>> Even when using SetFunctionState(THREAD_LOCAL, some_ptr) only in the
>> Init, other threads (presumably in the same fragment) can still see it with
>> GetFunctionState(THREAD_LOCAL) if their Init were invoked later.
>> Currently it seems that threads in the same fragment were calling Init
>> sequentially without race condition on FunctionState.
>>
>> My questions: Are GetFunctionState/SetFunctionState well-defined in UDA?
>> If so, what are the semantics and execution guarantees? How does passing
>> different FunctionStateScope change the behavior? Is it guaranteed
>> thread-safe?
>>
>> Thanks.
>>
>
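
P.S. For reference, the grouping I describe at the top of this message can be modelled outside Impala. The sketch below is only a toy under stated assumptions: MockContext, its single state slot, and CountInitsSeeingState are hypothetical names, not Impala's API, and the idea that several separate contexts are driven by one thread is my guess from the log, not something I have confirmed in the Impala source. Each Init() call reads the slot, then overwrites it, as in my UDA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for FunctionContext: models one THREAD_LOCAL slot only.
class MockContext {
 public:
  void* GetFunctionState() const { return state_; }
  void SetFunctionState(void* ptr) { state_ = ptr; }

  // Allocates a buffer that lives as long as this mock context.
  std::uint8_t* Allocate(std::size_t byte_size) {
    buffers_.push_back(std::make_unique<std::uint8_t[]>(byte_size));
    return buffers_.back().get();
  }

 private:
  void* state_ = nullptr;
  std::vector<std::unique_ptr<std::uint8_t[]>> buffers_;
};

// Same shape as the Init() in this message: returns the state seen on entry,
// then stores a freshly allocated pointer.
void* Init(MockContext* context) {
  void* seen = context->GetFunctionState();
  context->SetFunctionState(context->Allocate(4));
  return seen;
}

// Runs Init() on one thread against `num_contexts` separate contexts, visiting
// them in the given order, and counts the calls that saw a non-null state.
int CountInitsSeeingState(std::size_t num_contexts,
                          const std::vector<std::size_t>& order) {
  std::vector<MockContext> contexts(num_contexts);
  int seen_non_null = 0;
  for (std::size_t index : order) {
    assert(index < contexts.size());
    if (Init(&contexts[index]) != nullptr) ++seen_non_null;
  }
  return seen_non_null;
}
```

With three contexts visited in the order 0, 1, 2, 1, 0, 1, the first visit to each context sees a null state and every later visit sees the pointer stored by the previous visit to that same context. That is the same mix of 0x0 and non-zero "current state" lines as in my log, all from a single thread.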