Re: Best Practices for Calling Server Side Functions

Real Wes Williams Mon, 18 Apr 2016 11:06:51 -0700

Thanks, Barry. This is a much-needed thread.

Before we leave the RegionFunctionContext: I often execute queries within a
function invoked with onRegion. Is there a good reason why we don't support passing the RFC
into a query when using bind parameters? If not, I'd like to propose that as a
Geode enhancement to eke out even more performance. In performance tests using
a region with only 2,000 entries and a small number of nodes, I did not see a
performance difference between:
A) executing a query using the RFC, and
B) executing a query with bind parameters without using the RFC,

although supporting the RFC with B should theoretically be even faster.

Regards,
Wes Williams

http://gemfire81.docs.pivotal.io/docs-gemfire/developing/query_additional/using_query_bind_parameters.html#concept_173E775FE46B47DF9D7D1E40680D34DF


> On Apr 15, 2016, at 9:07 PM, Barry Oglesby <[email protected]> wrote:
> 
> Wes,
> 
> Because your regions are colocated, your example actually works. I'm not sure 
> why you'd do this, and I'm not sure I'd recommend it.
> 
> Under the covers, the query determines it is on the Trade region. Then it 
> gets the buckets (set of integers) from the RegionFunctionContext. Then, the 
> query, parameters and buckets are passed to the region to be executed.
> 
> So, the only thing the RFC is used for is to get the appropriate buckets, and 
> they should be the same in either case.
> 
> You might see some issues with this idea when buckets are moving around 
> during a rebalance. You'd have to test in that scenario to verify.
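A minimal, self-contained simulation of the point above (plain Java, no Geode on the classpath): the function's filter only contributes a set of bucket ids, and because colocated regions share the same bucket count and route on the same value, one cusip yields the same bucket ids whether the function targeted Orders or Trade. The class name, the 113-bucket count and the hash-based routing are assumptions for illustration; Geode's actual bucket assignment may differ, and the sketch says nothing about the rebalancing caveat.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical simulation, NOT Geode code. A partitioned "region" is modeled
// as a map from bucket id to the entries stored in that bucket.
class ColocationSketch {
    // Colocated regions share one bucket count; 113 is an assumed value.
    static final int TOTAL_BUCKETS = 113;

    static final class Trade {
        final String cusip;
        final int id;
        Trade(String cusip, int id) { this.cusip = cusip; this.id = id; }
    }

    // Assumed routing for illustration; Geode's real hashing may differ.
    // Orders and Trade both route on the cusip, so one cusip maps to the
    // same bucket id in either region.
    static int bucketFor(String cusip) {
        return Math.floorMod(cusip.hashCode(), TOTAL_BUCKETS);
    }

    // Stand-in for what the RegionFunctionContext contributes to the query:
    // the bucket ids derived from the filter. No region appears here, which is
    // why targeting Orders or Trade yields the same set.
    static Set<Integer> bucketsForFilter(Set<String> filter) {
        Set<Integer> buckets = new TreeSet<>();
        for (String cusip : filter) buckets.add(bucketFor(cusip));
        return buckets;
    }

    static void put(Map<Integer, List<Trade>> region, Trade trade) {
        region.computeIfAbsent(bucketFor(trade.cusip), b -> new ArrayList<>()).add(trade);
    }

    // Stand-in for "query, parameters and buckets are passed to the region":
    // the predicate is evaluated only against entries in the given buckets.
    static List<Trade> query(Map<Integer, List<Trade>> region, Set<Integer> buckets, String cusip) {
        List<Trade> results = new ArrayList<>();
        for (int bucket : buckets) {
            for (Trade t : region.getOrDefault(bucket, List.of())) {
                if (t.cusip.equals(cusip)) results.add(t);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<Integer, List<Trade>> tradeRegion = new HashMap<>();
        put(tradeRegion, new Trade("123", 1));
        put(tradeRegion, new Trade("123", 2));
        put(tradeRegion, new Trade("456", 3));
        Set<Integer> buckets = bucketsForFilter(Set.of("123"));
        System.out.println(query(tradeRegion, buckets, "123").size()); // prints 2
    }
}
```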
> 
> 
> Thanks,
> Barry Oglesby
> 
> 
> On Fri, Apr 15, 2016 at 5:14 PM, Real Wes Williams <[email protected] 
> <mailto:[email protected]>> wrote:
> Barry,
> 
> Would passing the RegionFunctionContext to the query execution apply regardless
> of whether the original function was executed on the Orders region or the Trades
> region in your example? If they are colocated, I would intuitively think that it
> _may_ not matter, but if it does matter, the side effects would probably be subtle.
> 
> To be specific by way of modifying your example, are the following equivalent 
> given that Orders and Trades are colocated?
> 
> Example 1 - Executing on the Orders region:
> **********************************************
> Execution execution = FunctionService.onRegion(orderRegion).withFilter(Collections.singleton(cusip));
> ResultCollector collector = execution.execute("TradeQueryFunction");
> 
> In the function:
> RegionFunctionContext rfc = (RegionFunctionContext) context;
> String cusip = (String) rfc.getFilter().iterator().next();
> Query query = queryService.newQuery("select * from /Trade where cusip = $1");
> SelectResults results = (SelectResults) query.execute(rfc, new String[] {cusip});
> 
> Example 2 - Executing on the Trades region:
> **********************************************
> Execution execution = FunctionService.onRegion(tradeRegion).withFilter(Collections.singleton(cusip));
> ResultCollector collector = execution.execute("TradeQueryFunction");
> 
> In the function:
> RegionFunctionContext rfc = (RegionFunctionContext) context;
> String cusip = (String) rfc.getFilter().iterator().next();
> Query query = queryService.newQuery("select * from /Trade where cusip = $1");
> SelectResults results = (SelectResults) query.execute(rfc, new String[] {cusip});
> 
> And then in the function:
> 
>> On Apr 15, 2016, at 7:53 PM, Barry Oglesby <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> Executing queries in functions can be tricky.
>> 
>> For executing queries in a function, do something like:
>> 
>> - invoke the function with onRegion
>> - have the function return true from optimizeForWrite so that it is executed 
>> only on primary buckets
>> - use the Query execute API with a RegionFunctionContext in the function. 
>> Otherwise, you could easily end up executing the same query on more than one 
>> member.
>> 
>> If you set a filter, the function (and query) will execute on only the 
>> member containing the primary or primaries for that filter.
>> 
>> Here is an example with trades.
>> 
>> If you route all trades on a specific cusip to the same bucket using a 
>> PartitionResolver, then querying for all trades for a specific cusip can be 
>> done efficiently using a Function. The trades could be stored with a simple 
>> String key like cusip-id or a complex key containing both the cusip and id. 
>> Either way, the PartitionResolver will need to be able to return the cusip 
>> for the routing object.
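A sketch of the routing logic described above, assuming a hypothetical composite key `TradeKey` and simple `cusip-id` String keys in which the cusip itself contains no dash. In Geode the real hook is the `PartitionResolver` interface and its `getRoutingObject` method; the static `routingObject` method below only mimics that contract so the example runs without Geode, and the hash-to-bucket step is an assumed scheme.

```java
import java.util.Objects;

// Hypothetical sketch, NOT the Geode API: routingObject mimics what a
// PartitionResolver's getRoutingObject would return, so this runs standalone.
class TradeRouting {
    static final int TOTAL_BUCKETS = 113; // assumed bucket count

    // Hypothetical composite key: equality uses both fields, routing uses only the cusip.
    static final class TradeKey {
        final String cusip;
        final String id;
        TradeKey(String cusip, String id) { this.cusip = cusip; this.id = id; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof TradeKey)) return false;
            TradeKey k = (TradeKey) o;
            return cusip.equals(k.cusip) && id.equals(k.id);
        }
        @Override public int hashCode() { return Objects.hash(cusip, id); }
    }

    // Returns the cusip for every supported key shape: a TradeKey, a simple
    // "cusip-id" String (assumes the cusip itself contains no dash), or a bare
    // cusip String such as the function filter value.
    static Object routingObject(Object key) {
        if (key instanceof TradeKey) return ((TradeKey) key).cusip;
        String s = (String) key;
        int dash = s.indexOf('-');
        return dash < 0 ? s : s.substring(0, dash);
    }

    // Assumed hash-to-bucket scheme; only the routing object is hashed.
    static int bucketFor(Object key) {
        return Math.floorMod(routingObject(key).hashCode(), TOTAL_BUCKETS);
    }

    public static void main(String[] args) {
        int composite = bucketFor(new TradeKey("037833100", "t1"));
        int simple = bucketFor("037833100-t2");
        int filter = bucketFor("037833100");
        System.out.println(composite == simple && simple == filter); // prints true
    }
}
```

Because only the cusip is hashed, every key shape for the same cusip lands in one bucket, which is what lets a single filtered function call see all trades for that cusip.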
>> 
>> Invoke the function like:
>> 
>> Execution execution = 
>> FunctionService.onRegion(this.region).withFilter(Collections.singleton(cusip));
>> ResultCollector collector = execution.execute("TradeQueryFunction");
>> Object result = collector.getResult();
>> 
>> In the TradeQueryFunction, execute the query like:
>> 
>> RegionFunctionContext rfc = (RegionFunctionContext) context;
>> String cusip = (String) rfc.getFilter().iterator().next();
>> SelectResults results = (SelectResults) this.query.execute(rfc, new String[] 
>> {cusip});
>> 
>> Where the query is:
>> 
>> select * from /Trade where cusip = $1
>> 
>> This will route the function request to the member whose primary bucket
>> contains the cusip filter. Then it will execute the query on the
>> RegionFunctionContext, which will just be the data for that bucket. Note: the
>> PartitionResolver will also need to be able to return the cusip for that
>> filter (which is just the input string itself).
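The routing step can be modeled the same way. In this hypothetical simulation (not Geode internals) the filter value is its own routing object, and a map from bucket id to the member holding that bucket's primary decides where the function runs; with several filter keys, only the distinct primary owners are targeted. The 4-bucket layout and the member names are invented for the example.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical simulation, NOT Geode internals: which members run a filtered
// onRegion call when the function executes on primary buckets.
class FilterRouting {
    static final int TOTAL_BUCKETS = 4; // tiny assumed bucket count for readability

    // Assumed routing: the filter value (the cusip) is its own routing object.
    static int bucketFor(String cusip) {
        return Math.floorMod(cusip.hashCode(), TOTAL_BUCKETS);
    }

    // Map each filter key to its bucket, then to the member holding that
    // bucket's primary. Only those members execute the function and its query.
    static Set<String> targetMembers(Set<String> filter, Map<Integer, String> primaryOwner) {
        Set<String> members = new TreeSet<>();
        for (String cusip : filter) {
            members.add(primaryOwner.get(bucketFor(cusip)));
        }
        return members;
    }

    public static void main(String[] args) {
        // Assumed layout: primaries for buckets 0-1 on server1, 2-3 on server2.
        Map<Integer, String> primaryOwner =
            Map.of(0, "server1", 1, "server1", 2, "server2", 3, "server2");
        System.out.println(targetMembers(Set.of("123"), primaryOwner)); // prints [server2]
    }
}
```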
>> 
>> Here is some more general info on functions.
>> 
>> If you're executing a function onRegion with a replicated region, then the 
>> function is executed on any member defining that region. Since the region is 
>> replicated, every server has the same data.
>> 
>> If you're executing a function onRegion with a partitioned region, then 
>> where the function is invoked depends on the result of optimizeForWrite. If 
>> optimizeForWrite returns true, the function is invoked on all the members 
>> containing primary buckets for that region. If optimizeForWrite returns 
>> false, the function is invoked on as few members as it can that encompass 
>> all the buckets (so it mixes primary and secondary buckets). For example, if
>> you have 2 members, the primaries are split between them, and the region has
>> one redundant copy, then optimizeForWrite returning true means that the
>> function will be invoked on both members. Returning false will cause the
>> function to be invoked on only one member, since each member has all the
>> buckets. I almost always have optimizeForWrite return true.
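The two selection rules can be contrasted in a small model. This is a hypothetical sketch, not Geode's actual algorithm: `true` is modeled as every member owning at least one primary bucket, and `false` as a greedy cover of all buckets by hosting member. The replicated-region case is not modeled, and the layout in `main` mirrors the two-member example above.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model, NOT Geode's actual member-selection algorithm.
class MemberSelection {
    // optimizeForWrite == true: every member that owns at least one primary bucket.
    static Set<String> primaryMembers(Map<Integer, String> primaryOwner) {
        return new TreeSet<>(primaryOwner.values());
    }

    // optimizeForWrite == false: a small set of members that together host every
    // bucket, primary or secondary. A greedy set cover is an assumption here.
    static Set<String> coveringMembers(Map<String, Set<Integer>> hostedBuckets, int totalBuckets) {
        Set<Integer> uncovered = new HashSet<>();
        for (int b = 0; b < totalBuckets; b++) uncovered.add(b);
        Set<String> chosen = new TreeSet<>();
        while (!uncovered.isEmpty()) {
            String best = null;
            int bestGain = 0;
            for (Map.Entry<String, Set<Integer>> e : hostedBuckets.entrySet()) {
                Set<Integer> gain = new HashSet<>(e.getValue());
                gain.retainAll(uncovered);
                if (gain.size() > bestGain) {
                    best = e.getKey();
                    bestGain = gain.size();
                }
            }
            if (best == null) throw new IllegalStateException("some buckets are hosted nowhere");
            chosen.add(best);
            uncovered.removeAll(hostedBuckets.get(best));
        }
        return chosen;
    }

    static Set<String> targets(boolean optimizeForWrite, Map<Integer, String> primaryOwner,
                               Map<String, Set<Integer>> hostedBuckets, int totalBuckets) {
        return optimizeForWrite
            ? primaryMembers(primaryOwner)
            : coveringMembers(hostedBuckets, totalBuckets);
    }

    public static void main(String[] args) {
        // Two members, one redundant copy: each hosts all 4 buckets, primaries split.
        Map<Integer, String> primaryOwner = Map.of(0, "m1", 1, "m1", 2, "m2", 3, "m2");
        Map<String, Set<Integer>> hosted = new LinkedHashMap<>();
        hosted.put("m1", Set.of(0, 1, 2, 3));
        hosted.put("m2", Set.of(0, 1, 2, 3));
        System.out.println(targets(true, primaryOwner, hosted, 4));  // prints [m1, m2]
        System.out.println(targets(false, primaryOwner, hosted, 4)); // prints [m1]
    }
}
```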
>> 
>> The onServer/onServers API is used for data-unaware calls (meaning no 
>> specific region involved). In the past, I've used it mainly for admin-type 
>> behavior like:
>> 
>> - start/stop gateway senders
>> - create regions
>> - rebalance
>> - assign buckets
>> 
>> Now, gfsh does a lot of this behavior (maybe all of it), so I don't 
>> necessarily need functions to do it anymore.
>> 
>> One of my favorite onServer use cases is the command pattern using a 
>> Request/Response API like:
>> 
>> - define a Request (like RebalanceCache)
>> - pass it as an argument to a CommandFunction from the client to a server 
>> using onServer
>> - execute it on the server
>> - return a Response
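A minimal sketch of the Request/Response command pattern, assuming hypothetical `Request`, `Response`, `RebalanceCache` and `CommandFunction` types. In a real deployment `CommandFunction` would implement Geode's `Function` interface and receive the request as its argument (for example via `withArgs`) from an onServer call; here it is invoked directly so the example runs standalone, and `RebalanceCache` only returns a message instead of rebalancing anything.

```java
import java.io.Serializable;

// Hypothetical types illustrating the command pattern; none are Geode classes.
// The Request carries its own server-side behavior.
interface Request extends Serializable {
    Response execute();
}

final class Response implements Serializable {
    final boolean success;
    final String message;
    Response(boolean success, String message) {
        this.success = success;
        this.message = message;
    }
}

// Hypothetical request; a real one would trigger a rebalance on the server.
final class RebalanceCache implements Request {
    @Override
    public Response execute() {
        return new Response(true, "rebalance requested");
    }
}

// Stand-in for a single generic function that can run any Request it receives
// as its argument. In Geode this would implement the Function interface.
final class CommandFunction {
    Response execute(Object argument) {
        if (!(argument instanceof Request)) {
            return new Response(false, "unsupported argument: " + argument);
        }
        try {
            return ((Request) argument).execute();
        } catch (RuntimeException e) {
            return new Response(false, "request failed: " + e.getMessage());
        }
    }
}

class CommandDemo {
    public static void main(String[] args) {
        // A client would send the request via onServer; here we call directly.
        Response r = new CommandFunction().execute(new RebalanceCache());
        System.out.println(r.success + " " + r.message); // prints true rebalance requested
    }
}
```

The appeal of this design is that only one function needs to be registered on the servers; adding a new admin operation means adding a new Request type rather than a new function.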
>> 
>> One use case for invoking a function from another function is member 
>> notification. This can be done with a CacheListener on a replicated region 
>> too, but the basic idea is:
>> 
>> - invoke a function
>> - in the function, invoke another function on all the members notifying them 
>> something is about to happen
>> - do the thing
>> - invoke another function on all the members notifying them something has 
>> happened
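The notification flow can be sketched as follows. `Member`, `receive` and `runWithNotifications` are hypothetical stand-ins: each `receive` call represents invoking a notification function on that member, which is not modeled here. Sending the "after" notification from a `finally` block is an assumption, so members hear that the action ended even if it fails.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of: notify all members, do the thing, notify again.
class NotificationSketch {
    // Stand-in for a cache server that receives notification-function calls.
    static final class Member {
        final String name;
        final List<String> inbox = new ArrayList<>();
        Member(String name) { this.name = name; }
        void receive(String message) { inbox.add(message); }
    }

    // The coordinating function: before-notification, the work, after-notification.
    static void runWithNotifications(List<Member> members, String action, Runnable work) {
        for (Member m : members) m.receive("before:" + action);
        try {
            work.run();
        } finally {
            // Assumption: members are told the action ended even if it failed.
            for (Member m : members) m.receive("after:" + action);
        }
    }

    public static void main(String[] args) {
        List<Member> members = List.of(new Member("server1"), new Member("server2"));
        runWithNotifications(members, "rebalance", () -> System.out.println("rebalancing"));
        System.out.println(members.get(0).inbox); // prints [before:rebalance, after:rebalance]
    }
}
```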
>> 
>> You need to be careful when invoking one function from another. Depending on 
>> what you're doing in the second function, you could get yourself into a 
>> distributed deadlock situation.
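One way to see how nested function calls can deadlock is an analogy with a bounded thread pool. This is plain Java, not Geode internals, and it assumes functions run on a limited number of threads: an outer task occupies a thread and then blocks waiting on an inner task that needs a thread from the same pool. With one thread the outer call never completes; with a spare thread it does. The class name, method name and timeouts are hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Analogy only, NOT Geode internals: functions run on a bounded pool, and an
// outer function blocks one thread while waiting for an inner function.
class NestedCallDeadlock {
    // Returns true if the outer call did not finish within the timeout.
    static boolean outerCallStalls(int poolSize, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            Future<String> outer = pool.submit(() -> {
                // The inner call needs a free thread from the same pool...
                Future<String> inner = pool.submit(() -> "inner result");
                // ...while this thread stays blocked until the inner call finishes.
                return inner.get();
            });
            outer.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // interrupts the blocked outer task so threads exit
        }
    }

    public static void main(String[] args) {
        System.out.println(outerCallStalls(1, 500));  // prints true: no thread for the inner call
        System.out.println(outerCallStalls(2, 5000)); // prints false: a second thread is free
    }
}
```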
>> 
>> I'm not sure this answers all the issues you were seeing, but hopefully it 
>> helps.
>> 
>> Thanks,
>> Barry Oglesby
>> 
>> 
>> On Fri, Apr 15, 2016 at 1:36 PM, Matt Ross <[email protected] 
>> <mailto:[email protected]>> wrote:
>> Hi all,
>> 
>> I'm involved in a sizable GemFire project right now that requires me to
>> execute Functions in a number of ways, and I wanted to poll the community
>> for some best practices. Initially, I would execute all functions like
>> this:
>> 
>> ResultCollector<?, ?> rc = FunctionService.onRegion(region)
>>     .withArgs(arguments).execute("my-awesome-function");
>> 
>> This worked reliably for quite some time, until I started mixing functions
>> that executed on partitioned redundant data with functions that executed on
>> replicated data. I first ran into problems with this method in this setup:
>> 1 locator, 2 servers, and functions that run queries on partitioned
>> redundant and replicated regions. The function would execute on both
>> servers, and the result collector would nondeterministically choose a server
>> to return results from. Logging statements placed within my function
>> confirmed that the function was being executed twice, once on each server.
>> We were able to fix this problem by switching from executing on a region to
>> executing on a Pool. Our initial reasoning (a hypothesis) was that since
>> there was replicated data on both servers, the function would execute on
>> both servers.
>> 
>> Another issue was executing functions from within a function without a
>> function context. Let's say I have one function that I execute with onPool;
>> therefore it is passed a FunctionContext. But inside that function I need to
>> execute other functions, some needing a RegionFunctionContext and some just
>> needing a FunctionContext. Initially I was able to just use a ResultCollector
>> and FunctionService.onRegion to get a region context, and then pass my
>> current function context to an instance of a new function:
>> MyAwesomeFunction myAwesomeFunction = new MyAwesomeFunction();
>> myAwesomeFunction.execute(functionContext);
>> 
>> This worked for a time, but as complexity rose, more problems came up.
>> 
>> So, in short, I wanted to throw out the blanket question of best practices
>> for using onRegion/onPool/onServer, calling other functions from within
>> functions, which types of functions should be used on which types of
>> regions, and general design patterns for executing functions. Thanks!
>> Matthew Ross | Data Engineer | Pivotal
>> 625 Avenue of the Americas NY, NY 10011
>> 516-941-7535 <tel:516-941-7535> | [email protected] <mailto:[email protected]> 
>> 
>> 
> 
>
