https://stackoverflow.com/questions/59724806/how-do-i-run-a-data-dependent-function-on-a-partitioned-region-in-a-member-group/59738182#59738182

On Tue, Jan 14, 2020 at 9:28 AM Nabarun Nag <n...@pivotal.io> wrote:

> It looks like the conversation has moved to Stack Overflow.
> We will continue it there.
>
> Regards
> Nabarun Nag
>
> On Tue, Jan 14, 2020 at 9:23 AM Nabarun Nag <n...@apache.org> wrote:
>
>> Hi David,
>> We have started looking into it and will get you an answer soon.
>>
>> Regards
>> Naba
>>
>> On Tue, Jan 14, 2020 at 7:48 AM David Loewy <david.lo...@resonate.com>
>> wrote:
>>
>>> Hello!
>>>
>>>
>>>
>>> My team uses Geode as a makeshift analytics engine. We store a
>>> collection of massive raw data objects (200MB+ each) in Geode, but these
>>> objects are never directly returned to the client. Instead, we rely heavily
>>> on custom function execution to process these data sets inside Geode, and
>>> only return the analysis result set.
>>>
>>>
>>>
>>> We have a new requirement to implement two tiers of data analytics
>>> precision. The high-precision analytics will require larger raw data sets
>>> and more CPU time. It is imperative that these high-precision analyses do
>>> not inhibit the low-precision analytics performance in any way. As such,
>>> I'm looking for a solution that keeps these data sets isolated to different
>>> servers.
>>>
>>>
>>>
>>> I built a POC that keeps each data set in its own region (both are
>>> PARTITIONED). These regions are configured to belong to separate Member
>>> Groups, then each server is configured to join one of the two groups. I'm
>>> able to stand up this cluster locally without issue, and gfsh indicates
>>> that everything looks correct: `describe member` shows each member hosting
>>> the expected regions.
>>>
>>>
>>>
>>> My client code configures a ClientCache that points at the cluster's
>>> single locator. My function execution command generally looks like the
>>> following:
>>>
>>>
>>>
>>> FunctionService
>>>   .onRegion(highPrecisionRegion)
>>>   .setArguments(inputObject)
>>>   .withFilter(keySet)
>>>   .execute(function);
>>>
>>>
>>>
>>> When I only run the high-precision server, I'm able to execute the
>>> function against the high-precision region. When I only run the
>>> low-precision server, I'm able to execute the function against the
>>> low-precision region. However, when I run both servers and execute the
>>> functions one after the other, I invariably get an exception stating that
>>> *one* of the regions cannot be found. See the following Gist for a sample
>>> of my code and the exception.
>>>
>>> https://gist.github.com/dLoewy/c9f695d67f77ec18a7e60a25c4e62b01
>>>
>>>
>>>
>>> TLDR key points:
>>>
>>> 1) Using member groups, Region A is on Server 1 and Region B is on
>>> Server 2.
>>>
>>> 2) These regions must be PARTITIONED in Production.
>>>
>>> 3) I need to run a *data-dependent* function on one of these regions;
>>> the client code chooses which.
>>>
>>> 4) As-is, my client code always fails to find *one* of the regions.
>>>
>>>
>>>
>>> Can someone please help me get on track? Is there an entirely different
>>> cluster architecture I should be considering? Happy to provide more detail
>>> upon request.
>>>
>>>
>>>
>>> Thanks so much for your time!
>>>
>>>
>>>
>>> David
>>>
>>>
>>>
>>> FYI, the following docs pages mention function execution on Member
>>> Groups, but give very little detail. The first link describes running
>>> data-INdependent functions on member groups, but doesn't say how, and
>>> doesn't say anything about running data-DEpendent functions on member
>>> groups.
>>>
>>>
>>> https://gemfire.docs.pivotal.io/99/geode/developing/function_exec/how_function_execution_works.html
>>>
>>>
>>> https://gemfire.docs.pivotal.io/99/geode/developing/function_exec/function_execution.html
>>>
>>
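
One possible reading of the symptom in the quoted message, offered as an assumption and not confirmed anywhere in this thread: a client pool that is not restricted to a member group may send a region function to a server that does not host that region. In Geode the related setting is the pool's server group (`PoolFactory.setServerGroup`, or the `server-group` pool attribute), typically with one pool per group and each client region bound to its own pool. The sketch below is a plain-Java toy model of that reasoning, not the Geode API; the class names, the group names "high" and "low", and the region names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only: none of these classes are Geode types. */
public class ServerGroupRoutingSketch {

    /** A server that belongs to one member group and hosts some regions. */
    static final class Server {
        final String name;
        final String group;
        final Set<String> regions;

        Server(String name, String group, String... regions) {
            this.name = name;
            this.group = group;
            this.regions = new HashSet<>(Arrays.asList(regions));
        }
    }

    /** A client pool; a null serverGroup means "any server in the cluster". */
    static final class Pool {
        private final List<Server> candidates = new ArrayList<>();

        Pool(List<Server> cluster, String serverGroup) {
            for (Server s : cluster) {
                if (serverGroup == null || serverGroup.equals(s.group)) {
                    candidates.add(s);
                }
            }
        }

        /** True if at least one server this pool can reach does not host the region. */
        boolean mayMiss(String region) {
            for (Server s : candidates) {
                if (!s.regions.contains(region)) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Hypothetical two-server layout mirroring the TLDR in the quoted message. */
    static List<Server> demoCluster() {
        return Arrays.asList(
            new Server("server1", "high", "highPrecisionRegion"),
            new Server("server2", "low", "lowPrecisionRegion"));
    }

    public static void main(String[] args) {
        List<Server> cluster = demoCluster();

        // One shared pool can reach both servers, so either region can be missed.
        Pool shared = new Pool(cluster, null);
        System.out.println("shared/high may miss: " + shared.mayMiss("highPrecisionRegion"));
        System.out.println("shared/low may miss:  " + shared.mayMiss("lowPrecisionRegion"));

        // One pool per group only reaches servers that host the matching region.
        Pool highPool = new Pool(cluster, "high");
        Pool lowPool = new Pool(cluster, "low");
        System.out.println("high pool may miss:   " + highPool.mayMiss("highPrecisionRegion"));
        System.out.println("low pool may miss:    " + lowPool.mayMiss("lowPrecisionRegion"));
    }
}
```

In this model the shared pool can miss either region, while each group-restricted pool cannot. Whether the real cluster behaves the same way depends on the actual pool and region configuration, so the Stack Overflow discussion linked at the top remains the place to check.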
