https://stackoverflow.com/questions/59724806/how-do-i-run-a-data-dependent-function-on-a-partitioned-region-in-a-member-group/59738182#59738182
On Tue, Jan 14, 2020 at 9:28 AM Nabarun Nag <n...@pivotal.io> wrote:

> Looks like the conversation has moved to Stack Overflow.
> Continuing the conversation there.
>
> Regards
> Nabarun Nag
>
> On Tue, Jan 14, 2020 at 9:23 AM Nabarun Nag <n...@apache.org> wrote:
>
>> Hi David,
>> We have started looking into it and will get you an answer soon.
>>
>> Regards
>> Naba
>>
>> On Tue, Jan 14, 2020 at 7:48 AM David Loewy <david.lo...@resonate.com>
>> wrote:
>>
>>> Hello!
>>>
>>> My team uses Geode as a makeshift analytics engine. We store a
>>> collection of massive raw data objects (200MB+ each) in Geode, but
>>> these objects are never directly returned to the client. Instead, we
>>> rely heavily on custom function execution to process these data sets
>>> inside Geode, and only return the analysis result set.
>>>
>>> We have a new requirement to implement two tiers of data analytics
>>> precision. The high-precision analytics will require larger raw data
>>> sets and more CPU time. It is imperative that these high-precision
>>> analyses do not inhibit the low-precision analytics performance in
>>> any way. As such, I'm looking for a solution that keeps these data
>>> sets isolated to different servers.
>>>
>>> I built a POC that keeps each data set in its own region (both are
>>> PARTITIONED). These regions are configured to belong to separate
>>> Member Groups, then each server is configured to join one of the two
>>> groups. I'm able to stand up this cluster locally without issue, and
>>> gfsh indicates that everything looks correct: `describe member` shows
>>> each member hosting the expected regions.
>>>
>>> My client code configures a ClientCache that points at the cluster's
>>> single locator.
>>> My function execution command generally looks like the
>>> following:
>>>
>>> FunctionService
>>>     .onRegion(highPrecisionRegion)
>>>     .setArguments(inputObject)
>>>     .withFilter(keySet)
>>>     .execute(function);
>>>
>>> When I only run the high-precision server, I'm able to execute the
>>> function against the high-precision region. When I only run the
>>> low-precision server, I'm able to execute the function against the
>>> low-precision region. However, when I run both servers and execute
>>> the functions one after the other, I invariably get an exception
>>> stating that *one* of the regions cannot be found. See the following
>>> Gist for a sample of my code and the exception.
>>>
>>> https://gist.github.com/dLoewy/c9f695d67f77ec18a7e60a25c4e62b01
>>>
>>> TLDR key points:
>>>
>>> 1) Using member groups, Region A is on Server 1 and Region B is on
>>>    Server 2.
>>> 2) These regions must be PARTITIONED in Production.
>>> 3) I need to run a *data-dependent* function on one of these regions;
>>>    the client code chooses which.
>>> 4) As-is, my client code always fails to find *one* of the regions.
>>>
>>> Can someone please help me get on track? Is there an entirely
>>> different cluster architecture I should be considering? Happy to
>>> provide more detail upon request.
>>>
>>> Thanks so much for your time!
>>>
>>> David
>>>
>>> FYI, the following docs pages mention function execution on Member
>>> Groups, but give very little detail. The first link describes running
>>> data-INdependent functions on member groups, but doesn't say how, and
>>> doesn't say anything about running data-DEpendent functions on member
>>> groups.
>>>
>>> https://gemfire.docs.pivotal.io/99/geode/developing/function_exec/how_function_execution_works.html
>>>
>>> https://gemfire.docs.pivotal.io/99/geode/developing/function_exec/function_execution.html