Re: Load all data from DB on Cache Start

John Blum Sat, 14 Jan 2017 13:39:44 -0800

Amit-

Another thing, a BPP is my recommended way in *Spring* to load data into a
Region after initialization, so I wholeheartedly support Luke on this.


Also keep in mind, if you need the initial Region load to be done
asynchronously (a BPP callback method is invoked synchronously during a
*Spring* ApplicationContext refresh and will block all other (possible)
beans (coming after) from being initialized), then you are responsible for
making that happen... perhaps with an appropriate Executor and Future.
Keep in mind that you can also publish (fire) an ApplicationEvent to your
"interested" application components (beans) that need to know when the
Region is fully loaded and ready for use.
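
As a rough sketch of the asynchronous idea, in plain Java only: a
ConcurrentHashMap stands in for the Region, a list of callbacks stands in for
publishing a *Spring* ApplicationEvent, and the AsyncRegionPreloader name is
made up for illustration.  In a real BPP you would submit the load from
postProcessAfterInitialization(..) and return right away.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper: fills a Map-backed "region" on a background thread
// and notifies interested components once the load has finished.
class AsyncRegionPreloader {

  private final ExecutorService executor = Executors.newSingleThreadExecutor();

  private final List<Consumer<Map<String, String>>> listeners =
      new CopyOnWriteArrayList<>();

  // Register a component that needs to know when the region is ready.
  void onLoaded(Consumer<Map<String, String>> listener) {
    listeners.add(listener);
  }

  // Returns immediately; the Future completes with the loaded entry count.
  Future<Integer> preload(Map<String, String> region, Map<String, String> source) {
    return executor.submit(() -> {
      region.putAll(source);                           // stand-in for the DB read
      listeners.forEach(listener -> listener.accept(region)); // "publish" the event
      return region.size();
    });
  }

  void shutdown() {
    executor.shutdown();
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> region = new ConcurrentHashMap<>();
    Map<String, String> rows = new HashMap<>();
    rows.put("1", "Widget");
    rows.put("2", "Gadget");

    AsyncRegionPreloader preloader = new AsyncRegionPreloader();
    preloader.onLoaded(r -> System.out.println("Region ready, entries: " + r.size()));

    Future<Integer> pending = preloader.preload(region, rows);
    // Other startup work could continue here while the load runs.
    System.out.println("Loaded " + pending.get() + " entries");
    preloader.shutdown();
  }
}
```

The single-thread executor is only one possible choice; anything that keeps
the load off the thread doing the ApplicationContext refresh would serve.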

Additionally, if you do not need to preload your Region on startup, then a
CacheLoader is the recommended way to load data into your Region on cache
misses (another synchronous mechanism called a "read-through").
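
To make "read-through" concrete, here is a minimal sketch in plain Java.  It
does not use the Geode API: the ReadThroughRegion class is hypothetical, a
ConcurrentHashMap stands in for the Region and a java.util.function.Function
stands in for the CacheLoader.  The point is only that the loader runs on a
miss, on the caller's thread, and not again on later hits for the same key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for a Region with a loader attached.
class ReadThroughRegion<K, V> {

  private final Map<K, V> entries = new ConcurrentHashMap<>();

  private final Function<K, V> loader;

  ReadThroughRegion(Function<K, V> loader) {
    this.loader = loader;
  }

  // On a miss the loader is invoked synchronously and its result is cached;
  // on a hit the cached value is returned and the loader is not called.
  V get(K key) {
    return entries.computeIfAbsent(key, loader);
  }

  int size() {
    return entries.size();
  }

  public static void main(String[] args) {
    ReadThroughRegion<String, String> trades = new ReadThroughRegion<>(key -> {
      System.out.println("loading " + key + " from the backing store");
      return "trade-" + key;
    });

    System.out.println(trades.get("42")); // miss: the loader runs
    System.out.println(trades.get("42")); // hit: served from memory
  }
}
```

This is also why read-through alone does not help when the first access has
to be fast: the first caller for each key pays for the load.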

A word of caution: never, ever auto-wire or inject any beans into a BPP.
Doing so could cause premature initialization of those beans.  Always rely
on the bean instance passed to the BPP's postProcessXXXX methods.

Thanks,
John


On Sat, Jan 14, 2017 at 1:30 PM, John Blum <[email protected]> wrote:

> Hi Amit, Luke-
>
> Thank you Luke.
>
> Actually Luke is mostly correct.  In this case, the order, however, DOES
> NOT matter.  The *Spring* container is intimately aware of certain types
> of beans defined/declared in the *Spring* ApplicationContext.
> BeanPostProcessors, a container extension point (hook), are one of them.
>
> *Spring* creates all BeanPostProcessors (BPP) before any other
> application beans in order to post process each bean defined/declared in
> the container (except for BPPs and BeanFactoryPostProcessors, of
> course).  The container then proceeds to call the BPP *before* the bean
> is "initialized" by the container (i.e.
> postProcessBeforeInitialization(..)) as well as *after* the bean has been
> "initialized".  A bean initialization corresponds to
> InitializingBean.afterPropertiesSet(), any init() methods marked as such
> in XML config or any @PostConstruct methods.
>
> Most SDG FactoryBeans (e.g. PartitionedRegionFactoryBean ->
> RegionFactoryBean) always create their GemFire object (e.g. Region) in
> the afterPropertiesSet() (i.e. initialization) method as the
> <SDG>FactoryBean implements *Spring's* InitializingBean (callback)
> interface.
>
> Therefore, technically, it is safe to define/declare any beans, in any
> order, since the dependencies and callbacks (BPP) pretty much determine the
> order in which beans are constructed, configured and initialized.  SDG even
> takes the Spring container DI concept to the level of ensuring GemFire
> objects are created in the order that GemFire expects based on both
> explicit and implicit dependencies (think Regions and a DiskStore, for
> instance, where the DS is just named in the Region configuration;
> under-the-hood, though, SDG creates a RuntimeBeanReference on the named DS
> to ensure the proper order).  As another example, it is also possible to
> define/declare your Regions before the Cache instance...
>
> <gfe:partitioned-region id="Products" ... />
>
> <gfe:cache/>
>
> SDG does not care how you define your beans and generally will do the
> right thing.  Using JavaConfig is a bit different though, and in certain
> cases you have to be a bit more conscientious of the order.
>
> In general, if you had a container with multiple beans defined/declared
> that had NO dependencies between them (or other pre-defined order
> specified, such as when using *Spring's* @Order annotation in an
> AnnotationConfigApplicationContext or by implementing the Ordered
> interface), then *Spring* will pretty much proceed to construct,
> configure and initialize beans in the order they are declared in the
> ApplicationContext config.
>
> Now, if you have multiple BPPs to process the Region, for various reasons,
> then you will need to define order among them by using the @Order
> annotation or by having your custom BPP implement the Ordered interface,
> if order is important.  If an order is not given, then *Spring* makes no
> guarantees which BPP will be invoked first.
>
> Anyway, all of this is well-described in the Spring documentation on
> "*Customizing the nature of a bean*" [1] as well as in "Container
> Extension Points" [2].
>
> Hope this helps.
>
> -John
>
> [1] http://docs.spring.io/spring/docs/current/spring-framework-reference/htmlsingle/#beans-factory-nature
> [2] http://docs.spring.io/spring/docs/current/spring-framework-reference/htmlsingle/#beans-factory-extension
>
>
> On Sat, Jan 14, 2017 at 8:38 AM, Amit Pandey <[email protected]>
> wrote:
>
>> Okay...yeah, as post processors process everything in the IoC container,
>> that's the only way, I guess.
>>
>> Thanks
>>
>>
>>
>> On Sat, Jan 14, 2017 at 9:36 PM, Luke Shannon <[email protected]>
>> wrote:
>>
>>> Hi Amit,
>>>
>>> In the past I have done it like this:
>>>
>>> Define a BeanPostProcessor like below. It will go out and get the data
>>> from wherever it lives, convert it to objects and then put them into the
>>> region using a Region reference passed in shortly after the region is
>>> initialized. This bean will need to be on the class path of Geode when it
>>> starts up. If using gfsh you can add it to the '--classpath' argument of
>>> the 'start server' command.
>>>
>>> You can then wire this bean into the Geode Cache xml like so:
>>>
>>> <gfe:replicated-region id="Product" />
>>>
>>> <bean id="productLoader" class="mypackage.ProductLoader">
>>>
>>> <property name="targetBeanName" value="Product" />
>>>
>>> </bean>
>>>
>>> Note that this bean is placed *below* your region definitions in the
>>> spring cache xml. If I remember correctly order matters and it will try and
>>> run this before the Region reference is created if the order is not correct.
>>>
>>> Hope this helps,
>>>
>>> Luke
>>>
>>> import java.io.BufferedReader;
>>> import java.io.File;
>>> import java.io.FileReader;
>>> import java.io.IOException;
>>> import java.util.HashMap;
>>> import java.util.Map;
>>> import java.util.logging.Logger;
>>> import org.springframework.beans.BeansException;
>>> import org.springframework.beans.factory.config.BeanPostProcessor;
>>> import org.springframework.util.Assert;
>>> import org.springframework.util.StringUtils;
>>> import com.gemstone.gemfire.cache.Region;
>>> import com.google.gson.Gson;
>>>
>>>
>>> public class ProductLoader implements BeanPostProcessor {
>>>
>>>   private static final Logger log =
>>>       Logger.getLogger(ProductLoader.class.getName());
>>>
>>>   // Name of the Region bean to preload, set from the Spring XML config.
>>>   private String targetBeanName;
>>>
>>>   protected String getTargetBeanName() {
>>>     Assert.state(StringUtils.hasText(targetBeanName),
>>>         "The target Spring context bean name was not properly specified!");
>>>     return targetBeanName;
>>>   }
>>>
>>>   public void setTargetBeanName(final String targetBeanName) {
>>>     Assert.hasText(targetBeanName,
>>>         "The target Spring context bean name must be specified!");
>>>     this.targetBeanName = targetBeanName;
>>>   }
>>>
>>>   @Override
>>>   public Object postProcessBeforeInitialization(final Object bean,
>>>       final String beanName) throws BeansException {
>>>     // Nothing to do before the Region has been initialized.
>>>     return bean;
>>>   }
>>>
>>>   @SuppressWarnings({ "unchecked", "rawtypes" })
>>>   @Override
>>>   public Object postProcessAfterInitialization(final Object bean,
>>>       final String beanName) throws BeansException {
>>>     if (beanName.equals(getTargetBeanName()) && bean instanceof Region) {
>>>       Region region = (Region) bean;
>>>       // Get your data from where it lives and do a put or a putAll
>>>       // into the region here, e.g.:
>>>       // region.put(<Key For Product>, <Product Value>);
>>>       log.info("Preloading complete. Region now has: " + region.size());
>>>     }
>>>     return bean;
>>>   }
>>> }
>>>
>>>
>>> On Sat, Jan 14, 2017 at 10:01 AM, Amit Pandey <[email protected]
>>> > wrote:
>>>
>>>> Hey John,
>>>>
>>>> How do we hook up post processors for a region ?
>>>>
>>>> If I have a region like :-
>>>>
>>>> <gfe:partitioned-region id="trades">
>>>>     <gfe:cache-loader>
>>>>         <bean class="x.y.z.TradeLoader"/>
>>>>     </gfe:cache-loader>
>>>>     <gfe:cache-writer>
>>>>         <bean class="x.y.z.TradeWriter"/>
>>>>     </gfe:cache-writer>
>>>>
>>>>
>>>> </gfe:partitioned-region>
>>>>
>>>>
>>>> How do we hook up the post processor?
>>>>
>>>>
>>>> On Tue, Dec 27, 2016 at 1:22 PM, Amit Pandey <[email protected]
>>>> > wrote:
>>>>
>>>>> Hey,
>>>>>
>>>>> Happy Holidays. Wishing you a great new year :)
>>>>>
>>>>> Regards
>>>>>
>>>>> On Tue, Dec 27, 2016 at 1:08 PM, John Blum <[email protected]> wrote:
>>>>>
>>>>>> ;-)  Happy holidays my friend.  Hope you are getting some good R&R.
>>>>>>
>>>>>> On Mon, Dec 26, 2016 at 2:14 PM, Udo Kohlmeyer <[email protected]
>>>>>> > wrote:
>>>>>>
>>>>>>> it helps a lot! :D
>>>>>>>
>>>>>>> On 12/26/16 12:28, John Blum wrote:
>>>>>>>
>>>>>>> Amit-
>>>>>>>
>>>>>>> Regarding...
>>>>>>>
>>>>>>> *> I want to load all data on cache startup at a go.*
>>>>>>>
>>>>>>> Since you are using "*Spring*", you could easily implement a
>>>>>>> *Spring* BeanPostProcessor [1] (BPP) for each (or all the)
>>>>>>> *Region(s)* in which you need to load data.  I do this frequently
>>>>>>> in *Spring Data GemFire/Geode's* test suite when testing *Region*
>>>>>>> data access operations using the GemfireTemplate, *Repositories* or
>>>>>>> things of that nature.  Clearly your BPP could use a DataSource to
>>>>>>> load the data from an external data store (e.g. RDBMS).
>>>>>>>
>>>>>>> Another way to load data on startup is to use a Geode
>>>>>>> *Initializer*.  However, this would require you to specify a
>>>>>>> snippet of cache.xml and does not work if you specify your *Regions*
>>>>>>> in *Spring* (XML/Java) config as you should when using *Spring*.  I
>>>>>>> also don't recommend using cache.xml, but it is the pure, non-*Spring*
>>>>>>> way to invoke logic after the cache has been "fully" initialized (i.e.
>>>>>>> where the *Regions* have been defined in cache.xml).
>>>>>>>
>>>>>>> See here [2] for more details.  Note, the documentation talks of
>>>>>>> "launching an application" on startup, after cache initialization, but
>>>>>>> technically, you can do whatever you want, like load data.
>>>>>>>
>>>>>>> I recommend the BPP.
>>>>>>>
>>>>>>>
>>>>>>> *> How should I set it up in config to allow it to join other nodes
>>>>>>> in cluster?*
>>>>>>>
>>>>>>> Regardless of whether your server data node is "embedded" or not,
>>>>>>> you can still use a Locator, or mcast, to have the node join the
>>>>>>> cluster.  In the "embedded" scenario, the "application" is a GemFire
>>>>>>> Server data node and will be part of the cluster, as Udo said.
>>>>>>>
>>>>>>> This is easily achievable with...
>>>>>>>
>>>>>>> <util:properties id="gemfireProperties">
>>>>>>>   <prop key="name">Example</prop>
>>>>>>>   <!-- Set to non-zero value to use Multicast; comment out "locators" -->
>>>>>>>   <prop key="mcast-port">0</prop>
>>>>>>>   <prop key="log-level">${gemfire.log-level:config}</prop>
>>>>>>>   <prop key="locators">someHost[10334]</prop>
>>>>>>>   <prop key="start-locator">localhost[1034]</prop>
>>>>>>> </util:properties>
>>>>>>>
>>>>>>> <gfe:cache properties-ref="gemfireProperties"/>
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>>
>>>>>>> As you can see from the snippet of *Spring* XML config above, this
>>>>>>> application is a Geode "peer" cache (i.e. embeds a Geode data
>>>>>>> node/server).
>>>>>>>
>>>>>>> The "*locators*" Geode/GemFire property enables this node to
>>>>>>> connect to a cluster.  Likewise, you can use the "*mcast-port*"
>>>>>>> property instead; however, I would recommend *Locators* over mcast.
>>>>>>>
>>>>>>> Additionally, you can see that I specified the "start-locator"
>>>>>>> Geode/GemFire property, which enables me to start an embedded Locator.
>>>>>>> This is useful for testing purposes and for connecting Geode data
>>>>>>> nodes together in a cluster without a dedicated Locator, though this
>>>>>>> approach is less resilient if the applications/servers go down (as
>>>>>>> may be the case in a micro-services scenario)!
>>>>>>>
>>>>>>>
>>>>>>> *> if I start with embedded server is it required to use client pool
>>>>>>> or is it not required?*
>>>>>>>
>>>>>>> A "client pool" is only applicable to cache clients (i.e.
>>>>>>> ClientCaches) on the "client-side" of the equation.  "Peers" find
>>>>>>> (Locator, mcast) and communicate (TCP/UDP, JGroups) with each other
>>>>>>> through other means once a cluster is formed.
>>>>>>>
>>>>>>> In fact, typically, it is more common to position your
>>>>>>> microservices-based applications as Geode cache clients (i.e.
>>>>>>> <gfe:client-cache ...>) and have them connect to a dedicated Geode
>>>>>>> service (i.e. a cluster of Geode servers/data nodes where 1 or more
>>>>>>> of those nodes are also running a "CacheServer", listening for cache
>>>>>>> clients to connect).  These dedicated Geode server nodes in the
>>>>>>> cluster constituting the service can still be configured with
>>>>>>> *Spring*, but they typically will not contain any
>>>>>>> application-specific components other than CacheListeners, Loaders,
>>>>>>> Writers, AEQ *Listeners*, etc.
>>>>>>>
>>>>>>> ClientCache applications use 1 or more Pools configured to talk to
>>>>>>> the servers in the cluster (either by way of Locator or direct server
>>>>>>> communication).  Pools can be configured with groups to target
>>>>>>> specific members (in that group) in the cluster.  Typically, members
>>>>>>> in 1 group host a different set of Regions from another group.  This
>>>>>>> is a way to separate the data traffic of 1 client from another, each
>>>>>>> dedicated to a specific resource/purpose (usually based on business
>>>>>>> function, etc).
>>>>>>>
>>>>>>> On a side note, some of what you are wanting to do "scale-wise"
>>>>>>> seems like a perfect fit for Pivotal CloudFoundry, which can
>>>>>>> auto-scale nodes in your cluster up or down based on load and other
>>>>>>> factors.
>>>>>>>
>>>>>>> Anyway, hope this helps!
>>>>>>>
>>>>>>> -John
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> [1] http://docs.spring.io/spring/docs/current/spring-framework-reference/htmlsingle/#beans-factory-extension-bpp
>>>>>>> [2] http://geode.apache.org/docs/guide/basic_config/the_cache/setting_cache_initializer.html
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Dec 25, 2016 at 11:12 PM, Amit Pandey <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hey,
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> I have lots of reference data which will be loaded at start of day.
>>>>>>>> This data is not bound to change much and as such I want to keep it
>>>>>>>> loaded from the start of day. Read-through will make it slow while it
>>>>>>>> is actually being accessed, so I want to keep it loaded in memory.
>>>>>>>>
>>>>>>>> Also I want to have functions which will be called by clients to do
>>>>>>>> some compute and return results. Using functions should allow me to add
>>>>>>>> nodes and speed up the compute.
>>>>>>>>
>>>>>>>> I have some micro services each of which will start a gemfire node,
>>>>>>>> and I want to connect, so yes I can set it up with locator.
>>>>>>>>
>>>>>>>> However I have one doubt, if I start with embedded server is it
>>>>>>>> required to use client pool or is it not required?
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>> On Mon, Dec 26, 2016 at 1:18 AM, Udo Kohlmeyer <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi there Amit,
>>>>>>>>>
>>>>>>>>> At this stage the only way you could load all data at one go is to
>>>>>>>>> write a client to connect to the db and load it all in. Another
>>>>>>>>> approach could be to write the same code into a function and invoke
>>>>>>>>> the function at start up. But in both cases the load is manual.
>>>>>>>>>
>>>>>>>>> To have geode servers join a cluster, you have 2 ways.
>>>>>>>>>
>>>>>>>>>    1. Connecting them up via a locator
>>>>>>>>>    2. Connecting them up via mcast.
>>>>>>>>>
>>>>>>>>> Please be aware that once you connect a server to a cluster, that
>>>>>>>>> server becomes an integral part of the cluster, so adding/removing
>>>>>>>>> servers from a cluster is not something you'd want to do in a
>>>>>>>>> load-based scaling model, i.e. if the load is high, add a server and
>>>>>>>>> if the load is low, shut down a server.
>>>>>>>>>
>>>>>>>>> Just for interest's sake, what is your use case?
>>>>>>>>>
>>>>>>>>> --Udo
>>>>>>>>>
>>>>>>>>> On 12/24/16 05:57, Amit Pandey wrote:
>>>>>>>>>
>>>>>>>>> Hi Guys,
>>>>>>>>>
>>>>>>>>> I am using Spring Data Geode. I have been able to use read and
>>>>>>>>> write through / write behind. I want to load all data on cache
>>>>>>>>> startup at a go.
>>>>>>>>>
>>>>>>>>> Secondly my geode server is embedded but I want to allow it to join
>>>>>>>>> other nodes.  How should I set it up in config to allow it to join
>>>>>>>>> other nodes in cluster?
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> -John
>>>>>>> john.blum10101 (skype)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> -John
>>>>>> john.blum10101 (skype)
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Luke Shannon | Platform Engineering | Pivotal
>>> -------------------------------------------------------------------------
>>>
>>> Mobile:416-571-9495 <(416)%20571-9495>
>>> Join the Toronto Pivotal Usergroup: http://www.meetup.com/Toronto-Pivotal-User-Group/
>>>
>>
>>
>
>
> --
> -John
> john.blum10101 (skype)
>



-- 
-John
john.blum10101 (skype)
