Re: fastest way to bulk insert in geode

John Blum Fri, 03 Mar 2017 10:11:39 -0800

SIMPLE ANSWER:

Well, I am not certain about "fastest", but Region.putAll is convenient, and
it may be one of the few ways to do this (perhaps the only way, other than
individual Region puts, which I gather would be slower).

If we are talking about a simple Map of data that is relatively small, then
Region.putAll(:Map) is your best option.
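
A minimal sketch of the small-Map case. It relies only on the fact that Region
extends java.util.concurrent.ConcurrentMap, so anything written against
java.util.Map can be handed a Region; a plain Map stands in for the Region
here. The helper class, its name, and the batchSize parameter are hypothetical
and not part of the Geode API. Chunking is an assumption added for Maps that
are too large to send in one call; for a small Map a single
region.putAll(map) is enough.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Geode: copies a Map into a target Map
// (for example a Geode Region, which is itself a Map) in bounded chunks.
final class BatchedPutAll {

    private BatchedPutAll() {}

    /**
     * Copies source into target using one putAll call per chunk of at most
     * batchSize entries. Returns the number of putAll calls made.
     */
    static <K, V> int load(Map<K, V> target, Map<K, V> source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int calls = 0;
        Map<K, V> batch = new HashMap<>();
        for (Map.Entry<K, V> entry : source.entrySet()) {
            batch.put(entry.getKey(), entry.getValue());
            if (batch.size() == batchSize) {
                target.putAll(batch);   // one bulk call per full chunk
                calls++;
                batch = new HashMap<>();
            }
        }
        if (!batch.isEmpty()) {
            target.putAll(batch);       // flush the final partial chunk
            calls++;
        }
        return calls;
    }
}
```

With a real Region the call would look like BatchedPutAll.load(region, map,
1000), where 1000 is an arbitrary value you would need to tune for your own
data and cluster.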

However...

DETAILED ANSWER:

I.e. don't equate loading a simple Map with bulk data loads in general.

It really depends on many factors, distribution factors in particular:
Region type (e.g. REPLICATE vs. PARTITION), Scope (DISTRIBUTED_ACK vs.
DISTRIBUTED_NO_ACK; the choice only applies to REPLICATE Regions, since
PARTITION Regions are DISTRIBUTED_ACK only), number of redundant copies
(for PARTITION Regions), number of nodes in the cluster hosting the
"target" Region, etc.  All of these can affect speed.

But typically, bulk loading data (batch) is not so much about speed as it
is about consistency/accuracy, or data availability.

A more sophisticated approach in a distributed scenario, say, if you were
using PARTITION Regions with a fixed partitioning strategy, would be to load
the data in parallel from a Function, where the Function handles the data
set for the individual nodes based on the partitioning strategy.  Of course,
redundant copies (along with Redundancy Zones) are still going to affect
performance, even with this approach.
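
To make the idea concrete, here is a plain-Java sketch of "split the data by
partition, then load each slice in parallel". It deliberately does not use
the Geode Function API: each target Map stands in for the data held by one
node, and partitionOf (hash of the key modulo the partition count) is a
hypothetical stand-in for whatever partitioning strategy you actually
configure. All class and method names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical simulation, not Geode code: one target Map per "node".
final class PartitionedLoad {

    private PartitionedLoad() {}

    /** Stand-in partitioning strategy: key hash modulo partition count. */
    static int partitionOf(Object key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    /** Splits the data into one slice per partition. */
    static <K, V> List<Map<K, V>> split(Map<K, V> data, int partitions) {
        List<Map<K, V>> slices = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            slices.add(new HashMap<>());
        }
        for (Map.Entry<K, V> e : data.entrySet()) {
            slices.get(partitionOf(e.getKey(), partitions)).put(e.getKey(), e.getValue());
        }
        return slices;
    }

    /** Loads slice i into targets.get(i), one task per partition, in parallel. */
    static <K, V> void loadInParallel(List<Map<K, V>> targets, Map<K, V> data) {
        if (targets.isEmpty()) {
            throw new IllegalArgumentException("at least one target is required");
        }
        List<Map<K, V>> slices = split(data, targets.size());
        ExecutorService pool = Executors.newFixedThreadPool(targets.size());
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < slices.size(); i++) {
                Map<K, V> target = targets.get(i);
                Map<K, V> slice = slices.get(i);
                pending.add(pool.submit(() -> target.putAll(slice)));
            }
            for (Future<?> f : pending) {
                try {
                    f.get();  // wait for the slice and surface any failure
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real cluster the per-slice work would run inside a Function on the
member that owns the data, rather than on a local thread pool as it does in
this simulation.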

So, again, it is a function of your consistency and availability guarantees.

See here [1] and here [2] for more details.

I think the more pertinent question is where do you want to make your data
available to best serve the needs of your application in a reliable fashion
at runtime, rather than how it gets there.  You must be mindful of how much
memory your data takes up in the first place.  Additionally, using a
CacheLoader to lazily load the data in certain cases might make more
sense.  I.e. w.r.t. bulk loading, it is not about having all your data in
memory, but having the right data in memory at the right time.  That is
going to give your application the best responsiveness.
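
As a rough illustration of lazy loading, the plain-Java sketch below loads a
value only the first time its key is requested, then serves it from memory.
In Geode, a CacheLoader plays this role when Region.get misses; the LazyCache
class here is a hypothetical stand-in for that behavior, not the Geode
interface, and the loader function stands in for your external data source.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical read-through cache illustrating load-on-miss semantics.
final class LazyCache<K, V> {

    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final AtomicInteger loads = new AtomicInteger();

    LazyCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, invoking the loader only on a miss. */
    V get(K key) {
        return entries.computeIfAbsent(key, k -> {
            loads.incrementAndGet();   // count trips to the backing source
            return loader.apply(k);
        });
    }

    /** Number of times the loader was actually invoked. */
    int loadCount() {
        return loads.get();
    }

    /** Number of entries currently held in memory. */
    int size() {
        return entries.size();
    }
}
```

Only the keys the application actually asks for end up in memory, which is
the "right data at the right time" trade-off described above.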

Food for thought,

-j

[1]
http://gemfire90.docs.pivotal.io/geode/developing/partitioned_regions/how_pr_ha_works.html
[2]
http://gemfire90.docs.pivotal.io/geode/developing/partitioned_regions/configuring_ha_for_pr.html

On Fri, Mar 3, 2017 at 9:26 AM, Amit Pandey <[email protected]>
wrote:

> Hey John,
>
> Thanks, I am planning to use Spring XD. But my current use case is that I am
> aggregating and doing some computations in a Function, and then I want to
> populate the Region with the values. I have a Map; is region.putAll the
> fastest?
>
> Regards
>
> On Fri, Mar 3, 2017 at 10:52 PM, John Blum <[email protected]> wrote:
>
>> You might consider using the Snapshot service [1] if you previously had
>> data in a Region of another Cluster (for instance).
>>
>> If the data is coming externally, then Spring XD [2] is a great tool for
>> moving (streaming) data from a source [3] to a sink [4].
>> It also allows you to perform all manner of transformations/conversions,
>> trigger events, and so on and so forth.
>>
>> -j
>>
>>
>> [1] http://gemfire90.docs.pivotal.io/geode/managing/cache_snapshots/chapter_overview.html
>> [2] http://projects.spring.io/spring-xd/
>> [3] http://docs.spring.io/spring-xd/docs/1.3.1.RELEASE/reference/html/#sources
>> [4] http://docs.spring.io/spring-xd/docs/1.3.1.RELEASE/reference/html/#sinks
>>
>>
>> On Fri, Mar 3, 2017 at 9:13 AM, Amit Pandey <[email protected]>
>> wrote:
>>
>>> Hey Guys,
>>>
>>> What's the fastest way to do a bulk insert in a Region?
>>>
>>> I am using region.putAll; is there any alternative/faster API?
>>>
>>> regards
>>>
>>
>>
>>
>> --
>> -John
>> john.blum10101 (skype)
>>
>
>

-- 
-John
john.blum10101 (skype)
