Re: fastest way to bulk insert in geode

Lyndon Adams Mon, 06 Mar 2017 11:18:54 -0800

Oh my god, Charlie, you are taking my money-making opportunities away from me. 
Basically he is right, plus you have to add some black GC magic into the mix 
to optimise pauses.

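Charlie's point below about bulk sizes and GC pressure can be sketched roughly
as follows. This is a minimal sketch, not a Geode API: BatchedLoader,
loadInBatches, the consecutive integer keys and the batch size of 1,000 are
made-up names and values for illustration. It relies only on the fact that a
Geode Region is a java.util.Map, so a plain HashMap stands in for the region.
Values are created just before they are handed to putAll, so only one batch of
fresh objects is pending at a time. Whether that helps is something to profile.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Sketch only: BatchedLoader is a hypothetical helper, not part of Geode.
final class BatchedLoader {

    // Computes values for keys 0..count-1 and writes them to `target` in
    // chunks of at most `batchSize` entries, one putAll call per chunk.
    // Returns the number of putAll calls that were made.
    static <V> int loadInBatches(Map<Integer, V> target, int count,
                                 IntFunction<V> compute, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int calls = 0;
        Map<Integer, V> batch = new HashMap<>();
        for (int key = 0; key < count; key++) {
            batch.put(key, compute.apply(key)); // value created just before it is sent
            if (batch.size() == batchSize) {
                target.putAll(batch);
                calls++;
                batch = new HashMap<>(); // start a new batch; the old map is no longer referenced here
            }
        }
        if (!batch.isEmpty()) { // flush the final, partially filled batch
            target.putAll(batch);
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        Map<Integer, String> region = new HashMap<>(); // stand-in for a Geode Region
        int calls = loadInBatches(region, 30_000, i -> "value-" + i, 1_000);
        // prints "30 putAll calls, 30000 entries"
        System.out.println(calls + " putAll calls, " + region.size() + " entries");
    }
}
```

With a real region you would pass the Region itself as the first argument.
Smaller batches keep fewer temporary objects alive per call, larger ones mean
fewer putAll calls, so the right size has to be measured for your data.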


> On 6 Mar 2017, at 18:57, Charlie Black wrote:
> 
> putAll() is the bulk operation for Geode.  Plain and simple.
> 
> The other techniques outlined in this thread are all efforts to go really 
> fast by separating concerns at multiple levels, or by taking advantage of 
> the fact that there are other systems and CPUs in the physical architecture.
> 
> Example: the GC comment.  Creating the domain objects sometimes causes GC 
> pressure, which reduces throughput.  I typically look at bulk sizes to 
> reduce that concern.
> 
> Consider all suggestions, then profile your options and choose the right 
> pattern for your app.
> 
> Regards,
> Charlie
> 
> ---
> Charlie Black
> 858.480.9722
> 
>> On Mar 6, 2017, at 10:42 AM, Amit Pandey wrote:
>> 
>> Hey Jake,
>> 
>> Thanks. I am a bit confused: so a put should be faster than putAll?
>> 
>> John,
>> 
>> I need to set up all the data so that it can be queried, so I don't think 
>> a CacheLoader works for me. The data is the result of very large and 
>> expensive computations, and doing them dynamically would be costly.
>> 
>> We have a time window to set up the system because after that some other 
>> jobs will start. Currently it's taking 2.4 seconds to insert 30,000 
>> entries, which is great, but I am trying to see if it can be made faster.
>> 
>> Regards
>> 
>> On Tue, Mar 7, 2017 at 12:01 AM, John Blum wrote:
>> Amit-
>> 
>> Note, a CacheLoader does not necessarily imply "loading data from a 
>> database"; it can load data from any [external] data source and does so on 
>> demand (i.e. lazily, on a cache miss).  However, as Mike points out, this 
>> might not work for your Use Case in situations where you are querying, for 
>> instance.
>> 
>> I guess the real question here is, what is the requirement to pre-load this 
>> data quickly?  What is the driving requirement here?
>> 
>> For instance, is the need to be able to bring another system online quickly 
>> in case of "failover"?  If so, perhaps an architectural change is more 
>> appropriate, such as an Active/Passive architecture (using WAN).
>> 
>> -j
>> 
>> 
>> 
>> On Mon, Mar 6, 2017 at 9:45 AM, Amit Pandey wrote:
>> We might need that, actually. The problem is we can't use a data loader 
>> because we are not loading from a database, so we have to use putAll. It's 
>> taking 2 seconds for over 30,000 entries. If implementing it will bring 
>> that down, that will be helpful.
>> 
>> On 06-Mar-2017 10:05 pm, "Michael Stolz" wrote:
>> Of course, if you're REALLY in need of speed, you can write your own custom 
>> implementations of toData and fromData for the DataSerializable interface.
>> 
>> I haven't seen anyone need that much speed in a long time though.
>> 
>> 
>> 
>> --
>> 
>> Mike Stolz
>> Principal Engineer - Gemfire Product Manager
>> Mobile: 631-835-4771
>> 
>> On Mar 3, 2017 11:16 PM, "Real Wes" wrote:
>> Amit,
>> 
>> John and Mike’s advice about tradeoffs is worth heeding. You’ll find that 
>> your speed is probably just fine with putAll, but if you just have to have 
>> NOS in your tank, you might consider (since you’re inside a function) doing 
>> the putAll from the function into your region with the region scope changed 
>> to distributed-no-ack.  See:
>> https://geode.apache.org/docs/guide/developing/distributed_regions/choosing_level_of_dist.html
>> 
>> Wes
>> 
>> From: Amit Pandey
>> Sent: Friday, March 3, 2017 12:26 PM
>> Subject: Re: fastest way to bulk insert in geode
>> 
>> Hey John,
>> 
>> Thanks. I am planning to use Spring XD. But my current use case is that I 
>> am aggregating and doing some computations in a Function, and then I want 
>> to populate a region with the values. I have a map; is region.putAll the 
>> fastest?
>> 
>> Regards
>> 
>> On Fri, Mar 3, 2017 at 10:52 PM, John Blum wrote:
>> 
>> You might consider using the Snapshot service [1] if you previously had 
>> data in a Region of another Cluster (for instance).
>> 
>> If the data is coming externally, then Spring XD [2] is a great tool for 
>> moving (streaming) data from a source [3] to a sink [4].  It also allows 
>> you to perform all manner of transformations/conversions, trigger events, 
>> and so on and so forth.
>> 
>> -j
>> 
>> [1] http://gemfire90.docs.pivotal.io/geode/managing/cache_snapshots/chapter_overview.html
>> [2] http://projects.spring.io/spring-xd/
>> [3] http://docs.spring.io/spring-xd/docs/1.3.1.RELEASE/reference/html/#sources
>> [4] http://docs.spring.io/spring-xd/docs/1.3.1.RELEASE/reference/html/#sinks
>> 
>> On Fri, Mar 3, 2017 at 9:13 AM, Amit Pandey wrote:
>> 
>> Hey Guys,
>> 
>> What's the fastest way to do a bulk insert into a region?
>> 
>> I am using region.putAll; is there any alternative/faster API?
>> 
>> Regards
>> 
>>  
>> 
>> --
>> 
>> -John
>> 
>> john.blum10101 (skype)
>> 
>

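Michael Stolz's suggestion above, custom toData and fromData, might look
roughly like this. ComputedResult and its fields are hypothetical. In a Geode
application the class would declare "implements
org.apache.geode.DataSerializable", whose two methods have the signatures used
here; that clause is left out so the sketch compiles without Geode on the
classpath. The roundTrip helper is not part of any API, it only exercises the
two methods through plain java.io streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical domain class; with Geode it would implement DataSerializable.
final class ComputedResult {
    private long id;
    private String key;
    private double value;

    // Zero-argument constructor, used before fromData fills in the fields.
    public ComputedResult() { }

    public ComputedResult(long id, String key, double value) {
        this.id = id;
        this.key = key;
        this.value = value;
    }

    // Writes the fields by hand, in a fixed order, with no reflection.
    public void toData(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeUTF(key);
        out.writeDouble(value);
    }

    // Reads the fields back in exactly the order toData wrote them.
    public void fromData(DataInput in) throws IOException, ClassNotFoundException {
        id = in.readLong();
        key = in.readUTF();
        value = in.readDouble();
    }

    public long getId() { return id; }
    public String getKey() { return key; }
    public double getValue() { return value; }

    // Test helper (an assumption of this sketch): serialize to a byte array
    // and deserialize into a fresh instance.
    static ComputedResult roundTrip(ComputedResult original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.toData(new DataOutputStream(bytes));
            ComputedResult copy = new ComputedResult();
            copy.fromData(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```

The fields must be read in the same order they were written. Geode requires a
public zero-argument constructor on DataSerializable classes so that it can
create an instance before calling fromData.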