Re: fastest way to bulk insert in geode

Amit Pandey Mon, 06 Mar 2017 12:43:13 -0800

Hey Lyndon,

Poor dev here, can't hire you. Not in that kind of position :)


Hey Jake,

Makes sense.  Will try your approach, with DataSerializable.
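
For reference, a hand-written toData/fromData pair of the kind Mike mentions further down the thread could look roughly like the sketch below. The class name ComputedResult and its three fields are made up for illustration. In a real Geode application the class would also declare "implements org.apache.geode.DataSerializable" and be public; that clause is left out here (an assumption made only so the sketch compiles without Geode on the classpath). The method signatures match that interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical domain object; the fields are invented for this example.
final class ComputedResult {
    private long id;
    private double value;
    private String label; // assumed non-null, because writeUTF rejects null

    // Geode requires a public zero-argument constructor for deserialization.
    public ComputedResult() { }

    ComputedResult(long id, double value, String label) {
        this.id = id;
        this.value = value;
        this.label = label;
    }

    // Writes the fields in a fixed order, with no reflection involved.
    public void toData(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeDouble(value);
        out.writeUTF(label);
    }

    // Reads the fields back in exactly the order toData wrote them.
    public void fromData(DataInput in) throws IOException, ClassNotFoundException {
        id = in.readLong();
        value = in.readDouble();
        label = in.readUTF();
    }

    long getId() { return id; }
    double getValue() { return value; }
    String getLabel() { return label; }

    // Serializes to a byte array and back, as a local check of the pair above.
    static ComputedResult roundTrip(ComputedResult original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                original.toData(out);
            }
            ComputedResult copy = new ComputedResult();
            try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                copy.fromData(in);
            }
            return copy;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```

The point of writing the fields by hand is that the read order and types are fixed at compile time, so the two methods must be kept in sync whenever a field is added.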

Hi Charlie,

Okay, yes, I understand that GC needs to be tuned. I currently use bulk
sizes already: I put 500 items, clear the bulk data, then fill up the next
500 and repeat. Using DataSerializable with this approach should help, I
guess.
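
A minimal sketch of that batching, for anyone following along. It is written against java.util.Map, which Geode's Region interface extends, so a plain ConcurrentHashMap stands in for the region here (an assumption made so the code runs without a cluster). The class and method names are hypothetical, and the batch size of 500 is just the number mentioned above, not a tuned value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class BatchedPutAll {

    /**
     * Copies source into target in chunks of at most batchSize entries,
     * issuing one putAll call per chunk and reusing a single batch map.
     * Returns the number of putAll calls made.
     */
    static <K, V> int putAllInBatches(Map<K, V> target, Map<K, V> source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Map<K, V> batch = new HashMap<>(batchSize * 2);
        int calls = 0;
        for (Map.Entry<K, V> entry : source.entrySet()) {
            batch.put(entry.getKey(), entry.getValue());
            if (batch.size() == batchSize) {
                target.putAll(batch); // one bulk operation per full batch
                batch.clear();        // reuse the same map for the next batch
                calls++;
            }
        }
        if (!batch.isEmpty()) {
            target.putAll(batch);     // flush the final, partially filled batch
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        Map<Integer, String> computed = new HashMap<>();
        for (int i = 0; i < 30_000; i++) {
            computed.put(i, "value-" + i);
        }
        // Stand-in for a Geode Region<Integer, String>.
        Map<Integer, String> region = new ConcurrentHashMap<>();
        int calls = putAllInBatches(region, computed, 500);
        System.out.println(calls + " putAll calls, " + region.size() + " entries");
    }
}
```

Whether 500 is the right size depends on the entry size and heap settings, so it is worth timing a few values as Charlie suggests.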

Thanks everyone, I will try these things out and update you guys.

On Tue, Mar 7, 2017 at 12:48 AM, Lyndon Adams <[email protected]>
wrote:

> Oh my god, Charlie, you are taking my money-making opportunities away from
> me. Basically he is right, plus you've got to add some black GC magic into
> the mix to optimise pauses.
>
>
> On 6 Mar 2017, at 18:57, Charlie Black <[email protected]> wrote:
>
> putAll() is the bulk operation for geode.   Plain and simple.
>
> The other techniques outlined in this thread are all efforts to go really
> fast by separating concerns at multiple levels, or by taking advantage of
> the fact that there are other systems and CPUs in the physical
> architecture.
>
> Example: the GC comment. Creating the domain objects sometimes causes GC
> pressure, which reduces throughput.   I typically look at bulk sizes to
> reduce that concern.
>
> Consider all suggestions then profile your options and choose the right
> pattern for your app.
>
> Regards,
> Charlie
>
> ---
> Charlie Black
> 858.480.9722 | [email protected]
>
> On Mar 6, 2017, at 10:42 AM, Amit Pandey <[email protected]>
> wrote:
>
> Hey Jake,
>
> Thanks. I am a bit confused: so a put should be faster than putAll?
>
> John,
>
> I need to set up all the data so that it can be queried, so I don't think a
> CacheLoader works for me. The data is the result of very large and
> expensive computations, and doing them dynamically would be costly.
>
> We have a time window to set up the system because after that some other
> jobs will start. Currently it takes 2.4 seconds to insert 30,000 entries,
> which is great, but I am trying to see whether it can be made faster.
>
> Regards
>
> On Tue, Mar 7, 2017 at 12:01 AM, John Blum <[email protected]> wrote:
>
>> Amit-
>>
>> Note, a CacheLoader does not necessarily imply "loading data from a
>> database"; it can load data from any [external] data source and does so on
>> demand (i.e. lazily, on a cache miss).  However, as Mike points out, this
>> might not work for your Use Case in situations where you are querying, for
>> instance.
>>
>> I guess the real question here is, what is the requirement to pre-load
>> this data quickly?  What is the driving requirement here?
>>
>> For instance, is the need to be able to bring another system online
>> quickly in case of "failover"?  If so, perhaps an architectural change is
>> more appropriate, such as an Active/Passive arch (using WAN).
>>
>> -j
>>
>>
>>
>> On Mon, Mar 6, 2017 at 9:45 AM, Amit Pandey <[email protected]>
>> wrote:
>>
>>> We might need that, actually. The problem is that we can't use a
>>> CacheLoader because we are not loading from a database, so we have to use
>>> putAll. It is taking 2 seconds for over 30,000 entries. If implementing it
>>> brings that down, it will be helpful.
>>> On 06-Mar-2017 10:05 pm, "Michael Stolz" <[email protected]> wrote:
>>>
>>>> Of course if you're REALLY in need of speed you can write your own
>>>> custom implementations of toData and fromData for the DataSerializable
>>>> Interface.
>>>>
>>>> I haven't seen anyone need that much speed in a long time though.
>>>>
>>>>
>>>> --
>>>>
>>>> Mike Stolz
>>>> Principal Engineer - Gemfire Product Manager
>>>> Mobile: 631-835-4771
>>>>
>>>> On Mar 3, 2017 11:16 PM, "Real Wes" <[email protected]> wrote:
>>>>
>>>>> Amit,
>>>>>
>>>>>
>>>>>
>>>>> John and Mike’s advice about tradeoffs is worth heeding. You’ll find
>>>>> that your speed is probably just fine with putAll, but if you just have to
>>>>> have NOS in your tank, you might consider (since you’re inside a function)
>>>>> doing the putAll from the function into your region but changing the region
>>>>> scope to distributed-no-ack.  See:
>>>>> https://geode.apache.org/docs/guide/developing/distributed_regions/choosing_level_of_dist.html
>>>>>
>>>>>
>>>>>
>>>>> Wes
>>>>>
>>>>>
>>>>>
>>>>> *From:* Amit Pandey [mailto:[email protected]]
>>>>> *Sent:* Friday, March 3, 2017 12:26 PM
>>>>> *To:* [email protected]
>>>>> *Subject:* Re: fastest way to bulk insert in geode
>>>>>
>>>>>
>>>>>
>>>>> Hey John ,
>>>>>
>>>>>
>>>>>
>>>>> Thanks, I am planning to use Spring XD. But my current use case is that
>>>>> I am aggregating and doing some computation in a Function, and then I want
>>>>> to populate the region with the values. I have them in a map; is
>>>>> region.putAll the fastest?
>>>>>
>>>>>
>>>>>
>>>>> Regards
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Mar 3, 2017 at 10:52 PM, John Blum <[email protected]> wrote:
>>>>>
>>>>> You might consider using the Snapshot service [1] if you previously had
>>>>> data in a Region of another Cluster (for instance).
>>>>>
>>>>>
>>>>>
>>>>> If the data is coming externally, then Spring XD [2] is a great tool for
>>>>> moving (streaming) data from a source [3] to a sink [4]. It also allows you
>>>>> to perform all manner of transformations/conversions, trigger events, and
>>>>> so on and so forth.
>>>>>
>>>>>
>>>>>
>>>>> -j
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> [1] http://gemfire90.docs.pivotal.io/geode/managing/cache_snapshots/chapter_overview.html
>>>>>
>>>>> [2] http://projects.spring.io/spring-xd/
>>>>>
>>>>> [3] http://docs.spring.io/spring-xd/docs/1.3.1.RELEASE/reference/html/#sources
>>>>>
>>>>> [4] http://docs.spring.io/spring-xd/docs/1.3.1.RELEASE/reference/html/#sinks
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Mar 3, 2017 at 9:13 AM, Amit Pandey <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Hey Guys,
>>>>>
>>>>>
>>>>>
>>>>> What's the fastest way to do a bulk insert into a region?
>>>>>
>>>>>
>>>>>
>>>>> I am using region.putAll; is there any alternative/faster API?
>>>>>
>>>>>
>>>>>
>>>>> regards
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> -John
>>>>>
>>>>> john.blum10101 (skype)
>>>>>
>>>>>
>>>>>
>>>>
>>
>>
>> --
>> -John
>> john.blum10101 (skype)
>>
>
>
>
>
