Hey Lyndon, poor dev here, can't hire you. Not in that kind of position :)

Hey Jake, makes sense. I will try your approach with DataSerializable.

Hi Charlie, okay. Yes, I understand that GC needs to be tuned. Currently I do use bulk sizes: I put 500 items, clear the bulk data, fill it up with the next 500, and repeat. Using DataSerializable with this approach should help, I guess.

Thanks everyone, I will try things out and update you guys.

On Tue, Mar 7, 2017 at 12:48 AM, Lyndon Adams <[email protected]> wrote:

> Oh my god Charlie, you are taking my money-making opportunities away from
> me. Basically he is right, plus you have to add some black GC magic into the
> mix to optimise pauses.
>
> On 6 Mar 2017, at 18:57, Charlie Black <[email protected]> wrote:
>
> putAll() is the bulk operation for Geode. Plain and simple.
>
> The other techniques outlined in this thread are all efforts to go really
> fast by separating concerns at multiple levels, or by taking advantage of
> the fact that there are other systems and CPUs in the physical
> architecture.
>
> Example: the GC comment - when creating the domain objects, sometimes that
> causes GC pressure, which reduces throughput. I typically look at bulk
> sizes to reduce that concern.
>
> Consider all suggestions, then profile your options and choose the right
> pattern for your app.
>
> Regards,
> Charlie
>
> ---
> Charlie Black
> 858.480.9722 | [email protected]
>
> On Mar 6, 2017, at 10:42 AM, Amit Pandey <[email protected]>
> wrote:
>
> Hey Jake,
>
> Thanks. I am a bit confused, so a put should be faster than putAll?
>
> John,
>
> I need to set up all the data so that it can be queried. So I don't think
> CacheLoader works for me. The data are the results of very large and
> expensive computations, and doing them dynamically will be costly.
>
> We have a time window to set up the system because after that some other
> jobs will start. Currently it's taking 2.4 seconds to insert 30,000 entries and
> that's great. But I am just trying to see if it can be made faster.
>
> Regards
>
> On Tue, Mar 7, 2017 at 12:01 AM, John Blum <[email protected]> wrote:
>
>> Amit-
>>
>> Note, a CacheLoader does not necessarily imply "loading data from a
>> database"; it can load data from any [external] data source and does so on
>> demand (i.e. lazily, on a cache miss). However, as Mike points out, this
>> might not work for your use case in situations where you are querying, for
>> instance.
>>
>> I guess the real question here is, what is the requirement to pre-load
>> this data quickly? What is the driving requirement here?
>>
>> For instance, is the need to be able to bring another system online
>> quickly in case of "failover"? If so, perhaps an architectural change is
>> more appropriate, such as an Active/Passive arch (using WAN).
>>
>> -j
>>
>> On Mon, Mar 6, 2017 at 9:45 AM, Amit Pandey <[email protected]>
>> wrote:
>>
>>> We might need that actually... problem is we can't use dataloader because
>>> we are not loading from a database. So we have to use putAll. It's taking 2
>>> seconds for over 30,000 entries. If implementing it will bring that down, that
>>> will be helpful.
>>>
>>> On 06-Mar-2017 10:05 pm, "Michael Stolz" <[email protected]> wrote:
>>>
>>>> Of course if you're REALLY in need of speed you can write your own
>>>> custom implementations of toData and fromData for the DataSerializable
>>>> interface.
>>>>
>>>> I haven't seen anyone need that much speed in a long time though.
>>>>
>>>> --
>>>>
>>>> Mike Stolz
>>>> Principal Engineer - Gemfire Product Manager
>>>> Mobile: 631-835-4771
>>>>
>>>> On Mar 3, 2017 11:16 PM, "Real Wes" <[email protected]> wrote:
>>>>
>>>>> Amit,
>>>>>
>>>>> John and Mike’s advice about tradeoffs is worth heeding.
>>>>> You’ll find that your speed is probably just fine with putAll, but if you
>>>>> just have to have NOS in your tank, you might consider, since you’re inside
>>>>> a function, doing the putAll from the function into your region but changing
>>>>> the region scope to distributed-no-ack. See:
>>>>> https://geode.apache.org/docs/guide/developing/distributed_regions/choosing_level_of_dist.html
>>>>>
>>>>> Wes
>>>>>
>>>>> *From:* Amit Pandey [mailto:[email protected]]
>>>>> *Sent:* Friday, March 3, 2017 12:26 PM
>>>>> *To:* [email protected]
>>>>> *Subject:* Re: fastest way to bulk insert in geode
>>>>>
>>>>> Hey John,
>>>>>
>>>>> Thanks, I am planning to use Spring XD. But my current use case is that
>>>>> I am aggregating and doing some computes in a Function and then I want to
>>>>> populate it with the values. I have a map; is region.putAll the fastest?
>>>>>
>>>>> Regards
>>>>>
>>>>> On Fri, Mar 3, 2017 at 10:52 PM, John Blum <[email protected]> wrote:
>>>>>
>>>>> You might consider using the Snapshot service [1]
>>>>> if you previously had data in a Region of another Cluster (for instance).
>>>>>
>>>>> If the data is coming externally, then *Spring XD* [2] is a great tool for
>>>>> moving (streaming) data from a source [3] to a sink [4].
>>>>> It also allows you to perform all manner of transformations/conversions,
>>>>> trigger events, and so on and so forth.
>>>>>
>>>>> -j
>>>>>
>>>>> [1] http://gemfire90.docs.pivotal.io/geode/managing/cache_snapshots/chapter_overview.html
>>>>>
>>>>> [2] http://projects.spring.io/spring-xd/
>>>>>
>>>>> [3] http://docs.spring.io/spring-xd/docs/1.3.1.RELEASE/reference/html/#sources
>>>>>
>>>>> [4] http://docs.spring.io/spring-xd/docs/1.3.1.RELEASE/reference/html/#sinks
>>>>>
>>>>> On Fri, Mar 3, 2017 at 9:13 AM, Amit Pandey <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Hey Guys,
>>>>>
>>>>> What's the fastest way to do a bulk insert in a region?
>>>>>
>>>>> I am using region.putAll; is there any alternative/faster API?
>>>>>
>>>>> Regards
>>>>>
>>>>> --
>>>>>
>>>>> -John
>>>>>
>>>>> john.blum10101 (skype)
>>
>> --
>> -John
>> john.blum10101 (skype)
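
For reference, the batching pattern described at the top of this message (fill a map with 500 entries, call putAll, clear it, repeat) might look roughly like the sketch below. It is plain Java with no Geode dependency so that it runs on its own. `BulkLoadSketch`, `putInBatches`, and the `sink` parameter are made-up names for illustration; with a real Geode `Region<K, V>` the sink would presumably be `region::putAll`. The batch size of 500 is just the number mentioned above, not a tuned value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical helper, not part of Geode: pushes entries to a sink in fixed-size batches.
final class BulkLoadSketch {

    /**
     * Sends the entries to the sink in batches of at most batchSize and
     * returns the number of sink calls made.
     * Assumption: with Geode the sink would be region::putAll.
     */
    static <K, V> int putInBatches(Iterable<Map.Entry<K, V>> entries,
                                   int batchSize,
                                   Consumer<Map<K, V>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Map<K, V> batch = new HashMap<>();
        int calls = 0;
        for (Map.Entry<K, V> e : entries) {
            batch.put(e.getKey(), e.getValue());
            if (batch.size() == batchSize) {
                sink.accept(batch);
                calls++;
                // Reuse the same map to limit garbage; this assumes the sink
                // does not hold on to the map after it returns.
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch);   // flush the final partial batch
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        Map<Integer, String> computed = new HashMap<>();
        for (int i = 0; i < 1200; i++) {
            computed.put(i, "value-" + i);
        }
        List<Integer> batchSizes = new ArrayList<>();
        // Stand-in sink that only records how many entries each call received.
        int calls = putInBatches(computed.entrySet(), 500, m -> batchSizes.add(m.size()));
        System.out.println(calls + " calls, sizes " + batchSizes);
    }
}
```

Whether 500 is a good batch size depends on object size and GC behaviour, so as Charlie says, this is something to profile rather than assume.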

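
Mike's suggestion of hand-written toData and fromData methods could look something like the sketch below. The `Position` class and its fields are hypothetical, and the sketch uses only java.io so that it runs without Geode on the classpath. In a real application the class would additionally be public, implement `org.apache.geode.DataSerializable`, and have a public zero-argument constructor, and `fromData` would also declare `ClassNotFoundException`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical domain object; with Geode it would implement DataSerializable.
final class Position {
    private String symbol;   // assumed non-null, since writeUTF rejects null
    private long quantity;
    private double price;

    Position() { }           // zero-argument constructor used when deserializing

    Position(String symbol, long quantity, double price) {
        this.symbol = symbol;
        this.quantity = quantity;
        this.price = price;
    }

    String symbol() { return symbol; }
    long quantity() { return quantity; }
    double price() { return price; }

    // Writes the fields in a fixed order, without reflection.
    public void toData(DataOutput out) throws IOException {
        out.writeUTF(symbol);
        out.writeLong(quantity);
        out.writeDouble(price);
    }

    // Reads the fields back in exactly the order toData wrote them.
    public void fromData(DataInput in) throws IOException {
        symbol = in.readUTF();
        quantity = in.readLong();
        price = in.readDouble();
    }

    // Local round trip through a byte array, only to exercise the two methods.
    static Position roundTrip(Position original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.toData(new DataOutputStream(bytes));
            Position copy = new Position();
            copy.fromData(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Position copy = roundTrip(new Position("ABC", 42L, 1.5));
        System.out.println(copy.symbol() + " " + copy.quantity() + " " + copy.price());
    }
}
```

The point of writing the fields by hand is to skip the overhead of default Java serialization; whether that matters for 30,000 entries is again something to measure.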