I typically recommend small batch sizes (~1,000 keys) with multiple threads (10-100, depending on resources).
Have you calculated whether you've saturated your network bandwidth? If so, none of these ideas will help.

Anthony
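As a concrete illustration of that advice, here is a minimal sketch assuming a plain Java client with an already-created Region<String, String>; the batch size, thread count, and key/value types are illustrative, not taken from this thread:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.apache.geode.cache.Region;

    public class BatchedPutAll {
        // Split the data into small maps (~1,000 entries) and run the
        // putAll calls in parallel on a bounded thread pool.
        public static void load(Region<String, String> region,
                                Map<String, String> data,
                                int batchSize, int threads) throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            Map<String, String> batch = new HashMap<>();
            for (Map.Entry<String, String> e : data.entrySet()) {
                batch.put(e.getKey(), e.getValue());
                if (batch.size() == batchSize) {
                    final Map<String, String> toSend = batch; // hand this batch off
                    pool.submit(() -> region.putAll(toSend));
                    batch = new HashMap<>();
                }
            }
            if (!batch.isEmpty()) {
                final Map<String, String> toSend = batch; // flush the remainder
                pool.submit(() -> region.putAll(toSend));
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.MINUTES); // wait for all batches
        }
    }

Called with, e.g., load(region, data, 1000, 16), this keeps each putAll map small while several batches are in flight at once.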
> On Jun 26, 2019, at 12:15 PM, aashish choudhary <aashish.choudha...@gmail.com> wrote:
>
> So are you saying that we should put in batches of 1k~10k? I tried that already, at least for 10k, and it was failing with the default read timeout. Additionally, it takes forever to put all 600k records into that region in batch mode.
>
> With best regards,
> Ashish
>
> On Wed, Jun 26, 2019, 11:38 PM Xiaojian Zhou <gz...@pivotal.io> wrote:
>
> You can increase the max connection size from the default 800 to 5000. We did that a long time ago for a customer.
>
> I noticed that your servers are using a "replicated" region. In that case single-hop will not take effect. That's fine.
>
> If the putAll map is too big, it will hit the read timeout, because a bigger map takes longer to process. 600k entries in one map is too big; according to my tests, 1k to 10k is a comfortable size. Since increasing the read timeout worked around your issue, I suspect the map size is the real root cause.
>
> So my suggestions:
> 1) Try to reduce your putAll map to 1k ~ 10k entries.
> 2) If that still does not work, increase the max connection size from 800 to 5000.
>
> Regards,
> Gester Zhou
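The read timeout Gester mentions is a client-side pool setting (default 10,000 ms), while the 800-connection limit is the cache server's max-connections attribute on the server side. A minimal sketch of raising the client read timeout, with an illustrative locator address that is not from this thread:

    import org.apache.geode.cache.client.ClientCache;
    import org.apache.geode.cache.client.ClientCacheFactory;

    public class PoolConfigExample {
        public static void main(String[] args) {
            // Raise the pool read timeout so a large putAll is not cut off
            // at the 10,000 ms default while the servers process the map.
            ClientCache cache = new ClientCacheFactory()
                .addPoolLocator("locator-host", 10334) // illustrative address
                .setPoolReadTimeout(60000)             // 60 s
                .create();
            // ... use the cache ...
            cache.close();
        }
    }

On the server side, max-connections can be raised when the server is started, e.g. via gfsh's start server --max-connections=5000 or the equivalent max-connections attribute on <cache-server> in cache.xml.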
> On Wed, Jun 26, 2019 at 10:46 AM Charlie Black <cbl...@pivotal.io> wrote:
>
> Try batches that are small as a starting point - say 100.
>
> On Wed, Jun 26, 2019 at 10:33 AM aashish choudhary <aashish.choudha...@gmail.com> wrote:
>
> Yes, we see the exceeded max-connections error on the server side.
>
> So I was trying to see how the putAll API works in general, and from a standard Java client I was trying to simulate the behaviour that we see on our server. I tried to put 600k records using putAll on my local machine with 1 locator and 2 servers. The region type is replicated persistent, and I could see the local clientCache API crashing with a "pool unexpected" error. We see this error in our Spark code as well. It then retries and fails. Surprisingly, though, the data gets inserted into the region even though the clientCache Java API crashed.
>
> I tried to run it in some batches, but those also failed, and it's too slow.
>
> The only way I was able to make it work was by increasing the read timeout to 60 seconds.
>
> Can someone share some tips on the putAll API? How do we use it effectively?
>
> With best regards,
> Ashish
>
> On Wed, Jun 26, 2019, 6:20 AM Anilkumar Gingade <aging...@pivotal.io> wrote:
>
> Ashish,
>
> Do you see an "exceeded max-connections" error?
>
> The operation/job completing on the second attempt indicates that the server where the operation was executed the first time may have issues; you may want to look at the load on that server and whether there are any memory issues.
>
>> What is the recommended way to connect to geode using spark?
>
> It's more about how Geode is used in this context: are the Spark processors acting as Geode clients or as peer nodes? If they are Geode clients, then it's mostly a matter of tuning client connections based on how/what operations are performed.
>
> Anil
>
> On Tue, Jun 25, 2019 at 10:54 AM aashish choudhary <aashish.choudha...@gmail.com> wrote:
>
> We could also see the below in the server-side logs:
>
>> Rejected connection from Server connection from [client host address=x.yx.x.x; client port=abc] because incoming request was rejected by pool possibly due to thread exhaustion
>
> On Tue, Jun 25, 2019, 7:27 AM aashish choudhary <aashish.choudha...@gmail.com> wrote:
>
> As I mentioned earlier, the thread count can go to 4,000, and we have seen read timeouts crossing the default 10 seconds. We tried increasing the read timeout to 30 seconds, but that didn't work either. The record count is no more than 600k.
>
> The job succeeds on the second attempt without changing anything, which is a bit weird.
>
> With best regards,
> Ashish
>
> On Tue, Jun 25, 2019, 12:23 AM Anilkumar Gingade <aging...@pivotal.io> wrote:
>
> Hi Ashish,
>
> How many threads at a time are executing putAll jobs in a single client (the Spark job)?
> Do you see a read timeout exception in the client logs? If so, can you try increasing the read timeout value, or reducing the putAll size?
>
> In the case of putAll to a partitioned region, the putAll (entries) map is broken down and sent to the respective servers based on data affinity; that is why it works with a partitioned region.
>
> You can find more detail on how client-server connections work at:
> https://geode.apache.org/docs/guide/14/topologies_and_comm/topology_concepts/how_the_pool_manages_connections.html
>
> -Anil
>
> On Mon, Jun 24, 2019 at 10:04 AM aashish choudhary <aashish.choudha...@gmail.com> wrote:
>
> Hi,
>
> We have been experiencing issues while connecting to Geode with the putAll API from Spark. The issue is specific to one particular Spark job, which tries to load data into a replicated region. The exception we see on the server side is that the default limit of 800 connections gets maxed out; on the client side we see a retry attempt against each server that fails, even though when we re-run the same job it completes without any issue.
>
> One problem I can see in the code is that we connect to Geode using a client cache inside forEachPartition, which I think could be the issue: for each partition we make a new connection to Geode. In the stats file we can see connections timing out, and sometimes there is a thread burst of >4,000.
>
> What is the recommended way to connect to Geode using Spark?
>
> It is this one specific job that fails most of the time, and it uses a replicated region. When we change the region type to partitioned, the job completes. We have enabled disk persistence for both types of regions.
>
> Thoughts?
>
> With best regards,
> Ashish
>
> --
> Charlie Black | cbl...@pivotal.io
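For the Spark question above, the usual fix for the forEachPartition pattern described is to keep one ClientCache per executor JVM and reuse it across partitions, rather than creating one per partition. A hedged sketch, assuming Spark's Java API; the holder class, locator address, and region name are illustrative and not from this thread:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.geode.cache.Region;
    import org.apache.geode.cache.client.ClientCache;
    import org.apache.geode.cache.client.ClientCacheFactory;
    import org.apache.geode.cache.client.ClientRegionShortcut;

    public class GeodeClientHolder {
        private static ClientCache cache;

        // One ClientCache per executor JVM, shared by every partition that
        // runs in this JVM, instead of one per foreachPartition call.
        public static synchronized Region<String, String> region(String name) {
            if (cache == null || cache.isClosed()) {
                cache = new ClientCacheFactory()
                    .addPoolLocator("locator-host", 10334) // illustrative address
                    .setPoolReadTimeout(60000)
                    .create();
            }
            Region<String, String> r = cache.getRegion(name);
            if (r == null) {
                r = cache.<String, String>createClientRegionFactory(ClientRegionShortcut.PROXY)
                         .create(name);
            }
            return r;
        }
    }

Inside the job, each partition then reuses that client and writes in small putAll batches (again, the RDD and region names are illustrative):

    // Assuming a JavaPairRDD<String, String> named rdd:
    rdd.foreachPartition(it -> {
        Region<String, String> region = GeodeClientHolder.region("exampleRegion");
        Map<String, String> batch = new HashMap<>();
        while (it.hasNext()) {
            scala.Tuple2<String, String> t = it.next();
            batch.put(t._1(), t._2());
            if (batch.size() == 1000) {
                region.putAll(batch); // putAll is synchronous, safe to clear after
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            region.putAll(batch);
        }
    });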