I typically recommend small batch sizes (~1000 keys) with multiple threads 
(10-100 depending on resources).
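
For example, here is a rough sketch of what I mean (illustrative only -- the
String key/value types, the region, and the exact batch/thread numbers are
placeholders you would tune for your data and hardware):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.geode.cache.Region;

public class BatchedLoader {
  private static final int BATCH_SIZE = 1000;  // ~1000 keys per putAll
  private static final int THREADS = 16;       // 10-100 depending on resources

  public static void load(Region<String, String> region,
                          Map<String, String> all) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(THREADS);
    Map<String, String> batch = new HashMap<>();
    for (Map.Entry<String, String> e : all.entrySet()) {
      batch.put(e.getKey(), e.getValue());
      if (batch.size() == BATCH_SIZE) {
        final Map<String, String> chunk = batch;  // hand this batch to a worker
        pool.submit(() -> region.putAll(chunk));
        batch = new HashMap<>();
      }
    }
    if (!batch.isEmpty()) {
      final Map<String, String> chunk = batch;
      pool.submit(() -> region.putAll(chunk));
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);  // wait for the outstanding putAlls
  }
}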

Have you calculated whether you've saturated your network bandwidth? If so, none
of these ideas will help.


Anthony


> On Jun 26, 2019, at 12:15 PM, aashish choudhary 
> <aashish.choudha...@gmail.com> wrote:
> 
> So are you saying that we should put in batches of 1k~10k? I tried that
> already, at least for 10k, and it was failing with the default read timeout.
> Additionally, it takes forever to put all 600k records into that region in
> batch mode.
> 
> With best regards,
> Ashish
> 
> On Wed, Jun 26, 2019, 11:38 PM Xiaojian Zhou <gz...@pivotal.io> wrote:
> You can increase the max connection size from the default of 800 to 5000. We
> did that a long time ago for a customer.
> 
> I noticed that your servers are using a "replicated" region. In that case,
> single-hop will not take effect. That's fine.
> 
> If the putAll map is too big, it will hit the read timeout issue, because a
> bigger map takes longer to process.
> 600k entries in one map is too big. According to my tests, 1k to 10k is a
> comfortable size. Since increasing the read timeout worked around your issue, I
> feel an oversized map is probably the real root cause.
> 
> So my suggestions:
> 1) Try to reduce your putAll map to 1k ~ 10k entries.
> 2) If that still doesn't work, increase the max connection size from 800 to
> 5000 (see the sketch below).
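> 
> For example (just a sketch -- the server name and port below are placeholders),
> the limit can be raised when the cache server is started:
> 
>   gfsh> start server --name=server1 --server-port=40404 --max-connections=5000
> 
> or with the max-connections attribute on the <cache-server> element in the
> server's cache.xml.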
> 
> Regards
> Gester Zhou
> 
> 
> 
> On Wed, Jun 26, 2019 at 10:46 AM Charlie Black <cbl...@pivotal.io> wrote:
> Try batches that are small as a starting point - say 100.   
> 
> On Wed, Jun 26, 2019 at 10:33 AM aashish choudhary
> <aashish.choudha...@gmail.com> wrote:
> Yes, we see the exceeded max-connections error on the server side.
> 
> So I was trying to see how the putAll API works in general, and from a
> standard Java client I was trying to simulate the behaviour that we see on
> our server.
> I tried to put 600k records using putAll on my local machine with 1 locator
> and 2 servers. The region type is replicate persistent, and I could see the
> local ClientCache API crashing with a "pool unexpected" error. We see this
> error in our Spark code as well. It then retries and fails.
> Surprisingly, however, the data still gets inserted into the region even though
> the ClientCache Java API crashed.
> 
> I tried to run it in batches, but those also failed and it's too slow.
> 
> The only way I was able to make it work was by increasing the read timeout to
> 60 seconds.
> 
> Can someone share some tips on the putAll API?
> How do I use it effectively?
> 
> 
> With best regards,
> Ashish
> 
> On Wed, Jun 26, 2019, 6:20 AM Anilkumar Gingade <aging...@pivotal.io> wrote:
> Ashish,
> 
> Do you see "exceeded max-connections" error...
> 
> The operation/job completing on the second attempt indicates that the server
> where the operation executed the first time may have issues; you may want to
> check the load on that server and whether there are any memory issues.
> 
> >> What is the recommended way to connect to geode using spark?
> It's more about how Geode is used in this context: are the Spark processors
> acting as Geode clients or as peer nodes? If they are Geode clients, then it's
> more about tuning client connections based on how and what operations are
> performed.
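> 
> As a rough sketch of the client approach (this is not the geode-spark
> connector; the locator address, region name, and String key/value types are
> placeholders), you could reuse one ClientCache per executor JVM instead of
> creating one per partition:
> 
> import java.util.HashMap;
> import java.util.Iterator;
> import java.util.Map;
> 
> import org.apache.geode.cache.Region;
> import org.apache.geode.cache.client.ClientCache;
> import org.apache.geode.cache.client.ClientCacheFactory;
> import org.apache.geode.cache.client.ClientRegionShortcut;
> import org.apache.spark.api.java.JavaPairRDD;
> 
> public class GeodeWriter {
>   // One ClientCache per executor JVM, created lazily and shared by every
>   // partition that runs in that JVM.
>   private static volatile ClientCache cache;
> 
>   private static synchronized Region<String, String> getOrCreateRegion(String name) {
>     if (cache == null || cache.isClosed()) {
>       cache = new ClientCacheFactory()
>           .addPoolLocator("locator-host", 10334)  // placeholder locator
>           .setPoolReadTimeout(60_000)             // generous read timeout
>           .create();
>     }
>     Region<String, String> r = cache.getRegion(name);
>     return (r != null) ? r
>         : cache.<String, String>createClientRegionFactory(ClientRegionShortcut.PROXY)
>               .create(name);
>   }
> 
>   public static void save(JavaPairRDD<String, String> rdd, String regionName) {
>     rdd.foreachPartition((Iterator<scala.Tuple2<String, String>> it) -> {
>       Region<String, String> region = getOrCreateRegion(regionName);
>       Map<String, String> batch = new HashMap<>();
>       while (it.hasNext()) {
>         scala.Tuple2<String, String> t = it.next();
>         batch.put(t._1(), t._2());
>         if (batch.size() == 1000) {  // keep each putAll small
>           region.putAll(batch);
>           batch.clear();
>         }
>       }
>       if (!batch.isEmpty()) {
>         region.putAll(batch);
>       }
>     });
>   }
> }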
> 
>  Anil
> 
> 
> 
> 
> On Tue, Jun 25, 2019 at 10:54 AM aashish choudhary
> <aashish.choudha...@gmail.com> wrote:
> We could also see the following in the server-side logs as well:
> 
> >> Rejected connection from Server connection from
> >> [client host address=x.yx.x.x; client port=abc] because incoming
> >> request was rejected by pool possibly due to thread exhaustion
> 
> On Tue, Jun 25, 2019, 7:27 AM aashish choudhary
> <aashish.choudha...@gmail.com> wrote:
> As I mentioned earlier, the thread count could go to 4000, and we have seen the
> read timeout crossing the default 10 seconds. We tried increasing the read
> timeout to 30 seconds, but that didn't work either. The record count is not
> more than 600k.
> 
> The job succeeds on the second attempt without changing anything, which is a
> bit weird.
> 
> With best regards,
> Ashish
> 
> On Tue, Jun 25, 2019, 12:23 AM Anilkumar Gingade <aging...@pivotal.io> wrote:
> Hi Ashish,
> 
> How many threads at a time are executing putAll jobs in a single client (the
> Spark job)?
> Do you see a read timeout exception in the client logs? If so, can you try
> increasing the read timeout value, or reducing the putAll size?
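> 
> A minimal sketch of the read timeout knob, assuming the client pool is built
> programmatically (the locator host/port are placeholders; the same setting is
> the read-timeout attribute on the <pool> element in a client cache.xml):
> 
> import org.apache.geode.cache.client.ClientCache;
> import org.apache.geode.cache.client.ClientCacheFactory;
> 
> public class ClientWithLongerTimeout {
>   public static ClientCache create() {
>     return new ClientCacheFactory()
>         .addPoolLocator("locator-host", 10334)  // placeholder locator
>         .setPoolReadTimeout(60_000)             // default is 10_000 ms
>         .create();
>   }
> }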
> 
> In the case of putAll for a partitioned region, the putAll entries are broken
> down and sent to the respective servers based on data affinity; that is the
> reason it works with a partitioned region.
> 
> You can find more detail on how client-server connections work at:
> https://geode.apache.org/docs/guide/14/topologies_and_comm/topology_concepts/how_the_pool_manages_connections.html
> 
> -Anil.
> 
> 
> 
> 
> 
> 
> 
> On Mon, Jun 24, 2019 at 10:04 AM aashish choudhary
> <aashish.choudha...@gmail.com> wrote:
> Hi,
> 
> We have been experiencing issues while connecting to Geode using the putAll API
> with Spark. The issue is specific to one particular Spark job which tries to
> load data into a replicated region. The exception we see on the server side is
> that the default limit of 800 connections gets maxed out, and on the client
> side we see retry attempts to each server that fail, even though when we re-ran
> the same job it completed without any issue.
> 
> In the code, the problem I could see is that we are connecting to Geode using a
> client cache in forEachPartition, which I think could be the issue. So for each
> partition we are making a connection to Geode. In the stats file we could see
> connections timing out, and there are also thread bursts, sometimes over 4000.
> 
> What is the recommended way to connect to geode using spark?
> 
> But this one specific job fails most of the time, and it loads a replicated
> region. Also, when we change the region type to partitioned, the job completes.
> We have enabled disk persistence for both types of regions.
> 
> Thoughts?
> 
> 
> 
> With best regards,
> Ashish
> 
> 
> -- 
> Charlie Black | cbl...@pivotal.io
