Hi Ashish,

Do you have custom code that connects Spark to Geode? I know there was a geode-spark connector at one point and that it was forked: https://github.com/Pivotal-Field-Engineering/geode-spark-connector (but it looks like it hasn't been updated in a while). Just curious if there is some code we could look at.
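In the meantime, here is a rough sketch of the pattern I would expect for the forEachPartition issue described below: one ClientCache per executor JVM, reused across all partitions that run on that executor, instead of a new connection per partition. This is only a sketch, not the connector's code; the locator address, region name, and String key/value types are placeholders, not taken from your setup.

import org.apache.geode.cache.Region
import org.apache.geode.cache.client.{ClientCache, ClientCacheFactory, ClientRegionShortcut}
import org.apache.spark.sql.SparkSession

import scala.collection.JavaConverters._

// One ClientCache per executor JVM, shared by every partition that runs on
// that executor, instead of a new ClientCacheFactory per partition.
object GeodeClient {
  lazy val cache: ClientCache = new ClientCacheFactory()
    .addPoolLocator("locator-host", 10334) // placeholder locator address
    .create()

  def region(name: String): Region[String, String] = synchronized {
    // Create the client-side PROXY region once, then reuse it.
    Option(cache.getRegion[String, String](name)).getOrElse(
      cache
        .createClientRegionFactory[String, String](ClientRegionShortcut.PROXY)
        .create(name))
  }
}

object PutAllJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("geode-putall-sketch").getOrCreate()
    val data = spark.sparkContext.parallelize(Seq("k1" -> "v1", "k2" -> "v2"))
    data.foreachPartition { partition =>
      // Reuse the executor-wide cache; no per-partition connection setup.
      GeodeClient.region("exampleRegion").putAll(partition.toMap.asJava)
    }
    spark.stop()
  }
}

Because the cache is a lazy val in an object, the first task on each executor pays the connection cost and every later partition on that executor reuses it, so the server-side connection count should scale with the number of executors rather than the number of partitions.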
On Mon, Jun 24, 2019 at 11:53 AM Anilkumar Gingade <aging...@pivotal.io> wrote:

> Hi Ashish,
>
> How many threads at a time are executing putAll jobs in a single client
> (Spark job)?
> Do you see a read timeout exception in the client logs? If so, can you try
> increasing the read timeout value, or reducing the putAll size?
>
> In the case of putAll for a partitioned region, the putAll (entries) size
> is broken down and sent to the respective servers based on data affinity;
> that is the reason it works with a partitioned region.
>
> You can find more detail on how the client-server connection pool works at:
> https://geode.apache.org/docs/guide/14/topologies_and_comm/topology_concepts/how_the_pool_manages_connections.html
>
> -Anil.
>
> On Mon, Jun 24, 2019 at 10:04 AM aashish choudhary <
> aashish.choudha...@gmail.com> wrote:
>
>> Hi,
>>
>> We have been experiencing issues while connecting to Geode with the
>> putAll API from Spark. The issue is specific to one particular Spark job,
>> which tries to load data into a replicated region. The exception we see
>> on the server side is that the default limit of 800 connections gets
>> maxed out; on the client side we see a retry attempt against each server,
>> which also fails, even though the same job completed without any issue
>> when we re-ran it.
>>
>> The problem I can see in the code is that we are connecting to Geode by
>> creating a client cache inside forEachPartition, which I think could be
>> the issue: for each partition we are making a new connection to Geode. In
>> the stats file we can see connections timing out, and there are also
>> thread bursts, sometimes >4000 threads.
>>
>> What is the recommended way to connect to Geode using Spark?
>>
>> It is this one specific job, writing to a replicated region, that fails
>> most of the time. When we change the region type to partitioned, the job
>> completes. We have enabled disk persistence for both types of region.
>>
>> Thoughts?
>>
>> With best regards,
>> Ashish
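To make Anil's two suggestions above concrete, here is a sketch of what they might look like on the client side. The 60-second read timeout and the batch size of 1000 are arbitrary example values, and the locator and region names are placeholders:

import org.apache.geode.cache.client.{ClientCacheFactory, ClientRegionShortcut}

import scala.collection.JavaConverters._

object TunedPutAll {
  def main(args: Array[String]): Unit = {
    // 1) Raise the pool's read timeout (default 10000 ms); 60000 ms is an
    //    arbitrary example value.
    val cache = new ClientCacheFactory()
      .addPoolLocator("locator-host", 10334) // placeholder locator address
      .setPoolReadTimeout(60000)
      .create()
    val region = cache
      .createClientRegionFactory[String, String](ClientRegionShortcut.PROXY)
      .create("exampleRegion") // placeholder region name

    // 2) Reduce the putAll size by splitting one large map into smaller
    //    batches; the batch size of 1000 is likewise arbitrary.
    val entries = Map("k1" -> "v1", "k2" -> "v2") // stand-in for real data
    entries.grouped(1000).foreach(batch => region.putAll(batch.asJava))

    cache.close()
  }
}

Smaller batches mean each putAll round trip finishes well inside the read timeout, so the two knobs work together rather than as alternatives.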