Ashish,

Do you see "exceeded max-connections" error...

The operation/job completing on the second attempt suggests that the server
where it ran the first time may have issues; you may want to check the load
on that server and whether there are any memory problems.

>>What is the recommended way to connect to geode using spark?
It's more about how Geode is used in this context: are the Spark processors
acting as Geode clients or as peer nodes? If they are Geode clients, then
it's mostly a matter of tuning the client connections based on how/what
operations are performed.
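
If the Spark executors act as Geode clients, one common approach is to
create a single ClientCache per executor JVM with the pool tuned explicitly,
and reuse it across partitions instead of opening a new connection in every
forEachPartition call. A minimal sketch in Scala; the locator address, pool
values, and region name below are placeholders, not recommendations:

    import org.apache.geode.cache.client.{ClientCacheFactory, ClientRegionShortcut}

    // One ClientCache per executor JVM (a Scala object initializes once per JVM).
    object GeodeClient {
      lazy val cache = new ClientCacheFactory()
        .addPoolLocator("locator-host", 10334) // placeholder locator
        .setPoolMinConnections(10)             // keep a few connections warm
        .setPoolMaxConnections(100)            // cap connections from this JVM
        .setPoolReadTimeout(30000)             // read timeout in milliseconds
        .create()

      lazy val region = cache
        .createClientRegionFactory[String, String](ClientRegionShortcut.PROXY)
        .create("exampleRegion")               // placeholder region name
    }

    // Usage from the job (rdd is an assumed RDD[(String, String)]):
    //   rdd.foreachPartition { rows =>
    //     val m = new java.util.HashMap[String, String]()
    //     rows.foreach { case (k, v) => m.put(k, v) }
    //     GeodeClient.region.putAll(m)
    //   }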

 Anil




On Tue, Jun 25, 2019 at 10:54 AM aashish choudhary <
aashish.choudha...@gmail.com> wrote:

> We could also see the below in the server-side logs.
>
> Rejected connection from Server connection from
> >> [client host address=x.yx.x.x; client port=abc] because incoming
> >> request was rejected by pool possibly due to thread exhaustion
> >>
>
>
> On Tue, Jun 25, 2019, 7:27 AM aashish choudhary <
> aashish.choudha...@gmail.com> wrote:
>
>> As I mentioned earlier, the thread count can go up to 4000, and we have
>> seen read timeouts crossing the default 10 seconds. We tried increasing
>> the read timeout to 30 seconds, but that didn't work either. The record
>> count is not more than 600k.
>>
>> The job succeeds on the second attempt without changing anything, which is
>> a bit weird.
>>
>> With best regards,
>> Ashish
>>
>> On Tue, Jun 25, 2019, 12:23 AM Anilkumar Gingade <aging...@pivotal.io>
>> wrote:
>>
>>> Hi Ashish,
>>>
>>> How many threads are executing putAll jobs at a time in a single client
>>> (the Spark job?)...
>>> Do you see a read timeout exception in the client logs? If so, can you try
>>> increasing the read timeout value, or reducing the putAll size?
>>>
>>> In the case of putAll on a partitioned region, the putAll entries are
>>> broken down and sent to the respective servers based on data affinity;
>>> that is the reason it works with the partitioned region.
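>>>
>>> To cut the putAll size, the entry map can be written in smaller chunks so
>>> each server receives a smaller payload per call. A minimal sketch; the
>>> batch size of 1000, the types, and the names "entries" and "region" are
>>> illustrative assumptions:
>>>
>>>     // entries: an assumed Scala Map[String, String] to be written
>>>     // region: an assumed Geode Region[String, String]
>>>     entries.grouped(1000).foreach { batch =>
>>>       val m = new java.util.HashMap[String, String]()
>>>       batch.foreach { case (k, v) => m.put(k, v) }
>>>       region.putAll(m)
>>>     }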
>>>
>>> You can find more detail on how client-server connections work at:
>>>
>>> https://geode.apache.org/docs/guide/14/topologies_and_comm/topology_concepts/how_the_pool_manages_connections.html
>>>
>>> -Anil.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Jun 24, 2019 at 10:04 AM aashish choudhary <
>>> aashish.choudha...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We have been experiencing issues while connecting to Geode using the
>>>> putAll API with Spark. The issue is specific to one particular Spark job,
>>>> which tries to load data into a replicated region. The exception we see
>>>> on the server side is that the default limit of 800 connections gets
>>>> maxed out, and on the client side we see a retry attempt to each server
>>>> that fails, even though when we re-ran the same job it completed without
>>>> any issue.
>>>>
>>>> The problem I could see in the code is that we are connecting to Geode
>>>> using a client cache inside forEachPartition, which I think could be the
>>>> issue. So for each partition we are making a connection to Geode. In the
>>>> stats file we could see connections timing out, and there are also thread
>>>> bursts, sometimes >4000.
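>>>>
>>>> The code is roughly of this shape (a simplified sketch; the locator
>>>> address, region name, and types are placeholders):
>>>>
>>>>     import org.apache.geode.cache.client.{ClientCacheFactory, ClientRegionShortcut}
>>>>
>>>>     rdd.foreachPartition { rows =>
>>>>       // a new ClientCache (and connection pool) is created for every partition
>>>>       val cache = new ClientCacheFactory()
>>>>         .addPoolLocator("locator-host", 10334)
>>>>         .create()
>>>>       val region = cache
>>>>         .createClientRegionFactory[String, String](ClientRegionShortcut.PROXY)
>>>>         .create("exampleRegion")
>>>>       val m = new java.util.HashMap[String, String]()
>>>>       rows.foreach { case (k, v) => m.put(k, v) }
>>>>       region.putAll(m)
>>>>       cache.close()
>>>>     }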
>>>>
>>>> What is the recommended way to connect to geode using spark?
>>>>
>>>> But this one specific job fails most of the time, and it writes to a
>>>> replicated region. Also, when we change the region type to partitioned,
>>>> the job completes. We have enabled disk persistence for both types of
>>>> regions.
>>>>
>>>> Thoughts?
>>>>
>>>>
>>>>
>>>> With best regards,
>>>> Ashish
>>>>
>>>
