Try small batches as a starting point - say 100.
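[A minimal sketch of the batching approach suggested above, assuming a standard Geode Java client; `putAllInBatches` and the batch size are illustrative helpers, not part of the Geode API:]

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.geode.cache.Region;

public final class BatchedPutAll {

  // Split a large map into fixed-size chunks and call putAll once per chunk.
  public static <K, V> void putAllInBatches(Region<K, V> region, Map<K, V> entries, int batchSize) {
    Map<K, V> batch = new LinkedHashMap<>();
    for (Map.Entry<K, V> entry : entries.entrySet()) {
      batch.put(entry.getKey(), entry.getValue());
      if (batch.size() >= batchSize) {
        region.putAll(batch);  // one bounded round trip per batch
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      region.putAll(batch);    // flush the remainder
    }
  }
}
```

[Keeping each putAll small keeps each round trip short, so individual calls are less likely to hit the pool's read timeout discussed later in the thread.]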
On Wed, Jun 26, 2019 at 10:33 AM aashish choudhary <aashish.choudha...@gmail.com> wrote:

> Yes, we see the exceeded max-connections error on the server side.
>
> So I was trying to see how the putAll API works in general, and from a standard Java client I tried to simulate the behaviour we see on our server. I tried to put 600k records using putAll on my local machine with 1 locator and 2 servers. The region type is replicate persistent, and the local ClientCache API crashed with a "pool unexpected" error. We see this error in our Spark code as well; it then retries and fails. Surprisingly, the data still gets inserted into the region even though the ClientCache Java API crashed.
>
> I tried to run it in batches, but those also failed and it was too slow.
>
> The only way I was able to make it work was by increasing the read timeout to 60 seconds.
>
> Can someone share some tips on the putAll API? How do we use it effectively?
>
> With best regards,
> Ashish
>
> On Wed, Jun 26, 2019, 6:20 AM Anilkumar Gingade <aging...@pivotal.io> wrote:
>
>> Ashish,
>>
>> Do you see an "exceeded max-connections" error?
>>
>> The operation/job completing on the second attempt indicates that the server where the operation executed the first time may have issues; you may want to check the load on that server and whether there are any memory problems.
>>
>> > What is the recommended way to connect to Geode using Spark?
>> It is more about how Geode is used in this context: are the Spark processors acting as Geode clients or as peer nodes? If they are Geode clients, then it is mostly about tuning client connections based on how and what operations are performed.
>>
>> Anil
>>
>> On Tue, Jun 25, 2019 at 10:54 AM aashish choudhary <aashish.choudha...@gmail.com> wrote:
>>
>>> We could also see the following in the server-side logs:
>>>
>>> Rejected connection from Server connection from [client host address=x.yx.x.x; client port=abc] because incoming request was rejected by pool possibly due to thread exhaustion
>>>
>>> On Tue, Jun 25, 2019, 7:27 AM aashish choudhary <aashish.choudha...@gmail.com> wrote:
>>>
>>>> As I mentioned earlier, the thread count can go up to 4000, and we have seen the read timeout crossing the default 10 seconds. We tried increasing the read timeout to 30 seconds, but that didn't work either. The record count is not more than 600k.
>>>>
>>>> The job succeeds on the second attempt without changing anything, which is a bit weird.
>>>>
>>>> With best regards,
>>>> Ashish
>>>>
>>>> On Tue, Jun 25, 2019, 12:23 AM Anilkumar Gingade <aging...@pivotal.io> wrote:
>>>>
>>>>> Hi Ashish,
>>>>>
>>>>> How many threads are executing putAll jobs at a time in a single client (the Spark job)?
>>>>> Do you see a read timeout exception in the client logs? If so, can you try increasing the read timeout value, or reducing the putAll size?
>>>>>
>>>>> In the case of putAll on a partitioned region, the putAll (entries) size is broken down and sent to the respective servers based on data affinity; that is why it works with a partitioned region.
>>>>>
>>>>> You can find more detail on how client-server connections are managed at:
>>>>> https://geode.apache.org/docs/guide/14/topologies_and_comm/topology_concepts/how_the_pool_manages_connections.html
>>>>>
>>>>> -Anil.
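[A minimal sketch of the client-side pool tuning discussed above (raising the read timeout beyond its 10-second default), assuming a Geode Java client; the locator address, timeout, and retry values are placeholders to tune for your environment:]

```java
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;

public final class TunedClient {

  public static void main(String[] args) {
    // Locator address and timeout values are placeholders; tune for your environment.
    ClientCache cache = new ClientCacheFactory()
        .addPoolLocator("locator-host", 10334)
        .setPoolReadTimeout(60000)   // ms; the default is 10000, which large putAll calls can exceed
        .setPoolRetryAttempts(1)     // limit client-side retries while diagnosing failures
        .create();

    // ... region operations go here ...

    cache.close();
  }
}
```

[Note that the limit of 800 mentioned in the thread is the cache server's max-connections setting, configured on the servers rather than through the client pool.]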
>>>>> On Mon, Jun 24, 2019 at 10:04 AM aashish choudhary <aashish.choudha...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We have been experiencing issues while connecting to Geode with the putAll API from Spark. The issue is specific to one particular Spark job that tries to load data into a replicated region. The exception we see on the server side is that the default limit of 800 gets maxed out, and on the client side we see a retry attempt against each server that also fails, even though when we re-ran the same job it completed without any issue.
>>>>>>
>>>>>> The problem I can see in the code is that we are connecting to Geode using a ClientCache inside forEachPartition, which I think could be the issue: for each partition we are making a new connection to Geode. In the stats file we can see connections timing out, and there are also thread bursts, sometimes >4000.
>>>>>>
>>>>>> What is the recommended way to connect to Geode using Spark?
>>>>>>
>>>>>> This one specific job fails most of the time, and it targets a replicated region. When we change the region type to partitioned, the job completes. We have enabled disk persistence for both types of regions.
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> With best regards,
>>>>>> Ashish

--
Charlie Black | cbl...@pivotal.io
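[A minimal sketch of reusing one ClientCache per executor JVM instead of creating a connection inside every forEachPartition invocation, as described in the original message above, combined with batched putAll. This is an illustrative pattern, not a Geode- or Spark-provided API; the class name, locator address, key/value types, and batch size are placeholders:]

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.geode.cache.Region;
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.client.ClientRegionShortcut;
import org.apache.spark.api.java.JavaPairRDD;

import scala.Tuple2;

public final class GeodeSparkWriter {

  // One ClientCache per executor JVM, created lazily and shared by every partition task.
  private static volatile ClientCache cache;

  private static synchronized Region<String, String> getRegion(String name) {
    if (cache == null || cache.isClosed()) {
      cache = new ClientCacheFactory()
          .addPoolLocator("locator-host", 10334)  // placeholder locator
          .setPoolReadTimeout(60000)              // ms
          .create();
    }
    Region<String, String> region = cache.getRegion(name);
    if (region == null) {
      region = cache.<String, String>createClientRegionFactory(ClientRegionShortcut.PROXY)
          .create(name);
    }
    return region;
  }

  public static void write(JavaPairRDD<String, String> rdd, String regionName, int batchSize) {
    rdd.foreachPartition((Iterator<Tuple2<String, String>> rows) -> {
      // The cache and region are reused across partitions on the same executor.
      Region<String, String> region = getRegion(regionName);
      Map<String, String> batch = new LinkedHashMap<>();
      while (rows.hasNext()) {
        Tuple2<String, String> row = rows.next();
        batch.put(row._1(), row._2());
        if (batch.size() >= batchSize) {
          region.putAll(batch);
          batch.clear();
        }
      }
      if (!batch.isEmpty()) {
        region.putAll(batch);
      }
    });
  }
}
```

[Because the cache is created lazily and shared, each executor holds a single connection pool to the servers instead of opening a new one per partition, which is one way to keep the server-side connection count and thread usage down.]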