Re: Revert "GEODE-9369: Command to copy region entries from a WAN site to… #6811

2021-08-27 Thread Kirk Lund
I've merged the revert to develop. Thanks for all the reviews!

On Fri, Aug 27, 2021 at 9:56 AM Kirk Lund  wrote:

> Please review PR #6811 so we can revert GEODE-9369 to clear up the failing
> unit test. Hopefully this will be ready to be reintroduced next week.
>
> If anyone knows of follow-on commits after c9d465, please let me know.
>
> Revert "GEODE-9369: Command to copy region entries from a WAN site to…
> #6811
> https://github.com/apache/geode/pull/6811
>
> Thanks,
> Kirk
>
>


Revert "GEODE-9369: Command to copy region entries from a WAN site to… #6811

2021-08-27 Thread Kirk Lund
Please review PR #6811 so we can revert GEODE-9369 to clear up the failing
unit test. Hopefully this will be ready to be reintroduced next week.

If anyone knows of follow-on commits after c9d465, please let me know.

Revert "GEODE-9369: Command to copy region entries from a WAN site to… #6811
https://github.com/apache/geode/pull/6811

Thanks,
Kirk


Re: Revert GEODE-9408: Avoid duplicate events sent by Serial Gateway Sender when group-transaction-events is true (#6663)

2021-08-27 Thread Kirk Lund
That PR was reverting the wrong commit, so I've closed it. I'll submit a
new PR today to revert the introduction of WanCopyRegionFunction and its
unit test. I'm going to work with Alberto next week to address some
testability issues so it can be reintroduced as soon as possible.

Thanks,
Kirk

On Thu, Aug 26, 2021 at 4:56 PM Kirk Lund  wrote:

> I have submitted PR #6809 to temporarily revert GEODE-9408 until it's
> ready to go back in.
>
> https://github.com/apache/geode/pull/6809
>
> I've reviewed the code that went in with #b377e3, and I think we should
> revert it to rework WanCopyRegionFunction and its unit test. The unit test
> currently mocks three methods in WanCopyRegionFunction (known as partial
> mocking) which is a sign that dependencies are hidden inside the class and
> need to be pulled up to the constructor.
>
> WanCopyRegionFunction also has a static final ExecutorService which is
> never shut down. The ExecutorService should live outside
> WanCopyRegionFunction, be passed in via the constructor, and be shut down
> when the Cache closes.
>
> The intermittent test failures are not specific to Windows; the test is
> likely to fail on a slow Linux machine as well.
>
> We really need to avoid Thread.sleep calls, especially in unit tests. The
> sleeps and the partial mocking are signs that the class needs to change to
> better enable unit testing.
>
> I'll be more than happy to help but I think we need to revert it so we
> have time to discuss it without everyone else being affected by failing
> tests.
>
> Thanks,
> Kirk
>
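
The refactoring suggested in the quoted message (an ExecutorService that lives
outside the function class, is passed in via the constructor, and is shut down
by its owner) might look roughly like the sketch below. This is only an
illustration under stated assumptions: CopyFunctionSketch and
ExecutorOwnerSketch are hypothetical names, not Geode classes, and the real
WanCopyRegionFunction may need a different shape.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the function class; not Geode's actual code. */
class CopyFunctionSketch {
  // Injected through the constructor instead of being a static final field.
  private final ExecutorService executor;

  CopyFunctionSketch(ExecutorService executor) {
    this.executor = executor;
  }

  /** Submits the copy work to the injected executor and returns its Future. */
  Future<Integer> copy(Callable<Integer> copyTask) {
    return executor.submit(copyTask);
  }
}

/** Hypothetical owner of the executor, standing in for whatever closes with the Cache. */
class ExecutorOwnerSketch implements AutoCloseable {
  private final ExecutorService executor = Executors.newSingleThreadExecutor();

  ExecutorService executor() {
    return executor;
  }

  /** Shuts the executor down, forcing termination if tasks do not finish in time. */
  @Override
  public void close() throws InterruptedException {
    executor.shutdown();
    if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
      executor.shutdownNow();
    }
  }
}
```

With this shape, a unit test can hand the function any ExecutorService it
likes and wait on the returned Future with a timeout, which removes the need
for both partial mocking and Thread.sleep.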


Re: Questions about conserve-sockets and WAN replication

2021-08-27 Thread Alberto Gomez
Hi Dave,

I have created the following JIRA tickets:

https://issues.apache.org/jira/browse/GEODE-9557
https://issues.apache.org/jira/browse/GEODE-9558

Please feel free to comment about them.

Thanks,

Alberto

From: Dave Barnes 
Sent: Thursday, August 26, 2021 7:47 PM
To: dev@geode.apache.org 
Cc: Alberto Gomez 
Subject: Re: Questions about conserve-sockets and WAN replication

Alberto,
As you point out, the recommendation to use `conserve-sockets=false` in WAN
configurations already appears in at least three places in the Geode User
Guide.
We can insert an additional mention into the guide. Did you have a
location in mind? (Sorry, this is an old thread and I don't recall whether
we already discussed this.)

With regard to function execution ignoring the global setting, I suspect
that changing function behavior would break existing applications, so that
is probably not an option here.
Again, if you help me identify locations in the doc where you think we need
to insert a note regarding this behavior, we can do that.

Let's state these changes in one or two JIRA tickets for the docs
component. I'm happy to work with you on creating that ticket, but need
your help in identifying target locations in the guide.
Thanks,
Dave
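
For reference, the setting discussed in this thread is an ordinary member
property. A minimal sketch of setting it programmatically follows, assuming
the properties are later handed to the member at startup (for example through
a CacheFactory constructed from them); WanMemberPropertiesSketch is a
hypothetical helper name, not part of Geode.

```java
import java.util.Properties;

/** Hypothetical helper; only the "conserve-sockets" property name comes from the thread. */
class WanMemberPropertiesSketch {
  /** Builds member properties with conserve-sockets disabled, as the guide recommends for WAN members. */
  static Properties wanMemberProperties() {
    Properties props = new Properties();
    props.setProperty("conserve-sockets", "false");
    return props;
  }
}
```

Per the articles Alberto cites in the quoted message below, function
execution is reported not to honor this setting, so the property alone may not
cover that path.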


On Thu, Aug 26, 2021 at 2:28 AM Alberto Gomez 
wrote:

> @Dave Barnes, sorry for not having replied to
> your e-mail sooner.
>
> I am missing the following in the referred documentation:
>
>   *   State that conserve-sockets must be set to false for members that
> participate in a WAN deployment as it is stated in other parts of the
> documentation (see
> ./geode-docs/topologies_and_comm/multi_site_configuration/setting_up_a_multisite_system.html.md.erb
> ./geode-docs/managing/monitor_tune/sockets_and_gateways.html.md.erb
> ./geode-docs/reference/topics/gemfire_properties.html.md.erb):
>   *   "To avoid hangs related to WAN messaging, always use the default
> setting of conserve-sockets=false for
> <%=vars.product_name%> members that participate in a WAN deployment."
>
> @dev@geode.apache.org, besides the above, I
> think the documentation is missing a very important piece of information
> that I have found in [1]:
> "even with conserve-sockets set to false, function executions do not use
> this setting and defaults to conserve-sockets=true behavior, regardless of
> the conserve-sockets setting. "
> and in [2]:
> "a Function Execution Processor does not honor the conserve-sockets
> setting so a shared P2P message reader is used in the remote server"
>
> I wonder if this should be stated in the Geode documentation or rather if
> function execution behavior should be changed to honor the conserve-sockets
> setting.
> Any thoughts on this?
>
> Best regards,
>
> Alberto
>
> [1]
> https://community.pivotal.io/s/article/GemFire-Function-Executions-and-conserve-sockets-behavior?language=en_US
> [2]
> https://medium.com/swlh/threads-used-in-apache-geode-function-execution-9dd707cf227c#bd8c
>
>
> 
> From: Dave Barnes 
> Sent: Wednesday, July 7, 2021 1:05 AM
> To: dev@geode.apache.org 
> Subject: Re: Questions about conserve-sockets and WAN replication
>
> Alberto,
> I recently updated some of the descriptions regarding conserve-sockets.
> Please check out this PR and see if it addresses any of your concerns.
> https://github.com/apache/geode/pull/6516
>
> On Tue, Jul 6, 2021 at 9:57 AM Alberto Gomez 
> wrote:
>
> > Hi,
> >
> > The Geode documentation states the following about conserve-sockets and
> > WAN deployments in [1]:
> >
> > "WAN deployments increase the messaging demands on a Geode system. To
> > avoid hangs related to WAN messaging, always set `conserve-sockets=false`
> > for Geode members that participate in a WAN deployment."
> >
> > Could anyone please provide some more detailed information about why and
> > where these hangs could happen? Is this a hard limitation or something to
> > be considered under certain circumstances?
> >
> > We have run into an unexpected situation, and we wonder whether it is
> > related to the documentation statement above:
> >
> > In a system like the following:
> >  - 2 WAN sites and 3 servers each
> >  - several partitioned regions with parallel senders
> >  - several replicated regions with serial senders
> >  - conserve-sockets set to true
> >
> > We have sometimes observed, when trying to stop a parallel gateway sender
> > while puts are being sent to both sites, that the thread stopping the
> > gateway sender in one of the members gets stuck waiting to receive a reply
> > from the other members (trying to get the size of the queue, see [2]). We
> > also see other threads stuck, some trying to get a lock held by the stuck
> > thread and others waiting in
> > ReplyProcessor21.waitForRepliesUninterruptibly() trying to put or get data
> > remotely (see [3] and [4]).
> > If we set conserve-sockets to false we do not