Hi Chao, Thanks for looking into this.
> > Hi Xuneng, > > I just reviewed the patch. It splits the current STREAMING state into > CONNECTING and STREAMING, where CONNECTING represents the connection > establishing phase, which makes sense. The patch is solid, I have just a few > small comments: > > 1 - 0001 > ``` > + <para> > + <literal>stopped</literal>: WAL receiver process is not running. > + This state is normally not visible because the view returns no row > + when the WAL receiver is not running. > + </para> > ``` > > “Is not running” appears twice in this short paragraph, looks redundant. > Suggestion: > ``` > <literal>stopped</literal>: WAL receiver process is not running. > This state is normally not visible because the view returns no row in this > case. > ``` > > 2 - 0002 > ``` > + WALRCV_CONNECTING, /* connecting to primary, > replication not yet > + * started */ > ``` > > I think “primary” is inaccurate. A connection destination could be a primary, > or the other standby, so maybe change “primary” to “upstream server”. Yeah, "upstream server" is more precise. There are multiple user-facing log messages in this file that also use “primary”. I’m wondering whether we should update those as well. > > > 3 - 0002 > ``` > * Mark walreceiver as running in shared memory. > ``` > > I think we should update this comment, changing “running” to “connecting”. > There is not a “running” state, before this patch, “streaming” can roughly > equal to “running”, after this patch, “running” is kinda unclear. > It has been changed. > 4 - 0002 > ``` > + /* Connection established, switch to streaming state > */ > + SpinLockAcquire(&walrcv->mutex); > + Assert(walrcv->walRcvState == WALRCV_CONNECTING); > + walrcv->walRcvState = WALRCV_STREAMING; > + SpinLockRelease(&walrcv->mutex); > ``` > > I don’t feel good with this Assert(). Looking at ShutdownWalRcv(), the > startup process might set the state to STOPPING, so there is a race. > > Before this patch, this is no state change here, thus rest logic can handle > STOPPING. After this patch, if the race occurs, in dev mode, the Assert is > fired; in production mode, STOPPING is overwritten by STREAMING, which is > wrong. > > So, instead of Assert(), I think we should check if current state is > CONNECTING, if not, it should not proceed. It makes sense to me. Please check the updated patch. -- Best, Xuneng
v6-0001-Add-WALRCV_CONNECTING-state-to-walreceiver.patch
Description: Binary data
