Re: [Maria-discuss] GTID and missing domain

2021-02-27 Thread mariadb

Kristian,

Thank you! You helped me on this once before but I _think_ I've finally 
got it now. At this point domain 0 is removed from all four servers 
(both slave_pos and binlog_state) and it looks like they're able to 
connect around to each other as expected. Gives me much more confidence 
in being able to bounce them around as needed in the future.


Thanks again!

Dan

On 2/27/2021 11:03 AM, Kristian Nielsen wrote:

> mari...@biblestuph.com writes:
>
>> And my primary server has:
>>
>> gtid_binlog_pos  0-303-67739600,1-303-7363061243,100-303-4338582
>>
>> gtid_binlog_state
>> 0-302-67690294,0-301-67719794,0-303-67739600,1-301-7350472534,1-302-7350381758,1-303-7363061243,100-302-4242958,100-301-4332195,100-303-4338582
>>
>> set global gtid_slave_pos = '1-303-7360639083,100-303-4337869';
>> start slave;
>>
>> Got fatal error 1236 from master when reading data from binary log:
>> 'Could not find GTID state requested by slave in any binlog files.
>> Probably the slave state is too old and required binlog files have
>> been purged.
>>
>> Even though I'm positive there are no domain 0 transactions (again,
>> hasn't been in service for years).
>
> Yes.
>
> You write that "there are no domain 0 transactions". But from the point of
> view of the database, there _are_ domain 0 transactions, even though they
> may be long in the past. These are seen in gtid_binlog_pos (and
> gtid_binlog_state).
>
> When your slave has the 0-domain in the gtid_slave_pos, the master knows
> that the slave is missing no transactions. When you delete the 0-domain from
> the slave, this is the same conceptually as saying the slave is missing
> _all_ transactions in domain 0, and the master must send them all (or error
> out if they have been purged, as here).
>
> In general, when a slave connects, the master needs to send all transactions
> in a domain that the slave did not apply yet - otherwise the slave will be
> missing transactions and have the wrong data. This holds regardless of how
> old those missing transactions might be. If a slave connects two years after
> last being active, the system should still give a reasonable error, not
> silently let the slave continue with incorrect data.
>
> That is why you get the error.
>
>> if I:
>>
>> FLUSH BINARY LOGS DELETE_DOMAIN_ID=(0)
>>
>> on the master, would I then be able to connect to it via
>>
>> set global gtid_slave_pos = '1-303-7360639083,100-303-4337869';
>
> Yes.
>
> With this command, we are re-defining the history of the master to say that
> there were never any transactions in domain 0. Therefore, any slave that
> connects cannot be missing any such transactions.
>
> Hope this helps,
>
>  - Kristian.
  



___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-discuss] GTID and missing domain

2021-02-27 Thread Kristian Nielsen
mari...@biblestuph.com writes:

> And my primary server has:
>
> gtid_binlog_pos  0-303-67739600,1-303-7363061243,100-303-4338582 
>
> gtid_binlog_state
> 0-302-67690294,0-301-67719794,0-303-67739600,1-301-7350472534,1-302-7350381758,1-303-7363061243,100-302-4242958,100-301-4332195,100-303-4338582
>  

> set global gtid_slave_pos = '1-303-7360639083,100-303-4337869';
> start slave;

> Got fatal error 1236 from master when reading data from binary log:
> 'Could not find GTID state requested by slave in any binlog files.
> Probably the slave state is too old and required binlog files have
> been purged.
>
> Even though I'm positive there are no domain 0 transactions (again,
> hasn't been in service for years).

Yes.

You write that "there are no domain 0 transactions". But from the point of
view of the database, there _are_ domain 0 transactions, even though they
may be long in the past. These are seen in gtid_binlog_pos (and
gtid_binlog_state).

When your slave has the 0-domain in the gtid_slave_pos, the master knows
that the slave is missing no transactions. When you delete the 0-domain from
the slave, this is the same conceptually as saying the slave is missing
_all_ transactions in domain 0, and the master must send them all (or error
out if they have been purged, as here).

In general, when a slave connects, the master needs to send all transactions
in a domain that the slave did not apply yet - otherwise the slave will be
missing transactions and have the wrong data. This holds regardless of how
old those missing transactions might be. If a slave connects two years after
last being active, the system should still give a reasonable error, not
silently let the slave continue with incorrect data.

That is why you get the error.
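
To make the rule concrete, here is a minimal Python sketch of the comparison
described above. It is a simplified model and not MariaDB's actual
implementation: the function names parse_gtid_list and
domains_missing_on_slave are hypothetical, and the sketch does not model
which binlog files are still available on the master.

```python
# Simplified model (an assumption, not server code): a domain that appears in
# the master's gtid_binlog_state but not in the slave's gtid_slave_pos is
# treated as "the slave has applied nothing in this domain".

def parse_gtid_list(text):
    """Parse 'domain-server-seq,...' into a list of (domain, server, seq)."""
    gtids = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        domain, server, seq = (int(part) for part in item.split("-"))
        gtids.append((domain, server, seq))
    return gtids


def domains_missing_on_slave(master_binlog_state, slave_pos):
    """Return the sorted domain ids known to the master but absent on the slave."""
    master_domains = {d for d, _, _ in parse_gtid_list(master_binlog_state)}
    slave_domains = {d for d, _, _ in parse_gtid_list(slave_pos)}
    return sorted(master_domains - slave_domains)


# Values quoted in this thread.
MASTER_STATE = (
    "0-302-67690294,0-301-67719794,0-303-67739600,"
    "1-301-7350472534,1-302-7350381758,1-303-7363061243,"
    "100-302-4242958,100-301-4332195,100-303-4338582"
)
SLAVE_POS_WITHOUT_0 = "1-303-7360639083,100-303-4337869"
SLAVE_POS_WITH_0 = "0-303-67739600,1-303-7360639083,100-303-4337869"

print(domains_missing_on_slave(MASTER_STATE, SLAVE_POS_WITHOUT_0))  # [0]
print(domains_missing_on_slave(MASTER_STATE, SLAVE_POS_WITH_0))     # []
```

With the position that omits 0-303-67739600, the model reports domain 0 as
one the master would have to send from the beginning, which it can no longer
do once the old binlogs are purged.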

> if I:
>
> FLUSH BINARY LOGS DELETE_DOMAIN_ID=(0)
>
> on the master, would I then be able to connect to it via
>
> set global gtid_slave_pos = '1-303-7360639083,100-303-4337869';

Yes.

With this command, we are re-defining the history of the master to say that
there were never any transactions in domain 0. Therefore, any slave that
connects cannot be missing any such transactions.
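
As a rough illustration of that "redefined history", the Python sketch below
drops one domain from a binlog-state string and re-checks the slave position
against the result. The helper names drop_domain and domain_ids are
hypothetical, and this only models the effect on the state string; it does
not reproduce the checks the server itself performs when running
FLUSH BINARY LOGS DELETE_DOMAIN_ID.

```python
# Model of the effect on gtid_binlog_state only (an assumption for
# illustration, not the server's implementation).

def domain_ids(gtid_list):
    """Return the set of domain ids in a 'domain-server-seq,...' string."""
    return {int(g.strip().split("-")[0]) for g in gtid_list.split(",") if g.strip()}


def drop_domain(binlog_state, domain_id):
    """Return binlog_state with every GTID of domain_id removed."""
    kept = [
        g.strip()
        for g in binlog_state.split(",")
        if g.strip() and int(g.strip().split("-")[0]) != domain_id
    ]
    return ",".join(kept)


# Values quoted in this thread.
master_state = (
    "0-302-67690294,0-301-67719794,0-303-67739600,"
    "1-301-7350472534,1-302-7350381758,1-303-7363061243,"
    "100-302-4242958,100-301-4332195,100-303-4338582"
)
slave_pos = "1-303-7360639083,100-303-4337869"

new_state = drop_domain(master_state, 0)

# Domains the master knows about that the slave position does not mention.
print(sorted(domain_ids(master_state) - domain_ids(slave_pos)))  # [0]
print(sorted(domain_ids(new_state) - domain_ids(slave_pos)))     # []
```

Once domain 0 is gone from the master's state, the slave position without a
domain 0 entry no longer omits anything the master knows about.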

Hope this helps,

 - Kristian.
 



[Maria-discuss] GTID and missing domain

2021-02-27 Thread mariadb

Greetings,

I've had this issue before and never quite got to the bottom of it. It 
keeps biting me and I'm hoping I can figure out how to definitively 
solve it.


In brief, my replica server has this for gtid_slave_pos:

0-303-67739600,1-303-7360639083,100-303-4337869

And my primary server has:

gtid_binlog_pos  0-303-67739600,1-303-7363061243,100-303-4338582 

gtid_binlog_state 
0-302-67690294,0-301-67719794,0-303-67739600,1-301-7350472534,1-302-7350381758,1-303-7363061243,100-302-4242958,100-301-4332195,100-303-4338582 
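
For readers comparing these positions by hand, the following Python sketch
computes the per-domain difference in sequence numbers between the primary's
gtid_binlog_pos and the replica's gtid_slave_pos. The helper names by_domain
and sequence_gap are hypothetical, and the difference is only a rough
indicator of how far behind the replica is in each domain.

```python
# Rough per-domain comparison of two GTID positions (illustrative only).

def by_domain(gtid_pos):
    """Map domain id -> sequence number for a 'domain-server-seq,...' string."""
    result = {}
    for item in gtid_pos.split(","):
        item = item.strip()
        if not item:
            continue
        domain, _server, seq = (int(part) for part in item.split("-"))
        result[domain] = seq
    return result


def sequence_gap(master_pos, replica_pos):
    """Return {domain: master_seq - replica_seq}; a domain absent on the
    replica is counted from sequence number 0."""
    master = by_domain(master_pos)
    replica = by_domain(replica_pos)
    return {d: master[d] - replica.get(d, 0) for d in sorted(master)}


# Values quoted in this message.
primary_binlog_pos = "0-303-67739600,1-303-7363061243,100-303-4338582"
replica_slave_pos = "0-303-67739600,1-303-7360639083,100-303-4337869"

print(sequence_gap(primary_binlog_pos, replica_slave_pos))
# {0: 0, 1: 2422160, 100: 713}
```

Domain 0 shows no gap at all, which matches the observation that nothing has
been written in that domain for a long time.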



That's all well and good and I can connect that way. But if I do this on 
the replica server:


stop slave;

select @@global.gtid_slave_pos;
0-303-67739600,1-303-7360639083,100-303-4337869

set global gtid_slave_pos = '1-303-7360639083,100-303-4337869';

start slave;

I get:

Got fatal error 1236 from master when reading data from binary log: 
'Could not find GTID state requested by slave in any binlog files. 
Probably the slave state is too old and required binlog files have been 
purged.


Even though I'm positive there are no domain 0 transactions (again, 
hasn't been in service for years).


If I add the `0-303-67739600` back in to gtid_slave_pos I can reconnect.

At this point, I have only one server of a four server chain that I can 
actually connect to with gtid (as above). So I'm a little reluctant to 
do too much experimenting. But just to ask the question, if I:


FLUSH BINARY LOGS DELETE_DOMAIN_ID=(0)

on the master, would I then be able to connect to it via

set global gtid_slave_pos = '1-303-7360639083,100-303-4337869';

from the replica?

At this point, I cannot connect any of my other servers (which currently 
are all replicating from the same master) to each other, even the ones 
that don't show the 0- domain in the gtid-binlog vars. I'm hoping if I 
can figure out the above scenario it might help me deal with the rest, 
but I just keep feeling like I'm missing something.


TIA,

Dan

