GCSAdmin <notificati...@github.com> writes: > In some case, the speed of reading binlog from master is high, especially > when doing a new replica. > It would bring the high traffic in master. > So We introduce a new variable "read_binlog_speed_limit" to control the > binlog reading rate for IO thread to solve the problem.
> It can work when slave_compressed_protocol is on. > But it maybe doesn't work well when the binlog event is very big. > You can view, comment on, or merge this pull request online at: > > https://github.com/MariaDB/server/pull/246 > -- Patch Links -- > > https://github.com/MariaDB/server/pull/246.patch > https://github.com/MariaDB/server/pull/246.diff Overall this looks clean and simple. There is one problem. The patch adds a field real_network_read_len to the NET structure. This will break the client library ABI, because NET is a part of MYSQL. So this would cause client programs to crash if they are linked with a different version of the client library. So this needs to be changed (if I understand correctly). One option might be to introduce new functions like cli_safe_read_reallen() and my_net_read_packet_reallen(), which return in addition the actual amount of bytes read from the server. The old cli_safe_read() and my_net_read_packet() could then become simple wrapper functions around those. And cli_safe_read_reallen() can be used in read_event() in sql/slave.cc. A smaller issue is that in case of a large packet, a large my_sleep() may be invoked, which will cause STOP SLAVE to hang. I think this can be solved simply by calling slave_sleep() instead, it handles terminating the wait early if interrupted by STOP SLAVE. Detailed comments on the patch below. I rebased the series against latest 10.2 to get a clean diff (the pull request includes a couple merges against the main 10.2 tree, these changes are unrelated to the patch). The rebase is in https://github.com/knielsen/server/tree/GCSAdmin-10.2-binlog-speed-limit-2 > diff --git a/include/mysql.h.pp b/include/mysql.h.pp > index 857f5b9..1da038c 100644 > --- a/include/mysql.h.pp > +++ b/include/mysql.h.pp > @@ -35,6 +35,7 @@ typedef struct st_net { > my_bool thread_specific_malloc; > unsigned char compress; > my_bool unused3; > + unsigned long real_network_read_len; As explained above, I believe this would break the ABI (that's the purpose of mysql.h.pp, to catch such problems). > diff --git a/sql/slave.cc b/sql/slave.cc > index 20bf68e..52bb668 100644 > --- a/sql/slave.cc > +++ b/sql/slave.cc > @@ -3307,13 +3308,14 @@ static int request_dump(THD *thd, MYSQL* mysql, > Master_info* mi, > try a reconnect. We do not want to print anything to > the error log in this case because this a anormal > event in an idle server. > + network_read_len get the real network read length in VIO, especially > using compressed protocol > > RETURN VALUES > 'packet_error' Error > number Length of packet > */ > > -static ulong read_event(MYSQL* mysql, Master_info *mi, bool* > suppress_warnings) > +static ulong read_event(MYSQL* mysql, Master_info *mi, bool* > suppress_warnings, ulong* network_read_len) Generally, lines longer than 80 characters should be avoided (coding style). > @@ -4473,6 +4479,34 @@ Stopping slave I/O thread due to out-of-memory error > from master"); > goto err; > } > > + /* Control the binlog read speed of master when > read_binlog_speed_limit is non-zero > + */ > + ulonglong read_binlog_speed_limit_in_bytes = > opt_read_binlog_speed_limit * 1024; > + if (read_binlog_speed_limit_in_bytes) > + { > + /* prevent the tokenamount become a large value, > + for example, the IO thread doesn't work for a long time > + */ > + if (tokenamount > read_binlog_speed_limit_in_bytes * 2) > + { > + lastchecktime = my_hrtime().val; > + tokenamount = read_binlog_speed_limit_in_bytes * 2; > + } > + > + do > + { > + ulonglong currenttime = my_hrtime().val; > + tokenamount += (currenttime - lastchecktime) * > read_binlog_speed_limit_in_bytes / (1000*1000); > + lastchecktime = currenttime; > + if(tokenamount < network_read_len) > + { > + ulonglong micro_sleeptime = 1000*1000 * (network_read_len - > tokenamount) / read_binlog_speed_limit_in_bytes ; > + my_sleep(micro_sleeptime > 1000 ? micro_sleeptime : 1000); // at > least sleep 1000 micro second > + } > + }while(tokenamount < network_read_len); > + tokenamount -= network_read_len; > + } > + As explained above, probably better to use slave_sleep() here to allow STOP SLAVE to interrupt a long sleep. Would it make sense to do this wait after calling queue_event()? This way the SQL thread can start applying the event immediately, reducing slave lag. What do you think? Thanks, - Kristian. _______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp