Makes sense to me, I suppose.
Isn't another approach to swap the opening of the sockets?
Or why does failure to control :179 sockets not stop startup?
> I heard from two devs that started a 2nd bgpd by accident (forgot -n for
> a config check) which then caused downtime.
>
> Below diff adds a check to bgpd similar to the one we have now in ospfd and
> ospf6d: if another process is listening on the control socket bgpd exits.
>
> The situation is a bit different than in the ospfd case:
> - The socket path is defined in the config and not with an option.
> - The config is parsed relatively late (after daemon).
>
> Because of that rcctl would not detect that bgpd exits. But this only applies
> when the first bgpd was started without using the rc scripts.
>
> bgpd already detects that it can not bind the port if another instance is
> running. Adding a fatal there can have an effect when reloading after a
> config change. Loosing all sessions is not nice.
>
> Of course if the socket path is changed in the config and then reloaded the
> same can happen. I think this is less likely than changing a listen address.
>
> I think it is sufficient to add this check only to the control socket and
> not to the restricted socket.
>
> The diff in action:
>
> # /usr/src/usr.sbin/bgpd/obj/bgpd -dv
> startup
> session engine ready
> rereading config
> cannot bind to 0.0.0.0:179: Address already in use
> cannot bind to [::]:179: Address already in use
> control_check: socket in use
> fatal in bgpd: control socket check failed
> rib_new: Adj-RIB-In -> 0
> rib_new: Adj-RIB-Out -> 1
> route decision engine ready
> peer closed imsg connection
> fatal in RDE: Lost connection to parent
> peer closed imsg connection
> SE: Lost connection to parent
> session engine exiting
> #
>
> Comments? OKs?
>
> Remi
>
>
> cvs diff: Diffing .
> Index: bgpd.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/bgpd/bgpd.c,v
> retrieving revision 1.204
> diff -u -p -r1.204 bgpd.c
> --- bgpd.c 29 Sep 2018 08:11:11 -0000 1.204
> +++ bgpd.c 11 Nov 2018 20:33:32 -0000
> @@ -939,6 +939,8 @@ control_setup(struct bgpd_config *conf)
> }
> if ((cname = strdup(conf->csock)) == NULL)
> fatal("strdup");
> + if ((control_check(cname)) == -1)
> + fatalx("control socket check failed");
> if ((fd = control_init(0, cname)) == -1)
> fatalx("control socket setup failed");
> if (control_listen(fd) == -1)
> Index: control.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/bgpd/control.c,v
> retrieving revision 1.90
> diff -u -p -r1.90 control.c
> --- control.c 11 Aug 2017 16:02:53 -0000 1.90
> +++ control.c 10 Nov 2018 20:52:46 -0000
> @@ -38,6 +38,32 @@ void control_result(struct ctl_conn *,
> ssize_t imsg_read_nofd(struct imsgbuf *);
>
> int
> +control_check(char *path)
> +{
> + struct sockaddr_un sun;
> + int fd;
> +
> + bzero(&sun, sizeof(sun));
> + sun.sun_family = AF_UNIX;
> + strlcpy(sun.sun_path, path, sizeof(sun.sun_path));
> +
> + if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1) {
> + log_warn("control_check: socket check");
> + return (-1);
> + }
> +
> + if (connect(fd, (struct sockaddr *)&sun, sizeof(sun)) == 0) {
> + log_warnx("control_check: socket in use");
> + close(fd);
> + return (-1);
> + }
> +
> + close(fd);
> +
> + return (0);
> +}
> +
> +int
> control_init(int restricted, char *path)
> {
> struct sockaddr_un sun;
> Index: session.h
> ===================================================================
> RCS file: /cvs/src/usr.sbin/bgpd/session.h,v
> retrieving revision 1.125
> diff -u -p -r1.125 session.h
> --- session.h 24 Oct 2018 08:26:37 -0000 1.125
> +++ session.h 10 Nov 2018 21:53:04 -0000
> @@ -250,6 +250,7 @@ void prepare_listeners(struct bgpd_conf
> int get_mpe_label(struct rdomain *);
>
> /* control.c */
> +int control_check(char *);
> int control_init(int, char *);
> int control_listen(int);
> void control_shutdown(int);
>