Re: bnxt panic

2022-03-19 Thread Hrvoje Popovski
On 18.3.2022. 18:59, Alexander Bluhm wrote:
> On Fri, Mar 18, 2022 at 11:21:15AM +0100, Hrvoje Popovski wrote:
>> On 17.3.2022. 21:31, Alexander Bluhm wrote:
>>> I don't have the device and don't know the code.  But other drivers
>>> don't process rx and tx interrupts when the interface is not running.
>>>
>>> Maybe this helps.  dlg@ and jmatthew@ should know better than me.
>>>
>>> bluhm
>>
>> Hi,
>>
>> i didn't manage to trigger same panic by hand, maybe i didn't try hard
>> enough, so i've put "ifconfig bnxt0 down && sleep 2 && ifconfig bnxt0 up
>> && sleep 2" in loop and i've got panic below ...
> 

Hi,

sorry for hijack your diff with some other panic ... I thought that
panics are similar.
I've tried everything one more time and now I can answer your questions :)

> Does my diff make things better?

Yes, I can't panic box with this diff as i can without it.


> Does the diff just replace one panic with another?

No.


> Do you need more effort to trigger crashes now?

I've tried all day to reproduce panic with your diff but i can't ...


> Does the panic below also happend without my diff?

Yes, panic below is easy to trigger with or without your diff ..




> bluhm
> 
>>
>> with this loop, box panic quite fast even if box doesn't have any
>> interface configured and it's totally idle ..
>>
>>
>> bnxt0: HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error.
>> bnxt0: failed to set up tx ring
>> uvm_fault(0xfd904e3ac880, 0xff0, 0, 1) -> e
>> kernel: page fault trap, code=0
>> Stopped at  bnxt_queue_down+0x61:   movq0(%r13,%rax,1),%rsi
>> TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
>> *186511  37451  0 0x3  02K ifconfig
>> bnxt_queue_down(802ba000,802bb080) at bnxt_queue_down+0x61
>> bnxt_up(802ba000) at bnxt_up+0x3fb
>> bnxt_ioctl(802ba048,80206910,800027d1bc90) at bnxt_ioctl+0x15b
>> ifioctl(fd8e5b1a98e8,80206910,800027d1bc90,800027d22d28) at
>> ifioctl+0x92b
>> soo_ioctl(fd8e5b6e3cb8,80206910,800027d1bc90,800027d22d28)
>> at soo_ioctl+0x161
>> sys_ioctl(800027d22d28,800027d1bda0,800027d1bdf0) at
>> sys_ioctl+0x2c4
>> syscall(800027d1be60) at syscall+0x374
>> Xsyscall() at Xsyscall+0x128
>> end of kernel
>> end trace frame: 0x7f7d3130, count: 7
>> https://www.openbsd.org/ddb.html describes the minimum info required in
>> bug reports.  Insufficient info makes it difficult to find and fix bugs.
>> ddb{2}>
>>
>>
>> ddb{2}> show reg
>> rdi   0x8226e7b8pci_bus_dma_tag
>> rsi   0x802bb080
>> rbp   0x800027d1ba80
>> rbx 0xff
>> rdx   0xc800
>> rcx0x202
>> rax0xff0
>> r8  0x3f
>> r90x800027d1b9b8
>> r10   0x8f1758fa1ca4c280
>> r11   0x814e96f0_bus_dmamap_destroy
>> r120x100
>> r130
>> r140x101
>> r15   0x802ba000
>> rip   0x815385c1bnxt_queue_down+0x61
>> cs   0x8
>> rflags   0x10216__ALIGN_SIZE+0xf216
>> rsp   0x800027d1ba20
>> ss  0x10
>> bnxt_queue_down+0x61:   movq0(%r13,%rax,1),%rsi
>>
>>
>> ddb{2}> ps
>>PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
>> *37451  186511  30450  0  7 0x3ifconfig
>>  30450  100025  92193  0  30x10008b  sigsusp   sh
>>  92193  391163  1  0  30x10008b  sigsusp   ksh
>>  29430  255996  1  0  30x100098  kqreadcron
>>  83509   36913  52718 95  3   0x1100092  kqreadsmtpd
>>  85854  120678  52718103  3   0x1100092  kqreadsmtpd
>>  10057  425077  52718 95  3   0x1100092  kqreadsmtpd
>>   8903  266150  52718 95  30x100092  kqreadsmtpd
>>  10952   25497  52718 95  3   0x1100092  kqreadsmtpd
>>  10277  273205  52718 95  3   0x1100092  kqreadsmtpd
>>  52718  225011  1  0  30x100080  kqreadsmtpd
>>  10965   74402  1  0  30x88  kqreadsshd
>>  92646   92606  1  0  30x100080  kqreadntpd
>>  69044   66045  48912 83  30x100092  kqreadntpd
>>  48912  346342  1 83  3   0x1100092  kqreadntpd
>>  94844  373900  21297 73  3   0x1100090  kqreadsyslogd
>>  21297   85879  1  0  30x100082  netio syslogd
>>  35205  481579  0  0  3 0x14200  bored smr
>>  12575  275960  0  0  3 0x14200  pgzerozerothread
>>  24927  447870  0  0  3 0x14200  aiodoned  aiodoned
>>  80994  339930  0  0  3 0x14200  syncerupdate
>>  44042  109239   

Re: route sourceaddr and ping/traceroute

2022-03-19 Thread Claudio Jeker
On Sat, Mar 19, 2022 at 10:57:18PM +1000, David Gwynne wrote:
> On Sat, Mar 19, 2022 at 12:53:54PM +0100, Claudio Jeker wrote:
> > On Sat, Mar 19, 2022 at 12:27:43PM +0100, Claudio Jeker wrote:
> > > On Sat, Mar 19, 2022 at 10:51:13AM +, Stuart Henderson wrote:
> > > > On 2022/03/19 19:14, David Gwynne wrote:
> > > > > On Thu, Mar 17, 2022 at 12:05:16PM -0600, Theo de Raadt wrote:
> > > > > > This should not be done in applications.  The kernel must do it.  
> > > > > > It means
> > > > > > the current kernel code is worng.
> > > > > 
> > > > > i think this is the right place for raw ipv4 sockets like what
> > > > > ping/traceroute uses.
> > > > > 
> > > > > ipv6 looks like it already does the right thing.
> > > > 
> > > > Given that this turned out so similar to the existing v6 support that
> > > > I think you didn't notice before writing the v4 version, this looks
> > > > like the right place indeed :)
> > > > 
> > > > Works for me, OK
> > >  
> > > Did someone try this on an ospf router?
> > > I think our ospfd(8) always includes the source address and uses 
> > > IP_HDRINCL
> > > but not sure about other daemons. Are there other raw IP users that expect
> > > the IP source to be set to the outgoing interface by default?
> > > Maybe IGMP proxies and routers. E.g. dvmrpd depends on IP_MULTICAST_IF to
> > > set the source IP address.
> > > 
> > > Looking at the code I think this will break the IP_MULTICAST_IF case.
> > > In ip_output() the check is like this:
> > > 
> > > if ((IN_MULTICAST(ip->ip_dst.s_addr) ||
> > > (ip->ip_dst.s_addr == INADDR_BROADCAST)) &&
> > > imo != NULL && (ifp = if_get(imo->imo_ifidx)) != NULL) {
> > > 
> > > mtu = ifp->if_mtu;
> > > if (ip->ip_src.s_addr == INADDR_ANY) {
> > > struct in_ifaddr *ia;
> > > 
> > > IFP_TO_IA(ifp, ia);
> > > if (ia != NULL)
> > > ip->ip_src = ia->ia_addr.sin_addr;
> > > }
> > > 
> > > Now raw_ip.c already set ip_src to something so
> > >   if (ip->ip_src.s_addr == INADDR_ANY)
> > > is never true.
> > > 
> > > It is possible to skip the in_pcbselsrc() call in rip_output() when
> > > IN_MULTICAST(ip->ip_dst.s_addr) || (ip->ip_dst.s_addr == INADDR_BROADCAST)
> > > 
> > > Is that good enough?
> > 
> > Actually in_pcbselsrc() checks the multicast options but only for the
> > IN_MULTICAST() case. I guess we could add INADDR_BROADCAST handling in
> > there as well. Seems like a sensible thing todo. Broadcast  is just a very
> > special version of multicast.
> 
> i know we talked about this off list, but for the record there are two
> kinds of IP broadcast. there's 255.255.255.255, and the broadcast
> address on subnets, eg, 192.168.1.0/24 has a broadcast address of
> 192.168.1.255. you're talking about 255.255.255.255 here.

Yes, this only matters for 255.255.255.255. The per netowrk broadcast
addresses were a stupid idea and luckily not relevant here.
 
> im honestly surprised in_pcbselsrc doesnt already do what you're talking
> about. like i said, the impression i get from stevens and other bits of
> the stack is that 255.255.255.255 is largely treated as a multicast
> address, which is why im surprised that it isnt handled already and why
> i think it makes sense. what tests do we need to feel confident with it
> though? 

I was also surprised that this code differs from the very similar code in
ip_output(). I agree that 255.255.255.255 should be treated like a
multicast address.
 
I'm fairly confident that this will be fine. It may unbreak some cases but
luckily the use of 255.255.255.255 is very limited. Apart from DHCP/BOOTP
not much depends on it.

> > Also if this is changed is there still a way to have ip->ip_src ==
> > INADDR_ANY in ip_output()?
> 
> dunno. would slapping a counter on it be useful?

Something like that. It would be nice to remove some of the code in
ip_output() which should be redundant now. But as usual I guess some other
protocol handler may still depend on this feature of ip_output.
A counter or even printf would help identify these cases.
 
> Index: in_pcb.c
> ===
> RCS file: /cvs/src/sys/netinet/in_pcb.c,v
> retrieving revision 1.261
> diff -u -p -r1.261 in_pcb.c
> --- in_pcb.c  14 Mar 2022 22:38:43 -  1.261
> +++ in_pcb.c  19 Mar 2022 12:54:55 -
> @@ -893,11 +893,13 @@ in_pcbselsrc(struct in_addr **insrc, str
>   }
>  
>   /*
> -  * If the destination address is multicast and an outgoing
> -  * interface has been set as a multicast option, use the
> -  * address of that interface as our source address.
> +  * If the destination address is multicast or limited
> +  * broadcast (255.255.255.255) and an outgoing interface has
> +  * been set as a multicast option, use the address of that
> +  * interface as our source address.
> 

Re: route sourceaddr and ping/traceroute

2022-03-19 Thread David Gwynne
On Sat, Mar 19, 2022 at 12:53:54PM +0100, Claudio Jeker wrote:
> On Sat, Mar 19, 2022 at 12:27:43PM +0100, Claudio Jeker wrote:
> > On Sat, Mar 19, 2022 at 10:51:13AM +, Stuart Henderson wrote:
> > > On 2022/03/19 19:14, David Gwynne wrote:
> > > > On Thu, Mar 17, 2022 at 12:05:16PM -0600, Theo de Raadt wrote:
> > > > > This should not be done in applications.  The kernel must do it.  It 
> > > > > means
> > > > > the current kernel code is worng.
> > > > 
> > > > i think this is the right place for raw ipv4 sockets like what
> > > > ping/traceroute uses.
> > > > 
> > > > ipv6 looks like it already does the right thing.
> > > 
> > > Given that this turned out so similar to the existing v6 support that
> > > I think you didn't notice before writing the v4 version, this looks
> > > like the right place indeed :)
> > > 
> > > Works for me, OK
> >  
> > Did someone try this on an ospf router?
> > I think our ospfd(8) always includes the source address and uses IP_HDRINCL
> > but not sure about other daemons. Are there other raw IP users that expect
> > the IP source to be set to the outgoing interface by default?
> > Maybe IGMP proxies and routers. E.g. dvmrpd depends on IP_MULTICAST_IF to
> > set the source IP address.
> > 
> > Looking at the code I think this will break the IP_MULTICAST_IF case.
> > In ip_output() the check is like this:
> > 
> > if ((IN_MULTICAST(ip->ip_dst.s_addr) ||
> > (ip->ip_dst.s_addr == INADDR_BROADCAST)) &&
> > imo != NULL && (ifp = if_get(imo->imo_ifidx)) != NULL) {
> > 
> > mtu = ifp->if_mtu;
> > if (ip->ip_src.s_addr == INADDR_ANY) {
> > struct in_ifaddr *ia;
> > 
> > IFP_TO_IA(ifp, ia);
> > if (ia != NULL)
> > ip->ip_src = ia->ia_addr.sin_addr;
> > }
> > 
> > Now raw_ip.c already set ip_src to something so
> > if (ip->ip_src.s_addr == INADDR_ANY)
> > is never true.
> > 
> > It is possible to skip the in_pcbselsrc() call in rip_output() when
> > IN_MULTICAST(ip->ip_dst.s_addr) || (ip->ip_dst.s_addr == INADDR_BROADCAST)
> > 
> > Is that good enough?
> 
> Actually in_pcbselsrc() checks the multicast options but only for the
> IN_MULTICAST() case. I guess we could add INADDR_BROADCAST handling in
> there as well. Seems like a sensible thing todo. Broadcast  is just a very
> special version of multicast.

i know we talked about this off list, but for the record there are two
kinds of IP broadcast. there's 255.255.255.255, and the broadcast
address on subnets, eg, 192.168.1.0/24 has a broadcast address of
192.168.1.255. you're talking about 255.255.255.255 here.

im honestly surprised in_pcbselsrc doesnt already do what you're talking
about. like i said, the impression i get from stevens and other bits of
the stack is that 255.255.255.255 is largely treated as a multicast
address, which is why im surprised that it isnt handled already and why
i think it makes sense. what tests do we need to feel confident with it
though? 

> Also if this is changed is there still a way to have ip->ip_src ==
> INADDR_ANY in ip_output()?

dunno. would slapping a counter on it be useful?

Index: in_pcb.c
===
RCS file: /cvs/src/sys/netinet/in_pcb.c,v
retrieving revision 1.261
diff -u -p -r1.261 in_pcb.c
--- in_pcb.c14 Mar 2022 22:38:43 -  1.261
+++ in_pcb.c19 Mar 2022 12:54:55 -
@@ -893,11 +893,13 @@ in_pcbselsrc(struct in_addr **insrc, str
}
 
/*
-* If the destination address is multicast and an outgoing
-* interface has been set as a multicast option, use the
-* address of that interface as our source address.
+* If the destination address is multicast or limited
+* broadcast (255.255.255.255) and an outgoing interface has
+* been set as a multicast option, use the address of that
+* interface as our source address.
 */
-   if (IN_MULTICAST(sin->sin_addr.s_addr) && mopts != NULL) {
+   if ((IN_MULTICAST(sin->sin_addr.s_addr) || 
+   sin->sin_addr.s_addr == INADDR_BROADCAST) && mopts != NULL) {
struct ifnet *ifp;
 
ifp = if_get(mopts->imo_ifidx);



Re: route sourceaddr and ping/traceroute

2022-03-19 Thread Claudio Jeker
On Sat, Mar 19, 2022 at 12:27:43PM +0100, Claudio Jeker wrote:
> On Sat, Mar 19, 2022 at 10:51:13AM +, Stuart Henderson wrote:
> > On 2022/03/19 19:14, David Gwynne wrote:
> > > On Thu, Mar 17, 2022 at 12:05:16PM -0600, Theo de Raadt wrote:
> > > > This should not be done in applications.  The kernel must do it.  It 
> > > > means
> > > > the current kernel code is worng.
> > > 
> > > i think this is the right place for raw ipv4 sockets like what
> > > ping/traceroute uses.
> > > 
> > > ipv6 looks like it already does the right thing.
> > 
> > Given that this turned out so similar to the existing v6 support that
> > I think you didn't notice before writing the v4 version, this looks
> > like the right place indeed :)
> > 
> > Works for me, OK
>  
> Did someone try this on an ospf router?
> I think our ospfd(8) always includes the source address and uses IP_HDRINCL
> but not sure about other daemons. Are there other raw IP users that expect
> the IP source to be set to the outgoing interface by default?
> Maybe IGMP proxies and routers. E.g. dvmrpd depends on IP_MULTICAST_IF to
> set the source IP address.
> 
> Looking at the code I think this will break the IP_MULTICAST_IF case.
> In ip_output() the check is like this:
> 
> if ((IN_MULTICAST(ip->ip_dst.s_addr) ||
> (ip->ip_dst.s_addr == INADDR_BROADCAST)) &&
> imo != NULL && (ifp = if_get(imo->imo_ifidx)) != NULL) {
> 
> mtu = ifp->if_mtu;
> if (ip->ip_src.s_addr == INADDR_ANY) {
> struct in_ifaddr *ia;
> 
> IFP_TO_IA(ifp, ia);
> if (ia != NULL)
> ip->ip_src = ia->ia_addr.sin_addr;
> }
> 
> Now raw_ip.c already set ip_src to something so
>   if (ip->ip_src.s_addr == INADDR_ANY)
> is never true.
> 
> It is possible to skip the in_pcbselsrc() call in rip_output() when
> IN_MULTICAST(ip->ip_dst.s_addr) || (ip->ip_dst.s_addr == INADDR_BROADCAST)
> 
> Is that good enough?

Actually in_pcbselsrc() checks the multicast options but only for the
IN_MULTICAST() case. I guess we could add INADDR_BROADCAST handling in
there as well. Seems like a sensible thing todo. Broadcast  is just a very
special version of multicast.

Also if this is changed is there still a way to have ip->ip_src ==
INADDR_ANY in ip_output()?
  
> > > Index: raw_ip.c
> > > ===
> > > RCS file: /cvs/src/sys/netinet/raw_ip.c,v
> > > retrieving revision 1.123
> > > diff -u -p -r1.123 raw_ip.c
> > > --- raw_ip.c  14 Mar 2022 22:38:43 -  1.123
> > > +++ raw_ip.c  19 Mar 2022 03:40:44 -
> > > @@ -222,6 +222,7 @@ int
> > >  rip_output(struct mbuf *m, struct socket *so, struct sockaddr *dstaddr,
> > >  struct mbuf *control)
> > >  {
> > > + struct sockaddr_in *dst = satosin(dstaddr);
> > >   struct ip *ip;
> > >   struct inpcb *inp;
> > >   int flags, error;
> > > @@ -246,8 +247,8 @@ rip_output(struct mbuf *m, struct socket
> > >   ip->ip_off = htons(0);
> > >   ip->ip_p = inp->inp_ip.ip_p;
> > >   ip->ip_len = htons(m->m_pkthdr.len);
> > > - ip->ip_src = inp->inp_laddr;
> > > - ip->ip_dst = satosin(dstaddr)->sin_addr;
> > > + ip->ip_src.s_addr = INADDR_ANY;
> > > + ip->ip_dst = dst->sin_addr;
> > >   ip->ip_ttl = inp->inp_ip.ip_ttl ? inp->inp_ip.ip_ttl : MAXTTL;
> > >   } else {
> > >   if (m->m_pkthdr.len > IP_MAXPACKET) {
> > > @@ -262,11 +263,23 @@ rip_output(struct mbuf *m, struct socket
> > >   ip = mtod(m, struct ip *);
> > >   if (ip->ip_id == 0)
> > >   ip->ip_id = htons(ip_randomid());
> > > + dst->sin_addr = ip->ip_dst;
> > >  
> > >   /* XXX prevent ip_output from overwriting header fields */
> > >   flags |= IP_RAWOUTPUT;
> > >   ipstat_inc(ips_rawout);
> > >   }
> > > +
> > > + if (ip->ip_src.s_addr == INADDR_ANY) {
> > > + struct in_addr *laddr;
> > > +
> > > + error = in_pcbselsrc(, dst, inp);
> > > + if (error != 0)
> > > + return (error);
> > > +
> > > + ip->ip_src = *laddr;
> > > + }
> > > +
> > >  #ifdef INET6
> > >   /*
> > >* A thought:  Even though raw IP shouldn't be able to set IPv6
> > > 
> > 
> 
> -- 
> :wq Claudio
> 

-- 
:wq Claudio



Re: route sourceaddr and ping/traceroute

2022-03-19 Thread Claudio Jeker
On Sat, Mar 19, 2022 at 10:51:13AM +, Stuart Henderson wrote:
> On 2022/03/19 19:14, David Gwynne wrote:
> > On Thu, Mar 17, 2022 at 12:05:16PM -0600, Theo de Raadt wrote:
> > > This should not be done in applications.  The kernel must do it.  It means
> > > the current kernel code is worng.
> > 
> > i think this is the right place for raw ipv4 sockets like what
> > ping/traceroute uses.
> > 
> > ipv6 looks like it already does the right thing.
> 
> Given that this turned out so similar to the existing v6 support that
> I think you didn't notice before writing the v4 version, this looks
> like the right place indeed :)
> 
> Works for me, OK
 
Did someone try this on an ospf router?
I think our ospfd(8) always includes the source address and uses IP_HDRINCL
but not sure about other daemons. Are there other raw IP users that expect
the IP source to be set to the outgoing interface by default?
Maybe IGMP proxies and routers. E.g. dvmrpd depends on IP_MULTICAST_IF to
set the source IP address.

Looking at the code I think this will break the IP_MULTICAST_IF case.
In ip_output() the check is like this:

if ((IN_MULTICAST(ip->ip_dst.s_addr) ||
(ip->ip_dst.s_addr == INADDR_BROADCAST)) &&
imo != NULL && (ifp = if_get(imo->imo_ifidx)) != NULL) {

mtu = ifp->if_mtu;
if (ip->ip_src.s_addr == INADDR_ANY) {
struct in_ifaddr *ia;

IFP_TO_IA(ifp, ia);
if (ia != NULL)
ip->ip_src = ia->ia_addr.sin_addr;
}

Now raw_ip.c already set ip_src to something so
if (ip->ip_src.s_addr == INADDR_ANY)
is never true.

It is possible to skip the in_pcbselsrc() call in rip_output() when
IN_MULTICAST(ip->ip_dst.s_addr) || (ip->ip_dst.s_addr == INADDR_BROADCAST)

Is that good enough?
 
> > Index: raw_ip.c
> > ===
> > RCS file: /cvs/src/sys/netinet/raw_ip.c,v
> > retrieving revision 1.123
> > diff -u -p -r1.123 raw_ip.c
> > --- raw_ip.c14 Mar 2022 22:38:43 -  1.123
> > +++ raw_ip.c19 Mar 2022 03:40:44 -
> > @@ -222,6 +222,7 @@ int
> >  rip_output(struct mbuf *m, struct socket *so, struct sockaddr *dstaddr,
> >  struct mbuf *control)
> >  {
> > +   struct sockaddr_in *dst = satosin(dstaddr);
> > struct ip *ip;
> > struct inpcb *inp;
> > int flags, error;
> > @@ -246,8 +247,8 @@ rip_output(struct mbuf *m, struct socket
> > ip->ip_off = htons(0);
> > ip->ip_p = inp->inp_ip.ip_p;
> > ip->ip_len = htons(m->m_pkthdr.len);
> > -   ip->ip_src = inp->inp_laddr;
> > -   ip->ip_dst = satosin(dstaddr)->sin_addr;
> > +   ip->ip_src.s_addr = INADDR_ANY;
> > +   ip->ip_dst = dst->sin_addr;
> > ip->ip_ttl = inp->inp_ip.ip_ttl ? inp->inp_ip.ip_ttl : MAXTTL;
> > } else {
> > if (m->m_pkthdr.len > IP_MAXPACKET) {
> > @@ -262,11 +263,23 @@ rip_output(struct mbuf *m, struct socket
> > ip = mtod(m, struct ip *);
> > if (ip->ip_id == 0)
> > ip->ip_id = htons(ip_randomid());
> > +   dst->sin_addr = ip->ip_dst;
> >  
> > /* XXX prevent ip_output from overwriting header fields */
> > flags |= IP_RAWOUTPUT;
> > ipstat_inc(ips_rawout);
> > }
> > +
> > +   if (ip->ip_src.s_addr == INADDR_ANY) {
> > +   struct in_addr *laddr;
> > +
> > +   error = in_pcbselsrc(, dst, inp);
> > +   if (error != 0)
> > +   return (error);
> > +
> > +   ip->ip_src = *laddr;
> > +   }
> > +
> >  #ifdef INET6
> > /*
> >  * A thought:  Even though raw IP shouldn't be able to set IPv6
> > 
> 

-- 
:wq Claudio



Re: route sourceaddr and ping/traceroute

2022-03-19 Thread Stuart Henderson
On 2022/03/19 19:14, David Gwynne wrote:
> On Thu, Mar 17, 2022 at 12:05:16PM -0600, Theo de Raadt wrote:
> > This should not be done in applications.  The kernel must do it.  It means
> > the current kernel code is worng.
> 
> i think this is the right place for raw ipv4 sockets like what
> ping/traceroute uses.
> 
> ipv6 looks like it already does the right thing.

Given that this turned out so similar to the existing v6 support that
I think you didn't notice before writing the v4 version, this looks
like the right place indeed :)

Works for me, OK



> Index: raw_ip.c
> ===
> RCS file: /cvs/src/sys/netinet/raw_ip.c,v
> retrieving revision 1.123
> diff -u -p -r1.123 raw_ip.c
> --- raw_ip.c  14 Mar 2022 22:38:43 -  1.123
> +++ raw_ip.c  19 Mar 2022 03:40:44 -
> @@ -222,6 +222,7 @@ int
>  rip_output(struct mbuf *m, struct socket *so, struct sockaddr *dstaddr,
>  struct mbuf *control)
>  {
> + struct sockaddr_in *dst = satosin(dstaddr);
>   struct ip *ip;
>   struct inpcb *inp;
>   int flags, error;
> @@ -246,8 +247,8 @@ rip_output(struct mbuf *m, struct socket
>   ip->ip_off = htons(0);
>   ip->ip_p = inp->inp_ip.ip_p;
>   ip->ip_len = htons(m->m_pkthdr.len);
> - ip->ip_src = inp->inp_laddr;
> - ip->ip_dst = satosin(dstaddr)->sin_addr;
> + ip->ip_src.s_addr = INADDR_ANY;
> + ip->ip_dst = dst->sin_addr;
>   ip->ip_ttl = inp->inp_ip.ip_ttl ? inp->inp_ip.ip_ttl : MAXTTL;
>   } else {
>   if (m->m_pkthdr.len > IP_MAXPACKET) {
> @@ -262,11 +263,23 @@ rip_output(struct mbuf *m, struct socket
>   ip = mtod(m, struct ip *);
>   if (ip->ip_id == 0)
>   ip->ip_id = htons(ip_randomid());
> + dst->sin_addr = ip->ip_dst;
>  
>   /* XXX prevent ip_output from overwriting header fields */
>   flags |= IP_RAWOUTPUT;
>   ipstat_inc(ips_rawout);
>   }
> +
> + if (ip->ip_src.s_addr == INADDR_ANY) {
> + struct in_addr *laddr;
> +
> + error = in_pcbselsrc(, dst, inp);
> + if (error != 0)
> + return (error);
> +
> + ip->ip_src = *laddr;
> + }
> +
>  #ifdef INET6
>   /*
>* A thought:  Even though raw IP shouldn't be able to set IPv6
> 



Re: route sourceaddr and ping/traceroute

2022-03-19 Thread David Gwynne
On Thu, Mar 17, 2022 at 12:05:16PM -0600, Theo de Raadt wrote:
> This should not be done in applications.  The kernel must do it.  It means
> the current kernel code is worng.

i think this is the right place for raw ipv4 sockets like what
ping/traceroute uses.

ipv6 looks like it already does the right thing.

Index: raw_ip.c
===
RCS file: /cvs/src/sys/netinet/raw_ip.c,v
retrieving revision 1.123
diff -u -p -r1.123 raw_ip.c
--- raw_ip.c14 Mar 2022 22:38:43 -  1.123
+++ raw_ip.c19 Mar 2022 03:40:44 -
@@ -222,6 +222,7 @@ int
 rip_output(struct mbuf *m, struct socket *so, struct sockaddr *dstaddr,
 struct mbuf *control)
 {
+   struct sockaddr_in *dst = satosin(dstaddr);
struct ip *ip;
struct inpcb *inp;
int flags, error;
@@ -246,8 +247,8 @@ rip_output(struct mbuf *m, struct socket
ip->ip_off = htons(0);
ip->ip_p = inp->inp_ip.ip_p;
ip->ip_len = htons(m->m_pkthdr.len);
-   ip->ip_src = inp->inp_laddr;
-   ip->ip_dst = satosin(dstaddr)->sin_addr;
+   ip->ip_src.s_addr = INADDR_ANY;
+   ip->ip_dst = dst->sin_addr;
ip->ip_ttl = inp->inp_ip.ip_ttl ? inp->inp_ip.ip_ttl : MAXTTL;
} else {
if (m->m_pkthdr.len > IP_MAXPACKET) {
@@ -262,11 +263,23 @@ rip_output(struct mbuf *m, struct socket
ip = mtod(m, struct ip *);
if (ip->ip_id == 0)
ip->ip_id = htons(ip_randomid());
+   dst->sin_addr = ip->ip_dst;
 
/* XXX prevent ip_output from overwriting header fields */
flags |= IP_RAWOUTPUT;
ipstat_inc(ips_rawout);
}
+
+   if (ip->ip_src.s_addr == INADDR_ANY) {
+   struct in_addr *laddr;
+
+   error = in_pcbselsrc(, dst, inp);
+   if (error != 0)
+   return (error);
+
+   ip->ip_src = *laddr;
+   }
+
 #ifdef INET6
/*
 * A thought:  Even though raw IP shouldn't be able to set IPv6