Re: CARP as a module; followup thoughts

2009-04-22 Thread Bruce M. Simpson

Hi,

Will Andrews wrote:

Hello,

I've written a patch (against 8.0-CURRENT as of r191369) which makes
it possible to build, load, run, and unload CARP as a module, using the
GENERIC kernel.  It can be obtained from:

http://firepipe.net/patches/carp-as-module-20090421.diff
  


There's no need to implement the in*_proto_register() stuff in that 
patch, you should just be able to re-use the encap_attach_func() 
functions. Look at how PIM is implemented in ip_mroute.c for an example.


Other than that it looks like a good start... but I would hold off on
committing it as-is. The more general case of registering a MAC address
on an interface should be considered.


cheers,
BMS
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org


Re: kern/132722: [ath] Wifi ath0 associates fine with AP, but DHCP or IP does not work

2009-03-23 Thread Bruce M Simpson

Matthias Apitz wrote:

I went today evening with my EeePC and CURRENT on USB key
to that Greek restaurant; DHCP does not get IP in CURRENT either;
this is somehow good news, isn't it :-)
  


This may be orthogonal, but a lab colleague and I have been seeing a
sporadic problem where ath0 exhibits the symptoms of being
disassociated from its AP. We have been running RELENG_7 on the EeePC
701 since the open source HAL merge.

We don't see any problem with the initial dhclient run; ath0 just
seems to get disassociated within 5-10 minutes of associating.


If we leave 'ping ap-ip-address' running in the background, we don't 
see this problem.


We have yet to produce a tcpdump to catch it 'in the act' and observe
the DLT_IEEE80211 traffic when it actually happens; I have only seen
the symptoms. At this point the AP no longer shows the EeePC units as
being associated, but ath0 still shows 'status: associated'. The AP
involved is a Netgear WG602 V2 running the vendor's firmware.


I'll try to get set up with 'tcpdump -y ieee802_11' from initial boot 
(including dhcp and anything we bump into).


cheers
BMS


Re: kern/132722: [ath] Wifi ath0 associates fine with AP, but DHCP or IP does not work

2009-03-23 Thread Bruce M Simpson
The following reply was made to PR kern/132722; it has been noted by GNATS.

From: Bruce M Simpson b...@incunabulum.net
To: Matthias Apitz g...@unixarea.de
Cc: bug-follo...@freebsd.org, Sam Leffler s...@freebsd.org, 
 freebsd-net@freebsd.org, Sean C. Farley s...@freebsd.org
Subject: Re: kern/132722: [ath] Wifi ath0 associates fine with AP, but DHCP
 or IP does not work
Date: Mon, 23 Mar 2009 18:44:42 +



ath0 apparent silent disassociation

2009-03-23 Thread Bruce M Simpson

[Repost without attachment]

OK. We've managed to reproduce this set of symptoms now in our work area.

[If anyone needs to see a pcap, please Cc: me offlist.]

Timebase: beginning of the pcap is in sync with a bringup from
single-user mode; the tcpdump runs in the background from init whilst
the system is brought up.

OK, so I timed the apparent loss of connectivity as 6m 30s from the
point I hit the stopwatch to when I hit it again, when the AP's Web GUI
no longer showed the affected STA as being associated.
Obviously such a timing is subject to human/visual jitter, and how
often Netgear's firmware pulls the STA association list from the AP into
the web GUI.

What stands out in the pcap is that 302.291s in (almost 5m exactly),
the STA (ath0) sends an IEEE 802.11 NULL frame to the AP with the PWR
MGT bit set ("I'm going to sleep!"). This more or less coincides with a
normal beacon from the Netgear AP. The AP does not advertise Automatic
Power Save Delivery (APSD); that bit is 0.
This is puzzling as we don't enable power management by default. As
I understand it, this may be an AP feature in some environments... I can
try reproducing this with an explicit 'ifconfig ath0 -powersave' and see
if it reoccurs.

You'll see that after this NULL frame is sent, there is another
Probe Request, and the Netgear AP does Probe Respond, but this makes no
difference (I ended the capture around 150s after the NULL frame was sent).

At this point we can't send traffic from the ath0, or rather, the AP
is acting as though it never even heard the STA. The STA learns the AP's
IP address/MAC mapping through passive ARP -- we still see broadcasts on
the SSID -- but the AP has started to totally ignore the STA, and seemed
to have ignored its ARP requests also.
We are using MAC address ACL control with this AP, and the ath0
affected is definitely listed in its ACL table, configured up, rebooted etc.

It is as though the STA is entering power saving mode when not
explicitly told to, and the AP is not waking up the STA as it should.

If any more information is needed, or if anyone can suggest where to
look, please let me know what's involved (I MFCed the change after all,
so I'll help where I can until I'm on holiday this week...)

My lab colleague is just working around this with 'ping ap-ip' for
now, that keeps things up, as does OpenVPN...

cheers
BMS





Re: kern/124282: [libc] socket(2): INP_PORTHIGH and INP_ONESBCAST share same value

2009-03-23 Thread Bruce M. Simpson

bru...@freebsd.org wrote:

Synopsis: [libc] socket(2): INP_PORTHIGH and INP_ONESBCAST share same value

Responsible-Changed-From-To: freebsd-bugs-freebsd-net
Responsible-Changed-By: brucec
Responsible-Changed-When: Mon Mar 23 21:45:54 UTC 2009
Responsible-Changed-Why: 
Over to maintainer(s).
  


rwatson@ saw this crop up in -CURRENT and I believe he has a fix. Not 
sure about MFC but it clearly needs to get fixed...


cheers,
BMS


Re: kern/132722: [ath] Wifi ath0 associates fine with AP, but DHCP or IP does not work

2009-03-23 Thread Bruce M Simpson

John Hay wrote:

I found doing a -bgscan before it happens, make it not happen. I now
have -bgscan in my rc.conf.
  


That's exactly the workaround I needed. Thanks John.

As Sam points out, the root fix is probably already in HEAD; it would be 
nice to find time to backport, but this works for us for now as a 
workaround (we are just using ath0 as a STA for testing in the lab at 
the moment, it is likely we will use hostap later).


cheers,
BMS


Re: kern/132722: [ath] Wifi ath0 associates fine with AP, but DHCP or IP does not work

2009-03-23 Thread Bruce M Simpson
The following reply was made to PR kern/132722; it has been noted by GNATS.

From: Bruce M Simpson b...@incunabulum.net
To: John Hay j...@meraka.org.za
Cc: Matthias Apitz g...@unixarea.de, freebsd-net@freebsd.org, 
 Sam Leffler s...@freebsd.org,
 Sean C. Farley s...@freebsd.org, bug-follo...@freebsd.org
Subject: Re: kern/132722: [ath] Wifi ath0 associates fine with AP, but DHCP
 or IP does not work
Date: Tue, 24 Mar 2009 01:08:33 +



Re: IGMP+WiFi panic on recent kernel - in igmp_fasttimo()

2009-03-14 Thread Bruce M Simpson

Sam,

Sam Leffler wrote:
This patch avoids the crash.  Not sure how ifma_protospec is
supposed to be handled so I'm not committing it.


Thanks for this.

I have a test machine ready to be prepped, but it's missing a CF card (I
have none) so I need to pick one up from a friend. I have a PCI-CardBus
adapter + a ral(4) CardBus card, but no CardBus ath(4) -- I imagine this
ain't specific to ath(4) so that should be fine.


I'll try to look at this Sun/Mon, I have a -CURRENT image built for the 
1U box now that just needs bootstrapping (it has a CF slot).


thanks,
BMS


Re: howto determine network device unit number? device.hints?

2009-01-15 Thread Bruce M. Simpson

Yony Yossef wrote:

Thanks for the explanation.

So there's no way to determine this in advance..
I must build a script that contains my own mapping between MAC addresses and
the wanted interface names and run it after each driver load, rename the
interfaces if necessary.
It seems quite wrong, don't you agree?

And how come the unit number is given an arbitrary value? Is there a good
reason for that?
  


Normally the PCI probe runs in the opposite direction from that of 
Linux. It's largely to do with how the NEWBUS code walks the PCI bus. 
From a systems management point of view, yeah, it's irritating, however 
it would probably take more effort (i.e. kernel code) to try to patch it 
to work differently, and not everyone has free time to sit down and 
patch the kernel.


That and (unlike Solaris) there is no *direct* mapping between the 
card's driver number on the bus and its network driver number.


In your case I'm not sure why your two cards would flip order. Could it 
be how your BIOS and hardware set up the PCI IDSEL lines at boot?





Re: howto determine network device unit number? device.hints?

2009-01-15 Thread Bruce M. Simpson

Yony,

Bruce M. Simpson wrote:


And how come the unit number is given an arbitrary value? Is there a good
reason for that?
  

...

In your case I'm not sure why your two cards would flip order. Could 
it be how your BIOS and hardware set up the PCI IDSEL lines at boot?


If this is the case on your system, then you really need to provide more 
data about your hardware, i.e. motherboard, BIOS, vendor information 
etc. as others point out.


Based on the data you've provided about the issue to date, my best guess 
is that something in the above is different on your system (which is why 
I mentioned IDSEL lines -- the mechanism PCI uses to actually assign bus 
numbers electrically).


Normally the behaviour of FreeBSD's bus probes is well known -- nexus is 
walked for child buses, then these buses are plumbed into NEWBUS, e.g. 
cpu0...cpuN on nexus itself, PCI buses, and PCI subordinate buses in 
that order.


* You mention you don't encounter the issue with Linux, but you may 
already be aware that udev can tie driver instance number(s) to specific 
MAC addresses, although this process isn't fully automatic and any given 
distro may or may not create the persistent udev rules on a first run -- 
so this is comparing apples with oranges.


* [PCI-Express is a special case though, and I've had to sit down and do 
some work with commercial clients to make sure their appliance was able 
to detect devices being in particular slot numbers. Again, though, it's 
just as subject to the PCI enumeration order further up on the bus 
hierarchy as non-PCI-Express drivers.]
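
The MAC-to-name mapping discussed in this thread can be sketched in a
few lines of Python. This is only an illustration, not an existing
FreeBSD tool: the function name, the MAC addresses and the target names
are made up, and it assumes the chosen names never collide with a
driver-assigned name such as mtnic0 (otherwise a rename could fail).

```python
def rename_commands(current, wanted):
    """Return ifconfig argv lists that rename interfaces by MAC address.

    current: {driver-assigned ifname: MAC}, e.g. parsed from ifconfig output.
    wanted:  {MAC: stable name chosen by the administrator}.
    """
    wanted_lc = {mac.lower(): name for mac, name in wanted.items()}
    cmds = []
    for ifname, mac in sorted(current.items()):
        target = wanted_lc.get(mac.lower())
        if target is not None and target != ifname:
            # FreeBSD syntax: ifconfig <interface> name <new-name>
            cmds.append(["ifconfig", ifname, "name", target])
    return cmds


# Hypothetical example data: two ports whose unit numbers flipped.
current = {"mtnic0": "00:02:C9:00:00:02", "mtnic1": "00:02:c9:00:00:01"}
wanted = {"00:02:c9:00:00:01": "uplink0", "00:02:c9:00:00:02": "storage0"}
for argv in rename_commands(current, wanted):
    print(" ".join(argv))
```

Run after each driver load, the generated commands give every port the
same name regardless of which unit number it was assigned.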


So your issue may not be a simple matter of "this seems wrong, this
doesn't work", though I am sorry to hear it isn't working for you right now.


There are a lot of dynamic factors in the overall picture of the system,
and what works as expected for many users may not be working for you.
When folk see things like this happening, we really need basic hardware
information before any volunteer(s) out there can get a true picture of
what's actually going on in your specific case, let alone come up with
the right solution.


thanks
BMS


Re: howto determine network device unit number? device.hints?

2009-01-15 Thread Bruce M. Simpson

Eygene Ryabinkin wrote:

...
I wanted to stress only one point: simple 'kldunload driver' and
'kldload driver' makes devices to flip for Yony's case.  This means
that unless some PCI hotplug stuff is here (which I don't believe to be
present, because no physical cards are touched and there is actually a
small amount of PCI hotplug support in FreeBSD), no physical PCI devices
get added or removed from the PCI child tree.  It looks like that
something goes wrong during the PCI tree reprobe on the driver module
loading.
  


BTW: Thanks for looking further at the software layer first.

VIM is a wee bit easier to use than a bus analyzer.

Most motherboards don't support PCI geographical addressing, so... I 
wager it's the network driver code which may be the source of the 
problem, based on your analysis!


If this code is just doing a blind bump of an instance count and using
that as a unit number... well, that's OK and expected for software
virtual devices, but is counter-intuitive for something like hardware.


But I don't have any mtnic source, so this is pure speculation on my part.


Correct me if I am wrong, but pci_driver_added from /sys/dev/pci/pci.c will
invoke device_get_children() to get the list of the attached devices,
and for PCI case the list should be static.
  


Yup, that's right.


I guess that when Yony will enable verbose boot and will show us kernel
messages from two successive kldunload/kldload sequences, we will get
some additional information about what's going on.
  


Hopefully he will chime in...

[bms does some google searching *before* he thinks about throwing his
toys out of the pram at the Original.Poster.]


ding :-) [a light bulb above bms' head]

So... Yony, you're writing a driver.
Maybe there's a bug in it?
That's cool, dude.
Hope it's a nice card and you plan on sharing the sweets with the rest 
of the class. ;-)


But seriously, please mention that you are writing a driver in general
questions you might ask about the whole system, otherwise FreeBSD
volunteers will run around going "Is core code broken?" and that's not
so good for community stress levels as a whole.


with lemonade,
BMS


Re: Having problems with limited broadcast

2009-01-08 Thread Bruce M. Simpson

Peter Steele wrote:

...
It's really a matter of time. We didn't anticipate limited broadcast
being broken in FreeBSD and we're scrambling to come up with a solution.
To be quite frank I haven't done anything with IPv6 before so it would
be more research to get up to speed on this option. It seems our best
option is scapy, which unfortunately I also haven't used before...
  


It's not broken -- it has always been this way in all BSD derived 
networking stacks.


Limited broadcast addresses just don't contain any information about 
where the datagram should go, and this is the case in all other 
implementations. They are similar to multicast addresses in that regard.


Linux has a knob, SO_BINDTODEVICE, which is partly there to work around
this problem; however, it isn't the ideal semantic fit.


The folk who point out that link-local addresses could be used, have an 
interesting suggestion which might work for you.


thanks
BMS


Re: Having problems with limited broadcast

2009-01-08 Thread Bruce M. Simpson

Peter Steele wrote:

The folk who point out that link-local addresses could be used, have an
interesting suggestion which might work for you.



It's definitely interesting, but it is very likely that some of our
customers will want to be able to set their own IP ranges and not be
limited to 169.254/16. So we need a more generic solution.


Sounds like it's bpf/pcap city for you guys.

A similar bump-in-the-stack to SO_BINDTODEVICE (let's call it
IP_SENDIF) has been on the drawing board, but it needs appropriate
security screening -- the ability to bypass the forwarding tables
whilst specifying an interface, e.g. by index or name, would be
desirable only for certain privileged processes.


BTW: If you guys are already looking at scapy, you may also wish to give 
pcs.sourceforge.net a look as an alternative.


It is a Python project which I did some hacking on with George
Neville-Neil, who started it. It has BPF/PCAP support out of the box and
has a number of powerful features, including a packet-level expect() 
facility, which works in a very similar manner to pexpect (Python expect 
for text streams).


I added a scapy-like concatenation syntax ('/' operator) to it as that 
makes plugging packet chains together that much easier.
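
For readers who have not seen the '/' concatenation idiom, here is a
minimal Python sketch of how such an operator can be built. It is not
the PCS or scapy implementation; the Layer and Chain classes are
hypothetical stand-ins that only show the operator overloading.

```python
class Layer:
    """One protocol layer: a name plus its already-encoded bytes."""

    def __init__(self, name, data=b""):
        self.name = name
        self.data = data

    def __truediv__(self, other):
        # layer / layer (or layer / chain) starts a new chain
        return Chain([self]) / other

    def encode(self):
        return self.data


class Chain:
    """An ordered list of layers, outermost first."""

    def __init__(self, layers):
        self.layers = list(layers)

    def __truediv__(self, other):
        extra = other.layers if isinstance(other, Chain) else [other]
        return Chain(self.layers + list(extra))

    def encode(self):
        # The wire image is the layers' bytes laid end to end.
        return b"".join(layer.encode() for layer in self.layers)


packet = Layer("ether", b"\x01") / Layer("ip", b"\x02") / Layer("udp", b"\x03")
print([layer.name for layer in packet.layers], packet.encode())
```

Because '/' is left-associative, a / b / c builds the chain outermost
header first, which matches how the packet is laid out on the wire.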


I have the beginnings of an IGMPv3 test suite, written using PCS, in my
home repo; it uses pcap capture. I imagine a DHCP-like protocol could
easily be implemented using PCS too.


cheers
BMS



Re: Having problems with limited broadcast

2009-01-08 Thread Bruce M. Simpson

Peter Steele wrote:

...
I personally like this idea, but I'm not sure I can sell it to the
others. Are there any restrictions to these 169.254.x.y addresses?
  


169.254.0.0/16 must never appear outside a link -- it is strictly scoped 
to that link.


Currently the IPv4 BSD stack has no concept of link-scoped addresses, 
but IPv6 does. Link is a realized concept there because of KAME's 
support for the %ifname syntax. Internally, interface indexes get used.


In practice this shouldn't be an issue as long as you can guarantee 
different addresses are used for the 169.254.0.0/16 block on each 
interface, however, it would mean any app using sockets would need to 
explicitly bind to the local address to ensure the correct interface is 
used. Furthermore, we effectively need to be able to support multiple 
next-hops for the 169.254.0.0/16 prefix, otherwise we can support only 
one such interface w/o significant kernel code rewrites.
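
The explicit bind mentioned above looks roughly like this in Python. It
is a sketch only: the helper name is made up, and the demonstration uses
the loopback address so that it runs anywhere, where a real application
would pass the interface's own 169.254.x.y address. Binding fixes the
source address; as the paragraph above notes, it does not by itself give
the stack per-interface next-hops for the prefix.

```python
import socket


def udp_socket_bound_to(local_addr, port=0):
    """Create a UDP socket whose source address is pinned to local_addr."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Port 0 lets the kernel pick an ephemeral port.
    sock.bind((local_addr, port))
    return sock


sock = udp_socket_bound_to("127.0.0.1")  # stand-in for a 169.254.x.y address
print(sock.getsockname())
sock.close()
```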


So, really, LL may not buy you anything at all, and it's likely you need 
to go straight to pcap for your app. These restrictions have existed for 
years, and the fact that they haven't been addressed has largely been 
because there has been no community strategy to deal with it. I 
speculate some BSD-using organisations might have already solved these 
problems, however, without evidence (and code sharing), that's pure 
speculation.


cheers
BMS


Re: Having problems with limited broadcast

2009-01-08 Thread Bruce M. Simpson

Bruce M. Simpson wrote:

Peter Steele wrote:

...
I personally like this idea, but I'm not sure I can sell it to the
others. Are there any restrictions to these 169.254.x.y addresses?
  


169.254.0.0/16 must never appear outside a link -- it is strictly 
scoped to that link.


P.S. I checked in a change to ip_forward() a while back which enforces 
this, as forwarding such traffic between interfaces without NATting it 
or otherwise proxying it is a really bad idea (and also breaks the IPv4 
LL RFC).
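
The rule itself is simple to state in code. The Python sketch below is a
model of the forwarding decision, not the committed ip_forward() change:
it assumes the RFC 3927 reading that a packet with a link-local source
or destination must not be forwarded, and the function name is
illustrative.

```python
import ipaddress

# IPv4 link-local prefix (RFC 3927).
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def may_forward(src, dst):
    """Return False if either address is IPv4 link-local."""
    src_ll = ipaddress.ip_address(src) in LINK_LOCAL
    dst_ll = ipaddress.ip_address(dst) in LINK_LOCAL
    return not (src_ll or dst_ll)


print(may_forward("169.254.1.5", "10.0.0.1"))
print(may_forward("10.0.0.1", "192.0.2.7"))
```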



Re: Having problems with limited broadcast

2009-01-07 Thread Bruce M. Simpson

Peter Steele wrote:

..

Based on the discussion in the link above, it doesn't seem like the
problem was entirely resolved by the patches mentioned in this thread.
Has anything been done since this discussion took place? Surely there
must be a way to get limited broadcast to work under FreeBSD.
  


You will need to go to the pcap layer to send limited broadcasts w/o any 
IPv4 addresses configured in a BSD stack for now. If you have an IP on 
the interface, you can just use IP_ONESBCAST.
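
Going to the pcap layer means building the whole frame yourself. The
Python sketch below shows what that involves for a UDP datagram sent to
255.255.255.255 from an unconfigured interface (source 0.0.0.0). It only
constructs the bytes; handing them to BPF or libpcap is not shown, and
the function names, MAC address and ports are illustrative assumptions.

```python
import socket
import struct


def inet_checksum(data):
    """16-bit one's-complement Internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def limited_broadcast_frame(src_mac, sport, dport, payload):
    """Ethernet + IPv4 + UDP frame addressed to the limited broadcast."""
    # Ethernet: broadcast destination, our MAC, EtherType IPv4.
    eth = b"\xff" * 6 + src_mac + struct.pack("!H", 0x0800)
    udp_len = 8 + len(payload)
    ip_hdr = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + udp_len,       # version/IHL, TOS, total length
        0, 0,                        # identification, flags/fragment
        64, socket.IPPROTO_UDP, 0,   # TTL, protocol, checksum placeholder
        socket.inet_aton("0.0.0.0"),
        socket.inet_aton("255.255.255.255"),
    )
    ip_hdr = ip_hdr[:10] + struct.pack("!H", inet_checksum(ip_hdr)) + ip_hdr[12:]
    # A zero UDP checksum means "not computed", which IPv4 permits.
    udp_hdr = struct.pack("!HHHH", sport, dport, udp_len, 0)
    return eth + ip_hdr + udp_hdr + payload


frame = limited_broadcast_frame(b"\x02\x00\x00\x00\x00\x01", 68, 67, b"hello")
print(len(frame), frame[:14].hex())
```

Since the frame names no outgoing interface, the choice is made by
whichever BPF or pcap handle the bytes are written to, which is exactly
the information a 255.255.255.255 destination cannot carry on its own.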


thanks
BMS
 




Re: Heads up --- Thinking about UDP and tunneling

2008-12-11 Thread Bruce M. Simpson

Hi,

I am missing the context of what Max's suggestion was; do you have a
reference to an old email thread?


Style bugs:
* needs style(9) and whitespace cleanup.
* C typedefs should be suffixed with _t for consistency with other
kernel typedefs.
* Function typedefs are usually named like foo_func_t (see other subsystems).

Have you looked at m_apply()? It already exists for stuff like this
i.e. functions which act on an mbuf chain, although it doesn't 
necessarily expect chain heads.
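
As a rough userland model of what an m_apply()-style helper does, the
Python sketch below walks a chain of buffers and calls a function on
each contiguous piece of a byte range, stopping early if the function
returns non-zero. It is not kernel code: chain_apply is a made-up name,
a list of bytes objects stands in for the mbuf chain, and raising
ValueError for an out-of-range request is this sketch's own choice.

```python
def chain_apply(chain, off, length, func):
    """Call func(piece) for each contiguous piece of chain[off:off+length].

    Returns the first non-zero value func returns, or 0 if every piece
    was processed.
    """
    for buf in chain:
        if length == 0:
            break
        if off >= len(buf):
            # The range starts in a later buffer; skip this one.
            off -= len(buf)
            continue
        take = min(len(buf) - off, length)
        rval = func(bytes(buf[off:off + take]))
        if rval:
            return rval
        length -= take
        off = 0
    if length:
        raise ValueError("range extends past the end of the chain")
    return 0


pieces = []
chain_apply([b"abc", b"defg", b"hi"], 2, 5, lambda p: pieces.append(p) or 0)
print(pieces)
```

The callback never sees a piece that straddles two buffers, so it can
run a checksum or a copy over the range without flattening the chain
first.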


cheers
BMS



Re: last call for L2/L3 rewrite code review

2008-12-11 Thread Bruce M. Simpson

Hi,

Just skimming this I notice it uses the if_afdata[AF_INET] pointer 
purely for lltbl purposes; this clashes with the IGMPv3 code drop.


Please look in the bms_netdev branch, where I introduce a 'struct 
ip_ifinfo' to make more general use of that slot. IGMPv3 needs to store 
per-interface state for AF_INET, so this slot really needs to be shared 
with other AF_INET stuff.


Looks like it needs to be updated for VIMAGE also; hopefully others more
familiar with this can help -- I am too busy with non-programming
activity as it is to get up to speed on this, although I have at least
managed to print Julian's write-up...


Other than that, it looks like a much needed improvement and we are all
very grateful for your work on this.


thanks
BMS


Re: how to program a driver?

2008-12-09 Thread Bruce M. Simpson

Espartano wrote:

Actually i know how to program with C language in a basic level but i
don't know nothing about hardware or computer organization, what
topics i should study for gain knowledges about net-drivers ? or if
someone can recommend me books about this topic  i will be very
thankful.
  


Try "The Indispensable PC Hardware Book" by Hans-Peter Messmer for a
general overview of PC architecture.


cheers
BMS


Re: how to program a driver?

2008-12-09 Thread Bruce M. Simpson

[Resend to list for everyone]

Espartano wrote:

Actually i know how to program with C language in a basic level but i
don't know nothing about hardware or computer organization, what
topics i should study for gain knowledges about net-drivers ? or if
someone can recommend me books about this topic  i will be very
thankful.
  


   The seminal work is "TCP/IP Illustrated, Volume 2" (Gary Wright and W.
Richard Stevens, Addison-Wesley). Whilst dated, it will give you an
overview of how all the parts in the BSD networking stack fit together.
   It really needs to be updated; however, enough things are in flux
right now that summarising all the changes would be difficult until,
say, after the FreeBSD 8.0 dust has settled.


   For computer architecture, probably best to learn PC architecture 
these days -- x86 is here to stay, kids, and Netbooks are something of a 
reactionary response triggered by the One-Laptop-Per-Child (OLPC) 
project. In my day, I learned 68000 assembly and C on the Amiga.


   Hans-Peter Messmer's "The Indispensable PC Hardware Book" is a huge
book which cost me about 50 GBP new when I first bought it -- I was 
working in a reasonably well paid job at the time, but it can be found 
second hand no doubt around the world.
   Cover to cover it will tell you what you need to know about how the 
PC architecture fits together, but if you need more detail e.g. on stuff 
like FreeBSD network drivers, again, it's best to refer back to the 
source code itself.


Hope this helps.

cheers
BMS


Re: Vimage howto

2008-12-08 Thread Bruce M. Simpson

Julian,

Thank you (and Marko) very much for preparing this document.

The VIMAGE import has had me at something of an impasse re: the IGMPv3 
branch and clearly written documentation is a big help indeed.


Julian Elischer wrote:

Well not completely, but I've had a number of questions over the
last few months about what it is, so, as Marko and I have written
the following "how to virtualize your module" document, I've been
directing people to it. After another couple of questions I think
this could do with wider distribution..


Thank you also for providing it here on the list, as opposed to relying
on Perforce alone. Whilst I understand committers rate p4 for
experimental work, sadly it is simply not accessible to the
not-so-silent majority in the FreeBSD sphere who are not committers,
which makes its continued use questionable at best.


regards,
BMS


Re: How to support an Ethernet PHY without ID registers?

2008-10-11 Thread Bruce M. Simpson

Sepherosa Ziehau wrote:

Are you sure you could read from BMSR?  Return invalid value from BMSR
is the usual cause of miibus attaching/probing failure.  For ID1/ID2
reading, you could just fake some values in npe(4)'s miibus_readreg
implementation.
  


Thanks for the tip (from you and Pyun). I had to spoof the BMSR read to 
get npe(4) to attach just to begin with. For whatever reason the chip 
doesn't seem to respond on any of the PHY IDs which the Linux folk are 
using (5 and 4 for npe0 (-B) and npe1 (-C) respectively).


I noticed the uClinux folk needed a similar patch to force driver attach
under Linux w/the IXP:
http://mailman.uclinux.org/pipermail/uclinux-dev/2005-March/031419.html


The switch pretty much disappears after npe(4) attaches, I don't see any 
activity lights or link lights at that point. This seems to happen after 
any mii register access.


If I frob things to allow rlswitch to attach, by using hints and hacking 
if_npe.c, I can get dumps of the PHY register space, but it's all ones, 
suggesting that it failed at XScale register level -- that would suggest
the PHY IDs are *wrong*, or something else isn't right.
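
The two symptoms in this message, a bus that reads back all ones and a
BMSR read that has to be spoofed before anything attaches, can be
modelled in a few lines. This Python sketch is an illustration only and
not if_npe.c or miibus code: the helper names and the canned register
values are made up, and it assumes the usual probe heuristic that a
BMSR of 0x0000 or 0xFFFF means no PHY answered at that address.

```python
MII_BMSR = 0x01      # basic mode status register
MII_PHYIDR1 = 0x02   # PHY identifier, high word
MII_PHYIDR2 = 0x03   # PHY identifier, low word


def probe_phys(readreg, nphys=32):
    """Return the PHY addresses whose BMSR looks like a real device."""
    found = []
    for phy in range(nphys):
        bmsr = readreg(phy, MII_BMSR) & 0xFFFF
        # All ones: nothing drove the bus. All zeroes: nothing usable.
        if bmsr in (0x0000, 0xFFFF):
            continue
        found.append(phy)
    return found


def spoofed(readreg, phy, values):
    """Wrap readreg so chosen registers of one PHY return canned values."""
    def wrapped(p, reg):
        if p == phy and reg in values:
            return values[reg]
        return readreg(p, reg)
    return wrapped


def dead_bus(phy, reg):
    # Models the symptom above: every register read comes back all ones.
    return 0xFFFF


# Placeholder values; a real driver would pick ones its PHY driver matches.
fake = spoofed(dead_bus, 5, {MII_BMSR: 0x7809, MII_PHYIDR1: 0x1234,
                             MII_PHYIDR2: 0x5678})
print(probe_phys(dead_bus), probe_phys(fake))
```

Spoofing makes the probe succeed, but every register that is not canned
still reads 0xFFFF, so it cannot tell you whether the PHY address is
actually right.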


Pyun also suggested trying to manually take the PHYs out of power-down 
mode. I tried that with a code snippet I sent him, but still no dice. I 
can't even be sure that the PHYs are being addressed right.


At this point I kind of have to go, "whoah, wish I had a logic analyzer
and grabbers!" I believe the firmware configures the switch chip in a
certain VLAN configuration which isn't meant to be disrupted, although
Freecom's own SnapGear-based distro apparently does the right thing.


I've looked through all of their GPL materials and cannot find the 
driver for the switch.


I suppose one thing I could try is re-flashing the box with the official 
Freecom firmware, and using mii-diag to dump out what Linux thinks the 
registers are.


thanks
BMS


How to support an Ethernet PHY without ID registers?

2008-10-07 Thread Bruce M Simpson

Hi,

I have been trying to get FreeBSD onto the Freecom FSG3 Storage Gateway.
It is an XScale-based ARM system.

Whilst the npe(4) driver appears to attach, the PHY does not. It is a
Realtek RTL8305SB switch chip in dual miibus mode. Unfortunately the
RTL8305SB does not have ID registers. The RTL8305SC does, but it's a
totally different chip.


We do have a driver in the tree for the RTL8305SC, however these chips 
are different enough for this to cause problems.


Is there any way I could for example force ukphy(4) to attach?

Note: Because there are no ID registers, mii_phy_probe_gen() WILL NOT 
work. It looks like I'd have to override this by hacking if_npe.c 
itself. Can anyone clarify?


cheers
BMS


Re: Initialisation of a networking protocol

2008-09-29 Thread Bruce M. Simpson

Hi Ryan,

Did you initialize the .pr_init member of struct protosw for MPLS?

AFAIK, MPLS does not use an outer IP header, so adding a struct 
ipprotosw won't work; they are similar structs however.


cheers
BMS


Re: ACE on FreeBSD?

2008-09-24 Thread Bruce M. Simpson

Hi,

I looked at ACE years and years ago (~1997) when Doug Schmidt was first 
promoting the ideas behind it. The whole Reactor/Proactor split pretty 
much hangs on the event dispatch which your particular OS supports.


The key observation is whether your target OS implements events in an 
edge-triggered or level-triggered way; I am borrowing definitions from 
electronic engineering here.


You could do a straight port with Proactor, but performance will
probably suck, because FreeBSD (and Linux, I believe) needs to
emulate POSIX asynchronous I/O operations.


Reactor will generally fare better on UNIX derived systems such as 
FreeBSD and Linux, because its event handling primitives are geared 
towards the level-triggered facilities provided by select().
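
To make the level-triggered point concrete, here is a minimal sketch in
C. It is only an illustration, not ACE or XORP code; the names
readable_now() and level_triggered_demo() are invented for the example.
It uses poll(), which reports readiness the same way select() does: as
long as unread data sits in the descriptor, every call says "readable",
however many times you ask.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Return 1 if fd has data ready to read right now, 0 if not, -1 on error.
 * poll() samples the current state of the descriptor (level-triggered);
 * it does not report the transition that produced that state.
 */
int
readable_now(int fd)
{
	struct pollfd pfd;
	int n;

	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;
	n = poll(&pfd, 1, 0);		/* zero timeout: sample and return */
	if (n < 0)
		return (-1);
	return ((pfd.revents & POLLIN) != 0);
}

/*
 * Hypothetical demo: one write, two polls, then consume the byte.
 * Returns 1 if readiness persisted until the data was read, 0 if it
 * did not, -1 on a system call failure.
 */
int
level_triggered_demo(void)
{
	int p[2];
	int first, second, after;
	char c = 'x';

	if (pipe(p) == -1)
		return (-1);
	if (write(p[1], &c, 1) != 1) {
		close(p[0]);
		close(p[1]);
		return (-1);
	}
	first = readable_now(p[0]);	/* data is pending */
	second = readable_now(p[0]);	/* still pending: reported again */
	if (read(p[0], &c, 1) != 1) {
		close(p[0]);
		close(p[1]);
		return (-1);
	}
	after = readable_now(p[0]);	/* drained: no longer readable */
	close(p[0]);
	close(p[1]);
	return (first == 1 && second == 1 && after == 0);
}
```

A Reactor-style loop can lean on this: if a handler reads only part of
the pending data, the next pass through select() or poll() simply
reports the descriptor again.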


In Windows, Winsock events use asynchronous notifications which may be 
tied to Win32 EVENT objects, and the usual Kernel32.DLL thread 
primitives are used around this. This makes Proactor more appropriate in 
that environment.


XORP does some similar stuff to ACE under the hood to support the native 
socket facilities of both Windows and FreeBSD/Linux. It's hybridized but 
it behaves more like Reactor because we run in a single thread, and you 
have to force Winsock's helper thread to run, by preempting you, using 
some file handle and socket tricks.


I don't currently know about stability of ACE on FreeBSD.

cheers
BMS


Re: Proposed patch, convert IFQ_MAXLEN to kernel tunable...

2008-09-24 Thread Bruce M. Simpson

Hi,

I agree with the intent of the change that IPv4 and IPv6 input queues 
should have a tunable queue length. However, the change provided is 
going to make the definition of IFQ_MAXLEN global and dependent upon a 
variable.


[EMAIL PROTECTED] wrote:

Hi,

It turns out that the last time anyone looked at this constant was
before 1994 and it's very likely time to turn it into a kernel
tunable.  On hosts that have a high rate of packet transmission
packets can be dropped at the interface queue because this value is
too small.  Rather than make a sweeping code change I propose the
following change to the macro and updating a couple of places in the
IP and IPv6 stacks that were using this macro to set their own global
variables.
  


This isn't appropriate for many uses of ifq's which might be internal to 
a given driver or subsystem, and which may use IFQ_MAXLEN for 
convenience, as Ruslan has pointed out. I have code elsewhere which does 
this.


Can you please do this on a per-protocol stack basis? i.e. give IPv4 and 
IPv6 their own TUNABLE queue length.
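
A rough userland sketch of the shape being asked for, under stated
assumptions: the struct, the function names and the two variables
ip4_inq_maxlen and ip6_inq_maxlen are all hypothetical, and the value
given to IFQ_MAXLEN is just a placeholder for the sketch. The point is
only that the macro stays a plain constant for drivers and subsystems
that use it internally, while each protocol input queue takes its limit
from its own variable.

```c
/* Stays a compile-time constant for code that uses it internally. */
#define IFQ_MAXLEN 50	/* placeholder value for this sketch */

/* Hypothetical, simplified stand-in for a packet queue. */
struct pktq {
	int len;	/* packets currently queued */
	int maxlen;	/* limit for this particular queue */
	int drops;	/* packets refused because the queue was full */
};

/*
 * Hypothetical per-protocol limits. In a kernel these would be the
 * variables that separate IPv4 and IPv6 tunables write into.
 */
int ip4_inq_maxlen = IFQ_MAXLEN;
int ip6_inq_maxlen = IFQ_MAXLEN;

void
pktq_init(struct pktq *q, int maxlen)
{
	q->len = 0;
	q->maxlen = maxlen;
	q->drops = 0;
}

/* Returns 0 if the packet was queued, -1 if it was dropped. */
int
pktq_enqueue(struct pktq *q)
{
	if (q->len >= q->maxlen) {
		q->drops++;
		return (-1);
	}
	q->len++;
	return (0);
}
```

With this split, raising ip4_inq_maxlen changes only the IPv4 input
queue; a driver-internal queue initialised with IFQ_MAXLEN keeps the
size its author expected.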


thanks
BMS


Re: Proposed patch, convert IFQ_MAXLEN to kernel tunable...

2008-09-24 Thread Bruce M. Simpson

[EMAIL PROTECTED] wrote:

...
I found no occurrences of the above in our code base.  I used cscope
to search all of src/sys.  Are you aware of any occurrences of this?
  


I have been using IFQ_MAXLEN to size buffer queues internal to some 
IGMPv3 stuff.


I don't feel comfortable with a change which sizes the queues for both 
IPv4 and IPv6 stacks, from a variable which is obscured by a macro.


thanks
BMS


Re: lost routes

2008-09-24 Thread Bruce M. Simpson

Giulio Ferro wrote:
 
There are no messages in the logs, and no interface has been
touched. Anyway, since there are a lot of routes and only one
gets deleted I don't think it depends on interface changing
(it would delete them all, wouldn't it?)


Normally static routes only get touched if the state of the underlying 
ifp/ifa changes. There are paths in netinet which will cause routes to 
be deleted in this situation.


Occasionally the idea of a floating static re-surfaces... look in the PR 
database with this term for possibly related reports.


cheers
BMS



Re: kern/127528: [icmp]: icmp socket receives icmp replies not owned by the process.

2008-09-22 Thread Bruce M. Simpson

Chris Buechler wrote:


This PR is bogus because:
ICMP has no concept of datagrams being owned by a process. There is 
no field in the ICMP protocol which differentiates ICMP sessions on 
a per-process basis, and this is because ICMP has no concept of 
sessions -- ICMP messages are directed at IP endpoints.


ICMP echo and echo replies do have sessions of sorts, at least 
unique identifying fields - identifier and sequence number.


These fields do exist in ICMP, and as you point out, they are sometimes 
used to implement session-like behaviour.  Many NAT implementations use 
them in this way.


However there is no way of specifying them in a bind() call -- ICMP can 
only be received on a raw socket, and raw sockets will not filter these 
things on behalf of a user process, nor have they ever done to the best 
of my knowledge. They are not part of the address structures for a raw 
socket (SOCK_RAW, PF_INET, * or IPPROTO_ICMP).
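
Since the raw socket hands over every ICMP message for the protocol, any
matching on identifier has to happen in the application. A minimal
sketch of such a check follows. It assumes the buffer is an IPv4
datagram as read from a raw ICMP socket, with the IP header still in
front; the function name echo_reply_is_ours() is invented for the
example and is not part of any existing API.

```c
#include <stddef.h>
#include <stdint.h>

#define ICMP_TYPE_ECHO_REPLY	0	/* ICMP type for an echo reply */

/*
 * Return 1 if buf holds an IPv4 datagram carrying an ICMP echo reply
 * whose identifier equals want_id, otherwise 0.
 * Hypothetical helper: the kernel does none of this for a raw socket.
 */
int
echo_reply_is_ours(const uint8_t *buf, size_t len, uint16_t want_id)
{
	const uint8_t *icmp;
	size_t hlen;
	uint16_t id;

	if (len < 20 || (buf[0] >> 4) != 4)
		return (0);			/* not a usable IPv4 header */
	hlen = (size_t)(buf[0] & 0x0f) * 4;	/* IP header length in bytes */
	if (hlen < 20 || len < hlen + 8)
		return (0);			/* no room for an echo header */
	icmp = buf + hlen;
	if (icmp[0] != ICMP_TYPE_ECHO_REPLY)
		return (0);
	/* The identifier is bytes 4-5 of the ICMP header, network order. */
	id = (uint16_t)((icmp[4] << 8) | icmp[5]);
	return (id == want_id);
}
```

A process would typically pick an identifier (ping uses its pid, for
instance), send requests carrying it, and discard every received
message for which this check fails.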




This was opened by a pfSense maintainer because it's a change in 
behavior from 6.x releases where this was never an issue, and is 
something we feel is a regression.


Robert has replied outlining a few situations where the behaviour might 
have changed.


Raw sockets do support binding laddr/faddr, there is the possibility 
this could have changed, however there is no notion of processes 
owning streams of ICMP messages, this has never been part of the ICMP 
protocol and to think in these terms is misleading.


It sounds to me as though the application is relying on a form of 
filtering which isn't happening, and the way to track this down is to 
carefully note what, if anything, changed in the expected behaviour 
between releases.


For example, does the application bind() to any given host addresses? 
This is the only form of filtering, apart from multicast SSM, that raw 
sockets would support, and SSM ain't in the tree [yet].


thanks
BMS


Re: kern/127528: [icmp]: icmp socket receives icmp replies not owned by the process.

2008-09-21 Thread Bruce M. Simpson

[EMAIL PROTECTED] wrote:

Old Synopsis: icmp socket receives icmp replies not owned by the process.
New Synopsis: [icmp]: icmp socket receives icmp replies not owned by the 
process.
  


This PR is bogus because:
ICMP has no concept of datagrams being owned by a process. There is no 
field in the ICMP protocol which differentiates ICMP sessions on a 
per-process basis, and this is because ICMP has no concept of sessions 
-- ICMP messages are directed at IP endpoints.


The networking stack will only selectively dispatch ICMP traffic based 
on two conditions:

1. ip_proto number (raw sockets may selectively bind to a protocol) and
2. multicast group membership (not applicable in this instance).

 It also shows that both echo requests have different identifiers in 
the id field which should keep the icmp streams separated.


There is absolutely no requirement for the kernel code to look at the ID 
field, beyond reporting it to consumers of the SOCK_RAW interface.


This PR can be closed, the submitter should consult the pfSense maintainers.

thanks
BMS







Re: kern/127528: [icmp]: icmp socket receives icmp replies not owned by the process.

2008-09-21 Thread Bruce M. Simpson
The following reply was made to PR kern/127528; it has been noted by GNATS.

From: Bruce M. Simpson [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: freebsd-net@FreeBSD.org, [EMAIL PROTECTED]
Subject: Re: kern/127528: [icmp]: icmp socket receives icmp replies not owned
 by the process.
Date: Sun, 21 Sep 2008 23:12:30 +0100

 [EMAIL PROTECTED] wrote:
  Old Synopsis: icmp socket receives icmp replies not owned by the process.
  New Synopsis: [icmp]: icmp socket receives icmp replies not owned by the 
  process.

 
 This PR is bogus because:
 ICMP has no concept of datagrams being owned by a process. There is no 
 field in the ICMP protocol which differentiates ICMP sessions on a 
 per-process basis, and this is because ICMP has no concept of sessions 
 -- ICMP messages are directed at IP endpoints.
 
 The networking stack will only selectively dispatch ICMP traffic based 
 on two conditions:
  1. ip_proto number (raw sockets may selectively bind to a protocol) and
  2. multicast group membership (not applicable in this instance).
 
   It also shows that both echo requests have different identifiers in 
 the id field which should keep the icmp streams separated.
 
 There is absolutely no requirement for the kernel code to look at the ID 
 field, beyond reporting it to consumers of the SOCK_RAW interface.
 
 This PR can be closed, the submitter should consult the pfSense maintainers.
 
 thanks
 BMS
 
 
 
 
 


Re: reading routing table

2008-09-18 Thread Bruce M. Simpson

Debarshi Ray wrote:

...
By the way, would you want someone to implement 'show' support for
FreeBSD's route implementation? I can give it a go now. :-)
  


For sure, we'd be very happy to see a patch like that.

Many thanks
BMS


Problem with IFDATA_DRIVERNAME sysctl

2008-09-09 Thread Bruce M Simpson

Whenever I call this sysctl, I get an errno of EPROGUNAVAIL from sysctl():

        name[0] = CTL_NET;
        name[1] = PF_LINK;
        name[2] = NETLINK_GENERIC;
        name[3] = IFMIB_IFDATA;
        name[4] = ifindex;
        name[5] = IFDATA_DRIVERNAME;

        len = IFNAMSIZ;
        if (sysctl(name, 6, dname, &len, NULL, 0) == -1) {
                warnc(EX_OSERR, "cannot obtain driver name for ifname %s",
                    ifname);
                return (-1);
        }

The ifindex is valid. dname is a pointer to an IFNAMSIZ sized buffer. 
This problem is happening on a 7.0-RELEASE system.


It looks like the switch..case in that path could be fubar'd by the 
compiler as there are no break statements for each distinct case label; 
could this be due to gcc friendly fire?


cheers
BMS




Re: Problem with IFDATA_DRIVERNAME sysctl

2008-09-09 Thread Bruce M. Simpson

Bruce M Simpson wrote:


It looks like the switch..case in that path could be fubar'd by the 
compiler as there are no break statements for each distinct case 
label; could this be due to gcc friendly fire?


Possibly false alarm or PEBKAC, I wasn't checking return values right in 
some of my code, although we should probably have break there anyway.


Patch against RELENG_7_0.
--- if_mib.c.orig       2008-09-10 00:31:25.0 +0100
+++ if_mib.c    2008-09-10 00:32:15.0 +0100
@@ -90,6 +90,7 @@
        switch(name[1]) {
        default:
                return ENOENT;
+               break;
 
        case IFDATA_GENERAL:
                bzero(&ifmd, sizeof(ifmd));
@@ -136,6 +137,7 @@
                error = SYSCTL_IN(req, ifp->if_linkmib, ifp->if_linkmiblen);
                if (error)
                        return error;
+               break;
 
        case IFDATA_DRIVERNAME:
                /* 20 is enough for 64bit ints */
@@ -152,6 +154,7 @@
                        error = EPERM;
                free(dbuf, M_TEMP);
                return (error);
+               break;
        }
        return 0;
 }
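
Going only by the hunks shown, a break placed directly after an
unconditional return can never execute, so the one that changes control
flow is the break after "if (error) return error;": without it a
successful SYSCTL_IN would carry on into the IFDATA_DRIVERNAME case. The
stand-alone sketch below shows that effect in isolation. The enum and
function names are invented for the illustration and are not taken from
if_mib.c.

```c
/* Hypothetical request codes, standing in for the IFDATA_* cases. */
enum ifdata_req { REQ_GENERAL, REQ_LINKSPECIFIC, REQ_DRIVERNAME };

/* Count how many case bodies run for one request when a break is missing. */
int
bodies_run_without_break(enum ifdata_req r)
{
	int n = 0;

	switch (r) {
	case REQ_GENERAL:
		n++;
		break;
	case REQ_LINKSPECIFIC:
		n++;
		/* No break: control runs on into the next case body. */
		/* FALLTHROUGH */
	case REQ_DRIVERNAME:
		n++;
		break;
	}
	return (n);
}

/* The same switch with the break in place. */
int
bodies_run_with_break(enum ifdata_req r)
{
	int n = 0;

	switch (r) {
	case REQ_GENERAL:
		n++;
		break;
	case REQ_LINKSPECIFIC:
		n++;
		break;
	case REQ_DRIVERNAME:
		n++;
		break;
	}
	return (n);
}
```

This is ordinary C fallthrough rather than anything the compiler gets
wrong, which fits the "PEBKAC" verdict above.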

Re: how to read dynamic data structures from the kernel (was Re: reading routing table)

2008-09-02 Thread Bruce M. Simpson

Luigi Rizzo wrote:

do you know if any of the *BSD kernels implements some good mechanism
to access a dynamic kernel data structure (e.g. the routing tree/trie,
or even a list or hash table) without the flaws of the two approaches
i indicate above ?
  


Hahaha. I ran into an isomorphic problem with Net-SNMP at work last week.

   There's a need to export the BGP routing table via SNMP. Of course 
doing this in our framework at work requires some IPC calls which always 
require a select() (or WaitForMultipleObjects()) based continuation.
   Net-SNMP doesn't support continuations at the table iterator level, 
so somehow, we need to implement an iterator which can accommodate our 
blocking IPC mechanism.


  [No, we don't use threads, and that would actually create more 
problems than it solves -- running single-threaded with continuations 
lets us run lock free, and we rely on the OS's IPC primitives to 
serialize our code. Works just fine for us so far...]


   So we would end up caching the whole primary key range in the SNMP 
sub-agent on a table OID access, a technique which would allow us to 
defer the IPC calls providing we walk the entire range of the iterator 
and cache the keys -- but even THAT is far too much data for the BGP 
table, which is a trie with ~250,000 entries. I hate SNMP GETNEXT.


   Back to the FreeBSD kernel, though.

   If you look at in_mcast.c, particularly in p4 bms_netdev, this is 
what happens for the per-socket multicast source filters -- there is the 
linearization of an RB-tree for setsourcefilter().
   This is fine for something with a limit of ~256 entries per socket 
(why RB for something so small? this is for space vs time -- and also it 
has to merge into a larger filter list in the IGMPv3 paths.)
   And the lock granularity is per-socket. However it doesn't do for 
something as big as a BGP routing table.


   C++ lends itself well to expressing these kinds of smart-pointer 
idioms, though.
   I'm thinking perhaps we need the notion of a sysctl iterator, which 
allocates a token for walking a shared data structure, and is able to 
guarantee that the token maps to a valid pointer for the same entry, 
until its 'advance pointer' operation is called.
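
One way to picture such an iterator token, as a single-threaded sketch
only: the cursor pins the entry it currently points at, and removing a
pinned entry merely marks it dead until the cursor moves on. Everything
here (struct entry, struct cursor, the function names) is hypothetical;
it is not an existing sysctl interface, and the locking and the actual
reclamation of dead entries are left out.

```c
#include <stddef.h>

/* Hypothetical list entry in some shared kernel-style structure. */
struct entry {
	int key;
	int refs;		/* cursors currently pinning this entry */
	int dead;		/* logically removed, kept while pinned */
	struct entry *next;
};

/* Hypothetical iterator token handed to the consumer. */
struct cursor {
	struct entry *cur;
};

/* Point the cursor at the first live entry and pin it. */
void
cursor_start(struct cursor *c, struct entry *head)
{
	while (head != NULL && head->dead)
		head = head->next;
	if (head != NULL)
		head->refs++;
	c->cur = head;
}

/*
 * Logically remove an entry. Returns 1 if nothing pins it, so its
 * storage could be reclaimed at once, or 0 if reclamation must wait.
 */
int
entry_remove(struct entry *e)
{
	e->dead = 1;
	return (e->refs == 0);
}

/* The 'advance pointer' operation: unpin the current entry, pin the next. */
void
cursor_advance(struct cursor *c)
{
	struct entry *e = c->cur;
	struct entry *n;

	if (e == NULL)
		return;
	n = e->next;
	e->refs--;
	while (n != NULL && n->dead)
		n = n->next;
	if (n != NULL)
		n->refs++;
	c->cur = n;
}
```

The token stays valid between calls because the entry it names cannot
be reclaimed while refs is non-zero; the cost is that a stalled consumer
keeps a dead entry alive.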


Question is, who's going to pull the trigger?

cheers
BMS

P.S. I'm REALLY getting fed up with the lack of openness and 
transparency largely incumbent in doing work in p4.


Come one come all -- we shouldn't need accounts for folk to see and 
contribute what's going on, and the stagnation is getting silly. FreeBSD 
development should not be a committer or chum-of-committer in-crowd.



Re: reading routing table

2008-09-01 Thread Bruce M. Simpson

Debarshi Ray wrote:

I am implementing a library/utility which basically encompasses the
features of the traditional route utilities and those of newer tools
(like ip from iproute2), which are mostly specific to a particular
kernel. The overpowering objective is to make the library/utility work
uniformly across all different kernels, so that programs like
NetworkManager have a portable library/utility to use instead of the
Linux-kernel specific ip which is now being used.
  


Why don't you just use XORP's FEA code?
It already does all this under a BSD-type license.

cheers
BMS



Re: reading routing table

2008-09-01 Thread Bruce M. Simpson

Debarshi Ray wrote:

...
I was going through the FreeBSD and NetBSD documentation and the
FreeBSD sources of netstat and route. I was surprised to see that while
NetBSD's route implementation has a 'show' command, FreeBSD does not
offer any such thing. Moreover it seems that one can not read the
entire routing table using the PF_ROUTE sockets and RTM_GET returns
information pertaining to only one destination. This surprised me
because one can do such a thing with the Linux kernel's RTNETLINK.

Is there a reason why this is so? Or is reading from /dev/kmem the
only way to get a dump of the routing tables?
  


You want 'netstat -rn' to dump them, this is a very common command which 
should be present in a number of online resources on using and 
administering FreeBSD so I am somewhat surprised that you didn't find it.


P.S. Look in the sysctl tree if you need to snapshot the kernel IP 
forwarding tables. You can use kmem, but it is generally frowned upon 
unless you're working from core dumps -- kernels can be built without 
kmem support, or kmem locked down, etc.


cheers
BMS


Re: reading routing table

2008-09-01 Thread Bruce M. Simpson

Debarshi Ray wrote:

Why don't you just use XORP's FEA code?
It already does all this under a BSD-type license.



I was not aware of it. What does it do? Is it portable across other
OSes or is it *BSD specific?
  


XORP's FEA process is responsible for talking to the underlying 
forwarding plane. It supports *BSD, Linux, MacOS X, and Microsoft Windows.


Over the last year there was a refactoring where the forwarding table 
management got split into plugin-like modules. It is written in C++ 
although it's likely this split might make integration into other 
projects easier.


Normally that support all goes into a single process, rather than being 
linked into many.


cheers
BMS


Re: Code review request

2008-08-24 Thread Bruce M. Simpson

M. Warner Losh wrote:

I've been shepherding this patch in my p4 tree for a long time.  It
removes the obsolete support for other systems in if_spppsubr.c.  Is
there a reason I shouldn't commit this?
  


Looks fine to me.


Re: [CFT/R] IPv4 source address selection

2008-08-24 Thread Bruce M. Simpson

Bjoern A. Zeeb wrote:

Hi,

I have a patch, inspired by work from Y!, to do proper
IPv4 source address selection for unbound sockets (with multi-IP
jails).


Hi,

This kinda overlaps with some other ideas I'd like to see go in. It 
looks good and if it's already been tested, it should probably go in 
anyway as it disentangles the logic and puts it in a separate function.


I'm thinking we may wish to use criteria other than interface or jailed 
socket to select source address.


I should point out though that we picked some stuff up from KAME to do 
source address selection but it's not in the IPv4 stack.


cheers
BMS


Re: Small patch to multicast code...

2008-08-22 Thread Bruce M. Simpson

[EMAIL PROTECTED] wrote:

I gather you mean that a fast link on which also we're looping back
the packet will be an issue?  Since this packet is only going into the
simloop() routine.
  


We end up calling if_simloop() from a few interesting places, in 
particular the kernel PIM packet handler.


In this particular case we're going to take a full mbuf chain copy every 
time we send a packet which needs to be looped back to userland.


  
I was actually hoping, as the person who last hacked this code, that
you might have a suggestion as to a right fix.  
  


It's been a while since I've done any in-depth FreeBSD work other than 
hacking on the IGMPv3 snap, and my time is largely tied up with other 
work these days, sadly.


It doesn't seem right to my mind that we need to make a full copy of an 
mbuf chain with m_dup() to work around this kind of problem.


Whilst it may suffice for a band-aid workaround, we may see mbuf pool 
fragmentation as packet rates go up.


However we are now in a new world order where mbuf chains may be very 
tied to the device where they've originated or to where they're going. 
It isn't clear to me where this kind of intrusion is happening.


In the case of ip_mloopback(), somehow we are stomping on a read-only 
copy of an mbuf chain. The use of m_copy() with m_pullup() there is fine 
according to the documented uses of mbuf(9), although as Luigi pointed 
out, most likely we need to look at the upper-layer protocol too, e.g. 
where UDP checksums are also being offloaded.


Some of the code in the IGMPv3 branch actually reworks how loopback 
happens i.e. the preference is not to loop back wherever possible 
because of the locking implications. Check the bms_netdev branch history 
for more info.


cheers
BMS


Re: Small patch to multicast code...

2008-08-22 Thread Bruce M. Simpson

[EMAIL PROTECTED] wrote:

Somehow the data that the device needs to do the proper checksum
offload is getting trashed here.  Now, since it's clear we need a
writable packet structure so that we don't trash the original, I'm
wondering if the m_pullup() will be sufficient.
  


If it's serious enough to break UDP checksumming on the wire, perhaps we 
should just swallow the mbuf allocator heap churn and do the m_dup() for 
now, but slap in a big comment about why it's there.


BMS


Re: Small patch to multicast code...

2008-08-21 Thread Bruce M. Simpson

[EMAIL PROTECTED] wrote:

The only thing i can think of is that it's the UDP checksum,
residing beyond hlen, which is overwritten somewhere in the
call to if_simloop -- in which case perhaps a better fix is
to m_pullup() the udp header as well ?



It is the checksum that gets trashed, yes.
...
The m_*() routines actually have reasonable comments, it just seems
the wrong one was used here.
  


Actually, m_copy() has been legacy for some time now -- see comments.

I'd be concerned that the change to m_dup() (which makes a full mbuf 
chain copy) rather than m_copym() (which bumps refcounts) is going to 
eat into the mbuf clusters on fast links, though it's an easy band-aid 
for the problem.
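
The trade-off between sharing storage and duplicating it can be shown
with a small userland analogy. This is a sketch under loud assumptions:
struct buf and the buf_*() functions are invented for the example and
are not the mbuf(9) API. buf_share() stands in for a copy that only
bumps a reference count, buf_dup() for a full private copy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted byte buffer (not a real mbuf). */
struct buf {
	unsigned char *data;
	size_t len;
	int *refs;		/* shared by every handle on the same bytes */
};

/* Allocate a buffer holding a copy of src. Returns 0, or -1 on failure. */
int
buf_new(struct buf *b, const void *src, size_t len)
{
	b->data = malloc(len);
	b->refs = malloc(sizeof(*b->refs));
	if (b->data == NULL || b->refs == NULL) {
		free(b->data);
		free(b->refs);
		return (-1);
	}
	memcpy(b->data, src, len);
	b->len = len;
	*b->refs = 1;
	return (0);
}

/* Cheap copy: bump the count, share the bytes. Writes are visible to all. */
struct buf
buf_share(struct buf *b)
{
	(*b->refs)++;
	return (*b);
}

/* Expensive copy: new storage, so writes stay private. */
int
buf_dup(struct buf *dst, const struct buf *src)
{
	return (buf_new(dst, src->data, src->len));
}

/* Drop one reference; free the storage when the last one goes. */
void
buf_free(struct buf *b)
{
	if (--(*b->refs) == 0) {
		free(b->data);
		free(b->refs);
	}
}
```

Writing a checksum field through a shared handle changes what the
original owner sees, which is the kind of damage being chased in this
thread; the duplicate avoids it at the price of an allocation and a
copy per packet.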


I agree with Luigi that some of the API contract for mbuf(9) doesn't 
hold any more now that we have TSO and other offload.


cheers
BMS


Re: BPF problems on FreeBSD 7.0

2008-07-14 Thread Bruce M. Simpson

Robin Sommer wrote:

Hi all,

we're seeing some strange effects with our libpcap-based application
(the Bro network intrusion detection system) on a FreeBSD 7-RELEASE
system. As the application has always been running fine on 6.x,
we're wondering whether this might be triggered by any of the
changes that went into 7.
  

...


I'm wondering whether anybody here has seen something similar or
might have an idea where to start looking for the cause. Any ideas?
  


One place to start might be: netstat -B output in 7.x (I *think* this 
got MFCed), this will let us see what the drop count is for the Bro 
process, and what the flags are for the open BPF descriptors in the system.


I'm not hot on current BPF internals, but I hazard a guess this is 
related to BPF descriptor buffering -- an area where there have been 
changes, some of which I've eyeballed.


cheers
BMS




Re: Route messages

2008-06-15 Thread Bruce M. Simpson

Paul wrote:

Get these with GRE tunnel on
FreeBSD 7.0-STABLE FreeBSD 7.0-STABLE #5: Sun May 11 19:00:57 EDT 
2008 :/usr/obj/usr/src/sys/ROUTER  amd64

But do not get them with 7.0-RELEASE

Any ideas what changed? :)  Wish there was some sort of changelog..
# of messages per second seems consistent with packets per second on 
GRE interface..
No impact in routing, but definitely impact in cpu usage for all 
processes monitoring the route messages.


RTM_MISS is actually fairly common when you don't have a default route.

Messages which get enqueued don't necessarily get delivered -- and very 
few processes actually listen to the routing socket actively like this, 
so I wouldn't worry about it.


If it's a real concern for you then you could try hacking in a sysctl to 
tell the radix trie code not to issue RTM_MISS messages on the routing 
socket.




Re: [Removal of mrouted in FreeBSD-7.0]

2008-06-10 Thread Bruce M. Simpson

Archimedes S. Gaviola wrote:
...if ever there's a way to implement IP multicasting with PIM-SM and/or 
PIM-DM in the FreeBSD base system, how big would the work be?  
What are the things that need to be considered if we are going to 
implement PIM-SM and/or PIM-DM in the current FreeBSD network 
subsystem? The goal is to enable FreeBSD to provide native IP 
multicast using PIM, just like the way the DVMRP protocol was implemented 
before as part of the base system.


I really think the remit of multicast routing is too wide to be 
addressed in the base system, which is why projects like XORP and pimdd 
exist -- it doesn't strike me as a good fit for the FreeBSD base system. 
Separate projects already exist for this.


If someone is willing to commit to all the man-hours involved in the 
reimplementation and ongoing support of such a thing, blimey... they 
must have a lot of free time!


cheers
BMS


Re: Probable Bug in tcp.h

2008-06-09 Thread Bruce M. Simpson

Marc Lörner wrote:

off0 is 0x14 = no problem with that
but address of ip is 0xe00021c8706e = not correctly aligned to 32 bits

Can anyone tell me, where ip is allocated, so I can do a little bit more 
research?
  


It really depends on the context! That's a very wide ranging question.

It depends upon whether mbuf chains are flowing up or down the stack, 
whether or not the network driver supports checksum or header/segment 
offload, and whether or not it is using zero-copy.


Zero copy transmit normally only has mmu cost if the mbuf (from 
userland) can be mapped to a location where headers are easily 
prepended. Zero copy receive is more expensive and complex as it 
requires that the DMA engine on the network interface card supports 
header splitting.


The FreeBSD stack is known to have some issues with mbuf alignment and 
architectures other than those in its Tier 1.
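
For reference, the address quoted above fails a 4-byte alignment check
because its low two bits are not zero. The sketch below shows that
check, plus the usual portable way to read a 32-bit value from a
possibly misaligned location. The function names are invented for the
example; this is userland C, not code from the stack.

```c
#include <stdint.h>
#include <string.h>

/*
 * Return 1 if addr is a multiple of align, else 0.
 * align must be a power of two (4 for a 32-bit boundary).
 */
int
addr_is_aligned(uint64_t addr, uint64_t align)
{
	return ((addr & (align - 1)) == 0);
}

/*
 * Load a 32-bit value from a pointer that may be misaligned.
 * memcpy() makes no alignment assumption, unlike *(uint32_t *)p,
 * which can trap on strict-alignment machines.
 */
uint32_t
load_u32(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return (v);
}
```

On architectures that tolerate misaligned loads the direct dereference
happens to work, which is how this class of bug tends to go unnoticed
there.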





Re: [Removal of mrouted in FreeBSD-7.0]

2008-06-07 Thread Bruce M. Simpson

Archimedes S. Gaviola wrote:
Hi! I have just read from the FreeBSD-7.0 release notes 
http://www.freebsd.org/releases/7.0R/relnotes.html that the mrouted 
multicast routing protocol (DVMRP implementation) has been removed 
from the base system. I want to know what multicast routing protocol 
will serve as a replacement for this? The KAME snap kit has PIM-SM and 
PIM-DM implementations but they are specific to IPv6. 


DVMRP is something of a legacy protocol now, most deployments use PIM-SM.

mrouted is still available in ports, as other folk have pointed out.

If you want a freely available router with full multicast capability, 
please give XORP a try.



Re: Understanding the interplay of ipfw, vlan, and carp

2008-06-05 Thread Bruce M. Simpson

Peter Jeremy wrote:

Note that one downside of your carpdev patches is that (AFAIK) it is
no longer possible to identify which host sent the packet: The source
and destination MAC addresses, as well as the destination IP address
are all defined by CARP.  Once you change the source IP address to be
the shared address there's nothing to identify which host sent it.
  


If you really, really wanted to, you could write code to prepend the 
original IP or MAC as an experimental IP option. Options with a type 
less than 0x80 have the copied flag clear, so they are carried only in 
the first fragment and are not copied into the others.
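
The 0x80 boundary comes from the layout of the IP option type octet:
the top bit is the copied flag, the next two bits the option class, and
the low five bits the option number. A small sketch of decoding it
follows; the function names are invented for the example.

```c
#include <stdint.h>

/* 1 if the option is copied into every fragment, 0 if first fragment only. */
int
ipopt_is_copied(uint8_t type)
{
	return ((type & 0x80) != 0);
}

/* Option class: bits 5-6 of the type octet. */
unsigned
ipopt_class(uint8_t type)
{
	return ((type >> 5) & 0x03);
}

/* Option number: low five bits of the type octet. */
unsigned
ipopt_number(uint8_t type)
{
	return (type & 0x1f);
}
```

For example, Record Route (type 7) is not copied on fragmentation while
Loose Source Route (type 131, 0x83) is, so an experimental option meant
to identify the sender in every fragment would need a type with the top
bit set.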


I can understand why you'd want to do this (debugging springs to mind), 
though it does go against the gist of what carp is and does.


Also, there is compatibility to keep in mind, and it's entirely possible 
that the presence of a new and unknown IP option is going to break 
implementations which don't parse IP option headers correctly, or 
trigger other unwanted behaviour (I don't know what this IP option is 
therefore I will drop it).




Re: Probable Bug in tcp.h

2008-06-05 Thread Bruce M. Simpson

Marc Lörner wrote:

..
First of all I have the problem of misalignment of th_off. Because in this way 
always 4 bytes are read and then the bits of th_off are replaced. Then the 4 
bytes are written back. 

But should (th_x and th_off) not be only 1 byte in total - only read and 
write 1 byte?
  


Which machine architecture are you attempting to compile this code on?

On FreeBSD Tier 1 platforms, the access is probably going to come out of 
L2 cache anyway, so the fields in question will be read by a burst cycle.


It is worth noting that NetBSD changed the base type of tcphdr's 
bitfields to uint8_t, however this shuffles the compiler dependency into 
the treatment of the char type. Most modern C compilers support 
unsigned char.
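
An alternative that sidesteps bitfield layout altogether is to treat
the header as bytes and use shifts and masks on the single octet that
holds the data offset. The sketch below assumes a pointer to the start
of a TCP header; the function names are invented for the example and
this is not how tcp.h declares the field.

```c
#include <stdint.h>

/* Byte 12 of the TCP header: data offset in the high nibble. */
#define TCP_OFF_BYTE	12

/* Return the data offset in 32-bit words; touches exactly one byte. */
unsigned
tcp_get_data_offset(const uint8_t *tcp)
{
	return (tcp[TCP_OFF_BYTE] >> 4);
}

/* Set the data offset, preserving the low nibble of the same byte. */
void
tcp_set_data_offset(uint8_t *tcp, unsigned words)
{
	tcp[TCP_OFF_BYTE] = (uint8_t)(((words & 0x0f) << 4) |
	    (tcp[TCP_OFF_BYTE] & 0x0f));
}
```

Because every access is a one-byte load or store, the code does not
depend on how a given compiler sizes or aligns the storage unit behind
a bitfield.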




Re: [GSoC - tcptest] - Regression Tests, Conformance Tests...

2008-06-03 Thread Bruce M. Simpson

Victor Hugo Bilouro wrote:

I've made a lot of changes to it; diffs are with him but I can send folk a
copy of my Mercurial repo.


I would appreciate that.
  

Sent (off-list).

As an example of the new PCS syntax and expect() stuff, I'll forward you 
the IGMPv2 test off-list. (Also sent.)



humm, track state is needed to make TCP tests.
  


It is something you'll have to build yourself around the expect() 
functionality.


The experimental IP reassembly code (in pcs/packets/ipv4sar.py) might be 
a good place to start. It isn't finished, but it should demonstrate the 
general principles -- i.e. you read packets in a loop and you pass them 
to an object which knows what to do, in this case, ipv4sar.


One big problem I had was that the concept of fragmentation requires 
deep copies of PCS objects. I imagine that's less of an issue for TCP 
segmentation, as the situation is made somewhat easier by the fact 
you're dealing with streams.


BTW: My snapshot of PCS fixes the IP and TCP option parsers. If you look 
at the IGMP and DHCP decoders, there is an example of a dictionary 
driven option parser. This could also be applied to TCP where it's 
likely to be useful.


I believe most of the bugs have been shaken out of expect(). The main 
problem is buffering and the fact that expect() depends on non-blocking 
I/O. pcap can return more than one packet from the kernel every time you 
call into the non-blocking dispatch function, so I did some internal 
refactoring to allow expect() to deal with that.


So your code has to be able to deal with multiple matches from the 
Connector, even if you only asked to match at *least* one packet. 
Count is mostly about stopping expect() from hanging the flow of 
control anyway.


The syntax and semantics are intentionally similar to PExpect for 
Python. In fact the IGMPv2 test uses PExpect to drive a QEMU virtual 
machine encapsulated as a Python object, for regression testing the IGMP 
code. So my suggestion is check out PExpect too.



I didn't find his site, can you send me?


http://www.fsmware.com/freebsd/syntest2.py

I've added some Scapy-like syntax to PCS which can make the code look a 
bit smaller.


cheers
BMS


Re: [Regarding FreeBSD and RFC Compliance]

2008-06-03 Thread Bruce M. Simpson

Dalibor Gudzic wrote:



Any pointers for someone that wishes to do it?


http://wiki.freebsd.org/NetworkRFCCompliance
...is one place to start...



Re: Anyone interested in HDLC support for pppd ?

2008-06-03 Thread Bruce M. Simpson

[EMAIL PROTECTED] wrote:

Hello;

I started playing a bit with net/pppd23 and I noticed there are some patches for 
FreeBSD-3.0 that were never committed (NetBSD certainly has them). Our pppd(8) is derived 
from the samba pppd port and should have them if we want to continue updating 
it.
  


Ed Schouten is currently rewriting the tty code. It sounds like line 
disciplines are about to go away, so pppd23 will most likely stop 
working at that point.


There's a Netgraph node ng_cisco which claims to support HDLC. Perhaps 
tweaking MPD to work with it is a better use of effort.



Re: [Regarding FreeBSD and RFC Compliance]

2008-06-02 Thread Bruce M. Simpson

Archimedes S. Gaviola wrote:

To Whom It May Concern:

Good day! Is there any document or web site that lists all the 
standard Requests for Comments (RFCs) for the networking protocols 
currently implemented in FreeBSD? This would help users identify which 
sections of a standard a given network protocol implements, especially 
with regard to interoperability with other platforms.


No, want to compile one and contribute it to the project? We'd be very 
grateful for the help.


cheers
BMS


Re: [GSoC - tcptest] - Regression Tests, Conformance Tests...

2008-06-02 Thread Bruce M. Simpson

Victor Hugo Bilouro wrote:

Hi,

I'm in the architectural phase of tcptest* development, so I need to
understand every test it will need to cover, because that could
change the tcptest architecture.
  


Hey, have you seen gnn's PCS toolkit?
   http://pcs.sourceforge.net/

I've made a lot of changes to it; diffs are with him but I can send folk 
a copy of my Mercurial repo.


I wrote a set of IGMPv2 and IGMPv3 baseline regression tests using it, 
now that I've added things like expect(), etc.


It might save you a lot of work, although the TCP stuff needs attention. 
With expect() you can track state between segments. I started on IP 
reassembly, but ain't finished.


I think Kip Macy's been using it for testing too, I saw a chunk of 
PCS-using TCP code on his site the other day.


cheers
BMS


Re: if_var.h micro-optimization

2008-05-30 Thread Bruce M. Simpson

rihad wrote:

Not sure if this is a worthwhile optimization? FreeBSD 7.0

--- /usr/src/sys/net/if_var.h   2007-12-07 09:46:08.0 +0400
+++ if_var.h    2008-05-30 18:10:25.0 +0500
@@ -282,7 +282,8 @@
        if (m) {                                                \
                if (((ifq)->ifq_head = (m)->m_nextpkt) == NULL) \
                        (ifq)->ifq_tail = NULL;                 \
-               (m)->m_nextpkt = NULL;                          \
+               else                                            \
+                       (m)->m_nextpkt = NULL;                  \
                (ifq)->ifq_len--;                               \
        }                                                       \
 } while (0)


It could save dirtying an L2 data cache line at the expense of taking a 
conditional branch, but to evaluate your suggested change requires a lot 
more data. Do you plan to do this? Given how _IF_DEQUEUE() is normally 
used the impact is likely negligible.
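
Before measuring anything, it helps to convince yourself that the two
variants leave the queue in the same state. Below is a rough Python
model of both, purely for illustration: it is not kernel code, the class
names are made up, and only the field names mirror the C ones. It
assumes the enqueue side always clears m_nextpkt, which is what makes
skipping the store safe when the last packet is removed.

```python
class Mbuf:
    def __init__(self, name):
        self.name = name
        self.m_nextpkt = None


class IfQueue:
    def __init__(self):
        self.ifq_head = None
        self.ifq_tail = None
        self.ifq_len = 0

    def enqueue(self, m):
        m.m_nextpkt = None          # assumption: enqueue clears the link
        if self.ifq_tail is None:
            self.ifq_head = m
        else:
            self.ifq_tail.m_nextpkt = m
        self.ifq_tail = m
        self.ifq_len += 1


def dequeue_original(ifq):
    m = ifq.ifq_head
    if m is not None:
        ifq.ifq_head = m.m_nextpkt
        if ifq.ifq_head is None:
            ifq.ifq_tail = None
        m.m_nextpkt = None          # unconditional store
        ifq.ifq_len -= 1
    return m


def dequeue_patched(ifq):
    m = ifq.ifq_head
    if m is not None:
        ifq.ifq_head = m.m_nextpkt
        if ifq.ifq_head is None:
            ifq.ifq_tail = None     # m_nextpkt is already None here
        else:
            m.m_nextpkt = None      # store only when it changes something
        ifq.ifq_len -= 1
    return m


def build(n):
    q = IfQueue()
    for i in range(n):
        q.enqueue(Mbuf(i))
    return q


def drain(q, dequeue):
    """Dequeue everything, recording the visible state after each step."""
    trace = []
    while True:
        m = dequeue(q)
        if m is None:
            return trace
        trace.append((m.name, m.m_nextpkt, q.ifq_len,
                      q.ifq_head is None, q.ifq_tail is None))
```

Comparing drain(build(n), dequeue_original) with the patched variant
checks behaviour only; it says nothing about cache effects, which still
need real measurements.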



Re: if_var.h micro-optimization

2008-05-30 Thread Bruce M. Simpson

rihad wrote:

Bruce M. Simpson wrote:


It could save dirtying an L2 data cache line at the expense of taking 
a conditional branch,
Whoa, why don't you take it easy on me :) I'm not that much into 
kernel (or hardware) programming. It's just that reading Ch. 3 of 
TCP/IP Illustrated Vol.2 by Rich Stevens got me digging around FreeBSD 
source code dealing with struct ifnet, where this piece of code caught 
my attention.


It could be red, it could be yellow. It could be 620nm. Who am I to say 
what is and what isn't? ;-)


There are bound to be situations where the change is a win, and even 
some where there isn't. Context is everything...


but to evaluate your suggested change requires a lot more data. Do 
you plan to do this? 
Perhaps there is already a framework for trying out changes in 
-CURRENT and seeing their relative impact, so perhaps someone more 
experienced than I am can see to this?


All educators are busy right now, please hold and the next available 
dogma merchant will be with you as soon as possible. ;-)


(Hint: No, there isn't a framework I know of, unless you wanna make one? 
Scientific process applies, reproducible results, etc. You could script 
stuff, figure out a way to run the kernel or parts of the network stack 
under Valgrind so it can be L2 profiled w/o running it on a real 
machine... or hack hwpmc so it can be done live.. anything is possible.)




Given how _IF_DEQUEUE() is normally used the impact is likely 
negligible.

Oh, I see. A nice first attempt of mine anyway ;) Thanks.


Don't take my word for it, down that road lies darkness.

Seriously though -- it's easy to introduce bugs doing things like this; 
if nothing else, it's an exercise in really thinking things through.


cheers
BMS



Re: HEAD UP: non-MPSAFE network drivers to be disabled

2008-05-27 Thread Bruce M. Simpson

Julian Elischer wrote:


While this is a good idea on its own, the difference between
what that achieves and what a line discipline achieves is that
a line discipline is hardware independent and can even be used
on a virtual device.
I was under the impression that the back-end for UART was light weight 
enough that it could be used as a virtual device.


For example: Many years ago I tried to get the WinModem working in my 
IBM ThinkPad T23. UART lends itself well to being a wrapper for the DSP 
microcode without having any of the historical tty baggage.


In the case of UART the translation shim moves from on top of the 
device node to underneath, in much the same way as has happened for GEOM.




Re: lagg0.2 style vlans on lagg(4) interface

2008-05-22 Thread Bruce M. Simpson

Hi,

It looks like this patch will cause gratuitous ARP to be queued even 
when the interface is not IFF_UP, is this intentional?


Niki Denev wrote:


I think arp_gratuit() needs a better name.
  


arp_announce() ?


Is if_ethersubr.c:ether_ifattach() good place to register the EVENT hook?
  


ARP is also used by FDDI and IEEE 802.5, as well as anything which 
emulates this. Taking the call to arp_ifinit() out of if_setlladdr() is 
likely to break this code.



And if yes, what would be the best way to handle failure to register
the hook, as the function is void?
  
Should I worry about that, or just print a warning message and continue?
  


I see the C++-style comments - perhaps someone who knows event handlers 
better than I can comment, I believe it's using one of the shared kernel 
malloc pools with M_WAIT.


It looks like this won't run afoul of locking, but it is a change to a 
fairly central path which needs to be considered carefully as it affects 
consumers other than Ethernet drivers.


cheers
BMS


Re: carp oddness... BACKUP is ARPing!

2008-05-16 Thread Bruce M. Simpson

Rudy wrote:


The CARP in BACKUP is arping... why?


Without looking at the carp code, I can tell you that its addressing 
hook is implemented as a pass-through in ether_input(). carps are not 
IFT_ETHER, therefore they shouldn't emit gratuitous ARP or otherwise 
when an address is configured on one.


So I'll leave this up to someone who knows the carp code, as this is 
most likely where the ARP originated from.


cheers
BMS


Re: Proposed patch to the kernel and to netstat...

2008-05-15 Thread Bruce M. Simpson

[EMAIL PROTECTED] wrote:

...
Please email me comments.  I'd like to commit this to HEAD soon.  It
can't be put into 7 without removing the cluster and mbuf counting,
but I might do that as well if there is interest.
  


People writing servers are going to find the watermark stuff useful. I'm 
thinking being able to watch the buffer stats (possibly also in a 
way which we can graph) for a single socket, given its inpcb or so 
address, would also be a neat trick...


cheers
BMS


Re: how to identify a PHY?

2008-05-12 Thread Bruce M. Simpson

Marius Strobl wrote:

If the system is running, the simplest way to identify
the PHYs is to check the oui= and model= output of `devinfo -v`.
Otherwise boot verbose and check the OUI and model output of 
ukphy(4).
  


There's a project for someone in there I'm sure.

Linux has mii-tool and mii-diag. Whilst we generally don't need all of 
the knobs, sometimes it can be useful to dump and poke PHY registers on 
the MII. src/sys/dev/mii/miibus_if.m contains the newbus interface 
definition for miibus which would be a place to start.


cheers
BMS


Re: how to identify a PHY?

2008-05-12 Thread Bruce M. Simpson

Volker wrote:

...
In short my original question better reads as how do I know the kind of
phy if no driver has been attached. Can one retrieve that information
out of a verbose boot dmesg (from probing messages)?
  


You can't determine which PHY is in use unless a driver is attached, 
because it's necessary to attach a driver in order to access the card's 
MII registers. Same with any other OS.


If no PHY driver attached but a NIC driver did, you should see 
this message:

   device_printf(dev, "MII without any PHY!\n");

It sounds like someone needs to instrument the code path mii_phy_probe() 
to print useful information in the situation you describe.


cheers
BMS


Re: IPPROTO_DIVERT and PF_INET6

2008-05-07 Thread Bruce M. Simpson

Julian Elischer wrote:

actually the divert sockets should really not be in PF_INET
they could deliver both inet and inet6 packets.
the sockaddr that they return (and which needs to be read for divert
to make sense) could be used to distinguish between them.


Good point. I'd forgotten that they were abusing the fields in sin_zero. 
This is not OK for IPv6, although the kludge can still be perpetuated by 
looking at sa_len and stashing what divert wants at the end of sockaddr_in6.


So there IS a case for making them a separate protocol family if 
someone's going to do a clean implementation of divert sockets for IPv6.


cheers
BMS



Re: Problems with netgraph

2008-05-07 Thread Bruce M. Simpson

Oleksandr Samoylyk wrote:


looks like UDP in PPP in GRE


I think so. Should we hope for some progress in this direction in future?


Probably not, unless someone is willing to come up to the table and 
commit to writing and maintaining a Netgraph node to demux GRE, although 
this is only shuffling the fanout elsewhere.


If MPD is relying on raw sockets to demultiplex GRE, then this is what 
it's up against in terms of performance -- repeated acquisitions of the 
INP sleep lock, and context switches when the socket buffer low water 
mark is passed. It might have improved slightly in HEAD since the move 
to rwlocks.


Like udp_input(), rip_input() suffers from the fact that the stack has 
to deal with delivering datagrams to potentially more than one socket, 
and there is no intermediate data structure to handle the fan-out -- it 
walks the entire inp list every time. If you look at the comments in 
udp_input() it's pretty clear this is a historical weakness in the BSD 
implementation.


Windows, by the way, forces socket clients to explicitly request 
reception of broadcast datagrams as of Windows Server 2003, and 
multicasts are strictly delivered to group members only, which 
eliminates that problematic loop -- you can always maintain a tree of 
receivers that way.
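
To illustrate the difference, here is a toy comparison in Python. It is
only a sketch: RawSocket and ReceiverIndex are hypothetical names, and
the real inpcb matching looks at far more than the protocol number. The
point is just that a walk touches every socket, whereas an intermediate
structure touches only the interested ones.

```python
from collections import defaultdict


class RawSocket:
    def __init__(self, name, proto):
        self.name = name
        self.proto = proto
        self.received = []


def deliver_linear(sockets, proto, datagram):
    """Walk the whole list for every datagram; returns sockets visited."""
    visited = 0
    for so in sockets:
        visited += 1
        if so.proto == proto:
            so.received.append(datagram)
    return visited


class ReceiverIndex:
    """Intermediate fan-out structure keyed by protocol number."""

    def __init__(self):
        self._by_proto = defaultdict(list)

    def register(self, so):
        self._by_proto[so.proto].append(so)

    def deliver(self, proto, datagram):
        receivers = self._by_proto.get(proto, [])
        for so in receivers:
            so.received.append(datagram)
        return len(receivers)       # only interested sockets are touched


def demo():
    sockets = [RawSocket("tcp%d" % i, 6) for i in range(100)]
    sockets += [RawSocket("gre0", 47), RawSocket("gre1", 47)]
    index = ReceiverIndex()
    for so in sockets:
        index.register(so)
    return (deliver_linear(sockets, 47, b"pkt"),
            index.deliver(47, b"pkt"))
```

With 100 unrelated sockets and two GRE listeners, the walk visits 102
sockets per datagram while the index visits two.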


I'm happy to review patches if someone else commits to fixing it.

cheers
BMS


Re: IPPROTO_DIVERT and PF_INET6

2008-05-06 Thread Bruce M. Simpson

Julian Elischer wrote:

you could implement a whole new protocol family of which there
was a single protocol..  divert.
That's sheer overkill for what Edwin needs to be able to do. We already 
have a bunch of apps which use divert sockets in the IPv4 space, why 
should the existing semantics change? Divert sockets are still tied to 
the transport you instantiate them with, and they have always been a 
special case anyway depending on where one wishes to draw the lines.


There is no reason per se, that I can see, why the IPPROTO_DIVERT 
identifier can't just be re-used along with pf_proto_register() for 
PF_INET6, and I've said this to Edwin off-list. A PROTO_SPACER entry 
just needs to be added to in6protosw.


I was surprised to learn no-one had gone ahead and actually implemented 
it already as there are a few cases in IPv6 which might warrant it 
(6to4, Teredo etc.) If I'm missing anything obvious please let me know.


cheers
BMS


Re: Network Patches from -RELEASE to -STABLE 7.0

2008-05-04 Thread Bruce M. Simpson

Paul wrote:
   Is there a list of patches that have been applied to -STABLE since 
the -RELEASE ?
I can't seem to find a simple organized list of applied patches 
(something similar to linux kernel changelog).
I want to know if anything has been fixed or updated in the network 
area to see if it warrants changing the kernel to -STABLE on a 
production machine.


This information is typically present in commit messages, or in 
FreeBSD's release notes. It's not something which is compiled on an 
ad-hoc basis, it is specifically compiled on a  per release basis, 
although you may occasionally see the release engineers updating the 
release notes for -CURRENT.


Cheers
BMS


Re: multiple routing tables review patch ready for simple testing.

2008-05-02 Thread Bruce M. Simpson

John Hay wrote:

The Linux guys seem to have multiple fibs (or whatever they call them)
which they can chain together by giving them different priorities. The
effect seems to be that a packet will be matched through the highest
priority fib to the lowest until a route match is found, which is then used.
Will something like that be possible? I came across that kind of use
with the olsr guys. They let olsrd twiddle one of the higher priority
fibs and then put fallback routes in a lower priority fib. That way
olsrd can override a route (even the default route) and when olsrd
exits and deletes all its routes, the original ones are still in the
lower priority fib and will be used.
  


XORP already does this without relying on any kernel support.

Each routing protocol supplies an origin table of its own. The RIB makes 
the decision on which route to plumb to the kernel based on 
administrative distance. When xorp_olsr exits, its origin table is 
removed, and the winning routes are recalculated.
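
As a rough model of that behaviour (not XORP code; the Rib class, the
origin names and the distance values are all made up for illustration),
the selection logic amounts to something like this:

```python
class Rib:
    """Toy RIB: one origin table per protocol, lowest distance wins."""

    def __init__(self):
        self._origins = {}  # name -> (admin distance, {prefix: nexthop})

    def add_origin(self, name, distance):
        self._origins[name] = (distance, {})

    def add_route(self, name, prefix, nexthop):
        self._origins[name][1][prefix] = nexthop

    def remove_origin(self, name):
        # e.g. the routing protocol process has exited
        del self._origins[name]

    def winners(self):
        """Recompute which route per prefix would be plumbed to the kernel."""
        best = {}
        for name, (distance, routes) in self._origins.items():
            for prefix, nexthop in routes.items():
                if prefix not in best or distance < best[prefix][0]:
                    best[prefix] = (distance, name, nexthop)
        return {p: (name, nh) for p, (_, name, nh) in best.items()}


def demo():
    rib = Rib()
    rib.add_origin("static", 250)   # arbitrary distances for the sketch
    rib.add_origin("olsr", 230)
    rib.add_route("static", "0.0.0.0/0", "192.0.2.1")
    rib.add_route("olsr", "0.0.0.0/0", "198.51.100.1")
    before = rib.winners()
    rib.remove_origin("olsr")
    after = rib.winners()
    return before, after
```

While the olsr origin exists its default route wins; once it is removed
the static one is selected again, with no second kernel table involved.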


You don't need to go to the kernel for this sort of thing unless you 
specifically need to implement route policy based on which interface(s) 
a packet came in on.




Re: multiple routing tables review patch ready for simple testing.

2008-05-02 Thread Bruce M. Simpson

Julian Elischer wrote:


OLSR is an overlay network


Nope -- the express intention was that it could be used for basic IP 
connectivity, for mobile devices. In OLSR, every node is a potential IP 
forwarder unless it explicitly advertises itself as being unwilling to 
forward.


and any machine that participated must have a split personality. First 
it must be able to think in terms of the basic local network, and it 
must be able to think in terms of the world from the perspective of the 
overlay.

Applying routing policy gets more important at the border. The OLSR 
implementation in XORP is intended to give people a means of 
connectivity between MANET and non-MANET routing domains, by 
redistributing routes into the OLSR cloud.


I daresay these capabilities will get more important, and relevant, to 
the MANET picture as time goes on, but it's best to leave them out of 
the operational picture for now, in my opinion.


cheers
BMS


Re: multiple routing tables review patch ready for simple testing.

2008-05-02 Thread Bruce M. Simpson

John Hay wrote:
You don't need to go to the kernel for this sort of thing unless you 
specifically need to implement route policy based on which interface(s) 
a packet came in on.



Yes I know that. But in the world of adhoc wireless mesh networking
there are very few non-linux people, so they basically call the shots
and use the linux kernel features to the full.


Not true. There's an awful lot going on behind closed doors in the MANET 
world, and from the sounds of the emanations, they might not be using 
Linux at all.



In a sense I can
understand them because their stuff also runs on the small embedded
stuff like the linksys wireless boxes and it needs to scale. The
biggest adhoc olsr network is probably the Freifunk one that has
more than 600 wireless nodes, mostly consisting of linksys boxes.
  


The complexity of any system like that is still there, regardless of 
whether or not people choose to make it harder to debug code by 
prematurely pushing it into the kernel.



On some boxes that are also connected to different kinds of networks,
they run a different routing daemon into another fib and by setting
the priorities on the fibs, they can decide which daemon's routes
have the highest priority. And both routing daemons are happy because
the other is not stomping on its feet.
  


Yes, but this is largely to do with the fact that the Linux netlink 
socket allows daemons to coexist due to its use of a tag-length-value 
which captures that information, a different kettle of fish.


The feature you describe is totally possible without adding complexity 
to Julian's current effort.


cheers
BMS


Re: multiple routing tables review patch ready for simple testing.

2008-04-30 Thread Bruce M Simpson

Julian Elischer wrote:

An interface may however be present in entries from multiple FIBs
in which case the INCOMING packets on that interface need to
be disambiguated with respect to which FIB they belong to.


Yes, there is no way the forwarding code alone can do this.

It should not be expected to, and it's important to maintain a clean 
functional separation there, otherwise one ends up in the same quagmire 
which has been plaguing a lot of QoS research projects over the years 
("Where do I put this bit of the system?")




This is a job for an outside entity (from the fibs).
In this case a packet classifier such as pf or ipfw is ideal
for the job. providing an outside mechanism for implementing
whatever policy the admin wants to set up.


Absolutely. This has been the intent from the beginning.

There is no one size fits all approach here. We could put a packet 
classifier into the kernel which works just fine for DOCSIS consumer 
distribution networks, but has absolutely no relevance to an ATM 
backbone (these are the two main flavours of access for folk in the UK).




I find it is convenient to envision each routing FIB as a routing
plane, in a stack of such planes. Each plane may know about the same
interfaces or different interfaces. When a packet enters a routing
plane it is routed according to the internal rules of that plane.
Irrespective of how other planes may act.  Each plane can only route
a packet to interfaces that are known about on that plane.
Incoming packets on an interface don't know what plane to go to
and must be told which to use by the external mechanism. It
IS possible that an interface in the future might have a default
plane, but I haven't implemented this.


This limitation seems fine for now.

Users can't be expected to configure the defaults by default if they 
aren't supported, so, if overall the VRF-like feature defaults to off, 
and there are big flashing bold letters saying "You must fully configure 
the forwarding plane mappings if you wish to use multiple FIBs", then 
that's fine by me.




if you have several alias addresses on an interface it is possible
that some FIBS know about some of them and others know about other
addresses. New addresses when added are added to each FIB and
whatever is adding them should remove them from the ones that don't
need it.  This may change but it fits in with how the current code
works and keeps the diff to a manageable size.


   In any event, for plain old IP forwarding, a node's endpoint 
addresses are used only as convenient ways of referring to physical links.


To back up and give this some detailed background:

   For example, 192.0.2.1/24 might be configured on fxp0, and we 
receive a packet on another interface for 192.0.2.2. When resolving a 
route, the forwarding code needs to do a lookup to see from where 
192.0.2.2 is reachable before the next-hop is resolved in the table. 
That happens on a per-FIB basis, when the patches are applied -- however 
the job of tagging input for which FIB is the job of the classifier.


   The problems with the above approach begin when an input interface 
resides in multiple virtual FIBs (no 1:1 mapping), or when you can't 
refer to it by an address (it has no address -- unnumbered 
point-to-point link, or addresses do not apply), or when you attempt to 
implement encapsulation (e.g. GRE, IPIP) in the forwarding layer.


   Then, you're reliant on each individual FIB having resolved 
next-hops correctly. The existing forwarding code already does some of 
this by forcing the ifp to be set for any route added to the table. This 
is done implicitly for routes which transit point-to-point interfaces.
   BSD has had some weaknesses in this area. It makes implementing 
things like VRRP particularly difficult, which is why the ifnet approach 
to CARP was used (the forwarding table gets to see a single ifp); it 
eliminates a level of possible recursion from that layer of the routing 
stack.


   With multicast, for example, next-hops can't be identified by IPv4 
addresses alone. Every forwarding decision has potentially more than one 
result, and links are referred to by physical link (this could be an 
ifp, an interface index, a name, whatever), and where messages are 
forwarded is determined using a link-scope protocol such as IGMP.


   There, it's reasonable to expect that the user partitioned off the 
multicast forwarding planes into separate virtual FIBs, and that the 
appropriate rules in the classifier are configured.


   For SSM, the key (S,G) match has to happen in the input classifier, 
if one is going to route flows OK using the multiple FIB feature -- the 
multicast routing daemons have to be aware of it, 'cuz you can't run a 
separate instance of PIM for every set of flows -- PIM is greedy 
per-link, a !1:1 mapping problem exists, PIM has no way of telling 
separate instances apart (no hierarchy in the form of e.g. OSPF areas, 
and even OSPF won't let you put a link in more than one 

Re: multiple routing tables review patch ready for simple testing.

2008-04-30 Thread Bruce M. Simpson

Bakul Shah wrote:

1) A packet arrives on an interface.  If this interface is
   associated with more than one FIB, which FIB does it get
   given to?
  


If you only have a single FIB, there is no issue here.
If you have multiple FIBs, the decision gets made by the classifier.


2) If that decision is taken by a packet 'classifier',
   isn't it in effect doing the job of a FIB (deciding the
   next hop, which happens to be a local FIB)?  Recall that
   basically a packet passes from a FIB to another FIB until
   it gets to its eventual destination.
  


Up until now, the BSD forwarding code always forwarded packets on the 
basis of the destination address.


In an IP environment this is totally reasonable. Most implementations 
work on this basis -- ultimately, there is a fan-out to a collection of 
tries which hold the prefix information, and there has to be a decision 
about which trie(s) to use for resolving the next-hop. Linux iproute2 
works on this basis more or less.


So the classifier is NOT doing the job of the FIB.
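
To spell out the split, here is a rough Python sketch. It is not how
Julian's patch or ipfw/pf are implemented; Fib, classify and forward are
hypothetical names, and the classifier here matches on source address
only. The classifier picks a table by policy; the table then resolves
the next hop purely from the destination.

```python
import ipaddress


class Fib:
    """Toy forwarding table: longest-prefix match on destination only."""

    def __init__(self):
        self._routes = []   # list of (network, nexthop)

    def add(self, prefix, nexthop):
        self._routes.append((ipaddress.ip_network(prefix), nexthop))

    def lookup(self, dst):
        addr = ipaddress.ip_address(dst)
        matches = [(net.prefixlen, nh)
                   for net, nh in self._routes if addr in net]
        return max(matches)[1] if matches else None


def classify(src, rules, default_fib=0):
    """Policy step: choose a FIB number from fields other than dst."""
    addr = ipaddress.ip_address(src)
    for net, fib in rules:
        if addr in net:
            return fib
    return default_fib      # untagged traffic uses the default FIB


def forward(packet, rules, fibs):
    fib = fibs[classify(packet["src"], rules)]
    return fib.lookup(packet["dst"])


def demo():
    fibs = [Fib(), Fib()]
    fibs[0].add("0.0.0.0/0", "192.0.2.1")
    fibs[1].add("0.0.0.0/0", "198.51.100.1")
    fibs[1].add("203.0.113.0/24", "198.51.100.2")
    rules = [(ipaddress.ip_network("192.0.2.128/25"), 1)]
    a = forward({"src": "192.0.2.10", "dst": "203.0.113.5"}, rules, fibs)
    b = forward({"src": "192.0.2.200", "dst": "203.0.113.5"}, rules, fibs)
    return a, b
```

Two packets with the same destination can get different next hops, but
only because the classifier handed them to different tables.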


3) When a local packet needs to be sent, which FIB gets it?
   Does setfib decide that?  Is there a default FIB?
  


If you look at Julian's patch, he's added an option to the socket layer 
to control this.

There is a default FIB which is used when no FIB tag exists.



I believe having to use pf/ipfw will slow things down a bit
so the question is what does associating an interface with
multiple FIBs buy you?
  


You only need to pass through pf/ipfw if you wish to source-route 
packets, or need to apply a forwarding policy decision more complex than 
the destination field, which is all rtalloc() has historically supported.


If there is any additional latency or slowdown, it's down to how good 
your matching algorithms are as you enter the classifier.




Wouldn't it make sense to treat each alias as on a separate
logical interface?  Then each logical interface belongs to
exactly one FIB.  On input you decide which logical interface
a packet arrived on by looking at its destination MAC
address.  That reduces confusion quite a bit, at least in my
mind!  What does doing more than this buy you?
  


It doesn't buy anything because there is still no 1:1 mapping -- the 
link-layer destination address maps to an ifp, and multiple aliases 
exist on the ifp.


You still need a classifier to look at other fields in the message and 
decide, based on policy, which FIB is used for next-hop resolution.


Tag switching systems avoid the issue by prepending a tag, but of 
course, what does a packet go through upon entry to an MPLS domain?


You guessed it: A classifier.

cheers
BMS




Re: multiple routing tables review patch ready for simple testing.

2008-04-30 Thread Bruce M. Simpson

Bruce M. Simpson wrote:


Wouldn't it make sense to treat each alias as on a separate
logical interface?  Then each logical interface belongs to
exactly one FIB.  On input you decide which logical interface
a packet arrived on by looking at its destination MAC
address.  That reduces confusion quite a bit, at least in my
mind!  What does doing more than this buy you?
  


It doesn't buy anything because there is still no 1:1 mapping -- the 
link-layer destination address maps to an ifp, and multiple aliases 
exist on the ifp.


Let me qualify that further: You are talking about splitting network 
layer addresses onto their own logical interfaces, with the goal of 
having a 1:1 mapping for FIB resolution.


This doesn't buy anything, because in IP, the previous hop never encodes 
the next-hop address it sends to -- it merely performs a lookup and 
forwards to you; your MAC address is the same for every IP address you 
have on the link, therefore it is not a unique identifier.


UNLESS you use a separate MAC address for every IP alias which you add, 
in which case, you are merely pushing the mapping elsewhere in the 
stack; it actually adds more complexity in this case.



Re: multiple routing tables review patch ready for simple testing.

2008-04-30 Thread Bruce M Simpson

Julian Elischer wrote:


what's SSM?


Source-specific multicast, where multicast flows (channels) are 
identified by both their original source address, and group address. 
Multicast addresses have no meaning on their own beyond the scope of a 
single link.
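
In data structure terms, the difference is simply what the forwarding
state is keyed on. A minimal sketch, assuming a made-up SsmTable class
and arbitrary interface names (232.0.0.0/8 is the standard SSM range):

```python
class SsmTable:
    """Toy SSM state: channels are keyed by (source, group), not group."""

    def __init__(self):
        self._channels = {}     # (source, group) -> set of interface names

    def join(self, source, group, ifname):
        self._channels.setdefault((source, group), set()).add(ifname)

    def outputs(self, source, group):
        return sorted(self._channels.get((source, group), ()))


def demo():
    table = SsmTable()
    table.join("192.0.2.1", "232.1.1.1", "fxp0")
    table.join("198.51.100.1", "232.1.1.1", "fxp1")
    return (table.outputs("192.0.2.1", "232.1.1.1"),
            table.outputs("198.51.100.1", "232.1.1.1"),
            table.outputs("203.0.113.1", "232.1.1.1"))
```

The same group address maps to different output sets depending on the
source, and a source nobody subscribed to gets nothing.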



I haven't changed any of that.. Basically I've kept clear of
M/Cast. The way I see it, if you don't define ROUTETABLES=2 (or more)
or don't define it at all in your config then you get what you had
before and I shouldn't have broken anything.


Cool! Doing multicast right is Hard. Doing it right in ad-hoc 
topologies is Harder.


It makes sense to steer clear of it for now. It can no doubt benefit 
from the hierarchy offered by multiple FIBs, but again, the policy 
routing mechanisms don't really exist just now, and things like PIM need 
changes to encompass it.


They will need to come into existence for the model to work on a macro 
scale, for the same reason SSM was put on the table.




I take it from this that you don't have any major complaints
as far as what I've done.


No problems here... I haven't tried testing.

I would say though if we are going to be renaming rtalloc() and friends, 
that names should really change to be descriptive of what it does.
It doesn't allocate a route, it tries to look up a forwarding table 
entry, and returns a reference to it.


cheers
BMS


Re: Multiple routing tables in action...

2008-04-29 Thread Bruce M. Simpson

Julian Elischer wrote:

The interaction with routing daemons is something I don't know
enough about. I need someone who knows routing daemons to tell
how to correctly tweak code that sends routing events.


As long as it doesn't break anything...



I think it is possible that events from a particular FIB should only 
be reported to routing sockets that are associated with that FIB.

but I'm not sure about this.


Please look at the Linux rtnetlink socket, they use a tag-length-value 
protocol for just this reason.


It seems reasonable that PF_ROUTE messages have some kind of filter 
applied to them until a more complete story can be realised for this.
   Most PF_ROUTE clients are savvy enough to ignore message types on 
the socket that they don't understand.
   If there is a need to announce route adds and deletes on the socket 
on a per-fib basis, it seems reasonable to stash it in one of the unused 
fields (if we've got any of those..urp) and change the rtm_type field 
for now.
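
A minimal sketch of the tag-length-value idea, in Python. The record layout (2-byte type, 2-byte length, big-endian) and the type numbers are assumptions made up for illustration; this is not the real rtnetlink or PF_ROUTE wire format. It only shows why a TLV reader can skip records it does not understand, which is what would let a per-FIB attribute be added without upsetting older clients.

```python
import struct

# Hypothetical attribute types, chosen only for this example.
ATTR_DST = 1
ATTR_FIB = 2


def encode_tlv(attr_type, value):
    """Pack one record as: 2-byte type, 2-byte value length, value bytes."""
    return struct.pack("!HH", attr_type, len(value)) + value


def decode_tlvs(buf, known_types):
    """Return {type: value} for known types; silently skip unknown ones."""
    attrs = {}
    offset = 0
    while offset + 4 <= len(buf):
        attr_type, length = struct.unpack_from("!HH", buf, offset)
        offset += 4
        if offset + length > len(buf):
            raise ValueError("truncated TLV record")
        if attr_type in known_types:
            attrs[attr_type] = buf[offset:offset + length]
        offset += length  # the length field lets us step over anything
    return attrs


# A message carrying a destination, an attribute nobody knows yet, and a FIB.
message = (encode_tlv(ATTR_DST, bytes([10, 0, 0, 0]))
           + encode_tlv(99, b"future extension")
           + encode_tlv(ATTR_FIB, struct.pack("!I", 1)))
```

A client that only knows ATTR_DST would call decode_tlvs(message, {ATTR_DST}) and never notice the FIB record.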


However it does take us further down a route (no pun intended) of 
incremental growth which has real risk (lack of or insufficiently rich 
test cases, requirements drift etc) and seems to be incumbent with open 
source in general.




This would mean running a separate instance of the routing daemon for
each FIB (VRF?).  Does this sound right to people?


Sounds crap! You really, really don't want to be doing that if you can 
avoid it.


Of course a lot of what's out there is not geared up to deal with it 
(and why would it be?) so it's fine for the time being, but it really, 
really can't be considered a complete, production-quality solution until 
the missing parts exist.


cheers
BMS

P.S. I am impressed by the scope and ambition of your work even if I 
haven't had a chance to digest it fully yet, and I hope that my concern 
about production quality open source here is not misinterpreted as 
nay-saying or disapproval by anyone.



Re: multiple routing tables review patch ready for simple testing.

2008-04-29 Thread Bruce M. Simpson

Julian Elischer wrote:


A general purpose OS is a different beast as it has no physical
equivalent of the FIB. It may have multiple routing tables, though, to
I think setrib would be a term less likely to cause confusion than
setfib even though, in the case of your FreeBSD patches, it's really
both.

If we need to change the terminology now is the time..
I asked for comments on terminology before and this is what we
came up with.. but once it gets committed it gets set in stone.


The kernel forwarding table is not a RIB.

In the past some apps have tried to use it as one. They really shouldn't 
do that.


There are implementation constraints on the inter-process communication 
involved (PR_ATOMIC, etc) which make it inherently unsuitable as a 
place for routing daemons to exchange routes, particularly when the 
system is under load, or running near load limits, as would be the case 
with a tightly engineered embedded system.


I understand folk went down that road in the past, as a means to get 
something up and running quickly as a working demo, or as a hangover 
from the days when they were the only tools around, but it isn't the way 
to build a comms infrastructure.


These days general purpose OSes are getting closer to specialised comms 
equipment in terms of what they can do, but more importantly, so are 
people's expectations of them -- and thus people's concern about whether 
or not it works tends to follow.


cheers
BMS


Strange forwarding issue with tap(4) and if_bridge(4)

2008-04-18 Thread Bruce M Simpson

Hi,

I noticed a strange issue with tap(4) and if_bridge(4) where the bridge 
seems not to be forwarding frames.


6.3-RELEASE, btw.

I have this setup where I use the two to bootstrap QEMU virtual machines.

Up until now I've been using dhcpd for this. This has only ever worked 
right for me if I run dhcpd on the bridge interface. However I tried 
doing it on a second tap, and it worked OK for me.


qemu -- /dev/tap0 -- tap0 -- bridge0 -- tap1 -- [bpf] -- dhcpd

  DHCP discovery broadcasts ->        <- DHCP unicast replies   OK



If I run dhcpd on another tap interface, this works OK, but obviously 
only if I open the matching character device. dhcpd of course uses bpf 
for injection, not the character device.


HOWEVER: If I try to run my own BOOTP server in userland, on the 
character device, what happens is this:

   If I tcpdump, I see the broadcast DHCP discover messages on the tap OK.
   bpf also sees the unicast replies my code generates.
   But if_bridge does not forward my traffic, even though the unicast 
addresses appear to be correct.


qemu -- /dev/tap0 -- tap0 -- bridge0 -- tap1 -- /dev/tap1 -- my_bootpd

  DHCP discovery broadcasts ->   X    <- BOOTP unicast replies   NOT OK



The BOOTP replies (written to /dev/tap1) do not appear on bridge0 or 
tap0. They do however appear on tap1.


In the first setup, the DHCP replies appear on all interfaces in the 
bridge, including the bridge.


What if anything could I be doing wrong?

tcpdump and wireshark report that the BOOTP replies I am generating are 
well formed.
The write semantics I use are identical to those of the QEMU client at 
the other end.

I've ruled out pfil/firewall filters.

Now, as tap1 has been added to a bridge, it is in promiscuous mode -- 
and because bpf shows the userland-generated frames being sent, I 
believe the check I added for the destination address in if_tap.c can be 
ruled out.


The problem occurs even if I add static entries to the bridge's address 
cache and disable all learning. Both RSTP and STP are disabled.


Thanks for any help you can provide.

cheers
BMS

[P.S. I have noticed that in order to get frames from /dev/tapX, 
non-blocking reads are necessary. My code is single threaded, I use 
select() to block it].
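
For reference, the read pattern described in the P.S. can be sketched as follows in Python. This is an illustration only: read_available is a hypothetical helper, and in the test below an ordinary pipe stands in for /dev/tapX (a tap device hands back one frame per read, a pipe is just a byte stream).

```python
import os
import select


def read_available(fd, timeout, bufsize=2048):
    """Block in select() until fd is readable, then drain it without blocking.

    Returns a list of byte strings, one per successful read. An empty list
    means the timeout expired before anything arrived.
    """
    os.set_blocking(fd, False)          # reads must never block
    readable, _, _ = select.select([fd], [], [], timeout)
    chunks = []
    if fd in readable:
        while True:
            try:
                data = os.read(fd, bufsize)
            except BlockingIOError:
                break                   # nothing more queued right now
            if not data:
                break                   # end of file
            chunks.append(data)
    return chunks
```

The single-threaded server then loops on read_available() and only sleeps inside select().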




Re: Question about ip accounting

2008-04-18 Thread Bruce M. Simpson

Christopher Arnold wrote:

Anyone looking at supporting the netfpga card on FreeBSD?
I would love to do that project myself, my time is scarce right now.


I believe there was some interaction between other XORP members and the 
NetFPGA people, although I don't know if this resulted in any outcome.


cheers
BMS


Re: Looking for a bgp stressing tool

2008-04-18 Thread Bruce M. Simpson

Ingo Flaschberger wrote:


So we are looking for a tool that inject and verify packet with faked
IPs. We want to generate fake traffic between A-B A-C B-C in both 
directions.

The aim is to evaluate the routing capacity of openbgpd/freebsd.

We currently didn't find any tool that fit our needs. Do you have any
suggestion ?


sbgp

you can script this bgp listener/sender.

is hard to find, as it was in the mrtd router package, which is dead 
now.

http://www.filewatcher.com/m/mrtd-2.2.2a.tgz.871976.0.0.html


The regression test framework in XORP is driven by a set of Python 
scripts, I believe it is fully scriptable.


It might also be worthwhile adding BGP message support to PCS:
   http://pcs.sourceforge.net/

I have a lot of patches to go into PCS, gnn@ is pretty busy right now.

cheers
BMS


Re: IGMPv3 support

2008-04-15 Thread Bruce M. Simpson

Martin Garon wrote:

I am looking for a FreeBSD release with IGMPv3 and was surprised to find
none.

I know the KAME project added support for IGMPv3. Anyone knows why this was
not imported back into the current sources? I was wondering if it had
anything to do with reliability or rather with business mumbo-jumbo.
  


I am actively working on this right now. Please see the bms_netdev 
branch in p4 for progress. The code there must be considered pre-alpha, 
it's a development branch.


At the moment I am constructing baseline regression tests to make sure 
that everything works according to spec. It's harder than it looks as 
there are a few places where the delta-based vs the SSM API can lead to 
inconsistency and the specs are not completely unambiguous.


cheers
BMS


Re: problem in if_tap.c

2008-04-15 Thread Bruce M. Simpson

Maksim Yevmenkin wrote:

please try the following patch. if there is no objections, i will commit it

beetle# diff -u if_tap.c.orig if_tap.c
--- if_tap.c.orig	2007-04-05 10:58:39.0 -0700
+++ if_tap.c	2008-04-14 09:42:42.0 -0700
@@ -404,6 +404,7 @@
 	struct ifnet		*ifp = NULL;
 	struct tap_softc	*tp = NULL;
 	unsigned short		 macaddr_hi;
+	uint32_t		 macaddr_mid;
 	int			 unit, s;
 	char			*name = NULL;
 	u_char			 eaddr[6];
@@ -432,8 +433,9 @@
 
 	/* generate fake MAC address: 00 bd xx xx xx unit_no */
 	macaddr_hi = htons(0x00bd);
+	macaddr_mid = (uint32_t) ticks;
 	bcopy(&macaddr_hi, eaddr, sizeof(short));
-	bcopy(&ticks, &eaddr[2], sizeof(long));
+	bcopy(&macaddr_mid, &eaddr[2], sizeof(uint32_t));
 	eaddr[5] = (u_char)unit;
 
 	/* fill the rest and attach interface */
  



This patch looks good, please commit.


[Unless of course we want the autogenerated MAC to be deterministic for 
some reason, but given that it comes from a timer, there's not much 
point in fixing the endianness...]
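
As a rough model of what the patched hunk produces, here is a small Python sketch. fake_tap_mac and its byteorder parameter are hypothetical, introduced only to make the host byte order explicit. The point of the patch, as I read it, is that exactly four bytes are copied to eaddr[2..5]; with sizeof(long) the copy would run past the end of the 6-byte array wherever long is 8 bytes.

```python
def fake_tap_mac(ticks, unit, byteorder="little"):
    """Build 00:bd:xx:xx:xx:<unit> using the byte layout of the patched code."""
    eaddr = bytearray(6)
    # htons(0x00bd): the first two bytes are always 00 bd.
    eaddr[0:2] = (0x00BD).to_bytes(2, "big")
    # (uint32_t) ticks copied in host byte order into eaddr[2..5].
    eaddr[2:6] = (ticks & 0xFFFFFFFF).to_bytes(4, byteorder)
    # The unit number then overwrites the last of those four bytes.
    eaddr[5] = unit & 0xFF
    return ":".join(f"{b:02x}" for b in eaddr)
```

Which three bytes of the tick counter end up in the address depends on the host byte order, which is harmless given that the value is only there to make collisions unlikely.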


cheers
BMS


Re: Howto send a limited broadcast?

2008-04-12 Thread Bruce M. Simpson

tmm wrote:
So, can anyone suggest how I can send a limited broadcast (on an 
interface that has been initialized with an IP and a subnet)?


Use the IP_ONESBCAST option and send to the network broadcast address 
for that subnet. The stack will change it into 255.255.255.255 on 
output. See man page ip(4) for details.
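
The destination to hand to sendto() is the subnet's own broadcast address, which can be worked out from the interface address and netmask. A short Python sketch of that calculation follows; subnet_broadcast is a hypothetical helper name, and setting IP_ONESBCAST itself is left out because it is platform-specific (see ip(4)).

```python
import ipaddress


def subnet_broadcast(address, netmask):
    """Return the network (directed) broadcast address for an interface."""
    iface = ipaddress.IPv4Interface(f"{address}/{netmask}")
    return str(iface.network.broadcast_address)
```

For an interface configured as 192.168.1.10/255.255.255.0 this gives 192.168.1.255; with the option set, the stack rewrites that to 255.255.255.255 on output.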


It's a hack, but it's largely due to how the stack has worked historically.

BMS



Re: Initialising networking protocol

2008-04-05 Thread Bruce M. Simpson

[EMAIL PROTECTED] wrote:

Hi All,

I am working on implementing MPLS in FreeBSD at the moment. I was wondering if 
anyone had some links to any references I could use, or recommend any books I 
can use to help me in that. Failing that, I am struggling with trying to work 
out how to initialise my MPLS protocol in the netisr stack, so the mpls_input 
function I am writing is called when an MPLS packet is received.
  


Seen ayame? http://www.ayame.org/


Re: Initialising networking protocol

2008-04-05 Thread Bruce M. Simpson

Julian Elischer wrote:


Seen ayame? http://www.ayame.org/


looks like a stalled effort.. things stop in 2002


From what I've read of the code, it seems close to KAME 
and BSD style, and could actually get merged. With a little bit more 
work, the userland could slot into XORP's BGP implementation. Of course, 
all this takes time and effort, however I believe Ayame was a working 
example of MPLS in NetBSD, so it's as good a place to start as any.


cheers
BMS


fxp(4) multicast transmission bug.

2008-04-02 Thread Bruce M Simpson

Hi,

I am doing some protocol testing, and I just saw something very odd on 
6.3-RELEASE.


If I try to inject multicast traffic via bpf with fxp, bpf will report 
that it went out OK, however it never makes it out onto the wire. I have 
ruled out firewalls, switches and other layer 2 behaviour.


sysctls look like this:
   dev.fxp.0.int_delay: 1000
   dev.fxp.0.bundle_max: 6
   dev.fxp.0.rnr: 0
   dev.fxp.0.noflow: 1

driver flags look like this when injection is OK:
   fxp0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500

driver flags look like this when injection is NOT OK:
   fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500

... however, if for any reason the group I'm sending to has been joined 
by another process or kernel entity, sending is OK.


My understanding of multicast hash filters was that they worked in only 
one direction -- receive, not send.


However, I see from reading the driver that the fxp chip has certain 
restrictions on how the hash filter is programmed -- the command to do 
so must come before any other descriptors are queued.


That's all well and good, but sending should just work. Further 
reading of the driver suggests that it does nothing special about 
multicast transmission, so that would seem to point the finger at the 
driver firmware or the ASIC itself.


If fxp is behaving differently to 99% of hardware out there, surely this 
needs to be marked in capabilities -- I shouldn't strictly need to 
program the hash filter to send the traffic, only receive. Whilst it's 
something an application is *likely* to do, it doesn't have to do so by 
spec, therefore this behaviour is a bug.


Or is there something I'm missing completely here?

Comments? feedback? suggestions?

cheers
BMS


Re: panic: tcp_addoptions: TCP options too long w/ with TCP_SIGNATURE support

2008-04-01 Thread Bruce M. Simpson

Dontcha just hate broken vendor NAT?

Yes, it seems reasonable that SACK is the sacrificial victim. 
Considering folk normally configure TCP-MD5 between routers which are 
usually directly connected on the same switch, doing away with SACK 
should be fine.


Funny, I was staring at that define moments ago whilst debugging a 
totally unrelated piece of code in a different language.


Good stuff.


Unbreaking igmp with pf.

2008-03-31 Thread Bruce M Simpson

Hi all,

Just to follow up on my message last week.

If I don't hear further feedback, I am likely to commit code which 
allows IP Router Alert options through the pf firewall by default.


For further background read on.

cheers
BMS



The lack of support for allowing the IP Router Alert option (henceforth: RA)
by default in pf is problematic for the widespread deployment of IGMPv3.

It's also bit some people who have been trying to set up multicast capable
routers, even without IGMPv3, as FreeBSD sends RA by default in IGMP and has
done since the 3.x era.

Currently, PF has no capability to parse IP options, and defaults to dropping
traffic which contains them. In day to day deployment, the most used option
is in fact RA.

The meaning of RA is quite simple: all routers on the path must examine the
datagram. It is described in RFC 2113. Currently FreeBSD's forwarding plane
performs no special processing of RA.
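
For readers who have not looked at the option on the wire, here is a small Python sketch of the RA encoding from RFC 2113: option type 148 (0x94), length 4, and a 16-bit value where 0 means "router shall examine packet". The function names are hypothetical.

```python
def decode_option_type(opt_type):
    """Split an IPv4 option type octet into (copied flag, class, number)."""
    return (opt_type >> 7) & 0x1, (opt_type >> 5) & 0x3, opt_type & 0x1F


def router_alert_option(value=0):
    """Encode Router Alert: type 148, length 4, 16-bit value (big-endian)."""
    return bytes([148, 4]) + value.to_bytes(2, "big")
```

The four bytes 94 04 00 00 are the whole option, so an IP header carrying it grows from 20 to 24 bytes.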

Whilst RA came into existence well after IGMP did, RFC 3376 extends the notion of
IGMP to make the use of RA mandatory. It's reasonable to do this, given that
vendor kit is intended to do it. It also helps IGMP snooping switches spot
the group joins. It is also used with MPLS and RSVP.

So what?, I hear you cry. Yes, but if outgoing IGMP is being squelched
at the host, it breaks IP multicasting for everything but the most
trivial cases (i.e. service discovery at 1 hop, pfsync, etc).

Furthermore... if you don't send IGMP for link-scope groups (224.0.0.0/24),
it will break them anyway if the switch is configured to prune link-layer
multicast traffic.

Options:
1. Change default in FreeBSD pf import to ip options enabled.

2. Add code to pf to simply allow the RA option by default.
   [I'm happiest with this one.]

3. Add code to the options path in pf to decode options, if and only if
   options are allowed, and add a mask specifying the allowed values.

   For reference, the IANA list of IP option numbers is here:
   http://www.iana.org/assignments/ip-parameters
   ...most of those are never used in practice. RA is. There are 30
   possibilities specified for an 8-bit-wide space; the minimal mask fits
   in 32 bits; the maximal mask is therefore 256 bits.
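
To make option 3 concrete, here is a hedged Python sketch of the "maximal" variant: a 256-bit allow-list indexed by the full option type octet. OptionMask and options_permitted are hypothetical names; nothing like this exists in pf today. The 32-bit variant would index by the 5-bit option number instead.

```python
class OptionMask:
    """256-bit allow-list indexed by the IPv4 option type octet."""

    def __init__(self, allowed=()):
        self.bits = 0
        for opt_type in allowed:
            self.allow(opt_type)

    def allow(self, opt_type):
        if not 0 <= opt_type <= 255:
            raise ValueError("option type must fit in one octet")
        self.bits |= 1 << opt_type

    def permits(self, opt_type):
        return bool((self.bits >> opt_type) & 1)


def options_permitted(mask, option_types):
    """True only if every option present in the packet is allowed."""
    return all(mask.permits(t) for t in option_types)


IPOPT_RA = 148   # Router Alert
IPOPT_RR = 7     # Record Route
default_mask = OptionMask([IPOPT_RA])
```

With default_mask, a packet carrying only RA passes, while one that also carries Record Route is still dropped.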

There is some overlap between 2 and 3; FreeBSD's kernel only tacks on 4 bytes
to the IP header in outgoing router alert traffic, userland apps may do
different things.

So, if I don't hear more feedback from folk, I am likely to commit code which
implements option 2.


Re: 7.0 - ifconfig create is not working as expected?

2008-03-30 Thread Bruce M. Simpson

Eugene Grosbein wrote:

On Sat, Mar 29, 2008 at 03:43:44PM -0500, Brooks Davis wrote:

  

I was using following command in FreeBSD 6.2:
# ifconfig lo1 create inet 172.16.16.2 netmask 255.255.255.0
In FreeBSD 7.0 I got an error:
# ifconfig lo1 create inet 172.16.16.2 netmask 255.255.255.0
ifconfig: inet: bad value
But it is working split into two commands:
# ifconfig lo1 create
# ifconfig lo1 inet 172.16.16.2 netmask 255.255.255.0
Is this expected behavior or should I file a PR?
  

This is expected.  There's some argument it's wrong, but filing a PR is
unlikely to cause it to change any time soon.



Why? The same with creating gif-tunnel, now I need to invoke ifconfig
twice, once for 'create' and once for other tunnel parameters,
whereas for RELENG_6 this works: 'ifconfig gif0 create tunnel 1.1.1.1 2.2.2.2' 


This breaks existing setups/scripts. This is POLA issue.
Why was it broken?
  


I don't know why or how this has happened, however, given the complexity 
of the command line grammar which ifconfig is expected to parse, our 
choices are limited, unless someone(tm) is willing to come along and 
implement a full parser in ifconfig.


I investigated this some years ago and frankly didn't get anywhere, one 
of the constraints was that Sam wanted to modularize the ifconfig code, 
with a view to future dynamic loading -- as such, this places 
restrictions on the kind of parser which can be used.


There is valid argument that we should not do this, as ifconfig is a 
tool which sits in the base system, and should be kept simple and 
therefore small.


On the other hand, there's also the argument that as ifconfig's syntax 
has grown considerably over the years, that we should go ahead and add a 
parser anyway.


In the absence of a full-blown parser, I'm comfortable with ifconfig 
cloner create being a separate operation, which preferably throws an 
error if other commands are included with it, and understand why these 
limitations apply.


cheers
BMS


CALL FOR FEEDBACK: IGMP and PF interoperability

2008-03-26 Thread Bruce M Simpson
It has come to my attention that the default configuration of PF in 
FreeBSD will block legitimate outgoing IGMP messages.


PF is currently not the default firewall in FreeBSD. Anyone using 
multicast in any way, even for link-scope multicasts (224.0.0.0/24), 
will be affected by this issue if they use PF as their firewall.


This issue was described in this thread:
   http://lists.freebsd.org/pipermail/freebsd-pf/2006-June/002259.html

The documentation does state that allow-opts needs to be specified 
explicitly -- there is no fine grained control for the IPv4 options 
actually filtered, however, and currently the IP Router Alert option is 
handled in the main path in all BSD derived systems.


Please let me know if you have encountered this issue, so that we can 
get started on a workaround.


cheers
BMS


Re: Frequent pauses with Linux-based router

2008-03-17 Thread Bruce M. Simpson

Sean C. Farley wrote:

I have noticed that with a Linux-based Netgear DG834G (DSL modem)
frequent pauses (example[1]) between external systems and 7-STABLE
(March 14th).  At first, I thought it was ipfilter or ipnat, but I took
those out of the picture by activating telnet on the router and
connecting directly to it.  Even running ls /usr/sbin on the router
would pause occasionally.  I did not (or did not recall) have these
problems with 6-STABLE (post 6.2).  I switched out the NIC (FA-311 (sis)
to a FA-310 (dc)), cable and tried different ports on the modem by which
to connect.  I also tried disabling all RFC sysctl's and SACK.  Nothing
helped.

Finally, I brought out an old DSL modem (SpeedStream 5660).  This fixed
the issue.  I think this may be a specific issue between Linux
(2.4.17_mvl21-malta-mips_fp_le) and FreeBSD 7.  Is there anything else I
may test to see what is happening?


OT: Hang on, are you saying you're running a MIPS MALTA targeted Linux 
kernel on a Netgear DG834G? That would be interesting as a test platform 
for FreeBSD/mips, considering the platform support for Malta is already 
there. I had a go at doing the Broadcom Sentry5 SoC last year but hadn't 
finished anything.


Long shot, but are 802.3 pause frames appearing anywhere, ie can you 
test with a crossover cable?
Have you done a BER test with UDP or something like that to try to rule 
out non-TCP protocols?


cheers
BMS



Re: FYI: inpcb/pcbinfo mutex -> rwlock at some point in the mid-distant future

2008-03-12 Thread Bruce M. Simpson

Robert Watson wrote:
One of those issues is that we need to demonstrate to ourselves that 
exclusive access contention is managed as well with rwlocks as with 
sleep mutexes, as these locks would continue to be fairly highly 
contended in TCP.  The other issue is that rwlocks don't support full 
priority propagation for reader access, although Jeff Roberson has 
recently improved fairness to writers with many readers.


Don't forget that p4 bms_netdev contains a number of optimizations for 
the multicast paths -- there are lock acquisitions which are quite often 
unnecessary, or whose granularity is too high for the data structure(s) 
which need to be shared.


BMS


Re: Looking for a guide to extend|adapting the socket framework for NFCIP-1

2008-03-02 Thread Bruce M. Simpson

Hi,

I had to use a search engine to figure out what the acronym NFC was, and 
I assume you mean this:

   http://en.wikipedia.org/wiki/Near_Field_Communication

It helps if you give more background information when asking a more 
general audience for feedback.


zDen wrote:

1) As the NFC device is attached to the USB or UART port, how and where in
the source code can I change the output of the byte-stream packet to the
proper physical port? i.e where is the part of the source code that is
physical device dependent when doing the I/O calls?
  

You really need to roll your own driver framework for this.

Whilst the Bluetooth support sounds like it's the right place to start 
to look for ideas, you're going to have to write your own layering.


I know off the top of my head that the Bluetooth support is able to add 
its own TTY disciplines to serial devices but I couldn't tell you 
specifics, as it's not something I meddle with unless I need to.



2) As the protocol family (PF_xx) and address family (AF_xx) of NFC is not
defined in the socket library, how can I define them and let the default
socket() call return a socket with the customized structure? I can see that
I may need to use SOCK_RAW as the basic socket framework or any other
recommendation?
  


To learn about adding a new socket family to the system, you really need 
to pick up a copy of TCP/IP Illustrated Volume 2 and read Chapter 15 
onwards.


It sounds like you have a fairly involved and challenging software 
project on your hands.


I hope you're being funded by someone to do it, it doesn't sound like 
something a hobbyist would pick up just for the hell of it if it's going 
to be done properly, i.e. beyond a quick hack for demonstration purposes.


cheers
BMS


Re: Ephemeral port range (patch)

2008-03-02 Thread Bruce M. Simpson

+1 on increasing the threshold, 1024 is way too low.

Also consider the folk who depend on the existing behaviour: a 
predictable ephemeral port range is useful, if for some reason you need 
to apply a NAT policy to that traffic, with no other knowledge about how 
the applications you must NAT actually behave.


later
BMS


Re: Routing confusion

2008-02-29 Thread Bruce M. Simpson

Eric Anderson wrote:
I guess my biggest question is, why do the IPs .128, .129, .130, .131 
appear in the routing tables where they're NOT defined?  I don't get it?


You are not seeing forwarding table entries. You are seeing ARP entries 
- the LLINFO flag is set (L). This is a legacy behaviour we haven't done 
away with just yet.


BMS



Re: kern/120958: no response to ICMP traffic on interface configured with a link-local address

2008-02-22 Thread Bruce M. Simpson

I looked at this very briefly.

It's gnarly because in_canforward() is a candidate for inlining and is a 
predicate which is being overloaded with different meanings by 
ip_forward()/ip_input() and icmp_reflect().


So whilst the fix is most likely a 3 liner, it risks making the code 
look crap. We genuinely don't want to forward 169.254.0.0/16 traffic, 
however we genuinely need to reply to ICMP which originates from these 
ranges.
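
One way to picture the split is two predicates that share the common checks and differ only on link-local. The Python sketch below is an illustration under assumptions: can_forward and can_reflect_icmp are hypothetical names, and the list of rejected address classes is a simplification, not a transcription of what in_canforward() actually tests.

```python
import ipaddress

LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def is_unusable(addr):
    """Addresses that are neither forwarded nor answered in this sketch."""
    return (addr.is_multicast or addr.is_loopback
            or addr.is_unspecified or addr.is_reserved)


def can_forward(address):
    """Forwarding decision: link-local traffic must stay on its link."""
    addr = ipaddress.ip_address(address)
    return not is_unusable(addr) and addr not in LINK_LOCAL


def can_reflect_icmp(address):
    """Reply decision: link-local sources are fine to answer."""
    addr = ipaddress.ip_address(address)
    return not is_unusable(addr)
```

The cost in the real code is that the shared part has to live somewhere both callers can reach without losing the inlining.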


[EMAIL PROTECTED] wrote:

Synopsis: no response to ICMP traffic on interface configured with a link-local address

Responsible-Changed-From-To: bms->freebsd-net
Responsible-Changed-By: bms
Responsible-Changed-When: Fri 22 Feb 2008 21:23:23 UTC
Responsible-Changed-Why: 
The secretary disavows all knowledge of your actions.

[Responsible implies I'll fix it, I said no such thing.. I *MIGHT*
get around to it, but Responsible implies there's an obligation.
Cheeky linimon!]

http://www.freebsd.org/cgi/query-pr.cgi?pr=120958
  




Re: IPV6_TCLASS missing from ip6(4)

2008-02-21 Thread Bruce M Simpson

[EMAIL PROTECTED] wrote:

At Wed, 20 Feb 2008 18:25:05 +,
Bruce M Simpson wrote:
  
I just noticed that whilst the socket code appears to support 
IPV6_TCLASS, we don't document it.


I  haven't raised a PR for this issue yet nor have I written a patch.



Please do both :-)
  


Done. TCLASS is in the synopsis, hasn't hit the database yet.

Here's the patch just in case.

[I really wish we used Bugzilla, I shouldn't have to go through the SMTP 
rigmarole... I can log into freefall, but that's a privileged position. 
Also, Mercurial does RSS feeds for commits like git does, further 
removing the need for vulnerable and messy SMTP...]


later
BMS
Index: src/share/man/man4/ip6.4
===
RCS file: /home/ncvs/src/share/man/man4/ip6.4,v
retrieving revision 1.22
diff -u -p -r1.22 ip6.4
--- src/share/man/man4/ip6.4	29 Sep 2006 16:16:41 -	1.22
+++ src/share/man/man4/ip6.4	21 Feb 2008 13:06:20 -
@@ -30,7 +30,7 @@
 .\"
 .\" $FreeBSD: src/share/man/man4/ip6.4,v 1.22 2006/09/29 16:16:41 bms Exp $
 .\"
-.Dd September 29, 2006
+.Dd February 21, 2008
 .Dt IP6 4
 .Os
 .Sh NAME
@@ -147,7 +147,6 @@ The following socket options are support
 .It Dv IPV6_UNICAST_HOPS Fa "int *"
 Get or set the default hop limit header field for outgoing unicast
 datagrams sent on this socket.
-A value of \-1 resets to the default value.
 .\" .It Dv IPV6_RECVOPTS Fa "int *"
 .\" Get or set the status of whether all header options will be
 .\" delivered along with the datagram when it is received.
@@ -312,6 +311,18 @@ The
 routine and family of routines may be used to manipulate this data.
 .Pp
 This option requires superuser privileges.
+.It Dv IPV6_TCLASS Fa "int *"
+Get or set the value of the traffic class field used for outgoing datagrams
+on this socket.
+The value must be between \-1 and 255.
+A value of \-1 resets to the default value.
+.It Dv IPV6_RECVTCLASS Fa "int *"
+Get or set the status of whether the traffic class header field will be
+provided as ancillary data along with the payload in subsequent
+.Xr recvmsg 2
+calls.
+The header field is stored as a single value of type
+.Vt int .
 .It Dv IPV6_RTHDR Fa "int *"
 Get or set whether the routing header from subsequent packets will be
 provided as ancillary data along with the payload in subsequent
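
As a companion to the new IPV6_TCLASS text, a short Python sketch of how the value might be computed and range-checked before being handed to setsockopt(). Both helper names are hypothetical. The traffic class octet is the 6-bit DSCP followed by the 2-bit ECN field.

```python
def traffic_class(dscp, ecn=0):
    """Combine a 6-bit DSCP and a 2-bit ECN value into the 8-bit class."""
    if not 0 <= dscp <= 63 or not 0 <= ecn <= 3:
        raise ValueError("dscp must be 0..63 and ecn 0..3")
    return (dscp << 2) | ecn


def check_tclass_arg(value):
    """Apply the documented range: -1 resets to default, otherwise 0..255."""
    if not -1 <= value <= 255:
        raise ValueError("IPV6_TCLASS value must be between -1 and 255")
    return value
```

The checked value would then be passed as the int argument for IPV6_TCLASS at level IPPROTO_IPV6, on platforms whose socket bindings expose that option.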

Re: How to reinitialize an interface

2008-02-21 Thread Bruce M. Simpson

Paul Schmehl wrote:
--On Thursday, February 21, 2008 11:41:05 -0800 Tony Coon 
[EMAIL PROTECTED] wrote:




I am looking for a way to flush IP addresses, particularly IPv6, from an
interface and have it repeat the initialization process that the interface
goes through on boot, including IPv6 autoconfig.  The service network
restart in Linux seems to do this.

/etc/rc.d/netif restart


...none of which will completely blow away the IPv6 stack state, which 
seems to be what the querent wants. Some refactoring is needed in the 
kernel to support this. IPv4 has the same problem, there's no way to 
administratively blow away certain structures and reinitialize them.


later
BMS


IPV6_TCLASS missing from ip6(4)

2008-02-20 Thread Bruce M Simpson
I just noticed that whilst the socket code appears to support 
IPV6_TCLASS, we don't document it.


I  haven't raised a PR for this issue yet nor have I written a patch.

This came up when I started hacking support for setting IP_TOS into 
something else.


cheers
BMS


Re: Multiple default routes on multihome host

2008-02-20 Thread Bruce M. Simpson

Wes Peters wrote:
I see a number of people have replied to this message offering 
solutions of how to accomplish your migration, using a variety of 
tools available to you in FreeBSD.  I've always found this community 
very supportive in this fashion, and I'm glad they've jumped in to 
help you in your transition as well.  Please note that the variety of 
solutions presented recognize that your transition period is just 
that, a temporary situation, and that multiple default routes is not 
the solution.


The thing is, in a peer-to-peer or ad-hoc mesh network, not having 
access to a single next-hop serving as the gateway of last resort has a 
much higher probability of occurring than in a fully converged network 
with more deterministic layer 3 behaviour.


So we're largely arguing apples vs oranges here. Fact of the matter is, 
we can't tell people how to run their networks, or which protocols to 
run. People want IP everywhere and they want it now. (Infinite demand 
for free goods is another story.)


The argument that functionality should not be present because people
should not run their networks that way holds no water --
particularly so when issues of wireless presence and ad-hoc networks
blow the old assumptions out of the water.


later
BMS


Re: 7.0 Link-Local Addresses

2008-02-20 Thread Bruce M. Simpson

James Snow wrote:

I'm trying to use link-local for the cross-over interface between a pair
of FreeBSD boxes running pf, pfsync, and CARP.  These firewalls will
need to be able to route for the whole of RFC1918, and carving off a
piece of that address space isn't an option.

This seemed to be a perfect scenario for link-local addresses until I
ran into the above problem.  RFC 3927 states, in section 1.6 (Alternate
Use Prohibition):

Note that addresses in the 169.254/16 prefix SHOULD NOT be
configured manually

So I'm not sure if this is a bug or just RFC compliance. 
  


Based on the information you provide, I can't see why datagrams to
169.254.1.1 are being dropped.


I did introduce some checks into the mainline code which will prohibit
the use of link-local addresses for forwarding, but these should not
affect reception as an endpoint.
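
To make the distinction concrete, here is a rough sketch in Python of
the intended behaviour. It is not the kernel code; the function names
and the "deliver"/"drop"/"forward" labels are hypothetical. The point
is only that local delivery is decided before the link-local check,
so an endpoint holding a 169.254/16 address still receives traffic.

```python
import ipaddress

# 169.254/16, the IPv4 link-local prefix from RFC 3927
LINK_LOCAL_V4 = ipaddress.ip_network("169.254.0.0/16")


def is_link_local(addr: str) -> bool:
    """True if addr is an IPv4 address inside 169.254/16."""
    return ipaddress.IPv4Address(addr) in LINK_LOCAL_V4


def classify(src: str, dst: str, local_addrs: set) -> str:
    """Hypothetical input decision for one datagram."""
    if dst in local_addrs:
        # Reception as an endpoint is not affected by the check.
        return "deliver"
    if is_link_local(src) or is_link_local(dst):
        # Link-local traffic must never be forwarded off the link.
        return "drop"
    return "forward"


if __name__ == "__main__":
    mine = {"169.254.1.1", "10.0.0.1"}
    print(classify("169.254.1.2", "169.254.1.1", mine))
    print(classify("169.254.1.2", "192.168.5.5", mine))
    print(classify("10.0.0.7", "192.168.5.5", mine))
```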


However, you should be just fine manually configuring 169.254/16 
addresses for the time being. Whilst it isn't in accordance with the 
letter of the RFC as you correctly point out, there are situations where 
it's useful.


The stack does NOT currently support source address selection policies.
These were introduced in NetBSD. Currently in FreeBSD, source address
selection is based solely on the destination address.
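
One way to see which source address the stack picks for a given
destination is to connect() a UDP socket and read back the local
address. This is only a sketch in Python; source_for() is a made-up
helper name and the port number is arbitrary. Connecting a UDP socket
sends no packets; it just asks the kernel to bind a local address
suitable for that destination.

```python
import socket


def source_for(dest: str, port: int = 9) -> str:
    """Return the local IPv4 address the stack selects for dest."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # No traffic is generated; the kernel only resolves the route
        # and fills in the local address of the socket.
        s.connect((dest, port))
        return s.getsockname()[0]


if __name__ == "__main__":
    for dest in ("127.0.0.1", "169.254.1.1"):
        try:
            print(dest, "->", source_for(dest))
        except OSError as exc:
            print(dest, "-> no route:", exc)
```

Running it against the peer's 169.254/16 address should show whether
the link-local address on the cross-over interface is being chosen.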


cheers
BMS

