[B.A.T.M.A.N.] Understanding MAC addresses in iptables logs
Hello folks,

there appears to be some misconfiguration in our network. A gateway is blocking unknown IP addresses:

[658047.514011] FORWARD DROPPEDIN=bat0 OUT=backbone MAC=3a:81:5b:64:fa:32:08:fc:88:9b:8a:60:08:00:45:00:00:4f:6c:b1:40:00:3f:06:b8:8e:0a:a6 SRC=10.166.28.69 DST=173.194.65.188 LEN=79 TOS=0x00 PREC=0x00 TTL=63 ID=27825 DF PROTO=TCP SPT=45173 DPT=5228 WINDOW=9131 RES=0x00 ACK PSH URGP=0

[658047.519455] FORWARD DROPPEDIN=bat0 OUT=backbone MAC=3a:81:5b:64:fa:32:08:fc:88:9b:8a:60:08:00:45:00:00:34:6c:b2:40:00:3f:06:b8:a8:0a:a6 SRC=10.166.28.69 DST=173.194.65.188 LEN=52 TOS=0x00 PREC=0x00 TTL=63 ID=27826 DF PROTO=TCP SPT=45173 DPT=5228 WINDOW=9131 RES=0x00 ACK FIN URGP=0

I'm somewhat confused by the MAC address here: it is very long. Can I somehow derive which originator or client is propagating or using this address?

Greetz, Jan
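A note on reading that field: the MAC= value in a kernel LOG line is not a single address but a dump of the link-layer header bytes. On Ethernet the first 14 octets are the destination MAC, the source MAC and the EtherType. In the lines above, 28 octets are printed; the extra 14 line up with the start of the IPv4 header (0x004f = 79 = LEN, 0x6cb1 = 27825 = ID, 0x3f = 63 = TTL, 0x06 = TCP). Why bat0 yields the extra bytes is not settled here. The following is a minimal sketch for splitting the field; the function name `split_log_mac` is made up for this example.

```python
def split_log_mac(field: str) -> dict:
    """Split the MAC= value of an iptables LOG line into its parts.

    Assumes an Ethernet-style header: 6 octets destination MAC,
    6 octets source MAC, 2 octets EtherType. Anything after that
    is returned untouched as 'extra'.
    """
    octets = field.strip().split(":")
    if len(octets) < 14:
        raise ValueError("expected at least 14 octets, got %d" % len(octets))
    return {
        "dst": ":".join(octets[0:6]),
        "src": ":".join(octets[6:12]),
        "ethertype": "".join(octets[12:14]),
        "extra": octets[14:],
    }


if __name__ == "__main__":
    mac = ("3a:81:5b:64:fa:32:08:fc:88:9b:8a:60:08:00:"
           "45:00:00:4f:6c:b1:40:00:3f:06:b8:8e:0a:a6")
    parts = split_log_mac(mac)
    print("destination:", parts["dst"])
    print("source:     ", parts["src"])
    print("ethertype:   0x" + parts["ethertype"])
    print("extra bytes:", len(parts["extra"]))
```

With the source MAC isolated, the client can probably be looked up in the batman-adv translation tables; whether that MAC belongs to a mesh client or to something else in this setup is an assumption that would need checking on the gateway.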
Re: [B.A.T.M.A.N.] alfred: IPC issues
Hello,

thanks for your feedback.

On 08/07/2014 10:37 AM, Simon Wunderlich wrote:

Hello Jan,

Hei folks, I'm using alfred 2014.0.0 on OpenWRT Barrier Breaker. Alfred is running:

    root@6466b34ffcac:~# ps -w | grep alfred
    1718 root 1100 S /usr/sbin/alfred -i br-freifunk -b bat0

But cannot be queried using the client:

    root@6466b34ffcac:~# alfred -r 94
    can't connect to unix socket: Connection refused

What's wrong here? (...)

I don't see a problem right from your configuration. Can you please check:

* do you see the socket file /var/run/alfred.sock after starting alfred?

Looks nice:

    root@6466b34ffcac:~# ls -lh /var/run/alfred.sock
    srwxr-xr-x 1 root root 0 Aug 6 20:47 /var/run/alfred.sock

* does the same problem happen if you restart alfred or start it manually?

/etc/init.d/alfred restart made it disappear. For some reason, this is persistent:

* Before restarting, alfred was unusable after rebooting the machine
* After restarting once, alfred is usable, even after reboots

Greetz, Jan
Re: [B.A.T.M.A.N.] alfred: IPC issues
Hello,

On 08/07/2014 10:37 AM, Simon Wunderlich wrote:

Hello Jan,

Hei folks, I'm using alfred 2014.0.0 on OpenWRT Barrier Breaker.

I'm not able to reproduce this issue using 2014.3.0. I'm not certain whether 2014.3.0 is the reason it went away.

Greetz, Jan
[B.A.T.M.A.N.] Running alfred as non-root user?
Hello,

every time I try to start alfred 2014.3.0 using a dedicated non-root user, it fails with:

    can't bind unix socket: Address already in use

Running as root is successful. Is it possible to start alfred as non-root?

Thanks, Jan
[B.A.T.M.A.N.] alfred: IPC issues
Hei folks,

I'm using alfred 2014.0.0 on OpenWRT Barrier Breaker. Alfred is running:

    root@6466b34ffcac:~# ps -w | grep alfred
    1718 root 1100 S /usr/sbin/alfred -i br-freifunk -b bat0

But cannot be queried using the client:

    root@6466b34ffcac:~# alfred -r 94
    can't connect to unix socket: Connection refused

What's wrong here? Configuration is:

    config 'alfred' 'alfred'
        option interface 'br-freifunk'
        option mode 'server'
        option batmanif 'bat0'
        option start_vis '0'
        option run_facters '0'

Thanks,
Keep smiling
yanosz
Re: [B.A.T.M.A.N.] Fwd: Re: increasing BATADV_FRAG_MAX_FRAGMENTS
Hello,

On 07/01/2014 11:21 AM, elektra wrote:

Hi -

During an experiment it was found, that BATADV_FRAG_MAX_FRAGMENTS=16 allows ~72 clients to be connected to one single access point.

would you mind sharing some details regarding the experiments you performed? The reason for this question is that there is no 72 client limit we know of (unless fragmentation was disabled).

you are very likely looking in the wrong direction. The number of clients allowed to connect to an AP is configurable and 72 is already way beyond what I would consider useful in real life - if your clients are actually supposed to be able to communicate unicast traffic. Since you are probably using OpenWRT for your tests, look at the configuration parameter maxassoc: http://wiki.openwrt.org/doc/uci/wireless I don't know the default that OpenWRT uses if you don't set it explicitly. But there is a default limit set somewhere (take a look at hostapd.conf) and you might have discovered that by pushing it to the limit. It might as well be the limitation of your WiFi driver / hardware, though.

After a lively discussion on IRC, I'd like to finalize this thread. To summarize where we are so far:

- batman-adv is not restricted to 72 clients. Pre-2014 versions (which have been used here) support 200 non-clients in their tt-table.
- Initially posted by g3ntleman on the Freifunk-KBU mailing list, an observation (why can we handle 72 clients) became a rumor (there might be a limit in batman-adv), which became a fact (we observed a limit in batman-adv), without any support.
- g3ntleman's setup was not meant to be an experiment in a sane setting. It was an emergency fix for a broken-down conference wifi, which nevertheless showed some interesting findings. I hope that we can document this for future reference, but the relevance of the findings in general is probably rather poor.
- The aftermath of the discovery was motivated by the idea of running an experiment at this year's FrOSCon. If you'd like to join us there, you're welcome :-).
- It's still unclear what we observed in detail, and we'll probably never know for sure. The reason is simple: no measurements / logs were taken and we cannot reproduce the setting.
- To emphasize this: it's neither evident nor even probable that any property of batman-adv is a bottleneck in this scenario.
- The setup used was not meant to be suitable for a conference wifi or even high-density wifi coverage. It suffers from obvious design flaws. g3ntleman just remembered having his box with him the moment the conference wifi was down. There was not a single thought spent on scaling issues here.
- I don't know if 72 clients on a single wifi channel is sane or not (Ubiquiti claims that they can handle 100), or if pre-802.11n experiences (aka gut feelings) are applicable. However, getting a gut feeling is the idea for this year's FrOSCon. Thus: join, if you're interested.

I'd like to apologize for all the anger, frustration or hassle generated by this discussion, and I'd like to thank marec and ordex for their helpful comments. Keep calm, carry on, and thanks for your commitment to batman-adv.

Greetz, Jan
Re: [B.A.T.M.A.N.] DHCP-Monitoring of Gateways
Hello,

On 06/16/2014 08:14 AM, Marek Lindner wrote:

Hi,

we're running batman 2013.4 with multiple gateways. Because of that, we'd like to monitor the availability of the DHCP server on each gateway. I noticed that DHCP requests / discovers (even unicast ones) go unanswered if

- they're sent locally on a gateway
- they're sent to a gateway not chosen (according to batctl gwl) by the monitoring host.

What's the best way of monitoring all DHCP servers? Is there a way to set up a monitoring node that can reach all DHCP servers? If not, how can I request an IP address locally?

if you don't wish batman to interfere with your DHCP requests I suggest to disable the gateway mode.

thanks for your feedback. I'd like to keep the gateway mode enabled for regular clients. Is it sufficient to disable the gateway mode on the monitoring node?

Thanks, Jan
[B.A.T.M.A.N.] DHCP-Monitoring of Gateways
Hello,

we're running batman 2013.4 with multiple gateways. Because of that, we'd like to monitor the availability of the DHCP server on each gateway. I noticed that DHCP requests / discovers (even unicast ones) go unanswered if

- they're sent locally on a gateway
- they're sent to a gateway not chosen (according to batctl gwl) by the monitoring host.

What's the best way of monitoring all DHCP servers? Is there a way to set up a monitoring node that can reach all DHCP servers? If not, how can I request an IP address locally?

Thanks, Greetz, Jan
[B.A.T.M.A.N.] DAT (DNT) IPv6?
Hello,

just wondering: is there a DAT implementation for IPv6? A distributed neighbor table would certainly be nice :)

Greetz, Jan
[B.A.T.M.A.N.] On compat version 15 in the Freifunk-KBU network
Hello folks,

I'd like to share some thoughts on the upcoming batman-adv release, especially on compat version 15. This somewhat summarizes my discussion with T_X at the wireless community weekend (WCW -- 2013-05-10 - 2013-05-12).

As some of you may know already, we're running a small freifunk network in western Germany (Köln-Bonn area): about 100 nodes in total, ~40 of them online at the moment. It's not my intention to bash batman-adv in general or to start flame wars (like batman-adv vs. olsr or something else). I'd like to provide some statements on the impact of the upcoming protocol changes.

1. The upcoming protocol change appears to be painful - we have no suitable migration strategy.

Compat versions 14 and 15 will be incompatible. Nodes will lose mesh connectivity (to older nodes). Since we cannot upgrade all nodes at once, we'll have to run different networks in parallel. In fact, we're running two networks at the moment (compat 13 and 14). We aren't able to upgrade the existing compat 13 nodes in the next months, so we'll probably have some compat 13 ones for at least half a year. There is no way of telling newer nodes to use compat version 13. Since some supernodes (dedicated servers) are part of the batman-adv cloud, we need twice the servers as well. Upgrading to compat version 15 will require a huge amount of work: new infrastructure (servers) must be deployed, and nodes meshing with each other must be upgraded in parallel.

2. Backwards compatibility doesn't seem to be a design target.

Other routing protocols (BGP-4, for example) provide decent ways for protocol extension while still providing backwards compatibility with the version specified 7 years ago. Looking at batman-adv, a lot of protocol changes have been introduced in the past years, but there is no backwards compatibility. E.g.:

- There will be no flag for using compat version 14 in newer versions of batman-adv once 15 is out.
- Nodes using 14 and 15 cannot mesh.

Since batman-adv depends on the kernel (old batman-adv versions won't build with recent kernels), using 14 will not be an option in a few months: if some router hardware requires recent kernel / OpenWRT releases, version 14 might no longer be usable.

3. Conclusion

- Freifunk KBU will not use compat version 15 in the foreseeable future.
- We're aware that we cannot use version 14 in some months.
- At the point when 14 becomes unusable (15 introduced into the OpenWRT kernels / releases we need, or Debian), we will almost certainly discontinue batman-adv and go on with something else. (Due to kernel deps it's easier to run batman-adv / olsr in parallel than two different versions of batman-adv.)
- We'd be very happy if someone forks batman-adv to provide modules for recent kernels still using version 14.
- Maybe some future version of batman-adv will address backwards compatibility. Maybe in 2018 we will have a look at batman-adv again, notice that batman-adv remained stable (or migratable) over the last years, and start using it again. Maybe batman-adv will fit our needs then.

Thanks for your time,
greetz, yanosz
Re: [B.A.T.M.A.N.] Advice for dense meshes
Hello,

On 22.01.2013 at 13:54, Steve Song wrote:

A colleague of mine is building a batman-adv mesh network in an apartment building with essentially one node per apartment. Not surprisingly, this results in a very dense mesh with each node having a large number of neighbours. Here is a typical batctl o output: http://pastebin.com/aAR43hj7 This results in some fairly slow connections. I am seeking some general advice on how to optimise batman-adv in the context of a dense mesh. Options that we have considered include turning the radio transmit power down on all of the devices and/or alternating channels on different floors (e.g. 1,6,11,1,6,11,etc). However, it is not clear to us what is the best strategy in this context. Grateful for any tips or suggestions you may have.

using different channels is always a good idea when it comes to crowded places. Another option is to use 5 GHz for the mesh and 2.4 GHz for clients. Another option is to use fewer nodes: one node per apartment might be too much. Turning down the transmit power can help, but might have the opposite effect (interference vs. transmission range).

Another rule, imho the most important one, is to limit hops. Each hop limits the total throughput: when using the same wireless channel, a hop cannot forward data while receiving. It has to wait.

To summarize:

- Minimize the (expected) hop count.
- Use as many channels as possible (incl. 5 GHz).
- Use as few nodes per channel as needed to provide coverage.

Keep smiling
yanosz
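To put a rough number on the hop rule: below is a minimal sketch of a deliberately crude model, assuming that all hops of a path share one channel and that only one of them can transmit at a time. Real links behave worse (interference, retransmissions, differing rates), so treat the result as an upper bound. The function name `shared_channel_throughput` is made up for this illustration.

```python
def shared_channel_throughput(link_rate_mbit: float, hops: int) -> float:
    """Upper-bound end-to-end throughput over `hops` hops on ONE channel.

    Assumption (not measured): every hop has the same link rate and the
    hops take turns on the air, so the rate is divided by the hop count.
    """
    if hops < 1:
        raise ValueError("hops must be at least 1")
    return link_rate_mbit / hops


if __name__ == "__main__":
    # Example link rate of 54 Mbit/s is an arbitrary illustration value.
    for hops in (1, 2, 3, 4):
        print(hops, "hop(s):", shared_channel_throughput(54.0, hops), "Mbit/s")
```

Under this model, putting consecutive hops on different channels (or on 5 GHz) removes the turn-taking between them, which is why the channel advice and the hop advice go together.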
Re: [B.A.T.M.A.N.] Understanding gateway-mode - why do nodes have a sticky gateway
Hello world! I'm back:

On 16.01.2013 at 02:42, NicoEchániz wrote:

I wanted to share this proposal we arrived at after discussion with some AlterMundi hackers, so we can discuss it during our future IRC session. I've previously shared it with yanosz, who had some observations that he can better explain, but agreed on this initial assumption which triggered the proposal:

I'll be available on IRC this evening (20:00 CET). Sadly, I haven't found any other contributors in our freifunk community.

The scenario we see in our networks is that over a certain link quality which is considered acceptable, we want the clients to choose the gw with better bandwidth. So if for example this quality floor is TQ 100, then if a gw has 6Mbit/s advertised b/w and another has 3Mbit/s, the clients that see these gateways with a TQ above 100 will choose the faster one between them. We observed that in the current implementation, advertised gateway throughput is used to modify the final gw selection by publishing unrealistic bandwidth. The proposal tries to fix this, as well as the dynamic switching for selection class 1. Looking at the current code involved we also believe it would allow to make the implementation simpler. These would be the proposed options:

gw_sel_class [1,2]
    1 will consider gw throughput, 2 will only consider TQ. When using selection class 1, clients will switch gateways if one with better throughput becomes available and reachable with a TQ above gw_tq_floor (see below). Defaults to 2.

gw_tq_floor
    Only relevant for gw_sel_class 1. Above this TQ floor, the gw with the best advertised throughput will be chosen.* Defaults to 100(?)

gw_tq_threshold
    TQ delta that triggers a gw switch in the client. If gw_sel_class is 1, the tq_threshold will only be considered to choose between two or more gateways advertising the same winning throughput on the net. Defaults to 20.

Well, I still think that having a fixed gw_tq_floor may cause unstable gateway selections if TQ oscillates around gw_tq_floor. Perhaps we can discuss this in detail.

Thanks,
Keep smiling
yanosz
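To make the proposed options easier to discuss, here is a minimal sketch of how a client might apply them. It is not batman-adv code: the `Gateway` type, the function `pick_gateway`, and the fallback to TQ-only selection when no gateway is above the floor are assumptions made for this illustration (the proposal's footnote on that case is not part of this mail).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Gateway:
    mac: str
    tq: int        # link quality towards the gateway, 0..255
    mbit: float    # advertised throughput


def pick_gateway(gateways: Sequence[Gateway],
                 current: Optional[Gateway] = None,
                 gw_sel_class: int = 2,
                 gw_tq_floor: int = 100,
                 gw_tq_threshold: int = 20) -> Optional[Gateway]:
    """Sketch of the proposed selection; names mirror the proposed options."""
    if not gateways:
        return None
    candidates = list(gateways)
    if gw_sel_class == 1:
        above = [g for g in gateways if g.tq > gw_tq_floor]
        if above:
            # Above the floor, only the best advertised throughput counts.
            best_bw = max(g.mbit for g in above)
            candidates = [g for g in above if g.mbit == best_bw]
        # Assumption: with nothing above the floor, fall back to TQ only.
    best = max(candidates, key=lambda g: g.tq)
    # Stay with the current gateway unless the TQ gain reaches the threshold.
    if (current is not None and current in candidates
            and best.tq - current.tq < gw_tq_threshold):
        return current
    return best


if __name__ == "__main__":
    slow = Gateway("aa:31:0e:4a:0f:1d", tq=200, mbit=3.0)
    fast = Gateway("72:8b:06:40:61:2e", tq=150, mbit=6.0)
    print(pick_gateway([slow, fast], gw_sel_class=1).mac)  # faster gateway
    print(pick_gateway([slow, fast], gw_sel_class=2).mac)  # better TQ
```

The oscillation concern shows up directly in this sketch: a gateway whose TQ hovers around gw_tq_floor enters and leaves the `above` list on every evaluation, so the result can flip each time unless some hysteresis is added to the floor as well.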
Re: [B.A.T.M.A.N.] Understanding gateway-mode - why do nodes have a sticky gateway
Hello,

On 09.01.2013 at 07:30, Marek Lindner wrote:

On Wednesday, January 09, 2013 00:35:17 NicoEchániz wrote:

On 01/08/2013 03:15 AM, Marek Lindner wrote:

May I propose that we schedule an IRC discussion for this topic? I lost track of what you are actually trying to achieve and I see things getting more and more complicated. Whatever idea you come up with keep in mind that somebody has to do the following before this solution can be merged:

* batman-adv code has to be written, tested, debugged
* batman-adv kernel doc has to be written
* batctl code has to be written, tested, debugged
* man page has to be updated
* user documentation for $your_new_mechanism has to be written

Unless you manage to convince somebody else to do the work for you that somebody who has to do all work will be you. In other words: It is in your best interest to keep things as simple as possible. :-)

agreed. When would it be a good time for an IRC discussion on this matter?

I am idling on IRC most of the time. Depends on you guys.

Sorry for answering late. We'll have our freifunk community meeting tomorrow evening and we're going to talk about that. For our IRC meeting, I'd prefer some day next week, if that's ok for you.

Thanks,
Keep smiling
yanosz
Re: [B.A.T.M.A.N.] Understanding gateway-mode - why do nodes have a sticky gateway
Hello,

On 09.01.2013 at 18:39, NicoEchániz wrote:

On 01/09/2013 08:49 AM, Jan Lühr wrote:

Hello,

On 09.01.2013 at 07:30, Marek Lindner wrote:

On Wednesday, January 09, 2013 00:35:17 NicoEchániz wrote:

On 01/08/2013 03:15 AM, Marek Lindner wrote:

May I propose that we schedule an IRC discussion for this topic? I lost track of what you are actually trying to achieve and I see things getting more and more complicated. Whatever idea you come up with keep in mind that somebody has to do the following before this solution can be merged:

* batman-adv code has to be written, tested, debugged
* batman-adv kernel doc has to be written
* batctl code has to be written, tested, debugged
* man page has to be updated
* user documentation for $your_new_mechanism has to be written

Unless you manage to convince somebody else to do the work for you that somebody who has to do all work will be you. In other words: It is in your best interest to keep things as simple as possible. :-)

agreed. When would it be a good time for an IRC discussion on this matter?

I am idling on IRC most of the time. Depends on you guys.

Sorry for answering late. We'll have our freifunk community meeting tomorrow evening and we're going to talk about that. For our IRC meeting, I'd prefer some day next week, if that's ok for you.

I'm also idling on IRC usually; next week is ok; Jan, what about Monday or Tuesday?

thanks for your reply. I'll propose a day as soon as I know who might be joining us.

Keep smiling
yanosz
Re: [B.A.T.M.A.N.] Understanding gateway-mode - why do nodes have a sticky gateway
Hello, Am 05.01.2013 um 10:47 schrieb Jan Lühr: Am 05.01.2013 um 00:28 schrieb NicoEchániz: On 01/04/2013 02:41 PM, Marek Lindner wrote: On Friday, January 04, 2013 23:12:59 NicoEchániz wrote: Selection class 3 and beyond do change the gateway when a better one becomes available. But as you correctly pointed out these selection classes don't consider the announced bandwidth for the simple reason that nobody cared. In most wireless scenarios (the main playground for batman-adv) the path towards the gateway turned out to be more critical than the gateway's pipe to the internet. Feel free to propose something if you care about having this option. The necessary code shouldn't be too hard to add. Backward compatibility will be more difficult to achieve. I believe that those of us who chose to use selection class 1 would prefer if it were dynamic. So if a new gateway appears, then the client would evaluate with the same previous criteria if it is better or not (considering the gateway's advertised throughput as well as the link quality) and switch accordingly. I see no reason why when someone chooses selection class 1 she would be expecting to choose the best available gw once and not ever check again. If a network is stable, when a new better gw appears in some area it won't be selected until nodes are restarted, and that might be a long time. You misread my statement. Nobody needs to be convinced of the usefulness of such a feature (at least for what concerns myself). Somebody needs to come up with a solid proposal which then needs to be implemented. That is what I meant by saying Feel free to propose something [..]. One other change that might be interesting would be the addition of a setting for how much the advertised throughput affects gw selection. 
I've seen Jan uses 96/96Mbit as advertised throughput on one router; I do the same on our main gateway, but maybe it would be better if we could actually advertise the real throughput and have a setting to control how much the bandwidth difference affects the selection.

Again, feel free to propose something which can be discussed.

It could just be implemented as a new setting like gw_bw_weight, which would regulate how much gw bandwidth/throughput affects best gw calculation. I don't know what's the current algorithm but I guess there would be no problem with backwards compatibility here; if the value is not set, the default should produce the same result we get now. So to make the proposal clear, I'd change options for gw_mode client to something like this:

gw_sel_class [1|2]
    1 considers TQ and bw, 2 only considers TQ.

gw_tq_threshold [3...256]
    The minimum TQ delta to switch to a better gw, defaults to 20.

gw_bw_weight [value from 0 to 1]
    How much does advertised gw bandwidth affect the selection. 0 does not affect selection, 1 has most effect. Defaults to 1 but has no effect if gw_sel_class is 2.

For backwards compatibility, if gw_sel_class > 2, it should behave as the proposed option 2 with gw_tq_threshold set to the actual value. So, for example gw_sel_class 20 would be translated to: gw_sel_class 2, gw_tq_threshold 20, at least until it's deprecated.

I guess we'll be fine with this. However, it's not clear to me in what way (formula) gw_bw_weight will actually influence the selection process. Do you have any proposals?

Ok, I'll make one, for link usability (or attractiveness, or whatever you like) LU, gateway class GC and TQ:

    LU = (1 - gw_bw_weight) * TQ + gw_bw_weight * GC

gw_tq_threshold should then be renamed to gw_lu_threshold. However, that implies that TQ and GC enter LU linearly and that min(TQ) = min(GC) as well as max(TQ) = max(GC). I'd be fine with assuming that TQ and GC are linear in LU; GC can be scaled such that the min / max condition holds:

    LU = (1 - gw_bw_weight) * TQ + gw_bw_weight * (c1 * GC + c2)

(for constants c1, c2)

@Marek, Nico: What do you think?

Keep smiling
yanosz
Re: [B.A.T.M.A.N.] Understanding gateway-mode - why do nodes have a sticky gateway
Hello, Am 05.01.2013 um 00:28 schrieb NicoEchániz: On 01/04/2013 02:41 PM, Marek Lindner wrote: On Friday, January 04, 2013 23:12:59 NicoEchániz wrote: Selection class 3 and beyond do change the gateway when a better one becomes available. But as you correctly pointed out these selection classes don't consider the announced bandwidth for the simple reason that nobody cared. In most wireless scenarios (the main playground for batman-adv) the path towards the gateway turned out to be more critical than the gateway's pipe to the internet. Feel free to propose something if you care about having this option. The necessary code shouldn't be too hard to add. Backward compatibility will be more difficult to achieve. I believe that those of us who chose to use selection class 1 would prefer if it were dynamic. So if a new gateway appears, then the client would evaluate with the same previous criteria if it is better or not (considering the gateway's advertised throughput as well as the link quality) and switch accordingly. I see no reason why when someone chooses selection class 1 she would be expecting to choose the best available gw once and not ever check again. If a network is stable, when a new better gw appears in some area it won't be selected until nodes are restarted, and that might be a long time. You misread my statement. Nobody needs to be convinced of the usefulness of such a feature (at least for what concerns myself). Somebody needs to come up with a solid proposal which then needs to be implemented. That is what I meant by saying Feel free to propose something [..]. One other change that might be interesting would be the addition of a setting for how much the advertised throughput affects gw selection. 
I've seen Jan uses 96/96Mbit as advertised throughput on one router; I do the same on our main gateway, but maybe it would be better if we could actually advertise the real throughput and have a setting to control how much the bandwidth difference affects the selection.

Again, feel free to propose something which can be discussed.

It could just be implemented as a new setting like gw_bw_weight, which would regulate how much gw bandwidth/throughput affects best gw calculation. I don't know what's the current algorithm but I guess there would be no problem with backwards compatibility here; if the value is not set, the default should produce the same result we get now. So to make the proposal clear, I'd change options for gw_mode client to something like this:

gw_sel_class [1|2]
    1 considers TQ and bw, 2 only considers TQ.

gw_tq_threshold [3...256]
    The minimum TQ delta to switch to a better gw, defaults to 20.

gw_bw_weight [value from 0 to 1]
    How much does advertised gw bandwidth affect the selection. 0 does not affect selection, 1 has most effect. Defaults to 1 but has no effect if gw_sel_class is 2.

For backwards compatibility, if gw_sel_class > 2, it should behave as the proposed option 2 with gw_tq_threshold set to the actual value. So, for example gw_sel_class 20 would be translated to: gw_sel_class 2, gw_tq_threshold 20, at least until it's deprecated.

I guess we'll be fine with this. However, it's not clear to me in what way (formula) gw_bw_weight will actually influence the selection process. Do you have any proposals?

Thanks
Keep smiling
yanosz
Re: [B.A.T.M.A.N.] Packet-Loss-Peaks in a Freifunk-Network
Hi Linus,

nice to hear from you again.

On 04.01.2013 at 06:34, Linus Lüssing wrote:

What were the exact batman-adv versions at the time you made those graphs for the shown nodes as well as any intermediate nodes (although if I got your setup right then the nodes in those graphs do not have any intermediate batman-adv hops involved, do they?)?

Does it make a difference if you make the entries in the neighbor solicitation table permanent (e.g. with 'ip -6 neigh')?

Do the same spikes appear with a 'batctl ping' (or if adding such graphs is quite a hassle then at least, is the overall 'batctl ping' packet loss similar to the one from the ip ping)?

Do you see any "Changing route towards" events in the batman-adv logs between these supposedly always direct neighbors - if yes, maybe with a similar frequency? Or even better, check whether the same packet loss appears when isolating any of such pairs of nodes so that no batman-adv route changes could happen at all.

For some reason, the spikes disappeared around midnight, and I don't know why. Maybe batman-adv converged to a different plan (or converged at all), or the Flying Spaghetti Monster wants our network to work nowadays. I'll go on inspecting this issue, but I cannot answer any of your questions. :-(

Keep smiling
yanosz
Re: [B.A.T.M.A.N.] Understanding gateway-mode - why do nodes have a sticky gateway
Hello, Am 04.01.2013 um 18:41 schrieb Marek Lindner: On Friday, January 04, 2013 23:12:59 NicoEchániz wrote: Selection class 3 and beyond do change the gateway when a better one becomes available. But as you correctly pointed out these selection classes don't consider the announced bandwidth for the simple reason that nobody cared. In most wireless scenarios (the main playground for batman-adv) the path towards the gateway turned out to be more critical than the gateway's pipe to the internet. Feel free to propose something if you care about having this option. The necessary code shouldn't be too hard to add. Backward compatibility will be more difficult to achieve. I believe that those of us who chose to use selection class 1 would prefer if it were dynamic. So if a new gateway appears, then the client would evaluate with the same previous criteria if it is better or not (considering the gateway's advertised throughput as well as the link quality) and switch accordingly. I see no reason why when someone chooses selection class 1 she would be expecting to choose the best available gw once and not ever check again. If a network is stable, when a new better gw appears in some area it won't be selected until nodes are restarted, and that might be a long time. You misread my statement. Nobody needs to be convinced of the usefulness of such a feature (at least for what concerns myself). Somebody needs to come up with a solid proposal which then needs to be implemented. That is what I meant by saying Feel free to propose something [..]. Note that switching immediately as soon as a better gateway comes around isn't the best solution. Imagine a case in which a gateway announcing the highest bandwidth in your network barely is visible for your gateway client. It will keep switching between gateways, thereby breaking all your stateful connections such as: ssh, vpns, video music streams, etc Thanks for being open minded. Imho there are a few tweaks for doing so. 
For our networks, it's not important that all clients choose the gateway with the highest data rate. Basically, we have 3 gateway types:

- Regular gateways, which should be used if they are reachable (above a certain TQ threshold)
- Backup gateways, which should be used if no regular gateway is available
- Test gateways, just for testing purposes, which should be used if no backup gateway is available

Imho others may have more or fewer classes. Basically, what is needed here is a positive gateway priority (route priority); imho the already existing gateway class may be used for doing so.

In order to prevent switching immediately as soon as a better gateway comes around, we could go this way. gw_mode client may have three parameters:

1st: Switching threshold based on TQ (like today, e.g. 20: late switching)
2nd: Switching threshold based on gateway class: GC-Thresh
3rd: LQ threshold for a gateway class switch: GC-TQ-Thresh

Meaning:

- Switch gateway to the same / a better class if there is a gateway having a TQ xx times bigger (traditional late switching with xx).
- Switch gateway to a better class if a class is GC-Thresh times better and LQ is above GC-TQ-Thresh.
- Switch gateway to a worse class if all existing gateways of the same or a better class have LQ < GC-TQ-Thresh.

However, this is quite complex and I don't know if everybody will be happy with this. I would be, though. For backwards compatibility, old configurations can be detected by counting the number of parameters and reacting according to the old defaults.

If you like things to be more complex: different thresholds for blocking DHCPDISCOVER / DHCPREQUEST ;-)

Thanks,
Keep smiling
yanosz
Re: [B.A.T.M.A.N.] Understanding gateway-mode - why do nodes have a sticky gateway
Hello, Am 03.01.2013 um 02:57 schrieb Marek Lindner: On Thursday, January 03, 2013 07:28:02 Jan Lühr wrote: The client should use 6a:4b:93:de:00:84 as a gatway, since it provides much higher data rates - however, it is stuck at aa:31:0e:4a:0f:1d. I have observed the same behavior. That's quite frustrating. Can I debug, when and why batman-adv actually chooses as specific gateway? Yes, you can. Enable the batman-adv debug log at compile and runtime. While retrieving the 'batman' log messages you should see something like: Adding route to gateway .. Changing route to gateway .. Found new gateway .. etc Maybe you can post the result here ? Ok, Let's give it a try: === Command Log a) Situation on node at beginng - just one gateway, the right one. Freifunk-b0487acb2d58:~# batctl gwl Gateway (#/255) Nexthop [outgoingIF]: gw_class ... [B.A.T.M.A.N. adv 2012.4.0, MainIF/MAC: wlan0-1/b2:48:7a:cb:2d:59 (bat0)] = 72:8b:06:40:61:2e (248) 72:8b:06:40:61:2e [ mesh-vpn]: 215 - 96MBit/96MBit b) Enable gateway mode on the backup gateway root@fastd3:~# /usr/local/sbin/batctl gw server 1Mbit/1Mbit Gateway-List is: Freifunk-b0487acb2d58:~# batctl gwl Gateway (#/255) Nexthop [outgoingIF]: gw_class ... [B.A.T.M.A.N. adv 2012.4.0, MainIF/MAC: wlan0-1/b2:48:7a:cb:2d:59 (bat0)] = 72:8b:06:40:61:2e (255) 72:8b:06:40:61:2e [ mesh-vpn]: 215 - 96MBit/96MBit aa:31:0e:4a:0f:1d (255) aa:31:0e:4a:0f:1d [ mesh-vpn]: 39 - 1024KBit/1024KBit c) Disable regular gateway, force node to switch to backup kif:~# batctl gw_mode client Gateway-List is: reifunk-b0487acb2d58:~# batctl gwl Gateway (#/255) Nexthop [outgoingIF]: gw_class ... [B.A.T.M.A.N. adv 2012.4.0, MainIF/MAC: wlan0-1/b2:48:7a:cb:2d:59 (bat0)] = aa:31:0e:4a:0f:1d (255) aa:31:0e:4a:0f:1d [ mesh-vpn]: 39 - 1024KBit/1024KBit d) Re-enable regular gateway root@kif:~# /usr/local/sbin/batctl gw server 100Mbit/100Mbit Gateway-List is Freifunk-b0487acb2d58:~# batctl gwl Gateway (#/255) Nexthop [outgoingIF]: gw_class ... [B.A.T.M.A.N. 
adv 2012.4.0, MainIF/MAC: wlan0-1/b2:48:7a:cb:2d:59 (bat0)] 72:8b:06:40:61:2e (254) 72:8b:06:40:61:2e [ mesh-vpn]: 215 - 96MBit/96MBit = aa:31:0e:4a:0f:1d (255) aa:31:0e:4a:0f:1d [ mesh-vpn]: 39 - 1024KBit/1024KBit e) Wait some time f) Stop Logging === End of command Log === Node log of that hour http://jluehr.de/batman-adv.log.gz zgrep -i gateway batman-adv.log.gz [ 1022240] Gateway class of originator aa:31:0e:4a:0f:1d changed from 0 to 39 [ 1035100] Gateway class of originator 72:8b:06:40:61:2e changed from 215 to 0 [ 1035100] Gateway 72:8b:06:40:61:2e removed from gateway list [ 1035830] Changing route to gateway aa:31:0e:4a:0f:1d (gw_flags: 39, tq: 255) [ 1046020] Gateway class of originator 72:8b:06:40:61:2e changed from 0 to 215
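For readers wondering how the gw_class numbers in the batctl gwl output (215, 39) relate to the bandwidth shown next to them, here is a minimal Python sketch of the decoding. The bit layout is an assumption recalled from batman-adv releases of that era and is not stated anywhere in this thread; the function name decode_gw_class is made up for illustration. It does reproduce the values in the command log: 215 gives 98304 kbit/s (96MBit) and 39 gives 1024 kbit/s.

```python
def decode_gw_class(gw_class: int) -> tuple[int, int]:
    """Return (download, upload) in kbit/s for a gateway class byte.

    Assumed layout (not confirmed by this thread):
      bit 7      scale bit
      bits 6..3  download exponent
      bits 2..0  upload rate in eighths of the download rate, minus one
    """
    if gw_class == 0:
        # Class 0 is what the log shows when a node stops being a gateway.
        return (0, 0)
    sbit = (gw_class & 0x80) >> 7
    dpart = (gw_class & 0x78) >> 3
    upart = gw_class & 0x07
    down = 32 * (sbit + 2) * (1 << dpart)
    up = (upart + 1) * down // 8
    return (down, up)


# The two classes that appear in the command log.
for gw_class in (215, 39):
    down, up = decode_gw_class(gw_class)
    print(gw_class, down, up)
```

Under this layout the class byte is only a coarse encoding of the rate given to "batctl gw server", which would explain why 100Mbit/100Mbit is displayed as 96MBit/96MBit.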
[B.A.T.M.A.N.] Packet-Loss-Peaks in a Freifunk-Network
Hello folks,

we spent some effort measuring the stability of our Freifunk network. It consists of two networks:

- Tinc-VPN, batman-adv 2011.2.0
- fastd-VPN, batman-adv 2012.4.0

The fastd network has two gateways (kif and fastd3) and two nodes. Each node is connected to both gateways. (For example output of batctl o, see below.)

Please take a look at: http://kbu.freifunk.net/index.php?title=Statistik#Ping_innerhalb_des_Freifunknetz

While the first chart shows the stability of a node connected directly via tinc, the latter ones use fastd.

- Chart 2 shows kif - node1
- Chart 3 shows fastd3 - node1
- Chart 4 shows kif - node2

We noticed a few unusual things:

- Regardless of the link tested (kif or fastd3) and regardless of the node (node1, node2), small interruptions (loss peaks) appear.
- These peaks appear almost synchronously; a little noise comes from the different VPN links and node WAN uplinks.
- Since all links are wired, radio noise won't have an impact.
- The losses appear in batman-adv 2011.2.0 as well as in batman-adv 2012.4.0.

Thus I suspect that batman-adv is triggering these interruptions. Have you noticed the same problems in the past? Is there a way to fix these outages?

Thanks,
Keep smiling
yanosz

    # batctl o
    [B.A.T.M.A.N. adv 2012.4.0, MainIF/MAC: wlan0-1/b2:48:7a:cb:2d:59 (bat0)]
    Originator last-seen (#/255) Nexthop [outgoingIF]: Potential nexthops ...
    vpn3   0.130s (255) vpn3 [ mesh-vpn]: kif (198) vpn3 (255)
    kif    0.280s (255) kif [ mesh-vpn]: vpn3 ( 0) kif (255)
    node1  2.700s (225) kif [ mesh-vpn]: kif (225) vpn3 (225)
    node2  3.110s (223) kif [ mesh-vpn]: kif (223) vpn3 (225)
Re: [B.A.T.M.A.N.] Understanding gateway-mode - why do nodes have a sticky gateway
Hello,

On 03.01.2013 at 15:58, Marek Lindner wrote:
On Thursday, January 03, 2013 19:03:11 Jan Lühr wrote:

c) Disable the regular gateway, forcing the node to switch to the backup

    kif:~# batctl gw_mode client

The gateway list is:

    Freifunk-b0487acb2d58:~# batctl gwl
    Gateway (#/255) Nexthop [outgoingIF]: gw_class ... [B.A.T.M.A.N. adv 2012.4.0, MainIF/MAC: wlan0-1/b2:48:7a:cb:2d:59 (bat0)]
    => aa:31:0e:4a:0f:1d (255) aa:31:0e:4a:0f:1d [ mesh-vpn]: 39 - 1024KBit/1024KBit

d) Re-enable the regular gateway

    root@kif:~# /usr/local/sbin/batctl gw server 100Mbit/100Mbit

Ah! You failed to mention this part in your initial email. The behavior is easily explained: batman-adv does not switch gateways whenever a new gateway is found (even if it is a better gateway) unless the selection class is set to fast or late switching.

It is. Sorry, I forgot to mention:

    Freifunk-b0487acb2d58:~# batctl gw_mode
    client (selection class: 1)

= 1 - fast connection: consider the gateway's advertised throughput as well as the link quality towards the gateway

Keep smiling
yanosz
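To make the difference between the selection classes concrete, here is a simplified Python sketch of the switching rule. It is based on how the batctl manual page describes the classes, to the best of this editor's recollection (1 = fast connection, 2 = stable connection, 3 = fast switch, 4 and above = late switch by that many TQ points), and should be treated as an assumption rather than as the actual batman-adv code. The function name would_switch is hypothetical.

```python
def would_switch(sel_class: int, current_tq: int, candidate_tq: int) -> bool:
    """Decide whether a client with a working gateway moves to a candidate.

    Simplified model, assumed from the batctl manual page:
      class 1, 2  keep the selected gateway until it disappears
      class 3     switch as soon as the candidate has a better TQ
      class >= 4  switch only if the candidate is at least sel_class
                  TQ points better than the current gateway
    """
    if sel_class <= 2:
        return False
    if sel_class == 3:
        return candidate_tq > current_tq
    return candidate_tq - current_tq >= sel_class


# Step d) of the command log: current gateway aa:31:0e:4a:0f:1d has TQ 255,
# the re-enabled gateway 72:8b:06:40:61:2e has TQ 254, selection class is 1.
print(would_switch(1, current_tq=255, candidate_tq=254))
```

If this model is right, "fast connection" (class 1) is not the same as "fast switch" (class 3): class 1 takes the advertised throughput into account when a gateway is first chosen, but it does not re-elect while the chosen gateway is still reachable.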
Re: [B.A.T.M.A.N.] Packet-Loss-Peaks in a Freifunk-Network
Hello,

On 03.01.2013 at 16:00, Marek Lindner wrote:
On Thursday, January 03, 2013 22:16:32 Jan Lühr wrote:

We noticed a few unusual things:

- Regardless of the link tested (kif or fastd3) and regardless of the node (node1, node2), small interruptions (loss peaks) appear.
- These peaks appear almost synchronously; a little noise comes from the different VPN links and node WAN uplinks.
- Since all links are wired, radio noise won't have an impact.
- The losses appear in batman-adv 2011.2.0 as well as in batman-adv 2012.4.0.

Thus I suspect that batman-adv is triggering these interruptions.

If you believe batman-adv is the culprit, please remove batman-adv from your test setup and repeat the exact same test. Otherwise we move into the realm of speculation.

Well, I started pinging over the underlying VPN connection as well and changed the chart titles for clarity: charts 4 and 5 both measure RTT / loss from kif.kbu to node-1. In chart 4, kif sends ICMPv6 to the link-local address of the bat0 interface; in chart 5, the link-local address of the underlying VPN interface is used. The loss peaks appear in chart 4 (bat0) only.

Thanks for your help,
Keep smiling
yanosz

PS: Forgot to click reply to all, sorry for that.

Well, I set the follow-up to this list, since you're not subscribed. Sorry for not mentioning it.
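The comparison described above (the same target pinged once via bat0 and once via the underlying VPN interface) can also be checked numerically instead of by eye. The following Python sketch assumes two equally long series of per-interval loss percentages, one per chart; the function name and the input format are hypothetical and not taken from the statistics page.

```python
def loss_only_on_first(first: list[float], second: list[float]) -> list[int]:
    """Return the interval indices where `first` lost packets but `second` did not.

    Both lists hold loss percentages for the same measurement intervals,
    e.g. `first` for pings to bat0 and `second` for pings to the VPN interface.
    """
    if len(first) != len(second):
        raise ValueError("both series must cover the same intervals")
    return [
        index
        for index, (loss_a, loss_b) in enumerate(zip(first, second))
        if loss_a > 0 and loss_b == 0
    ]
```

Intervals returned by this function are the ones where the loss cannot be attributed to the VPN link itself, which is the argument made with charts 4 and 5.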
Re: [B.A.T.M.A.N.] Understanding gateway-mode - why do nodes have a sticky gateway
Hello,

On 01.01.2013 at 18:25, NicoEchániz wrote:
On 12/30/2012 10:16 PM, Jan Lühr wrote:

Hello,

I started using batman-adv's gateway mode. Sadly, I ran into some trouble. A client is connected to two gateways via VPN (fastd):

    # batctl gw_mode
    client (selection class: 1)

    # batctl gwl
    Gateway (#/255) Nexthop [outgoingIF]: gw_class ... [B.A.T.M.A.N. adv 2012.4.0, MainIF/MAC: wlan0-1/f6:ec:38:e9:72:35 (bat0)]
       6a:4b:93:de:00:84 (255) 6a:4b:93:de:00:84 [ mesh-vpn]: 207 - 48MBit/48MBit
    => aa:31:0e:4a:0f:1d (254) aa:31:0e:4a:0f:1d [ mesh-vpn]: 39 - 1024KBit/1024KBit

    # batctl o
    [B.A.T.M.A.N. adv 2012.4.0, MainIF/MAC: wlan0-1/f6:ec:38:e9:72:35 (bat0)]
    Originator last-seen (#/255) Nexthop [outgoingIF]: Potential nexthops ...
    aa:31:0e:4a:0f:1d  0.500s (255) aa:31:0e:4a:0f:1d [ mesh-vpn]: aa:31:0e:4a:0f:1d (255)
    6a:4b:93:de:00:84  0.940s (255) 6a:4b:93:de:00:84 [ mesh-vpn]: 6a:4b:93:de:00:84 (255)

The client should use 6a:4b:93:de:00:84 as a gateway, since it provides much higher data rates - however, it is stuck at aa:31:0e:4a:0f:1d.

I have observed the same behavior.

That's quite frustrating. Can I debug when and why batman-adv actually chooses a specific gateway?

Thanks,
Keep smiling
yanosz
[B.A.T.M.A.N.] Understanding gateway-mode - why does node have a sticky gateway
Hello,

I started using batman-adv's gateway mode. Sadly, I ran into some trouble. A client is connected to two gateways via VPN (fastd):

    # batctl gw_mode
    client (selection class: 1)

    # batctl gwl
    Gateway (#/255) Nexthop [outgoingIF]: gw_class ... [B.A.T.M.A.N. adv 2012.4.0, MainIF/MAC: wlan0-1/f6:ec:38:e9:72:35 (bat0)]
       6a:4b:93:de:00:84 (255) 6a:4b:93:de:00:84 [ mesh-vpn]: 207 - 48MBit/48MBit
    => aa:31:0e:4a:0f:1d (254) aa:31:0e:4a:0f:1d [ mesh-vpn]: 39 - 1024KBit/1024KBit

    # batctl o
    [B.A.T.M.A.N. adv 2012.4.0, MainIF/MAC: wlan0-1/f6:ec:38:e9:72:35 (bat0)]
    Originator last-seen (#/255) Nexthop [outgoingIF]: Potential nexthops ...
    aa:31:0e:4a:0f:1d  0.500s (255) aa:31:0e:4a:0f:1d [ mesh-vpn]: aa:31:0e:4a:0f:1d (255)
    6a:4b:93:de:00:84  0.940s (255) 6a:4b:93:de:00:84 [ mesh-vpn]: 6a:4b:93:de:00:84 (255)

The client should use 6a:4b:93:de:00:84 as a gateway, since it provides much higher data rates - however, it is stuck at aa:31:0e:4a:0f:1d.

Why is that? How can I make the client choose 6a:4b:93:de:00:84?

Thanks in advance,
Keep smiling
yanosz
Re: [B.A.T.M.A.N.] Batman-adv gateways vs IPv6
Hello,

On 15.03.2011 at 16:15, Linus Lüssing wrote:
On Mon, Mar 14, 2011 at 01:39:51PM +0100, Jan Lühr wrote:

We're about to deploy batman-adv in a wireless mesh network with multiple routers / gateways providing access to the outside world. Furthermore, deploying IPv4 / IPv6 dual-stack networking is one of the goals we're trying to achieve. We stumbled upon batman-adv and its gateway feature, which looks quite promising. (http://www.open-mesh.org/wiki/batman-adv-gateways) However, since DHCP is used for implementing batman-adv gateways, this feature doesn't affect IPv6 ND / icmp6 - am I right?

Yes, that's right, the gateway feature only applies to DHCPv4/6, and IPv6 ND / icmp6 is not touched at all. Just as ARP needs to be done for IPv4, IPv6 still needs to do IPv6 ND with all hosts. The IPv6 Router Advertisements, which I guess you were indirectly referring to with icmp6, are untouched as well; they are still flooded through the mesh. There were some ideas on what a batman-adv gateway optimization for that could look like, based on RFC4191 (Router Preference), but some tests showed that the Router Preference in Linux was not working as expected: http://comments.gmane.org/gmane.linux.ipv6.usagi.users/2242 But do you actually need that? Is there a difficulty with using DHCPv6 in your use-case?

Having this in mind, what do you think is a suitable deployment strategy for IPv6 in batman-adv networks? Of course, assigning a single /64 to the mesh cloud is a simple option, but icmp6 flooding might occur ...

As said previously, IPv6 ND will be needed anyway, so there's ICMP flooding anyway. However, that's rather low-bandwidth traffic and doesn't happen that frequently. Could you describe your use-case a little further? So far no one has noticed any constraints with that in practical setups, afaik.

Thanks for your reply - I'm not even sure about our use-case. We're currently discussing options for running batman-adv in a Freifunk-style network, using global IPv6 addresses. My concern is that batman-adv has the Achilles heel of creating a single collision domain, which will cause scaling issues. Separating the mesh into different collision domains (using multiple /64s, for example) is therefore an option - however, I'm still looking for a practical approach to do so.

For instance: if we deploy a bunch of IPv6 routers in our cloud, we have to make sure that IPv6 autoconfiguration uses the address space of a nearby router. If we connected all /64-announcing routers via batman-adv, clients would get multiple /64 addresses from those routers. So each node must be smart about which icmp6 messages have to be forwarded ...

However, this is quite vague and we're looking for a feasible routing strategy.

Thanks,
Jan
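As a small illustration of the "multiple /64s" idea, the following Python sketch carves one /64 per announcing router out of a larger site prefix using the standard ipaddress module. The prefix 2001:db8::/56 is the IPv6 documentation range and the router names are placeholders; neither comes from the network discussed here. The sketch covers the address plan only and says nothing about which router advertisements a node should forward, which is the open question above.

```python
import ipaddress


def assign_prefixes(site_prefix: str, routers: list[str]) -> dict[str, ipaddress.IPv6Network]:
    """Give each router its own /64 taken in order from the site prefix.

    If there are more routers than /64s in the site prefix, the surplus
    routers are silently left without a prefix (zip stops at the shorter input).
    """
    site = ipaddress.IPv6Network(site_prefix)
    return dict(zip(routers, site.subnets(new_prefix=64)))


# Placeholder router names and the documentation prefix, for illustration only.
for name, prefix in assign_prefixes("2001:db8::/56", ["router-a", "router-b"]).items():
    print(name, prefix)
# prints:
# router-a 2001:db8::/64
# router-b 2001:db8:0:1::/64
```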
Re: [B.A.T.M.A.N.] Batman-adv gateways vs IPv6
Hello,

On 15.03.2011 at 16:38, Jan Lühr wrote:

Hello,

On 15.03.2011 at 16:15, Linus Lüssing wrote:

Thanks for your reply - I'm not even sure about our use-case. We're currently discussing options for running batman-adv in a Freifunk-style network, using global IPv6 addresses. My concern is that batman-adv has the Achilles heel of creating a single collision domain, which will cause scaling issues.

Sorry, I meant broadcast / anycast domain.

Keep smiling
Jan