Re: [Babel-users] Route-dete :wq

2018-03-11 Thread Dave Taht
On Fri, Mar 9, 2018 at 3:29 PM, Christof Schulze
 wrote:
>> I start running into trouble with 1000+ routes using 1Mbit mcast.
>> Sooner if I seriously
>> slam the network with flent or something else that abuses mcast like mdns.
>> YMMV.
>
> So what is the culprit here? What would it take to add an order of
> magnitude?

Most of the meshy networks run wifi mcast at 11Mbit or higher. Ath9k
devices support this, many others don't.

Experiment with the new unicast code, as that will transfer routes at
the underlying rate of the medium (up to 300Mbit on wireless-n).

Don't run odhcpd or network manager either. They tend to get in a
tussle with babeld in the kernel.

Simulate first, deploy second. Don't prematurely optimize. :)

I have a backlog of other optimizations for babel lying about, ranging
from trivial stuff like at least logging when you are low on compute
or overbuffered:

https://github.com/dtaht/rabeld/commit/b74b4a6f9b532717ee93346963efd894e94615b3
 to something that tries to be more aware of compute bounds, to
another thing that pushes some work down into the kernel via bpf



>
>> As for aggregation and filtering: Most of my Aps have at minimum,
>> ethernet, and two channels - usually four, including the meshy links.
>> The meshy links are ptp, so I've generally "wasted" an entire /22 ipv4
>> network to talk to the Aps. ipv6 /62s.
>>
>> The lab component of my network, for example, has two main links to
>> the production net, and the gateways only announce the
>> subnet it is on (172.22.0.0/16). This cuts the churn seen outside the
>> network when I do crazy things like reboot the whole thing.
>>
>> The biggest problem I've run into, is that meshy links, are, meshy -
>> and I've lost track of the number of times where
>> I had a well defined /16 network in the lab suddenly leak all the
>> meshy /32 bits over the worst possible link - because I plugged
>> something in that was adhoc (and poorly) connected to the outside
>> network that I shouldn't have.
>>
>> Lede creates one /48 ULA by default per AP, and then more /60s. I've
>> had a tendency to try to share one /48, but more recently I was trying
>> to go native ipv6 and disabled the ULA generation entirely.
>>
>> I don't bridge anything except sometimes on the last Aps on a link
>> (which don't announce babel on that bridge). Bridging can do weird
>> things to daemons that want also to be measuring the individual links.
>>
>> So in your network design I'd try to identify your backbone links and
>> try upfront to rationally partition the network numbering scheme,
>> and still, periodically try to optimize it. It makes no sense to
>> export all the churn the last hop of a meshy, yet leaf network can
>> have to the whole network. I'd simulate what you plan, and then slam
>> it with traffic from every point with a tool like flent, and deploy
>> cake or htb+fq_codel on the ISP up/downlinks.
>
> This being a Freifunk-Network, there is not going to be much planning of the
> structure beyond mere basics.
> Until now I just hoped we could get away with having 10K+ routes in one
> network which would translate into 3k+ clients when considering many nodes
> and 3 IPv6 addresses for each client which seems to be a reasonable amount
> when taking into account a clat-address per client and IPv6 privacy
> extensions.

As toke noted, just distribute the /64. Source specific routing is cool, too,
you should try distributing real ipv6 ranges from two or more gateways.

> There are approaches to reduce the amount of routes per client including
> using nat66 on each node. You certainly are making it sound like there
> should be put some thought into reducing the amount of routes. This will be
> the next step after we have more than just a few nodes / clients inside the
> same network.
>
> BTW: the Freifunk networks use an autoupdater. This might just solve your
> tree-climbing-problem in the long run...

Across 6 versions of the OS in 6 years of deployment, and 5 different
generations of hardware, my automation problem is hopeless.

Of all the gear I've had to date the nanostation 5s, wndr3800s, and
picostations have been the best. I'm having good results
thus far with the ubnt UAP-AC-M-USes with the candeletech firmware -
aside from not having enough flash for a web interface.

>
> Cheers
> Christof
>
>>
>> I'm working these days, on making netem better emulate wifi's
>> behaviors. I'm not satisfied with it yet.
>>
>
>>> Note that babeld currently sends updates as a single burst when the update
>>> interval expires (the same is true of Toke's implementation of Babel, as
>>> far as I'm aware).  For very large networks, it would be good to split
>>> updates into one-packet pieces that are sent throughout the update
>>> interval.  I'd be glad to accept a patch that does that.

I'd rather like to keep the burst but measure how long it takes to transmit.
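
To make the quoted proposal concrete, here is a minimal sketch of the
pacing arithmetic. The helper names are hypothetical and this is not
babeld code; it assumes the update has already been split into packets.

```c
/* Hypothetical helper, not part of babeld: gap between packets when
   `packets` update packets are spread evenly over `interval_ms`. */
unsigned
pacing_gap_ms(unsigned interval_ms, unsigned packets)
{
    if(packets <= 1)
        return 0;               /* nothing to spread out */
    return interval_ms / packets;
}

/* Send time of packet number `i` (counting from 0), relative to the
   start of the update interval. */
unsigned
packet_offset_ms(unsigned interval_ms, unsigned packets, unsigned i)
{
    return i * pacing_gap_ms(interval_ms, packets);
}
```

Keeping the burst, as preferred above, corresponds to a gap of 0; the
time the burst takes on the wire would then have to be measured
separately.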

>
 * making babel trigger updates on newly appeared routes
>
>
>>> I've gone 

[Babel-users] [PATCH] Always specify linux route metric to defaults

2018-03-08 Thread Dave Taht
I updated both an aarch64 and an x86_64 box to git head today. No issues.

...

Attached is part one of trying to get atomic updates to work, which is
a change to how metrics are passed into the kernel. I don't know if
this change is worth
it by itself, nor do I understand why -1 was used in the first place.

The original code would export -1 into the linux kernel routing table
for unreachable routes, in addition to setting RTN_UNREACHABLE.

RTN_UNREACHABLE is all you need (at least, on modern kernels), and
changing the metric to -1 makes it impossible to do atomic
updates. Also it has a theoretical flaw in that other similar routes
with smaller metrics, inserted by other protocols, can override the
now unreachable route. Using the defaults leads to more consistent
behavior.

This uses the defaults of 0 and 1024 for ipv4 and ipv6 route metrics
respectively.
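
As a rough sketch of the idea (not the patch itself): the priority
depends only on the address family, and reachability is expressed only
through the route type. The struct, the function name and
SKETCH_INFINITY are made up for illustration; SKETCH_INFINITY stands in
for babeld's KERNEL_INFINITY and its value here is an assumption.

```c
#include <linux/rtnetlink.h>    /* RTN_UNICAST, RTN_UNREACHABLE */

#define SKETCH_INFINITY 0xFFFF  /* assumed stand-in for KERNEL_INFINITY */

/* Hypothetical container for the two values handed to the kernel. */
struct route_attrs {
    unsigned char type;         /* would go into rtm_type */
    int priority;               /* would go into the RTA_PRIORITY attribute */
};

struct route_attrs
choose_route_attrs(int ipv4, unsigned metric)
{
    struct route_attrs a;
    /* Same priority whether or not the route is reachable, so a later
       change between the two states keeps the same kernel key. */
    a.priority = ipv4 ? 0 : 1024;
    a.type = metric < SKETCH_INFINITY ? RTN_UNICAST : RTN_UNREACHABLE;
    return a;
}
```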
From cea7e11152b0a7cebd3572c63ccc9454e6840277 Mon Sep 17 00:00:00 2001
From: Dave Taht <d...@taht.net>
Date: Thu, 8 Mar 2018 13:00:01 -0800
Subject: [PATCH] Always specify linux route metric to defaults

The original code would export -1 into the linux kernel routing table
for unreachable routes, in addition to setting RTN_UNREACHABLE.

RTN_UNREACHABLE is all you need (at least, on modern kernels), and
changing the metric to -1 makes it impossible to do atomic
updates. Also it has a theoretical flaw in that other similar routes
with smaller metrics, inserted by other protocols, can override the
now unreachable route. Using the defaults leads to more consistent
behavior.

This uses the defaults of 0 and 1024 for ipv4 and ipv6 route metrics
respectively.
---
 kernel_netlink.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel_netlink.c b/kernel_netlink.c
index 94f01b0..e521ed7 100644
--- a/kernel_netlink.c
+++ b/kernel_netlink.c
@@ -100,6 +100,9 @@ int num_old_if = 0;
 
 static int dgram_socket = -1;
 
+static const int ipv4_metric = 0;
+static const int ipv6_metric = 1024;
+
 #ifndef ARPHRD_ETHER
 #warning ARPHRD_ETHER not defined, we might not support exotic link layers
 #define ARPHRD_ETHER 1
@@ -1055,9 +1058,9 @@ kernel_route(int operation, int table,
 rta = RTA_NEXT(rta, len);
 rta->rta_len = RTA_LENGTH(sizeof(int));
 rta->rta_type = RTA_PRIORITY;
+*(int*)RTA_DATA(rta) = ipv4 ? ipv4_metric : ipv6_metric;
 
 if(metric < KERNEL_INFINITY) {
-*(int*)RTA_DATA(rta) = metric;
 rta = RTA_NEXT(rta, len);
 rta->rta_len = RTA_LENGTH(sizeof(int));
 rta->rta_type = RTA_OIF;
@@ -1074,8 +1077,6 @@ kernel_route(int operation, int table,
 rta->rta_type = RTA_GATEWAY;
 memcpy(RTA_DATA(rta), gate, sizeof(struct in6_addr));
 }
-} else {
-*(int*)RTA_DATA(rta) = -1;
 }
 buf.nh.nlmsg_len = (char*)rta + rta->rta_len - buf.raw;
 
-- 
2.7.4

___
Babel-users mailing list
Babel-users@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/babel-users

Re: [Babel-users] Route-dete :wq

2018-03-08 Thread Dave Taht
On Wed, Mar 7, 2018 at 2:41 PM, Juliusz Chroboczek <j...@irif.fr> wrote:
> Christof, I'm very much interested in your experiments, which are likely
> to improve the quality of the Babel implementations.
>
>> We have update-interval set to 5 minutes to reduce the load on the network
>> because we are hoping to run this topology on 500+ APs with 1000+ Clients.

I wrote the rtod tool to experiment with creating large topologies.
I've not got around
to publishing the veth topos, but the tool is here:
https://github.com/dtaht/rtod

One thing that fell out of that was that mainline babeld had a very
suboptimal routine
for merging routes (also its use of memcmp was inefficient) - and
juliusz put out a call for a qsorted implementation a while back.

Elsewhere in babeld, atomic updates remain elusive. I made a bit of
progress on that
last year (unreachable routes have to keep the same metric to be an
atomic change) but got beat by another bug
that juliusz/martin(?) fixed later.

These are both problems that I've long meant to get to, but the
prospect of redeploying
with the pending unicast feature and source specific routing, as well
as reflashing a
few dozen devices in treetops with modern code, has thus far stopped
me. I kind of hope
to use bird on a goodly portion of the next deployment, and hopefully
this summer
I can find someone that likes climbing trees in California more than I do.

I have high hopes for the unicast stuff to lighten the routing load by
potentially orders of magnitude.

Somewhere in there (after breaking every routing daemon and protocol
in multiple ways with rtod,
and making several improvements to "rabeld", and scheming to replace
all! all of it, with my own
massively sorted, threaded, NEON coprocessor using version of
everything rewritten from scratch and running out of time long before
it reached plausible promise)...

I read "No Free Lunch Theorems for Search", and despaired. Every
daemon and protocol will break on some number of routes. Period.

See:

https://en.wikipedia.org/wiki/No_free_lunch_in_search_and_optimization

Short of a revolution in graph theory I see no way to get anywhere on
that, and yet tend to think
dynamically making the update interval larger to account for various
bounds (cpu,bandwidth) is part of a way out,
along with having a harder r/t scheduler for the bellman-ford calc
(needn't be threaded) would help to make sure it fits within bounds,
repeating important packets more often, etc.

(please note that most of what I'm saying as per NFL, etc is common to
all routing protocols and daemons)
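
One way to picture "making the update interval larger to account for
bounds" is the following sketch. It is an illustration only, not
something babeld does: the function name, the percentage budget and the
floor are all assumptions.

```c
/* Hypothetical: smallest update interval (in ms) such that sending
   `update_bits` once per interval uses at most `budget_percent` of a
   link running at `link_bps`. Never goes below `floor_ms`. */
unsigned long long
min_interval_ms(unsigned long long update_bits,
                unsigned long long link_bps,
                unsigned budget_percent,
                unsigned long long floor_ms)
{
    unsigned long long budget_bps = link_bps * budget_percent / 100;
    unsigned long long needed;

    if(budget_bps == 0)
        return floor_ms;        /* no usable budget; caller must cope */

    needed = update_bits * 1000 / budget_bps;
    return needed > floor_ms ? needed : floor_ms;
}
```

With 240000 bits of update, a 5% budget and a 4000 ms floor, a 1 Mbit/s
link stretches the interval to 4800 ms, while an 11 Mbit/s link stays
at the floor. A CPU bound could be folded in the same way.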

> The protocol is very flexible, but the reference implementation is not
> designed to work with such large update intervals.  The amount of data in
> an update is pretty small, so I would recommend reducing the update
> interval -- it should be fine with thousands of routes in your network.

Try to remember that other traffic eats capacity, and even in a
fq_codel'd environment you can only get a percentage of the bandwidth
(with low delays). I've also seen devices with essentially infinite
queues for mcast - over 16 seconds long!

I start running into trouble with 1000+ routes using 1Mbit mcast.
Sooner if I seriously
slam the network with flent or something else that abuses mcast like mdns. YMMV.
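
A back-of-the-envelope sketch of why the multicast rate matters. The
function name is hypothetical and the per-route size is an assumed
figure, not a measurement; headers, retries and contention are ignored.

```c
/* Hypothetical: microseconds of airtime for a full route dump of
   `routes` entries at `bytes_per_route` bytes each, sent at
   `bits_per_second`. */
unsigned long long
burst_usec(unsigned long long routes,
           unsigned long long bytes_per_route,
           unsigned long long bits_per_second)
{
    unsigned long long bits = routes * bytes_per_route * 8;
    return bits * 1000000ULL / bits_per_second;
}
```

Assuming 30 bytes per route, 1000 routes come to 240000 bits, which is
0.24 s of airtime at 1 Mbit/s and roughly a tenth of that at 11 Mbit/s,
before any competing traffic is counted.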


> The right way to reduce the amount of routing traffic in Babel is not to
> increase the update interval (which can at best yield a linear reduction),
> but to use aggregation and filtering (which can yield an exponential
> decrease in a well designed network).  Dave Taht has been successful with
> this approach, perhaps he'll want to chime in.

I've also experimented with dynamically changing the broadcast update
interval, as long term stable routes are, well, long term stable. Even
a linear reduction seemed worthwhile at the time I was fiddling with
this. Another way to preserve some percentage of sanity would be to
always update default routes often but with a long declared broadcast
interval and less used routes on a longer one.

As for aggregation and filtering: Most of my APs have at minimum,
ethernet, and two channels - usually four, including the meshy links.
The meshy links are ptp, so I've generally "wasted" an entire /22 ipv4
network to talk to the APs. ipv6 /62s.

The lab component of my network, for example, has two main links to
the production net, and the gateways only announce the
subnet it is on (172.22.0.0/16). This cuts the churn seen outside the
network when I do crazy things like reboot the whole thing.
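
The effect of announcing only the aggregate can be sketched as a prefix
containment test: anything inside 172.22.0.0/16 is covered by the one
announcement, so its churn need not be exported. The names below are
hypothetical and this is not how babeld's filters are implemented.

```c
#include <stdint.h>

/* Build a host-byte-order IPv4 address from four octets. */
#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Return 1 if `addr` lies inside `prefix`/`plen`, 0 otherwise.
   All values are in host byte order. */
int
prefix_covers(uint32_t prefix, unsigned plen, uint32_t addr)
{
    uint32_t mask;

    if(plen == 0)
        return 1;               /* default route covers everything */
    if(plen > 32)
        return 0;               /* not a valid IPv4 prefix length */

    mask = 0xFFFFFFFFu << (32 - plen);
    return (addr & mask) == (prefix & mask);
}
```

A /32 such as 172.22.5.9 is covered by the /16 and can be filtered at
the gateway; 172.23.0.1 is not and would have to be announced on its
own.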

The biggest problem I've run into, is that meshy links, are, meshy -
and I've lost track of the number of times where
I had a well defined /16 network in the lab suddenly leak all the
meshy /32 bits over the worst possible link - because I plugged
something in that was adhoc (and poorly) connected to the outside
network that I shouldn't have.

Lede creates one /48 ULA by defau

Re: [Babel-users] Heads up: merged rfc6126bis into master

2018-01-22 Thread Dave Taht
While not exactly a flag day from a protocol perspective, the commit
removing keep-unfeasible will break existing startup scripts and conf
files that have it enabled.

https://github.com/jech/babeld/commit/0111f5c1d69ce643a6a76a811eb8f89fb4deb936


-- 

Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619


Re: [Babel-users] [babel] routing tables of death

2017-09-26 Thread Dave Taht
I too may be able to visit Paris in late March.

also, a start at an alternative to shortest path metrics, with so many
problems that I'd run off this page writing them up, but I was happy
to read it this morning.

 http://ieeexplore.ieee.org/abstract/document/8025937/



[Babel-users] routing tables of death

2017-09-08 Thread Dave Taht
I confess curiosity, now that we have quagga, bird, and mainline
babeld versions, and nifty new stuff like unicast hellos, as to how
well they interoperate at this point, as well as perform, under a
stress test like:

https://github.com/dtaht/rtod


-- 

Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619


[Babel-users] [PATCH 5/6] gitignore tags TAGS emacs and vi temp files and bad patch attempts

2017-03-09 Thread Dave Taht
From: Dave Taht <d...@taht.net>

Quiet git more.
---
 .gitignore | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/.gitignore b/.gitignore
index 635e60b..d298e6e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -2,6 +2,13 @@
 babeld
 babeld.html
 version.h
-cscope.out
+cscope.*
 gmon.out
 core
+TAGS
+tags
+babeld-whole*
+*.rej
+*.orig
+*~
+\#*#
-- 
2.7.4




[Babel-users] [PATCH 6/6] babeld-whole: Enable single unit compilation of all of babeld

2017-03-09 Thread Dave Taht
From: Dave Taht <d...@taht.net>

This adds compile guards and Makefile support to building all
of babeld in a single shot, as "babeld-whole".

This has compelling advantages:

- It lets you A/B two versions with different compilation options such
  as debugging on or off, or selectively enable work in progress.
  Example:

  make EXTRA_DEFINES="-DNO_DEBUG -DHAVE_NEON" babeld-whole

- (theoretically) gives the compiler more chances to find optimizations

- Compiles faster

This patch also enables verbose assembly language output of the
resulting code with babeld-whole.s.
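
Why the compile guards are needed can be shown in a few lines: once all
sources are concatenated, every header is pulled in more than once in
the same translation unit. The sketch below simulates that by repeating
a guarded header body inline; the names are made up and are not the
guards used in this patch.

```c
/* First inclusion: the guard macro is not yet defined, so the body
   is compiled and the macro gets defined. */
#ifndef SKETCH_WIDGET_H
#define SKETCH_WIDGET_H
struct widget { int id; };
#endif

/* Second inclusion, as happens after concatenating sources: the
   guard is already defined, so the body is skipped and the struct
   is not redefined. */
#ifndef SKETCH_WIDGET_H
#define SKETCH_WIDGET_H
struct widget { int id; };
#endif

int
widget_id(struct widget w)
{
    return w.id;
}
```

Without the guards the second copy would be a redefinition of
struct widget and the single-unit build would fail.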
---
 Makefile            | 15 ++++++++++++++-
 babeld.h            |  4 ++++
 configuration.h     |  5 ++++-
 disambiguation.h    |  5 +++++
 generate-version.sh |  2 ++
 interface.h         |  4 ++++
 kernel.h            |  4 ++++
 local.h             |  4 ++++
 message.h           |  4 ++++
 neighbour.h         |  5 +++++
 net.h               |  5 +++++
 resend.h            |  6 +++++-
 route.h             |  5 ++++-
 rule.h              |  6 +++++-
 source.h            |  5 +++++
 util.h              |  4 ++++
 xroute.h            |  5 +++++
 17 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/Makefile b/Makefile
index 9454b1a..e8795fb 100644
--- a/Makefile
+++ b/Makefile
@@ -68,6 +68,19 @@ xroute.o: xroute.c babeld.h kernel.h neighbour.h message.h route.h util.h \
 version.h:
./generate-version.sh > version.h
 
+# Whole program compilation with maximum optimization
+
+babeld-whole.c: $(SRCS) $(INCLUDES)
+   cat $(SRCS) > babeld-whole.c
+
+babeld-whole.s: babeld-whole.c
+   $(CC) $(CFLAGS) -O3 $(LDFLAGS) -fwhole-program -fverbose-asm \
+babeld-whole.c -S -o babeld-whole.s
+
+babeld-whole: babeld-whole.c
+   $(CC) $(CFLAGS) $(LDFLAGS) -O3 -fwhole-program babeld-whole.c \
+  -o babeld-whole $(LDLIBS)
+
 .SUFFIXES: .man .html
 
 .man.html:
@@ -99,7 +112,7 @@ uninstall:
-rm -f $(TARGET)$(MANDIR)/man8/babeld.8
 
 clean:
-   -rm -f babeld babeld.html version.h *.o *~ core
+   -rm -f babeld babeld.html version.h *.o *~ core babeld-whole*
 
 reallyclean: clean
-rm -f TAGS tags gmon.out cscope.out
diff --git a/babeld.h b/babeld.h
index e46dd79..2b1b64d 100644
--- a/babeld.h
+++ b/babeld.h
@@ -1,3 +1,5 @@
+#ifndef _BABEL_BABELD
+#define _BABEL_BABELD
 /*
 Copyright (c) 2007, 2008 by Juliusz Chroboczek
 
@@ -110,3 +112,5 @@ void schedule_neighbours_check(int msecs, int override);
 void schedule_interfaces_check(int msecs, int override);
 int resize_receive_buffer(int size);
 int reopen_logfile(void);
+
+#endif
diff --git a/configuration.h b/configuration.h
index de2b1ba..7a4a349 100644
--- a/configuration.h
+++ b/configuration.h
@@ -1,3 +1,5 @@
+#ifndef _BABEL_CONFIGURATION
+#define _BABEL_CONFIGURATION
 /*
 Copyright (c) 2007, 2008 by Juliusz Chroboczek
 
@@ -19,7 +21,6 @@ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 THE SOFTWARE.
 */
-
 /* Values returned by parse_config_from_string. */
 
 #define CONFIG_ACTION_DONE 0
@@ -77,3 +78,5 @@ int install_filter(const unsigned char *prefix, unsigned short plen,
const unsigned char *src_prefix, unsigned short src_plen,
struct filter_result *result);
 int finalise_config(void);
+
+#endif
diff --git a/disambiguation.h b/disambiguation.h
index 13fc89a..aa67a68 100644
--- a/disambiguation.h
+++ b/disambiguation.h
@@ -1,3 +1,6 @@
+#ifndef _BABEL_DISAMBIGUATION
+#define _BABEL_DISAMBIGUATION
+
 /*
 Copyright (c) 2014 by Matthieu Boutier and Juliusz Chroboczek.
 
@@ -25,3 +28,5 @@ int kuninstall_route(const struct babel_route *route);
 int kswitch_routes(const struct babel_route *old, const struct babel_route *new);
 int kchange_route_metric(const struct babel_route *route,
  unsigned refmetric, unsigned cost, unsigned add);
+
+#endif
diff --git a/generate-version.sh b/generate-version.sh
index 22dba7d..7cd00f4 100755
--- a/generate-version.sh
+++ b/generate-version.sh
@@ -10,4 +10,6 @@ else
 version="unknown"
 fi
 
+echo "#ifndef BABELD_VERSION"
 echo "#define BABELD_VERSION \"$version\""
+echo "#endif"
diff --git a/interface.h b/interface.h
index 7294196..a2eb390 100644
--- a/interface.h
+++ b/interface.h
@@ -1,3 +1,6 @@
+#ifndef _BABEL_INTERFACE
+#define _BABEL_INTERFACE
+
 /*
 Copyright (c) 2007, 2008 by Juliusz Chroboczek
 
@@ -143,3 +146,4 @@ void set_timeout(struct timeval *timeout, int msecs);
 int interface_up(struct interface *ifp, int up);
 int interface_ll_address(struct interface *ifp, const unsigned char *address);
 void check_interfaces(void);
+#endif
diff --git a/kernel.h b/kernel.h
index b6286fc..e2a7565 100644
--- a/kernel.h
+++ b/kernel.h
@@ -1,3 +1,6 @@
+#ifndef _BABEL_KERNEL
+#define _BABEL_KERNEL
+
 /*
 Copyright (c) 2

[Babel-users] [PATCH 4/6] Tests: Add subdir and test for structure packing

2017-03-09 Thread Dave Taht
From: Dave Taht <d...@taht.net>

An initial run of this test on x86_64 shows suboptimal packing for the
filter structure (can be 96), and kernel_route (can be 64).

Other things, such as kernel_rule, buffered_update and xroute could be
padded to more natural 8 byte boundaries for this arch. Other structure
optimizations are feasible. Having this tool readily available to
measure changes across different architectures is helpful.

./show_babel_packing

Compiled with gcc 5.4.0 for the x86_64 architecture in little endian mode
  24 = sizeof (filter_result)
 112 = sizeof (filter)
  44 = sizeof (buffered_update)
  56 = sizeof (interface_conf)
 272 = sizeof (interface)
  68 = sizeof (kernel_route)
  28 = sizeof (kernel_rule)
  64 = sizeof (kernel_filter)
  24 = sizeof (local_socket)
 120 = sizeof (neighbour)
  88 = sizeof (resend)
  88 = sizeof (babel_route)
  56 = sizeof (source)
  44 = sizeof (xroute)
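
For readers unfamiliar with where such padding comes from, a small
sketch with two made-up structs (not taken from babeld) holding the
same members in a different order:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with the wide member in the middle. */
struct loose {
    uint8_t  flag;      /* usually followed by padding up to 8 bytes */
    uint64_t stamp;     /* wants 8-byte alignment on most 64-bit ABIs */
    uint8_t  kind;      /* usually followed by tail padding */
};

/* Same members, widest first. */
struct tight {
    uint64_t stamp;
    uint8_t  flag;
    uint8_t  kind;      /* the two small members share one padded slot */
};

/* Bytes saved per object by the reordering on the current ABI. */
size_t
padding_saved(void)
{
    return sizeof(struct loose) - sizeof(struct tight);
}
```

On a typical x86_64 ABI this is 24 versus 16 bytes; other
architectures may differ, which is why having a tool to print the
sizes is useful.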
---
 tests/.gitignore   | 11 
 tests/Makefile | 22 
 tests/arch_detect.h| 53 +
 tests/show_babel_packing.c | 66 ++
 4 files changed, 152 insertions(+)
 create mode 100644 tests/.gitignore
 create mode 100644 tests/Makefile
 create mode 100644 tests/arch_detect.h
 create mode 100644 tests/show_babel_packing.c

diff --git a/tests/.gitignore b/tests/.gitignore
new file mode 100644
index 0000000..dd71803
--- /dev/null
+++ b/tests/.gitignore
@@ -0,0 +1,11 @@
+show_babel_packing
+*.o
+cscope.*
+gmon.out
+core
+TAGS
+tags
+*.rej
+*.orig
+*~
+\#*#
diff --git a/tests/Makefile b/tests/Makefile
new file mode 100644
index 0000000..e6fe035
--- /dev/null
+++ b/tests/Makefile
@@ -0,0 +1,22 @@
+PREFIX = /usr/local
+MANDIR = $(PREFIX)/share/man
+
+PROGS = show_babel_packing
+
+CDEBUGFLAGS = -Os -g -Wall
+
+DEFINES = $(PLATFORM_DEFINES)
+
+CFLAGS = $(CDEBUGFLAGS) $(DEFINES) $(EXTRA_DEFINES)
+
+INCLUDES = babeld.h net.h kernel.c util.h interface.h source.h neighbour.h \
+   route.h xroute.h message.h resend.h configuration.h local.h \
+   disambiguation.h rule.h version.h
+
+INCLUDES1 := $(INCLUDES:%=../%)
+
+show_babel_packing: show_babel_packing.c $(INCLUDES1)
+   $(CC) $(CFLAGS) $(LDFLAGS) -I.. $@.c -o show_babel_packing $(OBJS) $(LDLIBS)
+
+clean:
+   -rm -f $(PROGS) show_babel_packing.o
diff --git a/tests/arch_detect.h b/tests/arch_detect.h
new file mode 100644
index 0000000..103c2ef
--- /dev/null
+++ b/tests/arch_detect.h
@@ -0,0 +1,53 @@
+/**
+ * arch_detect.h
+ *
+ * Toke Høiland-Jørgensen
+ * 2017-03-08
+ */
+
+#ifndef ARCH_DETECT_H
+#define ARCH_DETECT_H
+
+#ifdef __GNUC__
+#define GCC_VERSION (__GNUC__ * 10000 \
+ + __GNUC_MINOR__ * 100 \
+ + __GNUC_PATCHLEVEL__)
+#endif
+
+#define P(a)typedef struct a a ##_t; \
+   printf("%4ld = sizeof (%s)\n", sizeof(a ## _t), #a)
+
+#if !(defined(__arm__) || defined(__aarch64__) || defined(__x86_64__) \
+   || defined(__i386__) || defined(__mips__))
+   const char arch[] = "unknown";
+#else
+#if defined(__arm__) || defined(__aarch64__)
+#if defined(__ARM_ARCH_ISA_A64__) || defined(__aarch64__)
+   const char arch[] = "aarch64";
+#else
+#ifdef __ARM_ARCH_7S__
+   const char arch[] = "armv7s";
+#else
+#ifdef __ARM_ARCH_7A__
+   const char arch[] = "armv7";
+#endif
+#endif
+#endif /* end of arm detection. Fixme - need to find neon regs */
+#else
+#if defined(__x86_64__)
+   const char arch[] = "x86_64";
+#endif
+
+#if defined(__i386__)
+const char arch[] = "x86";
+#endif
+
+#if defined(__mips__)
+const char arch[] = "MIPS";
+#endif
+
+#endif
+
+#endif
+
+#endif
diff --git a/tests/show_babel_packing.c b/tests/show_babel_packing.c
new file mode 100644
index 0000000..a670abd
--- /dev/null
+++ b/tests/show_babel_packing.c
@@ -0,0 +1,66 @@
+/**
+ * test_packing.c
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "babeld.h"
+#include "util.h"
+#include "net.h"
+#include "kernel.h"
+#include "interface.h"
+#include "source.h"
+#include "neighbour.h"
+#include "route.h"
+#include "xroute.h"
+#include "message.h"
+#include "resend.h"
+#include "configuration.h"
+#include "local.h"
+#include "rule.h"
+#include "version.h"
+
+#include "arch_detect.h"
+
+int main()
+{
+#ifdef __GNUC__
+printf("Compiled with gcc %d.%d.%d for the %s "
+"architecture in %s mode\n",
+__GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__,
+arch, __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ ?
+"little endian" : "big endian");
+#

[Babel-users] [PATCH 3/6] Quiet the compiler on two uninitialized variable warnings

2017-03-09 Thread Dave Taht
From: Dave Taht <d...@taht.net>

When compiled on arm and gcc 4.9.2, the compiler picks out two sets of
variables passed by reference that "might be used uninitialized".

They aren't, but certainly in the getnet case it was not immediately
obvious to either me or the compiler.

Quiet the warnings by explicitly initializing these variables.
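
The pattern that tends to trigger this class of warning looks roughly
like the sketch below. The functions are hypothetical and only loosely
shaped like the getnet case: a helper fills a variable through a
pointer on success and leaves it untouched on error.

```c
/* Hypothetical parser: stores the digit value through `value_r` on
   success, returns -1 and leaves *value_r alone on failure. */
static int
parse_digit(int c, int *value_r)
{
    if(c < '0' || c > '9')
        return -1;
    *value_r = c - '0';
    return 0;
}

int
digit_or_default(int c, int dflt)
{
    int value = 0;              /* explicit initializer quiets the warning */
    int rc = parse_digit(c, &value);

    if(rc < 0)
        return dflt;            /* `value` is never read on this path */
    return value;
}
```

The code is correct without the initializer, but the compiler has to
follow the return code across the call to prove it, and some versions
do not.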
---
 configuration.c | 4 ++--
 route.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/configuration.c b/configuration.c
index c4b4d55..b455181 100644
--- a/configuration.c
+++ b/configuration.c
@@ -273,8 +273,8 @@ getnet(int c, unsigned char **p_r, unsigned char *plen_r, int *af_r,
 char *t;
 unsigned char *ip;
 unsigned char addr[16];
-unsigned char plen;
-int af, rc;
+unsigned char plen = 0;
+int af = 0, rc = 0;
 
 c = getword(c, , gnc, closure);
 if(c < -1)
diff --git a/route.c b/route.c
index a4cd302..5cd2260 100644
--- a/route.c
+++ b/route.c
@@ -212,7 +212,7 @@ resize_route_table(int new_slots)
 static struct babel_route *
 insert_route(struct babel_route *route)
 {
-int i, n;
+int i, n = 0;
 
 assert(!route->installed);
 
-- 
2.7.4




[Babel-users] [PATCH 1/6] Improve Makefile

2017-03-09 Thread Dave Taht
From: Dave Taht <d...@taht.net>

- Add INCLUDES variable for headers
- Add support for tags and TAGS
- create "reallyclean" to get rid of tags, TAGS, gmon.out, cscope.out
- remove TAGs and gmon.out from "clean"
- Add full and correct dependencies on internal headers
---
 Makefile | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 53 insertions(+), 5 deletions(-)

diff --git a/Makefile b/Makefile
index 9b98eb8..9454b1a 100644
--- a/Makefile
+++ b/Makefile
@@ -17,14 +17,53 @@ OBJS = babeld.o net.o kernel.o util.o interface.o source.o neighbour.o \
route.o xroute.o message.o resend.o configuration.o local.o \
disambiguation.o rule.o
 
+INCLUDES = babeld.h net.h kernel.c util.h interface.h source.h neighbour.h \
+   route.h xroute.h message.h resend.h configuration.h local.h \
+   disambiguation.h rule.h version.h
+
+KFILES = kernel_netlink.c kernel_socket.c
+
 babeld: $(OBJS)
$(CC) $(CFLAGS) $(LDFLAGS) -o babeld $(OBJS) $(LDLIBS)
 
-babeld.o: babeld.c version.h
+babeld.o: babeld.c $(INCLUDES)
+
+local.o: local.c local.h version.h babeld.h interface.h source.h neighbour.h \
+kernel.h xroute.h route.h util.h configuration.h
+
+kernel.o: kernel_netlink.c kernel_socket.c kernel.h babeld.h
+
+configuration.o: configuration.c babeld.h util.h route.h kernel.h \
+configuration.h rule.h
+
+disambiguation.o: disambiguation.c babeld.h util.h route.h kernel.h \
+ disambiguation.h interface.h rule.h
+
+interface.o: interface.c babeld.h util.h route.h kernel.h local.h \
+interface.h neighbour.h message.h configuration.h xroute.h
+
+message.o: message.c babeld.h util.h net.h interface.h source.h neighbour.h \
+  route.h kernel.h xroute.h resend.h configuration.h
+
+neighbour.o: neighbour.c babeld.h util.h interface.h source.h route.h \
+neighbour.h message.h resend.h local.h
 
-local.o: local.c version.h
+net.o: net.c net.h babeld.h util.h
 
-kernel.o: kernel_netlink.c kernel_socket.c
+resend.o: resend.c babeld.h util.h neighbour.h message.h interface.h \
+ resend.h configuration.h
+
+route.o: route.c babeld.h util.h kernel.h interface.h source.h neighbour.h \
+route.h xroute.h message.h configuration.h local.h disambiguation.h
+
+rule.o: rule.c babeld.h util.h kernel.h configuration.h rule.h
+
+source.o: source.c babeld.h util.h interface.h route.h source.h
+
+util.o: util.c babeld.h util.h
+
+xroute.o: xroute.c babeld.h kernel.h neighbour.h message.h route.h util.h \
+ xroute.h configuration.h interface.h local.h
 
 version.h:
./generate-version.sh > version.h
@@ -36,10 +75,16 @@ version.h:
 
 babeld.html: babeld.man
 
-.PHONY: all install install.minimal uninstall clean
+.PHONY: all install install.minimal uninstall clean reallyclean
 
 all: babeld babeld.man
 
+TAGS: $(SRCS) $(INCLUDES) $(KFILES)
+   etags $(SRCS) $(INCLUDES) $(KFILES)
+
+tags: $(SRCS) $(INCLUDES) $(KFILES)
+   ctags $(SRCS) $(INCLUDES) $(KFILES)
+
 install.minimal: babeld
-rm -f $(TARGET)$(PREFIX)/bin/babeld
mkdir -p $(TARGET)$(PREFIX)/bin
@@ -54,4 +99,7 @@ uninstall:
-rm -f $(TARGET)$(MANDIR)/man8/babeld.8
 
 clean:
-   -rm -f babeld babeld.html version.h *.o *~ core TAGS gmon.out
+   -rm -f babeld babeld.html version.h *.o *~ core
+
+reallyclean: clean
+   -rm -f TAGS tags gmon.out cscope.out
-- 
2.7.4




[Babel-users] Misc cleanups to the babeld build system

2017-03-09 Thread Dave Taht
This patch series makes it easier to work on babeld's source
code in a variety of ways.

[PATCH 1/6] Improve Makefile
[PATCH 2/6] Make v4prefix a shared constant between util.c and message.c
[PATCH 3/6] Quiet the compiler on two uninitialized variable warnings
[PATCH 4/6] Tests: Add subdir and test for structure packing
[PATCH 5/6] gitignore tags TAGS emacs and vi temp files and bad patch attempts
[PATCH 6/6] babeld-whole: Enable single unit compilation of all of babeld



[Babel-users] [PATCH 2/6] Make v4prefix a shared constant between util.c and message.c

2017-03-09 Thread Dave Taht
From: Dave Taht <d...@taht.net>

Share the data better.
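
In sketch form, sharing the constant means one definition with external
linkage and an extern declaration wherever else it is used. The helper
name is_v4mapped is hypothetical, chosen for this illustration; both
halves are shown in one file here only to keep the example
self-contained.

```c
#include <string.h>

/* What the using file would contain: a declaration only. */
extern const unsigned char v4prefix[16];

/* What the defining file would contain: the single definition of
   the ::ffff:0:0/96 prefix. */
const unsigned char v4prefix[16] =
    {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xFF, 0xFF, 0, 0, 0, 0};

/* Hypothetical helper: true if the 16-byte address is IPv4-mapped,
   i.e. its first 12 bytes match the prefix. */
int
is_v4mapped(const unsigned char *address)
{
    return memcmp(address, v4prefix, 12) == 0;
}
```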
---
 message.c | 3 +--
 util.c| 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/message.c b/message.c
index fdc1999..f8c4ad2 100644
--- a/message.c
+++ b/message.c
@@ -54,8 +54,7 @@ unsigned char *unicast_buffer = NULL;
 struct neighbour *unicast_neighbour = NULL;
 struct timeval unicast_flush_timeout = {0, 0};
 
-static const unsigned char v4prefix[16] =
-{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xFF, 0xFF, 0, 0, 0, 0 };
+extern const unsigned char v4prefix[16];
 
 #define MAX_CHANNEL_HOPS 20
 
diff --git a/util.c b/util.c
index 1c15dbf..0de2245 100644
--- a/util.c
+++ b/util.c
@@ -246,7 +246,7 @@ normalize_prefix(unsigned char *restrict ret,
 return ret;
 }
 
-static const unsigned char v4prefix[16] =
+const unsigned char v4prefix[16] =
 {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xFF, 0xFF, 0, 0, 0, 0 };
 
 static const unsigned char llprefix[16] =
-- 
2.7.4




[Babel-users] [PATCHes] Misc cleanups to the babeld build system

2017-03-08 Thread Dave Taht
I am not particularly huge on posting and reviewing patches in-line
on mailing lists, but I can repost this pull request here, if desired.

https://github.com/jech/babeld/pull/9

This patch series consists of a bunch of miscellaneous improvements to
the babeld build system: improving the makefile to fully check
dependencies, adding direct support for TAGS, tags and cscope,
smashing some tabs that crept in elsewhere, fixing a missed dependency
in disambiguation.h, enabling whole program compilation and assembly
language output, and finally, adding a tests directory where a test
for structure packing now exists as a further optimization aid.

It is the first of 5 major sets cleanly building on deeper
improvements from the now-retired wreckage in the rabeld repository.

Next up are patch sets for better error logging, then better error handling.

After that - ghu willing - atomic updates, and some major performance
improvements.

-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org


[Babel-users] risc-v

2017-02-26 Thread Dave Taht
I've been fiddling with risc-v stuff here and there of late.

http://bellard.org/riscvemu/

Amazingly babel "just builds" and with a little fiddling merely
crashes on the lack of ipv6 support in the supplied fedora25 kernel.

I guess I should go build a better kernel.

root@nemesis:/tmp/babeld# file babeld
babeld: ELF 64-bit LSB executable, UCB RISC-V, version 1 (SYSV),
dynamically linked, interpreter /lib/ld.so.1, for GNU/Linux 2.6.32,
BuildID[sha1]=b771fb64167fe8dd6d2a424cfe6eff1d96916c0b, not stripped



[Babel-users] babeldStyle

2017-02-21 Thread Dave Taht
Is there a specific emacs or vi "c-style" setting I should use while
hacking on babeld? (.el file would be helpful)

I see:

functions typically start with the { on the first line
4 character indent
8-column indents are usually spaces, but there are some tabs



Re: [Babel-users] Debugging unreachable routes - IPv6 as next hop?

2017-02-19 Thread Dave Taht
I have been chasing a similar set of bugs for months now.  Routes
would be unreachable for no reason I could see, updates to the kernel
would fail[1].

How big is the total route table?

Does it stay unreachable?

Can you try reverting to babeld-1.7.1 for lede?

...

I finally got heads down on it last week and I have a slew of
debugging patches that I need to clean up for 1.8... and then I need
to do a build for lede - but I haven't got around to it yet. Nor have
I tried 1.7.x - I was trying to debug something elsewhere.

In particular I found that network manager was stomping on babel in my
network. My test case is unfortunately not as simple as yours. But I
was originally seeing some sort of interaction with odhcpd also.

[1] Lastly there was a major bug in the wifi ATF fairness code for
ath9k stomped last week, which could scribble on memory just about
anywhere, and some fixes for odhcpd, and ubus landed also.



Re: [Babel-users] [LEDE-DEV] Babeld now has procd support on OpenWRT/LEDE

2017-01-13 Thread Dave Taht
On Fri, Jan 13, 2017 at 8:11 AM, L. D. Pinney <ldpin...@yahoo.com> wrote:
> Go back to playing the guitar and smoking dopethat's what you do best.

I look forward very much to doing more of that... after lede ships.

In the interim, I have two fairly large testbeds setup to test lede -
the ATF patch, the dnsmasq-dnssec code, the bcp38 code, the dhcpv6-pd
code, the new bridging code, and babel integration are all high on my
list, along with sch_cake and the sqm-scripts. I have 5 lede platforms
under test - the archer c7v2, wndr3800, linksys 1200ac, uap-lite, and
picostation, with two more that I plan to add after testing interop
(edgerouter and an apu2). There are also 3 different brands of
cablemodem in the loop. Test clients include a dozen hackerboards
(kernels going back to 3.10), OSX, and Linux.

I'm perpetually testing edge cases and things that seem like they
should work, but don't. While things are vague and confusing (like,
babel-1.8's behavior being confusing in some instances) - and a piece
of a puzzle lands (like not restarting it as much, thus causing less
route retractions and floods) - I do tend to dump partial state about
the rathole I may have been going down.

I've been trying to come up with tests for understanding the true
effects of the multicast-unicast bridging code, for wifi primarily,
and simplifying. At the moment I'm more trying to test 3 layers of
lede dhcp-pd (bridged and routed scenarios) across gear that needs to
interop...

... instead of dealing with how all that feeds into babel source
specific routing - which is a combination of interactions between
netifd, babeld, and multicast/bridging.

>
> STOP CROSS POSTING YOU FSCKin' Clown Boy

If these projects want to write "no cross posting ever" into the
rules, I will gladly comply.

I hope you get your caps-lock key fixed.

I'm going to just filter out all further postings from you into my trash folder.


>
> On Saturday, January 14, 2017 12:06 AM, Dave Taht <dave.t...@gmail.com>
> wrote:
>
>
> On Fri, Jan 13, 2017 at 4:08 AM, L. D. Pinney <ldpin...@yahoo.com> wrote:
>> DAVE :
>>
>> WILL YOU PLEASE STOP YOUR FSCKin' CROSS POSTING ???
>
> I did not start the cross-post in this case.
>
>> This is UNRELATED to the OpenWrt / LEDE DEV mailing list...as the change
>> has
>> been merged.
>
> Interop with routing protocols... and networking in general... does
> exist outside of the openwrt universe. In my case I am deeply
> concerned about what happens against older, deployed versions of
> Linuxes (and other OSes) with the new multicast-unicast bridge
> conversion code in lede. Babel tends to use it, and I am also testing
> (in lede!), as per the below, the dhcpv6-pd code, interop-ing with
> several devices.
>
>> F O
>
> I'm really sorry that you hate cross posting so much. It must be
> terrible to have to elide additional responses or deal with bounce
> messages on every 20th email from me. And it must be wonderful to be
> living in a world where all you have is openwrt/lede devices on the
> network and modern kernels everywhere.
>
>>
>> On Friday, January 13, 2017 5:20 AM, Dave Taht <dave.t...@gmail.com>
>> wrote:
>>
>>
>> On Thu, Jan 12, 2017 at 1:01 PM, Baptiste Jonglez
>> <bapti...@bitsofnetworks.org> wrote:
>>> Hi,
>>>
>>> Here is yet another OpenWRT-related change for babeld: I just merged
>>> procd
>>> support for babeld [2], after more than two years of lingering [1].
>>>
>>> The only user-visible changes should be:
>>>
>>> - babeld now logs to the system log (visible with "logread") instead of a
>>>  file in /var/log.  This is nice for embedded devices, where you don't
>>>  want to write too much to the filesystem.  It is still possible to
>>>  explicitly configure babeld to use a log file;
>>>
>>> - babeld is now restarted automatically whenever it crashes;
>>>
>>> - the usual procd niceties: calling "/etc/init.d/babeld reload" will
>>>  restart babeld only if the configuration has changed.
>>>
>>>
>>> Please test babeld 1.8.0-2 and report any resulting breakage.  I would
>>> like this change (and the other compatibility change) to make it into the
>>> upcoming LEDE release, which is due to happen quite soon.
>>
>> Groovy.
>>
>> lede can dynamically insert/delete routes into tables from netifd
>> babeld can pull routes from "protos" but not tables.
>>
>> I spoke with hedecker (? can't remember his email) about somehow
>> having a field to export routes into kernel protos in the lede network
>> file, he indicated he'd look at it in 

Re: [Babel-users] Babeld now has procd support on OpenWRT/LEDE

2017-01-12 Thread Dave Taht
On Thu, Jan 12, 2017 at 1:01 PM, Baptiste Jonglez
 wrote:
> Hi,
>
> Here is yet another OpenWRT-related change for babeld: I just merged procd
> support for babeld [2], after more than two years of lingering [1].
>
> The only user-visible changes should be:
>
> - babeld now logs to the system log (visible with "logread") instead of a
>   file in /var/log.  This is nice for embedded devices, where you don't
>   want to write too much to the filesystem.  It is still possible to
>   explicitly configure babeld to use a log file;
>
> - babeld is now restarted automatically whenever it crashes;
>
> - the usual procd niceties: calling "/etc/init.d/babeld reload" will
>   restart babeld only if the configuration has changed.
>
>
> Please test babeld 1.8.0-2 and report any resulting breakage.  I would
> like this change (and the other compatibility change) to make it into the
> upcoming LEDE release, which is due to happen quite soon.

Groovy.

lede can dynamically insert/delete routes into tables from netifd
babeld can pull routes from "protos" but not tables.

I spoke with hedecker (? can't remember his email) about somehow
having a field to export routes into kernel protos in the lede network
file, he indicated he'd look at it in a few weeks.

(I wanted to get away from ever having to revise the conf file
dynamically, but it looks like not this release. Not having to restart
babeld as per the above is a nice improvement though and I'll get on
testing it this weekend. At the moment I'm going through some mild
hell with dhcpv6-pd on comcast and adding "sonic" fiber (with a HE
ipv6 tunnel). Will hopefully have 4 source specific gateways to play
with here.)

In other other news the "rabeld" backport of the gentler route switch
change loses kernel routes on the vyatta (3.10 based) OS in the
edgerouter. :(. That said, haven't tested mainline babeld there yet.
It seems to work on debian.

For those fiddling with edgerouter's default 1.9.x OS, backports of
cake, iproute, and rabeld are presently here:
https://build.lochnair.net/

> Baptiste
>
> [1] https://github.com/openwrt-routing/packages/pull/55
> [2] https://github.com/openwrt-routing/packages/pull/250
>

Re: [Babel-users] unicast attempt breaks timestamping

2017-01-09 Thread Dave Taht
On Mon, Jan 9, 2017 at 6:03 AM, Baptiste Jonglez
<bapti...@bitsofnetworks.org> wrote:
> Hi again,
>
> On Fri, Jan 06, 2017 at 11:02:05AM -0800, Dave Taht wrote:
>> Based on the patch juliusz supplied me to enable unicast IHU, and
>>
>> default enable-timestamp true
>>
>> this stops sending timestamps (which apparently relies on hellos and
>> IHUs being bundled together)
>
> Can you provide the patch in question?  Otherwise, it's hard to test.

https://github.com/dtaht/rabeld/commit/38f1fd2338506a06fbc89d618ab9640d97fce565

> Is there any reason why you couldn't send hellos alongside the IHUs?  As
> far as I remember, the method we use to compute the RTT really needs to
> have a timestamped Hello alongside the IHU.
>
> Baptiste
>
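
Baptiste's point about bundling can be made concrete with a small sketch. It
assumes the usual four-timestamp exchange: our Hello carries its send time,
the neighbour's IHU echoes that time together with the time it received the
Hello, and the Hello in the same packet carries the neighbour's send time. The
function and parameter names are hypothetical, and this is not babeld's code.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit microsecond timestamps that may wrap

def rtt_us(t1, t1_rx, t2_tx, t2):
    """Estimate the round-trip time in microseconds.

    t1     local clock when our Hello was sent (echoed back in the IHU)
    t1_rx  neighbour clock when it received that Hello (carried in the IHU)
    t2_tx  neighbour clock when it sent its reply (its Hello timestamp)
    t2     local clock when the reply arrived
    """
    local_elapsed = (t2 - t1) & MASK32      # measured on our clock only
    remote_hold = (t2_tx - t1_rx) & MASK32  # measured on the neighbour's clock only
    return local_elapsed - remote_hold

print(rtt_us(1_000, 500_000, 520_000, 22_743))
```

Each difference is taken on a single clock, so the two hosts do not need to be
synchronised. If the IHU travels without a timestamped Hello, t2_tx is unknown
and the neighbour's holding time cannot be subtracted, which is consistent
with timestamps going quiet once IHUs are sent by unicast on their own.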

[Babel-users] automating route injection

2017-01-06 Thread Dave Taht
what I've typically done to export a single IP is add it to the babel
configuration file, and I also use covering routes a lot, essentially
the same method.

redistribute local ip fd99::66/128 eq 128 allow
redistribute ip 172.26.130.0/23 eq 23 allow
redistribute local deny

While this works, as more and more of my stuff ends up getting dynamic
addresses (ipv6,
tunnels), something simpler that pulled from a kernel table
dynamically, rather than requiring
reconfiguration seemed desirable. Doing things like this

config filter
option 'type' 'redistribute'
option 'ip' '::/0'
option 'le' '61'
option 'action' 'allow'

break when you end up getting a bunch of /64s from elsewhere instead
of a coherent range...
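
To illustrate why the "le 61" filter stops helping once individual /64s show
up, here is a rough Python model of how such a rule selects routes. It assumes
that "ip" means "the route lies inside this prefix" and that "le" and "eq"
compare the route's prefix length; rule_matches is a hypothetical helper, and
the 2001:db8:: prefixes are documentation addresses picked for the example.

```python
import ipaddress

def rule_matches(route, within, le=None, eq=None):
    """Model of a filter selector: prefix containment plus length tests."""
    r = ipaddress.ip_network(route)
    w = ipaddress.ip_network(within)
    if r.version != w.version or not r.subnet_of(w):
        return False
    if le is not None and r.prefixlen > le:
        return False
    if eq is not None and r.prefixlen != eq:
        return False
    return True

# A delegated /60 is short enough to pass "ip ::/0 le 61" ...
print(rule_matches("2001:db8:0:10::/60", "::/0", le=61))
# ... but a stray /64 from elsewhere is longer than 61 bits and is skipped.
print(rule_matches("2001:db8:0:11::/64", "::/0", le=61))
# The explicit per-address rule from the static configuration still matches.
print(rule_matches("fd99::66/128", "fd99::66/128", eq=128))
```

Raising the limit to 64 would admit the stray /64s, but also every other /64
the box happens to know about, which is the trade-off that makes pulling from
a dedicated kernel table attractive.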

...


the soon to be stable lede/openwrt has a new facility to automatically
insert routes into different tables (you add ipv4table or ipv6table to
the interface setup).

So I thought that I'd simplify matters (hah!) by using that ipvXtable
facility and change my babel
configuration to just pull from a specific kernel table, but thus far
with a variety of attempts, singleton routes like:

ip -6 addr add fd99::66/128 dev lo
ip -6 route add fd99::66/128 dev lo table 8
root@chip-6:~# ip -6 route show table 8
unreachable fd99::66 dev lo  metric 1024  error -101

never manage to get exported to the universe (filtered out due to
being locally unreachable?), or,
if exported, remain unreachable...

I've tried various combinations of

import-table 8
install table 8
redistribute table 8
redistribute local table 8

* Side note

The recent babel 1.8 commit into lede does not include support for the
install or table keywords in filters. I can fix that

there is sadly no support for inserting stuff into a proto rather than
table in lede.


[Babel-users] unicast attempt breaks timestamping

2017-01-06 Thread Dave Taht
Based on the patch juliusz supplied me to enable unicast IHU, and

default enable-timestamp true

this stops sending timestamps (which apparently relies on hellos and
IHUs being bundled together)

(aside from that unicast IHU is working fine, am still deploying stuff
around the testbed, not evaluating anything yet)

I was polling rtts (as measured by babel) in some prior "latency under
load" tests.

* side notes

the "chip" does not support ipv6-subtrees, nor adhoc. It is certainly
handy to babel the usb0
interface, but working around systemd (on ubuntu) can be a pain.

add neighbour 1cc4970 address fe80::14c7:67ff:fe31:ceac if enp0s16u1
reach  rxcost 96 txcost 96 rtt 0.743 rttcost 0 cost 96
add neighbour 1cc4f40 address fe80::16cc:20ff:fee5:64c2 if enp3s0
reach  rxcost 96 txcost 256 rtt 1.067 rttcost 0 cost 256

saturating the usb interface only bumps up the measured rtt by < 5ms

add interface enp0s16u1 up true ipv6 fe80::3c21:acff:fe2d:82c1 ipv4 172.26.66.10
add neighbour 1cc4970 address fe80::14c7:67ff:fe31:ceac if enp0s16u1
reach  rxcost 96 txcost 96 rtt 4.080 rttcost 0 cost 96
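
For anyone who wants to poll these RTTs the same way, here is a small Python
sketch that pulls the rtt field out of "add neighbour" lines like the ones
above. It assumes the dump has already been read into a string and that the
rtt value is in milliseconds; neighbour_rtts is a hypothetical helper, and the
regex tolerates the line wrapping introduced by mail.

```python
import re

# "\brtt " (with the trailing space) avoids matching the "rttcost" field.
NEIGH_RE = re.compile(
    r"add neighbour (?P<id>\S+) address (?P<address>\S+) if (?P<ifname>\S+)"
    r".*?\brtt (?P<rtt>[0-9.]+)",
    re.S,
)

def neighbour_rtts(dump: str) -> dict:
    """Map (interface, neighbour address) to the reported rtt as a float."""
    return {(m["ifname"], m["address"]): float(m["rtt"])
            for m in NEIGH_RE.finditer(dump)}

sample = """\
add neighbour 1cc4970 address fe80::14c7:67ff:fe31:ceac if enp0s16u1
reach  rxcost 96 txcost 96 rtt 0.743 rttcost 0 cost 96
add neighbour 1cc4f40 address fe80::16cc:20ff:fee5:64c2 if enp3s0
reach  rxcost 96 txcost 256 rtt 1.067 rttcost 0 cost 256
"""
for key, rtt in neighbour_rtts(sample).items():
    print(key, rtt)
```

The sketch assumes every neighbour line carries an rtt field; a neighbour
without one would be paired with the next neighbour's value, so a real poller
should split the dump per entry first.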



Re: [Babel-users] some thoughts towards babel-1.9

2017-01-02 Thread Dave Taht
On Mon, Jan 2, 2017 at 3:35 PM, Juliusz Chroboczek  wrote:
>> For the wired link case, I am surprised babel considers it "interfering"!
>
> It doesn't.
>
>   https://github.com/jech/babeld/blob/master/interface.c#L210

If that is the only error in two highly speculative documents then I'm
winning. :)

It is probable that I wrote that down while dealing with something in
bridge mode,
and before I'd realized that we had to set the channel manually.


Re: [Babel-users] some thoughts towards babel-1.9

2017-01-02 Thread Dave Taht
On Mon, Jan 2, 2017 at 6:28 AM, Benjamin Henrion <zoo...@gmail.com> wrote:
> On Mon, Dec 12, 2016 at 6:25 PM, Dave Taht <dave.t...@gmail.com> wrote:
>> I've long been testing a few out of tree patches for babel and long
>> have had the intent to try a few more once the first phase of the
>> make-wifi-fast work was completed - which it mostly is, so far as lede
>> is concerned ( https://lwn.net/Articles/705884/ ) - and babel-1.8
> >> stabilized.
>>
>> I wrote up some of my thinking then in:
>>
>> https://github.com/dtaht/rabeld/blob/master/rabel.md
>
> You are right about the 2 remarks on diversity, the code needs to be
> adapted to handle 802.11AC, and how it minimizes channel interference,
> especially nowadays with using multiple channels at once.

Merely handling channel detection at all again would be good. Code for
this, using the correct API, exists in olsrv2, but my brain crashes
when looking at netlink. Then there's merely HT20 vs HT40... and THEN
ac really wonks up the ideas.

> For the wired link case, I am surprised babel considers it "interfering"!

I can't remember how I drew this conclusion, whether it was from
packet captures or from the code, but it appeared to be the case at
the time.

> On this topic, I wanted to BAN hopping on the same channel, as this is
> a really bad feature of wifi mesh.

Not sure if I understand. There are plenty of cases where using the
same channel again makes sense, for example where a directional 5ghz
radio has a ton more bandwidth than a weaker 2.4ghz omni to a given
point.

One test case we did not explore yet with the make-wifi-fast code was
where there is a 5 ghz channel in AP mode with an adhoc backchannel on
the same radio. My hope is we vastly improved the behavior in that
scenario... but to deploy it we needed

> At some point, I should find some time to install a proper outdoor
> testbed somewhere to try that kind of configuration.

The yurtlab (and 110 acre campus) is uninhabitably cold in the winter
(at least to a californian - no snow, though). I'd hoped to do a new
deployment (30+ radios) by this past september, but we weren't done
make-wifi-fast yet. With the time available before spring (say, may),
it seems possible to work on other bothersome problems (ipv6 address
assignment, source specific routing, name services, monitoring, and
security, and other rabel-ish issues)

(If anyone out here wants to bundle up and climb a few roofs with me,
getting a partial deployment along the 2 main backbones would be nice,
long before then)

> I have tried to simulate that with hwmod kernel module, but did not go
> very far.
>
> --
> Benjamin Henrion 
> FFII Brussels - +32-484-566109 - +32-2-3500762
> "In July 2005, after several failed attempts to legalise software
> patents in Europe, the patent establishment changed its strategy.
> Instead of explicitly seeking to sanction the patentability of
> software, they are now seeking to create a central European patent
> court, which would establish and enforce patentability rules in their
> favor, without any possibility of correction by competing courts or
> democratically elected legislators."




Re: [Babel-users] [babel] some thoughts towards babel-1.9

2017-01-02 Thread Dave Taht
I added a todo list for my "rabel" branch of babel covering some of
the stuff I'd like
to try.

https://raw.githubusercontent.com/dtaht/rabeld/master/todo.org



[Babel-users] essentially atomic updates

2017-01-01 Thread Dave Taht
In my seemingly once yearly attempt at working on babel (admittedly
the year is young!), here's my latest attempt at gently switching
routes.

https://github.com/dtaht/rabeld/commit/6ca4b0fa60dbbd25eb5d7e792d8f8058941d4cdb



Re: [Babel-users] dropping CS6

2016-12-15 Thread Dave Taht
On Thu, Dec 15, 2016 at 1:10 AM, Henning Rogge <hro...@gmail.com> wrote:
> On Mon, Dec 12, 2016 at 6:03 PM, Dave Taht <dave.t...@gmail.com> wrote:
>> diffserv CS6 (I am the original author of the patch) has turned out to
>> be generally a lose on wifi, putting things into the VO queue where
>> they cannot be aggregated. I'd like to see babel go back to best
>> effort.
>
> Do you have any numbers how well aggregation does work for multicast
> transmissions compared to unicast?

It doesn't, so far as I know.

> Henning Rogge
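
As background to the CS6 discussion, here is a short Python sketch of how the
marking ends up in the voice queue. It assumes the common default where the
wifi stack derives the 802.1d user priority from the top three DSCP bits and
maps priorities 6 and 7 to the VO access category. All function names are
hypothetical.

```python
import socket

CS6_DSCP = 48  # class selector 6, binary 110000

def dscp_to_traffic_class(dscp: int) -> int:
    """DSCP sits in the upper six bits of the TOS / traffic class byte."""
    return (dscp & 0x3F) << 2

def wmm_access_category(dscp: int) -> str:
    """Assumed default mapping from DSCP to a WMM access category."""
    user_priority = (dscp & 0x3F) >> 3
    return {0: "BE", 1: "BK", 2: "BK", 3: "BE",
            4: "VI", 5: "VI", 6: "VO", 7: "VO"}[user_priority]

def mark_ipv6_socket(sock: socket.socket, dscp: int) -> None:
    """Set the traffic class on an IPv6 socket (Linux; not called here)."""
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_TCLASS,
                    dscp_to_traffic_class(dscp))

print(hex(dscp_to_traffic_class(CS6_DSCP)), wmm_access_category(CS6_DSCP))
print(hex(dscp_to_traffic_class(0)), wmm_access_category(0))
```

Under that assumed mapping, going back to best effort means marking with
DSCP 0, which lands in the BE queue, where aggregation is possible.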




Re: [Babel-users] babel on an otherwise transparent bridge

2016-09-07 Thread Dave Taht
On Wed, Sep 7, 2016 at 5:00 PM, Juliusz Chroboczek
 wrote:
>> like it to do is have a dedicated ipv6 ULA ip address for management
>> purposes (not using a vlan here), and announce that to the network, but
>> never offer itself as a routing opportunity to anything
>
> [...]
>
>> out br-lan something deny?
>> in?
>> inflate the metric?
>
> out br-lan ula/128 allow
> out deny

It's nice to know that both of us can struggle with babel syntax. The
first line there doesn't parse. Should it?

out ip fd42:a3d6:5621::1/128 allow
out deny

Parses. The route appears elsewhere, it doesn't show up as a router,
but the box is unpingable via ipv6.




[Babel-users] babel on an otherwise transparent bridge

2016-09-07 Thread Dave Taht
I have a box setup to be a transparent wifi/ethernet bridge. What I'd
like it to do is have a dedicated ipv6 ULA ip address for management
purposes (not using a vlan here), and announce that to the network,
but never offer
itself as a routing opportunity to anything on any side of the bridge, except
for stuff trying to get to that specific IP. It also needs to have a
route table internally to get around the network (snmp).

If I just blithely leave babel at the defaults, new devices on the bridge tend
to get routes from it, first, leading to a 30-60 second period where
those devices can (and do) choose that 2 hop route rather than the "1
hop" one.

out br-lan something deny?
in?
inflate the metric?


[Babel-users] alternate source specific encoding?

2016-08-23 Thread Dave Taht
is that in a branch somewhere yet?

I am doing some builds in the yurtlab for the latest and greatest
make-wifi-fast code, and can try this, if available.


Re: [Babel-users] Some minor incompatible changes to the configuration language

2016-08-01 Thread Dave Taht
On Sun, Jul 31, 2016 at 6:19 PM, Juliusz Chroboczek
 wrote:
> I've just pushed some changes that could, in some edge cases, break your
> configuration files.  Since I'd like to release 1.8 before the end of the
> summer, I'll be grateful if you could test.
>
> First of all, the keyword "wired" is now deprecated (undocumented but
> supported for backwards compatibility), you should now say
>
> interface eth0 type wired
>
> or
>
> interface wlan0 type wireless
>
> Possible types are "auto", "wired", "wireless" and "tunnel", where
> "tunnel" enables RTT-based cost estimation.
>
> Split-horizon is *disabled* on interfaces of type "auto", so if you want
> split-horizon, you should manually set the interface type.  I'm hesitating
> to enable link-quality estimation on auto interfaces -- it would avoid
> some wrong configurations, but it carries a significant cost.
>
> The second change is that setting max-rtt-penalty no longer enables
> timestamps.  This particular bit of DWIM is no longer necessary now that we
> have the tunnel interface type.
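
For those generating configuration files from scripts, the new syntax
described above can be emitted with a few lines of Python. This is only a
sketch: interface_line is a hypothetical helper, and the list of types is
copied from the message, not from the babeld sources.

```python
VALID_TYPES = ("auto", "wired", "wireless", "tunnel")

def interface_line(ifname: str, iftype: str) -> str:
    """Build one 'interface ... type ...' line, rejecting unknown types."""
    if iftype not in VALID_TYPES:
        raise ValueError(f"unknown interface type: {iftype!r}")
    return f"interface {ifname} type {iftype}"

# Setting the type explicitly avoids "auto", on which split-horizon is disabled.
for name, kind in (("eth0", "wired"), ("wlan0", "wireless")):
    print(interface_line(name, kind))
```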

I would like a way to enable timestamping universally. I like
seeing/collecting the measured RTTs on the monitoring tools, and one
day hope to improve the RTT metric system to work on normal (well,
fq_codeled) paths. The cost of getting and sending the timestamp is
trivial.


> I'll be grateful for testing.  I'll also be grateful for any suggestions
> that make the interface configuration more automatic.
>
> -- Juliusz
>

Re: [Babel-users] [Cerowrt-devel] Cross-compiling to armhf [was: beaglebone green wireless boards...]

2016-06-23 Thread Dave Taht
On Thu, Jun 23, 2016 at 4:20 PM, Jonathan Morton  wrote:
>
>> On 24 Jun, 2016, at 01:57, Juliusz Chroboczek wrote:
>>
>>> the long slow EABI changeover that was obsoleted almost overnight by the
>>> armhf work the raspbian folk did, and so on.
>>
>> I am pretty positive that armhf predates raspbian.  Let's please give
>> credit where credit is due.
>
> Ironically, it was I who demonstrated to the Raspbian folks the benefits
> of an armhf build for the R-Pi 1, back in the early days of that
> platform.  It seems like an awfully long time ago now.  :-)

Yes, it was a team of hackers going against the prevailing
wisdom of endless backward compatibility, and succeeding due to
technical excellence and demonstrable performance improvements -
that's why that story sticks with me.  I admire anybody that can do
that. :)




Re: [Babel-users] Cross-compiling to armhf [was: beaglebone green wireless boards...]

2016-06-23 Thread Dave Taht
On Thu, Jun 23, 2016 at 3:57 PM, Juliusz Chroboczek
 wrote:
>> the long slow EABI changeover that was obsoleted almost overnight by the
>> armhf work the raspbian folk did, and so on.
>
> I am pretty positive that armhf predates raspbian.  Let's please give
> credit where credit is due.

I note that I *really like arm*, going back a very long ways.
http://the-edge.blogspot.com/2002/06/axioms-one-of-my-axioms-about.html
I remember telling the CTO of palm they were doomed back then... they
had started trying to differentiate models by *color* at that
point

sure the abi and compiler "were out there" - but getting 20,000
packages converted over and widely into a popular distro and platform,
to me, was the tipping point for wider adoption of the hard float abi,
as something others could build on. I just spent a few minutes
googling for that story, but couldn't find it (what I remember was 3
guys, 3 months, hammering at getting 20,000 packages to all "just
work").

What we had before was a mess of different ABIs, and a whole bunch of
slightly incompatible arm cpu versions - all enough different to
fragment the arm ecosystem. there was no way you could trust one
binary on a different box. Back around this time (2006-2010?) it was
also unclear that arm would accelerate so far past the herd, either,
and there were a ton of other factors, of course that led to where
it's now being considered for supercomputers and looks set to start
unseating intel in many places.

And despite really liking arm, I look forward to entirely new arches
like the risc-v and mill eating its lunch one day. Things like
trustzone, the mali gpu, and other portions of onchip IP commonly
shipped with the chips suck rocks, still. Very few applications are
taking good advantage of the neon vfp code, the onboard caches are way
behind intel's, and so on...

speaking of trustzone - yea! there's a way to use it now.

https://github.com/OP-TEE



>
> -- Juliusz




Re: [Babel-users] Cross-compiling to armhf [was: beaglebone green wireless boards...]

2016-06-23 Thread Dave Taht
On Thu, Jun 23, 2016 at 3:10 PM, Juliusz Chroboczek
 wrote:
>> (does a working cross compiler exist for the aarch64 in the c2?)
>
>   apt-get install gcc-aarch64-linux-gnu

d@osx: apt-get install gcc-aarch64-linux-gnu

Not found.

...

One of the bigger mistakes I have made in the last 3 years was
adopting a macbook air as my main laptop - primarily because the
keyboard was tolerable and backlit, it was light on my back, and
everything worked, all the time.

Running a vm for any length of time drains the battery, and the mental
semantic confusion I get from switching keyboard and mouse interfaces
between linux vm and osx, not to mention the added overhead of porting
over the tools I use (notably aquamacs), has led to an enormous
decline in my day to day development activity and a corresponding rise
in using email and other management tools. For years I'd advocated to
others that if they are going to develop on linux, for any platform,
then they should eat, sleep, and breathe linux to do so, and I've hurt
my day to day productivity by trying, only counterbalanced by that I
can try for longer (like a 10 hr airline flight)

It turns out I use absolutely no native osx apps that don't run on
linux; although things like garageband had some initial appeal,
ardour4 proved better. So the only defenses I have for that laptop are
the lightness, keyboard, and battery life. It also serves as a
constant reminder of how limited other OSes are and the uphill battle
on what needs to happen for getting universal fixes on everything.

I have two other linux laptops, both broken. On one, the ethernet is
fried, on the other, the X11 gui environment got so messed up that I
can no longer log in - so both have ended up in the testbed for use as
fq_codel development targets rather than directly in front of me. I
have a chromebook, but my attempt to get a real linux on it ended in
disaster.

> Dave, I know you're a grumpy old man, but the Debian folks have done some
> remarkable work on cross-compilation, on multiarch, chroots and emulation.

Yes they have! It is quite amazing how arm got its act together,
including and especially all the integration work linaro did. I have a
long story on all the work I did on arm architecture long before armhf
became popular, and the mess that that was, all the way back to 1998
and handhelds.org, the disaster that was the ep9302 FPU, the long slow
EABI changeover that was obsoleted almost overnight by the armhf work
the raspbian folk did, and so on.

I do plan to try and reform on this upcoming trip - bringing an air,
and reinstalling that busted laptop from scratch - but even then the
trackpad never worked worth a darn. If I don't manage to reform, I'll
also have an odroid c2 and beaglebone with me that both support native
compilation.


> (I wonder why they still insist that we use the morass of complexity
> called Debian-installer.  It is so much easier to run debootstrap, generate
> a root filesystem, tweak the root filesystem until you're happy, and then
> copy it over to the target and be done with it.)
>
> -- Juliusz



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

___
Babel-users mailing list
Babel-users@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/babel-users

Re: [Babel-users] Cross-compiling to armhf [was: beaglebone green wireless boards...]

2016-06-22 Thread Dave Taht
On Wed, Jun 22, 2016 at 4:31 AM, Juliusz Chroboczek
 wrote:
>> The preinstalled OS has sufficient compiler and onboard flash space to
>> build a current babeld from git, and I'm happy to report IPV6_SUBTREES
>> is compiled in by default.
>
> Dave,
>
> It's not the first time that I notice with wonder that you're compiling on
> the devel boards.  Are you aware that cross-compiling babeld to armhf is
> so easy it's not even funny?
>
>   sudo apt-get install gcc-arm-linux-gnueabihf
>   make CC=arm-linux-gnueabihf-gcc


I ended up writing a long rant about this that I will blog one day...
but my short answer to both your suggestions that I cross compile or
install a docker: "You kids, get off my lawn!" :)

I have a tendency to need to compile things vastly more complex than
babel, often more bleeding edge than what is supplied in a repo, and
*knowing* that an apt-get build-dep something; then checking it out
from git head, will actually work with minimal effort, is a joy. The
latest generation of hackerboards are actually "real computers",
because they have a working, on-board compiler and full debian (and
android) support. They would be even more real if the Xwindow drivers
worked worth a damn and I could hook up a keyboard, or a variety of
more obscure languages (:cough: "go", "rust") actually worked, also.

I would love to one day soon be back in a world where I only had to
compile stuff for one architecture, and could spend more time writing
code rather than dealing with ABI differences. I am impressed with the
java port... Although I don't care for java much, it would be nice to
carry these new protocols into android somehow.

> Shncpd is a little bit trickier, since it depends on libbsd.  I think I'll
> remove the dependency before release, but in the meantime you may either
> build yourself an armhf libbsd, or install libbsd0:armhf on your system
> (which requires setting up a multiarch environment), or set up
> a cross-compilation chroot, or simply copy libbsd.so from the target system.

Compiling natively, I don't have to think about that.

(does a working cross compiler exist for the aarch64 in the c2?)


-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org


Re: [Babel-users] rib out of sync

2016-06-21 Thread Dave Taht
I stepped back to 1.6.3 and the kernel and babel ribs stayed correct
no matter how much I upped or downed the usb0 interface while keeping
the wifi alive.

This by itself does not mean enough (because source specific routing
is not in 1.6 as best I recall)

In terms of bisecting between babel 1.6.3 and git head, what should I
try next? 1.7.1?



[Babel-users] beaglebone green wireless boards now available

2016-06-21 Thread Dave Taht
I just got two of 'em and getting usbnet up was a snap. I got 'em
because they have dual 2.4GHz 802.11n antennas and I figured the wifi
would be faster than the getchip stuff.

(there is no adhoc support. another reason for looking at this board
is to look at the structure of the drivers for make-wifi-fast)

I have long liked the beaglebones as being a well built product, with
some special features like the onboard PRUs nothing else can match.
The cpu is getting a bit long in the tooth tho, and these wireless
ones (no ethernet!) are so new that cases don't exist for them yet.

https://www.amazon.com/Seeedstudio-BeagleBone-Green-Wireless-Bluetooth/dp/B01GKE8F10/

they boot up (pretty fast) with debian jessie, kernel 4.4.9-ti-r25, on
the onboard 4GB emmc flash chip.

I was unaware until this moment that debian jessie appears to be
shipping babeld 1.5.1.

The preinstalled OS has sufficient compiler and onboard flash space to
build a current babeld from git, and I'm happy to report IPV6_SUBTREES
is compiled in by default.

As for whether or not I'll end up going through the same hell I'm
going through elsewhere, too soon to tell.

-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org


Re: [Babel-users] Babel-users Digest, Vol 103, Issue 12

2016-06-21 Thread Dave Taht
As for "interesting items on the agenda", there are a wide variety of
things that appeal to *someone*, if you browse the agenda and working
groups available.

If you plan to attend all week, the Sunday newcomers orientation is
quite helpful.

In my case for example I am very interested in the "ACM, IRTF & ISOC
Applied Networking Research Workshop 2016" on Saturday.
Sunday is "Emerging Work in IEEE 802" - (one of my big interests in
layer 3 protocols is getting more link layers to work right)

then there is dtn, 6lo, 6tisch, homenet, tsvarea, nmlrg, lpwan,
detnet, bier, and a couple others, and that's just Monday. If I could
clone myself 3 times, I could fit all that in... the rest of the week
is lighter (for me), but then there's dnssd, homenet, babel, bgp,
iccrg, etc, etc...

and despite all that I mostly find myself in a hallway, talking there.

... This year, I decided that the smartest thing I can do is bring a
giant thermos of coffee, because they have a tendency to run out,
shortly before I need it.



Re: [Babel-users] Babel-users Digest, Vol 103, Issue 12

2016-06-21 Thread Dave Taht
On Tue, Jun 21, 2016 at 12:47 PM, Jehan Tremback
 wrote:
> I might be interested in participating in IETF 96. Are there more details
> about how much it costs etc.? How much of the event will be Babel-related?

About 1/1th.

https://datatracker.ietf.org/meeting/96/agenda.html

Cost for non-students is 700 dollars, there are also one day passes,
and student passes.

https://www.ietf.org/meeting/register.html


> -Jehan
>
> On Sun, Jun 19, 2016 at 5:00 AM,
>  wrote:
>>
>> Today's Topics:
>>
>>1. Re: [babel] WG Action: Formed Babel routing protocol  (babel)
>>   (Juliusz Chroboczek)
>>
>>
>> -- Forwarded message --
>> From: Juliusz Chroboczek 
>> To: ba...@ietf.org
>> Cc: babel-users@lists.alioth.debian.org
>> Date: Sun, 19 Jun 2016 13:17:59 +0200
>> Subject: Re: [Babel-users] [babel] WG Action: Formed Babel routing
>> protocol (babel)
>> The IESG says:
>>
>> > A new IETF WG has been formed in the Routing Area.
>>
>> > Charter: https://datatracker.ietf.org/doc/charter-ietf-babel/
>>
>> Hourra!
>>
>> Many thanks to everyone who helped with making this happen.  (Too many
>> people to fit in the margin of this mail, so I'll just single out the
>> fine-tuning work that Alia did in the final phases.)
>>
>> Pro memoria, IETF 96 is from 17 to 22 July in Berlin, Germany (from
>> Warsaw, just follow the A2 highway westwards for 550km, or take the train
>> at central station; from Paris, hop into the first night train at Gare de
>> l'Est -- no need to suffer through airport security).  The inscription
>> fees are high, but there are very reasonable student rates, and Berlin is
>> full of cheap hotels, many of them close to the Hilton.  (And if you've
>> never been to Berlin -- it's worth visiting.)
>>
>> I'll send a reminder to both lists just before the meeting, with pointers
>> to the online participation URLs.
>>
>> Hope to see you all there,
>>
>> -- Juliusz
>>



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org


[Babel-users] rib out of sync

2016-06-21 Thread Dave Taht
I have definitively proven that on at least one of the arm boxes we
are using (the getchip) there is either a new babel bug in git, and/or
an arm-related issue with forming the netlink message, or a kernel
bug, or excessive gamma radiation in the atmosphere.

I am going to step back to 1.6.1 and see what happens there, and also
drop the arm boxes out of the equation, and hammer on x86, also, and
reduce or increase the number of routes babel is carrying.

I managed to duplicate all the weird behaviors I've seen over the past
several months by buckling down and simplifying things and just
looking at ipv6, while repeatedly failing and restoring an interface
(ifconfig down/up), and rebooting a lot... In particular, seeing the
asymmetric routing scenario take place, repeatably, gave much insight.

...

A feature request for babel one day is: to have a "Verify that the
kernel rib matches babel's rib" option.
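
Until something like that exists in the daemon, a rough external version
of the check can be sketched in Python. This is only a sketch under stated
assumptions: the function names are invented for illustration, the babel
dump is assumed to have one route per line (not wrapped by a mailer), and
source-specific routes are skipped to keep it short.

```python
import ipaddress
import re

# One unwrapped babel dump line, e.g.
#   fd99::13/128 from ::/0 metric ... via usb0 neigh fe80::... (installed)
BABEL_LINE = re.compile(r"^(\S+) from (\S+) .* via (\S+) neigh (\S+) \(installed\)")


def babel_installed(dump_text):
    """Routes babel believes it installed, as (prefix, dev, nexthop) tuples."""
    routes = set()
    for line in dump_text.splitlines():
        m = BABEL_LINE.match(line.strip())
        if not m:
            continue
        prefix, src, dev, nexthop = m.groups()
        if src != "::/0":  # assumption: ignore source-specific routes here
            continue
        routes.add((ipaddress.ip_network(prefix), dev,
                    ipaddress.ip_address(nexthop)))
    return routes


def kernel_babel(ip_route_text):
    """Routes tagged 'proto babel' in `ip -6 route` output, same tuple shape."""
    routes = set()
    for line in ip_route_text.splitlines():
        tok = line.split()
        if not {"via", "dev", "proto"} <= set(tok) or "from" in tok:
            continue
        if tok[tok.index("proto") + 1] != "babel":
            continue
        prefix = "::/0" if tok[0] == "default" else tok[0]
        routes.add((ipaddress.ip_network(prefix),  # host routes become /128
                    tok[tok.index("dev") + 1],
                    ipaddress.ip_address(tok[tok.index("via") + 1])))
    return routes


def rib_diff(dump_text, ip_route_text):
    """Return (missing_in_kernel, unexpected_in_kernel)."""
    babel, kernel = babel_installed(dump_text), kernel_babel(ip_route_text)
    return babel - kernel, kernel - babel
```

Fed the dump and the `ip -6 route | grep usb` output shown in this
message, `rib_diff` should report fd99::13 via usb0 as missing from the
kernel table.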

...

dumping the babel routes (I do have a ton of routes (about 60
installed in a "working" kernel route table)), selecting 2 for the
sake of example:

fd99::13/128 from ::/0 metric 352 (357) refmetric 256 id 02:0d:b9:ff:fe:41:6c:2c seqno 60577 chan (255) age 16 via usb0 neigh fe80::f0ba:5eff:feca:7f4e (installed)
fd99::13/128 from ::/0 metric 494 (525) refmetric 96 id 02:0d:b9:ff:fe:41:6c:2c seqno 60577 chan (36) age 7 via wlan0 neigh fe80::100d:7fff:fe64:c990 (feasible)
^C

root@chipper:~# ip -6 route | grep usb
fe80::/64 dev usb0  proto kernel  metric 256

nothing else. That was last night. Same box, this morning...

root@chipper:~# ip -6 route | grep usb
fd99::4 via fe80::f0ba:5eff:feca:7f4e dev usb0  proto babel  metric 1024
fd9f:237b:c8a6::1d1 via fe80::f0ba:5eff:feca:7f4e dev usb0  proto babel  metric 1024
fd9f:237b:c8a6:0:cca0:3c4a:60df:a69 via fe80::f0ba:5eff:feca:7f4e dev usb0  proto babel  metric 1024
fd9f:237b:c8a6::/48 via fe80::f0ba:5eff:feca:7f4e dev usb0  proto babel  metric 1024
fe80::/64 dev usb0  proto kernel  metric 256

(out of about 30 more that babel's rib claims are on this interface)

In terms of other issues, all in one log...

A) As for why all these routes are stored unreachable in this sample, dunno
B) refmetric 0?

::/0 from 2001:558:6045:bd:3977:7182:7772:8b2b/128 metric 65535 (65535) refmetric 65535 id 16:cc:20:ff:fe:e5:64:c2 seqno 8764 chan (6,36) age 4 via wlan0 neigh fe80::7ec7:9ff:fede:2bb5 (installed)
::/0 from 2001:558:6045:bd:3977:7182:7772:8b2b/128 metric 640 (645) refmetric 256 id 16:cc:20:ff:fe:e5:64:c2 seqno 8764 chan (36) age 4 via wlan0 neigh fe80::100d:7fff:fe64:c990 (feasible)
::/0 from 2001:558:6045:bd:3977:7182:7772:8b2b/128 metric 96 (101) refmetric 0 id 16:cc:20:ff:fe:e5:64:c2 seqno 8764 age 6 via usb0 neigh fe80::f0ba:5eff:feca:7f4e (feasible)
::/0 from 2001:558:6045:bd:3977:7182:7772:8b2b/128 metric 65535 (65535) refmetric 65535 id 16:cc:20:ff:fe:e5:64:c2 seqno 8764 chan (6,36) age 5 via wlan0 neigh fe80::7ec7:9ff:fede:2bb5 (installed)
::/0 from 2001:558:6045:bd:3977:7182:7772:8b2b/128 metric 640 (645) refmetric 256 id 16:cc:20:ff:fe:e5:64:c2 seqno 8764 chan (36) age 5 via wlan0 neigh fe80::100d:7fff:fe64:c990 (feasible)
::/0 from 2001:558:6045:bd:3977:7182:7772:8b2b/128 metric 96 (101) refmetric 0 id 16:cc:20:ff:fe:e5:64:c2 seqno 8764 age 7 via usb0 neigh fe80::f0ba:5eff:feca:7f4e (feasible)





-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org


[Babel-users] unicast IHU patch works... but

2016-06-19 Thread Dave Taht
tested against stock daemons running 1.7.1, 1.6, and all patches to
1.8pre to date on an admittedly too complex network, over ethernet,
usbnet, and wifi in ap/sta mode. (not adhoc yet)

I have not entirely got around to testing the main circumstances
(blocked on other factors) I wanted to look at again (powersave,
asymmetric failover)... and still groping to explain the possibly
interrelated weird behaviors I've been seeing all month...

I am sitting here (and NOT running my attempt at an atomic route
change - just the IHU wire change)

after several ifconfig usb0 up and down events, watching ipv6 failover
to the usbnet device generate errors like this (while ipv4 works
correctly)

kernel_route(ADD): Invalid argument
kernel_route(ADD): Invalid argument
kernel_route(MODIFY): Invalid argument
kernel_route(ADD): Invalid argument
kernel_route(ADD): Invalid argument
kernel_route(MODIFY): Invalid argument
kernel_route(ADD): Invalid argument
kernel_route(ADD): Invalid argument
kernel_route(ADD): Invalid argument

So I think we have a bug in ipv6 routing on kernel 4.3.0 in the
getchip, at least.

All these should have been installed by "babel".

fd99::13 via fe80::100d:7fff:fe64:c990 dev wlan0  proto static  metric 1024
fd99::14 via fe80::100d:7fff:fe64:c990 dev wlan0  proto static  metric 1024
fd99::23 via fe80::7ec7:9ff:fede:2bb5 dev wlan0  proto babel  metric 1024

I will add a check to see if we are *always* using the right proto, and
(now that I learned how) start monitoring netlink messages a bit better.
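
As a stopgap outside the daemon, the proto check can be approximated by
scanning `ip -6 route` output. The sketch below is Python; the function
name is invented for illustration, and it assumes babeld is the only
thing on the box that should be installing gatewayed ("via") routes, which
will not hold on a machine that also has static or RA-learned routes.

```python
def wrong_proto_routes(ip_route_text, expected="babel"):
    """Return (prefix, proto) for every gatewayed route whose proto tag
    differs from `expected`. Directly connected routes (no "via") are
    ignored, since the kernel installs those itself."""
    suspects = []
    for line in ip_route_text.splitlines():
        tok = line.split()
        if "via" not in tok or "proto" not in tok:
            continue
        proto = tok[tok.index("proto") + 1]
        if proto != expected:
            suspects.append((tok[0], proto))
    return suspects
```

Run against the three routes listed above, this flags fd99::13 and
fd99::14 (both tagged "static") and passes fd99::23.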


-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org


Re: [Babel-users] [BUG] Route "deadlocks" under load due to non-atomic kernel route updates

2016-06-16 Thread Dave Taht
On Thu, Jun 16, 2016 at 1:40 PM, Kirill Smelkov <k...@nexedi.com> wrote:
> On Thu, Jun 16, 2016 at 08:38:49AM -0700, Dave Taht wrote:
>> On Thu, Jun 16, 2016 at 4:17 AM, Kirill Smelkov <k...@nexedi.com> wrote:
>> > On Wed, Jun 15, 2016 at 12:56:34PM +0200, Juliusz Chroboczek wrote:
>> >> >> If I read you correctly, this looks like a kernel bug: incorrect
>> >> >> invalidation of the route cache.
>> >>
>> >> [...]
>> >>
>> >> > What we have here is of another kind - it is inherent race condition
>> >> > inside kernel
>> >>
>> >> Perhaps I'm confused, but it still looks like a kernel bug to me.
>> >
>> > Yes, it is a kernel bug. But in a sense it is so old and so widespread
>> > that it has to be cared about in userspace - as with atomic route
>> > updates we do not hit it.
>> >
>> > Also: atomic route updates are needed not only for avoiding this bug.
>> > Another reason is: if we have routedel & routeadd pair, even after
>> > routeadd the state of cache is correct, in the time between del & add,
>> > if a packet destined to that route gets to the node, it hits
>> > 'unreachable' route case.
>> >
>> > For usual packets it is only "packet lost" and TCP probably retransmits.
>> > But for SYN packets, e.g. when a connection is going to be established,
>> > ICMP error is returned which results in "host unreachable" error on
>> > originator side.
>>
>> Yes this variant of the bug is still there, essentially, and it bugs me.
>>
>> (btw the facebook page you pointed to fixes they did was fascinating -
>> they have "interesting problems" - like dealing with 1+m routes in
>> their route table)
>>
>> one day a year, for several years now, I get sufficiently irked about
>> the atomic update problem in babel to refresh my knowledge of netlink,
>> hack babel all to hell, and have nothing work. I left myself a bunch
>> more breadcrumbs last night in my hacked up babel version, as to what
>> I tried and what it did wrong... (because I'm actually also chasing
>> another bug which I'll put up in another message)
>>
>> But:
>>
>> Why doing the equivalent of this (and understanding how it does it)
>>
>> ip -6 route add fd99::33/128 via fe80::120d:7fff:fe64:c992 dev eno1
>> ip -6 route replace fd99::33/128 via fe80::120d:7fff:fe64:c991 dev wlp2s0
>>
>> is so hard for me to figure out - that I don't understand. But it
>> seems to require completely tracing through the ip route code, and
>> writing a decoder for the netlink packets created, to figure out why
>> what I thought would be an equivalent for babel, and taking the week
>> or more to do it...
>>
>> -- look! Squirrel!
>
> Dave, maybe this might help you: Wireshark (not tcpdump) has decoder for
> netlink route packets:
>
> https://code.wireshark.org/review/gitweb?p=wireshark.git;a=blob;f=epan/dissectors/packet-netlink-route.c;hb=v2.1.1rc0-170-gc269684

Groovy. Thank you. I did not know.

In discussing this with shemminger this morning, he pointed out there
was a semantic difference between how routes can be replaced in ipv6
and ipv4.

At *one point* last night I thought I'd successfully got ipv6 to
atomic replace, but it had failed on ipv4 - so I will revisit the work
soon, brain cells and time willing.

> so you can create a virtual netlink monitor interface - something along
> the lines of
>
> modprobe nlmon
> ip link add type nlmon
> ip link set nlmon0 up
>
> ( see more details in e.g. https://patchwork.ozlabs.org/patch/259444/ )
>
> and see the actual packets exchanged between iproute and kernel.
>
> Also: there is pyroute2 (https://github.com/svinota/pyroute2) which has a debug
> decoder for netlink packets, but out of the box you have to specify packet type
> explicitly:
>
> https://github.com/svinota/pyroute2/blob/master/docs/debug.rst
>
> Maybe you already know all this, but I decided to provide info anyway to make
> sure it is not missed, because you mentioned it is hard for you to understand
> what is going on underneath `ip -6 ...`
>
> Hope this might help,
> Kirill
>
>
>> >> Perhaps it would make sense to speak to netdev about that?
>> >
>> > Yes, makes sense. Though as this particular case is not present on 4.2+
>> > kernels, people on netdev will probably have less interest to look into.
>> >
>> > I will see what can be done.
>> >
>> >> > Quagga, at least, switched to atomic updates some tim

[Babel-users] the costs of periodic disassociation in conventional ap/sta mode

2016-06-16 Thread Dave Taht
In the new lab I ended up connecting up a bunch of machines in sta
mode over wpa... (partially because adhoc was unavailable - and
*mostly* because that's what normal homenet users would do, and lastly
because it improved throughput by 50x in some cases)

... with bad results for babel behavior in general, that I am
gradually trying to reduce the impact of (as well as observe what
happens to other protocols, tests, and daemons). It doesn't help that
I'm also trying to make a major change in how wifi is queued
underneath... Anyway, to summarize two out of three and add a new
one...

A) Powersave enabled caused stas to drop off the net by missing multicast.

There were a few patches that went by on the kernel list recently that
might have fixed a beacon offset problem, which I haven't tested.

good fixes for this problem include rigorously testing for it and
fixing on all the chipsets in the world, or having babel be aware it
is in powersave mode and using a bit of unicast, or something, to keep
itself alive.

B) Network Manager triggered scans were devastating[1],
and even after locking to one bssid as suggested on the relevant
thread: http://blog.cerowrt.org/post/disabling_channel_scans/

[wifi]
bssid=04:F0:21:1F:36:E2
mac-address-blacklist=
mac-address-randomization=0
mode=infrastructure
seen-bssids=04:F0:21:1F:36:E2;
ssid=FQCODEL

I still had some trouble with babel, which I think I managed to see in
the small with

C) EVEN after putting this in place (which definitely makes things
better) - on another machine I am periodically disassociating on
another interval for no reason I can discern (yet) - and since last
night, I was logging babel's behavior thoroughly... this is what that
does to babel.

I am pretty sure that events like this trigger a bit more routing
traffic and jitter of babel's states than is desirable (and maybe not
enough, the link is essentially down on both sides), but I did not
capture the traffic in these cases, and for all I know there are
daemon or kernel mods that can make dropping out of the multicast
group, losing the local addresses, forgetting the channel, and then
coming back online a little less hard on everything.

...

Interface wlp2s0 has no link-local address.
Couldn't determine channel of interface wlp2s0: Invalid argument.
Interface wlp2s0 has no link-local address.
Couldn't determine channel of interface wlp2s0: Invalid argument.
Interface wlp2s0 has no link-local address.
Couldn't determine channel of interface wlp2s0: Invalid argument.
Interface wlp2s0 has no link-local address.
Couldn't determine channel of interface wlp2s0: Invalid argument.
Interface wlp2s0 has no link-local address.
Couldn't determine channel of interface wlp2s0: Invalid argument.
send: Cannot assign requested address
Interface wlp2s0 has no link-local address.
Couldn't determine channel of interface wlp2s0: Invalid argument.
send: Cannot assign requested address


[1] Saying that friends don't let friends use network manager is not
good enough. Documenting how to work around it, or thoroughly fixing
network manager - or the device drivers - would be better.

-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org


Re: [Babel-users] [BUG] Route "deadlocks" under load due to non-atomic kernel route updates

2016-06-16 Thread Dave Taht
On Thu, Jun 16, 2016 at 4:17 AM, Kirill Smelkov <k...@nexedi.com> wrote:
> On Wed, Jun 15, 2016 at 12:56:34PM +0200, Juliusz Chroboczek wrote:
>> >> If I read you correctly, this looks like a kernel bug: incorrect
>> >> invalidation of the route cache.
>>
>> [...]
>>
>> > What we have here is of another kind - it is inherent race condition
>> > inside kernel
>>
>> Perhaps I'm confused, but it still looks like a kernel bug to me.
>
> Yes, it is a kernel bug. But in a sense it is so old and so widespread
> that it has to be cared about in userspace - as with atomic route
> updates we do not hit it.
>
> Also: atomic route updates are needed not only for avoiding this bug.
> Another reason is: if we have routedel & routeadd pair, even after
> routeadd the state of cache is correct, in the time between del & add,
> if a packet destined to that route gets to the node, it hits
> 'unreachable' route case.
>
> For usual packets it is only "packet lost" and TCP probably retransmits.
> But for SYN packets, e.g. when a connection is going to be established,
> ICMP error is returned which results in "host unreachable" error on
> originator side.

Yes this variant of the bug is still there, essentially, and it bugs me.
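
To make the del/add window concrete, here is a toy model in Python. It is
only an illustration of longest-prefix matching, not kernel behavior in
any detail: the covering unreachable fd99::/48 route is a hypothetical
stand-in for whatever less specific route (or no route at all) a packet
would fall through to, and the helper names are invented.

```python
import ipaddress


def lookup(table, addr):
    """Longest-prefix match over a {prefix: action} dict."""
    addr = ipaddress.ip_address(addr)
    best = None
    for prefix, action in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, action)
    return best[1] if best else "no route"


def change_route(table, prefix, new_action, atomic, probe):
    """Apply a route change; record what a lookup of `probe` would see
    after each individual step of the change."""
    seen = []
    if atomic:
        table[prefix] = new_action          # single replace step
        seen.append(lookup(table, probe))
    else:
        del table[prefix]                   # step 1: route del
        seen.append(lookup(table, probe))   # a packet arriving right now...
        table[prefix] = new_action          # step 2: route add
        seen.append(lookup(table, probe))
    return seen


def fresh_table():
    return {
        "fd99::/48": "unreachable",  # hypothetical covering route
        "fd99::33/128": "via fe80::120d:7fff:fe64:c992 dev eno1",
    }


NEW = "via fe80::120d:7fff:fe64:c991 dev wlp2s0"
print(change_route(fresh_table(), "fd99::33/128", NEW, atomic=False, probe="fd99::33"))
print(change_route(fresh_table(), "fd99::33/128", NEW, atomic=True, probe="fd99::33"))
```

In the non-atomic case the first observation is the unreachable route,
which is exactly the moment a SYN would get an ICMP error back; with a
replace there is no such intermediate state to observe.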

(btw the facebook page you pointed to, on the fixes they did, was
fascinating - they have "interesting problems" - like dealing with
1+m routes in their route table)

one day a year, for several years now, I get sufficiently irked about
the atomic update problem in babel to refresh my knowledge of netlink,
hack babel all to hell, and have nothing work. I left myself a bunch
more breadcrumbs last night in my hacked up babel version, as to what
I tried and what it did wrong... (because I'm actually also chasing
another bug which I'll put up in another message)

But:

Why doing the equivalent of this (and understanding how it does it)

ip -6 route add fd99::33/128 via fe80::120d:7fff:fe64:c992 dev eno1
ip -6 route replace fd99::33/128 via fe80::120d:7fff:fe64:c991 dev wlp2s0

is so hard for me to figure out - that I don't understand. But it
seems to require completely tracing through the ip route code, and
writing a decoder for the netlink packets created, to figure out why
what I thought would be an equivalent for babel isn't, and taking the
week or more to do it...

-- look! Squirrel!
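
For reference, at the netlink header level the difference between the two
ip commands above comes down to the nlmsg_flags on an RTM_NEWROUTE
request. The Python sketch below only packs the 16-byte nlmsghdr to show
those flag combinations; it does not build the rtmsg payload or talk to
the kernel. The constants are the values from <linux/netlink.h> and
<linux/rtnetlink.h>; the verb-to-flags table reflects how iproute2 maps
its subcommands; the helper name and the choice to always set NLM_F_ACK
are assumptions made for illustration.

```python
import struct

# From <linux/netlink.h>
NLM_F_REQUEST = 0x001
NLM_F_ACK     = 0x004
NLM_F_REPLACE = 0x100
NLM_F_EXCL    = 0x200
NLM_F_CREATE  = 0x400
NLM_F_APPEND  = 0x800
# From <linux/rtnetlink.h>
RTM_NEWROUTE  = 24

# All of these are RTM_NEWROUTE requests; only the flags differ.
IP_ROUTE_FLAGS = {
    "add":     NLM_F_CREATE | NLM_F_EXCL,     # fail if the route exists
    "change":  NLM_F_REPLACE,                 # fail if it does not exist
    "replace": NLM_F_CREATE | NLM_F_REPLACE,  # create or overwrite
    "append":  NLM_F_CREATE | NLM_F_APPEND,
}


def nlmsghdr(verb, payload_len=0, seq=1, pid=0):
    """Pack a struct nlmsghdr (len, type, flags, seq, pid) for `verb`."""
    flags = NLM_F_REQUEST | NLM_F_ACK | IP_ROUTE_FLAGS[verb]
    return struct.pack("=IHHII", 16 + payload_len, RTM_NEWROUTE, flags, seq, pid)


for verb in ("add", "replace"):
    _, _, flags, _, _ = struct.unpack("=IHHII", nlmsghdr(verb))
    print(verb, hex(flags))
```

A daemon that issues a delete followed by an add with EXCL gets the
non-atomic behavior; sending a single RTM_NEWROUTE with CREATE|REPLACE is
the netlink equivalent of `ip route replace`. Whether that behaves the
same for ipv6 and ipv4 is a separate question this sketch does not
answer.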


>



>> Perhaps it would make sense to speak to netdev about that?
>
> Yes, makes sense. Though as this particular case is not present on 4.2+
> kernels, people on netdev will probably have less interest to look into.
>
> I will see what can be done.
>
>> > Quagga, at least, switched to atomic updates some time ago, I think.
>> >
>> > http://patchwork.quagga.net/patch/1234/
>>
>> I see.  I'm busy right now, but I'll be grateful for a patch.
>
> I see about this. Thanks for feedback.
>
>
> On Wed, Jun 15, 2016 at 07:35:05PM -0700, Dave Taht wrote:
>> > https://lab.nexedi.com/kirr/iproute2/blob/bd480e66/t/rtcache-torture
>> > (also attached to this email)
>> >
>> > which reproduces the problem in several minutes just on one computer and
>> > retested it locally: I can reliably reproduce the issue on pristine
>> > Debian 3.16.7-ckt25-2 (on both Atom and Core2 notebooks) and on pristine
>> > 3.16.35 on Atom (compiled by me, since Debian kernel team has not yet
>> > uploaded 3.16.35 to Jessie).
>>
>> I have been running this script on four different machines for hours
>> now without reproducing your bug on the 4.4 or later kernels. It does
>> trigger on a 3.14 kernel. (it helps to do a killall fping6 before
>> exiting!)
>>
>> It does not seem to be happening on 4.4 or later. At one level, I'm
>> relieved - one last babel bug to worry about in openwrt (now 4.4
>> based), although one of the platforms I work on is still stuck at
>> 3.18, as is the 3.14 c2 (for now).
>>
>> At another level I still really, really, really wanted atomic updates
>> in babel, and was clearing the decks to make a run at the right
>> netlink stuff when I'd decided to confirm your bug existed or not in
>> my kernels. :(. Weirdly demotivating.
>>
>>
>> d@dancer:~/bin$ ssh root@pi3 uname -a
>> Linux pi3 4.4.12-v7+ #892 SMP Thu Jun 2 15:41:19 BST 2016 armv7l GNU/Linux
>> d@dancer:~/bin$ ssh root@pi2 uname -a
>> Linux pi2 4.4.12-v7+ #892 SMP Thu Jun 2 15:41:19 BST 2016 armv7l GNU/Linux
>> d@dancer:~/bin$ uname -a
>> Linux dancer 4.5.0-rc7-fqfi #1 SMP PREEMPT Mon Mar 7 16:04:17 PST 2016
>> x86_64 x86_64 x86_64 GNU/Linux
>>
>> ...
>>
>> The odroid C2 has the bug.
>>

Re: [Babel-users] Golang implementation

2016-06-15 Thread Dave Taht
aha. This seems further along.

https://github.com/sh3rp/gabel

On Wed, Jun 15, 2016 at 7:40 PM, Dave Taht <dave.t...@gmail.com> wrote:
> Is this it?
>
> https://github.com/casarez/gabel
>
> It looks like a bit more work is required to get to a decent implementation.
>
> On Wed, Jun 15, 2016 at 5:18 PM, Jehan Tremback
> <jehan.tremb...@gmail.com> wrote:
>> Someone posted in April that they were working on a Golang implementation of
>> Babel. Does anyone know where the code for that is located?
>>
>> -Jehan
>>
>
>
>
> --
> Dave Täht
> Let's go make home routers and wifi faster! With better software!
> http://blog.cerowrt.org



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org


Re: [Babel-users] Golang implementation

2016-06-15 Thread Dave Taht
Is this it?

https://github.com/casarez/gabel

It looks like a bit more work is required to get to a decent implementation.

On Wed, Jun 15, 2016 at 5:18 PM, Jehan Tremback
 wrote:
> Someone posted in April that they were working on a Golang implementation of
> Babel. Does anyone know where the code for that is located?
>
> -Jehan
>



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org


Re: [Babel-users] [BUG] Route "deadlocks" under load due to non-atomic kernel route updates

2016-06-15 Thread Dave Taht
> https://lab.nexedi.com/kirr/iproute2/blob/bd480e66/t/rtcache-torture
> (also attached to this email)
>
> which reproduces the problem in several minutes just on one computer and
> retested it locally: I can reliably reproduce the issue on pristine
> Debian 3.16.7-ckt25-2 (on both Atom and Core2 notebooks) and on pristine
> 3.16.35 on Atom (compiled by me, since Debian kernel team has not yet
> uploaded 3.16.35 to Jessie).

I have been running this script on four different machines for hours
now without reproducing your bug on the 4.4 or later kernels. It does
trigger on a 3.14 kernel. (it helps to do a killall fping6 before
exiting!)

It does not seem to be happening on 4.4 or later. At one level, I'm
relieved - one last babel bug to worry about in openwrt (now 4.4
based), although one of the platforms I work on is still stuck at
3.18, as is the 3.14 c2 (for now).

At another level I still really, really, really wanted atomic updates
in babel, and was clearing the decks to make a run at the right
netlink stuff when I'd decided to confirm your bug existed or not in
my kernels. :(. Weirdly demotivating.


d@dancer:~/bin$ ssh root@pi3 uname -a
Linux pi3 4.4.12-v7+ #892 SMP Thu Jun 2 15:41:19 BST 2016 armv7l GNU/Linux
d@dancer:~/bin$ ssh root@pi2 uname -a
Linux pi2 4.4.12-v7+ #892 SMP Thu Jun 2 15:41:19 BST 2016 armv7l GNU/Linux
d@dancer:~/bin$ uname -a
Linux dancer 4.5.0-rc7-fqfi #1 SMP PREEMPT Mon Mar 7 16:04:17 PST 2016
x86_64 x86_64 x86_64 GNU/Linux

...

The odroid C2 has the bug.

d@dancer:~/bin$ ssh root@c2 uname -a
Linux c2 3.14.29-56 #1 SMP PREEMPT Wed Apr 20 12:15:54 BRT 2016
aarch64 aarch64 aarch64 GNU/Linux

BUG: Got unexpected unreachable route for 2226:::::1:  # I'd changed the number
unreachable 2226:::::1 from :: dev lo  src fd99::2  metric 0 \cache  error -101

route table for root 2226::::/48
 8< 
unicast 2226:::::/64 dev dum0  proto boot  scope global  metric 1024
unreachable 2226::::/48 dev lo  proto boot  scope global  metric 1024  error -101
 8< 

route for 2226:::::1 (once again)
unreachable 2226:::::1 from :: dev lo  src fd99::2  metric 0 \cache  error -101 users 1 used 3


>
> It is always the same: the issue reproduces reliably in several minutes.
> And it looks like e.g.
>
>  - 8< 
>  root@mini:/home/kirr/src/tools/net/iproute2/t# time ./rtcache-torture
>  PING :::::1(:::::1) 56 data bytes
>  E.E.E.E..E..EE...E..
>  
>
>  BUG: Linux mini 3.16.35-mini64 #14 SMP PREEMPT Sun Jun 12 19:41:09 MSK 2016 x86_64 GNU/Linux
>  BUG: Got unexpected unreachable route for :::::1:
>  unreachable :::::1 from :: dev lo  src 2001:67c:1254:20::1  metric 0 \cache  error -101
>
>  route table for root ::::/48
>   8< 
>  unicast :::::/64 dev dum0  proto boot  scope global  metric 1024
>  unreachable ::::/48 dev lo  proto boot  scope global  metric 1024  error -101
>   8< 
>
>  route for :::::1 (once again)
>  unreachable :::::1 from :: dev lo  src 2001:67c:1254:20::1  metric 0 \cache  error -101 users 1 used 4
>
>  real0m49.938s
>  user0m4.488s
>  sys 0m5.872s
>   8< 
>
> The issue should not show itself with kernels >= 4.2, because there the
> lookup procedure does not take table lock twice, and /128 cache entries
> are not routinely created (they are created only upon PMTU exception).
>
> I'm running Debian testing on my development machine. Currently it has
> 4.5.5-1 (2016-05-29). I can confirm that /128 route cache entries are
> not created there just because a route was looked up.
>
> Kirill
>
>
>  8<  (rtcache-torture)
> #!/bin/sh -e
> # torture for IPv6 RT cache, trying to hit the race between lookup, cache-add & route add
> # http://lists.alioth.debian.org/pipermail/babel-users/2016-June/002547.html
>
>
> tprefix=::  # "whole-network" prefix for tests  /48
> tsubnet=$tprefix:   # subnetwork for which "to" route will be changed  /64
> taddr=$tsubnet::1   # test address on $tsubnet
>
> # setup for tests:
>
> # dum0 dummy device
> ip link del dev dum0 2>/dev/null || :
> ip link add dum0 type dummy
> ip link set up dev dum0
>
> # clean route table for tprefix with only unreachable whole-network route
> ip -6 route flush root $tprefix::/48
> ip -6 route add unreachable $tprefix::/48
> ip -6 route flush cache
>
> ip -6 route add $tsubnet::/64 dev dum0
>
>
> # put a lot of requests to rt/rtcache getting route to $taddr
> trap 'kill $(jobs -p)' EXIT
> rtgetter() {
> # NOTE we cannot do this with `ip route get ...` in a loop, as `ip route
> # get` first takes RTNL lock, and thus will be completely serialized with
> # e.g. route add and del.
> #
> 

Re: [Babel-users] [BUG] Route "deadlocks" under load due to non-atomic kernel route updates

2016-06-11 Thread Dave Taht
On Fri, Jun 10, 2016 at 11:47 AM, Juliusz Chroboczek
 wrote:
> Dear Kirill,
>
> Thank you very much for the detailed analysis.
>
> If I read you correctly, this looks like a kernel bug: incorrect
> invalidation of the route cache.  While we have seen some similar bugs in
> earlier kernel versions, they were not triggered by something that
> simple -- you needed to do some non-trivial rule manipulation in order to
> trigger them.
>
> What is more -- I believe that babeld is using the same procedure as
> Quagga and Bird.  Do you understand why Quagga and Bird are not seeing the
> same issues ?

Quagga, at least, switched to atomic updates some time ago, I think.

http://patchwork.quagga.net/patch/1234/

>
> While I have no objection to switching to a different API for manipulating
> routes, I'd like to first make sure that we understand what's going on here.

I strongly approve of atomic updates and fixing what, if anything,
that breaks...

I have seen oddities in unreachable p2p routes for years now. I've
suspected a variety of causes - notably getting an icmp route
unreachable before babel could make the switch - but have never tracked
it down. Some of the work I'm doing now could be leveraged to try and
make it happen more often, but a few more pieces on top of this

https://www.mail-archive.com/netdev@vger.kernel.org/msg114172.html

need to land before I can propagate all the right pieces to the testbed.

>
> Oh -- and are you running a stock kernel, or one locally patched?  Can you
> reproduce the issue on a pristine, recent kernel?
>
> Thanks again for your help,
>
> -- Juliusz
>
>
>
>
>
> ___
> Babel-users mailing list
> Babel-users@lists.alioth.debian.org
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/babel-users



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

___
Babel-users mailing list
Babel-users@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/babel-users

Re: [Babel-users] Babel in Bird 1.6.0

2016-04-30 Thread Dave Taht
I can confirm that toke's current set of fixes compiles on a rpi3, AND
that I am too stupid to figure out how to create a correct, basic,
babeld .conf file for bird.

On Sat, Apr 30, 2016 at 10:39 AM, Toke Høiland-Jørgensen  wrote:
> Juliusz Chroboczek  writes:
>
>>> Okay, actually trying to put this into code: Is the intention here that
>>> a null-router ID update is acceptable only on *wildcard* retractions or
>>> on *all* retractions?
>>
>> In RFC 6126, there's nothing special about a null router-ID: it's just
>> a router ID.
>
> I didn't actually mean a 'null router ID'. I meant an *unset* router ID.
> I.e., if flag 0x40 is set or the update is preceded by a router ID TLV,
> the router ID is *set*. It may or may not be set to all-zeroes, but that
> is orthogonal. So I was referring to the text stating "the current
> router-id and seqno is not used" - does that refer to all retractions or
> just wildcard ones?
>
> (I suspect the answer to be the former, and that the fact that this
> poses problems is an artifact of the current update handling flow in the
> Bird code; but want to be sure before I change it).
>
>> However, for AEs 0 and 1, the address is too short to carry a router-ID
>> (it's 0 and 4 octets respectively, while a router-ID is 8 octets).  The
>> intention was that a shorter address should be stored in the right side of
>> a router-ID, and padded with zeroes; e.g. the IPv4 address (AE 1) 1.2.3.4
>> maps to the router-ID 0:0:0:0:1:2:3:4, and the zero-length address (AE 0)
>> maps to 0:0:0:0:0:0:0:0.  However, I don't think this is spelled out in
>> RFC 6126.
>
> Well, I've always thought about 0x40 as specifying that the router ID be
> the 64 bits from the address that is semantically encoded by the TLV,
> not the literal bytes in the TLV itself. I.e. Bird does this:
>
>   if (tlv->flags & BABEL_FLAG_ROUTER_ID)
>   {
>     state->router_id = ((u64) _I2(msg->prefix)) << 32 | _I3(msg->prefix);
>     state->router_id_seen = 1;
>   }
>
> where msg is the internal data structure containing the parsed values.
>
> This means there's no problem in combining flag 0x40 with AE 0; but for
> IPv4 addresses it needs to be specified whether the addresses should be
> padded to the right or the left.
>
>> So my current thinking is:
>>
>>   - if a Babel speaker receives an update with AE 0 or 1 and bit 0x40 set,
>> it MUST set the router-ID to the address in the update, right justified
>> and padded with zeroes;
>
> Yes, this seems reasonable, and should go into a -bis I guess.
>
>>   - a Babel speaker SHOULD NOT set bit 0x40 in updates with AE 0 or 1,
>> lest the author meet the wrath of Markus.
>
> This is "for the time being", or? Surely Markus can be appeased by the
> time a new draft is written? ;)
>
> -Toke
>
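A minimal sketch of the padding rule discussed in the quoted text, written in Python rather than Bird's C. It assumes the address has already been decoded to its full length (0, 4 or 16 octets for AE 0, 1 and 2); the function names are made up for illustration and are not part of babeld or Bird. The IPv6 case keeps the low 64 bits, mirroring the Bird snippet quoted above.

```python
def router_id_from_address(ae: int, address: bytes) -> bytes:
    """Derive an 8-octet router-ID from an update's address (flag 0x40 set).

    Assumption: `address` is the fully decoded address for the given AE.
    """
    expected_len = {0: 0, 1: 4, 2: 16}  # AE 0 wildcard, AE 1 IPv4, AE 2 IPv6
    if ae not in expected_len:
        raise ValueError(f"AE {ae} is not handled in this sketch")
    if len(address) != expected_len[ae]:
        raise ValueError(f"AE {ae} expects {expected_len[ae]} octets")
    # Short addresses are right-justified and padded with zeroes on the left;
    # a 16-octet address contributes its last 8 octets.
    return address[-8:].rjust(8, b"\x00")


def format_router_id(router_id: bytes) -> str:
    """Render a router-ID octet by octet, e.g. 0:0:0:0:1:2:3:4."""
    return ":".join(format(octet, "x") for octet in router_id)
```

With this rule, the IPv4 address 1.2.3.4 formats as 0:0:0:0:1:2:3:4 and the zero-length AE 0 address as 0:0:0:0:0:0:0:0, matching the mapping described in the quoted text.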

Re: [Babel-users] Babel in Bird 1.6.0 (documentation)

2016-04-30 Thread Dave Taht
I would like to see bird itself grow a notion of time finer than
1 sec.



Re: [Babel-users] Multicast IHUs [was: perverse powersave bug with sta/ap mode]

2016-04-28 Thread Dave Taht
On Thu, Apr 28, 2016 at 10:10 AM, Juliusz Chroboczek
 wrote:

>> 4) And ya know - it might merely be a (sadly common) bug. Everybody's
>> supposed to wake up for the multicast beacons and get a notification
>> there's more data to come.
>
> Yes, it's obviously a bug.  Just like you, I'm not surprised -- ad-hoc mode
> and power save is the kind of thing that's never tested.


No, this is the kind of thing that normal users of wifi use -
AP/station mode being the most common mode of operation.

adhoc - rarely functional or tested
power save - VERY tested for people that want to save major power,
which is everybody running on battery, pulling out every trick (even
dubious ones) to meet consumption goals (rather than network
connectivity goals).

I do not know to what extent or where the problem I am seeing is
actually happening; I can look at the multicast beacons harder to see
what's going on.

Wifi powersave is not "go to sleep entirely", it is "please wake up on
this schedule (250ms) so I can poke you with more unicast data if I
have any". It also requires (in the spec) that the accumulated packets
be buffered til that beacon, and multicast packets are supposed to be
sent as CAB ("crap after beacon" in ath9k's documentation, "content
after beacon" elsewhere).

The "buffering til you wake up" requirement is hell on trying to roll
an airtime fairness scheduler, or codel, into portions of the stack.

Certainly many devices simply disassociate when they go to sleep nowadays.


Re: [Babel-users] Multicast IHUs [was: perverse powersave bug with sta/ap mode]

2016-04-28 Thread Dave Taht
On Thu, Apr 28, 2016 at 10:10 AM, Juliusz Chroboczek
 wrote:
>> 1) Well, I have suggested that IHU messages actually be unicast rather
>> than bundled with the hello.
>
> Yes, you have suggested that before.  I answered I would implement that if
> somebody volunteered to do an experimental evaluation.  Nobody volunteered.

We do need to build the size of the community.

I am perpetually giving talks about wifi in front of various
audiences. Making the comment:

"how many of you use wifi, hands up"

Always gets a laugh (from those paying attention)

(sometimes I'm sufficiently annoyed at those not paying attention to
ask "those of you that are paying attention, hands up")

"How many of you understand how it works?"

and nearly all the hands go down.

"Why is this not a problem?", and then I launch into the talk

I guess it would be better to collect my(our) rants, problems, and
arguments, tone them down, and get something about - "wifi, the
dominant paradigm" into more widely read publications than these
mailing lists. The recent conference on wifi in DC had some data like
"3 billion wifi devices shipped last year".


>> That would help somewhat in this case.
>
> That's my intuition too, but I've learned to be wary of my intuitions.
> Doing wireless stuff without careful evaluation is not something I'll do
> again.

Yep. Need more people on these problems. I promise to care more after
we cut latencies under load on wifi by 2 orders of magnitude on 3
chipsets.

>> 2) A protocol that needs "always listening" capability could signal
>> the underlying stack to "make sure" these packets hit the air, and one
>> that also wants "please be lossy" capabl
>> I leave the actual implementation of that request to the fantasies of
>> the authors - a new dscp codepoint or three?
>> /me ducks
>
> No need to duck, Dave, it's very similar to what was done with UDP-Lite,
> where the use of a specific value in the protocol field signals the link
> layer not to discard corrupted frames.  I've never seen it in the wild,
> I wonder why.

Hmm.. In babel's case, switching it to udp-lite would be like 1 line
of code. Not that it would help (unless the "don't multicast this"
code is explicitly filtering out normal udp only), and the flag day
would be no fun, but certainly the basic properties of udplite aren't
entirely unaligned... I have done tests of udplite (I have a patch
available for it for netperf if anyone wants it) and over ipv6, at
least, it did seem to be quite routable over multiple hops.
*link-local* udp-lite should "just work".
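As a rough illustration of why the switch is small at the socket level, here is a sketch in Python (babeld itself is C, and this is not its code). It only shows that the socket type stays SOCK_DGRAM and the protocol argument changes; the helper name is hypothetical, and whether a given kernel has UDP-Lite enabled is an assumption.

```python
import socket

# UDP-Lite has IP protocol number 136; fall back to the literal value on
# platforms where Python does not export the constant.
UDPLITE_PROTO = getattr(socket, "IPPROTO_UDPLITE", 136)


def open_udplite_socket() -> socket.socket:
    """Open an IPv6 UDP-Lite datagram socket (hypothetical helper).

    The only difference from plain UDP is the third argument; this call
    fails if the running kernel lacks UDP-Lite support.
    """
    return socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, UDPLITE_PROTO)
```

Both ends have to make the same change at once, which is the flag day mentioned above.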

>
>> 4) And ya know - it might merely be a (sadly common) bug. Everybody's
>> supposed to wake up for the multicast beacons and get a notification
>> there's more data to come.
>
> Yes, it's obviously a bug.  Just like you, I'm not surprised -- ad-hoc mode
> and power save is the kind of thing that's never tested.  I suggest you
> disable power saving on all your nodes and be done with it.

That does not bode well for normal homenet users in the long run.

> -- Juliusz




Re: [Babel-users] [Make-wifi-fast] [Cerowrt-devel] perverse powersave bug with sta/ap mode

2016-04-28 Thread Dave Taht
On Thu, Apr 28, 2016 at 9:05 AM, Henning Rogge  wrote:
> On Thu, Apr 28, 2016 at 5:04 PM, moeller0  wrote:
>>
>>> On Apr 28, 2016, at 15:43 , Toke Høiland-Jørgensen  wrote:
>>> Presumably the access point could transparently turn IP-level multicast
>>> into a unicast frame to each associated station? Not sure how that would
>>> work in an IBSS network, though... Does the driver (or mac80211 stack)
>>> maintain a list of neighbours at the mac/phy level?
>>
>> I believe the openwrt developers are thinking along similar lines, see e.g. 
>> https://lists.openwrt.org/pipermail/openwrt-devel/2015-June/033398.html
>
> Why not just sending IP multicast (not 802.11 management frames) with
> a higher rate (lowest best linkspeed to all known neighbors)?

I've always liked this idea as an enhancement to the existing 802.11
spec. It is flawed in that the "lowest best linkspeed" in minstrel
decays to the lowest configured link rate for stations that have not
been sampled recently. (Another thing I like about unicast IHU is that
it would keep minstrel's statistics more current). I don't deeply
understand how other rate controllers besides the old "samplerate"
work - and only just last week finally put the minstrel paper up for
review - (http://blog.cerowrt.org/post/minstrel/). If there are other
documents on wifi rate controllers out there, like those in certain
wifi chips, I'd love to read them.
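A toy model of the "lowest best linkspeed" idea and its flaw, as a Python sketch. This is not minstrel's code: the function name, the dictionary of per-neighbour best rates, and the use of None for "not sampled recently" are all assumptions made for illustration.

```python
from typing import Mapping, Optional


def multicast_rate(best_rates: Mapping[str, Optional[float]],
                   base_rate: float) -> float:
    """Pick a multicast rate (Mbit/s) every known neighbour should decode.

    best_rates maps a neighbour to its current best unicast rate, or None
    when the neighbour has not been sampled recently. Stale neighbours are
    treated as being at the lowest configured rate, which models the decay
    described above.
    """
    if not best_rates:
        return base_rate
    effective = [base_rate if rate is None else rate
                 for rate in best_rates.values()]
    return max(base_rate, min(effective))
```

In this model a single stale neighbour drags the multicast rate back down to the base rate, which is why keeping the statistics current (for instance with unicast IHUs) matters.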

And: I have found several devices in the field that cannot take
anything but the base multicast rate, my old nexus 7 was like that.

I note I'm not seeking a solution for ND/RA/ARP at the moment - in
fact I'm trying to work on something other than routing protocols
entirely ! - I would like it if people would stop trying to treat wifi
as an ethernet equivalent and treat it as the now dominant paradigm it
is, where ethernet is the exception rather than the rule, ethernet
fallback as a "nice to have" rather than a necessity. For years I got
along just fine on wifi + ethernet using the then common ahcp/babeld
single ip methods ( http://blog.cerowrt.org/post/failing_over_faster/
)

I am unfond of turning all formerly multicast protocols into unicast
on wifi, as proposed for openwrt and, for that matter, in the ietf.

This might make for some background reading:
https://tools.ietf.org/html/draft-yourtchenko-colitti-nd-reduce-multicast-00

Anyway, I put a couple pictures up at
http://blog.cerowrt.org/post/poking_at_powersave/ - I have some data
showing the ap/sta metric going to hell over a few minutes not in that
post yet and I still ended up with some difficulties (I have not
turned powersave off everywhere yet, repeatably, as I need to find the
right hooks to tell Ap/sta mode in
networkmanager/systemd/debian/openwrt to turn it off. (?)) - did I say
I wasn't working on meshy protocols already? :grump:

and I have a conference today and more new gear arriving to play with.

>
> Henning Rogge
> ___
> Make-wifi-fast mailing list
> make-wifi-f...@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/make-wifi-fast




Re: [Babel-users] [Make-wifi-fast] perverse powersave bug with sta/ap mode

2016-04-28 Thread Dave Taht
On Thu, Apr 28, 2016 at 8:44 AM, Dave Taht <dave.t...@gmail.com> wrote:
> On Thu, Apr 28, 2016 at 7:59 AM, Juliusz Chroboczek
> <j...@pps.univ-paris-diderot.fr> wrote:
>>> Discovery is a special case, that is not quite multicast. [...]  So you
>>> don't need any facility to "reach all" in one message.
>>
>> Are we speaking of the IP Internet, or of some other network?
>
> Heh.  Wifi is "some other network", at this point. Perhaps it always
> was. It was originally targeted at IPX/SPX.
>
> As wifi has evolved all sorts of packets below the conventional link
> layer that are invisible to IP (management frames in general), perhaps
> finding saner ways of exposing these packet types and their properties
> to the conventional IP stack - and the IP stack to the properties of
> the wifi frames - would be of help.
>
> For example, I just "ate" the entire 802.11-2012 and 802.11ac
> specifications, notably section 9 (Aggregation stuff mostly) and annex
> G - for those of you into a digestif, they are publicly available via:
> http://standards.ieee.org/about/get/802/802.11.html
>
> In my case I was mostly looking for properties of ampdus that could be
> better leveraged for congestion control. It turns out that you *can*
> mark certain MPDUs as QosNOACK, which means that they will not be
> block acknowledged. Nobody does this... and, while you could form some
> IP packets with this property there's no way to do it on the ath10k
> (except in raw mode), and no hook for it on the ath9k, and no way of
> the IP layer saying to the 802.11 layer, "it's totally ok you can lose
> this".
>
> Elsewhere in the stack I am seeing retries of 10 (ath9k) and 15
> (Ath10k), and in the initial fq_codel implementation on ath10k, *ZERO*
> loss coming from the wifi layer on a string of traces. (I was
> leveraging codel's ecn marking abilities to "see" this.) The mac80211
> portion also has sw retries as a global config, not as something that
> is per-station.
>
> I am certain, out there, there is some wifi EE dancing at how perfect
> they've made the wireless layer appear and "transparent" to IP, but I
> look at the aircap packet traces I just got with something akin to
> horror, 10s of ms of retries on stuff, eating other people's air, that
> I'd just as soon throw away, which also shows up on the xplot.org
> tcptraces on a wire downstream as spikes in rtt.
>
> (there is also the needed cts random backoff in there, which
> makes it hard to distinguish between retries at various rates and
> needed backoff. I am sick of manually tearing apart aircaps)
>
> Now, dpreed's position on how we do wireless wrong is a great starting
> point... I wish he'd publish his 11 layer stack document somewhere...
>
>> A number of fundamental Internet protocols, such as ARP and ND, use
>> multicast for discovery (I see broadcast as a special case of multicast).
>> So if you want to implement the TCP/IP suite, your link layer needs to
>> support multicast.  Some people have tried to work around that (see
>> RFC 2022, for example), with IMHO little success.
>
> Sure wish more wifi folk drank with more ietf folk, more often.
> Starting 2 decades back.
>
>>
>> What you seem to be arguing is that it would be possible to design
>> a protocol suite that uses anycast for discovery.  While an interesting
>> research project, your suite would no longer be TCP/IP, good luck getting
>> it deployed.
>>
>> (So what's the solution?  As Toke suggested, push the multicast
>> implementation to the link layer -- have the link layer convert multicast
>> to multiple unicasts in a way that's invisible to the network layer.
>> After all, that's what the link layer is for -- hiding the idiosyncrasies
>> of a given physical layer from the network layer.)
>
> 1) Well, I have suggested that IHU messages actually be unicast rather
> than bundled with the hello. That would help somewhat in this case.
> (and also fix cases where multicast works and unicast doesn't).
> multicast hello would become more of a discovery protocol and you
> could actually signal you can "take" a unicast hello (via a new tlv)
> and establish an ongoing multicast-free association that way.
>
> Given the currently "perfect" characteristics of the underlying
> unicast wireless link layer that would tend to eliminate packet loss
> as a viable metric of quality. :(
>
> 2) A protocol that needs "always listening" capability could signal
> the underlying stack to "make sure" these packets hit the air, and one
> that also wants "please be lossy" capabl
>
> I lea

Re: [Babel-users] [Make-wifi-fast] perverse powersave bug with sta/ap mode

2016-04-28 Thread Dave Taht
On Thu, Apr 28, 2016 at 7:59 AM, Juliusz Chroboczek
 wrote:
>> Discovery is a special case, that is not quite multicast. [...]  So you
>> don't need any facility to "reach all" in one message.
>
> Are we speaking of the IP Internet, or of some other network?

Heh.  Wifi is "some other network", at this point. Perhaps it always
was. It was originally targeted at IPX/SPX.

As wifi has evolved all sorts of packets below the conventional link
layer that are invisible to IP (management frames in general), perhaps
finding saner ways of exposing these packet types and their properties
to the conventional IP stack - and the IP stack to the properties of
the wifi frames - would be of help.

For example, I just "ate" the entire 802.11-2012 and 802.11ac
specifications, notably section 9 (Aggregation stuff mostly) and annex
G - for those of you into a digestif, they are publicly available via:
http://standards.ieee.org/about/get/802/802.11.html

In my case I was mostly looking for properties of ampdus that could be
better leveraged for congestion control. It turns out that you *can*
mark certain MPDUs as QosNOACK, which means that they will not be
block acknowledged. Nobody does this... and, while you could form some
IP packets with this property there's no way to do it on the ath10k
(except in raw mode), and no hook for it on the ath9k, and no way of
the IP layer saying to the 802.11 layer, "it's totally ok you can lose
this".

Elsewhere in the stack I am seeing retries of 10 (ath9k) and 15
(Ath10k), and in the initial fq_codel implementation on ath10k, *ZERO*
loss coming from the wifi layer on a string of traces. (I was
leveraging codel's ecn marking abilities to "see" this.) The mac80211
portion also has sw retries as a global config, not as something that
is per-station.

I am certain, out there, there is some wifi EE dancing at how perfect
they've made the wireless layer appear and "transparent" to IP, but I
look at the aircap packet traces I just got with something akin to
horror, 10s of ms of retries on stuff, eating other people's air, that
I'd just as soon throw away, which also shows up on the xplot.org
tcptraces on a wire downstream as spikes in rtt.

(there is also the needed cts random backoff in there, which
makes it hard to distinguish between retries at various rates and
needed backoff. I am sick of manually tearing apart aircaps)

Now, dpreed's position on how we do wireless wrong is a great starting
point... I wish he'd publish his 11 layer stack document somewhere...

> A number of fundamental Internet protocols, such as ARP and ND, use
> multicast for discovery (I see broadcast as a special case of multicast).
> So if you want to implement the TCP/IP suite, your link layer needs to
> support multicast.  Some people have tried to work around that (see
> RFC 2022, for example), with IMHO little success.

Sure wish more wifi folk drank with more ietf folk, more often.
Starting 2 decades back.

>
> What you seem to be arguing is that it would be possible to design
> a protocol suite that uses anycast for discovery.  While an interesting
> research project, your suite would no longer be TCP/IP, good luck getting
> it deployed.
>
> (So what's the solution?  As Toke suggested, push the multicast
> implementation to the link layer -- have the link layer convert multicast
> to multiple unicasts in a way that's invisible to the network layer.
> After all, that's what the link layer is for -- hiding the idiosyncrasies
> of a given physical layer from the network layer.)

1) Well, I have suggested that IHU messages actually be unicast rather
than bundled with the hello. That would help somewhat in this case.
(and also fix cases where multicast works and unicast doesn't).
multicast hello would become more of a discovery protocol and you
could actually signal you can "take" a unicast hello (via a new tlv)
and establish an ongoing multicast-free association that way.

Given the currently "perfect" characteristics of the underlying
unicast wireless link layer that would tend to eliminate packet loss
as a viable metric of quality. :(

2) A protocol that needs "always listening" capability could signal
the underlying stack to "make sure" these packets hit the air, and one
that also wants "please be lossy" capabl

I leave the actual implementation of that request to the fantasies of
the authors - a new dscp codepoint or three?
/me ducks

3) There are other stats from minstrel, station association, crypto
state, etc, that could be leveraged.

4) And ya know - it might merely be a (sadly common) bug. Everybody's
supposed to wake up for the multicast beacons and get a notification
there's more data to come.


> -- Juliusz

[Babel-users] bug in ^C in -G

2016-04-26 Thread Dave Taht
this is repeatable on all platforms.

root@apu2:/etc# telnet ::1 33123

Trying ::1...
Connected to ::1.
Escape character is '^]'.
BABEL 1.0
version babeld-1.7.1-59-gb648a17-dirty
host apu2
my-id 02:0d:b9:ff:fe:41:6c:2c
ok
^C
dump
^C^C^C
dump
dump, darn it
^C
dump dump dump dump
please dump
dump, with sugar on top?
dump I really love you with echo dump | nc ::1 33123 but
^C^C
dump
quit
Connection closed by foreign host.

(why does quit work and dump not after a ctrl-c?)
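For poking at the local interface without telnet in the way, a plain socket client is one option. The sketch below is Python and makes two assumptions taken from the transcript above rather than from babeld's documentation: the banner ends with a line reading "ok", and the reply to "dump" ends the same way. The function names are hypothetical, and other terminating replies are not handled.

```python
import io
import socket
from typing import List, TextIO


def read_until_ok(stream: TextIO) -> List[str]:
    """Collect lines up to, but not including, a line that is exactly 'ok'."""
    lines: List[str] = []
    for raw in stream:
        line = raw.rstrip("\r\n")
        if line == "ok":
            return lines
        lines.append(line)
    raise ConnectionError("stream ended before an 'ok' line was seen")


def dump(host: str = "::1", port: int = 33123) -> List[str]:
    """Connect to the local interface, skip the banner, and send 'dump'."""
    with socket.create_connection((host, port), timeout=5.0) as sock:
        with sock.makefile("rw", encoding="ascii", newline="\n") as stream:
            read_until_ok(stream)      # banner: BABEL 1.0, version, host, my-id
            stream.write("dump\n")
            stream.flush()
            return read_until_ok(stream)


def parse_transcript(text: str) -> List[str]:
    """Offline helper: run the same parser over a captured transcript."""
    return read_until_ok(io.StringIO(text))
```

Comparing what this client gets back with what the telnet session shows after a ^C would at least tell whether the problem follows the client or the daemon.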



[Babel-users] perverse powersave bug with sta/ap mode

2016-04-26 Thread Dave Taht
Pain shared, reduced; joy shared, increased...

For weeks now I've been puzzling over why a variety of links flapped
the way they did, routes coming and going, failing over to weird
paths, and I think I have finally isolated one part of the problem.

In an age where adhoc does not work particularly well, and AP/sta mode
does (as in 6mbit vs 500 in one case), I've had a tendency to nail up
links in ap/sta mode.

Well, at least one (probably several) of the devices I have in the
new lab has *very aggressive* power save, to where babel ipv6
multicast traffic either doesn't sync up to the AP's request for
multicast (or the sta's), or it is merely completely suppressed by the
stack. (or lost due to a bug!)...

Anyway...

So long as there is unicast traffic on the local part of the link, you
don't see a problem. And there's almost always a bit of traffic on the
link. So, perversely... like when I'm looking at it... ssh'd through
to the other side, running something like "watch tc", or like, pinging
from one side of the link to the other... it works. When I go away for
a bit... it fails. Eventually.

If I run a test, after getting everything all set up and verifying the
network looks correct... it works.

If I walk away and run a test that has a few minutes :grump: between
runs to let things "settle down", things actually deteriorate.

Babel misses multicast traffic and gradually increases the metric on
that interface due to the loss - causing a given route, in my case, to
eventually fall over to an adhoc wifi radio elsewhere on the network,
which reduces the probability of unicast traffic through that link
still more, until ultimately the local link, otherwise nailed up,
drops off the network completely.

to "fix" this:

iw dev wlp4s0 set power_save off

worked beautifully on the ath10k driver I'm using. The babel metric
stayed stable, the route stayed stable, life was good, throughput
increased, latency dropped...
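To apply the same workaround to several interfaces, something like the following Python sketch could be used. It only wraps the iw command shown above; the helper names and the dry-run default are assumptions, and it does nothing about the hooks in networkmanager or openwrt that might turn power save back on.

```python
import subprocess
from typing import Iterable, List


def power_save_command(ifname: str, enabled: bool) -> List[str]:
    """Build the iw invocation that sets power save on one interface."""
    return ["iw", "dev", ifname, "set", "power_save", "on" if enabled else "off"]


def disable_power_save(ifnames: Iterable[str], dry_run: bool = True) -> List[List[str]]:
    """Turn power save off on each interface; only print when dry_run is set."""
    commands = [power_save_command(name, False) for name in ifnames]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)  # needs root and the iw tool
    return commands
```

Calling disable_power_save(["wlp4s0"]) prints the exact command from this message; passing dry_run=False runs it.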

That said, I know how hard wifi device driver writers are hammering at
trying to reduce multicast effects, and save power... and I haven't
exactly found the root cause of this problem, in this driver... (or in
the ath9k on the AP side) but I think I've seen it elsewhere also,
while chasing this -l failover issue.

multicast beacons are supposed to say "hey, chips, wake up, you need
to hear this".


Re: [Babel-users] failing over faster?

2016-04-25 Thread Dave Taht
On Mon, Apr 25, 2016 at 4:33 PM, Juliusz Chroboczek
 wrote:
>> A good question would be, what would the ideal time between tests be
>> for the network to stabilize? 3 minutes? At least in one series I'd
>> started tests back to back, and didn't kick in the drop link stuff at
>> the right times.
>
>   SOURCE_GC_TIME is 200
>   hold time is 3.5 * update_interval
>
> so in order to make sure that all stale data has been flushed, you should
> keep a silent time of
>
>   MAX(SOURCE_GC_TIME, 3.5 * update_interval)
>
> Note that's update_interval, not hello_interval.
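The rule quoted above is simple enough to put in a test script. A small Python sketch, assuming both values are in seconds (the quoted text gives SOURCE_GC_TIME as 200 without a unit) and using a made-up function name:

```python
SOURCE_GC_TIME = 200  # from the quoted text; assumed to be seconds


def silent_time(update_interval: float) -> float:
    """Quiet period between test runs so that stale data has been flushed.

    update_interval is the update interval, not the hello interval.
    """
    hold_time = 3.5 * update_interval
    return max(SOURCE_GC_TIME, hold_time)
```

For any update interval below roughly 57 seconds the source GC time dominates, so the quiet period is 200 seconds, a little over the 3 minutes guessed at in the question.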

OK. Basically you want two forms of experiment from me -

Overall topology, 3 routers, 2 ethernet, 1 wifi. Fail over in
different, regular, documented ways, and see what happens.

1) with and without -l
2) with and without the patch to check-interfaces
(can't I just put in a debug statement to see if the pi/etc are
sending the message?)
I note there are two places where the polling interval is set in the code.

I note that also a target in these earlier tests was the linksys
1200ac, which has some major issues at gigE. In this quick failover
test I'd gone from the c2 (100mbit), to the apu2 - gigE, and... wow...
explain THIS!

http://blog.cerowrt.org/flent/failing_over_faster/linksys_1200_majorly_headblocking.svg

I am going to switch to another box as a target, finish solidifying
the overall testbed, and then go heads down into the ath10k driver.

I hope by documenting some of the headaches I've had in getting these
boxes up and working at all, others will benefit.

http://blog.cerowrt.org/post/some_hackerboards_for_wifi/

>
> -- Juliusz
>




Re: [Babel-users] failing over faster?

2016-04-25 Thread Dave Taht
and in other news the odroid c2's current kernel, and the rpi3 and
rpi2, now all do IPV6_SUBTREES correctly.



Re: [Babel-users] failing over faster?

2016-04-25 Thread Dave Taht
Tee-hee! Don't overanalyze yet! That was not a strictly repeatable
test, as yet. (If you want access to the testbed, send me an ssh key.)

A good question would be, what would the ideal time between tests be
for the network to stabilize? 3 minutes? At least in one series I'd
started tests back to back, and didn't kick in the drop link stuff at
the right times.

babeld.conf for the odroid c2 (chosen because it's slower than the
apu), running 1.8 + patches:

# For more information about this configuration file, refer to
# babeld(8)
default enable-timestamps true
ipv6-subtrees true
redistribute local deny
interface eth0
interface usbnet0
out if eth0 metric 33 # gigE but doesn't come close
out if usbnet0 metric 34 # (only gets 200mbit at best in one direction)
# if you would suggest a better metric, let me know.

apu2: babeld.conf running 1.8 + patches. Rate limited to 400Mbit on all
ports, but does that *really well*. Can definitely do gigE... but
that speed messes up the linksys 1200, which I am replacing with
something faster.

d@apu2:~$ cat /etc/babeld.conf
# For more information about this configuration file, refer to
# babeld(8)
redistribute local deny

The linksys 1200 (babeld 1.7.1, openwrt trunk), which is on the 64 network
only, is configured to have the ad-hoc interface. There is another box
or three somewhere (pi3? pi? cerowrt?) on the adhoc that gets it back
to the other network. I will simplify!

This amd x86 box, btw, is finally looking like the ideal platform for
the fq_codel on wifi work, as well as fiddling with router stuff with
its onboard 3 port intel ethernet

http://pcengines.ch/apu2c4.htm

I have two of 'em, they are so far working out pretty great.




On Mon, Apr 25, 2016 at 1:03 PM, Juliusz Chroboczek
 wrote:
>> http://blog.cerowrt.org/post/failing_over_faster/
>
> Why does the first stream fail at time 120?  Broken firewall?
>
> There's something wrong in the second stream -- you're falling back in
> 30s, which is a tad high.  Can i please see your babeld.conf?
>
> -- Juliusz



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org


Re: [Babel-users] failing over faster?

2016-04-25 Thread Dave Taht
Thank ghu we aren't homenet! Wires are dead! :)

I will incorporate your comments later today. Until then, there's
pictures and data now up at:

http://blog.cerowrt.org/post/failing_over_faster/

I am quite puzzled as to how long it takes to fail over even in the
good cases. I guess I gotta take some packet captures.

On Mon, Apr 25, 2016 at 11:47 AM, Juliusz Chroboczek
 wrote:
>> 8+ years ago, with ahcp and babel, and a network configured to use
>> that with a single static ip address on both the ethernet and wifi, I
>> could do that. My own networks were setup that way, anyway... I did it
>> all the time. It was wonderful. I never had to think about it.
>
> Dave, the plan is to do exactly that with shncpd and babeld -- think of
> shncpd as ahcpdv2.  Please try running babeld and shncpd (-M) on the host,
> and if it doesn't work as well as ahcpd, we'll fix it.
>
>> It was massively disconcerting to attempt to move back into the
>> "regular" world where wifi and ethernet were treated as distinct,
>> where taking an interface offline lost its address,
>
> Right.  One difference between ahcpd and shncpd -M is that the former uses
> a single address, while the latter uses one address per interface.  The
> workaround is to keep the interface up, even if it is unconnected.
> Since -M is out of spec anyway, I can be convinced to change that.
>
>> where taking a new /64 was considered mandatory,
>
> That's what -M is for.
>
>> and no host changes allowed,
>
> We're not Homenet, Dave, we're independent researchers.  Just because
> Homenet rejects something doesn't mean we shouldn't do the right thing.
> My personal opinion is that having reasonable support for unchanged hosts
> is a goodness, but we shouldn't shy from designing better hosts.
>
>> I've harped on a need for atomic updates, but I still think that
>> a userspace routing daemon simply can't react fast enough to a change in
>> an ethernet routing table to prevent no-route messages being sent to one
>> or more flows on a busy link when it goes down.
>
> Higher-layer protocols should be able to survive ICMP unreachable by
> retrying after a few jiffies.  TCP certainly does, and if your protocol
> doesn't, it's a bug in the protocol.
>
>> A newer problem that I haven't thunk much about before was that babel
>> aims for a stable route, so if I have 3 routes - one stable, but
>> lousy, and both the better routes flap twice in under 60 seconds or
>> so, we end up choosing the stablest route, sometimes for a very long
>> time.
>
> Yes, over the years babeld has been tuned to prefer stable routes.  Have
> you tried playing with -M?  I'm quite open to changing its default value.
>
> -- Juliusz



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org


Re: [Babel-users] failing over faster?

2016-04-25 Thread Dave Taht
This ended up being a deeply philosophical digression into routing
behaviors that I think I'll have to blog about, with pictures, to
fully describe.

What I want is a world of ubiquitous always-on connectivity[1] - where
you can be at your desk with 20 connections nailed up, listening to an
audio stream, doing a big upload or download - then pull your box out
of the ethernet dock, go to wifi, move to another room, plug in again,
and everything survive and take advantage of the better link after a
few seconds.

8+ years ago, with ahcp and babel, and a network configured to use
that with a single static ip address on both the ethernet and wifi, I
could do that. My own networks were setup that way, anyway... I did it
all the time. It was wonderful. I never had to think about it.

It was massively disconcerting to attempt to move back into the
"regular" world where wifi and ethernet were treated as distinct,
where taking an interface offline lost its address, where taking a new
/64 was considered mandatory, and no host changes allowed, as part of
homenet. I'd switch to how things were done "in the real world" - get
up from my desk - despite having both the wifi and ethernet online at
the same time - and all my connections would drop. Agh. Sure, new
protocols like mosh-multipath, quic, etc, recover from a move, but
they don't...

that wasn't the case I was testing, I was testing multiple routes
through the middle of the network, where I'd hope for better behavior
while there is load.

So what I get currently from trying to do failover in the middle of
the network right now, using the -l option and the supplied patch, is
that usually the failover is not quite quick enough, and 1 or more
connections fails like this: (using the flent rrul test here)

Program output:
  netperf: send_omni: recv_data failed: No route to host
  netperf: send_omni: recv_data failed: No route to host
  Interim result:   33.47 10^6bits/s over 0.200 seconds ending at 1461547666.713
  Interim result:   22.99 10^6bits/s over 0.201 seconds ending at 1461547666.914
  Interim result:

I've harped on a need for atomic updates, but I still think that a
userspace routing daemon simply can't react fast enough to a change in
an ethernet routing table to prevent no-route messages being sent to
one or more flows on a busy link when it goes down.

So I got a mildly better result by installing a static backup link, like this:

172.26.64.0/24 via 172.26.64.1 dev usbnet0  proto babel onlink
172.26.64.0/24 dev usbnet0  proto kernel  scope link  src 172.26.64.231  metric 100
172.26.64.0/24 via 172.26.16.5 dev eth0  metric 200

for which the traffic survives the ifconfig usbnet0 down event better.
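
A rough model of why the static backup helps, written as a Python sketch rather than anything the kernel or babeld actually runs: among routes to the same prefix whose device is still up, the lowest metric wins, so the metric 200 route via eth0 is already in place the moment usbnet0 goes away. The `Route` class and `pick_route` helper are hypothetical names for illustration, and treating a route with no printed metric as metric 0 is an assumption about how `ip route` displays it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    dev: str
    metric: int
    via: Optional[str] = None


# The three routes from the listing above (a missing metric is assumed to be 0).
ROUTES = [
    Route("172.26.64.0/24", "usbnet0", 0, via="172.26.64.1"),   # proto babel
    Route("172.26.64.0/24", "usbnet0", 100),                    # proto kernel
    Route("172.26.64.0/24", "eth0", 200, via="172.26.16.5"),    # static backup
]


def pick_route(routes, up_devices):
    """Return the lowest-metric route whose device is up, or None."""
    usable = [r for r in routes if r.dev in up_devices]
    return min(usable, key=lambda r: r.metric, default=None)


if __name__ == "__main__":
    print(pick_route(ROUTES, {"usbnet0", "eth0"}))  # babel route on usbnet0
    print(pick_route(ROUTES, {"eth0"}))             # static backup on eth0
```

In this model nothing has to be computed or installed at failure time, which is the whole point of pre-installing the backup.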

I imagine that putting in the "3 best routes" into the kernel RIB is
not something most meshy daemons do?

A newer problem that I haven't thunk much about before was that babel
aims for a stable route, so if I have 3 routes - one stable, but
lousy, and both the better routes flap twice in under 60 seconds or
so, we end up choosing the stablest route, sometimes for a very long
time.
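
To make the stable-versus-better trade-off concrete, here is a toy Python sketch of exponential smoothing with a half-life. It is an illustration only, not babeld's route selection code; `smooth` and `smoothed_after_flaps` are made-up names, and the metric values used with them are invented for the example.

```python
def smooth(previous, sample, elapsed, half_life):
    """Blend a new metric sample into the running value.

    After `half_life` seconds, half of the old value has decayed away.
    A half_life of 0 disables smoothing and follows the sample directly.
    """
    if half_life <= 0:
        return sample
    weight = 0.5 ** (elapsed / half_life)
    return weight * previous + (1 - weight) * sample


def smoothed_after_flaps(samples, interval, half_life):
    """Feed evenly spaced samples through smooth() and return the result."""
    value = samples[0]
    for sample in samples[1:]:
        value = smooth(value, sample, interval, half_life)
    return value
```

With a long half-life, a route that briefly reports a very bad metric keeps looking bad well after it has recovered, so a lousy but steady route can stay selected; a shorter half-life forgets the flap sooner.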

I still see many seconds before stuff recovers in some instances.

[1] http://frankston.com/public/?n=IAC.UAC



Re: [Babel-users] failing over faster?

2016-04-24 Thread Dave Taht
groovy. I will try it as soon as I can

This showed some potential for doing it faster than that:

http://stackoverflow.com/questions/7225888/how-can-i-monitor-the-nic-statusup-down-in-a-c-program-without-polling-the-ker
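
For reference, the non-polling approach amounts to subscribing to the kernel's rtnetlink link notifications. Below is a minimal Python sketch of that idea (Linux only), not a patch for babeld: `parse_link_events` and `watch_links` are hypothetical names, and the constants are the usual values from the Linux headers, written out by hand here.

```python
import socket
import struct

RTMGRP_LINK = 1        # multicast group carrying link state changes
RTM_NEWLINK = 16       # message type: link created or changed
IFF_UP = 0x1
IFF_RUNNING = 0x40

NLMSGHDR = struct.Struct("=IHHII")    # len, type, flags, seq, pid
IFINFOMSG = struct.Struct("=BBHiII")  # family, pad, type, index, flags, change


def parse_link_events(data):
    """Yield (ifindex, is_up, is_running) for each RTM_NEWLINK in a datagram."""
    offset = 0
    while offset + NLMSGHDR.size <= len(data):
        length, msg_type, _flags, _seq, _pid = NLMSGHDR.unpack_from(data, offset)
        if length < NLMSGHDR.size or offset + length > len(data):
            break  # malformed or truncated message, stop parsing
        if msg_type == RTM_NEWLINK and length >= NLMSGHDR.size + IFINFOMSG.size:
            _fam, _pad, _type, index, flags, _change = IFINFOMSG.unpack_from(
                data, offset + NLMSGHDR.size)
            yield index, bool(flags & IFF_UP), bool(flags & IFF_RUNNING)
        offset += (length + 3) & ~3  # netlink messages are 4-byte aligned


def watch_links():
    """Block on the kernel's link notifications and print each change."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    sock.bind((0, RTMGRP_LINK))
    while True:
        for index, up, running in parse_link_events(sock.recv(65535)):
            print(f"ifindex {index}: up={up} running={running}")
```

Running `watch_links()` in one terminal and doing `ifconfig eth0 down` in another should show the change as soon as the kernel reports it, with no timer involved.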

On Sun, Apr 24, 2016 at 1:21 PM, Juliusz Chroboczek
 wrote:
>> # and we fail over in 32 seconds
>
> What happens if you apply the following patch?
>
> diff --git a/babeld.c b/babeld.c
> index 3127e72..0183b32 100644
> --- a/babeld.c
> +++ b/babeld.c
> @@ -744,7 +744,7 @@ main(int argc, char **argv)
>
>  if(timeval_compare(&check_interfaces_timeout, &now) < 0) {
>  check_interfaces();
> -schedule_interfaces_check(3, 1);
> +schedule_interfaces_check(1000, 1);
>  }
>
>  if(now.tv_sec >= expiry_time) {



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org


[Babel-users] failing over faster?

2016-04-20 Thread Dave Taht
I am fiddling with a raspberry pi3 with a usb ethernet (making it a
100mbit router), the onboard wifi, and 2 usb wifi sticks...

with all the interfaces up I do a ping over ethernet


64 bytes from 172.26.64.231: icmp_seq=56 ttl=63 time=1.45 ms
64 bytes from 172.26.64.231: icmp_seq=57 ttl=63 time=1.18 ms
64 bytes from 172.26.64.231: icmp_seq=58 ttl=63 time=2.89 ms
64 bytes from 172.26.64.231: icmp_seq=59 ttl=63 time=1.20 ms
64 bytes from 172.26.64.231: icmp_seq=60 ttl=63 time=1.30 ms
64 bytes from 172.26.64.231: icmp_seq=61 ttl=63 time=1.42 ms

I do an

ifconfig eth0 down # at this point, after a bit we get:

ping: sendmsg: No route to host
ping: sendmsg: No route to host
ping: sendmsg: No route to host
ping: sendmsg: No route to host
ping: sendmsg: No route to host
ping: sendmsg: No route to host
ping: sendmsg: No route to host
ping: sendmsg: No route to host
ping: sendmsg: No route to host
ping: sendmsg: No route to host
ping: sendmsg: No route to host
ping: sendmsg: No route to host
ping: sendmsg: No route to host
ping: sendmsg: No route to host
ping: sendmsg: No route to host
ping: sendmsg: No route to host
ping: sendmsg: No route to host
ping: sendmsg: No route to host
ping: sendmsg: No route to host
ping: sendmsg: No route to host

# and we fail over in 32 seconds

64 bytes from 172.26.64.231: icmp_seq=94 ttl=62 time=41.5 ms
64 bytes from 172.26.64.231: icmp_seq=95 ttl=62 time=2.10 ms
64 bytes from 172.26.64.231: icmp_seq=96 ttl=62 time=10.5 ms
64 bytes from 172.26.64.231: icmp_seq=97 ttl=62 time=7.27 ms
64 bytes from 172.26.64.231: icmp_seq=98 ttl=62 time=9.58 ms
64 bytes from 172.26.64.231: icmp_seq=99 ttl=62 time=15.3 ms
64 bytes from 172.26.64.231: icmp_seq=100 ttl=62 time=66.1 ms
64 bytes from 172.26.64.231: icmp_seq=101 ttl=63 time=7.73 ms

but I was under the impression we'd fail over faster with -l on and
we'd not get a "no route to host"

(there are two hops on the mesh in the way...)
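
The 32 second figure can be read straight off the icmp_seq numbers in the ping output. A small Python sketch of that arithmetic, assuming ping's default of one probe per second; `longest_outage` is a hypothetical helper, not part of any tool mentioned here.

```python
import re


def longest_outage(ping_lines, interval=1.0):
    """Return the longest run of missing replies, in seconds.

    Only lines carrying an icmp_seq are counted; error lines such as
    "No route to host" are skipped, so the gap shows up as a jump in
    sequence numbers between two successful replies.
    """
    seqs = []
    for line in ping_lines:
        match = re.search(r"icmp_seq=(\d+)", line)
        if match:
            seqs.append(int(match.group(1)))
    worst = 0
    for prev, cur in zip(seqs, seqs[1:]):
        worst = max(worst, cur - prev - 1)
    return worst * interval
```

Here the last reply before the outage is icmp_seq=61 and the first one after it is icmp_seq=94, so 32 probes went unanswered.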

babeld.conf

default enable-timestamps true
ipv6-subtrees true
# eth1 is attached to a bridged wifi/wired network
interface eth0 wired true link-quality false
interface eth1 wired true link-quality true
# All these adhoc interfaces suck compared to others on the network
# and right now, all on 6
diversity 3
interface wlan1 channel 6
interface wlan0 channel 6
interface wlan2 channel 6
out if wlan1 metric 512
out if wlan0 metric 512
out if wlan2 metric 512
#I wanted to get hncp mesh addresses only (so as to be able to do ss
#routing
#redistribute local ::/128 eq 128 allow
#redistribute local ::/64 gt 128 deny
redistribute local deny
# but ended up going with this for now
redistribute proto 43 allow



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org


Re: [Babel-users] New paper on Babel, BMX6 and OLSR: Evaluation of mesh routing protocols for wireless community networks

2016-04-19 Thread Dave Taht
I finally got around to reading these papers. I liked the compression
techniques used by bmx6...

I can't help but want to also increase babel's update rate from 2sec
to .5sec as in bmx6 for comparison.

On Tue, Apr 12, 2016 at 12:46 PM, Baptiste Jonglez
 wrote:
> Hi all,
>
> There is a new paper comparing the relative performance of babeld, olsrd,
> and bmx6:
>
> Neumann, Axel, Ester López, and Leandro Navarro. "Evaluation of mesh
> routing protocols for wireless community networks." Computer Networks
> 93 (2015): 308-323.
>
> Compared to the 2013 paper, it's much more thorough, and used a testbed
> while the 2013 paper was only based on emulation (this should make
> Matthieu happy!).
>
> The conclusion is the following:
>
> Babel is the most lightweight protocol with the least memory, CPU, and
> control-traffic requirements as long as it is used in networks with
> stable links and low node densities.
>
> However, if the protocol is used in large or dense wireless
> deployments with frequent link changes due to dynamic interference or
> nodes leaving or joining the network, then its reactive mechanisms to
> encounter topology changes by sending additional routing updates and
> route request messages turn into massive control-traffic and
> processing overhead. In such scenarios, OLSR and BMX6, with their
> strictly constant rate for sending topology and routing update
> messages, outperform Babel in terms of overhead, stability, and even
> self-healing capabilities.
>
>
> The paper is available here:
>
> http://www.sciencedirect.com/science/article/pii/S1389128615002522 
> (paywalled)
> http://www2.ic.uff.br/~celio/classes/cmovel/slides/community-mesh-2015.pdf
> http://people.ac.upc.edu/leandro/pubs/eomrpfwcn.pdf (open-access preprint)
> http://dsg.ac.upc.edu/eval-mesh-routing-wcn (more information, data, 
> scripts)
>
>
> Baptiste
>



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org


Re: [Babel-users] babeld crashes

2016-04-16 Thread Dave Taht
I eliminated the -l option from all my boxes and thus far I have not
seen it crash.

But I am not trying too hard to make things on my network come and go
right now, I'm busy on other things: http://blog.cerowrt.org

I will put a couple boxes under valgrind the next time I re-org the
network, sometime later this week.

On Sat, Apr 16, 2016 at 9:04 PM, Juliusz Chroboczek
 wrote:
> Dave, could you please try to reproduce this under valgrind?



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org


Re: [Babel-users] babeld crashes

2016-04-15 Thread Dave Taht
And I got it to happen on the pi3.

(gdb) bt
#0  0x76e09f70 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x76e0b324 in __GI_abort () at abort.c:89
#2  0x76e45954 in __libc_message (do_abort=<optimized out>,
    fmt=0x76efb830 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/posix/libc_fatal.c:175
#3  0x76e4bb80 in malloc_printerr (action=1,
    str=0x76efba6c "double free or corruption (fasttop)", ptr=<optimized out>)
    at malloc.c:4996
#4  0x76e4cb24 in _int_free (av=<optimized out>, p=<optimized out>,
    have_lock=100916) at malloc.c:3840
#5  0x0001a35c in update_route (id=<optimized out>, prefix=<optimized out>,
    plen=<optimized out>, src_prefix=<optimized out>, src_plen=0 '\000',
    seqno=17136, refmetric=96, interval=1600, neigh=0xc5d9d8,
    nexthop=0xc5d9e0 "\376\200", channels=0x7ede598c "", channels_len=0)
    at route.c:920
#6  0x0001f10c in parse_packet (from=0x0, from@entry=0x7ede5a30 "\n", ifp=0x0,
    packet=0x1 ,
    packetlen=<optimized out>) at message.c:644
#7  0x000126d8 in main (argc=<optimized out>, argv=<optimized out>)
    at babeld.c:675




On Fri, Apr 15, 2016 at 6:39 PM, Dave Taht <dave.t...@gmail.com> wrote:
> I have been experiencing babeld crashes since starting to use git head
> a few weeks ago.
>
> Today after putting in git head everywhere I have been getting quite a
> few crashes (no babel process running, bunch of babel routes left
> behind) - I was not paying much attention to it ( these are a bunch of
> new machines that I was doing other things to and I had assumed it was
> systemd messing up on a restart (I am new to systemd), so I would see
> a creat(/var/run/babeld.pid): File exists...
>
> but nope, I'm segfaulting at some point.
>
> I did just manage to see a crash go by and get a core dump. I will
> reboot and retry, then go back a few versions. It took about 5 minutes
> of operation on an active network before this happened, this time
>
> 0  malloc_consolidate (av=av@entry=0x7f47ad14fc00 )
> at malloc.c:4136
> 4136malloc.c: No such file or directory.
> (gdb) up
> #1  0x7f47ace0c9d4 in _int_malloc (
> av=av@entry=0x7f47ad14fc00 , bytes=bytes@entry=3916)
> at malloc.c:3417
> 3417in malloc.c
> (gdb) up
> #2  0x7f47ace0f4ae in __GI___libc_malloc (bytes=bytes@entry=3916)
> at malloc.c:2895
> 2895in malloc.c
> (gdb) up
> #3  0x0040c4f7 in buffer_update (ifp=ifp@entry=0x1d365e0,
> prefix=prefix@entry=0x1d37dc0 "\375\020", plen=plen@entry=128 '\200',
> src_prefix=src_prefix@entry=0x1d37dd1 "", src_plen=src_plen@entry=0 
> '\000')
> at message.c:1443
> 1443ifp->buffered_updates = malloc(n * sizeof(struct
> buffered_update));
> (gdb) up
> #4  0x0040c85a in send_update (ifp=ifp@entry=0x1d365e0,
> urgent=urgent@entry=0,
> prefix=0x1d37dc0 "\375\020", plen=<optimized out>,
> src_prefix=0x1d37dd1 "", src_plen=0 '\000')
> at message.c:1497
> 1497buffer_update(ifp, prefix, plen, src_prefix, src_plen);
>
> (gdb) up
> #5  0x0040c6ed in send_self_update (ifp=0x1d365e0) at message.c:1595
> 1595send_update(ifp, 0, xroute->prefix, xroute->plen,
> (gdb) up
> #6  0x0040c86f in send_update (ifp=0x1d365e0, urgent=0,
> prefix=prefix@entry=0x0,
> plen=plen@entry=0 '\000', src_prefix=0x414460  "",
> src_plen=src_plen@entry=0 '\000')
> at message.c:1500
> 1500send_self_update(ifp);
>
> #7  0x0040c93f in send_update (ifp=ifp@entry=0x1d365e0,
> urgent=urgent@entry=0,
> prefix=prefix@entry=0x0, plen=plen@entry=0 '\000',
> src_prefix=src_prefix@entry=0x0,
> src_plen=src_plen@entry=0 '\000') at message.c:1524
> 1524send_update(ifp, urgent, NULL, 0, zeroes, 0);
>
> #8  0x00402f80 in main (argc=<optimized out>, argv=<optimized out>) at babeld.c:767
> 767send_update(ifp, 0, NULL, 0, NULL, 0);
>
> *My babeld.conf is this:
>
> default enable-timestamps true
> redistribute local deny
>
> *babeld command line:
>
> babeld -l -G 33123 -S /var/lib/babeld/state eno1 wlp2s0 wlx9cefd5ff0b2c
>
> the network has got sort of complex in recent days.
>
> --
> Dave Täht
> Let's go make home routers and wifi faster! With better software!
> http://blog.cerowrt.org



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org


Re: [Babel-users] IPV6_SUBTREES vs policy routing

2016-04-15 Thread Dave Taht
I can confirm this patch does the right thing on x86_64, and that the
current kernels on the pi3 and pi2 now also support IPV6_SUBTREES
correctly.

The wifi on the pi3 *sucks rocks* in adhoc mode from a bufferbloat
perspective, even worse than the ath9k. It does support a usb ethernet
and a few other usb wifi chips, I'm going to try those - but as a
little quick and dirty test router the only problem I've had with
loading it up with 3 usb sticks and ethernet was in the power supply.
I currently have it successfully using an ethernet dongle, one usb
flash stick and one external wifi adaptor.

I am told IPV6_SUBTREES is now in the odroid c2 kernel tree, but they
have not put out a binary yet.

I am curious as to what the behavior should be for source specific ipv4?


On Fri, Apr 15, 2016 at 9:45 AM, Matthieu Boutier
 wrote:
>> Do you think this is serious enough to justify releasing 1.7.2?
>
> No, it only adds additional v6 rules: associated tables remains empty.
>
>> Matthieu, could you please tell me when the bug was introduced, so I can
>> put it in the changelog?
>
> ... so, it's quite unclear.  The symptoms appear between
>
> c18e3b0a389fdbf5cc243b094ebb42e59139f035
>
> and
>
> 58dbd2f425a7dcdb58e5b6923bd7df309e233d79
>
> most probably at 72a6264355c0d0e98c48ca45c609c6d0288ec05c
>
> but the bug itself is probably introduced with the function (or its usage).
>
> Matthieu
>



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org


Re: [Babel-users] IETF news

2016-04-09 Thread Dave Taht
On Sat, Apr 9, 2016 at 11:07 AM, Juliusz Chroboczek
 wrote:
>> attending remotely was not quite satisfactory.
>
> We'll need to think about how to make the babel WG as friendly to remote
> participation as possible.  Obviously, having a competent Jabber scribe is
> important, one that ensures that Jabber questions get asked at the mike in
> a timely manner.  Anything else?

A video/audio recording option would be simplest in meetecho. Ghu
knows how many legal hurdles that would have. I like that many other
conferences do make recordings available.

>> How was Bit's and Bytes?
>
> We left after a glass of wine or three, we were dining with the VIPs.
>
>> Were the omnia folk there?

https://omnia.turris.cz/en/ had made a big splash at the previous ietf.

They are also the people behind bird.

>
> Who?
>
> -- Juliusz



Re: [Babel-users] IETF news

2016-04-09 Thread Dave Taht
On Sat, Apr 9, 2016 at 10:17 AM, Juliusz Chroboczek
 wrote:
>> IETF is boring without Markus.  And without Henning.  And without Steven.
>
> And without Dave.  (But I'm in touch with Dave more often than the other
> three.)

Tee-hee. I have to admit that I longed to be there and attending
remotely was not quite satisfactory.

How was Bit's and Bytes? Were the omnia folk there?

I would like to help put together some form of cross-platform homenet
demo for that part of the berlin IETF (mid-july). At the moment I have
regular openwrt, raspbian, and x86 linux boxes all interoperating
(though I have yet to adopt hnetd for anything), it would be nice to
have yocto, dd-wrt, and perhaps a major vendor (eero? omnia? ubnt?
intel? google? Cisco? anyone?) also.

It might also be good to show a more regular wifi/AP mode interaction
with mobility and multiple channels going.

>



[Babel-users] breaking the ath9k in adhoc mode with the new fq_codel implementation

2016-03-28 Thread Dave Taht
So, thank you for exposing a bug in my code today. The new ath9k
fq_codel code at the 802.11 mac layer bypasses the qdisc...


root@dancer:~/Pictures# tc -s qdisc show dev wlp2s0
qdisc noqueue 0: root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

and somewhere IFF_RUNNING is no longer being set when iwconfig is in
adhoc mode... so I ended up with the infinite txcost

root@dancer:~/Pictures# ifconfig wlp2s0
wlp2s0    Link encap:Ethernet  HWaddr 00:21:63:2f:f2:f4
  inet addr:172.26.17.246  Bcast:255.255.255.255  Mask:255.255.255.255
  UP BROADCAST MULTICAST  MTU:1500  Metric:1
  RX packets:41177 errors:0 dropped:0 overruns:0 frame:0
  TX packets:27218 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:13436569 (13.4 MB)  TX bytes:11838910 (11.8 MB)

root@dancer:~/Pictures# iwconfig wlp2s0
wlp2s0    IEEE 802.11abgn  ESSID:"babel"
  Mode:Ad-Hoc  Frequency:2.437 GHz  Cell: 32:DD:4C:C7:F8:87
  Tx-Power=16 dBm
  Retry short limit:7   RTS thr:off   Fragment thr:off
  Encryption key:off
  Power Management:off

and after I unwedge the darn thing out of adhoc and back into sta...
life is much better.

UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

http://www.taht.net/~d/unwedged.png
http://www.taht.net/~d/unwedged_eth0down.png
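
The difference between the wedged and unwedged interface is just the RUNNING word on the flags line. A throwaway Python sketch for checking that across a pile of boxes, assuming the old net-tools `ifconfig` output format shown above; `link_flags` is a hypothetical helper.

```python
def link_flags(ifconfig_output):
    """Return the set of flag words printed before 'MTU:' by old ifconfig."""
    for line in ifconfig_output.splitlines():
        if "MTU:" in line:
            return set(line.split("MTU:")[0].split())
    return set()


# Flag lines copied from the two ifconfig outputs in this message.
WEDGED = "UP BROADCAST MULTICAST  MTU:1500  Metric:1"
UNWEDGED = "UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1"

if __name__ == "__main__":
    print("RUNNING" in link_flags(WEDGED))    # adhoc mode, wedged
    print("RUNNING" in link_flags(UNWEDGED))  # back in sta mode
```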

Checking around, I am seeing no "running" field on the rpi2 which
might explain why its wifi doesn't work with babel - that's running
the base OS and a "panda" usb stick in adhoc.

I AM seeing the running field on the rpi3, but that has kernel issues...

I can try to sum up everything that broke on every machine, but my
head hurts, and I'm going back to what I was doing in the first place.


Dave Täht
Let's go make home routers and wifi faster! With better software!
https://www.gofundme.com/savewifi


Re: [Babel-users] Control socket language [was Babel on itty bitty boxes]

2016-03-28 Thread Dave Taht
On Mon, Mar 28, 2016 at 9:27 AM, Gabriel Kerneis  wrote:
>
>
> On Mon, Mar 28, 2016, at 17:53, Juliusz Chroboczek wrote:
>> > PS: I am loving the new "dump" functionality. Tons easier to read than
>> > a logfile. echo 'dump' | nc ::1 33123.
>>
>> Thanks for the kind words.
>>
>> Folks -- I've had little feedback on the new control socket command
>> language.  If you wish to complain, please try to do it before the third
>> week of April, which is when I plan to release 1.8.0.
>
> What about dumping the configuration options too?

+1

> That would solve the
> question of "was subtrees enabled during this test?" and the like.

heh. yep. I wouldn't mind seeing filters also.

is babelweb using "dump" yet? Seems saner to do a poll/response
on a busy network...

(I'll go pull it. if you can't tell this is the first time I've mucked
with babel in a long while)

>
> --
> Gabriel
>



Re: [Babel-users] Babel on itty bitty boxes

2016-03-28 Thread Dave Taht
On Mon, Mar 28, 2016 at 8:35 AM, Matthieu Boutier
 wrote:
>> Why don't you write to the netdev list to ask what's a reliable way to
>> detect IPv6_SUBTREES?
>
> Yes, I'm asking myself if the Dave's "invalid argument" are for 
> source-specific routes.  In which case it answers the question.  Will test it.

Yea, a failure to insert these would be a good test for falling back
to non-subtrees.

...

I am uploading to: http://www.taht.net/~d/

babeld.dump-ipv6-subtrees-false babeld_2_wlans.log
babled-ipv6-subtrees-false.log

I see the invalid argument go by in the subtrees-false case.

It was a real joy to be able to just compile stuff directly on these
itty bitty boxes, which saved much time vs openwrt. too much time.

I do crazy things so others don't have to.

...

another possible bug is that I would have assumed the bridged ap box
would have detected the bridge and supplied a different metric or
cost.
I think it is (256) according to the logs, so that is right...

The topology there is:

pi3 - AP br-lan  sw10 (fe80::120d:7fff:fe64:c992)
 (cerowrt connecting as a sta rather than adhoc).

Admittedly that connection measures at 80mbits and is probably a great
deal less flaky than the pi is.

If you are bored and want access to these boxes let me know.



Re: [Babel-users] Babel on itty bitty boxes

2016-03-28 Thread Dave Taht
On Mon, Mar 28, 2016 at 8:26 AM, Matthieu Boutier
 wrote:
>> I put a babel debug 3 log up at:
>>
>> http://www.taht.net/~d/babeld_pi3.log
>
> Strange, there is no more "kernel_route(ADD): invalid argument" lines.  Is it 
> really the same node, with the same options ?

yes. Perhaps, however, I had earlier had ipv6-subtrees true.

And although I seem to have connectivity over the wifi link now (I did
not on that test), it does not choose it.

>
> So you should have all the right routes in the FIB, no ?

Meh. The source of all my issues was that I volunteered to try out 1.8
on a whole bunch of machines at once that I'd never tried it on. :)
Compiling 1.8 on all of them was easy... tracking down each individual
bug was not.

I kind of expected the local wlan connection to the other side of
the pi's link to also go direct inside of 2 minutes. It doesn't, and damned
if I know why, besides the kernel errors... For example, right now, what
connects the two networks is a babel running on a bridged wifi AP/sta
(not adhoc)/ethernet box, which remains in use for all routes on the pi,
despite it too having 2 direct links.

I know, one step at a time, but it was just a quick test... I thought
delusionally that everything on every platform would "just work" by
now.

>
> Matthieu
>
> PS: sorry for your flash !! ;-)
>

PS: I am loving the new "dump" functionality. Tons easier to read than
a logfile. echo 'dump' | nc ::1 33123.

http://www.taht.net/~d/ has a dump and a much bigger log, with eth0,
wlan1,wlan2 enabled.
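
For anyone who would rather script it than use nc, here is a minimal Python sketch of the same exchange. It assumes babeld was started with `-G 33123` and is listening on ::1, as in the one-liner above; `babel_dump` is a hypothetical name, and the sketch deliberately assumes nothing about the format of the reply beyond it being lines of text.

```python
import socket


def babel_dump(host="::1", port=33123, timeout=2.0):
    """Send 'dump' to a babeld control socket and return the reply lines.

    Reads until the peer closes the connection or goes quiet for `timeout`
    seconds; no particular reply format is assumed.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"dump\n")
        try:
            while True:
                chunk = sock.recv(4096)
                if not chunk:
                    break
                chunks.append(chunk)
        except socket.timeout:
            pass  # daemon kept the connection open; use what we have
    return b"".join(chunks).decode(errors="replace").splitlines()


if __name__ == "__main__":
    for line in babel_dump():
        print(line)
```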



Re: [Babel-users] Babel on itty bitty boxes

2016-03-26 Thread Dave Taht
On Fri, Mar 25, 2016 at 2:41 AM, Matthieu Boutier
 wrote:
>> I checked and IPV6_SUBTREES is disabled in the raspbian kernel
>> build. I filed a bug.
>
> Good to know.  What if you specify explicitly to babeld not to use subtrees:
>
> ipv6-subtrees false

Congrats! That fixed the 3.14 kernel based odroid c2. ip -6 rule show
showed the source specific stuff.

It did not fix the 3.10 kernel based c1.

So autodetection is broken on the c2 & c1 kernels, but works on the pi
kernel, and the ipv6-subtrees option can be set manually on the c2 to
work.
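
Independently of babeld's autodetection, it is easy to check whether a given kernel was even built with the option. A Python sketch, assuming the kernel exposes its build config at /proc/config.gz or /boot/config-<release> (many don't); the function names are hypothetical. This only reports the build option and says nothing about whether autodetection will get it right.

```python
import gzip
import os


def subtrees_in_config(config_text):
    """True if the kernel build config enables CONFIG_IPV6_SUBTREES."""
    return any(line.strip() == "CONFIG_IPV6_SUBTREES=y"
               for line in config_text.splitlines())


def read_kernel_config():
    """Return the running kernel's build config text, or None if unavailable."""
    if os.path.exists("/proc/config.gz"):
        with gzip.open("/proc/config.gz", "rt") as f:
            return f.read()
    path = "/boot/config-" + os.uname().release
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    return None


if __name__ == "__main__":
    text = read_kernel_config()
    if text is None:
        print("kernel config not available on this box")
    else:
        print("IPV6_SUBTREES built in:", subtrees_in_config(text))
```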

I am not seeing on the c2 the kernel errors that I am seeing on the rpi,
see below. I will try hooking up a usb wifi stick to the c2; I will try to
find one that works.

>
>> It is straightforward to build a new kernel whenever get around to it.
>
> It's written "optional", yeah?  So it should not be that much important.
> Remove it!

Your cynicism is shared and appreciated.

>>>> routerB does not get the "default from" routes.
>>>
>>> Aren't they even sent on the link ? (tcpdump from rpi)
>>
>> rpi3, c2, c1, x86 box are on the same switch.
>
> Ah ok, so what's in the babel RIB on rpi?
> (killall -USR1 babeld; cat /var/log/babeld.log)

Attached. I note that while the pi3 had had ip -6 rules showing, this
log dump is after a fresh restart - and even after waiting a few
minutes I did not get those ip -6 rules inserted. There are a bunch of
kernel errors from dmesg along the lines of the second bug I filed
with the raspberry folk, which are probably related.

I filed a bug on these:

[258034.946162] : hw csum failure
[258034.946190] CPU: 0 PID: 21128 Comm: babeld Tainted: GW
  4.1.19-v7+ #853
[258034.946200] Hardware name: BCM2709
[258034.946238] [<800185e0>] (unwind_backtrace) from [<80013f48>]
(show_stack+0x20/0x24)
[258034.946263] [<80013f48>] (show_stack) from [<80572fac>]
(dump_stack+0xd4/0x118)
[258034.946289] [<80572fac>] (dump_stack) from [<80497444>]
(netdev_rx_csum_fault+0x44/0x48)
[258034.946312] [<80497444>] (netdev_rx_csum_fault) from [<8048c6d0>]
(skb_copy_and_csum_datagram_msg+0xdc/0xe8)
[258034.946394] [<8048c6d0>] (skb_copy_and_csum_datagram_msg) from
[<7f01f048>] (udpv6_recvmsg+0x11c/0x7cc [ipv6])
[258034.946463] [<7f01f048>] (udpv6_recvmsg [ipv6]) from [<80509d18>]
(inet_recvmsg+0xa4/0xb8)
[258034.946489] [<80509d18>] (inet_recvmsg) from [<8047c3dc>]
(sock_recvmsg+0x20/0x24)
[258034.946510] [<8047c3dc>] (sock_recvmsg) from [<8047e274>]
(___sys_recvmsg+0xa4/0x12c)
[258034.946528] [<8047e274>] (___sys_recvmsg) from [<8047f0f8>]
(__sys_recvmsg+0x4c/0x7c)
[258034.946547] [<8047f0f8>] (__sys_recvmsg) from [<8047f140>]
(SyS_recvmsg+0x18/0x1c)
[258034.946566] [<8047f140>] (SyS_recvmsg) from [<8000fa20>]
(ret_fast_syscall+0x0/0x54)
[258035.497506] eth0: hw csum failure
[258035.497552] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GW
 4.1.19-v7+ #853
[258035.497563] Hardware name: BCM2709
[258035.497604] [<800185e0>] (unwind_backtrace) from [<80013f48>]
(show_stack+0x20/0x24)
[258035.497638] [<80013f48>] (show_stack) from [<80572fac>]
(dump_stack+0xd4/0x118)
[258035.497664] [<80572fac>] (dump_stack) from [<80497444>]
(netdev_rx_csum_fault+0x44/0x48)
[258035.497688] [<80497444>] (netdev_rx_csum_fault) from [<8048c310>]
(__skb_checksum_complete+0xb4/0xb8)
[258035.497709] [<8048c310>] (__skb_checksum_complete) from
[<8053abe4>] (udp6_csum_init+0x1cc/0x218)
[258035.497791] [<8053abe4>] (udp6_csum_init) from [<7f0216c0>]
(__udp6_lib_rcv+0x274/0x4b8 [ipv6])
[258035.497896] [<7f0216c0>] (__udp6_lib_rcv [ipv6]) from [<7f01e238>]
(udpv6_rcv+0x1c/0x20 [ipv6])
[258035.497982] [<7f01e238>] (udpv6_rcv [ipv6]) from [<7f0069d8>]
(ip6_input_finish+0x188/0x5d8 [ipv6])
[258035.498053] [<7f0069d8>] (ip6_input_finish [ipv6]) from
[<7f007514>] (ip6_input+0x30/0x84 [ipv6])
[258035.498128] [<7f007514>] (ip6_input [ipv6]) from [<7f007694>]
(ip6_mc_input+0x12c/0x274 [ipv6])
[258035.498198] [<7f007694>] (ip6_mc_input [ipv6]) from [<7f0067d0>]
(ip6_rcv_finish+0x44/0xc4 [ipv6])
[258035.498267] [<7f0067d0>] (ip6_rcv_finish [ipv6]) from [<7f0072b4>]
(ipv6_rcv+0x48c/0x6bc [ipv6])
[258035.498316] [<7f0072b4>] (ipv6_rcv [ipv6]) from [<80495294>]
(__netif_receive_skb_core+0x694/0xa40)
[258035.498340] [<80495294>] (__netif_receive_skb_core) from
[<80497504>] (__netif_receive_skb+0x20/0x7c)
[258035.498367] [<80497504>] (__netif_receive_skb) from [<8049758c>]
(netif_receive_skb_internal+0x2c/0xa4)
[258035.498390] [<8049758c>] (netif_receive_skb_internal) from
[<80497628>] (netif_receive_skb_sk+0x24/0x9c)
[258035.498414] [<80497628>] (netif_receive_skb_sk) from [<7f35c2f8>]
(ri_tasklet+0xec/0x28c [ifb])
[258035.498439] [<7f35c2f8>] (ri_tasklet [ifb]) from [<8002b948>]
(tasklet_action+0x74/0x10c)
[258035.498459] [<8002b948>] (tasklet_action) from [<8002ade4>]
(__do_softirq+0x1a0/0x3e0)
[258035.498478] [<8002ade4>] (__do_softirq) from [<8002b3c4>]
(irq_exit+0xdc/0x140)
[258035.498499] [<8002b3c4>] (irq_exit) from [<8006e8d4>]
(__handle_domain_irq+0x98/0xec)

Re: [Babel-users] Babel on itty bitty boxes

2016-03-25 Thread Dave Taht
So I set up ipv6-subtrees false for the pi3 also, did get rules, still
got kernel errors.

Turned off the internal wifi (so it's no longer trying to be a
router), but it is still getting kernel errors from the traffic on the
ethernet interface, so I suspect that ipv6 support on the pi has,
until now, been somewhat undertested. There are at least 3 paths in
the ipv6 portion of the stack throwing errors.

___
Babel-users mailing list
Babel-users@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/babel-users


Re: [Babel-users] Babel on itty bitty boxes

2016-03-24 Thread Dave Taht
On Thu, Mar 24, 2016 at 4:30 PM, Juliusz Chroboczek
 wrote:
>> I wish I lived in a world where more of the weird things "just worked"
>
> You'd be bored.

Ha. At least compiling stuff on these boxes is slow enough to get in a
song or two on the piano.

Filed a bug with the odroid folk as well.

https://github.com/hardkernel/linux/issues/177

Having gone 0 for 3 on these boxes (and I also had picked up a few
more that I haven't bothered to plug in), I guess it was good to
escape the openwrt world for a while.

*Sweet* little box that the c2 is, speedwise, the ancient kernel meant
every usb wifi stick I had failed in it (in addition to no video or
sound). It CAN drive gigE, though (unlike the pi3).

Ah, well, back to fixing wifi on the one chip I sort of understand...



Re: [Babel-users] Babel on itty bitty boxes

2016-03-24 Thread Dave Taht
and I filed this bug too.

https://github.com/raspberrypi/linux/issues/1371

I wish I lived in a world where more of the weird things "just worked"



[Babel-users] Babel on itty bitty boxes

2016-03-24 Thread Dave Taht
On Thu, Mar 24, 2016 at 1:16 AM, Matthieu Boutier
 wrote:
>> my gateway (routerA) is a source specific babel, without a version
>> number but from an openwrt build about 2 months back. RouterB is a
>> cerowrt box.
>
> On routerA:
>
> 1. How did you redistribute source-specific routes

std openwrt.

> 2. Did you see them exported in babeld ?

x86 box I see:

default from 2001:556:6045:bd:1ccf:b9bb:e8da:2c77 via
fe80::16cc:20ff:fee5:64c2 dev eno1  proto babel  metric 1024  pref
medium
default from 2601:645:4103:56c0::/60 via fe80::16cc:20ff:fee5:64c2 dev
eno1  proto babel  metric 1024  pref medium

so, yep.

>
>> But: On a brand new raspberry pi3 (kernel 4.1.19-v7), same wire, I
>> guess it is either not compiled with IPV6_SUBTREES or the
>> autodetection is broken because I don't get "default from" routes, but
>> the routes inserted in the "ip -6 rules" table.
>
> Look at kernel_older_than in kernel.c, and test this function on your system.
> (You may also want to see kernel_has_ipv6_subtrees in kernel_netlink.c)

I checked and IPV6_SUBTREES is disabled in the raspbian kernel
build. I filed a bug.

https://github.com/raspberrypi/linux/issues/1370
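
For anyone wanting to repeat the IPV6_SUBTREES check on their own box, here is a minimal sketch. It assumes the kernel exposes its build config at /proc/config.gz or /boot/config-<release>, which many distro kernels do but not all. The function names are made up for illustration and this is not how babeld's own autodetection works.

```python
import gzip
import os
import platform


def config_option_enabled(config_text, option):
    """Return True if `option` is set to y or m in kernel config text."""
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as "# CONFIG_FOO is not set" comments.
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name == option:
            return value in ("y", "m")
    return False


def read_running_kernel_config():
    """Best-effort read of the running kernel's config; None if unavailable."""
    candidates = ["/proc/config.gz", "/boot/config-" + platform.release()]
    for path in candidates:
        if not os.path.exists(path):
            continue
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as f:
            return f.read()
    return None


if __name__ == "__main__":
    text = read_running_kernel_config()
    if text is None:
        print("kernel config not exposed on this system")
    else:
        enabled = config_option_enabled(text, "CONFIG_IPV6_SUBTREES")
        print("CONFIG_IPV6_SUBTREES enabled:", enabled)
```

If neither file exists, the only reliable answer is the vendor's kernel source tree or defconfig.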

It is straightforward to build a new kernel whenever I get around to it.

(there are more than a few 4.1 kernel features I'd like to see in
there also, I imagine if I asked them for pie on pi they might get a
kick out of it.)

>> So, to the weird part. A path that is
>> routerA <-ethernet-> rpi3 <-usbwifi-> routerB
>>
>> routerB does not get the "default from" routes.
>
> Aren't they even sent on the link ? (tcpdump from rpi)

rpi3, c2, c1, x86 box are on the same switch.

I will go dump the wifi side of the pi.

>> an odroid c1, kernel 3.10.80 does not create the rules tables nor
>> default from.
>
> Are you able to create them manually ?

yes on the pi3, c1 and c2.

> sudo ip -6 rule add from 2001:bd8::/48 prio 42 table 20
> ip -6 rule show
> sudo ip -6 rule del prio 42
>
>> I honestly don't know what mainline kernel was "good"
>> for source specific routing
>
> It should be good with any (linux) kernel.  (I would be surprised if you have 
> no support for traffic engineering rules.)

All of them seem compiled without IPV6_SUBTREES.
All of them have ip rule support. The rpi3 creates rules; the c1 and
c2 do not get rules; and the rpi3 does not seem to be forwarding
"default from".

I can certainly grant access to them (via ipv6) if you want to play,
or if anyone wants a belated christmas present?  :)

Of these it was the rpi3 that was the most exciting thing to play
with, an ideal cheap meshy router platform, given the onboard wifi did
adhoc... followed closely by the c2 (64 bit arm, 50 bucks, wow).

the c2 "feels" *way* faster than the rpi3 - but some major features
like graphics and sound don't work as yet.

Are there any well known usb sticks that work with adhoc?


> Matthieu
>



Re: [Babel-users] Ready for 1.8.0?

2016-03-23 Thread Dave Taht
A) I take it that "diversity routing" now matches the draft but does
not use the right stuff to get the channel(s)?

/me hides. I know, patches gladly accepted, (but ooh, have we made
some progress on bufferbloat in wifi lately)

B) I have been staring at a puzzling thing for the last few days
without time to look at it harder.

my gateway (routerA) is a source specific babel, without a version
number but from an openwrt build about 2 months back. RouterB is a
cerowrt box.

I just updated everything ELSE to your latest commit. On an x86 box I
am  seeing all but a default ipv6 route supplied by babel. Could be a
config error on my part...

But: On a brand new raspberry pi3 (kernel 4.1.19-v7), same wire, I
guess it is either not compiled with IPV6_SUBTREES or the
autodetection is broken because I don't get "default from" routes, but
the routes inserted in the "ip -6 rules" table.

So, to the weird part. A path that is
routerA <-ethernet-> rpi3 <-usbwifi-> routerB

routerB does not get the "default from" routes.

C) Having more fun with kernel versions and a ton o new, cheap hardware:

an odroid c1, kernel 3.10.80 does not create the rules tables nor
default from. I honestly don't know what mainline kernel was "good"
for source specific routing... odroid c2 (64 bit arms, 50 bucks!),
which (sadly) ships with 3.14.29 - I get no default from or rules
table either.

Grump. Whose job is it to convince everybody to compile in this functionality?

D) There is perhaps something you can do about valid_lft vs preferred_lft?

Boy, can a machine get a lot of ipv6 addresses, seemingly 6 in my
installation. BUT! two have a preferred_lft of 0sec.

ex:

inet6 2601:645:4103:56c0:31c4:ae34:38b9:d758/64 scope global
temporary deprecated dynamic
   valid_lft 248076sec preferred_lft 0sec

All 6 IPs are exported via babel, and I assume that once 248076
seconds expire, babel will withdraw the route. For some reason, with 4
other IPs available, I wouldn't mind if it exported the "best" ones
out of each /64 somehow.
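
To make the idea in D) concrete, here is a rough sketch of the selection step: parse "ip -6 addr show" style output and keep only global addresses that are not deprecated (preferred_lft greater than 0). This is an illustration of the filtering only, not babeld code. The second and third addresses in the sample are invented (documentation prefix and a link-local), and the function names are hypothetical.

```python
import re

# One match per address: the inet6 line, its flags, and the lifetimes that follow.
ADDR_RE = re.compile(
    r"inet6\s+(?P<addr>\S+)\s+(?P<flags>.*?)"
    r"valid_lft\s+(?P<valid>\S+)\s+preferred_lft\s+(?P<pref>\S+)",
    re.DOTALL,
)


def parse_lifetime(token):
    """Convert '248076sec' to 248076 and 'forever' to None."""
    if token == "forever":
        return None
    return int(token[:-3] if token.endswith("sec") else token)


def preferred_global_addresses(ip_output):
    """Return global-scope addresses that are still preferred."""
    result = []
    for m in ADDR_RE.finditer(ip_output):
        flags = m.group("flags").split()
        pref = parse_lifetime(m.group("pref"))
        if "global" not in flags:
            continue
        if "deprecated" in flags or pref == 0:
            continue
        result.append(m.group("addr"))
    return result


SAMPLE = """\
    inet6 2601:645:4103:56c0:31c4:ae34:38b9:d758/64 scope global temporary deprecated dynamic
       valid_lft 248076sec preferred_lft 0sec
    inet6 2001:db8::1/64 scope global dynamic
       valid_lft 248076sec preferred_lft 86400sec
    inet6 fe80::1/64 scope link
       valid_lft forever preferred_lft forever
"""

if __name__ == "__main__":
    print(preferred_global_addresses(SAMPLE))
```

Whether a routing daemon should drop deprecated addresses at all is a policy question; existing connections may still be using them until valid_lft runs out.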

/me ducks

E) commit 8f422e0ebc2d3a0bf556611133a96b5e0a16bdb2 looks
"interesting"... what problems did it induce?

Dave Täht
Let's go make home routers and wifi faster! With better software!
https://www.gofundme.com/savewifi


On Wed, Mar 23, 2016 at 6:18 PM, Juliusz Chroboczek
 wrote:
> I've just fixed the diversity extension to comply with version -01 of the
> Internet-Draft.  Since I haven't heard any objections about the new
> configuration interface, I'm wondering if we're ready for 1.8.0.
>
> (The obvious missing feature is the ability to edit filtering rules at
> runtime, but the user-interface is not obvious.  Adding Quagga-style
> route-maps seems overkill to me.)
>
> -- Juliusz
>

Re: [Babel-users] Some thoughts about Babel (feature requests)

2015-12-18 Thread Dave Taht
I guess I am puzzled about the need for tunnels in the architecture.
(I found the usage of "vpn" confusing, to me a vpn offers additional
features like encryption).

You connect to each node (all 1400 of them?) to gather data via
nodewatcher?  (Why not just have a dedicated "control" port?)

or are the vpns merely to give you a single address space in the case
of a partitioned network and the other external links holding it
together?

... somewhat unrelated ...

I'm dying to know if ipv6 packets can transit all these different
product and link types from one edge of the network to
another.



[Babel-users] [RFC patch] linux perf support for libmusl

2015-12-17 Thread Dave Taht
since there is a need for profiling babel and other subsystems (in my
case, cake) on tiny systems, the attached patch has been sitting in my
tree for a while, and supplied here in the hope it will be useful
while its problems are sorted out...

...

openwrt: Add support for musl in the linux kernel perf utility

The linux kernel's "perf" profiling utility has a hard dependency on
glibc's strerror_r which has a notorious incompatibility with other
posix standards.

This patch wraps the musl posix std strerror_r with something that behaves like
glibc strerror_r - and currently breaks glibc support as I was unable to
figure out a combination of defines that would work mutually across both
C libraries.

My current take on it is that linux mainline perf should switch to using
strerror_l which is std and known to be thread safe, and would result in
less code.

The patch only supports linux-4.4 but is easily extended backwards.

H/T to stephen walker for the initial attempt, nbd for a refinement.

untested on anything but arm at present.
From 866299c8c0902c8e21eee7b0f54c3abd74feb494 Mon Sep 17 00:00:00 2001
From: CeroWrt Admin <dave.t...@bufferbloat.net>
Date: Thu, 17 Dec 2015 12:49:36 +0100
Subject: [PATCH] openwrt: Add support for musl in the linux kernel perf
 utility

The linux kernel's "perf" profiling utility has a hard dependency on
glibc's strerror_r which has a notorious incompatibility with other
posix standards.

This patch wraps the musl std strerror_r with something that behaves like
glibc strerror_r - and currently breaks glibc support as I was unable to
figure out a combination of defines that would work mutually across both
C libraries.

My current take on it is that linux mainline perf should switch to using
strerror_l which is std and known to be thread safe.

The patch only supports linux-4.4x but is easily extended backwards.

H/T to stephen walker for the initial attempt.
---
 package/devel/perf/Makefile|   2 +-
 .../patches-4.4/280-perf-fixes-for-musl.patch  | 143 +
 2 files changed, 144 insertions(+), 1 deletion(-)
 create mode 100644 target/linux/generic/patches-4.4/280-perf-fixes-for-musl.patch

diff --git a/package/devel/perf/Makefile b/package/devel/perf/Makefile
index 5e3d63f..27ef7b8 100644
--- a/package/devel/perf/Makefile
+++ b/package/devel/perf/Makefile
@@ -19,7 +19,7 @@ include $(INCLUDE_DIR)/package.mk
 define Package/perf
   SECTION:=devel
   CATEGORY:=Development
-  DEPENDS:= @USE_GLIBC +libelf1 +libdw +libpthread +librt +binutils
+  DEPENDS:=+libelf1 +libdw +libpthread +librt +binutils
   TITLE:=Linux performance monitoring tool
   VERSION:=$(LINUX_VERSION)-$(PKG_RELEASE)
   URL:=http://www.kernel.org
diff --git a/target/linux/generic/patches-4.4/280-perf-fixes-for-musl.patch b/target/linux/generic/patches-4.4/280-perf-fixes-for-musl.patch
new file mode 100644
index 000..029ac7d
--- /dev/null
+++ b/target/linux/generic/patches-4.4/280-perf-fixes-for-musl.patch
@@ -0,0 +1,143 @@
+From 9673ba5369408008deef840e21edab3fa7a575fd Mon Sep 17 00:00:00 2001
+From: Dave Taht <dave.t...@bufferbloat.net>
+Date: Tue, 15 Dec 2015 17:19:14 +0100
+Subject: [PATCH] perf fixes for musl
+
+---
+ tools/lib/api/fs/tracing_path.c|  3 +++
+ tools/lib/traceevent/event-parse.c |  4 
+ tools/perf/perf.c  | 17 -
+ tools/perf/util/cache.h|  2 +-
+ tools/perf/util/cloexec.c  |  4 
+ tools/perf/util/cloexec.h  |  4 
+ tools/perf/util/util.h |  4 
+ 7 files changed, 28 insertions(+), 10 deletions(-)
+
+diff --git a/tools/lib/api/fs/tracing_path.c b/tools/lib/api/fs/tracing_path.c
+index a26bb5e..a04df38 100644
+--- a/tools/lib/api/fs/tracing_path.c
++++ b/tools/lib/api/fs/tracing_path.c
+@@ -10,6 +10,9 @@
+ #include "fs.h"
+ 
+ #include "tracing_path.h"
++/* musl has a xpg compliant strerror_r by default */
++#define strerror_r(err, buf, buflen) \
++(strerror_r(err, buf, buflen) ? NULL : buf)
+ 
+ 
+ char tracing_mnt[PATH_MAX] = "/sys/kernel/debug";
+diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c
+index 2a912df..0644b42 100644
+--- a/tools/lib/traceevent/event-parse.c
++++ b/tools/lib/traceevent/event-parse.c
+@@ -36,6 +36,10 @@
+ #include "event-parse.h"
+ #include "event-utils.h"
+ 
++/* musl has a xpg compliant strerror_r by default */
++#define strerror_r(err, buf, buflen) \
++(strerror_r(err, buf, buflen) ? NULL : buf)
++
+ static const char *input_buf;
+ static unsigned long long input_buf_ptr;
+ static unsigned long long input_buf_siz;
+diff --git a/tools/perf/perf.c b/tools/perf/perf.c
+index 3d4c7c0..91f57b0 100644
+--- a/tools/perf/perf.c
++++ b/tools/perf/perf.c
+@@ -523,6 +523,21 @@ void pthread__unblock_sigwinch(void)
+ 	pthread_sigmask(SIG_UNBLOCK, &set, NULL);
+ }
+ 
++unsigned cache_line_size(void);
++
++uns

Re: [Babel-users] Detecting bridges

2015-12-16 Thread Dave Taht
the resulting nanog conversation on detecting wireless bridges ended
up interesting - with several clever techniques proposed - all
probably futile.

http://mailman.nanog.org/pipermail/nanog/2015-December/082902.html

I fear the default for babel should become etx or rtt as most of the
world bridges wifi to ethernet.



Re: [Babel-users] Detecting bridges

2015-12-15 Thread Dave Taht
On Mon, Dec 14, 2015 at 8:44 PM, Juliusz Chroboczek
 wrote:
> Wrong thread?

Nope.

>
>> Is there a reliable way of determining that an underlying interface is
>> a bridge?
>
> https://github.com/jech/babeld/blob/master/kernel_netlink.c#L723
>
> However, this only works for the Linux software bridge.  It doesn't work
> for the hardware switches built into your favourite router, and of course
> it doesn't work for switches connected over Ethernet, which is what the
> WLAN-SI people are using.

I was basically wondering if there was something like an igmp message that asked
if this "wire" was bridged to anything.

The default outside of babel towers and the yurtlab is to bridge wifi
to ethernet.
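
For reference, the one case that is detectable locally (the Linux software bridge) can also be checked from userspace through sysfs. The sketch below is an assumption-laden illustration, not what kernel_netlink.c does: it relies on the kernel exposing a "bridge" directory for bridge devices and a "brport" entry for enslaved ports under /sys/class/net. It says nothing about hardware switches or bridges elsewhere on the wire.

```python
import os


def is_linux_bridge(ifname, sysfs_root="/sys/class/net"):
    """True if ifname is itself a Linux software bridge device."""
    return os.path.isdir(os.path.join(sysfs_root, ifname, "bridge"))


def is_bridge_port(ifname, sysfs_root="/sys/class/net"):
    """True if ifname is enslaved to a Linux software bridge."""
    return os.path.exists(os.path.join(sysfs_root, ifname, "brport"))


if __name__ == "__main__":
    # Report on every interface the kernel currently knows about.
    root = "/sys/class/net"
    if os.path.isdir(root):
        for name in sorted(os.listdir(root)):
            print(name,
                  "bridge" if is_linux_bridge(name) else "",
                  "port" if is_bridge_port(name) else "")
```

The sysfs_root parameter exists only so the functions can be pointed at a fake directory tree when testing.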

>
> -- Juliusz
>
>



[Babel-users] latency in WLAN-SI

2015-12-15 Thread Dave Taht
I tend to fork convos, sorry.

On Tue, Dec 15, 2015 at 12:33 AM, Mitar  wrote:
> Hi!
>
> On Mon, Dec 14, 2015 at 11:39 AM, Juliusz Chroboczek
>  wrote:
>> I'd like more evidence that this is needed.  Estimating packet loss is
>> very slow (since we're computing a metric from what is just a discrete,
>> one-bit signal), so it slows down convergence.  Hopefully we can get away
>> without it.
>
> Hm, isn't computation on WiFi links exactly the same?
>
>> Does your RTT increase at the same time as packet loss?  If so, we could
>> probably do without packet loss.
>
> Not really. At least on the fiber links (which are most of our VPN
> links) it does not.
>
>> (Recall that the goal is not to have an accurate model of the real
>> world -- the goal is to have traffic flow according to optimal paths.  If
>> the traffic is going where you want it to go, there's no need to add more
>> complexity to babeld.)
>
> Currently it seems that the routes over VPN are dropped while we would
> prefer them to stay up, even if there is a slight packet loss.
>
>>> Why have you disabled packet-loss metric on VPN links?
>>
>> Because it's an experimental feature, that hasn't had enough real-world
>> deployment.  It works beautifully in our tests, it works beautifully in
>> Nexedi's network, and if it works as well in your network, I'll enable it
>> by default.
>
> Hm, are we talking here about packet-loss or delay-based routing
> (RTT)? I understand that RTT metric is experimental, but I was talking
> about packet-loss, why is that not enabled. Or am I missing something?
>
>> First, it slows down reaction to link failures.  If you're on an Ethernet,
>> and you lose two packets in a row, you can be pretty sure the link is
>> down
>
> Or we have a very short buffer. ;-)

You certainly know how to tweak me!

I'd hope that your fiber nodes were fast enough to never need stuff
like fq_codel and sqm-scripts configured, but that would need testing
and confirmation.

I note that recently a change to fq_codel's default behavior landed in
openwrt, changing the quantum from 300 to 1514. This really should not
affect anything but really slow links (say, < 4mbit), and fq_codel was
never the right thing for anything but p2p wifi anyway.

>
> But yes, maybe our recent instability in VPN links is more to the
> problems with routing we have, then really link instability.
>
> But we do have VPN links which go between countries. We have observed
> really crazy stuff on for example links between Croatia and Slovenia.
> Sometimes extra 100 ms appears on the link, because they have some
> issues at Internet exchanges, for example (so delay is added at the
> Internet exchange).

Do you have something like smokeping configured? I would love to see
data on not just the vpn issues but on the latencies across the mesh.
The latency under load data Toke showed at battle mesh for how linux
wifi behaves under load these days does also apply in the real world,
but perhaps less... or much more, depending on the quality of the
link. Yes, I have seen 10s of seconds of delay...

In lieu of deploying smokeping everywhere I've been thinking of adding
a lightweight latency test to flent, where we just test lightweight
udp and icmp continuously with, say, a 1sec or even 60 sec period, to
dozens or hundreds of hosts, for hours at a time, all the time.
Smokeping's basic plot is good, but flent can do a better job of
aggregating data across more variables. Another approach would be to
embed timestamps in the needed overhead traffic (be that babel or
other) and get everything synced up via ntp...

As best as I recall your vpn was pure udp, no crypto, no retries. I
see in your (nice!) web interface that you track whether a node is
reachable, but not an observed rtt.

>
> I do not think that VPN links should be seen as Ethernet. For Ethernet
> I agree that if you loose two packets you have probably issues. But
> for VPN you have stuff in between, from bad ISPs, to MTU issues which
> make some packets get lost (especially while PMTU is in progress).

Could you clarify the behavior of your vpn? VPNs over tcp will never
lose packets. VPNs that do crypto tend to bottleneck on the crypto
step and drop packets on the read side of the socket. Nearly anything
using the tun/tap interfaces tends to be slow; the recent Foo over UDP
stuff corrects some of that:
https://www.netdev01.org/docs/herbert-UDP-Encapsulation-Linux.pdf

>> Second, the link quality estimator uses ETX, which is optimised for
>> multicast Hellos over WiFi links (it's quadratic in loss rate).
>> A different formula should be used for lossy wired links and for unicast
>> wireless tunnels.  (But then, perhaps ETX works well enough on tunnels --
>> I have no idea.)
>
> We have been using ETX with OLSRv1 on tunnels without visible issues.
>
> What do you use for ETX? ETX = 1 / (d_f x d_r) is for unicast (as
> described in the A High-Throughput Path Metric for Multi-Hop 
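
As a worked illustration of the formula quoted above, ETX = 1 / (d_f x d_r), where d_f and d_r are the forward and reverse delivery ratios. The sketch below simply evaluates it; treating a zero delivery ratio as an infinite cost is an assumption made here for the example, not something stated in the thread.

```python
def etx(d_f, d_r):
    """Expected transmission count from forward/reverse delivery ratios.

    d_f and d_r are fractions in (0, 1]. A link that delivers nothing in
    either direction is given an infinite cost (an assumption of this sketch).
    """
    if not (0.0 < d_f <= 1.0 and 0.0 < d_r <= 1.0):
        return float("inf")
    return 1.0 / (d_f * d_r)


if __name__ == "__main__":
    for d in (1.0, 0.9, 0.5, 0.1):
        # With symmetric delivery ratio d the cost grows as 1 / d**2.
        print(d, etx(d, d))
```

A perfect link costs 1, and a link that delivers half the packets each way costs 4, which is the quadratic growth mentioned in the quoted text.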

Re: [Babel-users] Detecting bridges

2015-12-15 Thread Dave Taht
On Tue, Dec 15, 2015 at 11:00 AM, Henning Rogge <hro...@gmail.com> wrote:
> On Tue, Dec 15, 2015 at 10:35 AM, Dave Taht <dave.t...@gmail.com> wrote:
>> On Mon, Dec 14, 2015 at 8:44 PM, Juliusz Chroboczek
>>>> Is there a reliable way of determining that an underlying interface is
>>>> a bridge?
>>>
>>> https://github.com/jech/babeld/blob/master/kernel_netlink.c#L723
>>>
>>> However, this only works for the Linux software bridge.  It doesn't work
>>> for the hardware switches built into your favourite router, and of course
>>> it doesn't work for switches connected over Ethernet, which is what the
>>> WLAN-SI people are using.
>>
>> I was basically wondering if there was something like an igmp message that 
>> asked
>> if this "wire" was bridged to anything.
>>
>> The default outside of babel towers and the yurtlab is to bridge wifi
>> to ethernet.
>
> Maybe something like this?
>
> https://en.wikipedia.org/wiki/Link_Layer_Discovery_Protocol
>
> Not sure how well it is supported by consumer grade switches.

Well, while I have seen those go by, along with stp, in fiddling with
bridging and unbridging a test openwrt box (linksys ac1200) today I do
not even see BPDUs with wireshark...

https://en.wikipedia.org/wiki/Spanning_Tree_Protocol

deep magic, long hidden. My thought was to put out a query over a
protocol like this, and/or learn something about the bridging
topology, passively, just as the switches do.

> Henning Rogge



Re: [Babel-users] latency in WLAN-SI

2015-12-15 Thread Dave Taht
In poring through the astonishing *wealth* of data available via
nodewatcher, I finally scrolled down to the chart next to the very
bottom to find rtt measurements.

so are you really seeing real-world peaks in the 7 second range or is
that an artifact of something else?

zoomed in: http://snapon.cs.kau.se/~d/peaks_7_sec.png

https://nodes.wlan-si.net/node/e6668d55-5e8d-4e32-94e3-ea8e9c23e5f5/



Re: [Babel-users] Bucket full, dropping packet

2015-12-13 Thread Dave Taht
On Sun, Dec 13, 2015 at 2:52 PM, Juliusz Chroboczek
 wrote:
>> Ok, I can do some profiling on the babeld that is running on the VPN
>> server with the large number of links. Just tell me what profiling data
>> do you want? Should I just compile a debug build and run babeld through
>> callgrind or do you have something else in mind?
>
> I'm not familiar with callgrind -- I've had good results with both "perf
> record" and gprof.  But yes, callgrind should be fine.
>
> I need to find out where the CPU time is going.  I suspect either the
> quadratic loop in xroute.c, or linear-time route selection in route.c.
> I intend to fix both, but I'd like to be sure.
>
>> Yes, you only need to establish a VPN connection to our server using
>> tunneldigger-client [1] (it compiles on Debian) and run babeld on the
>> VPN interface. We only need to allocate an IPv4 address for you so there
>> will be no conflicts.
>
> Ok, I'll see on Monday if I can get an extra VM before Christmas.

I have a half dozen machines all over the world, courtesy of linode.
Can spin a new one up for you in a matter of minutes.

>
> -- Juliusz
>


Re: [Babel-users] Bucket full, dropping packet

2015-12-08 Thread Dave Taht
On Tue, Dec 8, 2015 at 5:58 PM, Jernej Kos  wrote:
> Hello!
>
> On 07. 12. 2015 17:14, Juliusz Chroboczek wrote:
>> Yes, that's expected.  Please increase the limits, be bold, multiply them
>> by 20.
>
> It seems that raising the limits solved the problem. Thanks!
>
> We are still on the lookout for unparsable packets ;-)

I would like to see someone working on a babel fuzzer, or does someone
know of a tool that could generate tons of packets bogus in every way
possible?
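
To give the fuzzer idea some shape, here is a minimal sketch of a packet generator. It assumes only the basic Babel framing (magic 42, version 2, a 16-bit body length, then type/length/value entries); the TLV payloads are random bytes rather than meaningful messages, and all function names are made up for this example. It builds structurally framed packets and then damages them with bit flips, truncation, and trailing junk.

```python
import random
import struct

BABEL_MAGIC = 42
BABEL_VERSION = 2


def random_tlv(rng):
    """One TLV with a random type and up to 31 bytes of random payload."""
    tlv_type = rng.randrange(1, 256)  # type 0 is Pad1, which has no length byte
    payload = bytes(rng.randrange(256) for _ in range(rng.randrange(0, 32)))
    return bytes([tlv_type, len(payload)]) + payload


def framed_packet(rng, max_tlvs=8):
    """Packet with a correct header and correctly framed (but random) TLVs."""
    count = rng.randrange(1, max_tlvs + 1)
    body = b"".join(random_tlv(rng) for _ in range(count))
    return struct.pack("!BBH", BABEL_MAGIC, BABEL_VERSION, len(body)) + body


def mutate(packet, rng, steps=4):
    """Apply a few random corruptions: bit flip, truncate, or append junk."""
    data = bytearray(packet)
    for _ in range(steps):
        choice = rng.randrange(3)
        if choice == 0 and data:
            data[rng.randrange(len(data))] ^= 1 << rng.randrange(8)
        elif choice == 1 and len(data) > 1:
            del data[rng.randrange(len(data)):]
        else:
            data += bytes(rng.randrange(256) for _ in range(rng.randrange(1, 16)))
    return bytes(data)


def fuzz_packets(seed, count):
    """Yield `count` corrupted packets, reproducibly for a given seed."""
    rng = random.Random(seed)
    for _ in range(count):
        yield mutate(framed_packet(rng), rng)


if __name__ == "__main__":
    for pkt in fuzz_packets(seed=1, count=3):
        print(pkt.hex())
```

Seeding the generator means any packet that upsets a daemon can be regenerated from the seed alone. Actually sending these at a test node is left out on purpose; do that only on an isolated network.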
>
> Jernej
>
>


Re: [Babel-users] Low level rewrite, install filters

2015-11-13 Thread Dave Taht
in the march towards 1.7 process...

0) What other features do you plan for 1.7?

1) Not clear to me if this does atomic route updates instead of delete/add?

2) can I try to get the "version" number into the telnet interface and
command line in this go around?

3) fixing up wifi channel awareness to use the current linux netlink api
(trying to find someone to take that on)



On Thu, Nov 12, 2015 at 5:50 PM, Juliusz Chroboczek
 wrote:
> Mitar, Kosko,
>
> Matthieu has just done a tremendous job of cleaning up the lower layers of
> babeld.  One of the effects is that export filters are now known as
> install filters, so you say something like
>
>   install ip ::/0 le 0 table 42
>
> I've taken the risk of pushing this patch series into trunk, please test.
> If you're using babeld in production and don't need the install filtering
> functionality, please stick to the tag babeld-1.6.3 for now.
>
> This will become babeld-1.7.0 at some point.  I might create a 1.6 branch
> if it proves unstable.
>
> Please, please, please test.
>
> -- Juliusz
>
>
>
>


[Babel-users] Last call for signatures to the FCC on the wifi lockdown issue

2015-10-09 Thread Dave Taht
The CeroWrt project's letter to the FCC on how to better manage the
software on wifi and home routers vs some proposed regulations is now
in last call for signatures. The final draft of our FCC submittal is
here:

https://docs.google.com/document/d/15QhugvMlIOjH7iCxFdqJFhhwT6_nmYT2j8xAscCImX0/edit?usp=sharing

The principal signers (Dave Taht and Vint Cerf), are joined by many
network researchers, open source developers, and dozens of developers
of aftermarket firmware projects like OpenWrt.

Prominent signers currently include:

Jonathan Corbet, David P. Reed, Dan Geer, Jim Gettys, Phil Karn, Felix
Fietkau, Corinna "Elektra" Aichele, Randell Jesup, Eric S. Raymond,
Simon Kelly, Andreas Petlund, Sascha Meinrath, Joe Touch, Dave Farber,
Nick Feamster, Paul Vixie, Bob Frankston, Eric Schultz, Brahm Cohen,
Jeff Osborn, Harald Alvestrand, and James Woodyatt.

If you would like to join our call for substituting sane software
engineering practices over misguided regulations, the window for
adding your signature to the letter closes at 11:59AM ET, today,
Friday, 2015-10-09.

Sign via webform here: http://goo.gl/forms/WCF7kPcFl9

We are at approximately 170 signatures as I write.

For more details on the controversy we are attempting to address, or
to submit your own filing to the FCC see:

https://libreplanet.org/wiki/Save_WiFi
https://www.dearfcc.org/

Sincerely,

Dave Täht
CeroWrt Project Architect
Tel: +46547001161


Re: [Babel-users] [PATCH] Add option to consider sysctl write failures as non-fatal.

2015-08-10 Thread Dave Taht
Well, it would be better if babel checked to see if the (sometimes
read-only) sysctl value was already correct, instead of blithely
trying to write it.
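
A rough sketch of that check-before-write idea follows, in Python rather than babeld's C and purely as an illustration. It reads the current value first, skips the write when it already matches, and reports a failure instead of aborting. The function name and return strings are invented for the example; /proc/sys/net/ipv6/conf/all/forwarding is used only as a typical path.

```python
def ensure_sysctl(path, wanted):
    """Set a sysctl-style file to `wanted` only if it differs.

    Returns "unchanged" if the value was already correct, "written" if the
    write succeeded, and "failed" if the file could not be written (for
    example because /proc/sys is mounted read-only in a container).
    """
    wanted = str(wanted)
    try:
        with open(path) as f:
            current = f.read().strip()
    except OSError:
        current = None  # unreadable; fall through and attempt the write

    if current == wanted:
        return "unchanged"

    try:
        with open(path, "w") as f:
            f.write(wanted + "\n")
    except OSError:
        return "failed"
    return "written"


if __name__ == "__main__":
    print(ensure_sysctl("/proc/sys/net/ipv6/conf/all/forwarding", 1))
```

With this ordering, a read-only /proc/sys is harmless whenever the host has already set the values the daemon wants; only a genuine mismatch turns into a warning or an error.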

On Mon, Aug 10, 2015 at 9:43 AM, Jernej Kos jer...@kos.mx wrote:
 Hello!

 +1 for this patch. We are also running babeld in a Docker container and
 this requires us to run it as a privileged container due to some Docker
 deficiencies in setting network sysctls inside containers.


 Jernej

 On 10. 08. 2015 18:02, Toke Høiland-Jørgensen wrote:
 Babeld will exit with a fatal error if it is unable to write sysctls.
 When running in a container, however, /proc/sys may be mounted
 read-only, which causes babeld to fail.

 This adds a switch to consider sysctl failures as non-fatal, in which
 case a warning will be issued rather than having the daemon fail to
 start.

 Signed-off-by: Toke Høiland-Jørgensen t...@toke.dk
 ---
  babeld.c |  8 ++--
  babeld.h |  1 +
  babeld.man   | 12 
  configuration.c  |  3 +++
  kernel_netlink.c |  7 ++-
  5 files changed, 28 insertions(+), 3 deletions(-)

 diff --git a/babeld.c b/babeld.c
 index 943f042..45b14fa 100644
 --- a/babeld.c
 +++ b/babeld.c
 @@ -67,6 +67,7 @@ int default_wired_hello_interval = -1;
  int resend_delay = -1;
  int random_id = 0;
  int do_daemonise = 0;
 +int sysctl_nonfatal = 0;
  const char *logfile = NULL,
  *pidfile = "/var/run/babeld.pid",
  *state_file = "/var/lib/babel-state";
 @@ -128,7 +129,7 @@ main(int argc, char **argv)

  while(1) {
  opt = getopt(argc, argv,
 - "m:p:h:H:i:k:A:srR:uS:d:g:lwz:M:t:T:c:C:DL:I:");
 + "m:p:h:H:i:k:A:srR:uS:d:g:lwz:M:t:T:c:C:DL:I:F");
  if(opt < 0)
  break;

 @@ -263,6 +264,9 @@ main(int argc, char **argv)
  case 'D':
  do_daemonise = 1;
  break;
 + case 'F':
 +sysctl_nonfatal = 1;
 +break;
  case 'L':
  logfile = optarg;
  break;
 @@ -800,7 +804,7 @@ main(int argc, char **argv)
  
  [-t table] [-T table] [-c file] [-C statement]\n
  
 -[-d level] [-D] [-L logfile] [-I pidfile]\n
 +[-d level] [-D] [-F] [-L logfile] [-I pidfile]\n
  
  [id] interface...\n,
  argv[0]);
 diff --git a/babeld.h b/babeld.h
 index 92ce9b5..c1f26fe 100644
 --- a/babeld.h
 +++ b/babeld.h
 @@ -86,6 +86,7 @@ extern time_t reboot_time;
  extern int default_wireless_hello_interval, default_wired_hello_interval;
  extern int resend_delay;
  extern int random_id;
 +extern int sysctl_nonfatal;
  extern int do_daemonise;
  extern const char *logfile, *pidfile, *state_file;
  extern int link_detect;
 diff --git a/babeld.man b/babeld.man
 index ec600c2..7182f30 100644
 --- a/babeld.man
 +++ b/babeld.man
 @@ -135,6 +135,11 @@ Specify a configuration statement directly on the command line.
  .B \-D
  Daemonise at startup.
  .TP
 +.B \-F
 +Don't consider failures writing to sysctl as fatal. Warn of the failures, but
 +continue running. This is useful when running babeld in a container that mounts
 +/proc/sys as read-only (as systemd-nspawn does, for instance).
 +.TP
  .BI \-L  logfile
  Specify a file to log random ``how do you do?'' messages to.  This
  defaults to standard error if not daemonising, and to
 @@ -253,6 +258,13 @@ This specifies whether to daemonize at startup, and is equivalent to
  the command-line option
  .BR \-D .
  .TP
 +.BR sysctl-nonfatal  { true | false }
 +This controls whether to consider failures writing to sysctl as fatal. If set,
 +warn of the failures, but continue running. This is useful when running babeld
 +in a container that mounts /proc/sys as read-only (as systemd-nspawn does, for
 +instance). Equivalent to the command line option
 +.BR \-F .
 +.TP
  .BI state-file  filename
  This specifies the name of the file used for preserving long-term
  information between invocations of the
 diff --git a/configuration.c b/configuration.c
 index 6a9c09d..571e220 100644
 --- a/configuration.c
 +++ b/configuration.c
 @@ -691,6 +691,7 @@ parse_option(int c, gnc_t gnc, void *closure, char *token)
    strcmp(token, "link-detect") == 0 ||
    strcmp(token, "random-id") == 0 ||
    strcmp(token, "daemonise") == 0 ||
 +  strcmp(token, "sysctl-nonfatal") == 0 ||
    strcmp(token, "ipv6-subtrees") == 0 ||
    strcmp(token, "reflect-kernel-metric") == 0) {
  int b;
 @@ -706,6 +707,8 @@ parse_option(int c, gnc_t gnc, void *closure, char *token)
  random_id = b;
  else if(strcmp(token, "daemonise") == 0)
  do_daemonise = b;
 +else if(strcmp(token, "sysctl-nonfatal") == 0)
 +sysctl_nonfatal = b;
  else if(strcmp(token, "ipv6-subtrees") == 0)
  has_ipv6_subtrees = b;
  else if(strcmp(token, 

Re: [Babel-users] Fwd: Specify router-id explicitly

2015-07-10 Thread Dave Taht
I just did a pull of your head, built it, started a new babel...
waited 2 minutes (I was assuming the calculation for the routerid had
changed?)... and did not get a working route to elsewhere until I
tried talking to another route.

(note I did not update the other 2 babelds in operation on this
network from the old openwrt babels)

I get working routes locally, but the next box upstream refused any connection.

d@ganesha:~/git/babeld$ ip route
default via 172.26.22.224 dev wlan0  proto babel onlink
76.102.227.25 via 172.26.22.224 dev wlan0  proto babel onlink
172.26.0.0/16 dev wlan0  proto kernel  scope link  src 172.26.65.1
172.26.16.0/24 via 172.26.22.224 dev wlan0  proto babel onlink
172.26.16.1 via 172.26.22.224 dev wlan0  proto babel onlink
172.26.17.0/24 via 172.26.18.224 dev wlan0  proto babel onlink
172.26.17.1 via 172.26.22.224 dev wlan0  proto babel onlink
172.26.18.0/24 dev wlan0  proto kernel  scope link  src 172.26.18.225
172.26.19.0/24 via 172.26.22.224 dev wlan0  proto babel onlink
172.26.20.0/24 via 172.26.18.224 dev wlan0  proto babel onlink
172.26.22.224 via 172.26.22.224 dev wlan0  proto babel onlink
172.26.128.0/27 dev eth0  proto kernel  scope link  src 172.26.128.18
192.168.1.0/24 dev eth0  proto kernel  scope link  src 192.168.1.202
d@ganesha:~/git/babeld$ ssh root@172.26.22.224

fails.

Weirdness - after I tried connecting to here

traceroute to 172.26.19.1 (172.26.19.1), 30 hops max, 60 byte packets
 1  172.26.22.224  102.543 ms  102.529 ms  103.409 ms # 100ms
 2  172.26.19.1  104.586 ms  107.553 ms  107.572 ms

stuff started to work.

I went back to the older version and got the same result for over 2
minutes but it did eventually work. Then I went back to the newer one
waiting over 2 minutes... and that too, eventually did work.

running ubuntu 3.19.0-15-generic, ath9k wifi. Babeld.conf is:

ipv6-subtrees true
out if vpn6 ip 0.0.0.0/0 deny
in if vpn6 ip 0.0.0.0/0 deny

I am going to put this down to a bug not in babel. (100ms to my next hop? wtf?)

And I am getting on a plane transiting paris in the next few minutes.
See you in prague!


On Fri, Jul 10, 2015 at 11:53 AM, Juliusz Chroboczek
j...@pps.univ-paris-diderot.fr wrote:
 This means that all the interfaces on a node will have a link-local
 address?  That's a very bad idea.

 This was meant to say the same link-local address.




-- 
Dave Täht
worldwide bufferbloat report:
http://www.dslreports.com/speedtest/results/bufferbloat
And:
What will it take to vastly improve wifi for everyone?
https://plus.google.com/u/0/explore/makewififast


[Babel-users] redistribute local prune?

2015-07-08 Thread Dave Taht
Instead of adding a covering route and explicitly denying the other
routes from being redistributed, would it be possible to have a
smarter filter with syntax like

redistribute local prune

That would cut something like this down from:

2601:696:8300:2cb0:dc9f:abff:fe06:1a24 via fe80::28c6:8eff:febb:9ff0 dev eth0  proto babel  metric 1024
2601:696:8300:2cb0::/64 via fe80::28c6:8eff:febb:9ff0 dev eth0  proto babel  metric 1024
2601:696:8300:2cb1::/64 via fe80::28c6:8eff:febb:9ff0 dev eth0  proto babel  metric 1024
2601:696:8300:2cb2::/64 via fe80::28c6:8eff:febb:9ff0 dev eth0  proto babel  metric 1024
2601:696:8300:2cb0::/60 via fe80::28c6:8eff:febb:9ff0 dev eth0  proto babel  metric 1024

to just

2601:696:8300:2cb0::/60 via fe80::28c6:8eff:febb:9ff0 dev eth0  proto babel  metric 1024
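
The test such a "prune" filter would need is whether one prefix is covered by another, shorter one. A rough sketch in C follows. The function names prefix_bits_equal and prefix_covers are made up for illustration, the addresses are assumed to be 16-byte IPv6 prefixes in network order, and nothing here is taken from babeld, which has no such option as far as this thread shows.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if the first plen bits of a and b are identical.
   Both a and b are assumed to point at 16-byte IPv6 addresses. */
int
prefix_bits_equal(const unsigned char *a, const unsigned char *b, int plen)
{
    int bytes, bits;
    unsigned char mask;

    assert(plen >= 0 && plen <= 128);
    bytes = plen / 8;
    bits = plen % 8;
    if(memcmp(a, b, bytes) != 0)
        return 0;
    if(bits == 0)
        return 1;
    /* Compare only the leading `bits` bits of the next byte. */
    mask = (unsigned char)(0xFF << (8 - bits));
    return (a[bytes] & mask) == (b[bytes] & mask);
}

/* Return 1 if (cover, cover_plen) covers (p, plen): the covering
   prefix must be no longer than the other one, and the two must
   agree on all of the covering prefix's bits. */
int
prefix_covers(const unsigned char *cover, int cover_plen,
              const unsigned char *p, int plen)
{
    return cover_plen <= plen && prefix_bits_equal(cover, p, cover_plen);
}
```

In the listing above, 2601:696:8300:2cb0::/60 covers the three /64s and the /128, so a pruning filter built on a test like this would announce only the /60.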



[Babel-users] a cautionary note on setting up new babel nodes

2015-06-26 Thread Dave Taht
As I am rolling out a bunch of new babel nodes, I decided to get a
cluster (2 nanos and a pico) up in the lab, where I have good
connectivity to the rest of the network, to replace an aging cluster
by the pool.

So I booted it up and configured it for the right channels and a new
set of ip addresses... didn't have good LED support at all (RSSI does
not seem to do anything)...

I got blinkenlights to sort of work, and they were lit up, kind of
solid, for some reason... [1]

...people started wandering by to complain about the network...
naturally I didn't notice because I was even closer to the exit points
than anyone else...

...to discover that I was offering the shortest path to the exit
nodes, and thus traffic had bypassed the two existing ~50mbit links in
favor of lab links that were located indoors and going through a
thousand+ meters of trees... and were barely doing a megabit with
800+ms of delay. (channel diversity not working did not help either)

After that experience, I decided that I would make the firmware for
unconfigured nodes export a 512 metric, and use a high rxcost until
they were fully configured AND in place. I might disable ipv4 entirely
in favor of the autoconfigured ula openwrt has, and just start
configuring stuff based on the appearance of new ulas in the network.

[1] if you come up with a useful LED config for nanostations and
picostations, let me know.


Re: [Babel-users] Diversity routing in a many channeled world

2015-06-18 Thread Dave Taht
The iw package appears canonical, and BSD licensed, and uses
nl80211.h (exported from the linux kernel, also BSD licensed), and
it's 3 netlink calls to derive the channel and other info.

http://pastebin.com/JXqcLNfX

root@davedesk2:~# strace -f -v iw dev wlan0 info 2> netlink.txt
Interface wlan0
    ifindex 13
    wdev 0x2
    addr 04:a1:51:a3:13:04
    ssid davedesk2-test
    type AP
    wiphy 0
    channel 11 (2462 MHz), width: 20 MHz, center1: 2462 MHz

Regrettably I don't have time to poke into this further this month, I
have musl breakages elsewhere (like snmp) to fix. Are we at a point
where I could just file a bug somewhere?



[Babel-users] Diversity routing in a many channeled world

2015-06-17 Thread Dave Taht
On Wed, Jun 17, 2015 at 3:38 PM, Juliusz Chroboczek
j...@pps.univ-paris-diderot.fr wrote:
 To somewhat answer my own question, when babel is used on a standalone
 AP interface, it is unable to automagically determine the channel it is
 on. So I guess whatever it uses can only figure out an adhoc interface?


 The message you're getting means that the SIOCGIWFREQ ioctl has failed.
 Could you please ask the netdev mailing list whether it's supposed to
 succeed on an AP, and whether I should be using a different API?

well, linux-wireless would be better and for sanity's sake I am
currently on neither list.

So I went to a definitive bit of source code instead:

apt-get source network-manager

This has a truly mindblowing number of abstractions piled on
abstractions...

and what I see in src/wifi are libs for wext and nl80211.

for wext, I see only SIOCSIWFREQ being used in network-manager.  I
have no idea how the two differ in actual intent?

#define SIOCSIWFREQ 0x8B04  /* set channel/frequency (Hz) */
#define SIOCGIWFREQ 0x8B05  /* get channel/frequency (Hz) */

nl80211 has this rather big dump format, but seems comprehensive.

I would suspect nl80211 is more of what is needed nowadays... (this is
an ath9k chip here)... or all 3.

sort of on this topic is being able to distinguish between ht20,
ht40, and the newer vht80, 160, 320 modes: babel supports multiple
channels in the encoding, but I am not sure that information is pulled
out in the code. And on 2.4ghz channel 3 interferes with both 1 and 6. Etc.
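
To make the channel 3 versus 1 and 6 point concrete, here is a small, idealised sketch in C. It assumes 2.4 GHz channels 1 to 13 with center frequency 2407 + 5 * channel MHz (which matches the "channel 11 (2462 MHz)" line reported by iw elsewhere in this thread), and it treats two channels as interfering when their occupied bands overlap. The function names are invented for illustration; this is not how babeld computes diversity, and real spectral masks are wider than the nominal width.

```c
#include <assert.h>
#include <stdlib.h>

/* Center frequency in MHz of a 2.4 GHz channel (1..13 only). */
int
channel_center_mhz(int channel)
{
    assert(channel >= 1 && channel <= 13);
    return 2407 + 5 * channel;
}

/* Idealised overlap test: channel a occupies center +/- a_width/2 and
   channel b occupies center +/- b_width/2; the two bands overlap when
   the centers are closer than the sum of the two half-widths.
   Widths are in MHz (20 for ht20). */
int
channels_overlap(int a, int a_width_mhz, int b, int b_width_mhz)
{
    int distance = abs(channel_center_mhz(a) - channel_center_mhz(b));
    return 2 * distance < a_width_mhz + b_width_mhz;
}
```

With 20 MHz channels this reports that 3 overlaps both 1 and 6 while 1 and 6 do not overlap each other. Wider modes would need the real center frequency (the center1 value from iw) rather than the primary channel number.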

on to babel-1.6.2! :)



 -- Juliusz




Re: [Babel-users] ANNOUNCE: babeld-1.6.1

2015-06-17 Thread Dave Taht
On Wed, Jun 17, 2015 at 1:28 AM, Gabriel Kerneis gabr...@kerneis.info wrote:
 Le 2015-06-17 05:41, Dave Taht a écrit :

 I did notice, in finally attempting to switch off of the old babels
 package in openwrt to the new babel package updated for 1.6.1, that
 there is no option in openwrt to enable diversity routing anymore. Is
 this an intentional omission?


 It should work like any other babel option:

 config general
 option diversity true
 option diversity_factor 128

 Please let me know if this doesn't work.

1) Thx. I will try. Wouldn't I want to use 3 here, instead of true?

I realize (now) that the init script translates the whatever_whatever
stuff into the babel command set, but it would be nice if the whole
command set in openwrt was documented and commented out in the
babeld.config file.

Where I had gone wrong was grepping for -z and diversity in the
babeld.init file...

2) Is there any reason why diversity should not default to true?

3) some more configuration examples in babeld.conf and in the openwrt
babeld.config file would be helpful. I am not attempting right now to
use hnetd to further configure babel, and do get a variety of /56s and
/60s from my upstreams, so I was thinking that the below src specific
routing options might be good examples? (but have not tried them yet,
so good is an option)

http://pastebin.com/Cmdh9YMM

4) I am also curious if an old configuration trick of mine could be
overridden successfully with the new babel /tmp/babel.d/ configuration
method?

in /etc/babeld.conf

out if wan ip 0.0.0.0/0 deny
in if wan ip 0.0.0.0/0 deny

then later in /tmp/babeld.d/100-yea-no-nat.conf

out if wan ip 0.0.0.0/0 allow
in if wan ip 0.0.0.0/0 allow

My general assumption is not. In my case it is always safe to run
babel over ipv6 on all interfaces, and sometimes not on ipv4. I am
perfectly happy to just get a set of ipv6 routes first.
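
Why the later "allow" probably cannot override the earlier "deny": as far as I understand the babeld man page, filter rules are tried in order and the first one that matches decides. A toy model of that in C, under that assumption. The struct, enum and function names are hypothetical, the rule only matches on interface name, and the allow-by-default fallback is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum action { ACTION_ALLOW, ACTION_DENY };

/* Simplified rule: matches on interface name only.
   A NULL ifname matches every interface. */
struct rule {
    const char *ifname;
    enum action action;
};

/* Walk the rules in configuration order; the first rule that matches
   decides, and later rules for the same interface are never reached. */
enum action
first_match(const struct rule *rules, int n, const char *ifname)
{
    int i;

    assert(ifname != NULL);
    for(i = 0; i < n; i++) {
        if(rules[i].ifname == NULL || strcmp(rules[i].ifname, ifname) == 0)
            return rules[i].action;
    }
    return ACTION_ALLOW;        /* assumed default when nothing matches */
}
```

Under this model, a "deny" for wan loaded from /etc/babeld.conf followed by an "allow" for wan loaded later still yields deny, so the override would have to be placed before the deny rather than after it.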



(no, haven't got around to doing any of this, barely got a compile
working last night)







 Best,
 --
 Gabriel



Re: [Babel-users] ANNOUNCE: babeld-1.6.1

2015-06-17 Thread Dave Taht
another configuration example... given that the openwrt firewall is
default-deny.

config rule
option name 'Allow-Babel'
option family 'ipv6'
option src 'wan'
option dest_port '6696'
option proto 'udp'
option target 'ACCEPT'


Re: [Babel-users] ANNOUNCE: babeld-1.6.1

2015-06-16 Thread Dave Taht
I built this and configured it on three systems, no issues.

was about to attempt an openwrt build now that dnsmasq-2.73 has landed
also... where...

I did notice, in finally attempting to switch off of the old babels
package in openwrt to the new babel package updated for 1.6.1, that
there is no option in openwrt to enable diversity routing anymore. Is
this an intentional omission?

On Tue, Jun 16, 2015 at 3:27 PM, Juliusz Chroboczek
j...@pps.univ-paris-diderot.fr wrote:
 Dear all,

 Babeld-1.6.1 is available from


 http://www.pps.univ-paris-diderot.fr/~jch/software/files/babeld-1.6.1.tar.gz

 http://www.pps.univ-paris-diderot.fr/~jch/software/files/babeld-1.6.1.tar.gz.asc

 This version fixes a buffer overflow which, while probably not
 exploitable, could in principle cause incorrect routing tables to be
 computed in the presence of source-specific routes.  The other
 user-visible change is that ipv6-subtrees will be used by default on Linux
 3.11 and later.

 Please upgrade.

 -- Juliusz

 16 June 2015: babeld-1.6.1.

   * Fixed a buffer overflow in zone_equal.  This is probably not
 exploitable, but might cause incorrect routing tables in the presence
 of source-specific routing.
   * Added support for defaulting ipv6-subtrees automatically based on the
 kernel version.
   * Fixed compilation under musl.



