Dusan,

Thanks. I seem to have misunderstood you before. That sounds like it, yes.

After reading through most of them, this might be _the_ issue:

https://github.com/moby/moby/issues/16720#issuecomment-435637740
https://github.com/moby/moby/issues/16720#issuecomment-444862701

Alessandro, can you try the suggested command once the container is in the failed state?

conntrack -D -p udp

Marc
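For reference, a minimal sketch of how that conntrack workaround can be narrowed down on the docker host, assuming conntrack-tools is installed; the --dport filter uses the collector port 20013 from Alessandro's config and is optional:

# Inspect the UDP conntrack entries for the NetFlow port; a stale entry kept
# alive by the constant UDP stream is what the linked issues point at.
conntrack -L -p udp --dport 20013

# Delete only those entries rather than flushing the whole table.
conntrack -D -p udp --dport 20013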
Message from Dusan Pajin <dusan.pa...@gmail.com> on Wed, 9 June 2021 at 21:54:
>
> Hi,
>
> Alessandro, do you use docker-compose or docker swarm (docker stack)?
>
> The behavior I am referring to is described in a number of issues on GitHub, for example:
> https://github.com/moby/moby/issues/16720
> https://github.com/docker/for-linux/issues/182
> https://github.com/moby/moby/issues/18845
> https://github.com/moby/libnetwork/issues/1994
> https://github.com/robcowart/elastiflow/issues/414
> In some of those issues you will find links to other issues and so on.
>
> I don't have an explanation for why this works for you in some situations and not in others. Since that is the case, you might try clearing the conntrack table, which is described in some of the issues above. Using the host network is certainly not convenient, but it is doable.
>
> Kind regards,
> Dusan
>
> On Wed, Jun 9, 2021 at 7:37 PM Marc Sune <marcde...@gmail.com> wrote:
>>
>> Dusan, Alessandro,
>>
>> Let me answer Dusan first.
>>
>> Message from Dusan Pajin <dusan.pa...@gmail.com> on Wed, 9 June 2021 at 18:08:
>> >
>> > Hi Alessandro,
>> >
>> > I would say that this is a "known" issue or behavior in docker which is experienced by everyone who ever wanted to receive syslog, netflow, telemetry or any other similar UDP stream from network devices. When you expose ports in your docker-compose file, docker will create the iptables rules to steer the traffic to your container in docker's bridge network, but unfortunately it also translates the source IP address of the packets. I am not sure what the reasoning behind such behavior is. If you search for solutions to this issue, you will find some proposals, but none of them worked in my case.
>>
>> That is not my understanding. I've also double-checked with a devops Docker guru in my organization.
>>
>> In the default docker network mode, masquerading only happens for egress traffic, not ingress.
>>
>> I actually tried it locally by running an httpd container (apache2) and redirecting port 8080 on the "host" to port 80 on the container. The container is on the docker range, the LAN address of my laptop is 192.168.1.36, and .33 is another client in my LAN.
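For reference, a rough sketch of how a test like this can be reproduced; the image tag, container name and interface are illustrative, not necessarily what was used here:

# Publish host port 8080 to port 80 of an httpd container on the default bridge.
docker run -d --name httpd-test -p 8080:80 httpd:2.4

# From another host on the LAN (192.168.1.33 in this example), send a request
# to the docker host (192.168.1.36).
curl http://192.168.1.36:8080/

# Inside the container (tcpdump has to be installed first, as suggested later
# in the thread), check which source address the packets arrive with.
docker exec -it httpd-test tcpdump -l -n -i eth0 tcp port 80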
>> root@d64c65384e87:/usr/local/apache2# tcpdump -l -n
>> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>> listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
>> 17:21:49.546067 IP 192.168.1.33.46595 > 172.17.0.3.80: Flags [F.], seq 2777556344, ack 4139714538, win 172, options [nop,nop,TS val 21290101 ecr 3311681356], length 0
>> 17:21:49.546379 IP 192.168.1.33.46591 > 172.17.0.3.80: Flags [F.], seq 3001175791, ack 61192428, win 172, options [nop,nop,TS val 21290101 ecr 3311686360], length 0
>> 17:21:49.546402 IP 172.17.0.3.80 > 192.168.1.33.46591: Flags [.], ack 1, win 236, options [nop,nop,TS val 3311689311 ecr 21290101], length 0
>> 17:21:49.546845 IP 172.17.0.3.80 > 192.168.1.33.46595: Flags [F.], seq 1, ack 1, win 227, options [nop,nop,TS val 3311689311 ecr 21290101], length 0
>> 17:21:49.550993 IP 192.168.1.33.46595 > 172.17.0.3.80: Flags [.], ack 2, win 172, options [nop,nop,TS val 21290110 ecr 3311689311], length 0
>>
>> That works as expected, showing the real 1.33 address.
>>
>> Mind that there is a lot of confusion, because firewall services in the system's OS can interfere with the rules set by the docker daemon itself:
>>
>> https://stackoverflow.com/a/47913950/9321563
>>
>> Alessandro,
>>
>> I need to analyse your rules in detail, but what is clear is that "something" is modifying them (see the first two rules)... whether these two lines in particular are causing the issue, I am not sure:
>>
>> Pre:
>>
>> Chain POSTROUTING (policy ACCEPT)
>> target     prot opt source            destination
>> MASQUERADE all  --  192.168.200.0/24  anywhere
>> MASQUERADE all  --  172.17.0.0/16     anywhere
>> MASQUERADE tcp  --  192.168.200.3     192.168.200.3   tcp dpt:8086
>> MASQUERADE tcp  --  192.168.200.5     192.168.200.5   tcp dpt:3000
>> MASQUERADE udp  --  192.168.200.9     192.168.200.9   udp dpt:50000
>> MASQUERADE tcp  --  192.168.200.11    192.168.200.11  tcp dpt:9092
>> MASQUERADE udp  --  192.168.200.4     192.168.200.4   udp dpt:50005
>> MASQUERADE udp  --  192.168.200.8     192.168.200.8   udp dpt:5600
>> MASQUERADE tcp  --  192.168.200.8     192.168.200.8   tcp dpt:bgp
>> MASQUERADE udp  --  192.168.200.2     192.168.200.2   udp dpt:20013
>>
>> Post:
>>
>> Chain POSTROUTING (policy ACCEPT 4799 packets, 1170K bytes)
>>  pkts bytes target     prot opt in  out              source            destination
>>   340 20392 MASQUERADE all  --  any !br-d662f1cf56fa 192.168.200.0/24  anywhere    <----------------------
>>   453 28712 MASQUERADE all  --  any !docker0         172.17.0.0/16     anywhere    <----------------------
>>     0     0 MASQUERADE tcp  --  any any              192.168.200.3     192.168.200.3   tcp dpt:8086
>>     0     0 MASQUERADE tcp  --  any any              192.168.200.5     192.168.200.5   tcp dpt:3000
>>     0     0 MASQUERADE udp  --  any any              192.168.200.9     192.168.200.9   udp dpt:50000
>>     0     0 MASQUERADE tcp  --  any any              192.168.200.11    192.168.200.11  tcp dpt:9092
>>     0     0 MASQUERADE udp  --  any any              192.168.200.4     192.168.200.4   udp dpt:50005
>>     0     0 MASQUERADE udp  --  any any              192.168.200.8     192.168.200.8   udp dpt:5600
>>     0     0 MASQUERADE tcp  --  any any              192.168.200.8     192.168.200.8   tcp dpt:bgp
>>     0     0 MASQUERADE udp  --  any any              192.168.200.2     192.168.200.2   udp dpt:20013
>>
>> Which OS are you using in the host?
>>
>> A bit of a moonshot: when the problem occurs, can you try manually (using iptables) removing the first two rules and setting them exactly as in the Pre scenario? Use
>>
>> iptables -t nat -I <rule_num> <rest of params>
>>
>> which allows you to add a rule at a specific position. I think the problem might be somewhere else though.
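To illustrate that last suggestion, a sketch of deleting the first rule and re-inserting it at a fixed position; the rule number and match options are placeholders that have to be taken from the live ruleset:

# Show the NAT POSTROUTING rules with their positions and counters.
iptables -t nat -L POSTROUTING -n -v --line-numbers

# Delete the rule currently at position 1 ...
iptables -t nat -D POSTROUTING 1

# ... and re-insert an equivalent MASQUERADE rule at position 1. Whether to
# keep the "! -o br-d662f1cf56fa" out-interface match depends on what the
# Pre ruleset actually contained.
iptables -t nat -I POSTROUTING 1 -s 192.168.200.0/24 ! -o br-d662f1cf56fa -j MASQUERADE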
>> marc
>>
>> > What definitely works is not to expose specific ports, but to configure your container in docker-compose to be attached directly to the host network. In that case, there will be no translation rules and no source NAT, and the container will be directly connected to all of the host's network interfaces.
>> > In such a case, be aware that Docker DNS will not work, so to export information from the pmacct container further to kafka you would need to send it to "localhost" (if the kafka container is running on the same host) and not to "kafka". This shouldn't be a big problem in your setup.
>> >
>> > Btw, I am using docker swarm and not docker-compose; although they both use docker-compose files with similar syntax, I don't think there is a difference in their behavior.
>> >
>> > Hope this helps
>> >
>> > Kind regards,
>> > Dusan
>> >
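For reference, a rough docker run equivalent of the host-network setup described above (in a compose file this corresponds to network_mode: host). The container name and bind-mount paths are taken from Alessandro's compose file and are assumptions about the layout, so treat this as a sketch rather than a drop-in replacement:

# Attach nfacctd directly to the host network stack: no published ports,
# no DNAT/MASQUERADE rules, so the routers' source addresses reach the
# collector unchanged.
docker run -d --name nfacct --network host \
  -v /etc/localtime:/etc/localtime \
  -v "$PWD/nfacct/etc:/etc/pmacct" \
  -v "$PWD/nfacct/lib:/var/lib/pmacct" \
  -v "$PWD/nfacct/log:/var/log/pmacct" \
  pmacct/nfacctd

# With host networking Docker's embedded DNS is gone, so kafka_broker_host
# in the nfacctd config would have to point at localhost (or the host's IP)
# instead of the "kafka" service name, as Dusan notes.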
>> > On Wed, Jun 9, 2021 at 3:29 PM Paolo Lucente <pa...@pmacct.net> wrote:
>> >>
>> >> Hi Alessandro,
>> >>
>> >> (thanks for the kind words, first and foremost)
>> >>
>> >> Indeed, the test that Marc proposes is very sound, i.e. check the actual packets coming in "on the wire" with tcpdump: do they really change the sender IP address?
>> >>
>> >> Let me also confirm that what is used to populate peer_ip_src is the sender IP address coming straight from the socket (Marc's question) and, contrary to sFlow, there is typically no other way to infer such info (Alessandro's question).
>> >>
>> >> Paolo
>> >>
>> >> On 9/6/21 14:51, Marc Sune wrote:
>> >> > Alessandro,
>> >> >
>> >> > inline
>> >> >
>> >> > Message from Alessandro Montano | FIBERTELECOM <a.mont...@fibertelecom.it> on Wed, 9 June 2021 at 10:12:
>> >> >>
>> >> >> Hi Paolo (and Marc),
>> >> >>
>> >> >> this is my first post here ... first of all THANKS FOR YOUR GREAT JOB :)
>> >> >>
>> >> >> I'm using the pmacct/nfacctd container from docker-hub (+kafka+telegraf+influxdb+grafana) and it's really a powerful tool.
>> >> >>
>> >> >> The senders are JUNIPER MX204 routers, using j-flow (extended netflow).
>> >> >>
>> >> >> NFACCTD VERSION:
>> >> >> NetFlow Accounting Daemon, nfacctd 1.7.6-git [20201226-0 (7ad9d1b)]
>> >> >> '--enable-mysql' '--enable-pgsql' '--enable-sqlite3' '--enable-kafka' '--enable-geoipv2' '--enable-jansson' '--enable-rabbitmq' '--enable-nflog' '--enable-ndpi' '--enable-zmq' '--enable-avro' '--enable-serdes' '--enable-redis' '--enable-gnutls' 'AVRO_CFLAGS=-I/usr/local/avro/include' 'AVRO_LIBS=-L/usr/local/avro/lib -lavro' '--enable-l2' '--enable-traffic-bins' '--enable-bgp-bins' '--enable-bmp-bins' '--enable-st-bins'
>> >> >>
>> >> >> SYSTEM:
>> >> >> Linux 76afde386f6f 5.4.0-73-generic #82-Ubuntu SMP Wed Apr 14 17:39:42 UTC 2021 x86_64 GNU/Linux
>> >> >>
>> >> >> CONFIG:
>> >> >> debug: false
>> >> >> daemonize: false
>> >> >> pidfile: /var/run/nfacctd.pid
>> >> >> logfile: /var/log/pmacct/nfacctd.log
>> >> >> nfacctd_renormalize: true
>> >> >> nfacctd_port: 20013
>> >> >> aggregate[k]: peer_src_ip, peer_dst_ip, in_iface, out_iface, vlan, sampling_direction, etype, src_as, dst_as, as_path, proto, src_net, src_mask, dst_net, dst_mask, flows
>> >> >> nfacctd_time_new: true
>> >> >> plugins: kafka[k]
>> >> >> kafka_output[k]: json
>> >> >> kafka_topic[k]: nfacct
>> >> >> kafka_broker_host[k]: kafka
>> >> >> kafka_broker_port[k]: 9092
>> >> >> kafka_refresh_time[k]: 60
>> >> >> kafka_history[k]: 1m
>> >> >> kafka_history_roundoff[k]: m
>> >> >> kafka_max_writers[k]: 1
>> >> >> kafka_markers[k]: true
>> >> >> networks_file_no_lpm: true
>> >> >> use_ip_next_hop: true
>> >> >>
>> >> >> DOCKER-COMPOSE:
>> >> >> #Docker version 20.10.2, build 20.10.2-0ubuntu1~20.04.2
>> >> >> #docker-compose version 1.29.2, build 5becea4c
>> >> >> version: "3.9"
>> >> >> services:
>> >> >>   nfacct:
>> >> >>     networks:
>> >> >>       - ingress
>> >> >>     image: pmacct/nfacctd
>> >> >>     restart: on-failure
>> >> >>     ports:
>> >> >>       - "20013:20013/udp"
>> >> >>     volumes:
>> >> >>       - /etc/localtime:/etc/localtime
>> >> >>       - ./nfacct/etc:/etc/pmacct
>> >> >>       - ./nfacct/lib:/var/lib/pmacct
>> >> >>       - ./nfacct/log:/var/log/pmacct
>> >> >> networks:
>> >> >>   ingress:
>> >> >>     name: ingress
>> >> >>     ipam:
>> >> >>       config:
>> >> >>         - subnet: 192.168.200.0/24
>> >> >>
>> >> >> My problem is the value of the field PEER_IP_SRC ... at start everything is correct, and it works well for a (long) while ... hours ... days ...
>> >> >> I have ten routers, so "peer_ip_src": "151.157.228.xxx", where xxx easily identifies the sender. Perfect.
>> >> >>
>> >> >> Suddenly ... "peer_ip_src": "192.168.200.1" for all records (and I lose the sender info!!!) ...
>> >> >>
>> >> >> It seems that docker-proxy decides to do NAT/masquerading and translates the source IP of the UDP stream.
>> >> >> The only way for me to get the correct behavior again is to stop/start the container.
>> >> >>
>> >> >> How can I fix it? Or is there an alternative way to obtain the same info (router IP) from inside the netflow stream, rather than from the UDP packet?
>> >> >
>> >> > Paolo is definitely the right person to answer how "peer_ip_src" is populated.
>> >> >
>> >> > However, there is something that I don't fully understand. To the best of my knowledge, even when binding ports, docker (actually the kernel, configured by docker) shouldn't masquerade traffic at all - if masquerading is truly what happens. And it certainly shouldn't happen "randomly" in the middle of the execution.
>> >> >
>> >> > My first thought would be that this is something related to pmacct itself, and that the records are incorrectly generated while the traffic is ok.
>> >> >
>> >> > I doubt the linux kernel iptables rules would randomly change the way traffic is manipulated, unless, of course, something else on that machine/server is reloading iptables and the resulting ruleset is _slightly different_ for the traffic flowing towards the docker container, effectively modifying the streams that go to pmacct (e.g. rule priority reordering). That _could_ explain why restarting the daemon suddenly works, as the order would be fixed.
>> >> >
>> >> > Some more info would be needed to rule out an iptables/docker issue:
>> >> >
>> >> > * Dump iptables -L and iptables -t nat -L before and after the issue and compare.
>> >> > * Use iptables -vL and iptables -t nat -vL to monitor counters, before and after the issue, especially in the NAT table.
>> >> > * Get inside the running container (https://github.com/pmacct/pmacct/blob/master/docs/DOCKER.md#opening-a-shell-on-a-running-container), install tcpdump, and write the pcap to a file, before and after the incident.
>> >> >
>> >> > Since these dumps might contain sensitive data, you can send them anonymized or in private.
>> >> >
>> >> > Hopefully with this info we will see whether it's an iptables issue or we have to look somewhere else.
>> >> >
>> >> > Regards
>> >> > marc
>> >> >
>> >> >> Thanks for your support.
>> >> >>
>> >> >> Cheers.
>> >> >>
>> >> >> --
>> >> >> AlexIT
>> >> >> --
>> >> >> docker-doctors mailing list
>> >> >> docker-doct...@pmacct.net
>> >> >> http://acaraje.pmacct.net/cgi-bin/mailman/listinfo/docker-doctors
>> >> >
>> >> > _______________________________________________
>> >> > pmacct-discussion mailing list
>> >> > http://www.pmacct.net/#mailinglists
>> >>
>> >> _______________________________________________
>> >> pmacct-discussion mailing list
>> >> http://www.pmacct.net/#mailinglists

_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists