Dusan, Alessandro,

Let me answer Dusan first.
On Wed, 9 Jun 2021 at 18:08, Dusan Pajin <[email protected]> wrote:
>
> Hi Alessandro,
>
> I would say that this is a "known" issue or behavior in Docker, experienced
> by everyone who has ever wanted to receive a syslog, NetFlow, telemetry or
> any other similar UDP stream from network devices. When you expose ports in
> your docker-compose file, Docker will create the iptables rules to steer the
> traffic to your container in Docker's bridge network, but unfortunately it
> also translates the source IP address of the packets. I am not sure what the
> reasoning behind such behavior is. If you search for solutions to this
> issue, you will find some proposals, but none of them worked in my case.

That is not my understanding. I've also double-checked with a DevOps Docker guru in my organization. In the default Docker network mode, masquerading only happens for egress traffic, not ingress.

I actually tried it locally by running an httpd container (apache2) and redirecting port 8080 on the host to port 80 on the container. The container is on the Docker range, my laptop's LAN address is 192.168.1.36, and .33 is another client in my LAN.
root@d64c65384e87:/usr/local/apache2# tcpdump -l -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:21:49.546067 IP 192.168.1.33.46595 > 172.17.0.3.80: Flags [F.], seq 2777556344, ack 4139714538, win 172, options [nop,nop,TS val 21290101 ecr 3311681356], length 0
17:21:49.546379 IP 192.168.1.33.46591 > 172.17.0.3.80: Flags [F.], seq 3001175791, ack 61192428, win 172, options [nop,nop,TS val 21290101 ecr 3311686360], length 0
17:21:49.546402 IP 172.17.0.3.80 > 192.168.1.33.46591: Flags [.], ack 1, win 236, options [nop,nop,TS val 3311689311 ecr 21290101], length 0
17:21:49.546845 IP 172.17.0.3.80 > 192.168.1.33.46595: Flags [F.], seq 1, ack 1, win 227, options [nop,nop,TS val 3311689311 ecr 21290101], length 0
17:21:49.550993 IP 192.168.1.33.46595 > 172.17.0.3.80: Flags [.], ack 2, win 172, options [nop,nop,TS val 21290110 ecr 3311689311], length 0

That works as expected, showing the real 192.168.1.33 address.

Mind that there is a lot of confusion around this, because firewall services in the host's OS can interfere with the rules set by the Docker daemon itself:

https://stackoverflow.com/a/47913950/9321563

Alessandro, I need to analyse your rules in detail, but what is clear is that "something" is modifying them (see the first two rules)...
Whether these two lines in particular are causing the issue, I am not sure:

Pre:

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
MASQUERADE  all  --  192.168.200.0/24     anywhere
MASQUERADE  all  --  172.17.0.0/16        anywhere
MASQUERADE  tcp  --  192.168.200.3        192.168.200.3        tcp dpt:8086
MASQUERADE  tcp  --  192.168.200.5        192.168.200.5        tcp dpt:3000
MASQUERADE  udp  --  192.168.200.9        192.168.200.9        udp dpt:50000
MASQUERADE  tcp  --  192.168.200.11       192.168.200.11       tcp dpt:9092
MASQUERADE  udp  --  192.168.200.4        192.168.200.4        udp dpt:50005
MASQUERADE  udp  --  192.168.200.8        192.168.200.8        udp dpt:5600
MASQUERADE  tcp  --  192.168.200.8        192.168.200.8        tcp dpt:bgp
MASQUERADE  udp  --  192.168.200.2        192.168.200.2        udp dpt:20013

Post:

Chain POSTROUTING (policy ACCEPT 4799 packets, 1170K bytes)
 pkts bytes target     prot opt in  out              source               destination
  340 20392 MASQUERADE all  --  any !br-d662f1cf56fa  192.168.200.0/24     anywhere    <----------------------
  453 28712 MASQUERADE all  --  any !docker0          172.17.0.0/16        anywhere    <----------------------
    0     0 MASQUERADE tcp  --  any any               192.168.200.3        192.168.200.3   tcp dpt:8086
    0     0 MASQUERADE tcp  --  any any               192.168.200.5        192.168.200.5   tcp dpt:3000
    0     0 MASQUERADE udp  --  any any               192.168.200.9        192.168.200.9   udp dpt:50000
    0     0 MASQUERADE tcp  --  any any               192.168.200.11       192.168.200.11  tcp dpt:9092
    0     0 MASQUERADE udp  --  any any               192.168.200.4        192.168.200.4   udp dpt:50005
    0     0 MASQUERADE udp  --  any any               192.168.200.8        192.168.200.8   udp dpt:5600
    0     0 MASQUERADE tcp  --  any any               192.168.200.8        192.168.200.8   tcp dpt:bgp
    0     0 MASQUERADE udp  --  any any               192.168.200.2        192.168.200.2   udp dpt:20013

Which OS are you using on the host?

A bit of a moonshot: when the problem occurs, can you try manually (using iptables) to remove the first two rules and set them exactly as in the PRE scenario? Use:

iptables -t nat -I <rule_num> <rest of params>

which allows you to add a rule at a specific position. I think the problem might be somewhere else, though.
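A minimal sketch of that manual fix, as a dry-run that only prints the commands (rule positions and subnets are taken from the listing above; verify them against your own `iptables -t nat -vL --line-numbers` output first). One caveat: the PRE listing looks like plain `iptables -t nat -L`, which omits the in/out interface columns that `-vL` shows, so make sure both snapshots were taken with the same flags before concluding the rules actually changed.

```shell
#!/bin/sh
# Dry-run: print the commands that would delete the two interface-qualified
# MASQUERADE rules and re-insert them at the top without the '! -o <bridge>'
# match, as in the PRE listing. Swap 'echo' for 'sudo' to actually apply.
run() { echo "$@"; }

run iptables -t nat -D POSTROUTING 1
run iptables -t nat -D POSTROUTING 1   # former rule 2 shifts up after the first delete
run iptables -t nat -I POSTROUTING 1 -s 192.168.200.0/24 -j MASQUERADE
run iptables -t nat -I POSTROUTING 2 -s 172.17.0.0/16 -j MASQUERADE
```

Note that deleting by position twice with the same index is deliberate: after the first `-D POSTROUTING 1`, the old rule 2 becomes rule 1.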
marc

> What definitely works is not to expose specific ports, but to configure your
> container in docker-compose to be attached directly to the host network. In
> that case, there will be no translation rules and no source NAT, and the
> container will be directly connected to all of the host's network interfaces.
> In that case, be aware that Docker DNS will not work, so to export
> information from the pmacct container further to Kafka, you would need to
> send it to "localhost" (if the Kafka container is running on the same host)
> and not to "kafka". This shouldn't be a big problem in your setup.
>
> Btw, I am using Docker Swarm and not docker-compose; they both use
> docker-compose files with similar syntax, but I don't think there is a
> difference in their behavior.
>
> Hope this helps.
>
> Kind regards,
> Dusan
>
> On Wed, Jun 9, 2021 at 3:29 PM Paolo Lucente <[email protected]> wrote:
>>
>> Hi Alessandro,
>>
>> (thanks for the kind words, first and foremost)
>>
>> Indeed, the test that Marc proposes is very sound, i.e. check the actual
>> packets coming in "on the wire" with tcpdump: do they really change
>> sender IP address?
>>
>> Let me also confirm that what is used to populate peer_ip_src is the
>> sender IP address coming straight from the socket (Marc's question) and,
>> contrary to sFlow, there is typically no other way to infer such info
>> (Alessandro's question).
>>
>> Paolo
>>
>> On 9/6/21 14:51, Marc Sune wrote:
>> > Alessandro,
>> >
>> > inline
>> >
>> > On Wed, 9 Jun 2021 at 10:12, Alessandro Montano | FIBERTELECOM
>> > <[email protected]> wrote:
>> >>
>> >> Hi Paolo (and Marc),
>> >>
>> >> this is my first post here ...
>> >> first of all THANKS FOR YOUR GREAT JOB :)
>> >>
>> >> I'm using the pmacct/nfacctd container from Docker Hub
>> >> (+kafka+telegraf+influxdb+grafana) and it's really a powerful tool.
>> >>
>> >> The senders are JUNIPER MX204 routers, using J-Flow (extended NetFlow).
>> >>
>> >> NFACCTD VERSION:
>> >> NetFlow Accounting Daemon, nfacctd 1.7.6-git [20201226-0 (7ad9d1b)]
>> >> '--enable-mysql' '--enable-pgsql' '--enable-sqlite3' '--enable-kafka'
>> >> '--enable-geoipv2' '--enable-jansson' '--enable-rabbitmq'
>> >> '--enable-nflog' '--enable-ndpi' '--enable-zmq' '--enable-avro'
>> >> '--enable-serdes' '--enable-redis' '--enable-gnutls'
>> >> 'AVRO_CFLAGS=-I/usr/local/avro/include' 'AVRO_LIBS=-L/usr/local/avro/lib
>> >> -lavro' '--enable-l2' '--enable-traffic-bins' '--enable-bgp-bins'
>> >> '--enable-bmp-bins' '--enable-st-bins'
>> >>
>> >> SYSTEM:
>> >> Linux 76afde386f6f 5.4.0-73-generic #82-Ubuntu SMP Wed Apr 14 17:39:42
>> >> UTC 2021 x86_64 GNU/Linux
>> >>
>> >> CONFIG:
>> >> debug: false
>> >> daemonize: false
>> >> pidfile: /var/run/nfacctd.pid
>> >> logfile: /var/log/pmacct/nfacctd.log
>> >> nfacctd_renormalize: true
>> >> nfacctd_port: 20013
>> >> aggregate[k]: peer_src_ip, peer_dst_ip, in_iface, out_iface, vlan,
>> >> sampling_direction, etype, src_as, dst_as, as_path, proto, src_net,
>> >> src_mask, dst_net, dst_mask, flows
>> >> nfacctd_time_new: true
>> >> plugins: kafka[k]
>> >> kafka_output[k]: json
>> >> kafka_topic[k]: nfacct
>> >> kafka_broker_host[k]: kafka
>> >> kafka_broker_port[k]: 9092
>> >> kafka_refresh_time[k]: 60
>> >> kafka_history[k]: 1m
>> >> kafka_history_roundoff[k]: m
>> >> kafka_max_writers[k]: 1
>> >> kafka_markers[k]: true
>> >> networks_file_no_lpm: true
>> >> use_ip_next_hop: true
>> >>
>> >> DOCKER-COMPOSE:
>> >> #Docker version 20.10.2, build 20.10.2-0ubuntu1~20.04.2
>> >> #docker-compose version 1.29.2, build 5becea4c
>> >> version: "3.9"
>> >> services:
>> >>   nfacct:
>> >>     networks:
>> >>       - ingress
>> >>     image: pmacct/nfacctd
>> >>     restart: on-failure
>> >>     ports:
>> >>       - "20013:20013/udp"
>> >>     volumes:
>> >>       - /etc/localtime:/etc/localtime
>> >>       - ./nfacct/etc:/etc/pmacct
>> >>       - ./nfacct/lib:/var/lib/pmacct
>> >>       - ./nfacct/log:/var/log/pmacct
>> >> networks:
>> >>   ingress:
>> >>     name: ingress
>> >>     ipam:
>> >>       config:
>> >>         - subnet: 192.168.200.0/24
>> >>
>> >> My problem is the value of the field PEER_IP_SRC ... at the start
>> >> everything is correct, and it works well for a (long) while ... hours
>> >> ... days ... I have ten routers, so in "peer_ip_src": "151.157.228.xxx"
>> >> the xxx can easily identify the sender. Perfect.
>> >>
>> >> Suddenly ... "peer_ip_src": "192.168.200.1" for all records (and I lose
>> >> the sender info!!!) ...
>> >>
>> >> It seems that docker-proxy decides to do NAT/masquerading and
>> >> translates the source IP of the UDP stream.
>> >> The only way for me to get the correct behavior again is to stop/start
>> >> the container.
>> >>
>> >> How can I fix it? Or is there an alternative way to obtain the same
>> >> info (the router IP) from inside the NetFlow stream, and not from the
>> >> UDP packet?
>> >
>> > Paolo is definitely the right person to answer how "peer_ip_src" is
>> > populated.
>> >
>> > However, there is something that I don't fully understand. To the best
>> > of my knowledge, even when binding ports, Docker (actually the kernel,
>> > configured by Docker) shouldn't masquerade traffic at all - if
>> > masquerade is truly what happens. And certainly that wouldn't happen
>> > "randomly" in the middle of the execution.
>> >
>> > My first thought would be that this is something related to pmacct
>> > itself, and that records are incorrectly generated but traffic is OK.
>> >
>> > I doubt the Linux kernel iptables rules would randomly change the way
>> > traffic is manipulated, unless of course something else on that
>> > machine/server is reloading iptables, and the resulting ruleset is
>> > _slightly different_ for the traffic flowing towards the Docker
>> > container, effectively modifying the streams that go to pmacct (e.g.
>> > rule priority reordering). That _could_ explain why restarting the
>> > daemon suddenly works, as the order would be fixed.
>> >
>> > Some more info would be needed to discard an iptables/docker issue:
>> >
>> > * Dump the output of iptables -L and iptables -t nat -L before and
>> > after the issue and compare.
>> > * Use iptables -vL and iptables -t nat -vL to monitor counters, before
>> > and after the issue, especially in the NAT table.
>> > * Get inside the running container
>> > (https://github.com/pmacct/pmacct/blob/master/docs/DOCKER.md#opening-a-shell-on-a-running-container),
>> > install tcpdump, and write the pcap to a file, before and after the
>> > incident.
>> >
>> > Since these dumps might contain sensitive data, you can send them
>> > anonymized or in private.
>> >
>> > Hopefully with this info we will see if it's an iptables issue or we
>> > have to look somewhere else.
>> >
>> > Regards
>> > marc
>> >
>> >> Thanks for your support.
>> >>
>> >> Cheers.
>> >>
>> >> --
>> >> AlexIT
>> >> --
>> >> docker-doctors mailing list
>> >> [email protected]
>> >> http://acaraje.pmacct.net/cgi-bin/mailman/listinfo/docker-doctors

_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists
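P.S. For reference, Dusan's host-network suggestion, as a minimal docker-compose sketch based on Alessandro's file above. This is an untested illustration of the idea, not a drop-in config: `network_mode: host` replaces both the `ports:` mapping and the custom `ingress` network, and it is not supported in swarm mode.

```yaml
# Sketch: attach nfacctd directly to the host network instead of publishing
# ports. No DNAT/MASQUERADE rules are created, so the routers' source IPs
# reach the daemon unmodified. Docker DNS no longer applies, so
# kafka_broker_host would need to point at localhost (or the host's IP)
# rather than "kafka".
version: "3.9"
services:
  nfacct:
    image: pmacct/nfacctd
    restart: on-failure
    network_mode: host
    volumes:
      - /etc/localtime:/etc/localtime
      - ./nfacct/etc:/etc/pmacct
      - ./nfacct/lib:/var/lib/pmacct
      - ./nfacct/log:/var/log/pmacct
```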
