Hi,

On 1/8/19 7:37 PM, Alexandre DERUMIER wrote:
> I'm able to reproduce with:
> ---------------------------
> on 1 host:
>
> cluster.fw:
> [OPTIONS]
>
> enable: 1
> policy_in: ACCEPT
>
>
> #!/usr/bin/perl
>
> use IO::File;
> use PVE::Firewall;
> use Data::Dumper;
> use Time::HiRes qw ( time alarm sleep usleep );
>
> while(1){
>
>     $filename = "/etc/pve/firewall/cluster.fw";
>
>     if (my $fh = IO::File->new($filename, O_RDONLY)) {
>
>         $cluster_conf = PVE::Firewall::parse_clusterfw_config($filename, $fh, $verbose);
>         my $cluster_options = $cluster_conf->{options};
>
>         if (!$cluster_options->{enable}) {
>             print Dumper($cluster_options);
>             die "error\n";
>         }
>
>     }
>     usleep(100);
> };
>
>
> the script is running fine.
>
>
> on another host, edit the file (simple open/write),
> then the script on first host, return
>
> $VAR1 = {};
> error

that is expected, AFAICT. A modify operation shouldn't be:

* read FILE -> modify -> write FILE

but rather:

* read FILE -> modify -> write FILE.TMP -> move FILE.TMP to FILE

if readers should always see valid content. Otherwise yes, you may have a
small time window where the file is truncated.

But file_set_contents, which save_clusterfw_conf uses, does this already[0],
so maybe this is the "high-level fuse rename isn't atomic" bug again...
May need to take a closer look tomorrow.

[0]: https://git.proxmox.com/?p=pve-common.git;a=blob;f=src/PVE/Tools.pm;h=accf6539da94d2b5d5b6f4539310fe5c4d526c7e;hb=HEAD#l213

>
> ----- Original Message -----
> From: "aderumier" <aderum...@odiso.com>
> To: "pve-devel" <pve-devel@pve.proxmox.com>
> Sent: Tuesday, 8 January 2019 19:15:06
> Subject: [pve-devel] firewall : possible bug/race when cluster.fw is replicated
> and rules are updated ?
>
> Hi,
> I'm currently debugging a possible firewalling problem.
> I'm running some cephfs clients in VMs, firewalled by proxmox.
> cephfs clients are really sensitive to network problems, mainly
> packet loss or dropped packets.
>
> I'm really not sure, but I currently have puppet updating my cluster.fw at
> regular intervals,
> and sometimes all the VMs on a specific host (or multiple hosts) have, at
> the same time, a small disconnect (maybe a few seconds).
>
>
> I would like to know if cluster.fw replication is atomic in /etc/pve/ ?
> or if there is any chance that, during file replication, the firewall tries to
> read the file
> and it could be empty ?
>
>
> I just wonder (I'm really really not sure) if I could trigger this:
>
>
> sub update {
>     my $code = sub {
>
>         my $cluster_conf = load_clusterfw_conf();
>         my $cluster_options = $cluster_conf->{options};
>
>         if (!$cluster_options->{enable}) {
>             PVE::Firewall::remove_pvefw_chains();
>             return;
>         }
>
>
> cluster.conf not readable/absent/.... , and remove_pvefw_chains called.
> then after some seconds, rules are applied again.
>
>
> I'm going to add some logging to try to reproduce it. (BTW, it would be great to
> log rule changes; maybe an audit log with a diff)
> _______________________________________________
> pve-devel mailing list
> pve-devel@pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
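
For reference, the "write FILE.TMP -> move FILE.TMP to FILE" pattern from
the reply above can be sketched roughly like this. It is only a minimal
illustration: write_file_atomically is a hypothetical helper name, not the
actual file_set_contents code from PVE::Tools, and it runs against a scratch
directory rather than /etc/pve. It also assumes a filesystem where rename()
replaces the target atomically; whether the FUSE-backed /etc/pve gives
readers the same guarantee is exactly the open question here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper (NOT the PVE::Tools implementation): write the new
# content to a temporary file next to the target, then rename it into place,
# so a reader sees either the old or the new file and never a truncated one.
sub write_file_atomically {
    my ($filename, $data) = @_;

    # Same directory as the target, so rename() stays on one filesystem.
    my $tmpname = "$filename.tmp.$$";

    open(my $fh, '>', $tmpname) or die "unable to open '$tmpname' - $!\n";
    print {$fh} $data or die "write to '$tmpname' failed - $!\n";
    close($fh) or die "closing '$tmpname' failed - $!\n";

    if (!rename($tmpname, $filename)) {
        my $err = $!;
        unlink $tmpname;    # do not leave the temporary file behind
        die "rename '$tmpname' to '$filename' failed - $err\n";
    }
}

# Usage on a scratch directory (deliberately not /etc/pve).
my $dir = tempdir(CLEANUP => 1);
my $file = "$dir/cluster.fw";

write_file_atomically($file, "[OPTIONS]\n\nenable: 1\npolicy_in: ACCEPT\n");
write_file_atomically($file, "[OPTIONS]\n\nenable: 1\npolicy_in: DROP\n");

# Read the file back to show that the second write replaced the first.
open(my $in, '<', $file) or die "unable to open '$file' - $!\n";
my $content = do { local $/; <$in> };
close($in);
print $content;
```

A plain open-for-write on FILE truncates it first, which is the window the
reproducer script hits; with the rename variant that window should only
exist if the underlying filesystem does not make the rename atomic for
readers.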