That is an interesting question because you are right: they are supposed to be immutable. Dave, is something happening in an 18-hour window, as he describes?
--
Kevin A. McGrail
Asst. Treasurer & VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171

On Tue, Mar 20, 2018 at 2:04 PM, Dave Warren <d...@thedave.ca> wrote:

> When I first set up my mirror, I was told "Because the items are release
> artifacts, they are never altered or removed, just added.", so I configured
> my caching around this design. This does not seem to be the case in
> general, it wasn't just a one-off problem on 1827131.
>
> Looking at the original 1827131 issue, I was suspicious: rsync ran over
> 200 times between when I first got the file, and when rsync later updated
> the file, so it seemed unlikely that an incomplete file somehow slipped
> through. rsync doesn't normally allow this, but even if it did, it should
> have been fixed within 5 minutes, not 18 hours later.
>
> From logs, I can tell that 1827131 and 1827165 were each written, then ~18
> hours later, updated. With 1827165 I was able to capture the original files
> including the .sha1, which rules out any sort of incomplete copy as the
> .sha1 wouldn't match if there was any sort of error or truncation during
> the copy process.
>
> tl;dr:
>
> My concern is this: Is it expected behaviour that the files are first
> published, then later an updated version of the same update number is
> published?
>
> And second: Given that the scores are modified between these two versions,
> what is the impact on SpamAssassin users who obtain the first vs the
> second? There are several score differences, some significant.
>
>
> On 2018-03-20 11:27, Kevin A. McGrail wrote:
>
>> Not sure what you mean. Original ticket was about 1827131.tar.gz
>>
>> --
>> Kevin A. McGrail
>> Asst. Treasurer & VP Fundraising, Apache Software Foundation
>> Chair Emeritus Apache SpamAssassin Project
>> https://www.linkedin.com/in/kmcgrail - 703.798.0171
>>
>> On Tue, Mar 20, 2018 at 1:10 PM, Dave Warren <d...@thedave.ca> wrote:
>>
>>> Interestingly the file really is changing and wasn't just a poorly timed
>>> copy, check this out:
>>>
>>> Date: Mon, 19 Mar 2018 02:38:08 -0600 (MDT), the files were created:
>>>
>>> .d..t...... ./
>>> >f+++++++++ 1827165.tar.gz
>>> >f+++++++++ 1827165.tar.gz.asc
>>> >f+++++++++ 1827165.tar.gz.sha1
>>>
>>> And the .sha1 hash validates (which obviously wouldn't happen if I had an
>>> incomplete copy).
>>>
>>> # sha1sum 1827165.tar.gz;cat 1827165.tar.gz.sha1
>>> a3abb2aad004a3401acfad9167e77b0ca31ef9c4  1827165.tar.gz
>>> a3abb2aad004a3401acfad9167e77b0ca31ef9c4  /usr/local/spamassassin/automc/tmp/stage/3.4.2/update.tgz
>>>
>>>
>>> Date: Mon, 19 Mar 2018 20:48:17 -0600 (MDT), the files were updated:
>>>
>>> >f.st...... 1827165.tar.gz
>>> >f..t...... 1827165.tar.gz.asc
>>> >f.st...... 1827165.tar.gz.sha1
>>>
>>> And once again, the .sha1 hash validates the new file:
>>>
>>> # sha1sum 1827165.tar.gz;cat 1827165.tar.gz.sha1
>>> ea74b1eb682bbb25c2028ffe01a8e20bd1943885  1827165.tar.gz
>>> ea74b1eb682bbb25c2028ffe01a8e20bd1943885  /usr/local/spamassassin/automc/tmp/mkupdate-with-scores/1827165.tar.gz
>>>
>>>
>>> I don't know if any of this is actually a problem, but it's not what I
>>> expected to see.
>>>
>>> If anyone is curious, I placed copies of the files named -first and
>>> -second as appropriate, including uncompressed copies of the .tar.gz files.
>>> The files are here: https://mirrors.razx.cloud/sa-update-backup/1827165/
>>>
>>> This is curiosity more than anything else at this stage, I will leave my
>>> caching to be less aggressive to allow files to be updated.
>>>
>>>
>>> On 2018-03-19 13:36, Kevin A. McGrail wrote:
>>>
>>>> I would guess you caught it mid copy and it arose because of the caching.
>>>> Just a guess but glad we know what's going on.
>>>>
>>>> On Mon, Mar 19, 2018, 15:09 Dave Warren <d...@thedave.ca> wrote:
>>>>
>>>>> Howdy. I'm on this list.
>>>>>
>>>>> Okay, so this is a bit odd, it looks like the file 1827131.tar.gz was
>>>>> actually modified by rsync many hours after the initial write:
>>>>>
>>>>> Date: Sun, 18 Mar 2018 02:36:30 -0600 (MDT)
>>>>> .d..t...... ./
>>>>> >f+++++++++ 1827131.tar.gz
>>>>> >f+++++++++ 1827131.tar.gz.asc
>>>>> >f+++++++++ 1827131.tar.gz.sha1
>>>>>
>>>>> My cron runs every 5 minutes (with up to 220 seconds variability). I see
>>>>> "MIRROR.CHECK" being updated at 03:18, 04:21, 05:23, 06:18, etc)
>>>>> confirming rsync was running.
>>>>>
>>>>>
>>>>> 1827131.tar.gz is modified just over 18 hours later:
>>>>>
>>>>> Date: Sun, 18 Mar 2018 20:47:39 -0600 (MDT)
>>>>> >f.st...... 1827131.tar.gz
>>>>> >f..t...... 1827131.tar.gz.asc
>>>>> >f.st...... 1827131.tar.gz.sha1
>>>>>
>>>>> I was under the impression that the *.tar.gz* files were immutable, but
>>>>> looking through my rsync logs, this is definitely not the case, I see
>>>>> the files being created and later updated nearly daily (although not
>>>>> every day, March 8th I see 1826189.tar.gz was created and never
>>>>> modified), the only reference to it is here:
>>>>>
>>>>> 8 Mar 2018 19:46:40 -0700 (MST)
>>>>> .d..t...... ./
>>>>> >f+++++++++ 1826189.tar.gz
>>>>> >f+++++++++ 1826189.tar.gz.asc
>>>>> >f+++++++++ 1826189.tar.gz.sha1
>>>>>
>>>>>
>>>>> Due to my belief in the immutable nature of these files, the files were
>>>>> being cached without verifying whether the on-disk source had changed.
>>>>> For the moment, I will cache less aggressively which should resolve the
>>>>> problem.
>>>>>
>>>>>
>>>>> Can anyone confirm why the files are being modified? Is this
>>>>> intentional/expected?
>>>>>
>>>>>
>>>>> On 2018-03-19 07:52, Dave Jones wrote:
>>>>>
>>>>>> I found an email address in the SA archives from 2013. Hopefully this
>>>>>> makes it to him.
>>>>>>
>>>>>> On 03/19/2018 08:33 AM, Dave Jones wrote:
>>>>>>
>>>>>>> Is Dave Warren on this list? If no response, does anyone have an old
>>>>>>> email with his contact info so I can ask him how his rsync's are
>>>>>>> setup?
>>>>>>>
>>>>>>> Dave
>>>>>>>
>>>>>>> On 03/19/2018 08:26 AM, bugzilla-dae...@bugzilla.spamassassin.org
>>>>>>> wrote:
>>>>>>>
>>>>>>>> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7566
>>>>>>>>
>>>>>>>> Dave Jones <da...@apache.org> changed:
>>>>>>>>
>>>>>>>>            What    |Removed                     |Added
>>>>>>>> ----------------------------------------------------------------------------
>>>>>>>>                  CC|                            |da...@apache.org
>>>>>>>>
>>>>>>>> --- Comment #2 from Dave Jones <da...@apache.org> ---
>>>>>>>> I guess I can add logic to our hourly script to check sha1 values on
>>>>>>>> the latest tar.gz to catch rsync'ing issues.
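
As a rough illustration of the check Dave Jones mentions in comment #2, here is a minimal Python sketch that compares a tarball's SHA-1 against the digest stored in its sibling .sha1 file. It assumes the .sha1 file holds the hex digest as its first whitespace-separated field, as in the sha1sum output quoted in this thread. The function names (sha1_of, expected_sha1, tarball_matches) are hypothetical and are not taken from any SpamAssassin script.

```python
import hashlib
from pathlib import Path


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_sha1(sha1_path: Path) -> str:
    """Read the digest from a .sha1 file.

    Assumption: the digest is the first whitespace-separated field,
    followed by the path it was computed from.
    """
    return sha1_path.read_text().split()[0].lower()


def tarball_matches(tarball: Path) -> bool:
    """True if the tarball's SHA-1 equals the digest in <tarball>.sha1."""
    sha1_path = tarball.with_name(tarball.name + ".sha1")
    return sha1_of(tarball) == expected_sha1(sha1_path)
```

A check along these lines would catch a truncated or corrupted copy. It would not, on its own, flag the situation reported here, where both the first and the second version of 1827165.tar.gz validated against their own .sha1 files.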