Hi Matt,

I posted in another thread asking what s3cmd uses for file comparison during 
sync when it encounters a multipart ETag, and I believe you have already 
answered it here.

If a multipart ETag is encountered during sync, s3cmd then falls back to date 
comparison, but not using the "Last-Modified" date stored by S3. Rather, it 
uses the date values from x-amz-meta-s3cmd-attrs.

I have been using --no-preserve, so I don't have the x-amz-meta-s3cmd-attrs 
values for comparison. I felt I didn't need that extra metadata on my objects 
in S3 (in retrospect, I'd likely have been better off having it).

What then happens on sync when a multipart ETag is encountered and there is no 
x-amz-meta-s3cmd-attrs for date comparison?

Are we using size only at that point?

Mike


On Apr 13, 2014, at 10:52 PM, Matt Domsch <m...@domsch.com> wrote:

> The algorithm is in S3/FileLists.py compare_filelists().
> 
> Check if one side has a file (by name) the other doesn't.  If so, there's 
> nothing to compare.
> 
> Check that both files have the same size (as reported by stat() for local 
> files, and in the remote directory listing).
> 
> If checking MD5:
>   calculate or get the MD5, compare the two values
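A minimal Python sketch of the three comparison steps above, assuming each listing is a dict mapping file name to a size and a hex MD5 that are already known for both sides (the multipart case, where the listing does not give the whole-file MD5, is deliberately left out). The names `Entry` and `compare_listings` are hypothetical and are not s3cmd's actual code in S3/FileLists.py; `check_md5=False` plays the role of --no-check-md5.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Entry:
    """Hypothetical per-file record (not s3cmd's real data structure)."""
    size: int
    md5: str  # hex digest; assumed already known for both sides


def compare_listings(local: Dict[str, Entry], remote: Dict[str, Entry],
                     check_md5: bool = True) -> Tuple[List[str], List[str], List[str]]:
    """Return (only_local, only_remote, changed) lists of file names."""
    # Step 1: a name present on only one side has nothing to compare against.
    only_local = sorted(set(local) - set(remote))
    only_remote = sorted(set(remote) - set(local))

    changed = []
    for name in sorted(set(local) & set(remote)):
        loc, rem = local[name], remote[name]
        # Step 2: a size mismatch means the file has changed.
        if loc.size != rem.size:
            changed.append(name)
        # Step 3: with MD5 checking on, equal sizes must also have equal digests.
        # With check_md5=False (like --no-check-md5), size alone decides.
        elif check_md5 and loc.md5 != rem.md5:
            changed.append(name)
    return only_local, only_remote, changed
```

With `check_md5=False`, a file whose content changed but whose size did not is reported as unchanged, which is the trade-off the --no-check-md5 man page entry warns about.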
> 
> Date isn't actually compared, because the XML object listing returned by S3 
> doesn't contain the file date, only the "Last-Modified" timestamp, which 
> records when the file was uploaded to S3.
> 
> The date (really ctime, mtime and atime) obtained from local files _is_ 
> stored in the x-amz-meta-s3cmd-attrs metadata value for an object when 
> --preserve is used (the default).  But getting this value back from S3 
> requires doing a HEAD on every object individually, which is really 
> expensive.  So we don't do that unless we are comparing MD5s and the MD5 
> (really, ETag) value returned in the directory listing indicates that the 
> file was uploaded using multipart upload.  In that case the ETag is not the 
> MD5 of the whole file; it is derived from the MD5s of the individual parts 
> (an MD5 of the concatenated part digests, followed by "-" and the part 
> count), so the listing does not, in general, give us the whole-file MD5.
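To illustrate why a multipart ETag can't be compared against a local MD5: S3 does not formally document the multipart ETag format, but it is widely observed to be the MD5 of the concatenated binary MD5 digests of each part, followed by "-" and the number of parts. The sketch below assumes that format and a fixed part size; `is_multipart_etag`, `multipart_etag`, `split_parts` and `part_size` are hypothetical names, and reproducing a real ETag this way only works with the same part size the uploader used.

```python
import hashlib
from typing import Iterable, List


def is_multipart_etag(etag: str) -> bool:
    """True if the ETag looks like '<32 hex chars>-<part count>'."""
    etag = etag.strip('"')  # listings often return the ETag wrapped in quotes
    head, sep, tail = etag.rpartition("-")
    return bool(sep) and tail.isdigit() and len(head) == 32


def split_parts(data: bytes, part_size: int) -> List[bytes]:
    """Split data into fixed-size parts (part_size is a hypothetical parameter)."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


def multipart_etag(parts: Iterable[bytes]) -> str:
    """Assumed (widely observed, not formally documented) multipart ETag format:
    MD5 over the concatenated binary MD5 digests of each part, plus '-N'."""
    digests = [hashlib.md5(p).digest() for p in parts]
    return hashlib.md5(b"".join(digests)).hexdigest() + "-" + str(len(digests))
```

Because the digest is taken over the part digests rather than the file bytes, the value changes with the part size even for identical content, which is why it can't stand in for a whole-file MD5.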
> 
> Now, if we are syncing from remote to local, we get the x-amz-meta-s3cmd-attrs 
> value "for free" as a header when we do the GET to fetch the object, so we do 
> use it to restore the values that were originally stored there.
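A sketch of restoring file times from that header on download, assuming the x-amz-meta-s3cmd-attrs value is a "/"-separated list of "key:value" pairs such as "atime:.../mtime:.../uid:..." (the exact format and key names are an assumption here, not taken from this thread). `parse_s3cmd_attrs` and `restore_times` are hypothetical helpers. Only atime and mtime are restored, since ordinary programs cannot set ctime directly.

```python
import os
from typing import Dict


def parse_s3cmd_attrs(header: str) -> Dict[str, str]:
    """Parse 'key:value/key:value' (ASSUMED format of x-amz-meta-s3cmd-attrs)."""
    attrs = {}
    for item in header.split("/"):
        if not item:
            continue
        key, _, value = item.partition(":")  # split on the first ':' only
        attrs[key] = value
    return attrs


def restore_times(path: str, header: str) -> None:
    """Set atime/mtime on a downloaded file from the stored attrs, if present.

    Missing keys fall back to the file's current times. ctime is not
    restored because user-space programs cannot set it directly.
    """
    attrs = parse_s3cmd_attrs(header)
    st = os.stat(path)
    atime = float(attrs.get("atime", st.st_atime))
    mtime = float(attrs.get("mtime", st.st_mtime))
    os.utime(path, (atime, mtime))
```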
> 
> So the manpage is correct: date is not used in the comparison for syncing 
> purposes.  One could argue that the expensive HEAD call is still cheaper than 
> calculating the local MD5 of a file, but we can mitigate the local expense 
> with the --cache-file mechanism, so that we read each local file only once 
> and then read its MD5 from the cache until the file changes.  So the HEAD 
> isn't cheaper in general.
> 
> To detect a change to a file whose size hasn't changed but whose content has, 
> we have to do the HEAD call, calculate the MD5 of the local file (using 
> --cache-file to record it for posterity), and compare the two.
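A minimal sketch of the MD5-caching idea behind --cache-file: hash each local file once, and reuse the stored digest until the file's identity or metadata changes. The class name `Md5Cache`, the JSON file format and the cache key (device, inode, mtime in nanoseconds, size) are assumptions for illustration and are not the on-disk format s3cmd actually uses.

```python
import hashlib
import json
import os
from typing import Dict


class Md5Cache:
    """Hypothetical on-disk MD5 cache; NOT s3cmd's real --cache-file format."""

    def __init__(self, path: str):
        self.path = path
        self.entries: Dict[str, str] = {}
        if os.path.exists(path):
            with open(path) as f:
                self.entries = json.load(f)

    @staticmethod
    def _key(st: os.stat_result) -> str:
        # Any change to device, inode, mtime or size yields a new key,
        # so the file is re-hashed instead of trusting a stale digest.
        return f"{st.st_dev}:{st.st_ino}:{st.st_mtime_ns}:{st.st_size}"

    def md5(self, filename: str) -> str:
        key = self._key(os.stat(filename))
        if key not in self.entries:
            h = hashlib.md5()
            with open(filename, "rb") as f:
                # Hash in 1 MiB chunks so large files aren't read into memory at once.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            self.entries[key] = h.hexdigest()
        return self.entries[key]

    def save(self) -> None:
        with open(self.path, "w") as f:
            json.dump(self.entries, f)
```

Keying on metadata rather than content is what makes the cache cheap, but it also means a rewrite that keeps the same size and mtime would go unnoticed; this sketch also never prunes stale entries.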
> 
> 
> 
> 
> On Sun, Apr 13, 2014 at 5:08 PM, WagnerOne <wag...@wagnerone.com> wrote:
> Hi,
> 
> The man page states the following:
> 
> --no-check-md5
>    Do not check MD5 sums when comparing files for [sync].  Only size will be 
> compared. May significantly speed up transfer but may also miss some changed 
> files.
> 
> When this says "only size will be compared", I'm taking it to mean that, of 
> size and MD5, only size will be compared?
> 
> Date (and name, of course) is still used in addition to size if 
> --no-check-md5 is passed?
> 
> Thanks,
> Mike
> 
> --
> wag...@wagnerone.com
> "I have no complaints, ever, about anything." -Steve McQueen
> 
> 
> 
> _______________________________________________
> S3tools-general mailing list
> S3tools-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/s3tools-general

-- 
wag...@wagnerone.com
"I want to hear the man in the suit say that it's all wrong. I want to hear a 
man with millions of dollars say that he hurts people."-Rollins

