Hi Matt,
I asked in another thread what s3cmd uses to compare files during sync when it
encounters a multipart ETag, and I believe you have already answered that here.
If a multipart ETag is encountered during sync, s3cmd falls back to date
comparison. It does not use the "Last-Modified" date stored by S3, though; it
uses the date values from x-amz-meta-s3cmd-attrs.
I have been using --no-preserve, so my objects don't have
x-amz-meta-s3cmd-attrs values to compare against. I felt I didn't need that
extra metadata on my objects in S3 (in retrospect, I would probably have been
better off keeping it).
What then happens on sync when a multipart ETag is encountered and there is no
x-amz-meta-s3cmd-attrs for date comparison?
Are we using size only at that point?
Mike
On Apr 13, 2014, at 10:52 PM, Matt Domsch <m...@domsch.com> wrote:
> The algorithm is in S3/FileLists.py compare_filelists().
>
> Check if one side has a file (by name) the other doesn't. If so, there's
> nothing to compare.
>
> Check that both files have the same size (as reported by stat() for local
> files, and in the remote directory listing).
>
> If checking MD5:
> calculate or get the MD5, compare the two values
>
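For anyone following along, the steps above can be sketched roughly as
follows. This is not the code in S3/FileLists.py; Entry and compare() are
names made up for illustration, and each side is assumed to be a plain dict
keyed by file name.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass
class Entry:
    """Hypothetical listing entry: size in bytes plus an optional MD5 hex digest."""
    size: int
    md5: Optional[str] = None


def compare(src: Dict[str, Entry], dst: Dict[str, Entry],
            check_md5: bool = True) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (only_in_src, only_in_dst, changed) as sets of names."""
    # Step 1: names present on one side only have nothing to compare against.
    only_src = set(src) - set(dst)
    only_dst = set(dst) - set(src)

    changed = set()
    for name in set(src) & set(dst):
        a, b = src[name], dst[name]
        # Step 2: a size mismatch is enough to call the file changed.
        if a.size != b.size:
            changed.add(name)
        # Step 3: only when MD5 checking is on, compare the digests too.
        elif check_md5 and a.md5 != b.md5:
            changed.add(name)
    return only_src, only_dst, changed
```

With check_md5=False this reduces to a name and size comparison, which is the
--no-check-md5 behaviour the man page describes.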
> Date isn't actually compared, because the XML returned by the S3 object
> listing doesn't contain the file's date, only the "Last-Modified" value,
> which is when the file was uploaded to S3.
>
> Dates (really, ctime, mtime, atime) as obtained from local files _are_
> stored in the x-amz-meta-s3cmd-attrs metadata value for an object when
> --preserve is used (the default). But getting this value back from S3
> requires doing a HEAD on the object itself, for every object, which is
> really expensive. So we don't do that unless we are comparing MD5s and the
> MD5 (really, ETag) value returned in the directory listing indicates the
> file was uploaded using multipart upload. In that case the ETag isn't the
> MD5 of the whole file (it is derived from the MD5s of the individual parts
> and carries a "-N" suffix). In general, we don't fetch the
> x-amz-meta-s3cmd-attrs value.
>
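A small sketch of how a multipart ETag can be recognised and reproduced
locally. It relies on the commonly observed S3 behaviour (not a documented
guarantee) that a multipart ETag is the MD5 of the concatenated binary MD5
digests of the parts, followed by "-" and the part count. The function names
are made up, and part_size has to match whatever chunk size was used for the
upload.

```python
import hashlib


def is_multipart_etag(etag: str) -> bool:
    """Multipart ETags carry a '-<part count>' suffix; plain MD5 ETags do not."""
    return "-" in etag.strip('"')


def multipart_etag(data: bytes, part_size: int) -> str:
    """Compute the multipart-style ETag for non-empty data split into parts.

    Assumption: every part except the last is exactly part_size bytes.
    """
    # MD5 each part separately, keeping the raw 16-byte digests.
    digests = [
        hashlib.md5(data[i:i + part_size]).digest()
        for i in range(0, len(data), part_size)
    ]
    # MD5 the concatenated digests, then append the number of parts.
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(digests)}"
```

Because the result depends on the part size, it cannot be compared against a
plain MD5 of the local file, which is why the listing ETag alone is not enough
for multipart objects.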
> Now, if we are syncing from remote to local, we get x-amz-meta-s3cmd-attrs
> value "for free" as a header when we do the GET to get the object, so we do
> use it to set the values back to what were originally stored there.
>
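To illustrate setting the values back after a GET, here is a minimal sketch.
It assumes the x-amz-meta-s3cmd-attrs header is a "/"-separated list of
key:value pairs (for example "uid:1000/gid:100/mode:33188/mtime:1397443932");
that layout, and the helper names parse_attrs() and restore_times(), are
assumptions for illustration rather than s3cmd's actual code.

```python
import os
from typing import Dict


def parse_attrs(header: str) -> Dict[str, str]:
    """Split an assumed 'key:value/key:value' header into a dict of strings."""
    attrs = {}
    for field in header.split("/"):
        key, sep, value = field.partition(":")
        if sep:  # skip fragments that have no 'key:value' shape
            attrs[key] = value
    return attrs


def restore_times(path: str, attrs: Dict[str, str]) -> None:
    """Apply the stored mtime (and atime, if present) to a downloaded file."""
    if "mtime" not in attrs:
        return  # e.g. the object was uploaded with --no-preserve
    mtime = int(attrs["mtime"])
    atime = int(attrs.get("atime", mtime))
    os.utime(path, (atime, mtime))
```

An object uploaded with --no-preserve has no such header, so in this sketch
the downloaded file simply keeps the timestamps it was written with.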
> So, the manpage is correct: date is not used in the comparison for syncing
> purposes. One could argue that the expensive HEAD call is still cheaper than
> calculating the local MD5 of a file, but we can mitigate the local expense
> with the --cache-file mechanism: we read the local file only once and then
> read its MD5 out of the cache until the file changes. So the HEAD isn't
> cheaper in general.
>
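The caching idea can be sketched like this. It is an in-memory toy, not
s3cmd's --cache-file format: the Md5Cache class and the choice of
(mtime, size) as the "has it changed?" key are assumptions for illustration.

```python
import hashlib
import os
from typing import Dict, Tuple


class Md5Cache:
    """Remember each file's MD5 until its mtime or size changes."""

    def __init__(self) -> None:
        # path -> (mtime in ns, size in bytes, md5 hex digest)
        self._entries: Dict[str, Tuple[int, int, str]] = {}
        self.reads = 0  # how many times a file was actually read and hashed

    def md5(self, path: str) -> str:
        st = os.stat(path)
        key = (st.st_mtime_ns, st.st_size)
        hit = self._entries.get(path)
        if hit is not None and hit[:2] == key:
            return hit[2]  # unchanged since last time: no file read needed

        # Cache miss: hash the file in 1 MiB chunks to bound memory use.
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        self.reads += 1
        digest = h.hexdigest()
        self._entries[path] = (key[0], key[1], digest)
        return digest
```

Calling md5() twice on an unchanged file reads it from disk only once; a real
cache would also be persisted between runs.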
> To detect a change to a file whose size hasn't changed but whose content
> has, we have to do the HEAD call, calculate the MD5 of the local file (using
> --cache-file to record it for posterity), and compare.
>
>
>
>
> On Sun, Apr 13, 2014 at 5:08 PM, WagnerOne <wag...@wagnerone.com> wrote:
> Hi,
>
> The man page states the following:
>
> --no-check-md5
> Do not check MD5 sums when comparing files for [sync]. Only size will be
> compared. May significantly speed up transfer but may also miss some changed
> files.
>
> When this says "only size will be compared", I take it to mean that, of size
> and MD5, only size will be compared?
>
> Are date (and name, of course) still used in addition to size if
> --no-check-md5 is passed?
>
> Thanks,
> Mike
>
> --
> wag...@wagnerone.com
> "I have no complaints, ever, about anything."-Steve McQueen
>
> _______________________________________________
> S3tools-general mailing list
> S3tools-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/s3tools-general
--
wag...@wagnerone.com
"I want to hear the man in the suit say that it's all wrong. I want to hear a
man with millions of dollars say that he hurts people."-Rollins