Thanks for the comments, Will. I had a look at Duplicity and, as you say, it
looks like a decent backup tool, but it isn't what I'm looking for. I will
have a better look at Python, but at present my inclination is to stick to
bash/sed/awk.

I wonder if Matt could answer the question regarding the MD5s returned by a
list operation, i.e. does Amazon calculate them from its own copy of the
file, or does it expect to be given the MD5 by the upload software? I seem to
remember reading somewhere that Amazon only uses the MD5 to verify the
transfer, so that if a file is uploaded in multiple parts it only
calculates its own MD5 for each part. Maybe that's outdated information.
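
For reference, here is a minimal Python sketch of how the ETag of a
multipart upload is commonly reported to be formed: the MD5 of the
concatenated binary MD5 digests of the parts, followed by a dash and the
number of parts. That rule, the function name multipart_etag and the
part_size parameter are assumptions for illustration, not something
confirmed in this thread or taken from s3cmd.

```python
import hashlib


def multipart_etag(data: bytes, part_size: int) -> str:
    """Return the ETag-style string this sketch ASSUMES S3 would report.

    Assumption: data that fits in one part is sent as a single PUT, so the
    result is the plain MD5 of the data. Otherwise the result is the MD5 of
    the concatenated per-part MD5 digests, plus "-<number of parts>".
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")

    # Split the data into consecutive parts of at most part_size bytes.
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]

    if len(parts) <= 1:
        # Single upload: plain MD5 of the whole content.
        return hashlib.md5(data).hexdigest()

    # Multipart: hash each part, then hash the concatenated binary digests.
    digests = b"".join(hashlib.md5(part).digest() for part in parts)
    return f"{hashlib.md5(digests).hexdigest()}-{len(parts)}"
```

If that rule holds, the value listed for a multipart object would not equal
the MD5 of the whole file, so a local check would need to know the part size
that the upload used.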

I tried to test this by uploading a 35MB file using the S3 console (so
s3cmd wouldn't know anything about it) and comparing how long it took to
download with how long it took to list with the --list-md5 option (doing the
list operation first). The download took about 15 seconds on my system but
the MD5 listing was almost instant, so Amazon already had the MD5. However, I
don't think the upload was multipart, because it failed 3 times, sometimes
getting over halfway, and each time restarted from the beginning before it
finally succeeded. So I'm still none the wiser.

I do believe in verifying backups, which is why I'm keen on an MD5 check
based on the actual file at S3. I haven't seen any cloud service offer to
hash the data they hold. I think one that did would have an extra selling
point. I'd be happy to pay a fee for such a service, and they wouldn't have
to charge much to make it viable. Of course, you'd have to make sure their
client software didn't cheat by doing the hash on your own PC, and you'd want
to use independent software locally to verify their hash.
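
As a rough idea of what the local side of such a check could look like, the
sketch below hashes a local file in chunks and pulls the MD5 out of one line
of a listing. The listing layout is an assumption (a 32-character hex MD5 as
its own column, followed later on the line by an s3:// URI), and file_md5 and
parse_listing_line are hypothetical names, not part of s3cmd.

```python
import hashlib
import re

# A 32-character hex token standing alone between whitespace. A value with a
# "-N" suffix is deliberately not matched, since it is not a plain MD5.
MD5_RE = re.compile(r"(?<!\S)[0-9a-f]{32}(?!\S)")
# Everything from "s3://" to the end of the line, so keys with spaces survive.
URI_RE = re.compile(r"s3://\S.*$")


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 of a local file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_listing_line(line):
    """Return (uri, md5) from one listing line, or None if either is absent."""
    md5_match = MD5_RE.search(line)
    uri_match = URI_RE.search(line)
    if md5_match is None or uri_match is None:
        return None
    return uri_match.group(0).strip(), md5_match.group(0)
```

Comparing file_md5(local_path) with the MD5 parsed from the listing only
proves something if the listed value really is computed from the copy held
at S3, which is the open question above.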

Regards
Russell

On 21 March 2015 at 21:10, Will McCown <w...@ross-mccown.com> wrote:

> On 3/21/2015 11:51 AM, Russell Gadd wrote:
> > My questions are:
> >
> >  1. Where does Amazon get its MD5 from? Is it calculated locally in my
> >     PC and sent in some headers? If Amazon calculates it at their end
> >     from the file it has on its servers then the verification is ok but
> >     otherwise how do I know their copy of the file is valid?
>
> I believe that Amazon calculates it on their end, or at least I hope so
> as I use it as an integrity check for my own backups. If you learn
> otherwise please let us know.
>
> >  2. How easy is it to find out how to use Amazon's AWS CLI in Linux? I
> >     have tried out s3cmd and it seems easy to use, but at first glance
> >     the AWS CLI looks pretty complex.
> >  3. I plan to use Bash and a little sed / awk in Linux. I've already
> >     done some code to create and manipulate this index as a trial. I
> >     don't particularly like Bash as such but it does a job.
> >     Alternatively I could perhaps use this project to learn some other
> >     language such as Python, but I'm not particularly keen to do this
> >     unless it confers particular advantages. Any opinions would be
> >     welcome (leaning perhaps to a C-like language if possible).
>
> I would certainly borrow heavily from s3cmd as an example.  I've looked
> at the CLI as well and find it pretty complex (but I'm not really
> a programmer).  You might also want to check out the package called
> "duplicity".  I've been using it with s3 as the back end for a while
> and it seems to work pretty well (but it works in the classical
> full/incremental backup mode, which isn't quite what you are
> thinking of).  But duplicity is written in Python and will
> be another example of an implementation of an s3 back end.
>
> I used to write lots of complicated bash/sed/awk scripts to do stuff,
> but these days I think Perl or Python is a much better choice for
> such things.  Both languages have tremendous open-source library
> bases to draw upon, so you can do a lot with very little actual
> coding.
>
> --
> Will McCown, Rolling Hills Estates, CA
> w...@ross-mccown.com
>
>
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming The Go Parallel Website,
> sponsored by Intel and developed in partnership with Slashdot Media, is
> your hub for all things parallel software development, from weekly thought
> leadership blogs to news, videos, case studies, tutorials and more. Take a
> look and join the conversation now. http://goparallel.sourceforge.net/
> _______________________________________________
> S3tools-general mailing list
> S3tools-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/s3tools-general
>