Re: [S3tools-general] s3cmd object count to sync error

2014-04-23 Thread WagnerOne
Ah ha! Thanks for the fix.

It'd be useful to see the counts of new files and updated files to upload 
called out separately in the verbose INFO: Summary line. 

I thought I might take a crack at that, but got lost pretty fast in s3cmd src. 
:| 
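
Something like the following is roughly what I had in mind; just a sketch, 
with the variable names taken from the patch quoted below and the exact 
wording only a guess:

info(u"Summary: %d new local files to upload, %d changed local files to upload, "
     u"%d files to remote copy, %d remote files to delete"
     % (local_count, update_count, copy_count, remote_count))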

Mike

On Apr 21, 2014, at 7:19 PM, Matt Domsch m...@domsch.com wrote:

 Thanks for the report.
 
 Try with this patch.  The reported number of files to upload in the Summary 
 line was correct, but the number y in [x of y] was wrong: it was computed 
 from only the count of new files to upload, not new files plus updated 
 files.
 
diff --git a/s3cmd b/s3cmd
index 913d04a..467dc07 100755
--- a/s3cmd
+++ b/s3cmd
@@ -1311,8 +1311,9 @@ def cmd_sync_local2remote(args):
     update_count = len(update_list)
     copy_count = len(copy_pairs)
     remote_count = len(remote_list)
+    upload_count = local_count + update_count

-    info(u"Summary: %d local files to upload, %d files to remote copy, %d remote files to delete" % (local_count + update_count, copy_count, remote_count))
+    info(u"Summary: %d local files to upload, %d files to remote copy, %d remote files to delete" % (upload_count, copy_count, remote_count))

     _set_remote_uri(local_list, destination_base, single_file_local)
     _set_remote_uri(update_list, destination_base, single_file_local)
@@ -1349,8 +1350,8 @@ def cmd_sync_local2remote(args):
     total_size = 0
     total_elapsed = 0.0
     timestamp_start = time.time()
-    n, total_size = _upload(local_list, 0, local_count, total_size)
-    n, total_size = _upload(update_list, n, local_count, total_size)
+    n, total_size = _upload(local_list, 0, upload_count, total_size)
+    n, total_size = _upload(update_list, n, upload_count, total_size)
     n_copies, saved_bytes, failed_copy_files = remote_copy(s3, copy_pairs, destination_base)

     #upload file that could not be copied
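
 To see why this fixes the [x of y] display, here is a minimal standalone 
 sketch of the numbering only (not s3cmd's actual _upload(), which also does 
 the transfer work; the point is just how the sequence number n and the total 
 are threaded through both lists):

def upload(items, seq, total):
    # print each item with a running sequence number against a fixed total
    for item in items:
        seq += 1
        print("%s  [%d of %d]" % (item, seq, total))
    return seq

local_list = ["new1.jpg", "new2.jpg"]      # new files
update_list = ["changed.txt"]              # updated files
upload_count = len(local_list) + len(update_list)

n = upload(local_list, 0, upload_count)    # prints [1 of 3], [2 of 3]
n = upload(update_list, n, upload_count)   # prints [3 of 3]; before the patch
                                           # the total here was len(local_list),
                                           # giving output like [3 of 2]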
 
 
 
 On Mon, Apr 21, 2014 at 3:45 PM, WagnerOne wag...@wagnerone.com wrote:
 Example s3cmd command for the output below:
 
 /usr/bin/s3cmd -c /home/ec2-user/.s3cfg sync --delete-removed --no-preserve 
 --verbose --progress /localdirectory/ s3://mybucket/directory/
 
 
 The syncs were running a build of commit d52d5edcc916512e979917f04abcea19d3a25af7
 (Sat Apr 12 20:40:16 2014 -0500).
 
 
 s3cmd, when syncing a particular directory to S3, is telling me this:
 
 INFO: Found 1567064 local files, 1566681 remote files
 INFO: Verifying attributes...
 [list of a few disabled md5 check objects]
 INFO: Summary: 14263 local files to upload, 417 files to remote copy, 0 
 remote files to delete
 WARNING: Module python-magic is not available. Guessing MIME types based on 
 file extensions.
 /file.jpg -> s3://mybucket/file.jpg  [1 of 111]
   4096 of 83840     4% in    0s   403.67 kB/s
  12288 of 83840    14% in    0s  1188.59 kB/s  failed
 WARNING: Upload failed: /file.jpg (timed out)
 WARNING: Retrying on lower speed (throttle=0.00)
 WARNING: Waiting 3 sec...
 /file.jpg -> s3://mybucket/file.jpg  [1 of 111]
 
 and continues to the sync finish...
 
   4096 of 15216    26% in    0s   433.79 kB/s
  15216 of 15216   100% in    0s   201.74 kB/s  done
 /file.txt -> s3://mybucket/file.txt  [14263 of 111]
 
 then it begins to do remote copies as applicable for the sync.
 
 
 I've not seen this before, where the item count shown in the progress 
 output doesn't match the INFO Summary line's count of objects to upload.
 
 I also don't think I've encountered the (timed out) error before, and wonder 
 if something in that retry path is somehow mucking up the displayed count of 
 objects to be uploaded. Just an idea.
 
 
 Another one from the same set of batched sync commands:
 
 INFO: Found 615171 local files, 587979 remote files
 INFO: Verifying attributes...
 [list of a few disabled md5 check objects]
 INFO: Summary: 6446 local files to upload, 21557 files to remote copy, 0 
 remote files to delete
 WARNING: Module python-magic is not available. Guessing MIME types based on 
 file extensions.
 /file.png -> s3://mybucket/file.png  [1 of 5741]
 ...
 /file.mp3 -> s3://mybucket/file.mp3  [6446 of 5741]
 ...
 Done. Uploaded 233380819 bytes in 2578.0 seconds, 88.41 kB/s.  Copied 21557 
 files saving 373663253 bytes transfer.
 
 That one also had an initial file timeout...
 
 
 Another one... almost matches! :)
 
 INFO: Summary: 68371 local files to upload, 29128 files to remote copy, 0 
 remote files to delete
 WARNING: Module python-magic is not available. Guessing MIME types based on 
 file extensions.
 /file.pdf -> s3://mybucket/file.pdf  [1 of 68321]
 ...
 completed with [68371 of 68321]
 
 
 FWIW, as I inspect the logs, I did see the same count mismatch on subsequent 
 syncs that did not time out on the initial object upload.
 
 
 Here is one that may start to indicate what's happening...
 
 INFO: Found 206744 local files, 206740 remote files
 INFO: Verifying attributes...
 INFO: Summary: 6164 local files to 

Re: [S3tools-general] question regarding --no-check-md5

2014-04-23 Thread Matt Domsch
Yes, I believe it is accurate.  It's also fair to ask why we don't follow
suit and compare local mtime with S3 LastModified, and upload if mtime is
newer.

It's been that way since Michal first wrote the initial sync code back in
September 2007.  That doesn't mean it has to stay that way.


On Wed, Apr 23, 2014 at 8:29 AM, WagnerOne wag...@wagnerone.com wrote:

 Thank you, Matt, for inspecting that and for the continued explanation.

 If I have a local and S3 file pair and the local file is modified such
 that size is not modified, but its date is, a sync with aws cli would copy
 that modified local file over the existing s3 counterpart (due to the
 source file having a newer mtime when compared to S3 LastModified), but a
 sync with s3cmd --no-check-md5 (which I unfortunately often have to use)
 would not.

 Is this statement accurate?

 Mike

 On Apr 14, 2014, at 6:51 PM, Matt Domsch m...@domsch.com wrote:

  aws-cli (after a quick perusal of their source code) uses the
 LastModified value (set by S3 to be the time the upload of the object
 occurred) on objects in S3, which is obtainable from the ListBucket XML,
 without doing a HEAD call.  They then go on to calculate the difference
 between LastModified and stat.mtime(), accounting for time zone
 differences.  Local files use stat.mtime.
 
  aws-cli then proceeds to compare the two values; if LastModified is
 newer than stat.mtime, syncing local->remote is skipped (the remote is
 newer than local); likewise on download, if LastModified is older than the
 local file, it too is skipped.
 
  Neither tool sets LastModified = stat.mtime on upload (nor can they).
  s3cmd gets around this by setting stat.mtime into the file's metadata when
 --preserve (the default) is used, but then would have to use a HEAD or GET
 call to get it back.  s3cmd does update the local on-disk mtime and atime
 when downloading (GETting), because we get the header back, and that's free
 then.  Likewise, aws-cli sets both mtime and atime = LastModified on
 download.
 
  So aside from aws-cli skipping over newer files in destination, I think
 their behavior is identical.
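 
  A condensed sketch of the comparison described above (the helper names
 are my own, not either tool's literal code, and remote_last_modified is
 assumed to be the timezone-aware datetime parsed from the ListBucket
 response):
 
import os
from datetime import datetime, timezone

def should_upload(local_path, remote_last_modified):
    # Skip the upload when the remote LastModified is newer than the
    # local mtime; upload only when the local file is strictly newer.
    local_mtime = datetime.fromtimestamp(os.stat(local_path).st_mtime,
                                         tz=timezone.utc)
    return local_mtime > remote_last_modified

def set_local_times(local_path, remote_last_modified):
    # On download, set the local mtime and atime from LastModified,
    # since the header comes back with the GET response for free.
    ts = remote_last_modified.timestamp()
    os.utime(local_path, (ts, ts))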
 
 
 
 
 
 

 --
 wag...@wagnerone.com
 Never fall into the trap of judging that which you don't understand as
 nonsense. That error can destroy you. -Feist





