[s3ql] Re: Any way to stabilize S3QL in my convoluted Amazon Cloud Drive setup?
Heya, thanks for the answer. Okay, I haven't changed the acd_cli settings, but keep reading. I want to share what I'm doing now: since the HMAC mismatch errors make the s3ql drive read-only, I decided to try a more modern, actively developed client, rclone mount, and my errors dropped by a factor of 100. I've been testing and tweaking for a couple of days now, and I completely replaced the read-only acd_cli mount with an rclone mount using this setting:

    fusermount -u /mnt/amazon; rclone mount --read-only --allow-other --acd-templink-threshold 0 --stats 5s --buffer-size 256M -v remote:/ /mnt/amazon

One thing I can say: it is a lot faster at seeking. Instead of fsck taking more than 30 minutes, it can now finish a check in 5 minutes or so. The file-scanning improvement is huge. While acd_cli is badly outdated, rclone has native support for retrying when a bad file comes back. I have been reading other threads on the internet and it seems people have moved to it. See this topic: https://forum.rclone.org/t/best-mount-settings-for-streaming-plex/344

I'm still testing, and if this keeps working well I'm going to dump acd_cli, but I'll keep the option to go back to it. The rclone mount has been up for about 24 hours now and 100 gigabytes have been transferred (media metadata scanning) without a single crash or error on s3ql.

I bet rclone could maybe also work read/write under s3ql; I haven't tried that yet. The idea would be to mount the s3ql data folder via rclone and keep the database on the local filesystem, since acd_cli can't be used read/write with s3ql. Need to try this out later :)

--
You received this message because you are subscribed to the Google Groups "s3ql" group.
To unsubscribe from this group and stop receiving emails from it, send an email to s3ql+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
[s3ql] Re: Any way to stabilize S3QL in my convoluted Amazon Cloud Drive setup?
Hi Riku,

On Sunday, April 2, 2017 at 2:55:11 AM UTC-5, Riku Bister wrote:
>
> We need find now where we can do a retry on HMAC, it is totally random,
> and when that comes first time the filesystem becames read only, *after
> first HMAC message comes there cannot be written on filesystem* until run
> fsck and remount it.
>

I haven't seen this error in my usage - it may be worth creating a new filesystem to verify whether the problem is actually a file transfer issue or an issue with your existing filesystem. Also, you seem to be getting much more frequent communication errors than I've experienced - do you have acd_cli configured to retry on errors? Here's my config, but to avoid derailing the thread, refer to acd_cli support for this aspect:

acd_client.ini:

    [transfer]
    fs_chunk_size = 1310720
    chunk_retries = 5
    connection_timeout = 10
    idle_timeout = 20

fuse.ini:

    [read]
    open_chunk_limit = 500
    timeout = 10

> *Also there is another crash came 70 OSerror but this is releated on
> mount.py or something else*
> [...]
> File "/root/s3ql/s3ql-2.21_modattu/src/s3ql/backends/comprenc.py", line
> 549, in _read_and_decrypt
>     buf = self.fh.read(size)
> OSError: [Errno 70] Communication error on send
>

I ran across this as well and added a retry to comprenc.py. The error hasn't shown up again, so for now this hack (and really, these are crude hacks) isn't tested - I've committed the changes at: https://bitbucket.org/taligentx/s3ql

Hope this helps,
Nikhil
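For readers curious what this kind of retry hack looks like in isolation, here's a minimal standalone sketch (not the actual committed patch - see the repository link above for that). The `read_with_retry` name and the bounded-retry policy are illustrative choices; errno 70 (ECOMM) is taken from the traceback quoted above:

```python
import errno
import logging
import time

log = logging.getLogger(__name__)

def read_with_retry(fh, size, max_tries=5, delay=1):
    """Read from a file handle, retrying on ECOMM (errno 70,
    'Communication error on send') instead of letting the error
    propagate and force the filesystem read-only."""
    for attempt in range(max_tries):
        try:
            return fh.read(size)
        except OSError as e:
            # Re-raise anything that isn't the known transient error,
            # and give up after the last attempt.
            if e.errno != errno.ECOMM or attempt == max_tries - 1:
                raise
            log.info('OSError: %s, retrying', e)
            time.sleep(delay)
```

A fixed, bounded retry like this is deliberately crude; see the later discussion in this thread about reusing s3ql's own exponential-backoff retry logic instead.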
[s3ql] Re: Any way to stabilize S3QL in my convoluted Amazon Cloud Drive setup?
After testing for some time, it is indeed way better - like 100x better. But new problems have appeared; of course I expected this to happen. :)

We need to find now where we can do a retry on HMAC. It is totally random, and when it first happens the filesystem becomes read-only: *after the first HMAC message there can be no more writes to the filesystem* until fsck is run and it is remounted. There is nothing more in the log. I was running an intensive media metadata check on the drive; well, it seems I can't do that. All of this is related to read problems: it should just retry when it gets bad data, but it is not doing that, it just causes problems or crashes. There is no error checking in the code at the moment.

    2017-04-02 00:31:20.311 2188:fuse-worker-11 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
    2017-04-02 00:33:52.621 2188:fuse-worker-29 s3ql.fs._readwrite: Backend returned malformed data for block 13 of inode 10785 (HMAC mismatch)
    2017-04-02 00:39:44.771 2188:fuse-worker-4 s3ql.fs._readwrite: Backend returned malformed data for block 0 of inode 10782 (HMAC mismatch)
    2017-04-02 00:40:50.619 2188:fuse-worker-3 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
    2017-04-02 00:41:01.669 2188:fuse-worker-3 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
    2017-04-02 01:07:34.093 2188:fuse-worker-16 s3ql.fs._readwrite: Backend returned malformed data for block 16 of inode 10773 (HMAC mismatch)
    2017-04-02 01:10:36.720 2188:fuse-worker-20 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
    2017-04-02 01:10:49.613 2188:fuse-worker-20 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
    2017-04-02 01:11:01.563 2188:fuse-worker-20 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
    2017-04-02 01:11:31.853 2188:fuse-worker-20 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
    2017-04-02 01:36:53.096 2188:fuse-worker-12 s3ql.fs._readwrite: Backend returned malformed data for block 0 of inode 6530 (HMAC mismatch)
    2017-04-02 01:42:52.386 2188:fuse-worker-27 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
    2017-04-02 01:42:58.635 2188:fuse-worker-27 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
    2017-04-02 01:53:09.669 2188:fuse-worker-8 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
    2017-04-02 02:42:53.766 2188:fuse-worker-22 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
    2017-04-02 02:43:04.944 2188:fuse-worker-22 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
    2017-04-02 02:43:20.940 2188:fuse-worker-22 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
    2017-04-02 02:43:53.985 2188:fuse-worker-22 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
    2017-04-02 02:51:18.443 2188:fuse-worker-27 s3ql.fs._readwrite: Backend returned malformed data for block 7 of inode 7141 (HMAC mismatch)
    2017-04-02 02:56:49.378 2188:fuse-worker-3 s3ql.fs._readwrite: Backend returned malformed data for block 15 of inode 7129 (HMAC mismatch)
    2017-04-02 02:58:07.603 2188:fuse-worker-20 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
    2017-04-02 02:58:16.970 2188:fuse-worker-20 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
    2017-04-02 03:13:59.453 2188:fuse-worker-30 s3ql.fs._readwrite: Backend returned malformed data for block 0 of inode 7164 (HMAC mismatch)
    2017-04-02 03:22:43.542 2188:fuse-worker-15 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
    2017-04-02 03:59:14.248 2188:fuse-worker-4 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
    2017-04-02 04:02:26.486 2188:fuse-worker-23 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
    2017-04-02 04:14:04.004 2188:fuse-worker-16 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
    2017-04-02 04:14:10.287 2188:fuse-worker-16 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
    2017-04-02 04:14:27.620 2188:fuse-worker-16 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
    2017-04-02 04:14:49.440 2188:fuse-worker-16 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
    2017-04-02 04:15:24.850 2188:fuse-worker-16 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
    2017-04-02 04:17:07.127 2188:fuse-worker-16 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error
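To make the "retry on HMAC" idea concrete: the fix would be to re-fetch the block from the backend when verification fails, rather than immediately declaring the data malformed. The sketch below is purely illustrative - `fetch_block` and `verify_hmac` are hypothetical stand-ins, not actual s3ql functions, and s3ql's real HMAC handling lives in comprenc.py:

```python
import hashlib
import hmac as hmac_mod
import logging
import time

log = logging.getLogger(__name__)

class CorruptedObjectError(Exception):
    """Raised only after the mismatch persists across retries."""
    pass

def verify_hmac(data, key, expected):
    """Check data against a stored HMAC (SHA-256 here for illustration)."""
    computed = hmac_mod.new(key, data, hashlib.sha256).digest()
    return hmac_mod.compare_digest(computed, expected)

def read_block_verified(fetch_block, key, expected_hmac, max_tries=3, delay=1):
    """Re-fetch a block when its HMAC does not match, on the theory that
    the mismatch is a transient transfer error rather than real corruption."""
    for attempt in range(max_tries):
        data = fetch_block()
        if verify_hmac(data, key, expected_hmac):
            return data
        log.warning('HMAC mismatch, retrying (%d/%d)', attempt + 1, max_tries)
        time.sleep(delay)
    raise CorruptedObjectError('HMAC mismatch after %d tries' % max_tries)
```

If the data on the backend really is corrupted, this still fails after `max_tries` attempts, so genuinely bad objects would still surface - only transient bad reads get absorbed.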
[s3ql] Re: Any way to stabilize S3QL in my convoluted Amazon Cloud Drive setup?
On Saturday, October 15, 2016 at 9:29:25 AM UTC-5, Mike Beaubien wrote:
>
> File "/usr/lib/s3ql/s3ql/backends/local.py", line 245, in _read_meta
>     buf = fh.read(9)
> OSError: [Errno 70] Communication error on send
>
> It's probably just some temporary error for whatever network reason. Is
> there any way to get s3ql to ignore and retry on these errors?
>

The recent S3 outage seemed to trigger this error much more frequently when using s3ql with acd_cli and overlayfs. Out of curiosity and for testing purposes, I added a retry loop to local.py, replacing buf = fh.read(9):

Line 20:

    import time

Line 242:

    def _read_meta(fh):
        while True:
            try:
                buf = fh.read(9)
            except OSError as e:
                if e.errno in (os.errno.ECOMM, os.errno.EFAULT):
                    log.info('OSError: %s, retrying' % e)
                    time.sleep(1)
                    continue
                raise  # re-raise unexpected errors
            break

ECOMM (error 70, "Communication error on send", more frequent) and EFAULT (error 14, "Bad address", rare) are the only errors I've seen so far using s3ql and acd_cli together. The 1 s sleep interval was meant to prevent hammering ACD, but in practice it hasn't come into play - the logs show the retries occurring at wider intervals. It would be more ideal to use the retry code that already exists in s3ql, with its exponentially increasing interval.

Very crude, but so far it's been working well - the filesystem has yet to crash after a few days of testing. Lately, while streaming video, the retry occurs at varying intervals - typically 20-45 min apart, rarely 10-30 s apart for a couple of minutes. Playback is fine with infrequent retries, but will buffer briefly if retries repeat within a few minutes, then continue normally after buffering (rare). A real ACD backend would be ideal, but for now this has prevented crashes and having to wait a couple of hours for fsck to complete.
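The exponentially increasing retry interval mentioned above could look roughly like this as a standalone sketch. This is an illustration of the idea, not s3ql's actual retry decorator; the `retry_read` name and the `sleep` injection parameter (handy for testing) are my own:

```python
import errno
import logging
import time

log = logging.getLogger(__name__)

# Retryable errnos observed with s3ql + acd_cli, per the post above:
# ECOMM (70, frequent) and EFAULT (14, rare).
RETRYABLE = (errno.ECOMM, errno.EFAULT)

def retry_read(read_fn, max_tries=8, base_delay=1, sleep=time.sleep):
    """Call read_fn(), retrying on ECOMM/EFAULT with an exponentially
    increasing delay (1 s, 2 s, 4 s, ...) instead of a fixed 1 s sleep."""
    delay = base_delay
    for attempt in range(max_tries):
        try:
            return read_fn()
        except OSError as e:
            # Unexpected errors, and the final failed attempt, propagate.
            if e.errno not in RETRYABLE or attempt == max_tries - 1:
                raise
            log.info('OSError: %s, retrying in %ds', e, delay)
            sleep(delay)
            delay *= 2
```

Compared to the fixed 1 s loop above, backoff stays gentle on the first hiccup but avoids hammering ACD during a prolonged outage.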
On a side note, there is one advantage to this combination of s3ql, acd_cli, and overlayfs with Plex: media files can be stored locally first via the upper layer of overlayfs, giving Plex time to perform its deep analysis of the files. The files can then be uploaded at will once the analysis is complete - the analysis would take much longer and chew up bandwidth if the files were stored on ACD immediately.

For those willing to experiment, hope this helps!

-Nikhil