[s3ql] Re: Any way to stabilize S3QL in my convoluted Amazon Cloud Drive setup?

2017-04-04 Thread Riku Bister
Heya, thanks for the answer. Okay, I haven't changed the acd_cli settings. 
But... keep reading.

I want to share what I'm doing now:
Since the HMAC mismatch errors make the s3ql drive read-only, I decided to 
try a more modern client (an actively developed project), rclone mount, and 
my errors went down by a factor of 100. I've been testing and tweaking for 
a couple of days now, and I completely replaced the read-only acd_cli mount 
with an rclone mount using this setting:


fusermount -u /mnt/amazon; rclone mount --read-only --allow-other --acd-templink-threshold 0 --stats 5s --buffer-size 256M -v remote:/ /mnt/amazon

One thing I can say: it is a lot faster at seeking during fsck - instead of 
waiting more than 30 minutes, it can now do a check in 5 minutes or so. The 
file scanning improvement is huge. acd_cli is badly outdated, while rclone 
has native support for retrying when it gets bad data. I have been reading 
other threads on the internet and it seems people have moved to it; see 
this topic: 
https://forum.rclone.org/t/best-mount-settings-for-streaming-plex/344
I'm testing, and if this keeps working well I'm going to dump acd_cli, but 
I will still keep the option to go back to acd_cli.
The rclone mount has been up for about 24 hours now and 100 gigabytes have 
been transferred (for media metadata scanning), with not even a single 
crash or error in s3ql.

I bet rclone could (maybe) also work as read/write storage for s3ql - I 
haven't tried it yet. The idea would be to mount the s3ql data folder via 
rclone and keep the database on the local filesystem, since acd_cli can't 
be used read/write with s3ql. I need to try this out later :)



[s3ql] Re: Any way to stabilize S3QL in my convoluted Amazon Cloud Drive setup?

2017-04-03 Thread nikhilc
Hi Riku,

On Sunday, April 2, 2017 at 2:55:11 AM UTC-5, Riku Bister wrote:

>
> We now need to find where we can add a retry on the HMAC check. It is 
> totally random, and the first time it happens the filesystem becomes 
> read-only - *after the first HMAC message the filesystem cannot be 
> written to* until fsck is run and it is remounted. 
>

I haven't seen this error in my usage - it may be worth creating a new 
filesystem to verify if the problem is actually a file transfer issue or an 
issue with your existing filesystem.

Also, you seem to be getting much more frequent communication errors than 
I've experienced - do you have acd_cli configured to retry on errors? 
Here's my config, but to avoid derailing the thread, please refer to 
acd_cli support for this aspect:

acd_client.ini:
[transfer]
fs_chunk_size = 1310720
chunk_retries = 5
connection_timeout = 10
idle_timeout = 20

fuse.ini:
[read]
open_chunk_limit = 500
timeout = 10
 

> *Also, there was another crash, OSError 70, but this one is related to 
> mount.py or something else*
> [...]
>   File "/root/s3ql/s3ql-2.21_modattu/src/s3ql/backends/comprenc.py", line 
> 549, in _read_and_decrypt
> buf = self.fh.read(size)
> OSError: [Errno 70] Communication error on send
>

I ran across this as well and added a retry to comprenc.py. The error 
hasn't shown up again, so for now this hack (and really, these are crude 
hacks) is untested - I've committed the changes at: 
 https://bitbucket.org/taligentx/s3ql
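
Roughly, the idea is along these lines - a sketch only, where the 
read_with_retry name, the fixed 1-second delay, and the logger setup are 
illustrative rather than the actual committed change:

import errno
import logging
import time

log = logging.getLogger(__name__)

def read_with_retry(fh, size, delay=1):
    """Retry reads that fail with transient ACD communication errors."""
    while True:
        try:
            return fh.read(size)
        except OSError as exc:
            # ECOMM (70) and EFAULT (14) are the transient errors seen
            # with acd_cli; anything else is re-raised immediately.
            if exc.errno not in (errno.ECOMM, errno.EFAULT):
                raise
            log.info('OSError: %s, retrying', exc)
            time.sleep(delay)

With a helper like that, the buf = self.fh.read(size) call in 
_read_and_decrypt would become buf = read_with_retry(self.fh, size).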

Hope this helps,
Nikhil



[s3ql] Re: Any way to stabilize S3QL in my convoluted Amazon Cloud Drive setup?

2017-04-02 Thread Riku Bister

After testing for some time it is indeed way better, like 100x better, but 
new problems have appeared. Of course I expected this to happen. :)
We now need to find where we can add a retry on the HMAC check. It is 
totally random, and the first time it happens the filesystem becomes 
read-only - *after the first HMAC message the filesystem cannot be written 
to* until fsck is run and it is remounted. Nothing more shows up in the 
log. I was running an intensive media metadata check on the drive; well, 
it seems I can't do that. All of this is related to read problems: s3ql 
should simply retry when it gets bad data, but it doesn't - it just causes 
problems or crashes. There is no error checking in that code at the moment.

2017-04-02 00:31:20.311 2188:fuse-worker-11 s3ql.backends.local._read_meta: 
OSError: [Errno 70] Communication error on send, retrying
2017-04-02 00:33:52.621 2188:fuse-worker-29 s3ql.fs._readwrite: Backend 
returned malformed data for block 13 of inode 10785 (HMAC mismatch)
2017-04-02 00:39:44.771 2188:fuse-worker-4 s3ql.fs._readwrite: Backend 
returned malformed data for block 0 of inode 10782 (HMAC mismatch)
2017-04-02 00:40:50.619 2188:fuse-worker-3 s3ql.backends.local._read_meta: 
OSError: [Errno 70] Communication error on send, retrying
2017-04-02 00:41:01.669 2188:fuse-worker-3 s3ql.backends.local._read_meta: 
OSError: [Errno 70] Communication error on send, retrying
2017-04-02 01:07:34.093 2188:fuse-worker-16 s3ql.fs._readwrite: Backend 
returned malformed data for block 16 of inode 10773 (HMAC mismatch)
2017-04-02 01:10:36.720 2188:fuse-worker-20 s3ql.backends.local._read_meta: 
OSError: [Errno 70] Communication error on send, retrying
2017-04-02 01:10:49.613 2188:fuse-worker-20 s3ql.backends.local._read_meta: 
OSError: [Errno 70] Communication error on send, retrying
2017-04-02 01:11:01.563 2188:fuse-worker-20 s3ql.backends.local._read_meta: 
OSError: [Errno 70] Communication error on send, retrying
2017-04-02 01:11:31.853 2188:fuse-worker-20 s3ql.backends.local._read_meta: 
OSError: [Errno 70] Communication error on send, retrying
2017-04-02 01:36:53.096 2188:fuse-worker-12 s3ql.fs._readwrite: Backend 
returned malformed data for block 0 of inode 6530 (HMAC mismatch)
2017-04-02 01:42:52.386 2188:fuse-worker-27 s3ql.backends.local._read_meta: 
OSError: [Errno 70] Communication error on send, retrying
2017-04-02 01:42:58.635 2188:fuse-worker-27 s3ql.backends.local._read_meta: 
OSError: [Errno 70] Communication error on send, retrying
2017-04-02 01:53:09.669 2188:fuse-worker-8 s3ql.backends.local._read_meta: 
OSError: [Errno 70] Communication error on send, retrying
2017-04-02 02:42:53.766 2188:fuse-worker-22 s3ql.backends.local._read_meta: 
OSError: [Errno 70] Communication error on send, retrying
2017-04-02 02:43:04.944 2188:fuse-worker-22 s3ql.backends.local._read_meta: 
OSError: [Errno 70] Communication error on send, retrying
2017-04-02 02:43:20.940 2188:fuse-worker-22 s3ql.backends.local._read_meta: 
OSError: [Errno 70] Communication error on send, retrying
2017-04-02 02:43:53.985 2188:fuse-worker-22 s3ql.backends.local._read_meta: 
OSError: [Errno 70] Communication error on send, retrying
2017-04-02 02:51:18.443 2188:fuse-worker-27 s3ql.fs._readwrite: Backend 
returned malformed data for block 7 of inode 7141 (HMAC mismatch)
2017-04-02 02:56:49.378 2188:fuse-worker-3 s3ql.fs._readwrite: Backend 
returned malformed data for block 15 of inode 7129 (HMAC mismatch)
2017-04-02 02:58:07.603 2188:fuse-worker-20 s3ql.backends.local._read_meta: 
OSError: [Errno 70] Communication error on send, retrying
2017-04-02 02:58:16.970 2188:fuse-worker-20 s3ql.backends.local._read_meta: 
OSError: [Errno 70] Communication error on send, retrying
2017-04-02 03:13:59.453 2188:fuse-worker-30 s3ql.fs._readwrite: Backend 
returned malformed data for block 0 of inode 7164 (HMAC mismatch)
2017-04-02 03:22:43.542 2188:fuse-worker-15 s3ql.backends.local._read_meta: 
OSError: [Errno 70] Communication error on send, retrying
2017-04-02 03:59:14.248 2188:fuse-worker-4 s3ql.backends.local._read_meta: 
OSError: [Errno 70] Communication error on send, retrying
2017-04-02 04:02:26.486 2188:fuse-worker-23 s3ql.backends.local._read_meta: 
OSError: [Errno 70] Communication error on send, retrying
2017-04-02 04:14:04.004 2188:fuse-worker-16 s3ql.backends.local._read_meta: 
OSError: [Errno 70] Communication error on send, retrying
2017-04-02 04:14:10.287 2188:fuse-worker-16 s3ql.backends.local._read_meta: 
OSError: [Errno 70] Communication error on send, retrying
2017-04-02 04:14:27.620 2188:fuse-worker-16 s3ql.backends.local._read_meta: 
OSError: [Errno 70] Communication error on send, retrying
2017-04-02 04:14:49.440 2188:fuse-worker-16 s3ql.backends.local._read_meta: 
OSError: [Errno 70] Communication error on send, retrying
2017-04-02 04:15:24.850 2188:fuse-worker-16 s3ql.backends.local._read_meta: 
OSError: [Errno 70] Communication error on send, retrying
2017-04-02 04:17:07.127 2188:fuse-worker-16 s3ql.backends.local._read_meta: 
OSError: [Errno 70] Communication error 

[s3ql] Re: Any way to stabilize S3QL in my convoluted Amazon Cloud Drive setup?

2017-03-03 Thread nikhilc
On Saturday, October 15, 2016 at 9:29:25 AM UTC-5, Mike Beaubien wrote:
>
>   File "/usr/lib/s3ql/s3ql/backends/local.py", line 245, in _read_meta
> buf = fh.read(9)
> OSError: [Errno 70] Communication error on send
>
> It's probably just some temporary error for whatever network reason. Is 
> there any way to get s3ql to ignore and retry on these errors?
>

The recent S3 outage seemed to trigger this error much more frequently when 
using s3ql with acd_cli and overlayfs.  Out of curiosity and for testing 
purposes, I added a retry loop to local.py, replacing buf = fh.read(9):

Line 20:
import time

Line 242:
def _read_meta(fh):
    while True:
        try:
            buf = fh.read(9)
        except OSError as e:
            if e.errno in (os.errno.ECOMM, os.errno.EFAULT):
                log.info('OSError: %s, retrying' % e)
                time.sleep(1)
                continue
            # anything else is a real failure - re-raise so it isn't
            # silently swallowed with buf left unset
            raise
        break

ECOMM error 70 "Communication error on send" (more frequent) and EFAULT 
error 14 "Bad address" (rare) are the only errors I've seen so far using 
s3ql and acd_cli together. The 1s sleep interval was meant to prevent 
hammering ACD, but in practice it hasn't come into play - the logs show 
the retries occurring at wider intervals. It would be more ideal to reuse 
the retry code that already exists in s3ql, with its exponentially 
increasing interval.
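
For reference, a backoff variant could look roughly like this - a sketch 
only, where max_tries and the delay constants are made up and s3ql's 
actual retry helper is not reused:

import errno
import logging
import time

log = logging.getLogger(__name__)

def read_with_backoff(fh, size, max_tries=8, base_delay=1, max_delay=60):
    # Double the delay after each transient failure instead of retrying
    # at a fixed 1s interval; give up after max_tries so a persistent
    # outage still surfaces as an error.
    delay = base_delay
    for _ in range(max_tries):
        try:
            return fh.read(size)
        except OSError as exc:
            if exc.errno not in (errno.ECOMM, errno.EFAULT):
                raise
            log.info('OSError: %s, retrying in %ds', exc, delay)
            time.sleep(delay)
            delay = min(delay * 2, max_delay)
    raise OSError(errno.ECOMM, 'Communication error persists after retries')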

Very crude, but so far this fixed-delay hack has been working well - the 
filesystem has yet to crash after a few days of testing. Lately, while 
streaming video, the retries occur at varying intervals - typically 
20-45 minutes apart, rarely 10-30 seconds apart for a couple of minutes. 
Playback is fine with infrequent retries, but it will buffer briefly if 
retries repeat within a few minutes, and it continues normally after 
buffering (rare).

A real ACD backend would be ideal but for now this has prevented crashes 
and having to wait a couple of hours for fsck to complete.  On a side note, 
there is one advantage to this combination of s3ql, acd_cli, and overlayfs 
with Plex - media files can be stored locally first via the upper layer of 
overlayfs and given time for Plex to perform deep analysis of the files. 
 The files can be uploaded at will once the analysis is complete - the 
analysis would take much longer and chew up bandwidth if the files were 
immediately stored on ACD.

For those willing to experiment, hope this helps!
-Nikhil
