I am busy with an exciting setup, creating a nice approach to
have a huge cloud drive. I think a lot of us will need this, since
some cloud services have shut down their unlimited space (Google Drive).

This time I want to do it right, which means:
1) encryption
2) compression
3) deduplication (to save my wallet)
4) sync on the fly

After many tryouts on both Windows and Linux, I had failures with 'cloud drive'
solutions
such as tvtdrive, raidrive, expandrive, and mountain duck + cryptomator
(and I lost some data due to bad design, computer crashes, losing the cache, etc.).

Then I considered using Mega.nz, since they have a good offer for a lot of
cloud space,
with zero-knowledge encryption (is it?). But I saw no way to check
hashes, rclone has no 2FA for Mega (yet) because of an outdated Go library, I
would need to trust Go (rclone) - but I prefer Python of course - and even
worse, it is based on symmetric encryption where you put your key in their
client! And I saw no way to replicate or back up the storage. I was done with
that one as well.

So I finally found an excellent S3 cloud provider, Wasabi, since I hate to pay
egress and this
matters when you use S3QL! So no payment for uploads and downloads. And there
are S3 providers in Europe as well, which is where I live.

So I ended up with S3QL + Wasabi,
and it is the best thing I could have done so far!

S3QL and the team of Nikolaus Rath are way too humble about this project, in
my fair opinion!
Believe me.

OK. Now on to my setup:

My setup is as follows. I use S3 cloud space from Wasabi. On top of that, I
run S3QL so I have a POSIX filesystem with encryption, compression and
deduplication! I use a permanent and big cache on a 16TB ZFS filesystem on
FreeBSD. God, I like ZFS. That filesystem ROCKS. No single filesystem comes
close to that piece of gold.
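
Roughly, the commands look like this (the bucket name, cache path and sizes
below are placeholders and not my real values; --cachesize is in KiB):

    # create the filesystem once; mkfs.s3ql asks for the encryption passphrase
    # (backend credentials come from ~/.s3ql/authinfo2 or are prompted)
    mkfs.s3ql s3c://s3.wasabisys.com/my-bucket

    # mount it with a big cache living on the ZFS pool
    mount.s3ql --cachedir /tank/s3ql-cache \
               --cachesize 5000000000 \
               --max-cache-entries 500000 \
               --compress lzma-6 \
               s3c://s3.wasabisys.com/my-bucket /mnt/clouddrive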

So in this setup, my ZFS somewhat secures me against bitrot; however, it only
stores the cache of my S3QL files anyway.

S3QL provides me a mountable space on Wasabi S3, so I have a POSIX
filesystem. It compresses, deduplicates and encrypts. It fetches the blocks
that are not in the cache on the fly when needed. When it crashes, S3QL checks
the cache against the data in S3... so I like that as well...
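
As far as I understand, that check after an unclean shutdown is the job of
fsck.s3ql; something like this (same placeholder bucket and cache dir as
above):

    # check and repair the filesystem, reusing the local cache directory
    fsck.s3ql --cachedir /tank/s3ql-cache s3c://s3.wasabisys.com/my-bucket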

Once it is mounted, I provide a share on the network using NFS or SMB...
The big cache is used to back up with borgbackup or to access files that are
highly needed.... So I do something crazy here: I use a HUGE S3QL cache so
that I cache my whole bucket
and files do not need to be downloaded from the S3 bucket.
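
For the NFS case, a minimal sketch of the export could look like this
(assuming a Linux NFS server; the path and network are placeholders, and S3QL
would probably need to be mounted with --allow-other so the NFS daemon can
read the FUSE mount):

    # /etc/exports on the machine holding the S3QL mount
    # fsid= is generally required when exporting a FUSE filesystem
    /mnt/clouddrive  192.168.1.0/24(rw,sync,fsid=1,no_subtree_check)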

So in my setup my source files are in the S3 cloud, and the local storage is
used as a cache for those files. Not the other way around, where I would use a
NAS to store my files and sync them with S3.
Needless to say, I need to have high trust in the S3QL filesystem for this, so
for now I test with unimportant data I can find back on the internet.

So now I need some help. I have my cloud drive now, and of course I can
replicate and back up my bucket within S3, but I am still very prudent and I
want to have my local backup. I want triple security.

For this my eye is on borgbackup, since it has deduplication and it is pretty
fast. It has NO S3 endpoints, but since S3QL mounts my data in my Linux tree,
this is not an issue.
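
The basic idea would be something like this (repository path and archive name
are just my placeholders, and the repo would live on local disk, not inside
the S3QL mount):

    # one-time: create the borg repository on local storage
    borg init --encryption=repokey /backup/borg-repo

    # back up the mounted S3QL tree into it
    borg create --stats --progress \
        /backup/borg-repo::clouddrive-{now} /mnt/clouddrive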

However, the manual speaks about inodes.
When creating a borg archive, there are several modes for the files cache (man
borg create):
https://borgbackup.readthedocs.io/en/stable/usage/create.html

Backup speed is increased by not reprocessing files that are already part 
of existing archives and weren’t modified.  The detection of unmodified 
files is done by comparing multiple file metadata values with previous 
values kept in the files cache.

This comparison can operate in different modes as given by --files-cache:

    ctime,size,inode (default)
    mtime,size,inode (default behaviour of borg versions older than 
1.1.0rc4)
    ctime,size (ignore the inode number)
    mtime,size (ignore the inode number)
    rechunk,ctime (all files are considered modified - rechunk, cache ctime)
    rechunk,mtime (all files are considered modified - rechunk, cache mtime)
    disabled (disable the files cache, all files considered modified - 
rechunk)
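
So if it turned out that S3QL inode numbers are not stable, I suppose the
inode could simply be taken out of the comparison, something like this (same
placeholder paths as above; whether this is actually needed is exactly my
question):

    # decide 'unchanged' based on ctime and size only, ignoring inode numbers
    borg create --files-cache=ctime,size \
        /backup/borg-repo::clouddrive-{now} /mnt/clouddrive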

At the moment, as said, I use a very large cache for my S3QL files. My cache
(almost) equals the stored data: a few TB. That is not very space efficient
locally,
but files are of course fast to access. For that I use a huge 10TB disk.

I am not sure if that really needs to be RAID or ZFS to avoid data loss as
well...

In fact, what happens if the cache gets corrupted? How does S3QL detect
corrupted cached files? Suppose I put this on a single drive, not on RAID,
and the cache gets corrupted.

Since I use a large cache with a long expiry so it does not time out:
Would my source files be corrupted? Does S3QL repair my corrupted cache?
Do I need to schedule a command to check my local cache for corruption?
Will my corrupted cache be served until I perform some steps?

So why such a huge cache? I use it because I do not want files to be
downloaded from S3 each time I do a backup. The files need to be compared to
see whether they have changed or not; however, my files stored in S3QL rarely
change.

Borgbackup states that if INODES are stable, you can leave the check at the
default (ctime, size, inode). They say SMB is not stable...

So are inodes stable in S3QL? I see this is a FUSE filesystem, but what about
inodes: once mounted, can they be considered stable for both cached and
non-cached files? Could it be that only the cached files have stable inodes?

What if I make S3QL use a much smaller (default size) cache, since I rarely
access and use the files...
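
If I went that way, I guess the cache could either be shrunk on a mounted
filesystem or the filesystem could simply be remounted with a modest
--cachesize (sizes are in KiB, and the values below are just examples):

    # shrink the cache of an already mounted filesystem to ~10 GiB
    s3qlctrl cachesize /mnt/clouddrive 10485760

    # or remount with a small --cachesize next time
    mount.s3ql --cachedir /tank/s3ql-cache --cachesize 10485760 \
        s3c://s3.wasabisys.com/my-bucket /mnt/clouddrive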

How should I compare the uncached files with the borg repository in such a
way that files which are not cached or not changed are not downloaded? Do I
need to exclude the inode check option?

What about cached files that are not changed?

So I need some guidance here.

Thanks.









