OK...will have a look into this. I'd definitely like a faster restore 
operation.

In terms of my short-term problem, the estimated time for a restore using 
multiple rsyncs is now about 60 hours. The estimated time to just copy the 
data from the bucket is 14 hours...can I just set up a 'local' rsync using 
the S3QL encryption key from GC and restore? Or is that too optimistic?  

On Wednesday, April 14, 2021 at 6:50:24 PM UTC+10 [email protected] wrote:

> On Apr 14 2021, Grunthos <[email protected]> wrote:
> > On Wednesday, April 14, 2021 at 5:36:30 PM UTC+10 [email protected] 
> > wrote:
> >
> >>
> >> Yes, all of these would be possible and probably be faster. I think 
> >> option (2) would be the best one. 
> >>
> >> Pull requests are welcome :-). 
> >>
> >>
> > I had a funny feeling that might be the answer...and in terms of utility 
> > and design, ISTM that "add a special s3ql command to do a 'tree copy' -- 
> > it would know exactly which blocks it needed and download them en masse 
> > while restoring files (and would need a lot of cache, possibly even a 
> > temporary cache drive)" is a good plan.
> >
> > I am not at all sure I am up for the (probable) deep-dive required, but 
> > if I were to look at this, could you give some suggested starting 
> > points? My very naive approach (not knowing the internals at all) would 
> > be to build a list of all required blocks, do some kind of topological 
> > sort, then start multiple download threads. As each block was 
> > downloaded, determine whether a new file can be copied yet; if so, copy 
> > it, then release any blocks that are no longer needed.
> >
> > ...like I said, naive, and highly dependent on internals...and maybe it 
> > should use some kind of private mount to avoid horror.
>
> I think there's a simpler solution.
>
> 1. Add a new special xattr to trigger the functionality (look at
> s3qlcp.py and copy_tree() in fs.py) 
>
> 2. Have fs.py write directly to the destination directory (which should
> be outside the S3QL mountpoint)
>
> 3. Start a number of async workers (no need for threads) that, in a
> loop, download blocks and write them to a given offset in a given fh.
>
> 4. Have the main thread recursively traverse the source and issue "copy"
> requests to the workers (through a queue)
>
> 5. Wait for all workers to finish.
>
> 6. Profit.
>
>
> I wouldn't even bother putting blocks in the cache - just download and
> write to the destination on the fly. It may be worth checking whether a
> block is *already* in the cache and, if so, skipping the download, though.
>
>
> With this implementation, blocks referenced by multiple files will be
> downloaded multiple times. I think this can be improved upon once the
> minimum functionality is working.
>
>
> Best,
> -Nikolaus
>
>
> -- 
> GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F
>
> »Time flies like an arrow, fruit flies like a Banana.«
>
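For the record, steps 3-5 of Nikolaus's outline (async workers pulling copy requests from a queue and pwrite-ing downloaded blocks at their offsets) might be sketched roughly like this. This is a minimal, self-contained illustration under stated assumptions: `fetch_block` and the in-memory `BLOCKS` map are hypothetical stand-ins for S3QL's real backend download path, not actual S3QL API:

```python
import asyncio
import os
import tempfile

# Hypothetical stand-in for the object store: block_id -> block data.
BLOCKS = {1: b"aaaa", 2: b"bbbb", 3: b"cccc"}

async def fetch_block(block_id):
    """Stand-in for the real backend download call."""
    await asyncio.sleep(0)  # placeholder for network latency
    return BLOCKS[block_id]

async def worker(queue, fd):
    """Step 3: in a loop, download blocks and write them at a given offset."""
    while True:
        item = await queue.get()
        if item is None:  # sentinel: no more work
            queue.task_done()
            break
        offset, block_id = item
        data = await fetch_block(block_id)
        # Positional write: workers share the fd without racing on a seek pointer.
        os.pwrite(fd, data, offset)
        queue.task_done()

async def copy_file(block_map, dest, n_workers=4):
    """Steps 4-5: the main task issues copy requests via a queue, then waits."""
    queue = asyncio.Queue()
    fd = os.open(dest, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        workers = [asyncio.create_task(worker(queue, fd))
                   for _ in range(n_workers)]
        for offset, block_id in block_map:
            queue.put_nowait((offset, block_id))
        for _ in workers:          # one sentinel per worker
            queue.put_nowait(None)
        await asyncio.gather(*workers)
    finally:
        os.close(fd)

# Usage: restore one file whose data is blocks 1, 2, 3 at offsets 0, 4, 8.
dest = os.path.join(tempfile.gettempdir(), "restored.bin")
asyncio.run(copy_file([(0, 1), (4, 2), (8, 3)], dest))
```

In the real thing, the main task would recursively walk the source tree (step 4) and enqueue one entry per block of each file; deduplicating downloads of shared blocks can be layered on later, as noted above.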
