Diskpacked uses a few big files, so it is gentler on the filesystem.

The best combination is blobpacked storage, with a filepacked small-blob
backend and a diskpacked large-blob backend.

This keeps the blobs packed, zipped, and close together, while using fewer
files.

As far as I know, this is currently achievable only through the low-level
config.
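For reference, a low-level config fragment for this layout might look roughly
like the sketch below. This is an assumption-heavy illustration, not a tested
config: the handler names (`storage-blobpacked`, `storage-diskpacked`,
`storage-filesystem`), the `handlerArgs` keys, and all paths are from memory,
so verify them against the Perkeep server-config documentation before use.

```json
{
    "prefixes": {
        "/bs/": {
            "handler": "storage-blobpacked",
            "handlerArgs": {
                "smallBlobs": "/bs-loose/",
                "largeBlobs": "/bs-packed/",
                "metaIndex": {
                    "type": "leveldb",
                    "file": "/var/lib/perkeep/packindex.leveldb"
                }
            }
        },
        "/bs-loose/": {
            "handler": "storage-filesystem",
            "handlerArgs": {"path": "/var/lib/perkeep/blobs/loose"}
        },
        "/bs-packed/": {
            "handler": "storage-diskpacked",
            "handlerArgs": {"path": "/var/lib/perkeep/blobs/packed"}
        }
    }
}
```

The idea is that blobpacked keeps loose (small) blobs in one backend and the
zip-packed large blobs in another; pointing the latter at diskpacked is what
reduces the file count.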

Tamás Gulácsi

On Tue, May 7, 2019 at 1:24 AM +0200, "Ian Denhardt" <[email protected]> wrote:

Thanks for the pointers. I've managed to solve the performance issue
through two things:

1. I wrote a simple seccomp wrapper that just silently ignores calls to
   fsync & sync. Obviously I don't have any intention of using this
   after the initial import; I'm not crazy. But as expected, this sped
   things up a lot.
2. The bigger difference came from switching to diskpacked storage. Is
   there a reason this isn't the default?

It managed to get through copying about 20GiB of data while I was in the
shower, so I think this solves my immediate issue.

Thanks again,

-Ian

Quoting Brad Fitzpatrick (2019-05-03 15:16:25)
>    Perkeep in general is very (too) aggressive at fsyncing per blob, and
>    it cuts up files into lots of small blobs, so importing lots of data is
>    slow. There's a plan to fix this, but life (baby) got in the way, so
>    it's kinda on hold until I find a few minutes to think. The two high
>    level plans are to let clients specify transactions implicitly or
>    explicitly: implicitly = one multipart/mime POST of a bunch of blobs is
>    one transaction so should be 1 fsync, not 1 per blob, serially. The
>    more complex one involves API changes and lets clients create their own
>    transactions and associate, say, a whole file or directory upload with
>    that transaction, and then wait on all the associated blobs to be
>    committed (fsynced, or whatever blob storage impl requires) before
>    noting that it's good locally.
>    As for (2), though, pk-put won't repeat any work it's done. It'll still
>    walk your local filesystem to see what's there, but it'll learn that
>    it's already uploaded from either its local cache or from the server
>    before it uploads chunks again.
>    So it might be slow (throughput wise) but holding 2TB should be no
>    problem, and auto-resume should work. If you run with the pk-put
>    verbose option it'll show lots of stats about where which phases are
>    at.
>
>    On Fri, May 3, 2019 at 11:13 AM Ian Denhardt <[1][email protected]>
>    wrote:
>
>      Hey All,
>      I have about 2TB of files that I'm looking at importing into
>      perkeep. I have a couple questions.
>      First, do others have experience they can share re: how perkeep
>      performs holding this much data? From what I've read it sounds like
>      architecturally it should be manageable, but I'd like to know if
>      anyone can say how that's worked out in practice for them.
>      Assuming this is realistic, I have some logistical questions about
>      getting the data in there in the first place.
>      I left a pk-put going on a large sub-tree last night, and came back
>      to it today. It had spent about 12 hours copying things, finally
>      running into some hiccough uploading a particular file (I don't
>      have the error message recorded, but it was something along the
>      lines of "server did not receive blob"). Trying to upload that file
>      again worked fine, so I assume some transient thing.
>      During the transfer, usage on the drives holding the blobs grew by
>      about 80 GiB. This is transferring data between two hard drives
>      connected to the same machine via USB 3.0. Questions:
>      1. Is that kind of performance normal for pk-put?
>      2. Is there currently any way to do a "resumable" version of
>         pk-put, where it can quickly pick up where it left off?
>      If the answer to (2) is no, I might be interested in contributing
>      such a feature, and would appreciate pointers as to where to start.
>      Thanks.
>      -Ian
>      --
>      You received this message because you are subscribed to the Google
>      Groups "Perkeep" group.
>      To unsubscribe from this group and stop receiving emails from it,
>      send an email to [2][email protected].
>      For more options, visit [3]https://groups.google.com/d/optout.
>
>
> References
>
>    1. mailto:[email protected]
>    2. mailto:perkeep%[email protected]
>    3. https://groups.google.com/d/optout
>    4. mailto:[email protected]
>    5. https://groups.google.com/d/optout


