Thanks for the pointers. I've managed to solve the performance issue through two things:

1. I wrote a simple seccomp wrapper that just silently ignores calls to
   fsync & sync. Obviously I don't have any intention of using this
   after the initial import; I'm not crazy. But as expected, this sped
   things up a lot.
2. The bigger difference came from switching to diskpacked storage. Is
   there a reason this isn't the default?

It managed to get through copying about 20GiB of data while I was in the
shower, so I think this solves my immediate issue.

Thanks again,

-Ian

Quoting Brad Fitzpatrick (2019-05-03 15:16:25)
> Perkeep in general is very (too) aggressive at fsyncing per blob, and
> it cuts up files into lots of small blobs, so importing lots of data is
> slow. There's a plan to fix this, but life (baby) got in the way, so
> it's kinda on hold until I find a few minutes to think. The two high
> level plans are to let clients specify transactions implicitly or
> explicitly: implicitly = one multipart/mime POST of a bunch of blobs is
> one transaction so should be 1 fsync, not 1 per blob, serially. The
> more complex one involves API changes and lets clients create their own
> transactions and associate, say, a whole file or directory upload with
> that transaction, and then wait on all the associated blobs to be
> committed (fsynced, or whatever blob storage impl requires) before
> noting that it's good locally.
>
> As for (2), though, pk-put won't repeat any work it's done. It'll still
> walk your local filesystem to see what's there, but it'll learn that
> it's already uploaded from either its local cache or from the server
> before it uploads chunks again.
>
> So it might be slow (throughput wise) but holding 2TB should be no
> problem, and auto-resume should work. If you run with the pk-put
> verbose option it'll show lots of stats about where which phases are
> at.
>
> On Fri, May 3, 2019 at 11:13 AM Ian Denhardt <[email protected]>
> wrote:
>
> > Hey All,
> >
> > I have about 2TB of files that I'm looking at importing into
> > perkeep. I have a couple questions.
> >
> > First, do others have experience they can share re: how perkeep
> > performs holding this much data? From what I've read it sounds like
> > architecturally it should be manageable, but I'd like to know if
> > anyone can say how that's worked out in practice for them.
> >
> > Assuming this is realistic, I have some logistical questions about
> > getting the data in there in the first place.
> >
> > I left a pk-put going on a large sub-tree last night, and came back
> > to it today. It had spent about 12 hours copying things, finally
> > running into some hiccough uploading a particular file (I don't have
> > the error message recorded, but it was something along the lines of
> > "server did not receive blob"). Trying to upload that file again
> > worked fine, so I assume some transient thing.
> >
> > During the transfer, usage on the drives holding the blobs grew by
> > about 80 GiB. This is transferring data between two hard drives
> > connected to the same machine via USB 3.0. Questions:
> >
> > 1. Is that kind of performance normal for pk-put?
> > 2. Is there currently any way to do a "resumable" version of pk-put,
> >    where it can quickly pick up where it left off?
> >
> > If the answer to (2) is no, I might be interested in contributing
> > such a feature, and would appreciate pointers as to where to start.
> >
> > Thanks.
> >
> > -Ian
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "Perkeep" group.
> > To unsubscribe from this group and stop receiving emails from it,
> > send an email to [email protected].
> > For more options, visit https://groups.google.com/d/optout.
