Thanks for the detailed explanation, Brad. The fsync details in the issues make sense.
As a quick follow-up: the smaller and smaller blob sizes I see while
uploading a file, is that known/expected as well (i.e. some sort of sorting
by pk-put?), or is it an artifact of the running hash used to split the file?

/Viktor

On Friday, May 4, 2018 at 7:56:00 PM UTC+2, Brad Fitzpatrick wrote:
> Yes, the write performance is known.
>
> In general, we've ignored spinning disks as the target of initially
> incoming blobs since SSDs and NVMe continue to get cheaper. The first
> write should be to SSDs.
>
> We do have https://github.com/perkeep/perkeep/issues/999 open to track
> changing the protocol to allow clients to work with the server to do
> more efficient batching.
>
> For read performance, the "blobpacked" storage format rearranges all
> those little blobs into one large contiguous zip file on disk, so read
> performance later is very fast, streaming contiguously from disk. So
> it's only writes that are slow.
>
> There is also the "cond" storage type to route schema blobs & data
> blobs differently.
>
> Note that once we change the config file format
> (https://github.com/perkeep/perkeep/issues/1134), it'll be much easier
> to configure all the wrapper storage targets into arbitrary graphs.
> Currently the low-level JSON config for that is tedious.
>
> For now I recommend you store all blobs by default on SSD, but put
> your "blobpacked" storage on spinning media, which will be like 95% of
> your stored bytes. Again, the current config file format doesn't make
> that easy, but it'll be trivial (and with documented examples) to do
> that soon.
>
> On Fri, May 4, 2018 at 9:58 AM, Viktor <[email protected]> wrote:
> > Hi,
> >
> > Just had a look at perkeepd as I found the philosophy interesting, and
> > did a short test by, in short:
> >
> > Downloading the repo, checking out release/0.10, running perkeepd and
> > trying to put in some files. Unfortunately the performance was way
> > slower than my network, so I tried locally on the same machine, first
> > with the blob storage on an HDD and then on an SSD (index on SSD in
> > both cases).
> >
> > I figured that this might be of interest for you. I will also try to
> > find out what is going on, to both learn and see if I can use it :-)
> >
> > 1. When pk-put'ing a file it is initially broken into rather large
> > portions/blobs, but towards the end it has degenerated to breaking
> > into very small blobs - I can only assume creating performance
> > problems both in terms of disk seeks/writes and RAM usage.
> > 2. The write performance is way lower than expected (CPU usage at ~20%
> > throughout the pk-put process), generating about 2 MB/s write
> > performance with blob storage on an HDD and about 10 MB/s on an SSD,
> > both significantly below the expected performance of the disks.
> >
> > Is this expected/known?
> >
> > Recreation/case:
> >
> > Setup:
> > git clone https://camlistore.googlesource.com/camlistore perkeep.org
> > cd perkeep.org
> > git checkout release/0.10
> > go run make.go
> > perkeepd
> > (editing config to set index to be on ssd path and blobs to be on hdd
> > and ssd path respectively)
> >
> > Case 1:
> > ----------
> > dd if=/dev/urandom of=random.txt bs=1024 count=1048576
> > time pk-put file --permanode --title="test" --tag=backup random.txt
> >
> > Simultaneously looking at the output from perkeepd, I see that the blob
> > size seems to be decreasing over time; see e.g. the output of the early
> > blobs and late blobs:
> > Early: (assuming the first 2 are permanodes etc.)
> > 2018/05/04 18:29:47 Received blob [sha224-4556e3a2241ea654a80d72d003382f976d864a54fc90b83fef40cf7c; 449 bytes]
> > 2018/05/04 18:29:47 Received blob [sha224-3dcc6a1e24475efc417b4ab1b6cfe02ef2d9c085ead96b5a242ed58e; 628 bytes]
> > 2018/05/04 18:29:50 Received blob [sha224-0db450c2fbc821a7f3547ca6ad5c8d5971802f2573c5b8d61bade2c8; 68904 bytes]
> > 2018/05/04 18:29:50 Received blob [sha224-425b332fa1ee724e191235a981341db7a25084790a868465e7ed0436; 77478 bytes]
> > 2018/05/04 18:29:50 Received blob [sha224-f5e6e2d09840ab93fc252ff172911f0013ae06c706f43e52b5e5a014; 66016 bytes]
> > 2018/05/04 18:29:50 Received blob [sha224-f3e373b4a528b8ec3734f0408f30d30aba8ca4d743f6b17ba52bda74; 262144 bytes]
> > 2018/05/04 18:29:50 Received blob [sha224-2573b030433ca209ddb5e2aea291946a6f67e5934b6a3a0e1471c8fc; 81458 bytes]
> > 2018/05/04 18:29:50 Received blob [sha224-451b14a507b3e15f44a589be39a34987cab8e403034822367e6a25ab; 68287 bytes]
> > 2018/05/04 18:29:50 Received blob [sha224-45a2a81086cc4497aa0fbded863da10a1a28a0ad6747f44c09c713d3; 69632 bytes]
> > Late:
> > 2018/05/04 18:31:18 Received blob [sha224-ebdf3a3e7d72f5eaab737a353ba59d97d82cb4fd108131b1b03befa1; 414 bytes]
> > 2018/05/04 18:31:18 Received blob [sha224-363b5bcb8b145fc2f29faf0d6d8de7ddb27bf68107ea544290c178ba; 412 bytes]
> > 2018/05/04 18:31:18 Received blob [sha224-7b67715d347b97b6768ffbe7f4c9fcfa0fd083f9055e783bd2527750; 646 bytes]
> > 2018/05/04 18:31:18 Received blob [sha224-c96a6e7fd1597cee8f347ef0deee45e645d8665ef96ae6372a582dda; 1947 bytes]
> > 2018/05/04 18:31:18 Received blob [sha224-19bce313cda795b30a7161487cdb7be0108dd3b466e92911a5465fb5; 295 bytes]
> > 2018/05/04 18:31:18 Received blob [sha224-cde65fa0e356be938d3e395f2f01bab940631df3609007b96094a8d1; 297 bytes]
> > 2018/05/04 18:31:18 Received blob [sha224-e80de3ab6d991b62878e078a7d3140a5d7f4717c2fdaabf4757c91b1; 531 bytes]
> > 2018/05/04 18:31:18 Received blob [sha224-fe46fa614191ec90c11a5ccf40282c1d210292d7eca297a0d6ec2609; 412 bytes]
> > 2018/05/04 18:31:18 Received blob [sha224-d63b69b880326dbd0aa0e73e2f3cf4305bb144db66bb950476603fb1; 1235 bytes]
> > 2018/05/04 18:31:18 Received blob [sha224-7b51468d9cbcec724b70c1accf822cac7bf3c6aacd25876caf80439d; 295 bytes]
> >
> > Maybe this is expected because some sorting is going on (will dig into
> > the code later), but I initially found it a bit strange and a potential
> > explanation for 2).
> >
> > Case 2:
> > ---------
> > Simply using 1 GB / time for the HDD and SSD cases shows about 2 MB/s
> > and 10 MB/s, which is significantly lower than I expected (in
> > particular since CPU utilization is low, so it does not seem to be a
> > SHA-hashing problem).
> >
> > Hopefully this is useful - I will continue to look into it regardless.
> >
> > /Viktor
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "Perkeep" group.
> > To unsubscribe from this group and stop receiving emails from it, send
> > an email to [email protected].
> > For more options, visit https://groups.google.com/d/optout.
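
For readers wondering what "the running hash used to split the file" could look like, here is a minimal Go sketch of content-defined chunking with a rolling hash. It is not Perkeep's actual splitter: the polynomial hash, the window size, the mask, the minimum chunk size, and the names splitSizes, randomData, and mean are all assumptions made for illustration. Only the 262144-byte cap is taken from the largest blob in the log above. In a splitter of this kind, a cut depends on the bytes inside the current window (plus the min/max limits), not on how far into the file the upload has progressed.

```go
package main

import (
	"fmt"
	"math/rand"
)

const (
	window   = 64        // bytes covered by the rolling hash (assumed)
	maskBits = 13        // cut when the low 13 bits are all set (assumed)
	minChunk = 2 << 10   // never cut before 2 KiB (assumed)
	maxChunk = 256 << 10 // hard cap; 262144 is the largest blob in the log

	prime uint32 = 16777619           // odd multiplier for the polynomial hash
	mask  uint32 = 1<<maskBits - 1
)

// splitSizes returns the chunk sizes produced by a simple polynomial
// rolling hash over the last `window` bytes of data.
func splitSizes(data []byte) []int {
	// pow = prime^window (mod 2^32), used to drop the byte leaving the window.
	var pow uint32 = 1
	for i := 0; i < window; i++ {
		pow *= prime
	}

	var sizes []int
	var hash uint32
	start := 0
	for i, b := range data {
		hash = hash*prime + uint32(b)
		if i >= window {
			hash -= pow * uint32(data[i-window])
		}
		size := i - start + 1
		if size >= maxChunk || (size >= minChunk && hash&mask == mask) {
			sizes = append(sizes, size)
			start = i + 1
		}
	}
	if start < len(data) {
		sizes = append(sizes, len(data)-start) // trailing partial chunk
	}
	return sizes
}

// randomData returns n pseudo-random bytes, standing in for /dev/urandom.
func randomData(n int, seed int64) []byte {
	data := make([]byte, n)
	rand.New(rand.NewSource(seed)).Read(data)
	return data
}

// mean returns the average of xs, or 0 for an empty slice.
func mean(xs []int) float64 {
	if len(xs) == 0 {
		return 0
	}
	total := 0
	for _, x := range xs {
		total += x
	}
	return float64(total) / float64(len(xs))
}

func main() {
	sizes := splitSizes(randomData(8<<20, 1))
	half := len(sizes) / 2
	fmt.Printf("chunks: %d\n", len(sizes))
	fmt.Printf("mean size, first half:  %.0f bytes\n", mean(sizes[:half]))
	fmt.Printf("mean size, second half: %.0f bytes\n", mean(sizes[half:]))
}
```

Comparing the mean chunk size of the first and second half of the output is one way to check whether a splitter on its own produces shrinking chunks on random input, or whether the small late blobs have to come from somewhere else.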
