It might be a variation in the process I tried. This shouldn't affect the statistics much anyway, given that the sample is big enough in both cases.
2016-07-25 17:46 GMT+02:00 Michael Dürig <[email protected]>:
>
> Interesting numbers. Most of them look as I would have expected. I.e. the
> distributions in the dumb case are more regular (smaller std. dev, mean and
> median closer to each other), bigger segment sizes, etc.
>
> What I don't understand is the total number of records. These numbers differ
> greatly between current and dumb. Is this a test artefact (i.e. test not
> reproducible) or are we missing out on something?
>
> Michael
>
>
> On 25.7.16 4:01 , Francesco Mari wrote:
>>
>> I put together some statistics [1] for the process I described above.
>> The "dumb" variant requires more segments to store the same amount of
>> data, because of the increased size of serialised record IDs. As you
>> can see the amount of records per segment is definitely lower in the
>> dumb variant.
>>
>> On the other hand, ignoring the growth of segment ID reference table
>> seems to be a good choice. As shown by the segment size average,
>> dumb segments are usually fuller than their counterpart. Moreover, a
>> lower standard deviation shows that it's more common to have full dumb
>> segments.
>>
>> In addition, my analysis seems to have found a bug too. There are a
>> lot of segments with no segment ID references and only one record,
>> which is very likely to be the segment info. The flush thread writes
>> the current segment buffer every 5 seconds, provided that the buffer
>> is not empty. It turns out that a segment buffer is never empty, since
>> it always contains at least one record. As such, we are currently
>> leaking almost empty segments every 5 seconds, which waste additional
>> space on disk because of the padding required by the TAR format.
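To make the flush issue described above a bit more concrete, here is a minimal sketch in Python. It is not Oak's code: `SegmentBuffer`, `flush_naive`, `flush_guarded` and `tar_entry_size` are hypothetical names made up for illustration, and the buffer is modelled as a plain list of records. The only hard fact used is that the TAR format stores headers and data in 512-byte blocks.

```python
TAR_BLOCK = 512  # the TAR format stores headers and data in 512-byte blocks


def tar_entry_size(payload_bytes: int) -> int:
    """Bytes one entry takes in a plain TAR file: header block + padded payload."""
    padded_blocks = -(-payload_bytes // TAR_BLOCK)  # ceiling division
    return TAR_BLOCK + padded_blocks * TAR_BLOCK


class SegmentBuffer:
    """Hypothetical stand-in for a segment buffer (not Oak's real class)."""

    def __init__(self) -> None:
        # Assumption taken from the thread: a fresh buffer already
        # contains one record, the segment info.
        self.records = ["segment-info"]

    def add(self, record: str) -> None:
        self.records.append(record)

    def is_empty(self) -> bool:
        return len(self.records) == 0  # never true, see __init__

    def has_content(self) -> bool:
        return len(self.records) > 1  # something besides the segment info


def flush_naive(buffer, written):
    """Flush whenever the buffer is 'not empty' (the leaking behaviour)."""
    if not buffer.is_empty():
        written.append(list(buffer.records))
        return SegmentBuffer()
    return buffer


def flush_guarded(buffer, written):
    """Flush only if the buffer holds more than the segment info record."""
    if buffer.has_content():
        written.append(list(buffer.records))
        return SegmentBuffer()
    return buffer
```

With this model, calling `flush_naive` on an idle buffer writes a one-record segment on every tick, while `flush_guarded` writes nothing until a real record has been added. In a plain TAR file even a 100-byte payload would occupy `tar_entry_size(100)`, i.e. 1024 bytes.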
>>
>> [1]: https://docs.google.com/spreadsheets/d/1gXhmPsm4rDyHnle4TUh-mtB2HRtRyADXALARRFDh7z4/edit?usp=sharing
>>
>> 2016-07-25 10:05 GMT+02:00 Michael Dürig <[email protected]>:
>>>
>>> Hi Jukka,
>>>
>>> Thanks for sharing your perspective and the historical background.
>>>
>>> I agree that repository size shouldn't be a primary concern. However, we
>>> have seen many repositories (especially with an external data store) where
>>> the content is extremely fine-grained. Much more than in an initial content
>>> installation of CQ (which I believe was one of the initial setups for
>>> collecting statistics). So we should at least understand the impact of the
>>> patch in various scenarios.
>>>
>>> My main concern is the cache footprint of node records. Those are made up of
>>> a list of record ids and would thus grow by a factor of 6 with the current
>>> patch.
>>>
>>> Locality is not so much of a concern here. I would expect it to actually
>>> improve as the patch gets rid of the 255 references limit of segments. A
>>> limit which in practical deployments leads to degeneration of segment sizes
>>> (I regularly see median sizes below 5k). See OAK-2896 for some background on
>>> this.
>>> Furthermore we already did a big step forward in improving locality in
>>> concurrent write scenarios when we introduced the SegmentBufferWriterPool.
>>> In essence: thread affinity for segments.
>>>
>>> We should probably be more carefully looking at the micro benchmarks. I
>>> guess we neglected this part a bit in the past. Unfortunately CI
>>> infrastructure isn't making this easy for us... OTOH those benchmarks only
>>> tell you so much. Many of the problems we recently faced only surfaced in
>>> the large: huge repos, high concurrent load, many days of traffic.
>>>
>>> Michael
>>>
>>>
>>> On 23.7.16 12:34 , Jukka Zitting wrote:
>>>>
>>>> Hi,
>>>>
>>>> Cool!
>>>> I'm pretty sure there are various ways in which the format could be
>>>> improved, as the original design was based mostly on intuition, guided
>>>> somewhat by collected stats <http://markmail.org/message/kxe3iy2hnodxsghe>
>>>> and the micro-benchmarks <https://issues.apache.org/jira/browse/OAK-119>
>>>> used to optimize common operations.
>>>>
>>>> Note though that the total size of the repository was not and probably
>>>> shouldn't be a primary metric, since the size of a typical repository is
>>>> governed mostly by binaries and string properties (though it's a good idea
>>>> to make sure you avoid things like duplicates of large binaries). Instead
>>>> the rationale for squeezing things like record ids to as few bytes as
>>>> possible is captured in the principles listed in the original design doc
>>>> <http://jackrabbit.apache.org/oak/docs/nodestore/segmentmk.html>:
>>>>
>>>> - Compactness. The formatting of records is optimized for size to reduce
>>>>   IO costs and to fit as much content in caches as possible. A node stored
>>>>   in SegmentNodeStore typically consumes only a fraction of the size it
>>>>   would as a bundle in Jackrabbit Classic.
>>>> - Locality. Segments are written so that related records, like a node
>>>>   and its immediate children, usually end up stored in the same segment.
>>>>   This makes tree traversals very fast and avoids most cache misses for
>>>>   typical clients that access more than one related node per session.
>>>>
>>>> Thus I would recommend keeping an eye also on benchmark results in addition
>>>> to raw repository size when evaluating possible improvements. Also, the
>>>> number and size of data segments are good size metrics to look at in
>>>> addition to total disk usage.
>>>>
>>>> BR,
>>>>
>>>> Jukka Zitting
>>>>
>>>> On Fri, Jul 22, 2016 at 5:55 AM Francesco Mari <[email protected]> wrote:
>>>>
>>>>> The impact on repository size needs to be assessed with more specific
>>>>> tests. In particular, I found RecordUsageAnalyserTest and
>>>>> SegmentSizeTest unsuitable for this task. It's not a coincidence that
>>>>> these tests are usually the first to be disabled or blindly updated
>>>>> every time a small fix changes the size of the records.
>>>>>
>>>>> Regarding GC, the segment graph could be computed during the mark
>>>>> phase. Of course, it's handy to have this information pre-computed for
>>>>> you, but since the record graph is traversed anyway we could think
>>>>> about dynamically reconstructing the segment graph when needed.
>>>>>
>>>>> There are still so many questions to answer, but I think that this
>>>>> simplification exercise can be worth the effort.
>>>>>
>>>>> 2016-07-22 11:34 GMT+02:00 Michael Dürig <[email protected]>:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Neat! I would have expected a greater impact on the size of the segment
>>>>>> store. But as you say it probably all depends on the binary/content
>>>>>> ratio. I think we should look at the #references / repository size ratio
>>>>>> for repositories of different structures and see how such a number
>>>>>> differs with and without the patch.
>>>>>>
>>>>>> I like the patch as it fixes OAK-2896 while at the same time reducing
>>>>>> complexity a lot.
>>>>>>
>>>>>> OTOH we need to figure out how to regain the lost functionality (e.g. gc)
>>>>>> and assess its impact on repository size.
>>>>>>
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>> On 22.7.16 11:32 , Francesco Mari wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Yesterday I took some time for a little experiment: how many
>>>>>>> optimisations can be removed from the current segment format while
>>>>>>> maintaining the same functionality?
>>>>>>>
>>>>>>> I did some work in a branch on GitHub [1]. The code on that branch is
>>>>>>> similar to the current trunk except for the following changes:
>>>>>>>
>>>>>>> 1. Record IDs are always serialised in their entirety. As such, a
>>>>>>> serialised record ID occupies 18 bytes instead of 3.
>>>>>>>
>>>>>>> 2. Because of the previous change, the table of referenced segment IDs
>>>>>>> is not needed anymore, so I removed it from the segment header. It
>>>>>>> turns out that this table is indeed needed for the mark phase of
>>>>>>> compaction, so this feature is broken in that branch.
>>>>>>>
>>>>>>> Anyway, since the code is in a runnable state, I generated some
>>>>>>> content using the current trunk and the dumber version of
>>>>>>> oak-segment-tar. This is the repository created by the dumb
>>>>>>> oak-segment-tar:
>>>>>>>
>>>>>>> 524744 data00000a.tar
>>>>>>> 524584 data00001a.tar
>>>>>>> 524688 data00002a.tar
>>>>>>> 460896 data00003a.tar
>>>>>>>      8 journal.log
>>>>>>>      0 repo.lock
>>>>>>>
>>>>>>> This is the one created by the current trunk:
>>>>>>>
>>>>>>> 524864 data00000a.tar
>>>>>>> 524656 data00001a.tar
>>>>>>> 524792 data00002a.tar
>>>>>>> 297288 data00003a.tar
>>>>>>>      8 journal.log
>>>>>>>      0 repo.lock
>>>>>>>
>>>>>>> The process that generates the content doesn't change between the two
>>>>>>> executions, and the generated content is coming from a real world
>>>>>>> scenario. For those familiar with it, the content is generated by an
>>>>>>> installation of Adobe Experience Manager.
>>>>>>>
>>>>>>> It looks like the size of the repository is not changing so much.
>>>>>>> Probably the de-optimisation in the small is dwarfed by the binary
>>>>>>> content in the large. Another effect of my change is that there is no
>>>>>>> limit on the number of referenced segment IDs per segment, and this
>>>>>>> might allow segments to pack more records than before.
>>>>>>>
>>>>>>> Questions apart, the clear advantage of this change is a great
>>>>>>> simplification of the code. I guess I can remove some more lines, but
>>>>>>> what I peeled off is already a considerable amount. Look at the code!
>>>>>>>
>>>>>>> Francesco
>>>>>>>
>>>>>>> [1]: https://github.com/francescomari/jackrabbit-oak/tree/dumb
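
For reference, the "18 bytes instead of 3" figure from the first message, and the "factor of 6" mentioned for lists of record ids, can be reproduced with a small Python sketch. The byte layouts here are assumptions for illustration only: a full record ID is modelled as a 16-byte segment ID plus a 2-byte offset, and a compact one as a 1-byte index into the segment ID reference table plus a 2-byte offset. The function names are hypothetical and do not correspond to Oak's API.

```python
import struct
import uuid


def serialise_full(segment_id: uuid.UUID, offset: int) -> bytes:
    """Assumed 'dumb' layout: 16-byte segment ID followed by a 2-byte offset."""
    return segment_id.bytes + struct.pack(">H", offset)


def serialise_compact(table_index: int, offset: int) -> bytes:
    """Assumed trunk layout: 1-byte reference table index + 2-byte offset."""
    return struct.pack(">BH", table_index, offset)


def id_list_size(id_count: int, bytes_per_id: int) -> int:
    """Size of a list of record IDs, ignoring any other record fields."""
    return id_count * bytes_per_id


full = serialise_full(uuid.uuid4(), 0x1234)
compact = serialise_compact(7, 0x1234)
growth = len(full) // len(compact)

print(len(full), len(compact), growth)
```

Under these assumptions a list of 10 record IDs takes `id_list_size(10, len(full))` bytes in the dumb variant against `id_list_size(10, len(compact))` in trunk, which is the growth Michael is concerned about for the node record cache footprint.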
