The recent discovery in OAK-4604 shows that my POC suffers from the same problem. I fixed it in my latest commit.
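For context, the leak described further down the thread (a segment buffer that always contains at least one record, so a plain "non-empty" check never suppresses the periodic flush) can be sketched as follows. This is a minimal illustrative model, not the actual oak-segment-tar code; the class, method names, and the 32-byte segment-info size are hypothetical:

```java
// Hypothetical model of the 5-second flush described in the thread.
// The segment buffer is seeded with a mandatory segment-info record,
// so checking "buffer not empty" before flushing is always true and a
// near-empty segment gets written (and TAR-padded) on every tick.
class SegmentBufferFlush {
    static final int SEGMENT_INFO_SIZE = 32; // assumed size of the mandatory record

    private int bufferedBytes = SEGMENT_INFO_SIZE; // written when the buffer is created

    void addRecord(int size) {
        bufferedBytes += size;
    }

    /** Buggy check: always true, so an almost empty segment is flushed every 5s. */
    boolean shouldFlushBuggy() {
        return bufferedBytes > 0;
    }

    /** Possible fix: flush only if records were added after the segment info. */
    boolean shouldFlushFixed() {
        return bufferedBytes > SEGMENT_INFO_SIZE;
    }

    public static void main(String[] args) {
        SegmentBufferFlush buffer = new SegmentBufferFlush();
        System.out.println("buggy check flushes empty buffer: " + buffer.shouldFlushBuggy());
        System.out.println("fixed check flushes empty buffer: " + buffer.shouldFlushFixed());
    }
}
```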
2016-07-26 11:47 GMT+02:00 Francesco Mari <[email protected]>:

> With my latest commits on this branch [1] I enabled every previously ignored test, fixing them when needed. The only two exceptions are RecordUsageAnalyserTest and SegmentSizeTest, which were simply deleted. I also added a couple of tests to cover the cases that work slightly differently than before.
>
> [1]: https://github.com/francescomari/jackrabbit-oak/tree/dumb
>
> 2016-07-25 17:48 GMT+02:00 Francesco Mari <[email protected]>:
>
>> It might be a variation in the process I tried. This shouldn't affect the statistics much anyway, given that the population sample is big enough in both cases.
>>
>> 2016-07-25 17:46 GMT+02:00 Michael Dürig <[email protected]>:
>>
>>> Interesting numbers. Most of them look as I would have expected, i.e. the distributions in the dumb case are more regular (smaller std. dev., mean and median closer to each other), bigger segment sizes, etc.
>>>
>>> What I don't understand is the total number of records. These numbers differ greatly between current and dumb. Is this a test artefact (i.e. the test is not reproducible) or are we missing out on something?
>>>
>>> Michael
>>>
>>> On 25.7.16 4:01, Francesco Mari wrote:
>>>
>>>> I put together some statistics [1] for the process I described above. The "dumb" variant requires more segments to store the same amount of data, because of the increased size of serialised record IDs. As you can see, the number of records per segment is definitely lower in the dumb variant.
>>>>
>>>> On the other hand, ignoring the growth of the segment ID reference table seems to be a good choice. As shown by the average segment size, dumb segments are usually fuller than their counterparts. Moreover, a lower standard deviation shows that full dumb segments are more common.
>>>>
>>>> In addition, my analysis seems to have found a bug too.
>>>> There are a lot of segments with no segment ID references and only one record, which is very likely to be the segment info. Every 5 seconds, the flush thread writes the current segment buffer, provided that the buffer is not empty. It turns out that a segment buffer is never empty, since it always contains at least one record. As such, we are currently leaking almost empty segments every 5 seconds, which waste additional space on disk because of the padding required by the TAR format.
>>>>
>>>> [1]: https://docs.google.com/spreadsheets/d/1gXhmPsm4rDyHnle4TUh-mtB2HRtRyADXALARRFDh7z4/edit?usp=sharing
>>>>
>>>> 2016-07-25 10:05 GMT+02:00 Michael Dürig <[email protected]>:
>>>>
>>>>> Hi Jukka,
>>>>>
>>>>> Thanks for sharing your perspective and the historical background.
>>>>>
>>>>> I agree that repository size shouldn't be a primary concern. However, we have seen many repositories (especially with an external data store) where the content is extremely fine-grained, much more so than in an initial content installation of CQ (which I believe was one of the initial setups for collecting statistics). So we should at least understand the impact of the patch in various scenarios.
>>>>>
>>>>> My main concern is the cache footprint of node records. Those are made up of a list of record IDs and would thus grow by a factor of 6 with the current patch.
>>>>>
>>>>> Locality is not so much of a concern here. I would expect it to actually improve, as the patch gets rid of the 255-references limit of segments, a limit which in practical deployments leads to degeneration of segment sizes (I regularly see median sizes below 5k). See OAK-2896 for some background on this. Furthermore, we already did a big step forward in improving locality in concurrent write scenarios when we introduced the SegmentBufferWriterPool.
>>>>> In essence: thread affinity for segments.
>>>>>
>>>>> We should probably look more carefully at the micro benchmarks. I guess we neglected this part a bit in the past. Unfortunately, CI infrastructure isn't making this easy for us... OTOH, those benchmarks only tell you so much. Many of the problems we recently faced only surfaced in the large: huge repos, high concurrent load, many days of traffic.
>>>>>
>>>>> Michael
>>>>>
>>>>> On 23.7.16 12:34, Jukka Zitting wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Cool! I'm pretty sure there are various ways in which the format could be improved, as the original design was based mostly on intuition, guided somewhat by collected stats <http://markmail.org/message/kxe3iy2hnodxsghe> and the micro-benchmarks <https://issues.apache.org/jira/browse/OAK-119> used to optimize common operations.
>>>>>>
>>>>>> Note though that the total size of the repository was not, and probably shouldn't be, a primary metric, since the size of a typical repository is governed mostly by binaries and string properties (though it's a good idea to make sure you avoid things like duplicates of large binaries). Instead, the rationale for squeezing things like record IDs into as few bytes as possible is captured in the principles listed in the original design doc <http://jackrabbit.apache.org/oak/docs/nodestore/segmentmk.html>:
>>>>>>
>>>>>> - Compactness. The formatting of records is optimized for size to reduce IO costs and to fit as much content in caches as possible. A node stored in SegmentNodeStore typically consumes only a fraction of the size it would as a bundle in Jackrabbit Classic.
>>>>>>
>>>>>> - Locality.
>>>>>> Segments are written so that related records, like a node and its immediate children, usually end up stored in the same segment. This makes tree traversals very fast and avoids most cache misses for typical clients that access more than one related node per session.
>>>>>>
>>>>>> Thus I would recommend keeping an eye also on benchmark results, in addition to raw repository size, when evaluating possible improvements. Also, the number and size of data segments are good size metrics to look at in addition to total disk usage.
>>>>>>
>>>>>> BR,
>>>>>>
>>>>>> Jukka Zitting
>>>>>>
>>>>>> On Fri, Jul 22, 2016 at 5:55 AM Francesco Mari <[email protected]> wrote:
>>>>>>
>>>>>>> The impact on repository size needs to be assessed with more specific tests. In particular, I found RecordUsageAnalyserTest and SegmentSizeTest unsuitable for this task. It's not a coincidence that these tests are usually the first to be disabled or blindly updated every time a small fix changes the size of the records.
>>>>>>>
>>>>>>> Regarding GC, the segment graph could be computed during the mark phase. Of course, it's handy to have this information pre-computed for you, but since the record graph is traversed anyway, we could think about dynamically reconstructing the segment graph when needed.
>>>>>>>
>>>>>>> There are still so many questions to answer, but I think that this simplification exercise can be worth the effort.
>>>>>>>
>>>>>>> 2016-07-22 11:34 GMT+02:00 Michael Dürig <[email protected]>:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Neat! I would have expected a greater impact on the size of the segment store. But as you say, it probably all depends on the binary/content ratio.
>>>>>>>> I think we should look at the #references / repository size ratio for repositories of different structures and see how such a number differs with and without the patch.
>>>>>>>>
>>>>>>>> I like the patch, as it fixes OAK-2896 while at the same time reducing complexity a lot.
>>>>>>>>
>>>>>>>> OTOH, we need to figure out how to regain the lost functionality (e.g. GC) and assess its impact on repository size.
>>>>>>>>
>>>>>>>> Michael
>>>>>>>>
>>>>>>>> On 22.7.16 11:32, Francesco Mari wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Yesterday I took some time for a little experiment: how many optimisations can be removed from the current segment format while maintaining the same functionality?
>>>>>>>>>
>>>>>>>>> I did some work in a branch on GitHub [1]. The code on that branch is similar to the current trunk except for the following changes:
>>>>>>>>>
>>>>>>>>> 1. Record IDs are always serialised in their entirety. As such, a serialised record ID occupies 18 bytes instead of 3.
>>>>>>>>>
>>>>>>>>> 2. Because of the previous change, the table of referenced segment IDs is not needed anymore, so I removed it from the segment header. It turns out that this table is indeed needed for the mark phase of compaction, so this feature is broken in that branch.
>>>>>>>>>
>>>>>>>>> Anyway, since the code is in a runnable state, I generated some content using the current trunk and the dumber version of oak-segment-tar.
>>>>>>>>> This is the repository created by the dumb oak-segment-tar:
>>>>>>>>>
>>>>>>>>>   524744 data00000a.tar
>>>>>>>>>   524584 data00001a.tar
>>>>>>>>>   524688 data00002a.tar
>>>>>>>>>   460896 data00003a.tar
>>>>>>>>>        8 journal.log
>>>>>>>>>        0 repo.lock
>>>>>>>>>
>>>>>>>>> This is the one created by the current trunk:
>>>>>>>>>
>>>>>>>>>   524864 data00000a.tar
>>>>>>>>>   524656 data00001a.tar
>>>>>>>>>   524792 data00002a.tar
>>>>>>>>>   297288 data00003a.tar
>>>>>>>>>        8 journal.log
>>>>>>>>>        0 repo.lock
>>>>>>>>>
>>>>>>>>> The process that generates the content doesn't change between the two executions, and the generated content comes from a real-world scenario. For those familiar with it, the content is generated by an installation of Adobe Experience Manager.
>>>>>>>>>
>>>>>>>>> It looks like the size of the repository doesn't change much. Probably the de-optimisation in the small is dwarfed by the binary content in the large. Another effect of my change is that there is no limit on the number of referenced segment IDs per segment, and this might allow segments to pack more records than before.
>>>>>>>>>
>>>>>>>>> Questions apart, the clear advantage of this change is a great simplification of the code. I guess I can remove some more lines, but what I peeled off is already a considerable amount. Look at the code!
>>>>>>>>>
>>>>>>>>> Francesco
>>>>>>>>>
>>>>>>>>> [1]: https://github.com/francescomari/jackrabbit-oak/tree/dumb
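The size difference behind change (1) in the quoted message, 18 bytes per serialised record ID instead of 3, can be sketched as below. The field layout is assumed for illustration (1-byte index into the segment's reference table plus 2-byte offset for the short form; 16-byte segment UUID plus 2-byte offset for the full form) and the helper names are hypothetical, not the actual Oak serialisation code:

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Sketch of the two record ID encodings discussed in the thread.
class RecordIdSizes {
    // Short form: index into the segment's table of referenced segment IDs
    // (at most 255 entries, the limit behind OAK-2896) plus a record offset.
    static byte[] writeShort(int refIndex, int offset) {
        return ByteBuffer.allocate(3)
                .put((byte) refIndex)     // 1 byte: reference table index
                .putShort((short) offset) // 2 bytes: record offset
                .array();
    }

    // Full form: the complete segment UUID plus a record offset, which makes
    // the per-segment reference table unnecessary.
    static byte[] writeFull(UUID segmentId, int offset) {
        return ByteBuffer.allocate(18)
                .putLong(segmentId.getMostSignificantBits())  // 8 bytes
                .putLong(segmentId.getLeastSignificantBits()) // 8 bytes
                .putShort((short) offset)                     // 2 bytes
                .array();
    }

    public static void main(String[] args) {
        int shortLen = writeShort(1, 512).length;
        int fullLen = writeFull(UUID.randomUUID(), 512).length;
        System.out.println("short form: " + shortLen + " bytes, full form: " + fullLen + " bytes");
    }
}
```

Since node records are essentially lists of record IDs, this 6x larger encoding is what drives the cache-footprint concern raised earlier in the thread, traded against unlimited cross-segment references.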
