Re: Are dumb segments dumb?

Francesco Mari Wed, 10 Aug 2016 03:20:07 -0700

While the testing effort on dumb segments is ongoing, I opened
OAK-4659 and attached a patch to it. This change is based on the dumb
segments, and improves the format by implementing logical record IDs.
This way, records can be addressed by a record number instead of using
their offsets inside the segment.
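
To illustrate addressing by record number, here is a minimal sketch. It
is not the code from the OAK-4659 patch: the class RecordNumberTable and
its methods are hypothetical names, and the per-segment table they model
is an assumption made only for this example. The point is that a
reference stores a record number, and the number is resolved to an
offset through the table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the OAK-4659 patch: a per-segment table that
// maps logical record numbers to offsets inside the segment.
final class RecordNumberTable {

    // offsets.get(n) is the offset inside the segment of record number n.
    private final List<Integer> offsets = new ArrayList<>();

    // Registers a record stored at the given offset and returns its number.
    int add(int offset) {
        checkOffset(offset);
        offsets.add(offset);
        return offsets.size() - 1;
    }

    // Resolves a record number to the offset currently assigned to it.
    int offsetOf(int recordNumber) {
        if (recordNumber < 0 || recordNumber >= offsets.size()) {
            throw new IllegalArgumentException("unknown record number: " + recordNumber);
        }
        return offsets.get(recordNumber);
    }

    // Assigns a new offset to an existing record. References that hold the
    // record number keep resolving correctly (an assumed benefit, see above).
    void relocate(int recordNumber, int newOffset) {
        offsetOf(recordNumber); // validates the record number
        checkOffset(newOffset);
        offsets.set(recordNumber, newOffset);
    }

    private static void checkOffset(int offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("negative offset: " + offset);
        }
    }
}
```

With a table like this, a reference is a (segment ID, record number)
pair, and the offset becomes a detail internal to the segment. Whether
the patch uses this to move records around is not stated here; the
relocate method only shows why the indirection could make that possible.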


2016-07-27 17:06 GMT+02:00 Michael Dürig <[email protected]>:
>
> Looks good! I think we should give this one a spin. Some minor points we
> should keep an eye on before we commit this though:
>
> - does tooling still work with the changes in the segment format? Some of
> them access the segments directly such that expanding the segment header by
> 2 bytes might break them.
>
> - have a look at the micro benchmarks and compare to before.
>
> - remind us to remember ;-) updating the documentation of the segment format
> at some point
>
> - I would like to have something along the lines of the segment size test
> back. Probably not as a unit test but more as a benchmark for record sizes.
> So instead of it failing the build, it would output some numbers which we
> could then graph in much the same way as the performance benchmarks.
>
> Michael
>
>
>
> On 26.7.16 11:47 , Francesco Mari wrote:
>>
>> With my latest commits on this branch [1] I enabled every previously
>> ignored test, fixing them when needed. The only two exceptions are
>> RecordUsageAnalyserTest and SegmentSizeTest, which were simply deleted.
>> I also added a couple of tests to cover the cases that work slightly
>> differently than before.
>>
>> [1]: https://github.com/francescomari/jackrabbit-oak/tree/dumb
>>
>> 2016-07-25 17:48 GMT+02:00 Francesco Mari <[email protected]>:
>>>
>>> It might be a variation in the process I tried. This shouldn't affect
>>> the statistics much anyway, given that the population sample is big
>>> enough in both cases.
>>>
>>> 2016-07-25 17:46 GMT+02:00 Michael Dürig <[email protected]>:
>>>>
>>>>
>>>> Interesting numbers. Most of them look as I would have expected. I.e.
>>>> the distributions in the dumb case are more regular (smaller std. dev,
>>>> mean and median closer to each other), bigger segment sizes, etc.
>>>>
>>>> What I don't understand is the total number of records. These numbers
>>>> differ greatly between current and dumb. Is this a test artefact (i.e.
>>>> test not reproducible) or are we missing out on something?
>>>>
>>>> Michael
>>>>
>>>>
>>>> On 25.7.16 4:01 , Francesco Mari wrote:
>>>>>
>>>>>
>>>>> I put together some statistics [1] for the process I described above.
>>>>> The "dumb" variant requires more segments to store the same amount of
>>>>> data, because of the increased size of serialised record IDs.  As you
>>>>> can see the amount of records per segment is definitely lower in the
>>>>> dumb variant.
>>>>>
>>>>> On the other hand, ignoring the growth of the segment ID reference
>>>>> table seems to be a good choice. As shown by the segment size average,
>>>>> dumb segments are usually fuller than their counterparts. Moreover, a
>>>>> lower standard deviation shows that it's more common to have full dumb
>>>>> segments.
>>>>>
>>>>> In addition, my analysis seems to have found a bug too. There are a
>>>>> lot of segments with no segment ID references and only one record,
>>>>> which is very likely to be the segment info. The flush thread writes
>>>>> the current segment buffer every 5 seconds, provided that the buffer
>>>>> is not empty. It turns out that a segment buffer is never empty, since
>>>>> it always contains at least one record. As such, we are currently
>>>>> leaking almost empty segments every 5 seconds, which waste additional
>>>>> space on disk because of the padding required by the TAR format.
>>>>>
>>>>> [1]:
>>>>>
>>>>> https://docs.google.com/spreadsheets/d/1gXhmPsm4rDyHnle4TUh-mtB2HRtRyADXALARRFDh7z4/edit?usp=sharing
>>>>>
>>>>> 2016-07-25 10:05 GMT+02:00 Michael Dürig <[email protected]>:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi Jukka,
>>>>>>
>>>>>> Thanks for sharing your perspective and the historical background.
>>>>>>
>>>>>> I agree that repository size shouldn't be a primary concern. However,
>>>>>> we have seen many repositories (especially with an external data
>>>>>> store) where the content is extremely fine-grained. Much more than in
>>>>>> an initial content installation of CQ (which I believe was one of the
>>>>>> initial setups for collecting statistics). So we should at least
>>>>>> understand the impact of the patch in various scenarios.
>>>>>>
>>>>>> My main concern is the cache footprint of node records. Those are made
>>>>>> up of a list of record ids and would thus grow by a factor of 6 with
>>>>>> the current patch.
>>>>>>
>>>>>> Locality is not so much of a concern here. I would expect it to
>>>>>> actually improve as the patch gets rid of the 255 references limit of
>>>>>> segments. A limit which in practical deployments leads to degeneration
>>>>>> of segment sizes (I regularly see median sizes below 5k). See OAK-2896
>>>>>> for some background on this.
>>>>>> Furthermore we already took a big step forward in improving locality
>>>>>> in concurrent write scenarios when we introduced the
>>>>>> SegmentBufferWriterPool. In essence: thread affinity for segments.
>>>>>>
>>>>>> We should probably be looking more carefully at the micro benchmarks.
>>>>>> I guess we neglected this part a bit in the past. Unfortunately CI
>>>>>> infrastructure isn't making this easy for us... OTOH those benchmarks
>>>>>> only tell you so much. Many of the problems we recently faced only
>>>>>> surfaced in the large: huge repos, high concurrent load, many days of
>>>>>> traffic.
>>>>>>
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 23.7.16 12:34 , Jukka Zitting wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Cool! I'm pretty sure there are various ways in which the format
>>>>>>> could be improved, as the original design was based mostly on
>>>>>>> intuition, guided somewhat by collected stats
>>>>>>> <http://markmail.org/message/kxe3iy2hnodxsghe> and the
>>>>>>> micro-benchmarks <https://issues.apache.org/jira/browse/OAK-119>
>>>>>>> used to optimize common operations.
>>>>>>>
>>>>>>> Note though that the total size of the repository was not and
>>>>>>> probably shouldn't be a primary metric, since the size of a typical
>>>>>>> repository is governed mostly by binaries and string properties
>>>>>>> (though it's a good idea to make sure you avoid things like
>>>>>>> duplicates of large binaries). Instead the rationale for squeezing
>>>>>>> things like record ids to as few bytes as possible is captured in the
>>>>>>> principles listed in the original design doc
>>>>>>> <http://jackrabbit.apache.org/oak/docs/nodestore/segmentmk.html>:
>>>>>>>
>>>>>>>    - Compactness. The formatting of records is optimized for size to
>>>>>>>    reduce IO costs and to fit as much content in caches as possible.
>>>>>>>    A node stored in SegmentNodeStore typically consumes only a
>>>>>>>    fraction of the size it would as a bundle in Jackrabbit Classic.
>>>>>>>    - Locality. Segments are written so that related records, like a
>>>>>>>    node and its immediate children, usually end up stored in the same
>>>>>>>    segment. This makes tree traversals very fast and avoids most
>>>>>>>    cache misses for typical clients that access more than one related
>>>>>>>    node per session.
>>>>>>>
>>>>>>> Thus I would recommend keeping an eye also on benchmark results in
>>>>>>> addition to raw repository size when evaluating possible
>>>>>>> improvements. Also, the number and size of data segments are good
>>>>>>> size metrics to look at in addition to total disk usage.
>>>>>>>
>>>>>>> BR,
>>>>>>>
>>>>>>> Jukka Zitting
>>>>>>>
>>>>>>> On Fri, Jul 22, 2016 at 5:55 AM Francesco Mari
>>>>>>> <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> The impact on repository size needs to be assessed with more
>>>>>>>> specific tests. In particular, I found RecordUsageAnalyserTest and
>>>>>>>> SegmentSizeTest unsuitable for this task. It's not a coincidence that
>>>>>>>> these tests are usually the first to be disabled or blindly updated
>>>>>>>> every time a small fix changes the size of the records.
>>>>>>>>
>>>>>>>> Regarding GC, the segment graph could be computed during the mark
>>>>>>>> phase. Of course, it's handy to have this information pre-computed
>>>>>>>> for you, but since the record graph is traversed anyway we could
>>>>>>>> think about dynamically reconstructing the segment graph when needed.
>>>>>>>>
>>>>>>>> There are still so many questions to answer, but I think that this
>>>>>>>> simplification exercise can be worth the effort.
>>>>>>>>
>>>>>>>> 2016-07-22 11:34 GMT+02:00 Michael Dürig <[email protected]>:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Neat! I would have expected a greater impact on the size of the
>>>>>>>>> segment store. But as you say it probably all depends on the
>>>>>>>>> binary/content ratio. I think we should look at the #references /
>>>>>>>>> repository size ratio for repositories of different structures and
>>>>>>>>> see how such a number differs with and without the patch.
>>>>>>>>>
>>>>>>>>> I like the patch as it fixes OAK-2896 while at the same time
>>>>>>>>> reducing complexity a lot.
>>>>>>>>>
>>>>>>>>> OTOH we need to figure out how to regain the lost functionality
>>>>>>>>> (e.g. gc) and assess its impact on repository size.
>>>>>>>>>
>>>>>>>>> Michael
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 22.7.16 11:32 , Francesco Mari wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Yesterday I took some time for a little experiment: how many
>>>>>>>>>> optimisations can be removed from the current segment format while
>>>>>>>>>> maintaining the same functionality?
>>>>>>>>>>
>>>>>>>>>> I did some work in a branch on GitHub [1]. The code on that branch
>>>>>>>>>> is similar to the current trunk except for the following changes:
>>>>>>>>>>
>>>>>>>>>> 1. Record IDs are always serialised in their entirety. As such, a
>>>>>>>>>> serialised record ID occupies 18 bytes instead of 3.
>>>>>>>>>>
>>>>>>>>>> 2. Because of the previous change, the table of referenced segment
>>>>>>>>>> IDs is not needed anymore, so I removed it from the segment header.
>>>>>>>>>> It turns out that this table is indeed needed for the mark phase of
>>>>>>>>>> compaction, so this feature is broken in that branch.
>>>>>>>>>>
>>>>>>>>>> Anyway, since the code is in a runnable state, I generated some
>>>>>>>>>> content using the current trunk and the dumber version of
>>>>>>>>>> oak-segment-tar. This is the repository created by the dumb
>>>>>>>>>> oak-segment-tar:
>>>>>>>>>>
>>>>>>>>>> 524744 data00000a.tar
>>>>>>>>>> 524584 data00001a.tar
>>>>>>>>>> 524688 data00002a.tar
>>>>>>>>>> 460896 data00003a.tar
>>>>>>>>>> 8 journal.log
>>>>>>>>>> 0 repo.lock
>>>>>>>>>>
>>>>>>>>>> This is the one created by the current trunk:
>>>>>>>>>>
>>>>>>>>>> 524864 data00000a.tar
>>>>>>>>>> 524656 data00001a.tar
>>>>>>>>>> 524792 data00002a.tar
>>>>>>>>>> 297288 data00003a.tar
>>>>>>>>>> 8 journal.log
>>>>>>>>>> 0 repo.lock
>>>>>>>>>>
>>>>>>>>>> The process that generates the content doesn't change between the
>>>>>>>>>> two executions, and the generated content is coming from a real
>>>>>>>>>> world scenario. For those familiar with it, the content is
>>>>>>>>>> generated by an installation of Adobe Experience Manager.
>>>>>>>>>>
>>>>>>>>>> It looks like the size of the repository is not changing much.
>>>>>>>>>> Probably the de-optimisation in the small is dwarfed by the binary
>>>>>>>>>> content in the large. Another effect of my change is that there is
>>>>>>>>>> no limit on the number of referenced segment IDs per segment, and
>>>>>>>>>> this might allow segments to pack more records than before.
>>>>>>>>>>
>>>>>>>>>> Questions apart, the clear advantage of this change is a great
>>>>>>>>>> simplification of the code. I guess I can remove some more lines,
>>>>>>>>>> but what I peeled off is already a considerable amount. Look at the
>>>>>>>>>> code!
>>>>>>>>>>
>>>>>>>>>> Francesco
>>>>>>>>>>
>>>>>>>>>> [1]: https://github.com/francescomari/jackrabbit-oak/tree/dumb
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>
