The recent discovery in OAK-4604 shows that my POC suffers from the same problem. I fixed it in my latest commit.
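For context, the leak described further down the thread (a segment buffer that always contains at least one record, so a plain "non-empty" check never suppresses the periodic flush) can be sketched as follows. This is a minimal illustrative model, not the actual oak-segment-tar code; the class, method names, and the 32-byte segment-info size are hypothetical:

```java
// Hypothetical model of the 5-second flush described in the thread.
// The segment buffer is seeded with a mandatory segment-info record,
// so checking "buffer not empty" before flushing is always true and a
// near-empty segment gets written (and TAR-padded) on every tick.
class SegmentBufferFlush {
    static final int SEGMENT_INFO_SIZE = 32; // assumed size of the mandatory record

    private int bufferedBytes = SEGMENT_INFO_SIZE; // written when the buffer is created

    void addRecord(int size) {
        bufferedBytes += size;
    }

    /** Buggy check: always true, so an almost empty segment is flushed every 5s. */
    boolean shouldFlushBuggy() {
        return bufferedBytes > 0;
    }

    /** Possible fix: flush only if records were added after the segment info. */
    boolean shouldFlushFixed() {
        return bufferedBytes > SEGMENT_INFO_SIZE;
    }

    public static void main(String[] args) {
        SegmentBufferFlush buffer = new SegmentBufferFlush();
        System.out.println("buggy check flushes empty buffer: " + buffer.shouldFlushBuggy());
        System.out.println("fixed check flushes empty buffer: " + buffer.shouldFlushFixed());
    }
}
```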
2016-07-26 11:47 GMT+02:00 Francesco Mari <[email protected]>:

> With my latest commits on this branch [1] I enabled every previously ignored test, fixing them when needed. The only two exceptions are RecordUsageAnalyserTest and SegmentSizeTest, which were simply deleted. I also added a couple of tests to cover the cases that work slightly differently than before.
>
> [1]: https://github.com/francescomari/jackrabbit-oak/tree/dumb
>
> 2016-07-25 17:48 GMT+02:00 Francesco Mari <[email protected]>:
>
>> It might be a variation in the process I tried. This shouldn't affect the statistics much anyway, given that the population sample is big enough in both cases.
>>
>> 2016-07-25 17:46 GMT+02:00 Michael Dürig <[email protected]>:
>>
>>> Interesting numbers. Most of them look as I would have expected, i.e. the distributions in the dumb case are more regular (smaller std. dev., mean and median closer to each other), bigger segment sizes, etc.
>>>
>>> What I don't understand is the total number of records. These numbers differ greatly between current and dumb. Is this a test artefact (i.e. the test is not reproducible) or are we missing out on something?
>>>
>>> Michael
>>>
>>> On 25.7.16 4:01, Francesco Mari wrote:
>>>
>>>> I put together some statistics [1] for the process I described above. The "dumb" variant requires more segments to store the same amount of data, because of the increased size of serialised record IDs. As you can see, the number of records per segment is definitely lower in the dumb variant.
>>>>
>>>> On the other hand, ignoring the growth of the segment ID reference table seems to be a good choice. As shown by the average segment size, dumb segments are usually fuller than their counterparts. Moreover, a lower standard deviation shows that full dumb segments are more common.
>>>>
>>>> In addition, my analysis seems to have found a bug too.
>>>> There are a lot of segments with no segment ID references and only one record, which is very likely to be the segment info. Every 5 seconds, the flush thread writes the current segment buffer, provided that the buffer is not empty. It turns out that a segment buffer is never empty, since it always contains at least one record. As such, we are currently leaking almost empty segments every 5 seconds, which waste additional space on disk because of the padding required by the TAR format.
>>>>
>>>> [1]: https://docs.google.com/spreadsheets/d/1gXhmPsm4rDyHnle4TUh-mtB2HRtRyADXALARRFDh7z4/edit?usp=sharing
>>>>
>>>> 2016-07-25 10:05 GMT+02:00 Michael Dürig <[email protected]>:
>>>>
>>>>> Hi Jukka,
>>>>>
>>>>> Thanks for sharing your perspective and the historical background.
>>>>>
>>>>> I agree that repository size shouldn't be a primary concern. However, we have seen many repositories (especially with an external data store) where the content is extremely fine-grained, much more so than in an initial content installation of CQ (which I believe was one of the initial setups for collecting statistics). So we should at least understand the impact of the patch in various scenarios.
>>>>>
>>>>> My main concern is the cache footprint of node records. Those are made up of a list of record IDs and would thus grow by a factor of 6 with the current patch.
>>>>>
>>>>> Locality is not so much of a concern here. I would expect it to actually improve, as the patch gets rid of the 255-references limit of segments, a limit which in practical deployments leads to degeneration of segment sizes (I regularly see median sizes below 5k). See OAK-2896 for some background on this. Furthermore, we already did a big step forward in improving locality in concurrent write scenarios when we introduced the SegmentBufferWriterPool.
>>>>> In essence: thread affinity for segments.
>>>>>
>>>>> We should probably look more carefully at the micro benchmarks. I guess we neglected this part a bit in the past. Unfortunately, CI infrastructure isn't making this easy for us... OTOH, those benchmarks only tell you so much. Many of the problems we recently faced only surfaced in the large: huge repos, high concurrent load, many days of traffic.
>>>>>
>>>>> Michael
>>>>>
>>>>> On 23.7.16 12:34, Jukka Zitting wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Cool! I'm pretty sure there are various ways in which the format could be improved, as the original design was based mostly on intuition, guided somewhat by collected stats <http://markmail.org/message/kxe3iy2hnodxsghe> and the micro-benchmarks <https://issues.apache.org/jira/browse/OAK-119> used to optimize common operations.
>>>>>>
>>>>>> Note though that the total size of the repository was not, and probably shouldn't be, a primary metric, since the size of a typical repository is governed mostly by binaries and string properties (though it's a good idea to make sure you avoid things like duplicates of large binaries). Instead, the rationale for squeezing things like record IDs into as few bytes as possible is captured in the principles listed in the original design doc <http://jackrabbit.apache.org/oak/docs/nodestore/segmentmk.html>:
>>>>>>
>>>>>> - Compactness. The formatting of records is optimized for size to reduce IO costs and to fit as much content in caches as possible. A node stored in SegmentNodeStore typically consumes only a fraction of the size it would as a bundle in Jackrabbit Classic.
>>>>>>
>>>>>> - Locality.
>>>>>> Segments are written so that related records, like a node and its immediate children, usually end up stored in the same segment. This makes tree traversals very fast and avoids most cache misses for typical clients that access more than one related node per session.
>>>>>>
>>>>>> Thus I would recommend keeping an eye also on benchmark results, in addition to raw repository size, when evaluating possible improvements. Also, the number and size of data segments are good size metrics to look at in addition to total disk usage.
>>>>>>
>>>>>> BR,
>>>>>>
>>>>>> Jukka Zitting
>>>>>>
>>>>>> On Fri, Jul 22, 2016 at 5:55 AM Francesco Mari <[email protected]> wrote:
>>>>>>
>>>>>>> The impact on repository size needs to be assessed with more specific tests. In particular, I found RecordUsageAnalyserTest and SegmentSizeTest unsuitable for this task. It's not a coincidence that these tests are usually the first to be disabled or blindly updated every time a small fix changes the size of the records.
>>>>>>>
>>>>>>> Regarding GC, the segment graph could be computed during the mark phase. Of course, it's handy to have this information pre-computed for you, but since the record graph is traversed anyway, we could think about dynamically reconstructing the segment graph when needed.
>>>>>>>
>>>>>>> There are still so many questions to answer, but I think that this simplification exercise can be worth the effort.
>>>>>>>
>>>>>>> 2016-07-22 11:34 GMT+02:00 Michael Dürig <[email protected]>:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Neat! I would have expected a greater impact on the size of the segment store. But as you say, it probably all depends on the binary/content ratio.
>>>>>>>> I think we should look at the #references / repository size ratio for repositories of different structures and see how such a number differs with and without the patch.
>>>>>>>>
>>>>>>>> I like the patch, as it fixes OAK-2896 while at the same time reducing complexity a lot.
>>>>>>>>
>>>>>>>> OTOH, we need to figure out how to regain the lost functionality (e.g. GC) and assess its impact on repository size.
>>>>>>>>
>>>>>>>> Michael
>>>>>>>>
>>>>>>>> On 22.7.16 11:32, Francesco Mari wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Yesterday I took some time for a little experiment: how many optimisations can be removed from the current segment format while maintaining the same functionality?
>>>>>>>>>
>>>>>>>>> I did some work in a branch on GitHub [1]. The code on that branch is similar to the current trunk except for the following changes:
>>>>>>>>>
>>>>>>>>> 1. Record IDs are always serialised in their entirety. As such, a serialised record ID occupies 18 bytes instead of 3.
>>>>>>>>>
>>>>>>>>> 2. Because of the previous change, the table of referenced segment IDs is not needed anymore, so I removed it from the segment header. It turns out that this table is indeed needed for the mark phase of compaction, so this feature is broken in that branch.
>>>>>>>>>
>>>>>>>>> Anyway, since the code is in a runnable state, I generated some content using the current trunk and the dumber version of oak-segment-tar.
>>>>>>>>> This is the repository created by the dumb oak-segment-tar:
>>>>>>>>>
>>>>>>>>>   524744 data00000a.tar
>>>>>>>>>   524584 data00001a.tar
>>>>>>>>>   524688 data00002a.tar
>>>>>>>>>   460896 data00003a.tar
>>>>>>>>>        8 journal.log
>>>>>>>>>        0 repo.lock
>>>>>>>>>
>>>>>>>>> This is the one created by the current trunk:
>>>>>>>>>
>>>>>>>>>   524864 data00000a.tar
>>>>>>>>>   524656 data00001a.tar
>>>>>>>>>   524792 data00002a.tar
>>>>>>>>>   297288 data00003a.tar
>>>>>>>>>        8 journal.log
>>>>>>>>>        0 repo.lock
>>>>>>>>>
>>>>>>>>> The process that generates the content doesn't change between the two executions, and the generated content comes from a real-world scenario. For those familiar with it, the content is generated by an installation of Adobe Experience Manager.
>>>>>>>>>
>>>>>>>>> It looks like the size of the repository doesn't change much. Probably the de-optimisation in the small is dwarfed by the binary content in the large. Another effect of my change is that there is no limit on the number of referenced segment IDs per segment, and this might allow segments to pack more records than before.
>>>>>>>>>
>>>>>>>>> Questions apart, the clear advantage of this change is a great simplification of the code. I guess I can remove some more lines, but what I peeled off is already a considerable amount. Look at the code!
>>>>>>>>>
>>>>>>>>> Francesco
>>>>>>>>>
>>>>>>>>> [1]: https://github.com/francescomari/jackrabbit-oak/tree/dumb
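The size difference behind change (1) in the quoted message, 18 bytes per serialised record ID instead of 3, can be sketched as below. The field layout is assumed for illustration (1-byte index into the segment's reference table plus 2-byte offset for the short form; 16-byte segment UUID plus 2-byte offset for the full form) and the helper names are hypothetical, not the actual Oak serialisation code:

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Sketch of the two record ID encodings discussed in the thread.
class RecordIdSizes {
    // Short form: index into the segment's table of referenced segment IDs
    // (at most 255 entries, the limit behind OAK-2896) plus a record offset.
    static byte[] writeShort(int refIndex, int offset) {
        return ByteBuffer.allocate(3)
                .put((byte) refIndex)     // 1 byte: reference table index
                .putShort((short) offset) // 2 bytes: record offset
                .array();
    }

    // Full form: the complete segment UUID plus a record offset, which makes
    // the per-segment reference table unnecessary.
    static byte[] writeFull(UUID segmentId, int offset) {
        return ByteBuffer.allocate(18)
                .putLong(segmentId.getMostSignificantBits())  // 8 bytes
                .putLong(segmentId.getLeastSignificantBits()) // 8 bytes
                .putShort((short) offset)                     // 2 bytes
                .array();
    }

    public static void main(String[] args) {
        int shortLen = writeShort(1, 512).length;
        int fullLen = writeFull(UUID.randomUUID(), 512).length;
        System.out.println("short form: " + shortLen + " bytes, full form: " + fullLen + " bytes");
    }
}
```

Since node records are essentially lists of record IDs, this 6x larger encoding is what drives the cache-footprint concern raised earlier in the thread, traded against unlimited cross-segment references.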
