I opened OAK-4596 to track the segment leak.

2016-07-25 16:01 GMT+02:00 Francesco Mari <[email protected]>:
> I put together some statistics [1] for the process I described above.
> The "dumb" variant requires more segments to store the same amount of
> data, because of the increased size of serialised record IDs.  As you
> can see the amount of records per segment is definitely lower in the
> dumb variant.
>
> On the other hand, ignoring the growth of the segment ID reference table
> seems to be a good choice. As shown by the average segment size, dumb
> segments are usually fuller than their counterparts. Moreover, a lower
> standard deviation shows that full dumb segments are more common.
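The spreadsheet itself isn't reproduced here. As a minimal sketch, assuming a plain list of segment sizes in bytes and the population standard deviation (the spreadsheet may use the sample variant), the two summary figures could be computed like this. The class name is hypothetical:

```java
import java.util.List;

// Hypothetical helper: summarises segment sizes (in bytes) with the mean and
// the population standard deviation. Assumes a non-empty list.
final class SegmentSizeStats {
    private SegmentSizeStats() {}

    static double mean(List<Integer> sizes) {
        double sum = 0;
        for (int size : sizes) {
            sum += size;
        }
        return sum / sizes.size();
    }

    static double standardDeviation(List<Integer> sizes) {
        double mean = mean(sizes);
        double squares = 0;
        for (int size : sizes) {
            double delta = size - mean;
            squares += delta * delta;
        }
        // Population variant: divide by n, not n - 1.
        return Math.sqrt(squares / sizes.size());
    }
}
```

A higher mean together with a lower deviation means the sizes cluster closer to the maximum segment size, which is the "fuller segments" reading above.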
>
> In addition, my analysis seems to have found a bug. There are a lot of
> segments with no segment ID references and only one record, which is
> very likely the segment info. Every 5 seconds, the flush thread writes
> the current segment buffer, provided that the buffer is not empty. It
> turns out that a segment buffer is never empty, since it always
> contains at least one record. As a result, we are currently leaking
> almost empty segments every 5 seconds, which waste additional space on
> disk because of the padding required by the TAR format.
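A minimal sketch of the flush condition, assuming a hypothetical buffer that always starts with the segment info record. It is not the actual OAK-4596 fix, only an illustration of why an "is the buffer empty?" check never skips a flush, and what a check that ignores the segment info record could look like:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a segment buffer. A fresh buffer already holds one
// record (the segment info), so it is never empty.
final class SegmentBufferSketch {
    private final List<String> records = new ArrayList<>();
    private int flushedSegments = 0;

    SegmentBufferSketch() {
        newSegment();
    }

    private void newSegment() {
        records.clear();
        records.add("segment-info"); // always present in a fresh buffer
    }

    void writeRecord(String record) {
        records.add(record);
    }

    // The check described above: true even when only the segment info is there.
    boolean shouldFlushBuggy() {
        return !records.isEmpty();
    }

    // Alternative check: flush only if something besides the segment info was written.
    boolean shouldFlush() {
        return records.size() > 1;
    }

    // Meant to be called periodically by a flush thread (every 5 seconds above).
    void periodicFlush() {
        if (shouldFlush()) {
            flushedSegments++;
            newSegment();
        }
    }

    int flushedSegments() {
        return flushedSegments;
    }
}
```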
>
> [1]: https://docs.google.com/spreadsheets/d/1gXhmPsm4rDyHnle4TUh-mtB2HRtRyADXALARRFDh7z4/edit?usp=sharing
>
> 2016-07-25 10:05 GMT+02:00 Michael Dürig <[email protected]>:
>>
>> Hi Jukka,
>>
>> Thanks for sharing your perspective and the historical background.
>>
>> I agree that repository size shouldn't be a primary concern. However, we
>> have seen many repositories (especially with an external data store) where
>> the content is extremely fine-grained, much more so than in an initial
>> content installation of CQ (which I believe was one of the initial setups
>> for collecting statistics). So we should at least understand the impact of
>> the patch in various scenarios.
>>
>> My main concern is the cache footprint of node records. Those are made up of
>> a list of record IDs and would thus grow by a factor of 6 with the current
>> patch (18 instead of 3 bytes per record ID).
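A back-of-the-envelope sketch of that factor, assuming a node record's size is dominated by its list of record IDs (headers and other fields are ignored) and using the 3-byte and 18-byte record ID sizes quoted later in this thread. The class name is hypothetical:

```java
// Hypothetical estimate: bytes spent on record IDs only.
final class NodeRecordFootprint {
    static final int COMPACT_RECORD_ID_BYTES = 3; // trunk format, per the thread
    static final int FULL_RECORD_ID_BYTES = 18;   // "dumb" format, per the thread

    private NodeRecordFootprint() {}

    // Total bytes needed to store `recordIds` IDs of `bytesPerId` bytes each.
    static long idBytes(long recordIds, int bytesPerId) {
        return recordIds * bytesPerId;
    }
}
```

For example, one million cached node records with 10 record IDs each need about 30 MB for the IDs in the compact format and about 180 MB with full record IDs.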
>>
>> Locality is not so much of a concern here. I would expect it to actually
>> improve, as the patch gets rid of the limit of 255 referenced segments per
>> segment. In practical deployments this limit leads to degenerate segment
>> sizes (I regularly see median sizes below 5k). See OAK-2896 for some
>> background on this.
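To illustrate how such a limit caps segment sizes, here is a hypothetical model of the per-segment reference table, assuming a record ID addresses it with a single byte and one value is reserved for the segment itself (hence 255 entries). Once the table is full, a record pointing into yet another segment forces the writer to start a new segment, however little data the current one holds:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical model of the table of referenced segment IDs in one segment.
final class ReferenceTableSketch {
    // Assumption: 1-byte index, one value reserved for "this segment".
    static final int MAX_REFERENCES = 255;
    private final Set<UUID> referenced = new HashSet<>();

    // Returns true if a record pointing into `segment` fits in the current
    // segment; false means the writer has to flush and start a new segment.
    boolean tryReference(UUID segment) {
        if (referenced.contains(segment)) {
            return true; // already in the table, no new entry needed
        }
        if (referenced.size() >= MAX_REFERENCES) {
            return false; // table full
        }
        referenced.add(segment);
        return true;
    }

    int size() {
        return referenced.size();
    }
}
```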
>> Furthermore, we already took a big step forward in improving locality in
>> concurrent write scenarios when we introduced the SegmentBufferWriterPool.
>> In essence: thread affinity for segments.
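A minimal sketch of thread affinity, assuming a pool that hands each writing thread its own writer, so that records written by one thread end up in the same segment instead of being interleaved with other threads' records. The class and method names are hypothetical and do not mirror the actual SegmentBufferWriterPool API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical pool: one writer per thread, created lazily.
final class WriterPoolSketch<W> {
    private final Map<Thread, W> writers = new ConcurrentHashMap<>();
    private final Supplier<W> factory;

    WriterPoolSketch(Supplier<W> factory) {
        this.factory = factory;
    }

    // The same thread always gets the same writer back.
    W writerForCurrentThread() {
        return writers.computeIfAbsent(Thread.currentThread(), t -> factory.get());
    }

    int size() {
        return writers.size();
    }
}
```

Keying by Thread is the simplest way to express affinity, but a real pool also has to hand writers back when they are flushed; otherwise writers of terminated threads stay reachable.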
>>
>> We should probably look more carefully at the micro-benchmarks. I guess we
>> neglected this part a bit in the past. Unfortunately, the CI infrastructure
>> isn't making this easy for us... OTOH, those benchmarks only tell you so
>> much. Many of the problems we recently faced only surfaced in the large:
>> huge repos, highly concurrent load, many days of traffic.
>>
>> Michael
>>
>>
>> On 23.7.16 12:34 , Jukka Zitting wrote:
>>>
>>> Hi,
>>>
>>> Cool! I'm pretty sure there are various ways in which the format could be
>>> improved, as the original design was based mostly on intuition, guided
>>> somewhat by collected stats <http://markmail.org/message/kxe3iy2hnodxsghe>
>>> and the micro-benchmarks <https://issues.apache.org/jira/browse/OAK-119>
>>> used to optimize common operations.
>>>
>>> Note though that the total size of the repository was not and probably
>>> shouldn't be a primary metric, since the size of a typical repository is
>>> governed mostly by binaries and string properties (though it's a good idea
>>> to make sure you avoid things like duplicates of large binaries). Instead
>>> the rationale for squeezing things like record ids to as few bytes as
>>> possible is captured in the principles listed in the original design doc
>>> <http://jackrabbit.apache.org/oak/docs/nodestore/segmentmk.html>:
>>>
>>>    - Compactness. The formatting of records is optimized for size to
>>>    reduce IO costs and to fit as much content in caches as possible. A
>>>    node stored in SegmentNodeStore typically consumes only a fraction of
>>>    the size it would as a bundle in Jackrabbit Classic.
>>>    - Locality. Segments are written so that related records, like a node
>>>    and its immediate children, usually end up stored in the same segment.
>>>    This makes tree traversals very fast and avoids most cache misses for
>>>    typical clients that access more than one related node per session.
>>>
>>> Thus I would recommend also keeping an eye on benchmark results in
>>> addition to raw repository size when evaluating possible improvements.
>>> Also, the number and size of data segments are good size metrics to look
>>> at in addition to total disk usage.
>>>
>>> BR,
>>>
>>> Jukka Zitting
>>>
>>> On Fri, Jul 22, 2016 at 5:55 AM Francesco Mari <[email protected]>
>>> wrote:
>>>
>>>> The impact on repository size needs to be assessed with more specific
>>>> tests. In particular, I found RecordUsageAnalyserTest and
>>>> SegmentSizeTest unsuitable for this task. It's not a coincidence that
>>>> these tests are usually the first to be disabled or blindly updated
>>>> every time a small fix changes the size of the records.
>>>>
>>>> Regarding GC, the segment graph could be computed during the mark
>>>> phase. Of course, it's handy to have this information pre-computed for
>>>> you, but since the record graph is traversed anyway we could think
>>>> about dynamically reconstructing the segment graph when needed.
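A sketch of that idea, assuming a simplified record model where each record is identified by a segment and an offset and knows the records it references (hypothetical types, not Oak's RecordId or Segment classes). The mark traversal adds an edge between two segments whenever a reachable record references a record in a different segment, so unreachable segments never show up in the graph:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical record reference: (segment, offset).
final class RecordRef {
    final String segment;
    final int offset;

    RecordRef(String segment, int offset) {
        this.segment = segment;
        this.offset = offset;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof RecordRef)) {
            return false;
        }
        RecordRef other = (RecordRef) o;
        return segment.equals(other.segment) && offset == other.offset;
    }

    @Override
    public int hashCode() {
        return Objects.hash(segment, offset);
    }
}

final class SegmentGraphSketch {
    private SegmentGraphSketch() {}

    // Marks every record reachable from `roots` and, as a by-product, builds
    // the segment graph: segment -> segments referenced from it.
    static Map<String, Set<String>> markAndCollect(
            List<RecordRef> roots, Map<RecordRef, List<RecordRef>> references) {
        Map<String, Set<String>> segmentGraph = new HashMap<>();
        Set<RecordRef> marked = new HashSet<>();
        Deque<RecordRef> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            RecordRef current = pending.pop();
            if (!marked.add(current)) {
                continue; // already visited
            }
            Set<String> outgoing =
                    segmentGraph.computeIfAbsent(current.segment, s -> new HashSet<>());
            for (RecordRef target : references.getOrDefault(current, List.of())) {
                if (!target.segment.equals(current.segment)) {
                    outgoing.add(target.segment); // cross-segment edge
                }
                pending.push(target);
            }
        }
        return segmentGraph;
    }
}
```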
>>>>
>>>> There are still so many questions to answer, but I think that this
>>>> simplification exercise can be worth the effort.
>>>>
>>>> 2016-07-22 11:34 GMT+02:00 Michael Dürig <[email protected]>:
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> Neat! I would have expected a greater impact on the size of the segment
>>>>> store. But as you say, it probably all depends on the binary/content
>>>>> ratio. I think we should look at the #references / repository size ratio
>>>>> for repositories of different structures and see how this number differs
>>>>> with and without the patch.
>>>>>
>>>>> I like the patch as it fixes OAK-2896 while at the same time reducing
>>>>> complexity a lot.
>>>>>
>>>>> OTOH, we need to figure out how to regain the lost functionality (e.g.
>>>>> GC) and assess its impact on repository size.
>>>>>
>>>>> Michael
>>>>>
>>>>>
>>>>>
>>>>> On 22.7.16 11:32 , Francesco Mari wrote:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Yesterday I took some time for a little experiment: how many
>>>>>> optimisations can be removed from the current segment format while
>>>>>> maintaining the same functionality?
>>>>>>
>>>>>> I did some work in a branch on GitHub [1]. The code on that branch is
>>>>>> similar to the current trunk except for the following changes:
>>>>>>
>>>>>> 1. Record IDs are always serialised in their entirety. As such, a
>>>>>> serialised record ID occupies 18 bytes instead of 3.
>>>>>>
>>>>>> 2. Because of the previous change, the table of referenced segment IDs
>>>>>> is no longer needed, so I removed it from the segment header. It turns
>>>>>> out, however, that this table is needed for the mark phase of
>>>>>> compaction, so this feature is broken in that branch.
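A sketch of the two serialisations, under assumed layouts: the compact form is a 1-byte index into the segment's table of referenced segment IDs plus a 2-byte offset, and the full form is the 16-byte segment UUID plus the same 2-byte offset. The helper names are hypothetical:

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical encoders for the two record ID layouts (assumed, see above).
final class RecordIdEncodingSketch {
    private RecordIdEncodingSketch() {}

    // 1 byte: index into the referenced-segments table; 2 bytes: offset.
    static byte[] compact(int segmentIndex, int offset) {
        ByteBuffer buffer = ByteBuffer.allocate(3);
        buffer.put((byte) segmentIndex);
        buffer.putShort((short) offset);
        return buffer.array();
    }

    // 16 bytes: segment UUID; 2 bytes: offset.
    static byte[] full(UUID segmentId, int offset) {
        ByteBuffer buffer = ByteBuffer.allocate(18);
        buffer.putLong(segmentId.getMostSignificantBits());
        buffer.putLong(segmentId.getLeastSignificantBits());
        buffer.putShort((short) offset);
        return buffer.array();
    }

    // The full form is self-contained: no per-segment table is needed to
    // recover the segment ID.
    static UUID segmentOf(byte[] fullId) {
        ByteBuffer buffer = ByteBuffer.wrap(fullId);
        return new UUID(buffer.getLong(), buffer.getLong());
    }
}
```

Because the full form can be decoded on its own, the reference table becomes removable; the price is 15 extra bytes per record ID.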
>>>>>>
>>>>>> Anyway, since the code is in a runnable state, I generated some
>>>>>> content using the current trunk and the dumber version of
>>>>>> oak-segment-tar. This is the repository created by the dumb
>>>>>> oak-segment-tar:
>>>>>>
>>>>>> 524744 data00000a.tar
>>>>>> 524584 data00001a.tar
>>>>>> 524688 data00002a.tar
>>>>>> 460896 data00003a.tar
>>>>>> 8 journal.log
>>>>>> 0 repo.lock
>>>>>>
>>>>>> This is the one created by the current trunk:
>>>>>>
>>>>>> 524864 data00000a.tar
>>>>>> 524656 data00001a.tar
>>>>>> 524792 data00002a.tar
>>>>>> 297288 data00003a.tar
>>>>>> 8 journal.log
>>>>>> 0 repo.lock
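A quick way to compare the two listings is to sum the data files. The listing doesn't state its units, so the totals are in whatever units it printed; the class name is hypothetical:

```java
// Sums the data*.tar sizes from the two listings and compares the totals.
final class RepoSizeComparison {
    static final long[] DUMB = {524744, 524584, 524688, 460896};
    static final long[] TRUNK = {524864, 524656, 524792, 297288};

    private RepoSizeComparison() {}

    static long total(long[] sizes) {
        long sum = 0;
        for (long size : sizes) {
            sum += size;
        }
        return sum;
    }

    // Relative growth of the dumb variant over trunk (0.1 would mean 10%).
    static double relativeGrowth() {
        return (double) total(DUMB) / total(TRUNK) - 1.0;
    }
}
```

The totals are 2,034,912 for the dumb variant and 1,871,600 for trunk, about 8.7% more, and the difference comes from the last tar file.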
>>>>>>
>>>>>> The process that generates the content doesn't change between the two
>>>>>> executions, and the generated content comes from a real-world
>>>>>> scenario. For those familiar with it, the content is generated by an
>>>>>> installation of Adobe Experience Manager.
>>>>>>
>>>>>> It looks like the size of the repository doesn't change much.
>>>>>> Probably the de-optimisation in the small is dwarfed by the binary
>>>>>> content in the large. Another effect of my change is that there is no
>>>>>> limit on the number of referenced segment IDs per segment, and this
>>>>>> might allow segments to pack more records than before.
>>>>>>
>>>>>> Open questions aside, the clear advantage of this change is a great
>>>>>> simplification of the code. I guess I could remove a few more lines, but
>>>>>> what I peeled off is already a considerable amount. Look at the code!
>>>>>>
>>>>>> Francesco
>>>>>>
>>>>>> [1]: https://github.com/francescomari/jackrabbit-oak/tree/dumb
>>>>>>
>>>>>
>>>>
>>>
>>
