bigreybear opened a new pull request, #11985:
URL: https://github.com/apache/iotdb/pull/11985
## Description
The recent deadlock issues are primarily attributed to the double-bucket
index design. Each thread maintains a local collection that organizes pages
into buckets based on available space, while the PageManager maintains another,
global set. When a write thread requires a page for writing/updating, it first
checks its local buckets and retrieves pages without locking (as they are
already locked). If no suitable page is found locally, the thread then accesses
the global buckets, where it must retrieve pages with a lock.
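
The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the PageManager code: `SketchPage`, `SketchBuckets`, `TwoLevelLookup`, the bucket thresholds, and the use of `ReentrantLock` are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a schema page; not the real IoTDB class.
class SketchPage {
  final int index;
  final int spareSize;
  final ReentrantLock lock = new ReentrantLock(); // assumed lock type

  SketchPage(int index, int spareSize) {
    this.index = index;
    this.spareSize = spareSize;
  }
}

// Hypothetical bucket index: bucket i holds pages whose spare size is
// at least BOUNDS[i] and below the next bound.
class SketchBuckets {
  static final int[] BOUNDS = {0, 1024, 4096, 8192}; // assumed thresholds

  private final List<List<SketchPage>> buckets = new ArrayList<>();

  SketchBuckets() {
    for (int i = 0; i < BOUNDS.length; i++) {
      buckets.add(new ArrayList<>());
    }
  }

  static int bucketOf(int size) {
    int b = 0;
    for (int i = 0; i < BOUNDS.length; i++) {
      if (size >= BOUNDS[i]) {
        b = i;
      }
    }
    return b;
  }

  synchronized void add(SketchPage page) {
    buckets.get(bucketOf(page.spareSize)).add(page);
  }

  // Returns some page with spareSize >= required, or null if none is indexed.
  synchronized SketchPage find(int required) {
    for (int i = bucketOf(required); i < buckets.size(); i++) {
      for (SketchPage page : buckets.get(i)) {
        if (page.spareSize >= required) {
          return page;
        }
      }
    }
    return null;
  }
}

class TwoLevelLookup {
  // Local buckets first (pages there are already locked by this thread),
  // then the global buckets, where the page must be locked on retrieval.
  static SketchPage acquire(SketchBuckets local, SketchBuckets global, int required) {
    SketchPage page = local.find(required);
    if (page != null) {
      return page; // no locking: this thread already holds the page
    }
    page = global.find(required);
    if (page != null) {
      page.lock.lock();
      local.add(page); // remember it so later lookups skip the lock
    }
    return page;
  }
}
```

In this sketch a page found locally is never locked again, which only holds as long as every page the thread has locked is also present in its local buckets.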
The issue arises when a page that fits the expected size and was previously
referenced (and locked) by the thread is recorded in the global buckets but not
in the local ones. The thread then fetches it from the global buckets and locks
it again, so the page is locked twice but unlocked only once, leading to a
deadlock. Specifically, if a page increases its available space but is not
checked and registered in the local buckets, a deadlock becomes highly probable.
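
The effect of locking twice and unlocking once can be reproduced in isolation. The sketch below assumes a reentrant lock (`java.util.concurrent.locks.ReentrantLock`) purely for illustration; the lock type actually used for pages is not stated here, and `DoubleLockDemo` is a hypothetical name.

```java
import java.util.concurrent.locks.ReentrantLock;

class DoubleLockDemo {
  // Returns the hold count left over after locking twice and unlocking once.
  static int leakedHolds(ReentrantLock pageLock) {
    pageLock.lock();   // first acquisition, when the page was first referenced
    pageLock.lock();   // second acquisition, after a fetch from the global buckets
    pageLock.unlock(); // the page is released only once
    return pageLock.getHoldCount();
  }

  // Checks whether a different thread is able to take the lock right now.
  static boolean otherThreadCanLock(ReentrantLock pageLock) {
    boolean[] acquired = new boolean[1];
    Thread other = new Thread(() -> {
      acquired[0] = pageLock.tryLock();
      if (acquired[0]) {
        pageLock.unlock();
      }
    });
    other.start();
    try {
      other.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return acquired[0];
  }

  public static void main(String[] args) {
    ReentrantLock pageLock = new ReentrantLock();
    System.out.println("leaked holds: " + leakedHolds(pageLock));
    System.out.println("other thread can lock: " + otherThreadCanLock(pageLock));
  }
}
```

With one hold left over, any other thread that calls a blocking `lock()` on the same page would wait indefinitely.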
This revision comprehensively reviews every modification of the
SchemaPage.spareSize field, which determines the page's suitability for the
expected space. All methods that modify this attribute are listed below.
Additionally, these methods are now accompanied by statements that record
the page in the appropriate buckets.
SegmentedPage spare size write occurrences:
1. deleteSegment (handled by previous PR)
2. purgeSegment (handled by previous PR)
3. relocateSegment, called by:
   3.1 SegmentedPage.write // covered when writing within PageManager
   3.2 SegmentedPage.update // covered when updating within PageManager
4. rearrangeSegment, called by:
   4.1 relocateSegment, trace to 3.
   4.2 compactSegment // covered; compactSegment is called by:
      4.2.1 allocNewSegment, only called by pre-allocate, now covered with bucket sort
      4.2.2 transplantSegment // covered previously
5. extendSegmentInPlace // only extends the last segment, so it only decreases spare size
6. registerNewSegment // only decreases spare size
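
The recording step that accompanies each spare-size change could look roughly like this. It is a sketch under assumptions: `SpareSizeIndex`, its `reindex` method, and the bucket thresholds are hypothetical and only illustrate moving a page to the bucket that matches its new spare size.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical per-thread index of page numbers, bucketed by spare size.
class SpareSizeIndex {
  static final int[] BOUNDS = {0, 1024, 4096, 8192}; // assumed thresholds

  final List<Set<Integer>> buckets = new ArrayList<>();

  SpareSizeIndex() {
    for (int i = 0; i < BOUNDS.length; i++) {
      buckets.add(new HashSet<>());
    }
  }

  static int bucketOf(int spareSize) {
    int b = 0;
    for (int i = 0; i < BOUNDS.length; i++) {
      if (spareSize >= BOUNDS[i]) {
        b = i;
      }
    }
    return b;
  }

  // Call after any operation that may change a page's spare size,
  // so the page is always findable under its current size.
  void reindex(int pageIndex, int oldSpare, int newSpare) {
    int from = bucketOf(oldSpare);
    int to = bucketOf(newSpare);
    if (from != to) {
      buckets.get(from).remove(pageIndex); // drop the stale entry
    }
    buckets.get(to).add(pageIndex);
  }
}
```

For example, a page whose spare size grows from 500 to 5000 bytes after a relocation would move from the first bucket to the third, so a later local lookup for 5000 bytes finds it without touching the global buckets.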
Furthermore, this PR also refactors the SegmentedPage.write/update interface so
that an overflow returns a negative value rather than throwing an exception.
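
A minimal sketch of that calling convention is shown below. `OverflowSketch` is a hypothetical class, and the meaning of the negative value (here, the number of missing bytes) is an assumption; the PR only states that overflow is signalled by a negative return value.

```java
// Hypothetical page-like buffer illustrating "negative return on overflow".
class OverflowSketch {
  private final byte[] buffer;
  private int used;

  OverflowSketch(int capacity) {
    this.buffer = new byte[capacity];
  }

  int spareSize() {
    return buffer.length - used;
  }

  // Returns the remaining spare size on success. If the record does not fit,
  // nothing is written and a negative value (the shortfall) is returned.
  int write(byte[] record) {
    if (record.length > spareSize()) {
      return spareSize() - record.length; // negative: signals overflow
    }
    System.arraycopy(record, 0, buffer, used, record.length);
    used += record.length;
    return spareSize();
  }
}
```

A caller can then branch on the sign of the result instead of catching an exception, for example by fetching a larger page when the value is negative.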
<hr>
This PR has:
- [ ] been self-reviewed.
- [ ] concurrent read
- [ ] concurrent write
- [ ] concurrent read and write
- [ ] added documentation for new or modified features or behaviors.
- [ ] added Javadocs for most classes and all non-trivial methods.
- [ ] added or updated version, __license__, or notice information
- [ ] added comments explaining the "why" and the intent of the code
wherever would not be obvious
for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold
for code coverage.
- [ ] added integration tests.
- [ ] been tested in a test IoTDB cluster.
<hr>
##### Key changed/added classes (or packages if there are too many classes) in this PR