[ 
https://issues.apache.org/jira/browse/KUDU-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong updated KUDU-3291:
------------------------------
    Code Review: https://gerrit.cloudera.org/c/17547

> Crash when performing a diff scan after delta flush races with a batch of ops 
> that update the same row
> ------------------------------------------------------------------------------------------------------
>
>                 Key: KUDU-3291
>                 URL: https://issues.apache.org/jira/browse/KUDU-3291
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 1.10.0, 1.10.1, 1.11.0, 1.12.0, 1.11.1, 1.13.0, 1.14.0
>            Reporter: Andrew Wong
>            Assignee: Andrew Wong
>            Priority: Critical
>
> It's possible to run into the following crash:
> {code:java}
> F0604 23:20:50.032124 35483072 delta_store.h:153] Check failed: a.delta_store_id == b.delta_store_id (4445773336 vs. 4445771896)
> *** Check failure stack trace: ***
> *** Aborted at 1622874050 (unix time) try "date -d @1622874050" if you are using GNU date ***
> PC: @     0x7fff724b033a __pthread_kill
> *** SIGABRT (@0x7fff724b033a) received by PID 69138 (TID 0x1021d6dc0) stack trace: ***
>     @     0x7fff725615fd _sigtramp
>     @     0x7ffeef948568 (unknown)
>     @     0x7fff72437808 abort
>     @        0x107920599 google::logging_fail()
>     @        0x10791f4cf google::LogMessage::SendToLog()
>     @        0x10791fb95 google::LogMessage::Flush()
>     @        0x107923c9f google::LogMessageFatal::~LogMessageFatal()
>     @        0x107920b29 google::LogMessageFatal::~LogMessageFatal()
>     @        0x1009ae07e kudu::tablet::SelectedDeltas::DeltaLessThanFunctor::operator()()
>     @        0x1009aa561 std::__1::max<>()
>     @        0x10099c740 kudu::tablet::SelectedDeltas::ProcessDelta()
>     @        0x10099e719 kudu::tablet::SelectedDeltas::MergeFrom()
>     @        0x1009a2b30 kudu::tablet::DeltaPreparer<>::SelectDeltas()
>     @        0x10094a545 kudu::tablet::DeltaFileIterator<>::SelectDeltas()
>     @        0x10098b10c kudu::tablet::DeltaIteratorMerger::SelectDeltas()
>     @        0x10097133f kudu::tablet::DeltaApplier::InitializeSelectionVector()
>     @        0x1056df4fb kudu::MaterializingIterator::MaterializeBlock()
>     @        0x1056df2d8 kudu::MaterializingIterator::NextBlock()
>     @        0x1056d1c5b kudu::MergeIterState::PullNextBlock()
>     @        0x1056d5e62 kudu::MergeIterator::RefillHotHeap()
>     @        0x1056d4f0b kudu::MergeIterator::Init()
>     @        0x1006a413d kudu::tablet::Tablet::Iterator::Init()
>     @        0x1002cb3b9 kudu::tablet::DiffScanTest_TestDiffScanAfterDeltaFlush_Test::TestBody()
>     @        0x1005f1b88 testing::internal::HandleExceptionsInMethodIfSupported<>()
>     @        0x1005f1add testing::Test::Run()
>     @        0x1005f2dd0 testing::TestInfo::Run()
>     @        0x1005f3807 testing::TestSuite::Run()
>     @        0x100601b57 testing::internal::UnitTestImpl::RunAllTests()
>     @        0x100601418 testing::internal::HandleExceptionsInMethodIfSupported<>()
>     @        0x10060139c testing::UnitTest::Run()
>     @        0x100476201 RUN_ALL_TESTS()
>     @        0x100475fa8 main
> {code}
> The [crash line|https://github.com/apache/kudu/blob/e574903ace741a531c49aba15f97e856ea80ca4b/src/kudu/tablet/delta_store.h#L149]
> assumes that all deltas for a given row that share a timestamp belong to the
> same delta store, and it relies on this assumption to order the deltas in a
> diff scan.
> However, this assumption does not hold: unlike MRS flushes, a DMS flush does
> not wait for all ops to finish applying. This means that a batch containing
> multiple updates to the same row may be spread across multiple DMSs if a
> delta flush happens while the batch of updates is being applied.
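
The failure mode described above can be illustrated with a minimal C++ sketch. This is not Kudu's actual code: DeltaSketch, Order, CompareDeltas, DemonstrateRace, and the seq_in_store tie-breaker field are hypothetical names chosen for illustration, and the sketch returns a status where the real comparator hits a fatal check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the metadata used to order deltas.
struct DeltaSketch {
  uint64_t timestamp;      // timestamp of the op that produced the delta
  int64_t delta_store_id;  // identifies the delta store holding the delta
  int64_t seq_in_store;    // assumed tie-breaker: position within that store
};

// Result of comparing two deltas in this sketch.
enum class Order { kLess, kNotLess, kInvariantViolated };

// Orders deltas by timestamp first. On a timestamp tie, the tie-breaker is
// only meaningful if both deltas live in the same store, so a mismatch is
// reported instead of aborting the process.
Order CompareDeltas(const DeltaSketch& a, const DeltaSketch& b) {
  if (a.timestamp != b.timestamp) {
    return a.timestamp < b.timestamp ? Order::kLess : Order::kNotLess;
  }
  if (a.delta_store_id != b.delta_store_id) {
    return Order::kInvariantViolated;
  }
  return a.seq_in_store < b.seq_in_store ? Order::kLess : Order::kNotLess;
}

// Models the race: two updates to one row from the same batch (same
// timestamp), with a delta flush between them so they land in different
// stores.
void DemonstrateRace() {
  const DeltaSketch before_flush{100, 1, 7};
  const DeltaSketch after_flush{100, 2, 0};
  assert(CompareDeltas(before_flush, after_flush) ==
         Order::kInvariantViolated);
}
```

Calling DemonstrateRace() exercises the case from the report: the timestamps match, the store IDs differ, and the comparison has no valid answer under the same-store assumption.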



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
