[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

2019-08-21 Thread Adar Dembo (Code Review)
Adar Dembo has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/14061 )

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
..

[tablet] Fixed the bug of DeltaTracker::CountDeletedRows

When Tablet.CountLiveRows was called in a multi-thread case, there's a
chance we'll see the following failure.

User stack:
F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 
vs. 0)
*** Check failure stack trace: ***
*** Aborted at 156571 (unix time) try "date -d @156571" if you are 
using GNU date ***
PC: @ 0x7f9bd20425f7 __GI_raise
*** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from 
PID 96370; stack trace: ***
@ 0x7f9bdaff6100 (unknown)
@ 0x7f9bd20425f7 __GI_raise
@ 0x7f9bd2043ce8 __GI_abort
@ 0x7f9bd4540c99 google::logging_fail()
@ 0x7f9bd454246d google::LogMessage::Fail()
@ 0x7f9bd45443c3 google::LogMessage::SendToLog()
@ 0x7f9bd4541fc9 google::LogMessage::Flush()
@ 0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal()
@ 0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows()
@ 0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows()
@   0x49891f 
kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread()
@   0x4ae34b boost::_mfi::mf1<>::operator()()
@   0x4add25 boost::_bi::list2<>::operator()<>()
@   0x4acfe9 boost::_bi::bind_t<>::operator()()
@   0x4ac8a6 
boost::detail::function::void_function_obj_invoker0<>::invoke()
@ 0x7f9bd7116492 boost::function0<>::operator()()
@ 0x7f9bd62e5324 kudu::Thread::SuperviseThread()
@ 0x7f9bdafeedc5 start_thread
@ 0x7f9bd2103ced __clone

This is because there is DeltaTracker lack of lock protection when modify
the number of live rows in rowset_metadata_ and reset the deleted_row_count_.
This caused deleted_row_count_ to be duplicated when calculating the number
of live rows of DRS. Consider the following sequence:
| T1| T2
|-- |--
|+ In DT::Flush |
|  Take compact_flush_lock_ (excl)  |
|  Take component_lock_ (excl)  |
|  deleted_row_count_ = ... |
|  Release component_lock_  |
|  + In DT::FlushDMS|
|Call RSMD::IncrementLiveRows   |
|--> RSMD::live_row_count - deleted_row_count_
|   |+ In DRS::CountLiveRows
|   |  Take component_lock_ (shared)
|   |  Call RSMD::live_row_count - 
DT::CountDeletedRows
|   |  --> RSMD::live_row_count - 
deleted_row_count_
|   |  --> we double counted deleted_row_count_ 
!!!
|  Take component_lock_ (excl)  |
|  deleted_row_count_ = 0   |
|  Release component_lock_  |
|  Release compact_flush_lock_  |

Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Reviewed-on: http://gerrit.cloudera.org:8080/14061
Reviewed-by: Adar Dembo 
Tested-by: Adar Dembo 
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
10 files changed, 177 insertions(+), 80 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 11
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: helifu 


[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

2019-08-21 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14061 )

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
..


Patch Set 10: Verified+1 Code-Review+2

Overriding Jenkins, failure seems unrelated to this fix.


--
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 10
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Wed, 21 Aug 2019 17:59:08 +
Gerrit-HasComments: No


[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

2019-08-21 Thread Adar Dembo (Code Review)
Adar Dembo has removed a vote on this change.

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
..


Removed Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 10
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: helifu 


[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

2019-08-20 Thread Yao Xu (Code Review)
Yao Xu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14061 )

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14061/9/src/kudu/integration-tests/test_workload.cc
File src/kudu/integration-tests/test_workload.cc:

http://gerrit.cloudera.org:8080/#/c/14061/9/src/kudu/integration-tests/test_workload.cc@239
PS9, Line 239:
> Nit: add a using for this.
Done



--
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 10
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Wed, 21 Aug 2019 05:19:10 +
Gerrit-HasComments: Yes


[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

2019-08-20 Thread Yao Xu (Code Review)
Hello Tidy Bot, Kudu Jenkins, helifu, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14061

to look at the new patch set (#10).

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
..

[tablet] Fixed the bug of DeltaTracker::CountDeletedRows

When Tablet.CountLiveRows was called in a multi-thread case, there's a
chance we'll see the following failure.

User stack:
F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 
vs. 0)
*** Check failure stack trace: ***
*** Aborted at 156571 (unix time) try "date -d @156571" if you are 
using GNU date ***
PC: @ 0x7f9bd20425f7 __GI_raise
*** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from 
PID 96370; stack trace: ***
@ 0x7f9bdaff6100 (unknown)
@ 0x7f9bd20425f7 __GI_raise
@ 0x7f9bd2043ce8 __GI_abort
@ 0x7f9bd4540c99 google::logging_fail()
@ 0x7f9bd454246d google::LogMessage::Fail()
@ 0x7f9bd45443c3 google::LogMessage::SendToLog()
@ 0x7f9bd4541fc9 google::LogMessage::Flush()
@ 0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal()
@ 0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows()
@ 0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows()
@   0x49891f 
kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread()
@   0x4ae34b boost::_mfi::mf1<>::operator()()
@   0x4add25 boost::_bi::list2<>::operator()<>()
@   0x4acfe9 boost::_bi::bind_t<>::operator()()
@   0x4ac8a6 
boost::detail::function::void_function_obj_invoker0<>::invoke()
@ 0x7f9bd7116492 boost::function0<>::operator()()
@ 0x7f9bd62e5324 kudu::Thread::SuperviseThread()
@ 0x7f9bdafeedc5 start_thread
@ 0x7f9bd2103ced __clone

This is because there is DeltaTracker lack of lock protection when modify
the number of live rows in rowset_metadata_ and reset the deleted_row_count_.
This caused deleted_row_count_ to be duplicated when calculating the number
of live rows of DRS. Consider the following sequence:
| T1| T2
|-- |--
|+ In DT::Flush |
|  Take compact_flush_lock_ (excl)  |
|  Take component_lock_ (excl)  |
|  deleted_row_count_ = ... |
|  Release component_lock_  |
|  + In DT::FlushDMS|
|Call RSMD::IncrementLiveRows   |
|--> RSMD::live_row_count - deleted_row_count_
|   |+ In DRS::CountLiveRows
|   |  Take component_lock_ (shared)
|   |  Call RSMD::live_row_count - 
DT::CountDeletedRows
|   |  --> RSMD::live_row_count - 
deleted_row_count_
|   |  --> we double counted deleted_row_count_ 
!!!
|  Take component_lock_ (excl)  |
|  deleted_row_count_ = 0   |
|  Release component_lock_  |
|  Release compact_flush_lock_  |

Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
10 files changed, 177 insertions(+), 80 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/10
--
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 10
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: helifu 


[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

2019-08-20 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14061 )

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14061/9/src/kudu/integration-tests/test_workload.cc
File src/kudu/integration-tests/test_workload.cc:

http://gerrit.cloudera.org:8080/#/c/14061/9/src/kudu/integration-tests/test_workload.cc@239
PS9, Line 239: std::
Nit: add a using for this.



--
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 9
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Tue, 20 Aug 2019 16:30:47 +
Gerrit-HasComments: Yes


[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

2019-08-20 Thread Yao Xu (Code Review)
Yao Xu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14061 )

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
..


Patch Set 9:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/raft_consensus-itest.cc@969
PS4, Line 969:   // Otherwise, test deleting data on MRS.
 :   int32_t write_interval_millis = 0
> Why are these necessary? Please doc.
Done


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/raft_consensus-itest.cc@3139
PS4, Line 3139: WriteRequestPB w_req;
> Nit: did you mean to add this empty line at the end?
a slip of a pen


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.h
File src/kudu/integration-tests/test_workload.h:

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.h@152
PS4, Line 152:   enum WritePattern {
> Nit: "Insert random rows, then delete them."
Done


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.cc
File src/kudu/integration-tests/test_workload.cc:

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.cc@140
PS4, Line 140:   while (should_run_.Load()) {
> Since you have to pass 'this', how about you make this a named member funct
Done


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.cc@168
PS4, Line 168:   // Note: overridi
> Nit: the previous style ('KuduPartialRow*' ) is more correct.
Done


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.cc@237
PS4, Line 237:
 : size_t 
TestWorkload::GetNumberOfErrors(kudu::client::KuduSession* session) {
> Should note that when INSERT_RANDOM_ROWS_WITH_DELETE is used, ReadThread do
Done


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/delta_tracker.cc@748
PS4, Line 748: // Merge the deleted row count of the old DMS to the 
RowSetMetadata
 : // and reset deleted_row_count_ should be atomic, so we lock 
the
 : // component_lock_ in exclusive mode.
 : std::lock_guard loc
> These comments just duplicate the actual code. It would be more interesting
Done


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/rowset_metadata.h
File src/kudu/tablet/rowset_metadata.h:

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/rowset_metadata.h@120
PS4, Line 120:   // Atomically commit the new redo delta block to 
RowSetMetadata.
> Could you doc this function?
Done


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/rowset_metadata.cc
File src/kudu/tablet/rowset_metadata.cc:

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/rowset_metadata.cc@291
PS4, Line 291: std::lock_guard l(lock_);
> Additional locks to old tables that do not support live row
 > counting.

Emm, I think our newly created table will turn this on by default, so this 
should be acceptable.



--
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 9
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Tue, 20 Aug 2019 11:11:36 +
Gerrit-HasComments: Yes


[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

2019-08-20 Thread Yao Xu (Code Review)
Hello Tidy Bot, Kudu Jenkins, helifu, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14061

to look at the new patch set (#9).

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
..

[tablet] Fixed the bug of DeltaTracker::CountDeletedRows

When Tablet.CountLiveRows was called in a multi-thread case, there's a
chance we'll see the following failure.

User stack:
F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 
vs. 0)
*** Check failure stack trace: ***
*** Aborted at 156571 (unix time) try "date -d @156571" if you are 
using GNU date ***
PC: @ 0x7f9bd20425f7 __GI_raise
*** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from 
PID 96370; stack trace: ***
@ 0x7f9bdaff6100 (unknown)
@ 0x7f9bd20425f7 __GI_raise
@ 0x7f9bd2043ce8 __GI_abort
@ 0x7f9bd4540c99 google::logging_fail()
@ 0x7f9bd454246d google::LogMessage::Fail()
@ 0x7f9bd45443c3 google::LogMessage::SendToLog()
@ 0x7f9bd4541fc9 google::LogMessage::Flush()
@ 0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal()
@ 0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows()
@ 0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows()
@   0x49891f 
kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread()
@   0x4ae34b boost::_mfi::mf1<>::operator()()
@   0x4add25 boost::_bi::list2<>::operator()<>()
@   0x4acfe9 boost::_bi::bind_t<>::operator()()
@   0x4ac8a6 
boost::detail::function::void_function_obj_invoker0<>::invoke()
@ 0x7f9bd7116492 boost::function0<>::operator()()
@ 0x7f9bd62e5324 kudu::Thread::SuperviseThread()
@ 0x7f9bdafeedc5 start_thread
@ 0x7f9bd2103ced __clone

This is because there is DeltaTracker lack of lock protection when modify
the number of live rows in rowset_metadata_ and reset the deleted_row_count_.
This caused deleted_row_count_ to be duplicated when calculating the number
of live rows of DRS. Consider the following sequence:
| T1| T2
|-- |--
|+ In DT::Flush |
|  Take compact_flush_lock_ (excl)  |
|  Take component_lock_ (excl)  |
|  deleted_row_count_ = ... |
|  Release component_lock_  |
|  + In DT::FlushDMS|
|Call RSMD::IncrementLiveRows   |
|--> RSMD::live_row_count - deleted_row_count_
|   |+ In DRS::CountLiveRows
|   |  Take component_lock_ (shared)
|   |  Call RSMD::live_row_count - 
DT::CountDeletedRows
|   |  --> RSMD::live_row_count - 
deleted_row_count_
|   |  --> we double counted deleted_row_count_ 
!!!
|  Take component_lock_ (excl)  |
|  deleted_row_count_ = 0   |
|  Release component_lock_  |
|  Release compact_flush_lock_  |

Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
10 files changed, 171 insertions(+), 77 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/9
--
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 9
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: helifu 


[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

2019-08-20 Thread Yao Xu (Code Review)
Hello Tidy Bot, Kudu Jenkins, helifu, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14061

to look at the new patch set (#8).

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
..

[tablet] Fixed the bug of DeltaTracker::CountDeletedRows

When Tablet.CountLiveRows was called in a multi-thread case, there's a
chance we'll see the following failure.

User stack:
F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 
vs. 0)
*** Check failure stack trace: ***
*** Aborted at 156571 (unix time) try "date -d @156571" if you are 
using GNU date ***
PC: @ 0x7f9bd20425f7 __GI_raise
*** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from 
PID 96370; stack trace: ***
@ 0x7f9bdaff6100 (unknown)
@ 0x7f9bd20425f7 __GI_raise
@ 0x7f9bd2043ce8 __GI_abort
@ 0x7f9bd4540c99 google::logging_fail()
@ 0x7f9bd454246d google::LogMessage::Fail()
@ 0x7f9bd45443c3 google::LogMessage::SendToLog()
@ 0x7f9bd4541fc9 google::LogMessage::Flush()
@ 0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal()
@ 0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows()
@ 0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows()
@   0x49891f 
kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread()
@   0x4ae34b boost::_mfi::mf1<>::operator()()
@   0x4add25 boost::_bi::list2<>::operator()<>()
@   0x4acfe9 boost::_bi::bind_t<>::operator()()
@   0x4ac8a6 
boost::detail::function::void_function_obj_invoker0<>::invoke()
@ 0x7f9bd7116492 boost::function0<>::operator()()
@ 0x7f9bd62e5324 kudu::Thread::SuperviseThread()
@ 0x7f9bdafeedc5 start_thread
@ 0x7f9bd2103ced __clone

This is because there is DeltaTracker lack of lock protection when modify
the number of live rows in rowset_metadata_ and reset the deleted_row_count_.
This caused deleted_row_count_ to be duplicated when calculating the number
of live rows of DRS. Consider the following sequence:
| T1| T2
|-- |--
|+ In DT::Flush |
|  Take compact_flush_lock_ (excl)  |
|  Take component_lock_ (excl)  |
|  deleted_row_count_ = ... |
|  Release component_lock_  |
|  + In DT::FlushDMS|
|Call RSMD::IncrementLiveRows   |
|--> RSMD::live_row_count - deleted_row_count_
|   |+ In DRS::CountLiveRows
|   |  Take component_lock_ (shared)
|   |  Call RSMD::live_row_count - 
DT::CountDeletedRows
|   |  --> RSMD::live_row_count - 
deleted_row_count_
|   |  --> we double counted deleted_row_count_ 
!!!
|  Take component_lock_ (excl)  |
|  deleted_row_count_ = 0   |
|  Release component_lock_  |
|  Release compact_flush_lock_  |

Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
10 files changed, 171 insertions(+), 76 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/8
--
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 8
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: helifu 


[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

2019-08-20 Thread Yao Xu (Code Review)
Hello Tidy Bot, Kudu Jenkins, helifu, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14061

to look at the new patch set (#7).

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
..

[tablet] Fixed the bug of DeltaTracker::CountDeletedRows

When Tablet.CountLiveRows was called in a multi-thread case, there's a
chance we'll see the following failure.

User stack:
F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 
vs. 0)
*** Check failure stack trace: ***
*** Aborted at 156571 (unix time) try "date -d @156571" if you are 
using GNU date ***
PC: @ 0x7f9bd20425f7 __GI_raise
*** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from 
PID 96370; stack trace: ***
@ 0x7f9bdaff6100 (unknown)
@ 0x7f9bd20425f7 __GI_raise
@ 0x7f9bd2043ce8 __GI_abort
@ 0x7f9bd4540c99 google::logging_fail()
@ 0x7f9bd454246d google::LogMessage::Fail()
@ 0x7f9bd45443c3 google::LogMessage::SendToLog()
@ 0x7f9bd4541fc9 google::LogMessage::Flush()
@ 0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal()
@ 0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows()
@ 0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows()
@   0x49891f 
kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread()
@   0x4ae34b boost::_mfi::mf1<>::operator()()
@   0x4add25 boost::_bi::list2<>::operator()<>()
@   0x4acfe9 boost::_bi::bind_t<>::operator()()
@   0x4ac8a6 
boost::detail::function::void_function_obj_invoker0<>::invoke()
@ 0x7f9bd7116492 boost::function0<>::operator()()
@ 0x7f9bd62e5324 kudu::Thread::SuperviseThread()
@ 0x7f9bdafeedc5 start_thread
@ 0x7f9bd2103ced __clone

This is because there is DeltaTracker lack of lock protection when modify
the number of live rows in rowset_metadata_ and reset the deleted_row_count_.
This caused deleted_row_count_ to be duplicated when calculating the number
of live rows of DRS. Consider the following sequence:
| T1| T2
|-- |--
|+ In DT::Flush |
|  Take compact_flush_lock_ (excl)  |
|  Take component_lock_ (excl)  |
|  deleted_row_count_ = ... |
|  Release component_lock_  |
|  + In DT::FlushDMS|
|Call RSMD::IncrementLiveRows   |
|--> RSMD::live_row_count - deleted_row_count_
|   |+ In DRS::CountLiveRows
|   |  Take component_lock_ (shared)
|   |  Call RSMD::live_row_count - 
DT::CountDeletedRows
|   |  --> RSMD::live_row_count - 
deleted_row_count_
|   |  --> we double counted deleted_row_count_ 
!!!
|  Take component_lock_ (excl)  |
|  deleted_row_count_ = 0   |
|  Release component_lock_  |
|  Release compact_flush_lock_  |

Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
10 files changed, 172 insertions(+), 76 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/7
--
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 7
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: helifu 


[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

2019-08-20 Thread Yao Xu (Code Review)
Hello Tidy Bot, Kudu Jenkins, helifu, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14061

to look at the new patch set (#6).

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
..

[tablet] Fixed the bug of DeltaTracker::CountDeletedRows

When Tablet.CountLiveRows was called in a multi-thread case, there's a
chance we'll see the following failure.

User stack:
F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 
vs. 0)
*** Check failure stack trace: ***
*** Aborted at 156571 (unix time) try "date -d @156571" if you are 
using GNU date ***
PC: @ 0x7f9bd20425f7 __GI_raise
*** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from 
PID 96370; stack trace: ***
@ 0x7f9bdaff6100 (unknown)
@ 0x7f9bd20425f7 __GI_raise
@ 0x7f9bd2043ce8 __GI_abort
@ 0x7f9bd4540c99 google::logging_fail()
@ 0x7f9bd454246d google::LogMessage::Fail()
@ 0x7f9bd45443c3 google::LogMessage::SendToLog()
@ 0x7f9bd4541fc9 google::LogMessage::Flush()
@ 0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal()
@ 0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows()
@ 0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows()
@   0x49891f 
kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread()
@   0x4ae34b boost::_mfi::mf1<>::operator()()
@   0x4add25 boost::_bi::list2<>::operator()<>()
@   0x4acfe9 boost::_bi::bind_t<>::operator()()
@   0x4ac8a6 
boost::detail::function::void_function_obj_invoker0<>::invoke()
@ 0x7f9bd7116492 boost::function0<>::operator()()
@ 0x7f9bd62e5324 kudu::Thread::SuperviseThread()
@ 0x7f9bdafeedc5 start_thread
@ 0x7f9bd2103ced __clone

This is because there is DeltaTracker lack of lock protection when modify
the number of live rows in rowset_metadata_ and reset the deleted_row_count_.
This caused deleted_row_count_ to be duplicated when calculating the number
of live rows of DRS. Consider the following sequence:
| T1| T2
|-- |--
|+ In DT::Flush |
|  Take compact_flush_lock_ (excl)  |
|  Take component_lock_ (excl)  |
|  deleted_row_count_ = ... |
|  Release component_lock_  |
|  + In DT::FlushDMS|
|Call RSMD::IncrementLiveRows   |
|--> RSMD::live_row_count - deleted_row_count_
|   |+ In DRS::CountLiveRows
|   |  Take component_lock_ (shared)
|   |  Call RSMD::live_row_count - 
DT::CountDeletedRows
|   |  --> RSMD::live_row_count - 
deleted_row_count_
|   |  --> we double counted deleted_row_count_ 
!!!
|  Take component_lock_ (excl)  |
|  deleted_row_count_ = 0   |
|  Release component_lock_  |
|  Release compact_flush_lock_  |

Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
10 files changed, 170 insertions(+), 76 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/6
--
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 6
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: helifu 


[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

2019-08-20 Thread Yao Xu (Code Review)
Hello Tidy Bot, Kudu Jenkins, helifu, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14061

to look at the new patch set (#5).

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
..

[tablet] Fixed the bug of DeltaTracker::CountDeletedRows

When Tablet.CountLiveRows was called in a multi-thread case, there's a
chance we'll see the following failure.

User stack:
F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 
vs. 0)
*** Check failure stack trace: ***
*** Aborted at 156571 (unix time) try "date -d @156571" if you are 
using GNU date ***
PC: @ 0x7f9bd20425f7 __GI_raise
*** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from 
PID 96370; stack trace: ***
@ 0x7f9bdaff6100 (unknown)
@ 0x7f9bd20425f7 __GI_raise
@ 0x7f9bd2043ce8 __GI_abort
@ 0x7f9bd4540c99 google::logging_fail()
@ 0x7f9bd454246d google::LogMessage::Fail()
@ 0x7f9bd45443c3 google::LogMessage::SendToLog()
@ 0x7f9bd4541fc9 google::LogMessage::Flush()
@ 0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal()
@ 0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows()
@ 0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows()
@   0x49891f 
kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread()
@   0x4ae34b boost::_mfi::mf1<>::operator()()
@   0x4add25 boost::_bi::list2<>::operator()<>()
@   0x4acfe9 boost::_bi::bind_t<>::operator()()
@   0x4ac8a6 
boost::detail::function::void_function_obj_invoker0<>::invoke()
@ 0x7f9bd7116492 boost::function0<>::operator()()
@ 0x7f9bd62e5324 kudu::Thread::SuperviseThread()
@ 0x7f9bdafeedc5 start_thread
@ 0x7f9bd2103ced __clone

This is because there is DeltaTracker lack of lock protection when modify
the number of live rows in rowset_metadata_ and reset the deleted_row_count_.
This caused deleted_row_count_ to be duplicated when calculating the number
of live rows of DRS. Consider the following sequence:
| T1| T2
|-- |--
|+ In DT::Flush |
|  Take compact_flush_lock_ (excl)  |
|  Take component_lock_ (excl)  |
|  deleted_row_count_ = ... |
|  Release component_lock_  |
|  + In DT::FlushDMS|
|Call RSMD::IncrementLiveRows   |
|--> RSMD::live_row_count - deleted_row_count_
|   |+ In DRS::CountLiveRows
|   |  Take component_lock_ (shared)
|   |  Call RSMD::live_row_count - 
DT::CountDeletedRows
|   |  --> RSMD::live_row_count - 
deleted_row_count_
|   |  --> we double counted deleted_row_count_ 
!!!
|  Take component_lock_ (excl)  |
|  deleted_row_count_ = 0   |
|  Release component_lock_  |
|  Release compact_flush_lock_  |

Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
10 files changed, 170 insertions(+), 76 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/5
--
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 5
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: helifu 


[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

2019-08-19 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14061 )

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
..


Patch Set 4:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/raft_consensus-itest.cc@969
PS4, Line 969:   
workload.set_write_pattern(TestWorkload::INSERT_RANDOM_ROWS_WITH_DELETE);
 :   workload.set_write_batch_size(3);
Why are these necessary? Please doc.


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/raft_consensus-itest.cc@3139
PS4, Line 3139:
Nit: did you mean to add this empty line at the end?


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.h
File src/kudu/integration-tests/test_workload.h:

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.h@152
PS4, Line 152: // Insert random row keys and delete them.
Nit: "Insert random rows, then delete them."


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.cc
File src/kudu/integration-tests/test_workload.cc:

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.cc@140
PS4, Line 140:   auto check_error_func = [](KuduSession* session, TestWorkload* 
workload) {
Since you have to pass 'this', how about you make this a named member function 
instead?


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.cc@168
PS4, Line 168: KuduPartialRow *row
Nit: the previous style ('KuduPartialRow*' ) is more correct.


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/integration-tests/test_workload.cc@237
PS4, Line 237: int64_t expected_row_count =
 : write_pattern_ == INSERT_RANDOM_ROWS_WITH_DELETE ? 0 : 
rows_inserted_.Load();
Should note that when INSERT_RANDOM_ROWS_WITH_DELETE is used, ReadThread 
doesn't really verify anything except that a scan works.


http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc@746
PS2, Line 746:
> Yeah, very good case!!
L746 updates the in-memory tablet superblock to reflect the DMS flush.
L747 atomically writes that new tablet superblock to disk.

If the tserver crashes in between, the changes made by L746 are lost. But 
that's OK; the post-crash state is correct as long as it reflects either the 
entirety of the DMS flush, or none of it.


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/delta_tracker.cc@748
PS4, Line 748: // Lock the component_lock_ in exclusive mode.
 : std::lock_guard lock(component_lock_);
 : // Merge the deleted row count of the old DMS to the 
RowSetMetadata
 : // and reset deleted_row_count_.
These comments just duplicate the actual code. It would be more interesting to 
instead describe _why_ this is necessary. i.e. why do we need to take 
component_lock_ in exclusive mode before doing this?


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/rowset_metadata.h
File src/kudu/tablet/rowset_metadata.h:

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/rowset_metadata.h@120
PS4, Line 120:   Status CommitRedoDeltaDataBlock(int64_t dms_id,
Could you doc this function?



--
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 4
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Mon, 19 Aug 2019 19:19:38 +
Gerrit-HasComments: Yes


[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

2019-08-19 Thread helifu (Code Review)
helifu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14061 )

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc@746
PS2, Line 746:
> Is it problematic that the live row count isn't changed atomically as part
Yeah, very good case!!
And I'm curious about the result: what will happen when the tserver crashes 
between DMS flushing thread calls CommitRedoDeltaDataBlock and flush the tablet 
metadata? Meaning, crash between Line746 and Line747. Thanks in advance.


http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/rowset_metadata.cc
File src/kudu/tablet/rowset_metadata.cc:

http://gerrit.cloudera.org:8080/#/c/14061/4/src/kudu/tablet/rowset_metadata.cc@291
PS4, Line 291: std::lock_guard l(lock_);
Additional locks to old tables that do not support live row counting.



--
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 4
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Mon, 19 Aug 2019 11:31:23 +
Gerrit-HasComments: Yes


[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

2019-08-15 Thread Yao Xu (Code Review)
Yao Xu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14061 )

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
..


Patch Set 4:

I implemented an integration test case that simulated this scenario with 
high-frequency flush, tablet server crashes, and heartbeats(count live rows).


--
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 4
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Thu, 15 Aug 2019 15:00:35 +
Gerrit-HasComments: No


[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

2019-08-15 Thread Yao Xu (Code Review)
Hello Tidy Bot, Kudu Jenkins, helifu, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14061

to look at the new patch set (#4).

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
..

[tablet] Fixed the bug of DeltaTracker::CountDeletedRows

When Tablet.CountLiveRows was called in a multi-thread case, there's a
chance we'll see the following failure.

User stack:
F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 
vs. 0)
*** Check failure stack trace: ***
*** Aborted at 156571 (unix time) try "date -d @156571" if you are 
using GNU date ***
PC: @ 0x7f9bd20425f7 __GI_raise
*** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from 
PID 96370; stack trace: ***
@ 0x7f9bdaff6100 (unknown)
@ 0x7f9bd20425f7 __GI_raise
@ 0x7f9bd2043ce8 __GI_abort
@ 0x7f9bd4540c99 google::logging_fail()
@ 0x7f9bd454246d google::LogMessage::Fail()
@ 0x7f9bd45443c3 google::LogMessage::SendToLog()
@ 0x7f9bd4541fc9 google::LogMessage::Flush()
@ 0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal()
@ 0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows()
@ 0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows()
@   0x49891f 
kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread()
@   0x4ae34b boost::_mfi::mf1<>::operator()()
@   0x4add25 boost::_bi::list2<>::operator()<>()
@   0x4acfe9 boost::_bi::bind_t<>::operator()()
@   0x4ac8a6 
boost::detail::function::void_function_obj_invoker0<>::invoke()
@ 0x7f9bd7116492 boost::function0<>::operator()()
@ 0x7f9bd62e5324 kudu::Thread::SuperviseThread()
@ 0x7f9bdafeedc5 start_thread
@ 0x7f9bd2103ced __clone

This is because there is DeltaTracker lack of lock protection when modify
the number of live rows in rowset_metadata_ and reset the deleted_row_count_.
This caused deleted_row_count_ to be duplicated when calculating the number
of live rows of DRS. Consider the following sequence:
| T1| T2
|-- |--
|+ In DT::Flush |
|  Take compact_flush_lock_ (excl)  |
|  Take component_lock_ (excl)  |
|  deleted_row_count_ = ... |
|  Release component_lock_  |
|  + In DT::FlushDMS|
|Call RSMD::IncrementLiveRows   |
|--> RSMD::live_row_count - deleted_row_count_
|   |+ In DRS::CountLiveRows
|   |  Take component_lock_ (shared)
|   |  Call RSMD::live_row_count - 
DT::CountDeletedRows
|   |  --> RSMD::live_row_count - 
deleted_row_count_
|   |  --> we double counted deleted_row_count_ 
!!!
|  Take component_lock_ (excl)  |
|  deleted_row_count_ = 0   |
|  Release component_lock_  |
|  Release compact_flush_lock_  |

Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
10 files changed, 145 insertions(+), 76 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/4
--
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 4
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: helifu 


[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

2019-08-15 Thread Yao Xu (Code Review)
Hello Kudu Jenkins, helifu, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14061

to look at the new patch set (#3).

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
..

[tablet] Fixed the bug of DeltaTracker::CountDeletedRows

When Tablet.CountLiveRows was called in a multi-thread case, there's a
chance we'll see the following failure.

User stack:
F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 
vs. 0)
*** Check failure stack trace: ***
*** Aborted at 156571 (unix time) try "date -d @156571" if you are 
using GNU date ***
PC: @ 0x7f9bd20425f7 __GI_raise
*** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from 
PID 96370; stack trace: ***
@ 0x7f9bdaff6100 (unknown)
@ 0x7f9bd20425f7 __GI_raise
@ 0x7f9bd2043ce8 __GI_abort
@ 0x7f9bd4540c99 google::logging_fail()
@ 0x7f9bd454246d google::LogMessage::Fail()
@ 0x7f9bd45443c3 google::LogMessage::SendToLog()
@ 0x7f9bd4541fc9 google::LogMessage::Flush()
@ 0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal()
@ 0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows()
@ 0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows()
@   0x49891f 
kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread()
@   0x4ae34b boost::_mfi::mf1<>::operator()()
@   0x4add25 boost::_bi::list2<>::operator()<>()
@   0x4acfe9 boost::_bi::bind_t<>::operator()()
@   0x4ac8a6 
boost::detail::function::void_function_obj_invoker0<>::invoke()
@ 0x7f9bd7116492 boost::function0<>::operator()()
@ 0x7f9bd62e5324 kudu::Thread::SuperviseThread()
@ 0x7f9bdafeedc5 start_thread
@ 0x7f9bd2103ced __clone

This is because there is DeltaTracker lack of lock protection when modify
the number of live rows in rowset_metadata_ and reset the deleted_row_count_.
This caused deleted_row_count_ to be duplicated when calculating the number
of live rows of DRS. Consider the following sequence:
| T1| T2
|-- |--
|+ In DT::Flush |
|  Take compact_flush_lock_ (excl)  |
|  Take component_lock_ (excl)  |
|  deleted_row_count_ = ... |
|  Release component_lock_  |
|  + In DT::FlushDMS|
|Call RSMD::IncrementLiveRows   |
|--> RSMD::live_row_count - deleted_row_count_
|   |+ In DRS::CountLiveRows
|   |  Take component_lock_ (shared)
|   |  Call RSMD::live_row_count - 
DT::CountDeletedRows
|   |  --> RSMD::live_row_count - 
deleted_row_count_
|   |  --> we double counted deleted_row_count_ 
!!!
|  Take component_lock_ (excl)  |
|  deleted_row_count_ = 0   |
|  Release component_lock_  |
|  Release compact_flush_lock_  |

Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
10 files changed, 148 insertions(+), 74 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/3
--
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 3
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: helifu 


[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

2019-08-14 Thread Yao Xu (Code Review)
Yao Xu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14061 )

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14061/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14061/2//COMMIT_MSG@38
PS2, Line 38: This is because there is DeltaTracker lack of lock protection 
when modify
: the number of live rows in rowset_metadata_ and reset the 
deleted_row_count_.
: This caused deleted_row_count_ to be duplicated when calculating 
the number
: of live rows of DRS.
> It took me a while to understand the race. Below is a thread interleaving t
Well, it's really clear. Got a new skill !!


http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc@746
PS2, Line 746:   
RETURN_NOT_OK(rowset_metadata_->CommitRedoDeltaDataBlock(dms->id(), block_id));
> Is it problematic that the live row count isn't changed atomically as part
Mmm, this is a real problem, and I'll try to add some test cases to my 
integration tests. Great Thanks :)



--
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 2
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Thu, 15 Aug 2019 03:44:06 +
Gerrit-HasComments: Yes


[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

2019-08-14 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14061 )

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14061/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14061/2//COMMIT_MSG@38
PS2, Line 38: This is because there is DeltaTracker lack of lock protection 
when modify
: the number of live rows in rowset_metadata_ and reset the 
deleted_row_count_.
: This caused deleted_row_count_ to be duplicated when calculating 
the number
: of live rows of DRS.
It took me a while to understand the race. Below is a thread interleaving that 
I sketched which describes it:


T1   T2
---  ---
+ In DT::Flush
Take compact_flush_lock_ (excl)
Take component_lock_ (excl)
deleted_row_count_ = ...
Release component_lock_
+ In DT::FlushDMS
Call RSMD::IncrementLiveRows
--> RSMD::live_row_count now
includes deleted_row_count_
 + In DRS::CountLiveRows
 Take component_lock_ (shared)
 Call RSMD::live_row_count - 
DT::CountDeletedRows
 --> RSMD::live_row_count - deleted_row_count_
 --> we double counted deleted_row_count_ !!!
Take component_lock_ (excl)
deleted_row_count_ = 0
Release component_lock_

Maybe you can clean this up and add it to the commit message?


http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc@746
PS2, Line 746:   
RETURN_NOT_OK(rowset_metadata_->CommitRedoDeltaDataBlock(dms->id(), block_id));
Is it problematic that the live row count isn't changed atomically as part of 
this operation? Consider the following sequence:
1. One MM thread is in the middle of flushing a DMS, and makes it as far as 
IncrementLiveRows before a context switch. Meaning, the RSMD's live row count 
has changed, but neither the RSMD's last durable DMS ID nor its list of redo 
block IDs has been updated.
2. Meanwhile, another MM thread does a compaction, flushing the tablet metadata 
(and all RSMDs), including the updated live row count from #1.
3. The server crashes before the DMS flushing thread can call 
CommitRedoDeltaDataBlock and flush the tablet metadata.
4. On restart, the tablet is bootstrapped. Its metadata includes a live row 
count that reflects the DMS flush, but it's as if the DMS flush "never 
happened" because a) the new redo block ID wasn't committed, and b) the 
rowset's last durable DMS ID wasn't updated.

As I see it, it seems like all of the RSMD changes in the DMS flush need to 
happen atomically; otherwise the live row count may not reflect reality after a 
crash. Do you think this is a real problem? If so, could we prove it in a unit 
test in some way?



--
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 2
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Wed, 14 Aug 2019 17:23:26 +
Gerrit-HasComments: Yes


[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

2019-08-14 Thread helifu (Code Review)
helifu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14061 )

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
..


Patch Set 2: Code-Review+1

(1 comment)

There is a gap between updating "rowset_metadata_" and resetting the 
"deleted_row_count_" indeed. Yeah, it is a bug! Thanks to Yao Xu.

http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

http://gerrit.cloudera.org:8080/#/c/14061/2/src/kudu/tablet/delta_tracker.cc@786
PS2, Line 786: 
rowset_metadata_->IncrementLiveRows(-old_dms->deleted_row_count());
It seems to be the only way although it looks a bit obtrusive.



--
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 2
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Wed, 14 Aug 2019 07:59:36 +
Gerrit-HasComments: Yes


[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

2019-08-13 Thread Yao Xu (Code Review)
Hello Kudu Jenkins, helifu, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14061

to look at the new patch set (#2).

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
..

[tablet] Fixed the bug of DeltaTracker::CountDeletedRows

When Tablet.CountLiveRows was called in a multi-thread case, there's a
chance we'll see the following failure.

User stack:
F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 
vs. 0)
*** Check failure stack trace: ***
*** Aborted at 156571 (unix time) try "date -d @156571" if you are 
using GNU date ***
PC: @ 0x7f9bd20425f7 __GI_raise
*** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from 
PID 96370; stack trace: ***
@ 0x7f9bdaff6100 (unknown)
@ 0x7f9bd20425f7 __GI_raise
@ 0x7f9bd2043ce8 __GI_abort
@ 0x7f9bd4540c99 google::logging_fail()
@ 0x7f9bd454246d google::LogMessage::Fail()
@ 0x7f9bd45443c3 google::LogMessage::SendToLog()
@ 0x7f9bd4541fc9 google::LogMessage::Flush()
@ 0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal()
@ 0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows()
@ 0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows()
@   0x49891f 
kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread()
@   0x4ae34b boost::_mfi::mf1<>::operator()()
@   0x4add25 boost::_bi::list2<>::operator()<>()
@   0x4acfe9 boost::_bi::bind_t<>::operator()()
@   0x4ac8a6 
boost::detail::function::void_function_obj_invoker0<>::invoke()
@ 0x7f9bd7116492 boost::function0<>::operator()()
@ 0x7f9bd62e5324 kudu::Thread::SuperviseThread()
@ 0x7f9bdafeedc5 start_thread
@ 0x7f9bd2103ced __clone

This is because there is DeltaTracker lack of lock protection when modify
the number of live rows in rowset_metadata_ and reset the deleted_row_count_.
This caused deleted_row_count_ to be duplicated when calculating the number
of live rows of DRS.

Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
---
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/mt-tablet-test.cc
3 files changed, 15 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/2
--
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 2
Gerrit-Owner: Yao Xu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 


[kudu-CR] [tablet] Fixed the bug of DeltaTracker::CountDeletedRows

2019-08-13 Thread Yao Xu (Code Review)
Yao Xu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/14061


Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
..

[tablet] Fixed the bug of DeltaTracker::CountDeletedRows

When Tablet.CountLiveRows was called in a multi-thread case, there's a
chance we'll see the following failure.

User stack:
F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 
vs. 0)
*** Check failure stack trace: ***
*** Aborted at 156571 (unix time) try "date -d @156571" if you are 
using GNU date ***
PC: @ 0x7f9bd20425f7 __GI_raise
*** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from 
PID 96370; stack trace: ***
@ 0x7f9bdaff6100 (unknown)
@ 0x7f9bd20425f7 __GI_raise
@ 0x7f9bd2043ce8 __GI_abort
@ 0x7f9bd4540c99 google::logging_fail()
@ 0x7f9bd454246d google::LogMessage::Fail()
@ 0x7f9bd45443c3 google::LogMessage::SendToLog()
@ 0x7f9bd4541fc9 google::LogMessage::Flush()
@ 0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal()
@ 0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows()
@ 0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows()
@   0x49891f 
kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread()
@   0x4ae34b boost::_mfi::mf1<>::operator()()
@   0x4add25 boost::_bi::list2<>::operator()<>()
@   0x4acfe9 boost::_bi::bind_t<>::operator()()
@   0x4ac8a6 
boost::detail::function::void_function_obj_invoker0<>::invoke()
@ 0x7f9bd7116492 boost::function0<>::operator()()
@ 0x7f9bd62e5324 kudu::Thread::SuperviseThread()
@ 0x7f9bdafeedc5 start_thread
@ 0x7f9bd2103ced __clone

This is because there is DeltaTracker lack of lock protection when modify
the number of live rows in rowset_metadata_ and reset the deleted_row_count_.
This caused deleted_row_count_ to be duplicated when calculating the number
of live rows of DRS.

Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
---
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/mt-tablet-test.cc
3 files changed, 14 insertions(+), 15 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/1
--
To view, visit http://gerrit.cloudera.org:8080/14061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 1
Gerrit-Owner: Yao Xu