[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-23 Thread Mike Percy (Code Review)
Mike Percy has submitted this change and it was merged.

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..


KUDU-1601. Delete ancient UNDO delta blocks in the background

This patch adds a maintenance manager background task that deletes
"ancient" UNDO delta blocks, which means blocks that correspond to data
that is considered no longer reachable and a candidate for garbage
collection. The task only deletes entire blocks and so does not provoke
write amplification.

This maintenance task operates in the following way:

1. UpdateStats() returns the maximum potentially gc'able bytes of undos
   in the rowset, which is the sum of all undo delta store sizes up
   until an initialized one with max_timestamp > the AHM (ancient
   history mark). The accuracy of this estimate improves over time, and
   in a steady state will be exact, as undo delta blocks are initialized
   while running Perform().

2. Perform() initializes undo delta stores for the tablet for some
   budgeted amount of time. Per rowset it initializes undo delta stores
   with its budget until it finds the earliest one with max_timestamp >
   AHM. That makes the next UpdateStats() call more accurate. Once it
   has exhausted its time budget, or has initialized all ancient undo
   blocks, it garbage-collects all of the known ancient undo delta
   blocks in the tablet.

To avoid starvation of performance improvement maintenance ops, a new
flag named --data_gc_prioritization_prob has been introduced that
incorporates some randomness into the scheduler at the maintenance
manager level. This controls the fraction of the time that the scheduler
considers data GC ops higher priority than performance improvement ops.

This patch includes the following:

* New UNDO delta block GC MM task
* New UNDO delta block GC metrics (at the tablet level only)
* Flags to enable / disable the GC task as well as flags to throttle it
* A few minor improvements in the maintenance manager
* Fixes for a few preexisting clang-tidy lint complaints

Notable implementation details:

* When performing undo delta GC in Tablet::DeleteAncientUndoDeltas(), we
  only flush the tablet metadata after making the metadata changes
  across all rowsets. This is safe because we are not actually modifying
  any data, we are simply removing references to blocks that are no
  longer reachable by new scanners. The code path that handles the
  metadata update for compactions and ancient history data GC,
  DeltaTracker::CommitDeltaStoreMetadataUpdate(), has a DCHECK in place
  to ensure that it is never called without specifying blocks to remove.
  This guarantees that the DeltaMemStore flush code path located in
  DeltaTracker::FlushDMS(), the only delta-related code path that
  modifies user-visible data, does not utilize that routine for its
  flush. This fact was also verified by inspection -- FlushDMS()
  contains its own flush code path.

Includes the following tests:

* RowSet-level unit test in diskrowset-test
* Tablet-level functional test in tablet_history_gc-test
* Tablet-level concurrency test in mt-tablet-test
* Integration test utilizing the tserver-level MM task in
  tablet_history_gc-itest
* Incorporated into RandomizedTabletHistoryGcITest in
  tablet_history_gc-itest

Manual testing:

* I ran 300 iterations of TabletHistoryGcITest.TestUndoDeltaBlockGc on
  the dist-test cluster under TSAN with 12 stress threads:
  http://dist-test.cloudera.org/job?job_id=mpercy.1487901212.3733

* I also ran YCSB on a 10-node cluster on a table with 200 tablets with
  mostly default parameters except for --tablet_history_max_age_sec=60.
  YCSB was configured like so:

recordcount=10
operationcount=600
updateproportion=1.0
requestdistribution=zipfian
threadcount=10
kudu_pre_split_num_tablets=200
kudu_sync_ops=true

  This workload took 839 seconds to run and I did not observe an average
  update latency increase over time (there was a mild sawtooth pattern),
  which indicated to me that the compaction operations were keeping up
  with the updates. The undo delta GC operations were also keeping pace
  and garbage was being collected aggressively, with generally only tens
  of MB, or less, of reclaimable data per tablet being present at any
  given time. It seems the current defaults are reasonable, although
  additional performance testing is likely warranted.

Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Reviewed-on: http://gerrit.cloudera.org:8080/4363
Tested-by: Mike Percy 
Reviewed-by: David Ribeiro Alves 
---
M src/kudu/integration-tests/tablet_history_gc-itest.cc
M src/kudu/master/master.cc
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/diskrowset-test.cc
M src/kudu/tablet/diskrowset.cc
M 

[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-23 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..


Patch Set 20: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4363/20//COMMIT_MSG
Commit Message:

PS20, Line 74: * I ran 300 iterations of 
TabletHistoryGcITest.TestUndoDeltaBlockGc on
 :   the dist-test cluster under TSAN with 12 stress threads:
 :   http://dist-test.cloudera.org/job?job_id=mpercy.1487901212.3733
 : 
 : * I also ran YCSB on a 10-node cluster on a table with 200 
tablets with
 :   mostly default parameters except for 
--tablet_history_max_age_sec=60.
 :   YCSB was configured like so:
 : 
 : recordcount=10
 : operationcount=600
 : updateproportion=1.0
 : requestdistribution=zipfian
 : threadcount=10
 : kudu_pre_split_num_tablets=200
 : kudu_sync_ops=true
 : 
 :   This workload took 839 seconds to run and I did not observe an 
average
 :   update latency increase over time (there was a mild sawtooth 
pattern),
 :   which indicated to me that the compaction operations were 
keeping up
 :   with the updates. The undo delta GC operations were also 
keeping pace
 :   and garbage was being collected aggressively, with generally 
only tens
 :   of MB, or less, of reclaimable data per tablet being present 
at any
 :   given time. It seems the current defaults are reasonable, 
although
 :   additional performance testing is likely warranted.
thanks for doing this.


-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 20
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-23 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..


Patch Set 20: Verified+1

Overriding "failed to bind" error unrelated to this patch.

-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 20
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: No


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-23 Thread Mike Percy (Code Review)
Hello David Ribeiro Alves, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4363

to look at the new patch set (#20).

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..

KUDU-1601. Delete ancient UNDO delta blocks in the background

This patch adds a maintenance manager background task that deletes
"ancient" UNDO delta blocks, which means blocks that correspond to data
that is considered no longer reachable and a candidate for garbage
collection. The task only deletes entire blocks and so does not provoke
write amplification.

This maintenance task operates in the following way:

1. UpdateStats() returns the maximum potentially gc'able bytes of undos
   in the rowset, which is the sum of all undo delta store sizes up
   until an initialized one with max_timestamp > the AHM (ancient
   history mark). The accuracy of this estimate improves over time, and
   in a steady state will be exact, as undo delta blocks are initialized
   while running Perform().

2. Perform() initializes undo delta stores for the tablet for some
   budgeted amount of time. Per rowset it initializes undo delta stores
   with its budget until it finds the earliest one with max_timestamp >
   AHM. That makes the next UpdateStats() call more accurate. Once it
   has exhausted its time budget, or has initialized all ancient undo
   blocks, it garbage-collects all of the known ancient undo delta
   blocks in the tablet.

To avoid starvation of performance improvement maintenance ops, a new
flag named --data_gc_prioritization_prob has been introduced that
incorporates some randomness into the scheduler at the maintenance
manager level. This controls the fraction of the time that the scheduler
considers data GC ops higher priority than performance improvement ops.

This patch includes the following:

* New UNDO delta block GC MM task
* New UNDO delta block GC metrics (at the tablet level only)
* Flags to enable / disable the GC task as well as flags to throttle it
* A few minor improvements in the maintenance manager
* Fixes for a few preexisting clang-tidy lint complaints

Notable implementation details:

* When performing undo delta GC in Tablet::DeleteAncientUndoDeltas(), we
  only flush the tablet metadata after making the metadata changes
  across all rowsets. This is safe because we are not actually modifying
  any data, we are simply removing references to blocks that are no
  longer reachable by new scanners. The code path that handles the
  metadata update for compactions and ancient history data GC,
  DeltaTracker::CommitDeltaStoreMetadataUpdate(), has a DCHECK in place
  to ensure that it is never called without specifying blocks to remove.
  This guarantees that the DeltaMemStore flush code path located in
  DeltaTracker::FlushDMS(), the only delta-related code path that
  modifies user-visible data, does not utilize that routine for its
  flush. This fact was also verified by inspection -- FlushDMS()
  contains its own flush code path.

Includes the following tests:

* RowSet-level unit test in diskrowset-test
* Tablet-level functional test in tablet_history_gc-test
* Tablet-level concurrency test in mt-tablet-test
* Integration test utilizing the tserver-level MM task in
  tablet_history_gc-itest
* Incorporated into RandomizedTabletHistoryGcITest in
  tablet_history_gc-itest

Manual testing:

* I ran 300 iterations of TabletHistoryGcITest.TestUndoDeltaBlockGc on
  the dist-test cluster under TSAN with 12 stress threads:
  http://dist-test.cloudera.org/job?job_id=mpercy.1487901212.3733

* I also ran YCSB on a 10-node cluster on a table with 200 tablets with
  mostly default parameters except for --tablet_history_max_age_sec=60.
  YCSB was configured like so:

recordcount=10
operationcount=600
updateproportion=1.0
requestdistribution=zipfian
threadcount=10
kudu_pre_split_num_tablets=200
kudu_sync_ops=true

  This workload took 839 seconds to run and I did not observe an average
  update latency increase over time (there was a mild sawtooth pattern),
  which indicated to me that the compaction operations were keeping up
  with the updates. The undo delta GC operations were also keeping pace
  and garbage was being collected aggressively, with generally only tens
  of MB, or less, of reclaimable data per tablet being present at any
  given time. It seems the current defaults are reasonable, although
  additional performance testing is likely warranted.

Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
---
M src/kudu/integration-tests/tablet_history_gc-itest.cc
M src/kudu/master/master.cc
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/diskrowset-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h

[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-23 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..


Patch Set 19: Code-Review+1

code lgtm, just waiting on those dist-test/cluster test results

-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 19
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: No


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-23 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..


Patch Set 18:

Posting an updated patch for review. I'll follow up here with more testing 
details in a little while.

-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 18
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: No


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-23 Thread Mike Percy (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4363

to look at the new patch set (#19).

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..

KUDU-1601. Delete ancient UNDO delta blocks in the background

This patch adds a maintenance manager background task that deletes
"ancient" UNDO delta blocks, which means blocks that correspond to data
that is considered no longer reachable and a candidate for garbage
collection. The task only deletes entire blocks and so does not provoke
write amplification.

This maintenance task operates in the following way:

1. UpdateStats() returns the maximum potentially gc'able bytes of undos
   in the rowset, which is the sum of all undo delta store sizes up
   until an initialized one with max_timestamp > the AHM (ancient
   history mark). The accuracy of this estimate improves over time, and
   in a steady state will be exact, as undo delta blocks are initialized
   while running Perform().

2. Perform() initializes undo delta stores for the tablet for some
   budgeted amount of time. Per rowset it initializes undo delta stores
   with its budget until it finds the earliest one with max_timestamp >
   AHM. That makes the next UpdateStats() call more accurate. Once it
   has exhausted its time budget, or has initialized all ancient undo
   blocks, it garbage-collects all of the known ancient undo delta
   blocks in the tablet.

To avoid starvation of performance improvement maintenance ops, a new
flag named --data_gc_prioritization_prob has been introduced that
incorporates some randomness into the scheduler at the maintenance
manager level. This controls the fraction of the time that the scheduler
considers data GC ops higher priority than performance improvement ops.

This patch includes the following:

* New UNDO delta block GC MM task
* New UNDO delta block GC metrics (at the tablet level only)
* Flags to enable / disable the GC task as well as flags to throttle it
* A few minor improvements in the maintenance manager
* Fixes for a few preexisting clang-tidy lint complaints

Notable implementation details:

* When performing undo delta GC in Tablet::DeleteAncientUndoDeltas(), we
  only flush the tablet metadata after making the metadata changes
  across all rowsets. This is safe because we are not actually modifying
  any data, we are simply removing references to blocks that are no
  longer reachable by new scanners. The code path that handles the
  metadata update for compactions and ancient history data GC,
  DeltaTracker::CommitDeltaStoreMetadataUpdate(), has a DCHECK in place
  to ensure that it is never called without specifying blocks to remove.
  This guarantees that the DeltaMemStore flush code path located in
  DeltaTracker::FlushDMS(), the only delta-related code path that
  modifies user-visible data, does not utilize that routine for its
  flush. This fact was also verified by inspection -- FlushDMS()
  contains its own flush code path.

Includes the following tests:

* RowSet-level unit test in diskrowset-test
* Tablet-level functional test in tablet_history_gc-test
* Tablet-level concurrency test in mt-tablet-test
* Integration test utilizing the tserver-level MM task in
  tablet_history_gc-itest
* Incorporated into RandomizedTabletHistoryGcITest in
  tablet_history_gc-itest

Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
---
M src/kudu/integration-tests/tablet_history_gc-itest.cc
M src/kudu/master/master.cc
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/diskrowset-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_history_gc-test.cc
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tablet/tablet_mm_ops.cc
M src/kudu/tablet/tablet_mm_ops.h
M src/kudu/tserver/tablet_server.cc
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
26 files changed, 1,347 insertions(+), 143 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/4363/19
-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 19
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel 

[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-23 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..


Patch Set 18:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/4363/18//COMMIT_MSG
Commit Message:

PS18, Line 17: for the op
> nit: remove "for the op" here and below
Done


PS18, Line 17: 1. UpdateStats() for the op returns the maximum potentially 
gc'able
 :bytes of undos in the rowset, which is the sum of all undo 
delta
 :store sizes up until an initialized one with
 :max_timestamp > the AHM (ancient history mark).
> mention that eventually this will be exact
Done


PS18, Line 22: deltas
> s/deltas/delta stores for the tablet (and remove on the tablet at the end)
Done


PS18, Line 23: Per rowset it performs a binary search and
 :initializes undo delta stores until it finds the earliest one 
with
 :max_timestamp > AHM
> this is no longer true
Done


PS18, Line 30: To avoid starvation of performance ops, there is a flag that
 : incorporates some randomness into the scheduler, at the MM 
level. This
 : limits the fraction of the time we consider data GC ops higher
 : priority than performance ops.
> mention the flag by name
Done


PS18, Line 47: This should be safe because we are not actually
 :   modifying any data, we are simply removing references to 
blocks that
 :   are no longer reachable by new scanners.
> mention how you made sure that this wouldn't race with compactions
Done


http://gerrit.cloudera.org:8080/#/c/4363/18/src/kudu/integration-tests/tablet_history_gc-itest.cc
File src/kudu/integration-tests/tablet_history_gc-itest.cc:

PS18, Line 193:   AssertEventually([&] {
  : 
ASSERT_GT(tablet->metrics()->undo_delta_block_gc_init_duration->TotalCount(), 
0);
  : 
ASSERT_GT(tablet->metrics()->undo_delta_block_gc_delete_duration->TotalCount(), 
0);
  : 
ASSERT_GT(tablet->metrics()->undo_delta_block_gc_perform_duration->TotalCount(),
 0);
  : ASSERT_EQ(0, 
tablet->metrics()->undo_delta_block_gc_running->value());
  : 
ASSERT_GT(tablet->metrics()->undo_delta_block_gc_bytes_deleted->value(), 0);
  : ASSERT_EQ(0, 
tablet->metrics()->undo_delta_block_estimated_retained_bytes->value());
  :   });
> would be good to add an assertion that actually made sure we're occupying l
Done


http://gerrit.cloudera.org:8080/#/c/4363/18/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

PS18, Line 178: void DeltaTracker::ValidateDeltaOrder(const 
std::shared_ptr& first,
  :   const 
std::shared_ptr& second,
  :   DeltaType type) {
> oh, I see you just made them part of the class. please move the docs to the
Done


PS18, Line 321: because this function is specified
  : // only to be used for compactions
> thanks for making this clearer.
Got a little more specific in this comment and in the header comment.


http://gerrit.cloudera.org:8080/#/c/4363/18/src/kudu/tablet/delta_tracker.h
File src/kudu/tablet/delta_tracker.h:

PS18, Line 178: ValidateDeltaOrder
> these are new right? please add some docs
Done


PS18, Line 181: ValidateDeltasOrdered
> these are new right? please add some docs
Done


http://gerrit.cloudera.org:8080/#/c/4363/18/src/kudu/tablet/tablet_mm_ops.h
File src/kudu/tablet/tablet_mm_ops.h:

PS18, Line 134: void UpdateStats(MaintenanceOpStats* stats) override;
> doc this and perform since they have unusual behavior
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 18
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-23 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..


Patch Set 18:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/4363/18//COMMIT_MSG
Commit Message:

PS18, Line 7: KUDU-1601. Delete ancient UNDO delta blocks in the background
As we discussed it would be important to run this on dist-test and have the 
link here. Maybe even report real cluster results, if you have the time.


PS18, Line 17: for the op
nit: remove "for the op" here and below


PS18, Line 17: 1. UpdateStats() for the op returns the maximum potentially 
gc'able
 :bytes of undos in the rowset, which is the sum of all undo 
delta
 :store sizes up until an initialized one with
 :max_timestamp > the AHM (ancient history mark).
mention that eventually this will be exact


PS18, Line 22: deltas
s/deltas/delta stores for the tablet (and remove on the tablet at the end)


PS18, Line 23: Per rowset it performs a binary search and
 :initializes undo delta stores until it finds the earliest one 
with
 :max_timestamp > AHM
this is no longer true


PS18, Line 30: To avoid starvation of performance ops, there is a flag that
 : incorporates some randomness into the scheduler, at the MM 
level. This
 : limits the fraction of the time we consider data GC ops higher
 : priority than performance ops.
mention the flag by name


PS18, Line 47: This should be safe because we are not actually
 :   modifying any data, we are simply removing references to 
blocks that
 :   are no longer reachable by new scanners.
mention how you made sure that this wouldn't race with compactions


http://gerrit.cloudera.org:8080/#/c/4363/18/src/kudu/integration-tests/tablet_history_gc-itest.cc
File src/kudu/integration-tests/tablet_history_gc-itest.cc:

PS18, Line 193:   AssertEventually([&] {
  : 
ASSERT_GT(tablet->metrics()->undo_delta_block_gc_init_duration->TotalCount(), 
0);
  : 
ASSERT_GT(tablet->metrics()->undo_delta_block_gc_delete_duration->TotalCount(), 
0);
  : 
ASSERT_GT(tablet->metrics()->undo_delta_block_gc_perform_duration->TotalCount(),
 0);
  : ASSERT_EQ(0, 
tablet->metrics()->undo_delta_block_gc_running->value());
  : 
ASSERT_GT(tablet->metrics()->undo_delta_block_gc_bytes_deleted->value(), 0);
  : ASSERT_EQ(0, 
tablet->metrics()->undo_delta_block_estimated_retained_bytes->value());
  :   });
would be good to add an assertion that actually made sure we're occupying less 
space on disk. Anyway at all this is feasible easily?


http://gerrit.cloudera.org:8080/#/c/4363/18/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

PS18, Line 178: void DeltaTracker::ValidateDeltaOrder(const 
std::shared_ptr& first,
  :   const 
std::shared_ptr& second,
  :   DeltaType type) {
oh, I see you just made them part of the class. please move the docs to the 
header.


PS18, Line 321: because this function is specified
  : // only to be used for compactions
thanks for making this clearer.
you mean this block right? the undos gc code also calls this funtion.


http://gerrit.cloudera.org:8080/#/c/4363/18/src/kudu/tablet/delta_tracker.h
File src/kudu/tablet/delta_tracker.h:

PS18, Line 178: ValidateDeltaOrder
these are new right? please add some docs


PS18, Line 181: ValidateDeltasOrdered
these are new right? please add some docs


http://gerrit.cloudera.org:8080/#/c/4363/18/src/kudu/tablet/tablet_mm_ops.h
File src/kudu/tablet/tablet_mm_ops.h:

PS18, Line 134: void UpdateStats(MaintenanceOpStats* stats) override;
doc this and perform since they have unusual behavior


-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 18
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-23 Thread Mike Percy (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4363

to look at the new patch set (#18).

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..

KUDU-1601. Delete ancient UNDO delta blocks in the background

This patch adds a maintenance manager background task that deletes
"ancient" UNDO delta blocks, which means blocks that correspond to data
that is considered no longer reachable and a candidate for garbage
collection. The task only deletes entire blocks and so does not provoke
write amplification.

This maintenance task operates in the following way:

1. UpdateStats() for the op returns the maximum potentially gc'able
   bytes of undos in the rowset, which is the sum of all undo delta
   store sizes up until an initialized one with
   max_timestamp > the AHM (ancient history mark).

2. Perform() for the op initializes deltas for some budgeted amount of
   time on the tablet. Per rowset it performs a binary search and
   initializes undo delta stores until it finds the earliest one with
   max_timestamp > AHM. That makes the next UpdateStats() call more
   accurate. Then it starts from the oldest and initializes undos within
   the budget. Finally, once it's through with its time budget, it
   garbage-collects as many undos in the rowset as possible.

To avoid starvation of performance ops, there is a flag that
incorporates some randomness into the scheduler, at the MM level. This
limits the fraction of the time we consider data GC ops higher
priority than performance ops.

This patch includes the following:

* New UNDO delta block GC MM task
* New UNDO delta block GC metrics (at the tablet level only)
* Flags to enable / disable the GC task as well as flags to throttle it
* A few minor improvements in the maintenance manager
* Fixes for a few preexisting clang-tidy lint complaints

Notable implementation details:

* When performing undo delta GC in Tablet::DeleteAncientUndoDeltas(), we
  only flush the tablet metadata after making the metadata changes
  across all rowsets. This should be safe because we are not actually
  modifying any data, we are simply removing references to blocks that
  are no longer reachable by new scanners.

Includes the following tests:

* RowSet-level unit test in diskrowset-test
* Tablet-level functional test in tablet_history_gc-test
* Tablet-level concurrency test in mt-tablet-test
* Integration test utilizing the tserver-level MM task in
  tablet_history_gc-itest
* Incorporated into RandomizedTabletHistoryGcITest in
  tablet_history_gc-itest

Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
---
M src/kudu/integration-tests/tablet_history_gc-itest.cc
M src/kudu/master/master.cc
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/diskrowset-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_history_gc-test.cc
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tablet/tablet_mm_ops.cc
M src/kudu/tablet/tablet_mm_ops.h
M src/kudu/tserver/tablet_server.cc
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
26 files changed, 1,322 insertions(+), 140 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/4363/18
-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 18
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-23 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..


Patch Set 16:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4363/16/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

PS16, Line 301: because this function is specified
  : // only to be used for compactions.
> again not totally clear. Can you elaborate on why this being just used for 
I tried again, LMK if this is clearer.


PS16, Line 363: Timestamp::kInvalidTimestamp
> my observation about making the timestamp kInitialTimestamp was that you wo
In the case of this API I don't think we need to support this so I will remove 
it.

That's not what I want with the Init() API, though. If max_timestamp > AHM then 
it would short circuit, whereas I would like it to Init() everything if the AHM 
is invalid.


PS16, Line 383: // First, find the boundary of where we should stop 
initializing.
  :   // We use binary search to find the oldest delta store with 
max_timestamp >= AHM.
  :   // Using binary search here allows us to get a better 
estimate from
  :   // EstimateBytesInPotentiallyAncientUndoDeltas() later.
> I'm not following. you use binary search but then ignore what you found or 
The main point is we are calling Init() to try and optimize Estimate(). But 
based on our conversation on Slack I'll remove this early optimization for now.


PS16, Line 387: if (ancient_history_mark != Timestamp::kInvalidTimestamp) {
> it's kind of weird that if timestamp is kInvalidTimestmap this doesn't actu
The purpose of this search was to find the undo at the boundary of the AHM, so 
if there is no AHM it doesn't make sense to do the search. But I've removed 
this code so now it's a moot point.

In the code block below, if the AHM is invalid then it actually Init()s 
everything, which I rely on in tests.


PS16, Line 391: deadline.Initialized()
> can we skip the Initialized and just use MonoTime::max() when we don't want
removed


PS16, Line 407: deadline.Initialized()
> same
This is useful because I also want to provide the option to specify an 
unlimited budget at the Tablet level (it's used by tests) and it's more natural 
to convert an uninitialized budget to an uninitialized deadline vs. trying to 
ensure that a max budget ends up being coerced into a max deadline.


PS16, Line 431: if (ancient_history_mark == Timestamp::kInvalidTimestamp) {
  : return Status::InvalidArgument("ancient_history_mark must 
not be an invalid timestamp");
  :   }
> so in the method above we silently do nothing on this value for the ahm, wh
I think it makes sense that you can't delete ancient undos if you don't have 
the AHM to determine whether they are ancient. However this would probably make 
a better DCHECK since I think it's a programming error.


-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-23 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..


Patch Set 17:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4363/17/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

PS17, Line 383:   // First, find the boundary of where we should stop 
initializing.
  :   // We use binary search to find the oldest delta store with 
max_timestamp >= AHM.
  :   // Using binary search here allows us to get a better 
estimate from
  :   // EstimateBytesInPotentiallyAncientUndoDeltas() later.
  :   if (ancient_history_mark != Timestamp::kInvalidTimestamp) {
  : size_t pos_lower_bound_incl = 0;
  : size_t pos_upper_bound_excl = undos_newest_first.size();
  : while (pos_lower_bound_incl < pos_upper_bound_excl) {
  :   if (deadline.Initialized() && MonoTime::Now() >= 
deadline) break;
  :   size_t midpoint = (pos_lower_bound_incl + 
pos_upper_bound_excl) / 2;
  :   auto& undo = undos_newest_first[midpoint];
  :   if (!undo->Initted()) {
  : RETURN_NOT_OK(undo->Init());
  :   }
  :   if (undo->delta_stats().max_timestamp() < 
ancient_history_mark) {
  : pos_upper_bound_excl = midpoint; // Undos are stored 
newest-first.
  :   } else {
  : pos_lower_bound_incl = midpoint;
  :   }
  : }
  :   }
think you might have forgotten to post the replies to my comments. It's still 
not totally clear to me why and how much binary search helps here. If we have a 
long undo history the we'd find the last one we could delete (which could 
change the next time we did this) but then below we go at it from the first 
one. Unless you have some numbers showing this helps I'm pro keeping it simple.


-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 17
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-22 Thread Mike Percy (Code Review)
Mike Percy has uploaded a new patch set (#17).

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..

KUDU-1601. Delete ancient UNDO delta blocks in the background

This patch adds a maintenance manager background task that deletes
"ancient" UNDO delta blocks, which means blocks that correspond to data
that is considered no longer reachable and a candidate for garbage
collection. The task only deletes entire blocks and so does not provoke
write amplification.

This maintenance task operates in the following way:

1. UpdateStats() for the op returns the maximum potentially gc'able
   bytes of undos in the rowset, which is the sum of all undo delta
   store sizes up until an initialized one with
   max_timestamp > the AHM (ancient history mark).

2. Perform() for the op initializes deltas for some budgeted amount of
   time on the tablet. Per rowset it performs a binary search and
   initializes undo delta stores until it finds the earliest one with
   max_timestamp > AHM. That makes the next UpdateStats() call more
   accurate. Then it starts from the oldest and initializes undos within
   the budget. Finally, once it's through with its time budget, it
   garbage-collects as many undos in the rowset as possible.

To avoid starvation of performance ops, there is a flag that
incorporates some randomness into the scheduler, at the MM level. This
limits the fraction of the time we consider data GC ops higher
priority than performance ops.

This patch includes the following:

* New UNDO delta block GC MM task
* New UNDO delta block GC metrics (at the tablet level only)
* Flags to enable / disable the GC task as well as flags to throttle it
* A few minor improvements in the maintenance manager
* Fixes for a few preexisting clang-tidy lint complaints

Notable implementation details:

* When performing undo delta GC in Tablet::DeleteAncientUndoDeltas(), we
  only flush the tablet metadata after making the metadata changes
  across all rowsets. This should be safe because we are not actually
  modifying any data, we are simply removing references to blocks that
  are no longer reachable by new scanners.

Includes the following tests:

* RowSet-level unit test in diskrowset-test
* Tablet-level functional test in tablet_history_gc-test
* Tablet-level concurrency test in mt-tablet-test
* Integration test utilizing the tserver-level MM task in
  tablet_history_gc-itest
* Incorporated into RandomizedTabletHistoryGcITest in
  tablet_history_gc-itest

Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
---
M src/kudu/integration-tests/tablet_history_gc-itest.cc
M src/kudu/master/master.cc
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_history_gc-test.cc
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tablet/tablet_mm_ops.cc
M src/kudu/tablet/tablet_mm_ops.h
M src/kudu/tserver/tablet_server.cc
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
24 files changed, 1,278 insertions(+), 126 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/4363/17
-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 17
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-22 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..


Patch Set 16:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4363/16/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

PS16, Line 301: because this function is specified
  : // only to be used for compactions.
again not totally clear. Can you elaborate on why this being just used for 
compactions is relevant?


PS16, Line 363: Timestamp::kInvalidTimestamp
my observation about making the timestamp kInitialTimestamp was that you 
wouldn't need this extra if condition, it would always compare to less than the 
AHM


PS16, Line 383: // First, find the boundary of where we should stop 
initializing.
  :   // We use binary search to find the oldest delta store with 
max_timestamp >= AHM.
  :   // Using binary search here allows us to get a better 
estimate from
  :   // EstimateBytesInPotentiallyAncientUndoDeltas() later.
I'm not following. you use binary search but then ignore what you found or am 
i'm missing something? what's the trick here? if it's not extremely useful I'd 
suggest dropping this and going for simplicity at least postpone to a follow up 
patch where we can show this is much better.


PS16, Line 387: if (ancient_history_mark != Timestamp::kInvalidTimestamp) {
it's kind of weird that if timestamp is kInvalidTimestmap this doesn't actually 
init anything. In which cases is kInvalidTimestamp used? is it the default 
value for the the AHM?


PS16, Line 391: deadline.Initialized()
can we skip the Initialized and just use MonoTime::max() when we don't want a 
deadline?


PS16, Line 407: deadline.Initialized()
same


PS16, Line 431: if (ancient_history_mark == Timestamp::kInvalidTimestamp) {
  : return Status::InvalidArgument("ancient_history_mark must 
not be an invalid timestamp");
  :   }
so in the method above we silently do nothing on this value for the ahm, while 
here we return an error status.


-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-22 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..


Patch Set 16: Verified+1

The ITClient test failed most likely due to KUDU-1894. Overriding.

-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: No


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-22 Thread Mike Percy (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4363

to look at the new patch set (#16).

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..

KUDU-1601. Delete ancient UNDO delta blocks in the background

This patch adds a maintenance manager background task that deletes
"ancient" UNDO delta blocks, which means blocks that correspond to data
that is considered no longer reachable and a candidate for garbage
collection. The task only deletes entire blocks and so does not provoke
write amplification.

This maintenance task operates in the following way:

1. UpdateStats() for the op returns the maximum potentially gc'able
   bytes of undos in the rowset, which is the sum of all undo delta
   store sizes up until an initialized one with
   max_timestamp > the AHM (ancient history mark).

2. Perform() for the op initializes deltas for some budgeted amount of
   time on the tablet. Per rowset it performs a binary search and
   initializes undo delta stores until it finds the earliest one with
   max_timestamp > AHM. That makes the next UpdateStats() call more
   accurate. Then it starts from the oldest and initializes undos within
   the budget. Finally, once it's through with its time budget, it
   garbage-collects as many undos in the rowset as possible.

To avoid starvation of performance ops, there is a flag that
incorporates some randomness into the scheduler, at the MM level. This
limits the fraction of the time we consider data GC ops higher
priority than performance ops.

This patch includes the following:

* New UNDO delta block GC MM task
* New UNDO delta block GC metrics (at the tablet level only)
* Flags to enable / disable the GC task as well as flags to throttle it
* A few minor improvements in the maintenance manager
* Fixes for a few preexisting clang-tidy lint complaints

Notable implementation details:

* When performing undo delta GC in Tablet::DeleteAncientUndoDeltas(), we
  only flush the tablet metadata after making the metadata changes
  across all rowsets. This should be safe because we are not actually
  modifying any data, we are simply removing references to blocks that
  are no longer reachable by new scanners.

Includes the following tests:

* RowSet-level unit test in diskrowset-test
* Tablet-level functional test in tablet_history_gc-test
* Tablet-level concurrency test in mt-tablet-test
* Integration test utilizing the tserver-level MM task in
  tablet_history_gc-itest
* Incorporated into RandomizedTabletHistoryGcITest in
  tablet_history_gc-itest

Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
---
M src/kudu/integration-tests/tablet_history_gc-itest.cc
M src/kudu/master/master.cc
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_history_gc-test.cc
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tablet/tablet_mm_ops.cc
M src/kudu/tablet/tablet_mm_ops.h
M src/kudu/tserver/tablet_server.cc
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
24 files changed, 1,256 insertions(+), 126 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/4363/16
-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-22 Thread Mike Percy (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4363

to look at the new patch set (#15).

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..

KUDU-1601. Delete ancient UNDO delta blocks in the background

This patch adds a maintenance manager background task that deletes
"ancient" UNDO delta blocks, which means blocks that correspond to data
that is considered no longer reachable and a candidate for garbage
collection. The task only deletes entire blocks and so does not provoke
write amplification.

This maintenance task operates in the following way:

1. UpdateStats() for the op returns the maximum potentially gc'able
   bytes of undos in the rowset, which is the sum of all undo delta
   store sizes up until an initialized one with
   max_timestamp > the AHM (ancient history mark).

2. Perform() for the op initializes deltas for some budgeted amount of
   time on the tablet. Per rowset it performs a binary search and
   initializes undo delta stores until it finds the earliest one with
   max_timestamp > AHM. That makes the next UpdateStats() call more
   accurate. Then it starts from the oldest and initializes undos within
   the budget. Finally, once it's through with its time budget, it
   garbage-collects as many undos in the rowset as possible.

To avoid starvation of performance ops, there is a flag that
incorporates some randomness into the scheduler, at the MM level. This
limits the fraction of the time we consider data GC ops higher
priority than performance ops.

This patch includes the following:

* New UNDO delta block GC MM task
* New UNDO delta block GC metrics (at the tablet level only)
* Flags to enable / disable the GC task as well as flags to throttle it
* A few minor improvements in the maintenance manager
* Fixes for a few preexisting clang-tidy lint complaints

Notable implementation details:

* When performing undo delta GC in Tablet::DeleteAncientUndoDeltas(), we
  only flush the tablet metadata after making the metadata changes
  across all rowsets. This should be safe because we are not actually
  modifying any data, we are simply removing references to blocks that
  are no longer reachable by new scanners.

Includes the following tests:

* RowSet-level unit test in diskrowset-test
* Tablet-level functional test in tablet_history_gc-test
* Tablet-level concurrency test in mt-tablet-test
* Integration test utilizing the tserver-level MM task in
  tablet_history_gc-itest
* Incorporated into RandomizedTabletHistoryGcITest in
  tablet_history_gc-itest

Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
---
M src/kudu/integration-tests/tablet_history_gc-itest.cc
M src/kudu/master/master.cc
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_history_gc-test.cc
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tablet/tablet_mm_ops.cc
M src/kudu/tablet/tablet_mm_ops.h
M src/kudu/tserver/tablet_server.cc
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
24 files changed, 1,257 insertions(+), 126 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/4363/15
-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-22 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..


Patch Set 14:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/4363/13/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

PS13, Line 136: enable_undo_delta_block_gc
> ah, I missed this flag lived here (I'm reviewing in reverse file order) yea
tablet.cc is the best place to prevent registration of the task since we 
register it here in Tablet::RegisterMaintenanceOps(). However if you think I 
should DECLARE it here and DEFINE it in tablet_mm_ops.cc then I can do that. I 
originally had it run but mark the task as not runnable but JD asked me to 
prevent it from running at all, so I changed it to do this.


PS13, Line 1764: InitAncientUndoDeltas
> why are you returning bytes_in_ancient_undos? the op seems to ignore this.
Added a short-circuit to UndoDeltaBlockGCOp::Perform() if this returns 0 bytes.


PS13, Line 1774: // No need to hold the selection lock here, since we are not 
actually making
   :   // any changes.
> nit: this is slightly confusing cuz we do get a read lock to copy them, I t
Done


PS13, Line 1782:   vector> 
rowset_ancient_undos_est_sizes; // index, bytes
   :   rowset_ancient_undos_est_sizes.reserve(rowsets.size());
   :   for (size_t i = 0; i < rowsets.size(); i++) {
   : const auto& rowset = rowsets[i];
   : int64_t bytes;
   : 
RETURN_NOT_OK(rowset->EstimateBytesInPotentiallyAncientUndoDeltas(ancient_history_mark,
   :
   ));
   : rowset_ancient_undos_est_sizes.emplace_back(i, bytes);
   :   }
> are adding already intialized undo deltas part of this? why? eventually thi
It's estimating the size of each rowset so we can initialize them greedily, 
starting with the largest first. Updated some of the comments.


PS13, Line 1793: td::sort(rowset_ancient_undos_est_sizes.begin(), 
rowset_ancient_undos_est_sizes.end(),
   : [&](const pair& a, const 
pair& b) {
   :   return a.second > b.second; // Descending order.
   : });
> if we only had the uninitialized undos in rowset_ancient_undos_est_sizes wo
Yes, because that means we start checking and initializing with the rowset with 
the most data in its undos.


PS13, Line 1834: std::lock_guard compact_lock(compact_select_lock_);
> we hold this lock for the whole time? all other uses of this lock in tablet
Oh, you're right. Fixed.


http://gerrit.cloudera.org:8080/#/c/4363/13/src/kudu/tablet/tablet_mm_ops.cc
File src/kudu/tablet/tablet_mm_ops.cc:

PS13, Line 276: stats->set_runnable(true);
> set runnable to false if there no data to gc
Done


PS13, Line 283: WARN_NOT_OK(tablet_->InitAncientUndoDeltas(time_budget, 
_bytes_in_ancient_undos),
  :   Substitute("$0Unable to initialize old undo delta 
files on tablet $1",
  :   LogPrefix(), tablet_->tablet_id()));
> is tablet_bytes_in_ancient_undos defined if the method returns not ok? seem
Good point, I hadn't reconsidered the method signature after moving it out of 
UpdateStats(). Done.


http://gerrit.cloudera.org:8080/#/c/4363/13/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

PS13, Line 69: DEFINE_int64(data_gc_min_size_mb, 0,
 :  "The (exclusive) minimum number of megabytes of 
ancient data on "
 :  "disk, per tablet, needed to prioritize deletion of 
that data.");
> do we actually need this knob? when would we turn this up or down? if you w
My thought was that since I am not really sure what the ideal settings are yet, 
2 knobs would allow us to set this high but the prioritization probability 
high, for example.

With the new data GC job, disabling can take place with the "enable_..." flag, 
but if you think it's generally useful for the future we could add -1 as a 
special value here... but we don't need it yet.


PS13, Line 324: op->io_usage() == MaintenanceOp::LOW_IO_USAGE
> remind me what this is doing?
We have to give some classification to each job so I chose LOW_IO_USAGE for the 
data GC job... but I think that's the wrong classification now that we are 
doing the Init() on the worker thread. I'll remove this part.


PS13, Line 395: rand_.NextDoubleFraction() > FLAGS_data_gc_prioritization_prob
> nit: can you revert this (<= not >) and flip the if?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master

[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-22 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..


Patch Set 13:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/4363/13/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

PS13, Line 136: enable_undo_delta_block_gc
ah, I missed this flag lived here (I'm reviewing in reverse file order) yeah 
maybe it makes more sense to have a separate flag just for this. However if 
it's only pertaining to the background task (which i still find a bit weird) 
why not move it there? (i.e. the flag turns all undo gc ops to non-runnable)


PS13, Line 1764: InitAncientUndoDeltas
why are you returning bytes_in_ancient_undos? the op seems to ignore this.


PS13, Line 1774: // No need to hold the selection lock here, since we are not 
actually making
   :   // any changes.
nit: this is slightly confusing cuz we do get a read lock to copy them, I think 
you could remove the comment altogether (the method above doesn't have it, for 
once)


PS13, Line 1782:   vector> 
rowset_ancient_undos_est_sizes; // index, bytes
   :   rowset_ancient_undos_est_sizes.reserve(rowsets.size());
   :   for (size_t i = 0; i < rowsets.size(); i++) {
   : const auto& rowset = rowsets[i];
   : int64_t bytes;
   : 
RETURN_NOT_OK(rowset->EstimateBytesInPotentiallyAncientUndoDeltas(ancient_history_mark,
   :
   ));
   : rowset_ancient_undos_est_sizes.emplace_back(i, bytes);
   :   }
are adding already intialized undo deltas part of this? why? eventually this 
method shouldn't do anything, right?


PS13, Line 1793: td::sort(rowset_ancient_undos_est_sizes.begin(), 
rowset_ancient_undos_est_sizes.end(),
   : [&](pair& a, pair& b) {
   :   return a.second > b.second; // Descending order.
   : });
if we only had the uninitialized undos in rowset_ancient_undos_est_sizes would 
this still make sense? (just trying to simplify really)


PS13, Line 1834: std::lock_guard compact_lock(compact_select_lock_);
we hold this lock for the whole time? all other uses of this lock in tablet.cc 
are just to read/get a snapshot of rowsets. we might do IO on delete right?

don't we need to follow a strategy like in Tablet::PickRowSetsToCompact()


http://gerrit.cloudera.org:8080/#/c/4363/13/src/kudu/tablet/tablet_mm_ops.cc
File src/kudu/tablet/tablet_mm_ops.cc:

PS13, Line 276: stats->set_runnable(true);
set runnable to false if there no data to gc


PS13, Line 283: WARN_NOT_OK(tablet_->InitAncientUndoDeltas(time_budget, 
_bytes_in_ancient_undos),
  :   Substitute("$0Unable to initialize old undo delta 
files on tablet $1",
  :   LogPrefix(), tablet_->tablet_id()));
is tablet_bytes_in_ancient_undos defined if the method returns not ok? seems 
like we should RETURN_NOT_OK here and skip deleting anything


http://gerrit.cloudera.org:8080/#/c/4363/13/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

PS13, Line 69: DEFINE_int64(data_gc_min_size_mb, 0,
 :  "The (exclusive) minimum number of megabytes of 
ancient data on "
 :  "disk, per tablet, needed to prioritize deletion of 
that data.");
do we actually need this knob? when would we turn this up or down? if you want 
to keep it it does seem like it should be higher. (i.e. we don't want to skip a 
compaction to recover a couple of bytes)
Finally maybe use a special value to mean "never gc" like -1 or something so 
that we can turn off gc if needed.


PS13, Line 324: op->io_usage() == MaintenanceOp::LOW_IO_USAGE
remind me what this is doing?


PS13, Line 395: rand_.NextDoubleFraction() > FLAGS_data_gc_prioritization_prob
nit: can you revert this (<= not >) and flip the if?


-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-22 Thread Mike Percy (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4363

to look at the new patch set (#14).

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..

KUDU-1601. Delete ancient UNDO delta blocks in the background

This patch adds a maintenance manager background task that deletes
"ancient" UNDO delta blocks, which means blocks that correspond to data
that is considered no longer reachable and a candidate for garbage
collection. The task only deletes entire blocks and so does not provoke
write amplification.

This maintenance task operates in the following way:

1. UpdateStats() for the op returns the maximum potentially gc'able
   bytes of undos in the rowset, which is the sum of all undo delta
   store sizes up until an initialized one with
   max_timestamp > the AHM (ancient history mark).

2. Perform() for the op initializes deltas for some budgeted amount of
   time on the tablet. Per rowset it performs a binary search and
   initializes undo delta stores until it finds the earliest one with
   max_timestamp > AHM. That makes the next UpdateStats() call more
   accurate. Then it starts from the oldest and initializes undos within
   the budget. Finally, once it's through with its time budget, it
   garbage-collects as many undos in the rowset as possible.

To avoid starvation of performance ops, there is a flag that
incorporates some randomness into the scheduler, at the MM level. This
limits the fraction of the time we consider data GC ops higher
priority than performance ops.

This patch includes the following:

* New UNDO delta block GC MM task
* New UNDO delta block GC metrics (at the tablet level only)
* Flags to enable / disable the GC task as well as flags to throttle it
* A few minor improvements in the maintenance manager
* Fixes for a few preexisting clang-tidy lint complaints

Notable implementation details:

* When performing undo delta GC in Tablet::DeleteAncientUndoDeltas(), we
  only flush the tablet metadata after making the metadata changes
  across all rowsets. This should be safe because we are not actually
  modifying any data, we are simply removing references to blocks that
  are no longer reachable by new scanners.

Includes the following tests:

* RowSet-level unit test in diskrowset-test
* Tablet-level functional test in tablet_history_gc-test
* Tablet-level concurrency test in mt-tablet-test
* Integration test utilizing the tserver-level MM task in
  tablet_history_gc-itest
* Incorporated into RandomizedTabletHistoryGcITest in
  tablet_history_gc-itest

Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
---
M src/kudu/integration-tests/tablet_history_gc-itest.cc
M src/kudu/master/master.cc
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_history_gc-test.cc
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tablet/tablet_mm_ops.cc
M src/kudu/tablet/tablet_mm_ops.h
M src/kudu/tserver/tablet_server.cc
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
24 files changed, 1,261 insertions(+), 127 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/4363/14
-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-22 Thread Mike Percy (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4363

to look at the new patch set (#13).

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..

KUDU-1601. Delete ancient UNDO delta blocks in the background

This patch adds a maintenance manager background task that deletes
"ancient" UNDO delta blocks, which means blocks that correspond to data
that is considered no longer reachable and a candidate for garbage
collection. The task only deletes entire blocks and so does not provoke
write amplification.

This maintenance task operates in the following way:

1. UpdateStats() for the op returns the maximum potentially gc'able
   bytes of undos in the rowset, which is the sum of all undo delta
   store sizes up until an initialized one with
   max_timestamp > the AHM (ancient history mark).

2. Perform() for the op initializes deltas for some budgeted amount of
   time on the tablet. Per rowset it performs a binary search and
   initializes undo delta stores until it finds the earliest one with
   max_timestamp > AHM. That makes the next UpdateStats() call more
   accurate. Then it starts from the oldest and initializes undos within
   the budget. Finally, once it's through with its time budget, it
   garbage-collects as many undos in the rowset as possible.

To avoid starvation of performance ops, there is a flag that
incorporates some randomness into the scheduler, at the MM level. This
limits the fraction of the time we consider data GC ops higher
priority than performance ops.

This patch includes the following:

* New UNDO delta block GC MM task
* New UNDO delta block GC metrics (at the tablet level only)
* Flags to enable / disable the GC task as well as flags to throttle it
* A few minor improvements in the maintenance manager
* Fixes for a few preexisting clang-tidy lint complaints

Notable implementation details:

* When performing undo delta GC in Tablet::DeleteAncientUndoDeltas(), we
  only flush the tablet metadata after making the metadata changes
  across all rowsets. This should be safe because we are not actually
  modifying any data, we are simply removing references to blocks that
  are no longer reachable by new scanners.

Includes the following tests:

* RowSet-level unit test in diskrowset-test
* Tablet-level functional test in tablet_history_gc-test
* Tablet-level concurrency test in mt-tablet-test
* Integration test utilizing the tserver-level MM task in
  tablet_history_gc-itest
* Incorporated into RandomizedTabletHistoryGcITest in
  tablet_history_gc-itest

Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
---
M src/kudu/integration-tests/tablet_history_gc-itest.cc
M src/kudu/master/master.cc
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_history_gc-test.cc
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tablet/tablet_mm_ops.cc
M src/kudu/tablet/tablet_mm_ops.h
M src/kudu/tserver/tablet_server.cc
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
24 files changed, 1,259 insertions(+), 127 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/4363/13
-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-22 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..


Patch Set 12:

(36 comments)

http://gerrit.cloudera.org:8080/#/c/4363/9/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

PS9, Line 434: update.RemoveUndoDeltaBlocks(block_ids_to_remove);
> Ah, didn't know that. Think that having that info somewhere would be great.
Done, added a comment to DeleteAncientUndoDeltas()


http://gerrit.cloudera.org:8080/#/c/4363/12/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

PS12, Line 287: compactions
> compactions and GC now, right?
That's true. Fixed.


PS12, Line 300: No consistency problems will be visible
> not totally your fault, but I don't really understand why this is the case.
Because it's specified to only be used for compactions, not flushes. And that 
is true.


PS12, Line 300: // No consistency problems will be visible if we don't 
successfully
  : // Flush(), so no need to CHECK_OK here.
> think we need a full explanation why this is the case, here or somewhere el
I added an extra sentence at the end. LMK if you're still not comfortable with 
it.


PS12, Line 367: if (max_deltas_to_initialize != -1 &&
  :   tmp_blocks_initialized == max_deltas_to_initialize) 
break;
> as I mentioned in the header, don't think we need both a time and count lim
Done


PS12, Line 396: effectively a compaction
> i think i understand what you're saying, but this is misleading without add
Updated the comment


PS12, Line 420: -1
> use a constant?
Removed this param


http://gerrit.cloudera.org:8080/#/c/4363/9/src/kudu/tablet/delta_tracker.h
File src/kudu/tablet/delta_tracker.h:

PS9, Line 173: ed.
> because kInvalidTimestamp is a big positive number whereas kInitialTimestam
I look at it in the opposite way, hit me up on Slack if you want to discuss 
this point. A high timestamp is way in the future whereas a low one is way in 
the past.


http://gerrit.cloudera.org:8080/#/c/4363/12/src/kudu/tablet/delta_tracker.h
File src/kudu/tablet/delta_tracker.h:

PS12, Line 181: int64_t max_deltas_to_initialize,
  : MonoTime deadline,
> do we need both a max_deltas_to_initialize and a dealine? these seem a bit 
You are right... at some point I wanted more control but I don't think it's 
used now.


http://gerrit.cloudera.org:8080/#/c/4363/12/src/kudu/tablet/diskrowset-test.cc
File src/kudu/tablet/diskrowset-test.cc:

PS12, Line 564: // Major delta compaction is additive.
> ?
Tried to be terse but I guess it's cryptic. I reworded this.


PS12, Line 567: We should be left with no delta stores.
> nit: There shouldn't be any delta stores left.
Done


PS12, Line 568: constexpr
> curious, why constexpr? any way think it would make sense to have this cons
I am going to remove this parameter also, since it's never used.


http://gerrit.cloudera.org:8080/#/c/4363/12/src/kudu/tablet/diskrowset.cc
File src/kudu/tablet/diskrowset.cc:

PS12, Line 545: since we've already updated the metadata
> nit: surround with commas
Done


PS12, Line 556: We don't CHECK_OK on Flush here because if we don't 
successfully flush we
  :   // don't have consistency problems in the case of major delta 
compaction
> as I mentioned elsewhere I think we need a good explanation as to why. I as
I added an explanation right below this line: "we are not adding additional 
mutations that weren't already present"


http://gerrit.cloudera.org:8080/#/c/4363/12/src/kudu/tablet/mt-tablet-test.cc
File src/kudu/tablet/mt-tablet-test.cc:

PS12, Line 331:   /*
  :   if (bytes_in_ancient_undos > 0) {
  : LOG(INFO) << "Found " << bytes_in_ancient_undos << " 
bytes of ancient delta undos";
  :   }
  :   */
> don't think you meant to leave this.
Done


http://gerrit.cloudera.org:8080/#/c/4363/12/src/kudu/tablet/rowset.h
File src/kudu/tablet/rowset.h:

PS12, Line 146:   // Initialize up to 'max_deltas_to_initialize' deltas that 
while 'deadline' has
  :   // not yet been passed and while a delta file with a max 
timestamp later than
  :   // 'ancient_history_mark' has not yet been encountered. 
Return the number of
  :   // deltas initialized during this invocation in 
'num_deltas_initialized' and
  :   // the total amount of on-disk data known to be entirely 
composed of ancient
  :   // undo delta blocks in 'bytes_in_ancient_undos'.
> is this comment different from the delta_tracker one? if not maybe make tha
Good idea, done.


PS12, Line 159:   // timestamp lower than the given ancient history mark. This 
method only
  :   // checks the oldest 'max_deltas_to_delete' UNDO delta files, 
and only if
  :   // they are already initialized.
  :   //
 

[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-22 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..


Patch Set 12:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/4363/12/src/kudu/tablet/tablet_history_gc-test.cc
File src/kudu/tablet/tablet_history_gc-test.cc:

PS12, Line 550: // No deletes.
why is this relevant?


PS12, Line 559: MonoTime deadline = MonoTime::Now() + 
MonoDelta::FromSeconds(60);
use MonoTime::Max() here


PS12, Line 560: bytes_in_ancient_undos
assert that this is GT 0


http://gerrit.cloudera.org:8080/#/c/4363/12/src/kudu/tablet/tablet_metrics.cc
File src/kudu/tablet/tablet_metrics.cc:

PS12, Line 119: Does not include bytes garbage collected during compactions.
why? think it would make sense to have both, no?


PS12, Line 126: Does not include blocks garbage collected during compactions.
not sure it's worth it to have both metrics. what does the number of blocks 
tell you that number of bytes doesn't?


http://gerrit.cloudera.org:8080/#/c/4363/12/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

PS12, Line 1131: WARN_NOT_OK(server_->tablet_manager()->GetTabletPeers(), 
"Cannot list tablets"
return on not OK?


http://gerrit.cloudera.org:8080/#/c/4363/12/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

PS12, Line 842: WARN_NOT_OK(GetTabletPeers(_to_shutdown), "Cannot list 
tablets");
should this be a warn? what happens if we don't get all of them? from what I 
get of the impl it always returns Status::OK


http://gerrit.cloudera.org:8080/#/c/4363/12/src/kudu/tserver/ts_tablet_manager.h
File src/kudu/tserver/ts_tablet_manager.h:

PS12, Line 190: fs_manager
docs


http://gerrit.cloudera.org:8080/#/c/4363/12/src/kudu/tserver/tserver-path-handlers.cc
File src/kudu/tserver/tserver-path-handlers.cc:

PS12, Line 114: WARN_NOT_OK(tserver_->tablet_manager()->GetTabletPeers(), 
"Cannot list tablets");
what happens below on not-OK?


PS12, Line 182: WARN_NOT_OK(tserver_->tablet_manager()->GetTabletPeers(), 
"Cannot list tablets");
same


http://gerrit.cloudera.org:8080/#/c/4363/12/src/kudu/tserver/tserver_mm_ops.cc
File src/kudu/tserver/tserver_mm_ops.cc:

PS12, Line 59: Total Undo Delta Block GC Init Duration
think the explanation text below is better than the metric name and not much 
longer, same for some of the other metrics


PS12, Line 108: enable_undo_delta_block_gc
same comment as before regarding merging the enable_ flag with the undo max age 
flag


PS12, Line 176: wih
typo


http://gerrit.cloudera.org:8080/#/c/4363/12/src/kudu/tserver/tserver_mm_ops.h
File src/kudu/tserver/tserver_mm_ops.h:

PS12, Line 50: class UndoDeltaBlockGCOp : public MaintenanceOp
likely can make this a regular tablet op with the changes we discussed.


-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-21 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..


Patch Set 11:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4363/11/src/kudu/tserver/tserver_mm_ops.cc
File src/kudu/tserver/tserver_mm_ops.cc:

PS11, Line 153: void UndoDeltaBlockGCOp::UpdateStats(MaintenanceOpStats* stats) 
{
I see other folks' concerns about running this IO here.

If this is a long running tablet server, as it is, this task will hog the 
maintenance manager getting delta stats for every delta block in every tablet. 
Even 50 percent of the time is likely too much, particularly because it's 
hogging the actual scheduler, not running in the background. This will happen 
every other time that FindBestOp() runs and as long as there is something to gc 
(data_gc_min_size_mb is > 0, the default). Say that a user sets  
undo_delta_block_gc_init_budget_millis to 1 this will make the sheduler not 
run anything else for that time while there are delta files to open.

My suggestion:

- Move the IO in UpdateStats() to some ReadDeltaStats() method.
- Perform() starts by running ReadDeltaStats() for some budget of time like you 
have currently (or maybe even none see the end of this comment).
- If there are more undos to gc than the minimum (maybe move the flag here) 
then Perfom() actually gc's something, otherwise it returns.
- UpdateStats() sets the op to runnable if a previous perform established that 
there are more undos to gc than the minimum. If there aren't it still set's the 
op to runnable with some probability (say 5 percent or something), so that 
eventually Perform() will run.

Initially few of these run since their low pri in the types of ops and start at 
0, but over time you get a complete picture of the delta files and stuff gets 
deleted.

Since this is now lower probability and runs on the threadpool wont hog the 
scheduler you might even consider having one of these per tablet and maybe even 
having no budget.

Finally you could consider merging the flag in the mm and the enable_ flag, 
(i.e if you set data_gc_min_size_mb to -1 undo gc never gets prioritized)


http://gerrit.cloudera.org:8080/#/c/4363/11/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

PS11, Line 355: Look at ops that we can run quickly that free up data on disk.
does this have the highest priority right after log retention? we moved those 
up in pri because of reboot not so much because of actually freeing space (even 
though they are more likely to free more space than undo gc).

I suggest we move this before the perf ops


-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-21 Thread Mike Percy (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4363

to look at the new patch set (#11).

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..

KUDU-1601. Delete ancient UNDO delta blocks in the background

This patch adds a maintenance manager background task that deletes
"ancient" UNDO delta blocks, which means blocks that correspond to data
that is considered no longer reachable and a candidate for garbage
collection. The task only deletes entire blocks and so does not provoke
write amplification.

The task was written to operate at the server level (as opposed to the
tablet level) because the UpdateStats() part of the task spends a
budgeted amount of time opening (initializing) undo delta blocks in
order to read the stats from their headers. Without reading the delta
stats, it is impossible to know whether a given block is a candidate for
GC.

This patch includes the following:

* New UNDO delta block GC MM task running at the TS / Master level
* New UNDO delta block GC metrics, both at the tablet and the server level
* Flags to enable / disable the GC task as well as flags to throttle it
* A few minor improvements in the maintenance manager
* Fixes for a few preexisting clang-tidy lint complaints

A couple of implementation details:

* We shut down external MM ops on a tablet before shutting down the
  tablet itself. This is enforced with a condition variable and a new
  mutex in tablet.cc
* When performing undo delta GC in Tablet::DeleteAncientUndoDeltas(), we
  only flush the tablet metadata after making the metadata changes
  across all rowsets. This should be safe because we are not actually
  modifying any data, we are simply removing references to blocks that
  are no longer reachable by new scanners.

Includes the following tests:

* RowSet-level unit test in diskrowset-test
* Tablet-level functional test in tablet_history_gc-test
* Tablet-level concurrency test in mt-tablet-test
* Integration test utilizing the tserver-level MM task in
  tablet_history_gc-itest
* Incorporated into RandomizedTabletHistoryGcITest in
  tablet_history_gc-itest

Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
---
M src/kudu/integration-tests/alter_table-test.cc
M src/kudu/integration-tests/full_stack-insert-scan-test.cc
M src/kudu/integration-tests/fuzz-itest.cc
M src/kudu/integration-tests/tablet_history_gc-itest.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.cc
M src/kudu/master/master.h
M src/kudu/server/server_base.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_history_gc-test.cc
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/CMakeLists.txt
M src/kudu/tserver/tablet_peer_lookup.h
M src/kudu/tserver/tablet_server.cc
M src/kudu/tserver/tablet_server.h
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
M src/kudu/tserver/tserver-path-handlers.cc
A src/kudu/tserver/tserver_mm_ops.cc
A src/kudu/tserver/tserver_mm_ops.h
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
40 files changed, 1,513 insertions(+), 141 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/4363/11
-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-21 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..


Patch Set 9:

(32 comments)

http://gerrit.cloudera.org:8080/#/c/4363/9//COMMIT_MSG
Commit Message:

PS9, Line 15: The task was written to operate at the server level
> in contrast to?
Done


http://gerrit.cloudera.org:8080/#/c/4363/9/src/kudu/integration-tests/tablet_history_gc-itest.cc
File src/kudu/integration-tests/tablet_history_gc-itest.cc:

PS9, Line 112: // Don't GC until we move the clock.
> nit: weird comment. isn't this more related to the increments to the clock 
I'll just remove the comment.


PS9, Line 116: is fine
> s/is fine/cluster
Done


PS9, Line 118: // Create a single-replica tablet so we can write to it and test 
undo GC on it.
> don't think we need this comment
Done


PS9, Line 148: j
> s/j/row_value
Done


PS9, Line 774: ASSERT_OK(tablet->DeleteAncientUndoDeltas(_deleted, 
_deleted));
> add some assertion that makes sure than when a value is non-zero both value
Done


http://gerrit.cloudera.org:8080/#/c/4363/9/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:

PS9, Line 501: GetTabletPeers
> docs (at least a "see:")
Done


Line 730:   // CatalogManager-wide maintenance operations.
> nit: prepend "Start/Stop "
Done


Line 771:   std::vector maintenance_ops_;
> docs
Done


http://gerrit.cloudera.org:8080/#/c/4363/9/src/kudu/server/server_base.h
File src/kudu/server/server_base.h:

PS9, Line 101:   virtual MaintenanceManager* maintenance_manager() = 0;
> docs
Done


http://gerrit.cloudera.org:8080/#/c/4363/9/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

PS9, Line 434: RETURN_NOT_OK(CommitDeltaUpdate(update, undos_to_remove, {}, 
UNDO, NO_FLUSH_METADATA));
> so the caller is supposed to flush the metadata _after_ the blocks are dele
The blocks don't actually get deleted until we flush the metadata, since that 
is when the orphans list gets traversed.


http://gerrit.cloudera.org:8080/#/c/4363/9/src/kudu/tablet/delta_tracker.h
File src/kudu/tablet/delta_tracker.h:

PS9, Line 149: not
 :   // modifications to data values.
> not sure what this means... which modifications to data values are you refe
I was trying to say that we shouldn't use it for mutations because the flush 
happens last. If we aren't adding mutations then the ordering doesn't matter 
much... we could never explicitly flush the metadata and still maintain 
correctness.


PS9, Line 165: max_timestamp
> surround with ' for consistency (or remove the underscore)
Removed the underscore


PS9, Line 173: Timestamp::kInvalidTimestamp
> maybe we should use kInitialTimestamp instead (kmin +1 instead of kMax -1),
Why do you think that helps with comparison mistakes? If we say invalid == 
super old then if there was a bug we might accidentally to Init (or GC) newer 
deltas.


PS9, Line 176: If no return value is desired
> this reads weird
Any suggestions? I tried to improve this.


PS9, Line 178: InitAncientUndoDeltas
> is this always called independently of whether we are GC'ing undos? if so w
Not really, but OK that seems fine.


PS9, Line 189: / Does not flush updates to the rowset metadata. The caller must 
flush the
 :   // metadata explicitly.
> should likely append "after calling this method and if it returns an OK sta
Done


PS9, Line 228: not including the DeltaMemStore.
> the deltamemstore is alway REDO, this doesn't make much sense
Done


PS9, Line 244: WhichStores
> this "WhichStores" seems to imply it's not "all"
Updated the comment. If you are referring to the name of the type, it's an 
already-existing type and I don't think it's strictly wrong to say something 
like: "Which of these men robbed the bank?" "All of them."


PS9, Line 298: LogPrefix
> docs
I don't think it's necessary or desirable to doc this because it's for 
LOG_WITH_PREFIX() and it's everywhere throughout the code base these days.


http://gerrit.cloudera.org:8080/#/c/4363/9/src/kudu/tablet/diskrowset-test.cc
File src/kudu/tablet/diskrowset-test.cc:

PS9, Line 568: -1
> magic number, use const or add /* */ style coment, same for the nulls
Done


http://gerrit.cloudera.org:8080/#/c/4363/9/src/kudu/tablet/diskrowset.cc
File src/kudu/tablet/diskrowset.cc:

PS9, Line 20: stl_logging.h
> interesting, is this to print the vector?
Yep!


PS9, Line 543: CHECK_OK(rowset_metadata_->Flush());
> fuzzy a to why its not ok to flush here and its ok to flush in other places
Hmm. You're right, this wasn't consistent. I'm making it consistent with the 
other one, with an explanation.


http://gerrit.cloudera.org:8080/#/c/4363/9/src/kudu/tserver/tserver_mm_ops.cc
File src/kudu/tserver/tserver_mm_ops.cc:

PS9, Line 34: .
> maybe add "(doesn't prevent compactions from deleting ancient data)"
Done, with a minor tweak to that wording


PS9, Line 37: 

[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-21 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..


Patch Set 9:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/4363/9/src/kudu/tserver/tserver_mm_ops.cc
File src/kudu/tserver/tserver_mm_ops.cc:

PS9, Line 34: .
maybe add "(doesn't prevent compactions from deleting ancient data)"


PS9, Line 37: DEFINE_int32(undo_delta_block_gc_init_budget_millis, 100,
this should likely be tagged as advanced, maybe even hidden.


PS9, Line 43: DEFINE_double(undo_delta_block_gc_delete_max_selection_ratio, 0.5,
: "The maximum fraction of the time that undo delta block gc 
may be selected "
: "to run by the maintenance manager. This is measured by the 
ratio of number "
: "of times UpdateStats() is called over the number of times 
Perform() is "
: "called on the UndoDeltaBlockGCOp. This is intended to avoid 
that task "
: "starving other maintenance tasks.");
tags (hidden, likely). this is hard to understand as a prioritization cap.


http://gerrit.cloudera.org:8080/#/c/4363/9/src/kudu/tserver/tserver_mm_ops.h
File src/kudu/tserver/tserver_mm_ops.h:

PS9, Line 1: // Licensed to the Apache Software Foundation (ASF) under one
Not sure I got the complete picture of all the discussions and the controversy 
around the server level mm. I think it's a great idea to have one but I think 
that MaintenanceMananger::FindBestOp() is way too tablet specific for the 
current implementation to make any sense in this case.
In specific I think it's forcing you to expose flags that we don't really want 
if we had a server level implementation of that maintenance manager method.


PS9, Line 55: OVERRIDE
use c++11 overrides


PS9, Line 65: LogPrefix
docs


http://gerrit.cloudera.org:8080/#/c/4363/9/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

PS9, Line 69: The target minimum size (in megabytes), per tablet, of ancient "
:  "data on disk needed to prioritize deletion of that 
ancient data.
this sentence is very hard to parse. also why do we need a min? if we have a 
single gc'able undo delta shouldn't we always gc it, even if at a lower 
priority? in other words shouldn't the scheduler prioritize gc from something 
like 0 (no data to gc) to 1 (a lot of data to gc/disk is full) in a continuous 
spectrum?


http://gerrit.cloudera.org:8080/#/c/4363/9/src/kudu/util/maintenance_manager.h
File src/kudu/util/maintenance_manager.h:

PS9, Line 118: Important: Update Clear() when adding fields to this class.
nice add, but could you move it to the class header?


PS9, Line 135: The approximate amount of disk space that not doing this 
operation keeps us from GCing from
 :   // the data blocks.
I couldn't parse this, I know its a parallel comment to the fields above, but 
those are pretty unparseable too. How about: "Approximate amount of data that 
would be GCd if this operation ran"


-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-21 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..


Patch Set 9:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/4363/9//COMMIT_MSG
Commit Message:

PS9, Line 15: The task was written to operate at the server level
in contrast to?


http://gerrit.cloudera.org:8080/#/c/4363/9/src/kudu/integration-tests/tablet_history_gc-itest.cc
File src/kudu/integration-tests/tablet_history_gc-itest.cc:

PS9, Line 112: // Don't GC until we move the clock.
nit: weird comment. isn't this more related to the increments to the clock that 
you make? i.e. if you didn't set this we still wouldn't GC until we moved the 
clock.


PS9, Line 116: is fine
s/is fine/cluster


PS9, Line 118: // Create a single-replica tablet so we can write to it and test 
undo GC on it.
don't think we need this comment


PS9, Line 148: j
s/j/row_value


PS9, Line 774: ASSERT_OK(tablet->DeleteAncientUndoDeltas(_deleted, 
_deleted));
add some assertion that makes sure than when a value is non-zero both values 
are non-zero (this is correct, right?)


http://gerrit.cloudera.org:8080/#/c/4363/9/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:

PS9, Line 501: GetTabletPeers
docs (at least a "see:")


Line 730:   // CatalogManager-wide maintenance operations.
nit: prepend "Start/Stop "


Line 771:   std::vector maintenance_ops_;
docs


http://gerrit.cloudera.org:8080/#/c/4363/9/src/kudu/server/server_base.h
File src/kudu/server/server_base.h:

PS9, Line 101:   virtual MaintenanceManager* maintenance_manager() = 0;
docs


http://gerrit.cloudera.org:8080/#/c/4363/9/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

PS9, Line 434: RETURN_NOT_OK(CommitDeltaUpdate(update, undos_to_remove, {}, 
UNDO, NO_FLUSH_METADATA));
so the caller is supposed to flush the metadata _after_ the blocks are deleted? 
is there no problem if the caller fails to flush the metadata and the blocks 
are deleted? do we have coverage for that?
aside: having a nice comment on the order of operations here.


http://gerrit.cloudera.org:8080/#/c/4363/9/src/kudu/tablet/delta_tracker.h
File src/kudu/tablet/delta_tracker.h:

PS9, Line 149: not
 :   // modifications to data values.
not sure what this means... which modifications to data values are you 
referring to?
I think the ambiguity you're trying to address stems from the method name.
maybe rename this to: CommitDeltaStoreMetadataUpdate or some such


PS9, Line 165: max_timestamp
surround with ' for consistency (or remove the underscore)


PS9, Line 173: Timestamp::kInvalidTimestamp
maybe we should use kInitialTimestamp instead (kmin +1 instead of kMax -1), to 
perclude any possible comparison mistakes

also not sure I understand what you mean by "max_timestamp of each delta is not 
considered"


PS9, Line 176: If no return value is desired
this reads weird


PS9, Line 178: InitAncientUndoDeltas
is this always called independently of whether we are GC'ing undos? if so would 
likely make sense to remove "ancient" from the method name


PS9, Line 189: / Does not flush updates to the rowset metadata. The caller must 
flush the
 :   // metadata explicitly.
should likely append "after calling this method and if it returns an OK status"


PS9, Line 228: not including the DeltaMemStore.
the deltamemstore is alway REDO, this doesn't make much sense


PS9, Line 244: WhichStores
this "WhichStores" seems to imply it's not "all"


PS9, Line 298: LogPrefix
docs


http://gerrit.cloudera.org:8080/#/c/4363/9/src/kudu/tablet/diskrowset-test.cc
File src/kudu/tablet/diskrowset-test.cc:

PS9, Line 568: -1
magic number, use const or add /* */ style coment, same for the nulls


http://gerrit.cloudera.org:8080/#/c/4363/9/src/kudu/tablet/diskrowset.cc
File src/kudu/tablet/diskrowset.cc:

PS9, Line 20: stl_logging.h
interesting, is this to print the vector?


PS9, Line 543: CHECK_OK(rowset_metadata_->Flush());
fuzzy a to why its not ok to flush here and its ok to flush in other places


-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-21 Thread Mike Percy (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4363

to look at the new patch set (#10).

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..

KUDU-1601. Delete ancient UNDO delta blocks in the background

This patch adds a maintenance manager background task that deletes
"ancient" UNDO delta blocks, which means blocks that correspond to data
that is considered no longer reachable and a candidate for garbage
collection. The task only deletes entire blocks and so does not provoke
write amplification.

The task was written to operate at the server level because the
UpdateStats() part of the task spends a budgeted amount of time opening
(initializing) undo delta blocks in order to read the stats from their
headers. Without reading the delta stats, it is impossible to know
whether a given block is a candidate for GC.

This patch includes the following:

* New UNDO delta block GC MM task running at the TS / Master level
* New UNDO delta block GC metrics, both at the tablet and the server level
* Flags to enable / disable the GC task as well as flags to throttle it
* A few minor improvements in the maintenance manager
* Fixes for a few preexisting clang-tidy lint complaints

A couple of implementation details:

* We shut down external MM ops on a tablet before shutting down the
  tablet itself. This is enforced with a condition variable and a new
  mutex in tablet.cc
* When performing undo delta GC in Tablet::DeleteAncientUndoDeltas(), we
  only flush the tablet metadata after making the metadata changes
  across all rowsets. This should be safe because we are not actually
  modifying any data, we are simply removing references to blocks that
  are no longer reachable by new scanners.

Includes the following tests:

* RowSet-level unit test in diskrowset-test
* Tablet-level functional test in tablet_history_gc-test
* Tablet-level concurrency test in mt-tablet-test
* Integration test utilizing the tserver-level MM task in
  tablet_history_gc-itest
* Incorporated into RandomizedTabletHistoryGcITest in
  tablet_history_gc-itest

Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
---
M src/kudu/integration-tests/alter_table-test.cc
M src/kudu/integration-tests/full_stack-insert-scan-test.cc
M src/kudu/integration-tests/fuzz-itest.cc
M src/kudu/integration-tests/tablet_history_gc-itest.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.cc
M src/kudu/master/master.h
M src/kudu/server/server_base.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_history_gc-test.cc
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/CMakeLists.txt
M src/kudu/tserver/tablet_peer_lookup.h
M src/kudu/tserver/tablet_server.cc
M src/kudu/tserver/tablet_server.h
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
M src/kudu/tserver/tserver-path-handlers.cc
A src/kudu/tserver/tserver_mm_ops.cc
A src/kudu/tserver/tserver_mm_ops.h
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
40 files changed, 1,477 insertions(+), 135 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/4363/10
-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-21 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4363/5//COMMIT_MSG
Commit Message:

PS5, Line 15: The task was written to operate at the server level because the
: UpdateStats() part of the task spends a budgeted amount of time 
opening
: (initializing) undo delta blocks in order to read the stats from 
their
: headers.
> Maybe I'm missing something obvious, but I don't see how the explanation (s
> What is it about UpdateStats() doing I/O that forces the op to be 
> server-level, or makes it more desirable?

The idea is to attempt to budget the amount of time that can be spent per MM 
cycle on initializing delta files. If we have # of these tasks == # tablets 
then it's not really obvious how to do that without building a special facility 
for it. Doing it at the server level makes it easy because it's effectively a 
singleton, so we can put that logic in UpdateStats().

> Would it be possible to make opening of eager UNDO delta blocks an MM op in 
> its own right? Doing I/O in UpdateStats() (regardless of at what level) seems 
> weird; it has the potential to stall other cheap MM operations (such as log 
> GC) if the I/O is blocked for a really long time.

That seems problematic because it's not really clear how to schedule it. In 
general we want to sort-of "always" be opening deltas. Once we have all deltas 
open, it's pretty cheap to make decisions about GCing data (and we want to do 
it with high priority because it's cheap and important to do).

> It also puts the I/O on the scheduler thread instead of on the MM operations 
> thread(s).

I do agree that putting the scheduler at the mercy of disk latency is not 
ideal. OTOH, if the disk(s) are really so burdened, all of the MM ops will be 
slow anyway.

> If you don't want to introduce two new ops, perhaps they can be "combined" 
> such that Perform() either opens some UNDO blocks, or does ancient history 
> GC, and we'll decide which approach to take during UpdateStats()?

I suppose another way to do this could be to schedule the job with very high 
priority but not more than some fraction of the time (I've already added logic 
to do this, and defaulted it to 50% of the time as the max). Then Perform() 
would basically consist of: InitializeAncientBlocks(tablets, budget_deadline); 
DeleteAncientBlocks(tablets);

Such an approach doesn't try to solve the related but separate problem of 
eventually opening *all* delta files, not just ancient ones... but maybe it 
could be made to in a follow-up change.


http://gerrit.cloudera.org:8080/#/c/4363/5/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

Line 2357:   // The catalog manager has only one tablet peer.
> It's possible for sys_catalog_ to not yet be set, in which case this functi
Thanks for the catch.


-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-20 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..


Patch Set 5:

(2 comments)

Mostly just passing through as I was interested to see the solution taken.

http://gerrit.cloudera.org:8080/#/c/4363/5//COMMIT_MSG
Commit Message:

PS5, Line 15: The task was written to operate at the server level because the
: UpdateStats() part of the task spends a budgeted amount of time 
opening
: (initializing) undo delta blocks in order to read the stats from 
their
: headers.
Maybe I'm missing something obvious, but I don't see how the explanation 
(second half) justifies the assertion (first half). What is it about 
UpdateStats() doing I/O that forces the op to be server-level, or makes it more 
desirable?

I agree with a comment JD made in chat: would it be possible to make opening of 
eager UNDO delta blocks an MM op in its own right? Doing I/O in UpdateStats() 
(regardless of at what level) seems weird; it has the potential to stall other 
cheap MM operations (such as log GC) if the I/O is blocked for a really long 
time. It also puts the I/O on the scheduler thread instead of on the MM 
operations thread(s).

If you don't want to introduce two new ops, perhaps they can be "combined" such 
that Perform() either opens some UNDO blocks, or does ancient history GC, and 
we'll decide which approach to take during UpdateStats()?


http://gerrit.cloudera.org:8080/#/c/4363/5/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

Line 2357:   // The catalog manager has only one tablet peer.
It's possible for sys_catalog_ to not yet be set, in which case this function 
should return an error (see GetTabletPeer()).


-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-20 Thread Mike Percy (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4363

to look at the new patch set (#5).

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..

KUDU-1601. Delete ancient UNDO delta blocks in the background

This patch adds a maintenance manager background task that deletes
"ancient" UNDO delta blocks, which means blocks that correspond to data
that is considered no longer reachable and a candidate for garbage
collection. The task only deletes entire blocks and so does not provoke
write amplification.

The task was written to operate at the server level because the
UpdateStats() part of the task spends a budgeted amount of time opening
(initializing) undo delta blocks in order to read the stats from their
headers. Without reading the delta stats, it is impossible to know
whether a given block is a candidate for GC.

This patch includes the following:

* New UNDO delta block GC MM task running at the TS / Master level
* New UNDO delta block GC metrics, both at the tablet and the server level
* Flags to enable / disable the GC task as well as flags to throttle it
* A few minor improvements in the maintenance manager
* Fixes for a few preexisting clang-tidy lint complaints

A couple of implementation details:

* We shut down external MM ops on a tablet before shutting down the
  tablet itself. This is enforced with a condition variable and a new
  mutex in tablet.cc
* When performing undo delta GC in Tablet::DeleteAncientUndoDeltas(), we
  only flush the tablet metadata after making the metadata changes
  across all rowsets. This should be safe because we are not actually
  modifying any data, we are simply removing references to blocks that
  are no longer reachable by new scanners.

Includes the following tests:

* RowSet-level unit test in diskrowset-test
* Tablet-level functional test in tablet_history_gc-test
* Tablet-level concurrency test in mt-tablet-test
* Integration test utilizing the tserver-level MM task in
  tablet_history_gc-itest
* Incorporated into RandomizedTabletHistoryGcITest in
  tablet_history_gc-itest

Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
---
M src/kudu/integration-tests/tablet_history_gc-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.cc
M src/kudu/master/master.h
M src/kudu/server/server_base.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_history_gc-test.cc
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tserver/CMakeLists.txt
M src/kudu/tserver/tablet_peer_lookup.h
M src/kudu/tserver/tablet_server.cc
M src/kudu/tserver/tablet_server.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
A src/kudu/tserver/tserver_mm_ops.cc
A src/kudu/tserver/tserver_mm_ops.h
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
33 files changed, 1,379 insertions(+), 118 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/4363/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-20 Thread Mike Percy (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4363

to look at the new patch set (#4).

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..

KUDU-1601. Delete ancient UNDO delta blocks in the background

This patch adds a maintenance manager background task that deletes
"ancient" UNDO delta blocks, which means blocks that correspond to data
that is considered no longer reachable and a candidate for garbage
collection. The task only deletes entire blocks and so does not provoke
write amplification.

The task was written to operate at the server level because the
UpdateStats() part of the task spends a budgeted amount of time opening
(initializing) undo delta blocks in order to read the stats from their
headers. Without reading the delta stats, it is impossible to know
whether a given block is a candidate for GC.

This patch includes the following:

* New UNDO delta block GC MM task running at the TS / Master level
* New UNDO delta block GC metrics, both at the tablet and the server level
* Flags to enable / disable the GC task as well as flags to throttle it
* A few minor improvements in the maintenance manager
* Fixes for a few preexisting clang-tidy lint complaints

A couple of implementation details:

* We shut down external MM ops on a tablet before shutting down the
  tablet itself. This is enforced with a condition variable and a new
  mutex in tablet.cc
* When performing undo delta GC in Tablet::DeleteAncientUndoDeltas(), we
  only flush the tablet metadata after making the metadata changes
  across all rowsets. This should be safe because we are not actually
  modifying any data, we are simply removing references to blocks that
  are no longer reachable by new scanners.

Includes the following tests:

* RowSet-level unit test in diskrowset-test
* Tablet-level functional test in tablet_history_gc-test
* Tablet-level concurrency test in mt-tablet-test
* Integration test utilizing the tserver-level MM task in
  tablet_history_gc-itest
* Incorporated into RandomizedTabletHistoryGcITest in
  tablet_history_gc-itest

Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
---
M src/kudu/integration-tests/tablet_history_gc-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.cc
M src/kudu/master/master.h
M src/kudu/server/server_base.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_history_gc-test.cc
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tserver/CMakeLists.txt
M src/kudu/tserver/tablet_peer_lookup.h
M src/kudu/tserver/tablet_server.cc
M src/kudu/tserver/tablet_server.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
A src/kudu/tserver/tserver_mm_ops.cc
A src/kudu/tserver/tserver_mm_ops.h
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
33 files changed, 1,378 insertions(+), 117 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/4363/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-20 Thread Mike Percy (Code Review)
Mike Percy has abandoned this change.

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..


Abandoned

Duplicate of https://gerrit.cloudera.org/4363/

-- 
To view, visit http://gerrit.cloudera.org:8080/6063
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I8432548d4e6b6d5197c15737b9fae9f850f4de32
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-20 Thread Mike Percy (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4363

to look at the new patch set (#3).

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..

KUDU-1601. Delete ancient UNDO delta blocks in the background

This patch adds a maintenance manager background task that deletes
"ancient" UNDO delta blocks, which means blocks that correspond to data
that is considered no longer reachable and a candidate for garbage
collection. The task only deletes entire blocks and so does not provoke
write amplification.

The task was written to operate at the server level because the
UpdateStats() part of the task spends a budgeted amount of time opening
(initializing) undo delta blocks in order to read the stats from their
headers. Without reading the delta stats, it is impossible to know
whether a given block is a candidate for GC.

This patch includes the following:

* New UNDO delta block GC MM task running at the TS / Master level
* New UNDO delta block GC metrics, both at the tablet and the server level
* Flags to enable / disable the GC task as well as flags to throttle it
* A few minor improvements in the maintenance manager
* Fixes for a few preexisting clang-tidy lint complaints

A couple of implementations details:

* We shut down MM ops on a tablet before shutting down the tablet
  itself. This is enforced with a condition variable and a new mutex in
  tablet.cc.
* When performing undo delta GC in Tablet::DeleteAncientUndoDeltas(), we
  only flush the tablet metadata after making the metadata changes
  across all rowsets. This should be safe because we are not actually
  modifying any data, we are simply removing references to blocks that
  are no longer reachable by new scanners.

Includes the following tests:

* RowSet-level unit test in diskrowset-test
* Tablet-level functional test in tablet_history_gc-test
* Tablet-level concurrency test in mt-tablet-test
* Integration test utilizing the tserver-level MM task in
  tablet_history_gc-itest
* Incorporated into RandomizedTabletHistoryGcITest in
  tablet_history_gc-itest

Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
---
M src/kudu/integration-tests/tablet_history_gc-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.cc
M src/kudu/master/master.h
M src/kudu/server/server_base.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_history_gc-test.cc
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tserver/CMakeLists.txt
M src/kudu/tserver/tablet_peer_lookup.h
M src/kudu/tserver/tablet_server.cc
M src/kudu/tserver/tablet_server.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
A src/kudu/tserver/tserver_mm_ops.cc
A src/kudu/tserver/tserver_mm_ops.h
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
33 files changed, 1,377 insertions(+), 114 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/4363/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-1601. Delete ancient UNDO delta blocks in the background

2017-02-17 Thread Mike Percy (Code Review)
Mike Percy has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6063

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
..

KUDU-1601. Delete ancient UNDO delta blocks in the background

This patch adds a maintenance manager background task that deletes
"ancient" UNDO delta blocks, which means blocks that correspond to data
that is considered no longer reachable and a candidate for garbage
collection. The task only deletes entire blocks and so does not provoke
write amplification.

The task was written to operate at the server level because the
UpdateStats() part of the task spends a budgeted amount of time opening
(initializing) undo delta blocks in order to read the stats from their
headers. Without reading the delta stats, it is impossible to know
whether a given block is a candidate for GC.

This patch includes the following:

* Adds UNDO delta block GC task at TS / Master level
* Adds a bunch of metrics, both at the tablet and TS level
* Adds flags to enable / disable the GC task as well as flags to
  throttle it
* Makes several minor improvements in the maintenance manager
* Shut down MM ops before shutting down Tablet
* Fix a few lint complaints
* Only flush tablet metadata after doing all writes.

TODO: This patch could use more tests, including the following:

* Rowset-level unit test
* Incorporate into mt-tablet-test
* Incorporate into fuzz-test
* Integration test utilizing the actual MM tasks (tserver, master)

Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68

W

Change-Id: I8432548d4e6b6d5197c15737b9fae9f850f4de32
---
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.cc
M src/kudu/master/master.h
M src/kudu/server/server_base.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/rowset.h
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_history_gc-test.cc
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tserver/CMakeLists.txt
M src/kudu/tserver/tablet_peer_lookup.h
M src/kudu/tserver/tablet_server.cc
M src/kudu/tserver/tablet_server.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
A src/kudu/tserver/tserver_mm_ops.cc
A src/kudu/tserver/tserver_mm_ops.h
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
30 files changed, 1,099 insertions(+), 79 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/6063/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6063
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I8432548d4e6b6d5197c15737b9fae9f850f4de32
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy