Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Bankim Bhavsar,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17974

to look at the new patch set (#7).

Change subject: [encryption] KUDU-3331 Encrypt file system
......................................................................

[encryption] KUDU-3331 Encrypt file system

de02a34 introduced encryption support to Env in a self-contained way,
but it's not used across Kudu.

This commit integrates this encryption support into the project and
modifies several test suites to also run tests with encryption enabled.

I also renamed "encrypted" to "is_sensitive" in *FileOption, as a file
with this flag is encrypted only if encryption is enabled for the
process.
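
As an illustration of the "is_sensitive" semantics described above, here
is a minimal sketch in Python (the actual change is in C++). The names
FileOptions and will_encrypt are hypothetical stand-ins, not Kudu
identifiers.

```python
from dataclasses import dataclass


@dataclass
class FileOptions:
    # Hypothetical stand-in for the *FileOption structs; only the flag matters here.
    is_sensitive: bool = False


def will_encrypt(opts: FileOptions, encryption_enabled: bool) -> bool:
    """A file is encrypted only if it is marked sensitive AND the
    process has encryption enabled; the flag alone is not enough."""
    return opts.is_sensitive and encryption_enabled


if __name__ == "__main__":
    for sensitive in (False, True):
        for enabled in (False, True):
            print(sensitive, enabled, will_encrypt(FileOptions(sensitive), enabled))
```

This is why "is_sensitive" reads better than "encrypted": the flag marks
intent, and the process-wide setting decides the outcome.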

When encryption is enabled, the following files are encrypted:

- WAL segments
- LBM blocks and metadata
- FBM blocks
- tablet and consensus metadata

Logs, rolling logs, instance and block manager instance files, and
configuration files in integration tests are not encrypted.

As FileCache is not used to access instance files, it only supports
handling sensitive files and can't be used to access unencrypted files.

As the PBC CLI tool can be used to dump both encrypted (metadata) and
unencrypted (instance) files, it needs to be able to determine whether
a file is encrypted. As encryption headers are not yet implemented, I
introduced a hack which checks the file name and treats the file as
unencrypted if it ends with "instance" and as encrypted otherwise.
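
The file name check can be sketched as follows. This is an illustrative
Python version of the rule as described above, not the code in
tool_action_pbc.cc; the function name and the example paths are
assumptions.

```python
import os


def looks_encrypted(path: str) -> bool:
    """Heuristic from the description above: a file whose name ends with
    "instance" is treated as unencrypted, anything else as encrypted."""
    file_name = os.path.basename(path)
    return not file_name.endswith("instance")


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for p in ("/data/instance", "/data/block_manager_instance", "/data/tablet-meta/abc123"):
        print(p, looks_encrypted(p))
```

The heuristic misclassifies any sensitive file whose name happens to end
with "instance", which is why it is only a stopgap until encryption
headers exist.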

I ran some benchmarks to compare running Kudu with encryption enabled
and disabled.

The following are StartupBenchmark tests run with KUDU_ALLOW_SLOW_TESTS
set to true, which uses a block count of 1,000,000.

It seems enabling encryption adds around 20% overhead on startup in a
typical use-case with no deletes.

 Performance counter stats for './bin/log_block_manager-test --gtest_filter=*StartupBenchmark/0' (10 runs):

      40391.075316      task-clock (msec)         #    2.021 CPUs utilized            ( +-  1.05% )
            11,089      context-switches          #    0.275 K/sec                    ( +-  9.87% )
               280      cpu-migrations            #    0.007 K/sec                    ( +-  1.58% )
           593,982      page-faults               #    0.015 M/sec                    ( +-  2.13% )
   110,595,311,391      cycles                    #    2.738 GHz                      ( +-  1.03% )
    90,580,214,722      instructions              #    0.82  insn per cycle           ( +-  0.14% )
    16,449,237,957      branches                  #  407.249 M/sec                    ( +-  0.15% )
        67,169,915      branch-misses             #    0.41% of all branches          ( +-  0.49% )

      19.988553457 seconds time elapsed                                               ( +-  0.58% )

 Performance counter stats for './bin/log_block_manager-test --encrypt_data_at_rest=1 --gtest_filter=*StartupBenchmark/0' (10 runs):

      51317.845606      task-clock (msec)         #    2.133 CPUs utilized            ( +-  0.90% )
            13,214      context-switches          #    0.257 K/sec                    ( +-  4.03% )
               292      cpu-migrations            #    0.006 K/sec                    ( +-  1.76% )
           737,815      page-faults               #    0.014 M/sec                    ( +-  1.49% )
   144,898,246,536      cycles                    #    2.824 GHz                      ( +-  1.08% )
   126,702,271,070      instructions              #    0.87  insn per cycle           ( +-  0.05% )
    24,116,649,584      branches                  #  469.947 M/sec                    ( +-  0.05% )
       106,793,688      branch-misses             #    0.44% of all branches          ( +-  0.35% )

      24.055824830 seconds time elapsed                                               ( +-  0.89% )

With deletes, the difference seems to decrease to about 14% when 90% of
the blocks are deleted.

 Performance counter stats for './bin/log_block_manager-test --gtest_filter=*StartupBenchmark/1' (10 runs):

      53247.212289      task-clock (msec)         #    1.494 CPUs utilized            ( +-  0.69% )
            94,868      context-switches          #    0.002 M/sec                    ( +-  0.13% )
               530      cpu-migrations            #    0.010 K/sec                    ( +-  1.48% )
           399,284      page-faults               #    0.007 M/sec                    ( +-  1.66% )
   145,147,457,046      cycles                    #    2.726 GHz                      ( +-  0.48% )
   141,892,983,444      instructions              #    0.98  insn per cycle           ( +-  0.04% )
    26,167,495,753      branches                  #  491.434 M/sec                    ( +-  0.04% )
        59,986,442      branch-misses             #    0.23% of all branches          ( +-  0.33% )

      35.648681894 seconds time elapsed                                               ( +-  1.40% )

 Performance counter stats for './bin/log_block_manager-test --encrypt_data_at_rest=1 --gtest_filter=*StartupBenchmark/1' (10 runs):

      70616.598642      task-clock (msec)         #    1.737 CPUs utilized            ( +-  0.81% )
            95,082      context-switches          #    0.001 M/sec                    ( +-  0.28% )
               523      cpu-migrations            #    0.007 K/sec                    ( +-  1.69% )
           679,834      page-faults               #    0.010 M/sec                    ( +-  1.66% )
   203,066,615,244      cycles                    #    2.876 GHz                      ( +-  1.05% )
   209,355,734,267      instructions              #    1.03  insn per cycle           ( +-  0.08% )
    40,477,560,095      branches                  #  573.202 M/sec                    ( +-  0.07% )
       133,637,310      branch-misses             #    0.33% of all branches          ( +-  1.48% )

      40.653406472 seconds time elapsed                                               ( +-  1.52% )

The delete tablet benchmark takes less than a second to run, so I ran
it 1000 times with encryption disabled and enabled. It seems encryption
adds about 30% overhead in this case.

 Performance counter stats for './bin/tablet_server-test --gtest_filter=TabletServerTest.TestDeleteTabletBenchmark' (1000 runs):

        735.800649      task-clock (msec)         #    0.994 CPUs utilized            ( +-  0.33% )
             3,613      context-switches          #    0.005 M/sec                    ( +-  0.15% )
               178      cpu-migrations            #    0.242 K/sec                    ( +-  0.29% )
            10,722      page-faults               #    0.015 M/sec                    ( +-  0.08% )
     1,316,404,469      cycles                    #    1.789 GHz                      ( +-  0.19% )
     1,629,691,550      instructions              #    1.24  insn per cycle           ( +-  0.21% )
       337,778,107      branches                  #  459.062 M/sec                    ( +-  0.19% )
         6,340,956      branch-misses             #    1.88% of all branches          ( +-  0.21% )

       0.739940005 seconds time elapsed                                               ( +-  2.33% )

 Performance counter stats for './bin/tablet_server-test --encrypt_data_at_rest=1 --gtest_filter=TabletServerTest.TestDeleteTabletBenchmark' (1000 runs):

        769.368354      task-clock (msec)         #    0.792 CPUs utilized            ( +-  0.34% )
             3,633      context-switches          #    0.005 M/sec                    ( +-  0.13% )
               183      cpu-migrations            #    0.238 K/sec                    ( +-  0.29% )
            10,737      page-faults               #    0.014 M/sec                    ( +-  0.07% )
     1,356,327,815      cycles                    #    1.763 GHz                      ( +-  0.14% )
     1,635,206,270      instructions              #    1.21  insn per cycle           ( +-  0.06% )
       338,261,840      branches                  #  439.662 M/sec                    ( +-  0.06% )
         6,486,125      branch-misses             #    1.92% of all branches          ( +-  0.21% )

       0.971974609 seconds time elapsed                                               ( +-  2.42% )
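
For reference, the overhead percentages quoted above appear to follow
from the "seconds time elapsed" lines of the perf output. A minimal
sketch of that arithmetic in Python (the helper name is an assumption;
the numbers are copied from the runs above):

```python
def overhead_pct(baseline_s: float, encrypted_s: float) -> float:
    """Relative slowdown of the encrypted run over the baseline, in percent."""
    return (encrypted_s - baseline_s) / baseline_s * 100.0


# (elapsed seconds without encryption, elapsed seconds with encryption)
ELAPSED = {
    "StartupBenchmark/0 (no deletes)": (19.988553457, 24.055824830),
    "StartupBenchmark/1 (with deletes)": (35.648681894, 40.653406472),
    "TestDeleteTabletBenchmark": (0.739940005, 0.971974609),
}

if __name__ == "__main__":
    for name, (plain, encrypted) in ELAPSED.items():
        print(f"{name}: {overhead_pct(plain, encrypted):.1f}% overhead")
```

Note that this uses wall-clock time; the task-clock (CPU time) ratios in
the same output differ from these figures.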

I also ran dense_node-itest with -num_seconds=240 and
-num_tablets=1000, but the amount of data written (both in terms of
number of blocks and bytes) varied widely between runs, and so did the
time spent restarting the tablet server (seemingly without correlation
to the amount of data under management by the LBM). This was true with
and without encryption enabled, so it's hard to draw any conclusions
from this benchmark.

$ for x in {0..1}; do perf stat -r 5 --log-fd=3 ./bin/dense_node-itest -num_tablets=1000 -num_seconds=240 --gtest_filter=DenseNodeTest.RunTest/$x 3>&1 2> >(grep "dense_node-itest.cc") > >(grep "===="); done
[==========] Running 1 test from 1 test suite.
I1110 05:23:51.307804 107446 dense_node-itest.cc:223] Time spent restarting master: real 0.083s user 0.000s     sys 0.001s
I1110 05:24:19.118248 107446 dense_node-itest.cc:226] Time spent restarting tserver: real 27.810s       user 0.017s     sys 0.100s
I1110 05:24:19.118268 107446 dense_node-itest.cc:237] not waiting for bootstrapping tablets (flag disabled)
I1110 05:24:19.465764 107446 dense_node-itest.cc:242] log_block_manager_blocks_under_management: 38136
I1110 05:24:19.465798 107446 dense_node-itest.cc:242] log_block_manager_bytes_under_management: 250252129
I1110 05:24:19.465804 107446 dense_node-itest.cc:242] log_block_manager_containers: 9892
I1110 05:24:19.465808 107446 dense_node-itest.cc:242] log_block_manager_full_containers: 4017
I1110 05:24:19.465812 107446 dense_node-itest.cc:242] threads_running: 1334
[==========] 1 test from 1 test suite ran. (608499 ms total)
[==========] Running 1 test from 1 test suite.
I1110 05:33:58.751773 121520 dense_node-itest.cc:223] Time spent restarting master: real 0.053s user 0.000s     sys 0.001s
I1110 05:41:09.045049 121520 dense_node-itest.cc:226] Time spent restarting tserver: real 430.293s      user 0.291s     sys 1.523s
I1110 05:41:09.045073 121520 dense_node-itest.cc:237] not waiting for bootstrapping tablets (flag disabled)
I1110 05:41:09.257555 121520 dense_node-itest.cc:242] log_block_manager_blocks_under_management: 29376
I1110 05:41:09.257593 121520 dense_node-itest.cc:242] log_block_manager_bytes_under_management: 199511663
I1110 05:41:09.257599 121520 dense_node-itest.cc:242] log_block_manager_containers: 11464
I1110 05:41:09.257604 121520 dense_node-itest.cc:242] log_block_manager_full_containers: 1319
I1110 05:41:09.257608 121520 dense_node-itest.cc:242] threads_running: 226
[==========] 1 test from 1 test suite ran. (947330 ms total)
[==========] Running 1 test from 1 test suite.
I1110 05:50:01.919380 143199 dense_node-itest.cc:223] Time spent restarting master: real 0.094s user 0.000s     sys 0.003s
I1110 05:51:13.537168 143199 dense_node-itest.cc:226] Time spent restarting tserver: real 71.618s       user 0.057s     sys 0.249s
I1110 05:51:13.537186 143199 dense_node-itest.cc:237] not waiting for bootstrapping tablets (flag disabled)
I1110 05:51:13.581818 143199 dense_node-itest.cc:242] log_block_manager_blocks_under_management: 32557
I1110 05:51:13.581848 143199 dense_node-itest.cc:242] log_block_manager_bytes_under_management: 189720720
I1110 05:51:13.581853 143199 dense_node-itest.cc:242] log_block_manager_containers: 9910
I1110 05:51:13.581857 143199 dense_node-itest.cc:242] log_block_manager_full_containers: 2070
I1110 05:51:13.581861 143199 dense_node-itest.cc:242] threads_running: 225
[==========] 1 test from 1 test suite ran. (630376 ms total)
[==========] Running 1 test from 1 test suite.
I1110 06:00:18.327955 161886 dense_node-itest.cc:223] Time spent restarting master: real 0.094s user 0.001s     sys 0.001s
I1110 06:00:36.625900 161886 dense_node-itest.cc:226] Time spent restarting tserver: real 18.298s       user 0.011s     sys 0.063s
I1110 06:00:36.625918 161886 dense_node-itest.cc:237] not waiting for bootstrapping tablets (flag disabled)
I1110 06:00:36.779762 161886 dense_node-itest.cc:242] log_block_manager_blocks_under_management: 34068
I1110 06:00:36.779801 161886 dense_node-itest.cc:242] log_block_manager_bytes_under_management: 236467612
I1110 06:00:36.779808 161886 dense_node-itest.cc:242] log_block_manager_containers: 9880
I1110 06:00:36.779814 161886 dense_node-itest.cc:242] log_block_manager_full_containers: 1410
I1110 06:00:36.779817 161886 dense_node-itest.cc:242] threads_running: 1332
[==========] 1 test from 1 test suite ran. (481328 ms total)
[==========] Running 1 test from 1 test suite.
I1110 06:08:12.651594 176648 dense_node-itest.cc:223] Time spent restarting master: real 0.084s user 0.001s     sys 0.002s
I1110 06:09:14.436517 176648 dense_node-itest.cc:226] Time spent restarting tserver: real 61.785s       user 0.045s     sys 0.222s
I1110 06:09:14.436537 176648 dense_node-itest.cc:237] not waiting for bootstrapping tablets (flag disabled)
I1110 06:09:14.732786 176648 dense_node-itest.cc:242] log_block_manager_blocks_under_management: 32334
I1110 06:09:14.732823 176648 dense_node-itest.cc:242] log_block_manager_bytes_under_management: 225235870
I1110 06:09:14.732829 176648 dense_node-itest.cc:242] log_block_manager_containers: 10119
I1110 06:09:14.732833 176648 dense_node-itest.cc:242] log_block_manager_full_containers: 5070
I1110 06:09:14.732837 176648 dense_node-itest.cc:242] threads_running: 1313
[==========] 1 test from 1 test suite ran. (506357 ms total)

 Performance counter stats for './bin/dense_node-itest -num_tablets=1000 -num_seconds=240 --gtest_filter=DenseNodeTest.RunTest/0' (5 runs):

    2995824.938107      task-clock (msec)         #    4.719 CPUs utilized            ( +-  5.08% )
        51,210,606      context-switches          #    0.017 M/sec                    ( +-  8.77% )
        10,794,492      cpu-migrations            #    0.004 M/sec                    ( +- 12.18% )
         5,334,419      page-faults               #    0.002 M/sec                    ( +-  9.46% )
 6,394,831,534,947      cycles                    #    2.135 GHz                      ( +-  9.25% )
 3,981,645,181,601      instructions              #    0.62  insn per cycle           ( +-  9.86% )
   792,397,266,216      branches                  #  264.501 M/sec                    ( +-  9.75% )
     6,963,950,058      branch-misses             #    0.88% of all branches          ( +-  7.25% )

     634.804501857 seconds time elapsed                                               ( +- 13.11% )

[==========] Running 1 test from 1 test suite.
I1110 06:16:46.444833 194734 dense_node-itest.cc:223] Time spent restarting master: real 0.075s user 0.000s     sys 0.002s
I1110 06:17:29.511780 194734 dense_node-itest.cc:226] Time spent restarting tserver: real 43.067s       user 0.031s     sys 0.151s
I1110 06:17:29.511799 194734 dense_node-itest.cc:237] not waiting for bootstrapping tablets (flag disabled)
I1110 06:17:29.575773 194734 dense_node-itest.cc:242] log_block_manager_blocks_under_management: 33354
I1110 06:17:29.575809 194734 dense_node-itest.cc:242] log_block_manager_bytes_under_management: 162500445
I1110 06:17:29.575814 194734 dense_node-itest.cc:242] log_block_manager_containers: 9946
I1110 06:17:29.575817 194734 dense_node-itest.cc:242] log_block_manager_full_containers: 2847
I1110 06:17:29.575821 194734 dense_node-itest.cc:242] threads_running: 228
[==========] 1 test from 1 test suite ran. (468746 ms total)
[==========] Running 1 test from 1 test suite.
I1110 06:24:34.692034 14506 dense_node-itest.cc:223] Time spent restarting master: real 4.389s  user 0.003s     sys 0.017s
I1110 06:25:16.629412 14506 dense_node-itest.cc:226] Time spent restarting tserver: real 41.937s        user 0.027s     sys 0.146s
I1110 06:25:16.629434 14506 dense_node-itest.cc:237] not waiting for bootstrapping tablets (flag disabled)
I1110 06:25:22.050498 14506 dense_node-itest.cc:242] log_block_manager_blocks_under_management: 47634
I1110 06:25:22.050537 14506 dense_node-itest.cc:242] log_block_manager_bytes_under_management: 409166624
I1110 06:25:22.050544 14506 dense_node-itest.cc:242] log_block_manager_containers: 9779
I1110 06:25:22.050549 14506 dense_node-itest.cc:242] log_block_manager_full_containers: 3405
I1110 06:25:22.050552 14506 dense_node-itest.cc:242] threads_running: 1342
[==========] 1 test from 1 test suite ran. (434683 ms total)
[==========] Running 1 test from 1 test suite.
I1110 06:31:46.989178 26886 dense_node-itest.cc:223] Time spent restarting master: real 0.093s  user 0.000s     sys 0.003s
I1110 06:32:04.775068 26886 dense_node-itest.cc:226] Time spent restarting tserver: real 17.786s        user 0.010s     sys 0.048s
I1110 06:32:04.775091 26886 dense_node-itest.cc:237] not waiting for bootstrapping tablets (flag disabled)
I1110 06:32:04.830790 26886 dense_node-itest.cc:242] log_block_manager_blocks_under_management: 34068
I1110 06:32:04.830832 26886 dense_node-itest.cc:242] log_block_manager_bytes_under_management: 225249875
I1110 06:32:04.830838 26886 dense_node-itest.cc:242] log_block_manager_containers: 10352
I1110 06:32:04.830842 26886 dense_node-itest.cc:242] log_block_manager_full_containers: 1198
I1110 06:32:04.830849 26886 dense_node-itest.cc:242] threads_running: 1307
[==========] 1 test from 1 test suite ran. (401113 ms total)
[==========] Running 1 test from 1 test suite.
I1110 06:38:28.355651 39523 dense_node-itest.cc:223] Time spent restarting master: real 0.348s  user 0.001s     sys 0.003s
I1110 06:39:14.934329 39523 dense_node-itest.cc:226] Time spent restarting tserver: real 46.579s        user 0.034s     sys 0.161s
I1110 06:39:14.934355 39523 dense_node-itest.cc:237] not waiting for bootstrapping tablets (flag disabled)
I1110 06:39:15.086436 39523 dense_node-itest.cc:242] log_block_manager_blocks_under_management: 42942
I1110 06:39:15.086474 39523 dense_node-itest.cc:242] log_block_manager_bytes_under_management: 337224147
I1110 06:39:15.086480 39523 dense_node-itest.cc:242] log_block_manager_containers: 10706
I1110 06:39:15.086484 39523 dense_node-itest.cc:242] log_block_manager_full_containers: 2511
I1110 06:39:15.086489 39523 dense_node-itest.cc:242] threads_running: 1326
[==========] 1 test from 1 test suite ran. (542365 ms total)
[==========] Running 1 test from 1 test suite.
I1110 06:47:30.400804 54331 dense_node-itest.cc:223] Time spent restarting master: real 0.125s  user 0.001s     sys 0.002s
I1110 06:51:28.288359 54331 dense_node-itest.cc:226] Time spent restarting tserver: real 237.888s       user 0.160s     sys 0.855s
I1110 06:51:28.288374 54331 dense_node-itest.cc:237] not waiting for bootstrapping tablets (flag disabled)
I1110 06:51:28.633086 54331 dense_node-itest.cc:242] log_block_manager_blocks_under_management: 35292
I1110 06:51:28.633126 54331 dense_node-itest.cc:242] log_block_manager_bytes_under_management: 198352642
I1110 06:51:28.633132 54331 dense_node-itest.cc:242] log_block_manager_containers: 10015
I1110 06:51:28.633139 54331 dense_node-itest.cc:242] log_block_manager_full_containers: 2514
I1110 06:51:28.633143 54331 dense_node-itest.cc:242] threads_running: 1300
[==========] 1 test from 1 test suite ran. (709361 ms total)

 Performance counter stats for './bin/dense_node-itest -num_tablets=1000 -num_seconds=240 --gtest_filter=DenseNodeTest.RunTest/1' (5 runs):

    3115638.620173      task-clock (msec)         #    6.094 CPUs utilized            ( +-  2.07% )
        50,595,541      context-switches          #    0.016 M/sec                    ( +-  7.87% )
        10,972,126      cpu-migrations            #    0.004 M/sec                    ( +-  9.15% )
         5,985,924      page-faults               #    0.002 M/sec                    ( +- 10.89% )
 6,073,807,994,303      cycles                    #    1.949 GHz                      ( +- 10.17% )
 4,094,332,719,732      instructions              #    0.67  insn per cycle           ( +-  7.76% )
   807,628,648,546      branches                  #  259.218 M/sec                    ( +-  7.83% )
     7,158,875,684      branch-misses             #    0.89% of all branches          ( +-  6.38% )

     511.281338094 seconds time elapsed                                               ( +- 10.71% )
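
One rough way to check whether tserver restart time tracks the amount of
data under management is to correlate the two across the ten runs logged
above. The sketch below is Python and is only an illustration: the
pearson helper is my own, the data points are copied from the
dense_node-itest logs, and ten samples are too few to be conclusive.

```python
from math import sqrt

# (log_block_manager_blocks_under_management, tserver restart "real" seconds)
RUN_TEST_0 = [(38136, 27.810), (29376, 430.293), (32557, 71.618),
              (34068, 18.298), (32334, 61.785)]
RUN_TEST_1 = [(33354, 43.067), (47634, 41.937), (34068, 17.786),
              (42942, 46.579), (35292, 237.888)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


all_runs = RUN_TEST_0 + RUN_TEST_1
blocks = [b for b, _ in all_runs]
restart_s = [s for _, s in all_runs]
r = pearson(blocks, restart_s)

if __name__ == "__main__":
    print(f"Pearson r, blocks vs. restart seconds: {r:.2f}")
```

A coefficient well below +1 would be consistent with restart time not
simply growing with the number of blocks, though the two long restarts
(430s and 238s) dominate any statistic computed from this sample.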

Change-Id: I909d0c4af0c1fca0d14c99a6627842dbe2ed7524
---
M src/kudu/consensus/consensus_meta-test.cc
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log_index.cc
M src/kudu/consensus/log_util.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/dir_manager.cc
M src/kudu/fs/dir_util.cc
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/log_block_manager-test-util.cc
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/integration-tests/dense_node-itest.cc
M src/kudu/integration-tests/mini_cluster_fs_inspector.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/security-itest.cc
M src/kudu/mini-cluster/external_mini_cluster.cc
M src/kudu/mini-cluster/external_mini_cluster.h
M src/kudu/postgres/mini_postgres.cc
M src/kudu/ranger/ranger_client.cc
M src/kudu/security/test/mini_kdc.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_pbc.cc
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/env-test.cc
M src/kudu/util/env.cc
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
M src/kudu/util/env_util.cc
M src/kudu/util/file_cache-test.cc
M src/kudu/util/file_cache.cc
M src/kudu/util/pb_util-test.cc
M src/kudu/util/pb_util.cc
M src/kudu/util/pb_util.h
M src/kudu/util/rolling_log.cc
M src/kudu/util/yamlreader-test.cc
41 files changed, 450 insertions(+), 216 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/17974/7
--
To view, visit http://gerrit.cloudera.org:8080/17974

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I909d0c4af0c1fca0d14c99a6627842dbe2ed7524
Gerrit-Change-Number: 17974
Gerrit-PatchSet: 7
Gerrit-Owner: Attila Bukor <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Bankim Bhavsar <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
