Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/15483 )
Change subject: KUDU-3079 Add MiniRanger ...................................................................... Patch Set 16: (43 comments) http://gerrit.cloudera.org:8080/#/c/15483/16//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15483/16//COMMIT_MSG@33 PS16, Line 33: RESY REST http://gerrit.cloudera.org:8080/#/c/15483/16/build-support/dist_test.py File build-support/dist_test.py: http://gerrit.cloudera.org:8080/#/c/15483/16/build-support/dist_test.py@116 PS16, Line 116: # Add Postgres and Ranger Nit: terminate with a period. May also want to note that these are symlinks. http://gerrit.cloudera.org:8080/#/c/15483/16/build-support/run_dist_test.py File build-support/run_dist_test.py: http://gerrit.cloudera.org:8080/#/c/15483/16/build-support/run_dist_test.py@165 PS16, Line 165: for bin_path in glob.glob(os.path.join(ROOT, "build/*/bin")): Can you just merge this loop with the chrony one, and update the comment? http://gerrit.cloudera.org:8080/#/c/15483/16/build-support/run_dist_test.py@176 PS16, Line 176: glob.glob("/usr/lib/jvm/java-1.8.0-*")[0] This was already done above (L154); reuse. http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/mini-cluster/external_mini_cluster.cc File src/kudu/mini-cluster/external_mini_cluster.cc: http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/mini-cluster/external_mini_cluster.cc@337 PS16, Line 337: if (opts_.enable_ranger) { Where do we configure the Kudu Masters to talk to MiniRanger? Or is that going to be in a follow-up? http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/CMakeLists.txt File src/kudu/ranger/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/CMakeLists.txt@69 PS16, Line 69: # need to chmod +x thirdparty/src/postgresql-42.2.10/postgresql-42.2.10.jar What is this comment telling us? Is this a TODO? http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/CMakeLists.txt@100 PS16, Line 100: mini_ranger Alphabetical sorting. http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger-test.cc File src/kudu/ranger/mini_ranger-test.cc: http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger-test.cc@40 PS16, Line 40: MiniRanger ranger; Should be ranger_ http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger-test.cc@41 PS16, Line 41: Nit: empty line here. http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger-test.cc@44 PS16, Line 44: TEST_F(MiniRangerTest, TestGrantPrivilege) { We should at least a full lifecycle (start, restart, stop, etc) Maybe also test persistence (i.e. you started Ranger and did something. If you restart it, are the effects of that change still there?) http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger-test.cc@52 PS16, Line 52: policy.items.emplace_back(item); std::move http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.h File src/kudu/ranger/mini_ranger.h: http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.h@40 PS16, Line 40: typedef std::pair<std::vector<std::string>, std::vector<ActionPB>> policy_item; This needs documentation, and should be called PolicyItem. http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.h@42 PS16, Line 42: struct AuthorizationPolicy { Doc. http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.h@50 PS16, Line 50: class MiniRanger { Doc. http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.h@57 PS16, Line 57: move How did you get away with this? We shouldn't be 'using' anything in this header. Should be std::move. http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.h@64 PS16, Line 64: Status AddPolicy(const AuthorizationPolicy& policy) WARN_UNUSED_RESULT; Doc. Also, can we pass by value and move into it? http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.h@69 PS16, Line 69: static Status CreateRangerConfigs(const std::string& config_string, Doc. http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.h@94 PS16, Line 94: std::string ranger_admin_url_; Doc. http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.cc File src/kudu/ranger/mini_ranger.cc: http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.cc@53 PS16, Line 53: What a mess. In future work, could you: 1. Take a stab at removing every property/parameter we don't need. 2. Documenting the effect of what remains? I know it's a big ask, but these configs are just completely inscrutable. I have no idea how anyone in the future could modify them safely. Might also consider storing these templates in a separate file, with a header to retrieve the de-templatized versions of them. http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.cc@1132 PS16, Line 1132: <value>simple</value> Does this warrant a TODO or something? I presume we want to support Kerberos at some point. http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.cc@1141 PS16, Line 1141: // TODO(awong): codify that we're waiting for Postgres to become usable. Yeah, this should be the next priority, otherwise the massive number of Sentry tests that we should (hopefully) bring to bear with Ranger will take a very long time to run. http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.cc@1147 PS16, Line 1147: if (ranger_process_ && ranger_process_->IsStarted()) { What if Start() failed mid-way such that mini_pg_ has started but Ranger hasn't? Don't we want to explicitly stop mini_pg_ (so we can warn if it failed)? http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.cc@1148 PS16, Line 1148: CHECK_OK(Stop()); Should WARN_NOT_OK. http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.cc@1157 PS16, Line 1157: unique_ptr<Subprocess> ranger_shutdown(new Subprocess({ Holy moly you can't just send the thing a TERM/KILL? What happens if you don't do this fancy shutdown? Also, doesn't need to be heap allocated. http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.cc@1179 PS16, Line 1179: if (data_root_.empty()) { : data_root_ = GetTestDataDirectory(); : } Do in the constructor and make data_root_ const? http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.cc@1213 PS16, Line 1213: if (!env->FileExists(admin_home)) { Decompose this block into a helper. http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.cc@1215 PS16, Line 1215: env->CreateDir(admin_home); RETURN_NOT_OK. http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.cc@1223 PS16, Line 1223: LOG(INFO) << "Starting Ranger out of " << admin_home; Shouldn't this be outside the "fresh install" case? http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.cc@1232 PS16, Line 1232: RETURN_NOT_OK(GetFQDN(&fqdn)); What about setting up Ranger on a UNIQUE_LOOPBACK address? http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.cc@1237 PS16, Line 1237: uint16_t ranger_shutdown_port = GetRandomPort(); There should be a comment somewhere explaining all of the ports that we need to set up, and what purpose they serve. http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.cc@1238 PS16, Line 1238: RETURN_NOT_OK(CreateRangerConfigs( Decompose all the config stuff into a helper. http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.cc@1258 PS16, Line 1258: if (fresh_install) { Decompose this block into a helper. http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.cc@1261 PS16, Line 1261: symlink(JoinPathSegments(ranger_home_, "ews/webapp").c_str(), : kWebAppDir.c_str()); Add a symlink function to env.h and env_posix.h (and a test to env-test.cc). Can be done in a follow-up if you prefer, but soon (as this doesn't even check return value). http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.cc@1264 PS16, Line 1264: Much of this encapsulates setup.sh from apache/ranger Link? http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.cc@1275 PS16, Line 1275: unique_ptr<Subprocess> db_setup(new Subprocess( Doesn't need to be heap allocated. http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.cc@1343 PS16, Line 1343: EasyCurl curl; : EasyJson service; : service.Set("name", "kudu"); : service.Set("type", "kudu"); : faststring result; : Status curl_result = curl.PostToURL(JoinPathSegments(ranger_admin_url_, : "service/plugins/services"), : service.ToString(), &result, {"Content-Type: application/json"}, : "admin", "admin"); Decompose into a helper. http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.cc@1357 PS16, Line 1357: Status MiniRanger::CreateRangerConfigs(const string& config_string, const string& file_path) { Pass by value and std::move both of these. http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.cc@1366 PS16, Line 1366: int16_t MiniRanger::GetRandomPort() { Duplicated from MiniPostgres; decompose into net_util somewhere. http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/ranger/mini_ranger.cc@1434 PS16, Line 1434: LOG(INFO) << result.ToString(); If there's an error, will it be reflected in the HTTP status code (i.e. will curl fail?) Or do we need to process 'result' to find it? http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/util/curl_util.h File src/kudu/util/curl_util.h: PS16: Can you unit test this change? See PasswdWebserverTest in webserver-test (which you may want to refactor to take advantage of the new functionality anyway). http://gerrit.cloudera.org:8080/#/c/15483/16/src/kudu/util/curl_util.h@51 PS16, Line 51: const std::string& username = "", : const std::string& password = ""); These are likely to be the same with each call to FetchURL or PostToURL; could you plumb them as a setter instead? Also, please make that setter mutually exclusive with set_use_spnego() (i.e. allow just one to be used). http://gerrit.cloudera.org:8080/#/c/15483/16/thirdparty/build-thirdparty.sh File thirdparty/build-thirdparty.sh: http://gerrit.cloudera.org:8080/#/c/15483/16/thirdparty/build-thirdparty.sh@298 PS16, Line 298: cp -R \ This isn't a directory, so why -R? Also don't you want -f so you can overwrite the destination file if it exists (i.e. multiple invocations of this script). Alternatively, maybe a symlink would be good enough? http://gerrit.cloudera.org:8080/#/c/15483/16/thirdparty/patches/ranger-fixscripts.patch File thirdparty/patches/ranger-fixscripts.patch: PS16: The patch is obviously just a limited view into the script itself, but is it OK to just outright remove these three variables? Looks like XAPOLICYMGR_DIR is needed to set XAPOLICYMGR_EWS_DIR just below; how does this work? -- To view, visit http://gerrit.cloudera.org:8080/15483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I15ab1eb8abe71c074c26b286073442882e101bc6 Gerrit-Change-Number: 15483 Gerrit-PatchSet: 16 Gerrit-Owner: Attila Bukor <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Attila Bukor <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Hao Hao <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sat, 21 Mar 2020 00:33:23 +0000 Gerrit-HasComments: Yes
