Dan Burkert has posted comments on this change. Change subject: Add basic Hive MetaStore client ......................................................................
Patch Set 7: (69 comments) http://gerrit.cloudera.org:8080/#/c/7053/6//COMMIT_MSG Commit Message: Line 9: This patch lays the groundwork for a Hive metastore client. > any progress on the design doc for this work? Would be good to be able to r Not yet, but I plan to delay landing this until the design work is more concrete, so I'll make sure to add a link then. http://gerrit.cloudera.org:8080/#/c/7053/6/CMakeLists.txt File CMakeLists.txt: Line 829: # We rely on the JVM for running the HMS during tests. > do we need to update any build docs to specify that java is now a requireme I'm not sure. I've changed this so that these dependencies are only required if !NO_TESTS. So our options are to add Java as a dependency when building from source, or to add NO_TESTS to the from-source build instructions. Not sure which I prefer, I'll try and take an informal poll. http://gerrit.cloudera.org:8080/#/c/7053/5/build-support/dist_test.py File build-support/dist_test.py: Line 82: # Hive MetaStore tests require hadoop and hive libraries. These files are > These directories will be recursively copied? Or are these just symlinks? C Done http://gerrit.cloudera.org:8080/#/c/7053/5/build-support/run_dist_test.py File build-support/run_dist_test.py: Line 127: # are used in mini_hms.cc. > Can you use ROOT to derive the paths instead of relying on the cwd? Done Line 129: env['HADOOP_HOME'] = glob.glob(os.path.join(ROOT, "thirdparty/src/hadoop-*"))[0] > Can we run detect-java-home.sh from bigtop instead? This seems rather fragi This is how other dist-test projects do it: https://github.com/cloudera/dist_test/blob/master/grind/python/disttest/isolate.py#L114 http://gerrit.cloudera.org:8080/#/c/7053/5/cmake_modules/FindJavaHome.cmake File cmake_modules/FindJavaHome.cmake: Line 19: # https://github.com/apache/bigtop/blob/38e1571b2f73bbfa6ab0c01a689fae967b8399d9/bigtop-packages/src/common/bigtop-utils/bigtop-detect-javahome > If this is a specific version for a reason, please explain why. IIRC, it's Done http://gerrit.cloudera.org:8080/#/c/7053/6/cmake_modules/FindJavaHome.cmake File cmake_modules/FindJavaHome.cmake: Line 64: set(JAVA_HOME $ENV{JAVA_HOME}) > can we do something here to check that EXISTS ${JAVA_HOME}/bin/java just in Done http://gerrit.cloudera.org:8080/#/c/7053/5/cmake_modules/FindThrift.cmake File cmake_modules/FindThrift.cmake: PS5, Line 106: list(APPEND ${SRCS} "${THRIFT_CC_OUT}" "fb303_types.cpp" "fb303_constants.cpp" "FacebookService.cpp") : list(APPEND ${HDRS} "${THRIFT_H_OUT}" " > Remove this? Done http://gerrit.cloudera.org:8080/#/c/7053/6/cmake_modules/FindThrift.cmake File cmake_modules/FindThrift.cmake: PS6, Line 105: # TODO(dan): Add the fb303 files manually. This is a complete hack. : list(APPEND ${SRCS} "${THRIFT_CC_OUT}" "fb303_types.cpp" "fb303_constants.cpp" "FacebookService.cpp") : list(APPEND ${HDRS} "${THRIFT_H_OUT}" "fb303_types.h" "fb303_constants.h" "FacebookService.h") : > hrm, could we at least add this as an optional argument like REQUIRE_FB303? Yah perhaps. Right now we only have one consumer, so I'd like to punt on this (I already find this script complex). http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/hms/CMakeLists.txt File src/kudu/hms/CMakeLists.txt: PS5, Line 21: : add_libra > This is only used in one place, just inline it there? Done Line 23: target_link_libraries(hms_thrift thrift) > Is hms_thrift going to be linked into the Kudu client library? If not, it d Done Line 35: > Likewise. Done Line 44: execute_process(COMMAND ln -sf > Nit: use lowercase for cmake built-ins (i.e. add_custom_target(...)). Done Line 49: "${EXECUTABLE_OUTPUT_PATH}/java-home") > So this means the symlinks will be recreated on every invocation of "make" Done http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/hms/CMakeLists.txt File src/kudu/hms/CMakeLists.txt: Line 23: target_link_libraries(hms_thrift thrift) > since the client won't use hms at all, I think we can use the built-in ADD_ Done Line 35: > same Done http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/hms/hive_metastore-test.cc File src/kudu/hms/hive_metastore-test.cc: Line 23 > But we're not explicitly including boost here. Is this from ThriftHiveMetas Then it would be leaking into non-test code. It's also harder because it's a generated header. Line 42 > Also test that Stop() works? Done Line 52 > Surprised the compiler needs the vector<string> part and isn't able to figu It doesn't compile without it, not sure why. http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/hms/hive_metastore-test.cc File src/kudu/hms/hive_metastore-test.cc: Line 53 > I think if you include stl_logging.h then you don't need to do this << part Done Line 61 > is there a gaurantee that they come back in sorted order or is a std::sort( Done Line 62 > same Done http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/hms/hive_metastore.cc File src/kudu/hms/hive_metastore.cc: Line 49 > Are these all of the exception classes that may be thrown? Is there some wa TException is something of a catch-all case, although it's not unexepected (IO errors fall under TException). My understanding is that will catch anything thrown from thrift. Line 63 > So client_ is actually a pointer? Is the pointer hidden behind a typedef or No I think this is just a ctor argument (boost::shared_ptr<TProtocol>). Line 71 > Maybe WARN_NOT_OK() instead? I think that's the convention we use. Done http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/hms/hive_metastore.cc File src/kudu/hms/hive_metastore.cc: PS6, Line 53: > compared to the other macros we have like RETURN_NOT_OK_PREPEND, it's odd t Done http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/hms/hive_metastore.h File src/kudu/hms/hive_metastore.h: Line 33 > Maybe HiveMetastoreClient then? Or HMSClient? HiveMetastore suggests the ac Done Line 39 > What happens if Start or Stop are called multiple times, or if Stop is call They can be called multiple times, but the bound port will change. I don't think it's worth worrying about - we can cross that bridge if we need fault tests. Line 51 > Forward declare and maybe avoid some includes? Done Line 57 > Guess we can't forward declare this, so maybe forward declaring the above i Done http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/hms/hive_metastore.h File src/kudu/hms/hive_metastore.h: PS6, Line 51: > I think GSG discourages namespace aliases, but maybe we should make an exce SGTM. http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/hms/mini_hms-test.cc File src/kudu/hms/mini_hms-test.cc: Line 24 > MiniHmsTest Done Line 26 > Shouldn't we also test some sort of RPC? I'm just going to get rid of this test since it's duplicated by the hms_client-test entirely. I originally wrote it because I created mini_hms before hms_client. http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/hms/mini_hms.cc File src/kudu/hms/mini_hms.cc: Line 20: #include <csignal> > Don't need? It's needed for SIGKILL. Line 27: #include <glog/logging.h> > Don't need? Done Line 28: > Don't need? Done Line 33: #include "kudu/util/status.h" > Don't need? needed for SCOPED_LOG_SLOW_EXECUTION Line 47: VLOG(1) << "Stopping HMS"; > WARN_NOT_OK() here? Done Line 60: const char* env = std::getenv(env_var.c_str()); > Use a ternary? Done Line 63: if (!Env::Default()->FileExists(*home_dir)) { > Extra space here. Done Line 65: } > Shouldn't we gracefully return this sort of error via Status? First part is done. Second part - I don't really think that's worth the trouble, hive should complain about it appropriately. Line 72: SCOPED_LOG_SLOW_EXECUTION(WARNING, 20000, "Starting HMS"); > Remove? Or uncomment? Done Line 77: Env* env = Env::Default(); > How about asking users to supply this in the class constructor? Why? This isn't how mini_kdc works. Line 106: schematool.SetCurrentDir(tmp_dir); > Nit: do you actually need to initialize the value? Would prefer to adhere t I see you like to tempt the UB demons. Line 109: int rc; > Maybe include the exit status? Should use GetExitStatus() anyway. I'm skeptical that anyone can glean information from the exit status of schemaTool. Line 112: return Status::RuntimeError("schematool failed"); > Kerberos HMS? Done http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/hms/mini_hms.cc File src/kudu/hms/mini_hms.cc: Line 65: } > nit: extra space Done Line 91: auto db_path = JoinPathSegments(tmp_dir, "metastore_db"); > vlog? Done PS6, Line 114: > copy pasta Done http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/security/test/mini_kdc.cc File src/kudu/security/test/mini_kdc.cc: Line 151: RETURN_NOT_OK(WaitForUdpBind(kdc_process_->pid(), &options_.port, MonoDelta::FromSeconds(1))); > This seems like a rather short timeout. The old code was even more aggressi It hasn't seemed to be a problem. The kdc should be fast. http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/util/subprocess.cc File src/kudu/util/subprocess.cc: Line 749 > Nit: remove unnecessary this prefixes, here and below. Done http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/util/subprocess.cc File src/kudu/util/subprocess.cc: PS6, Line 760: : > this was a bit specific to the KDC case, I think could be cleaned up a bit. I don't think it's specific to the KDC, it's going to take some time for any process to start up. The HMS, for instance, takes on the order of 5 seconds. http://gerrit.cloudera.org:8080/#/c/7053/5/src/kudu/util/subprocess.h File src/kudu/util/subprocess.h: Line 205 > "to the specified executable" Done http://gerrit.cloudera.org:8080/#/c/7053/6/src/kudu/util/subprocess.h File src/kudu/util/subprocess.h: PS6, Line 160: t { return argv_[0]; } > "bind to ANY listening..."? Done PS6, Line 205: > typo Done PS6, Line 206: > you mean the $PATH environment variable? Done http://gerrit.cloudera.org:8080/#/c/7053/5/thirdparty/LICENSE.txt File thirdparty/LICENSE.txt: Line 620: Source: https://thrift.apache.org > Copy/pasta Done Line 658: Source: https://www.gnu.org/software/bison > Copy/pasta? Done http://gerrit.cloudera.org:8080/#/c/7053/6/thirdparty/LICENSE.txt File thirdparty/LICENSE.txt: Line 620: Source: https://thrift.apache.org > wrong paste? Done Line 658: Source: https://www.gnu.org/software/bison > same Done http://gerrit.cloudera.org:8080/#/c/7053/5/thirdparty/build-definitions.sh File thirdparty/build-definitions.sh: Line 501: CPPFLAGS="$EXTRA_CPPFLAGS $OPENSSL_CFLAGS" \ > Isn't this wrong though? Curl is a C program so I'd expect CXXFLAGS has no Done Line 671: ./bootstrap.sh --prefix=$PREFIX threading=multi --with-libraries=date_time > Is this actually needed? I don't see a corresponding change to FindKuduBoos Done PS5, Line 698: > Why do we need these two additions? I can take my answer in the form of a c Done Line 732: LDFLAGS="$EXTRA_LDFLAGS" \ > Should be cp -f, right? Otherwise will it overwrite? We don't use -f anywhere else in this file. Could you elaborate on why it's needed? http://gerrit.cloudera.org:8080/#/c/7053/5/thirdparty/download-thirdparty.sh File thirdparty/download-thirdparty.sh: Line 323: fi > No autoreconf? autoreconf is failing on OS X. Line 331: fi > No autoreconf? Yah, it breaks on OS X, and this is a build-only dependency. Line 342 > Perhaps we should follow our LLVM tarball's lead and remove it a priori, to We don't do that for LLVM. https://github.com/apache/kudu/blob/master/thirdparty/package-llvm.sh http://gerrit.cloudera.org:8080/#/c/7053/5/thirdparty/vars.sh File thirdparty/vars.sh: Line 180: BISON_VERSION=3.0.4 > Why use this version of bison? The latest is 3.0.4 and it's quite old alrea Done Line 185: HIVE_NAME=apache-hive-$HIVE_VERSION-bin > Why not just hive-$HIVE_VERSION? Because they name their tarball contents in different way. -- To view, visit http://gerrit.cloudera.org:8080/7053 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I155223da912bc18a759df2f1f6bc25d1132a99ee Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan Burkert <danburk...@apache.org> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Dan Burkert <danburk...@apache.org> Gerrit-Reviewer: Jean-Daniel Cryans <jdcry...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes