[1/3] kudu git commit: KUDU-2297 (part 3): refactor process-wide stack collection out of /stacks
Repository: kudu Updated Branches: refs/heads/master 416b3018a -> 00815045f KUDU-2297 (part 3): refactor process-wide stack collection out of /stacks Previously a bunch of logic to collect all the stacks from the process was in the /stacks path handler. This logic is relatively generic and shouldn't be intermingled with the formatting code. In particular I'd like to use it in the diagnostics log, where a different output format is desirable. Change-Id: Ibb7c6edd31254f3d7e0cbef1eaf575bde3570df6 Reviewed-on: http://gerrit.cloudera.org:8080/9329 Tested-by: Todd LipconReviewed-by: Mike Percy Project: http://git-wip-us.apache.org/repos/asf/kudu/repo Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/831483b4 Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/831483b4 Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/831483b4 Branch: refs/heads/master Commit: 831483b47fd0e2165c3612811fdbf6337e3891b8 Parents: 416b301 Author: Todd Lipcon Authored: Wed Feb 14 14:43:32 2018 -0800 Committer: Todd Lipcon Committed: Fri Feb 23 23:06:30 2018 + -- src/kudu/server/default_path_handlers.cc | 101 +- src/kudu/server/diagnostics_log.cc | 4 +- src/kudu/util/debug-util-test.cc | 65 + src/kudu/util/debug-util.cc | 75 +++ src/kudu/util/debug-util.h | 59 +++ 5 files changed, 191 insertions(+), 113 deletions(-) -- http://git-wip-us.apache.org/repos/asf/kudu/blob/831483b4/src/kudu/server/default_path_handlers.cc -- diff --git a/src/kudu/server/default_path_handlers.cc b/src/kudu/server/default_path_handlers.cc index b857be5..7574def 100644 --- a/src/kudu/server/default_path_handlers.cc +++ b/src/kudu/server/default_path_handlers.cc @@ -18,17 +18,13 @@ #include "kudu/server/default_path_handlers.h" #include -#include #include #include #include -#include -#include #include #include #include -#include #include #include @@ -48,13 +44,12 @@ #include "kudu/gutil/strings/human_readable.h" #include "kudu/gutil/strings/numbers.h" #include "kudu/gutil/strings/split.h" -#include "kudu/gutil/strings/strip.h" #include "kudu/gutil/strings/substitute.h" #include "kudu/server/pprof_path_handlers.h" #include "kudu/server/webserver.h" +#include "kudu/util/array_view.h" #include "kudu/util/debug-util.h" #include "kudu/util/easy_json.h" -#include "kudu/util/env.h" #include "kudu/util/faststring.h" #include "kudu/util/flag_tags.h" #include "kudu/util/flags.h" @@ -157,90 +152,34 @@ static void FlagsHandler(const Webserver::WebRequest& req, // Prints out the current stack trace of all threads in the process. static void StacksHandler(const Webserver::WebRequest& /*req*/, Webserver::PrerenderedWebResponse* resp) { - MonoTime start = MonoTime::Now(); std::ostringstream* output = resp->output; - vector tids; - Status s = ListThreads(); + + StackTraceSnapshot snap; + auto start = MonoTime::Now(); + Status s = snap.SnapshotAllStacks(); if (!s.ok()) { -*output << "Failed to list threads: " << s.ToString(); +*output << "Failed to collect stacks: " << s.ToString(); return; } - struct Info { -pid_t tid; -Status status; -string thread_name; -StackTraceCollector stc; -StackTrace stack; - }; - - // Initially trigger all the stack traces. - vector infos(tids.size()); - for (int i = 0; i < tids.size(); i++) { -infos[i].tid = tids[i]; -infos[i].status = infos[i].stc.TriggerAsync(tids[i], [i].stack); - } - - // Now collect the thread names while we are waiting on stack trace collection. - for (auto& info : infos) { -// Get the thread's name by reading proc. -// TODO(todd): should we have the dumped thread fill in its own name using -// prctl to avoid having to open and read /proc? Or maybe we should use the -// Kudu ThreadMgr to get the thread names for the cases where we are using -// the kudu::Thread wrapper at least. -faststring buf; -Status s = ReadFileToString(Env::Default(), -Substitute("/proc/self/task/$0/comm", info.tid), -); -if (!s.ok()) { - info.thread_name = ""; -} else { - info.thread_name = buf.ToString(); - StripTrailingNewline(_name); -} - } - - // Now actually collect all the stacks. - MonoTime deadline = MonoTime::Now() + MonoDelta::FromSeconds(1); - for (auto& info : infos) { -info.status = info.status.AndThen([&] { -return info.stc.AwaitCollection(deadline); - }); - } + auto dur = MonoTime::Now() - start; - // And group the threads by their stack
[2/3] kudu git commit: KUDU-2297 (part 4): periodically dump stacks to diagnostics log
KUDU-2297 (part 4): periodically dump stacks to diagnostics log This modifies the diagnostics log to periodically dump stack traces. This is slightly complicated by the fact that symbolized stack traces can be relatively large. So, we separate the logging of symbols and stack traces. When an address first appears in a log file, it is logged as part of a symbol line. Later logs of the same address do not need to re-log the symbol. With this, a typical stack trace dump of an idle tserver is about 4KB pre-compression, and a 'symbols' dump is about 6KB. So logging stacks reasonably often should not use much disk space or IO. Currently this is enabled on the same interval as the metrics log, but only if a new experimental flag --diagnostics-log-stack-traces is enabled. I'm planning to move it to a different flag in a later commit, but wanted to keep this one simple and incremental. Change-Id: Ic32abf2c48ac6a5f3c384e58838b34671bcaf147 Reviewed-on: http://gerrit.cloudera.org:8080/9330 Tested-by: Todd LipconReviewed-by: Mike Percy Project: http://git-wip-us.apache.org/repos/asf/kudu/repo Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/09298f3b Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/09298f3b Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/09298f3b Branch: refs/heads/master Commit: 09298f3b6756f10a7d598b4de77676d49c38f117 Parents: 831483b Author: Todd Lipcon Authored: Wed Feb 14 16:17:43 2018 -0800 Committer: Todd Lipcon Committed: Fri Feb 23 23:06:40 2018 + -- src/kudu/server/diagnostics_log.cc | 152 +++- src/kudu/server/diagnostics_log.h | 6 ++ src/kudu/util/debug-util.cc| 36 src/kudu/util/debug-util.h | 19 +++- src/kudu/util/rolling_log-test.cc | 16 ++-- src/kudu/util/rolling_log.cc | 13 +-- src/kudu/util/rolling_log.h| 36 +--- 7 files changed, 236 insertions(+), 42 deletions(-) -- http://git-wip-us.apache.org/repos/asf/kudu/blob/09298f3b/src/kudu/server/diagnostics_log.cc -- diff --git a/src/kudu/server/diagnostics_log.cc b/src/kudu/server/diagnostics_log.cc index 0246347..ab47d8a 100644 --- a/src/kudu/server/diagnostics_log.cc +++ b/src/kudu/server/diagnostics_log.cc @@ -22,14 +22,22 @@ #include #include #include +#include +#include #include +#include +#include "kudu/gutil/map-util.h" #include "kudu/gutil/ref_counted.h" +#include "kudu/gutil/stringprintf.h" #include "kudu/gutil/strings/substitute.h" #include "kudu/gutil/walltime.h" +#include "kudu/util/array_view.h" #include "kudu/util/condition_variable.h" +#include "kudu/util/debug-util.h" #include "kudu/util/env.h" +#include "kudu/util/flag_tags.h" #include "kudu/util/jsonwriter.h" #include "kudu/util/logging.h" #include "kudu/util/metrics.h" @@ -41,17 +49,60 @@ using std::string; using std::unique_ptr; +using std::vector; using strings::Substitute; +// GLog already implements symbolization. Just import their hidden symbol. +namespace google { +// Symbolizes a program counter. On success, returns true and write the +// symbol name to "out". The symbol name is demangled if possible +// (supports symbols generated by GCC 3.x or newer). Otherwise, +// returns false. +bool Symbolize(void *pc, char *out, int out_size); +} + + +// TODO(todd): this is a placeholder flag. This should actually be an interval. +// Tagging as experimental and disabling by default while this work is in flux. +DEFINE_bool(diagnostics_log_stack_traces, false, +"Whether to periodically log stack traces to the diagnostics log."); +TAG_FLAG(diagnostics_log_stack_traces, runtime); +TAG_FLAG(diagnostics_log_stack_traces, experimental); + namespace kudu { namespace server { +// Track which symbols have been emitted to the log already. +class DiagnosticsLog::SymbolSet { + public: + SymbolSet() { +set_.set_empty_key(nullptr); + } + + // Return true if the addr was added, false if it already existed. + bool Add(void* addr) { +return InsertIfNotPresent(_, addr); + } + + void ResetIfLogRolled(int roll_count) { +if (roll_count_ != roll_count) { + roll_count_ = roll_count; + set_.clear(); +} + } + + private: + int roll_count_ = 0; + google::dense_hash_set set_; +}; + DiagnosticsLog::DiagnosticsLog(string log_dir, MetricRegistry* metric_registry) : log_dir_(std::move(log_dir)), metric_registry_(metric_registry), wake_(_), -metrics_log_interval_(MonoDelta::FromSeconds(60)) { +metrics_log_interval_(MonoDelta::FromSeconds(60)), +symbols_(new SymbolSet()) { } DiagnosticsLog::~DiagnosticsLog() { Stop(); @@ -111,10 +162,109 @@ void
[3/3] kudu git commit: Don't perform compactions when clean time has not been advanced
Don't perform compactions when clean time has not been advanced Due to KUDU-2233 we might not advance safe time, and thus clean time, at bootstrap causing possible corruption or crashes. This patch adds a check to make sure that clean time has been advanced at all before performing a compaction. This is temporary fix until we have a more strict check like the one proposed in https://gerrit.cloudera.org/#/c/8887/. Change-Id: Ia74abdf7d806efc4239dc9cff4a5da28621d331a Reviewed-on: http://gerrit.cloudera.org:8080/9436 Reviewed-by: Todd LipconTested-by: David Ribeiro Alves Project: http://git-wip-us.apache.org/repos/asf/kudu/repo Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/00815045 Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/00815045 Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/00815045 Branch: refs/heads/master Commit: 00815045fc3295c12023fd7ae7ad220645a10c3a Parents: 09298f3 Author: David Alves Authored: Thu Feb 22 09:38:29 2018 -0800 Committer: David Ribeiro Alves Committed: Fri Feb 23 23:06:45 2018 + -- src/kudu/tablet/tablet.cc | 7 +++ 1 file changed, 7 insertions(+) -- http://git-wip-us.apache.org/repos/asf/kudu/blob/00815045/src/kudu/tablet/tablet.cc -- diff --git a/src/kudu/tablet/tablet.cc b/src/kudu/tablet/tablet.cc index 431de32..86182ab 100644 --- a/src/kudu/tablet/tablet.cc +++ b/src/kudu/tablet/tablet.cc @@ -1675,6 +1675,13 @@ Status Tablet::Compact(CompactFlags flags) { void Tablet::UpdateCompactionStats(MaintenanceOpStats* stats) { + if (mvcc_.GetCleanTimestamp() == Timestamp::kInitialTimestamp) { +KLOG_EVERY_N_SECS(WARNING, 30) << LogPrefix() << "Can't schedule compaction. Clean time has " + << "not been advanced past its initial value."; +stats->set_runnable(false); +return; + } + // TODO: use workload statistics here to find out how "hot" the tablet has // been in the last 5 minutes, and somehow scale the compaction quality // based on that, so we favor hot tablets.
[1/2] kudu git commit: KUDU-2291 (part 8): fix a TSAN issue with libunwind initialization
Repository: kudu Updated Branches: refs/heads/master 2a5a12fbd -> 57fe0c3db KUDU-2291 (part 8): fix a TSAN issue with libunwind initialization libunwind uses double-checked locking for initialization, which isn't technically safe. We previously tried to work around this by calling into the stack trace library before starting any kudu::Threads, but that still left us open to races in unit tests like rw_mutex-test which uses std::thread. This patch changes the forced single-threaded initialization to use GoogleOnce instead. Prior to this patch, looping rw_mutex-test on TSAN failed 12/1000 times. With the patch it passed 1000/1000. Change-Id: I522b6553e9cb9a30d7106ff55ad119f7df1f949c Reviewed-on: http://gerrit.cloudera.org:8080/9409 Tested-by: Kudu Jenkins Reviewed-by: Mike PercyProject: http://git-wip-us.apache.org/repos/asf/kudu/repo Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/20ba3c7b Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/20ba3c7b Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/20ba3c7b Branch: refs/heads/master Commit: 20ba3c7ba6c64c4a1521e7aab06a99a93cc01d45 Parents: 2a5a12f Author: Todd Lipcon Authored: Thu Feb 22 16:59:38 2018 -0800 Committer: Todd Lipcon Committed: Fri Feb 23 20:42:19 2018 + -- src/kudu/util/debug-util.cc | 26 ++ src/kudu/util/thread.cc | 5 - 2 files changed, 26 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/kudu/blob/20ba3c7b/src/kudu/util/debug-util.cc -- diff --git a/src/kudu/util/debug-util.cc b/src/kudu/util/debug-util.cc index 0af7abe..5a142bc 100644 --- a/src/kudu/util/debug-util.cc +++ b/src/kudu/util/debug-util.cc @@ -44,10 +44,12 @@ #include #endif +#include "kudu/gutil/basictypes.h" #include "kudu/gutil/dynamic_annotations.h" #include "kudu/gutil/hash/city.h" #include "kudu/gutil/linux_syscall_support.h" #include "kudu/gutil/macros.h" +#include "kudu/gutil/once.h" #include "kudu/gutil/spinlock.h" #include "kudu/gutil/stringprintf.h" #include "kudu/gutil/strings/numbers.h" @@ -305,6 +307,19 @@ bool InitSignalHandlerUnlocked(int signum) { return state == INITIALIZED; } +#ifdef __linux__ +GoogleOnceType g_prime_libunwind_once; + +void PrimeLibunwind() { + // The first call into libunwind does some unsafe double-checked locking + // for initialization. So, we make sure that the first call is not concurrent + // with any other call. + unw_cursor_t cursor; + unw_context_t uc; + unw_getcontext(); + RAW_CHECK(unw_init_local(, ) >= 0, "unw_init_local failed"); +} +#endif } // anonymous namespace Status SetStackTraceSignal(int signum) { @@ -392,6 +407,15 @@ Status StackTraceCollector::TriggerAsync(int64_t tid, StackTrace* stack) { return Status::NotSupported("unable to take thread stack: signal handler unavailable"); } } + // Ensure that libunwind is primed for use before we send any signals. Otherwise + // we can hit a deadlock with the following stack: + // GoogleOnceInit() [waits on the 'once' to finish, but will never finish] + // StackTrace::Collect() + // + // PrimeLibUnwind + // GoogleOnceInit() [not yet initted, so starts initializing] + // StackTrace::Collect() + GoogleOnceInit(_prime_libunwind_once, ); std::unique_ptr data(new SignalData()); // Set the target TID in our communication structure, so if we end up with any @@ -545,6 +569,8 @@ void StackTrace::Collect(int skip_frames) { const int kMaxDepth = arraysize(frames_); #ifdef __linux__ + GoogleOnceInit(_prime_libunwind_once, ); + unw_cursor_t cursor; unw_context_t uc; unw_getcontext(); http://git-wip-us.apache.org/repos/asf/kudu/blob/20ba3c7b/src/kudu/util/thread.cc -- diff --git a/src/kudu/util/thread.cc b/src/kudu/util/thread.cc index b5db351..3275e6c 100644 --- a/src/kudu/util/thread.cc +++ b/src/kudu/util/thread.cc @@ -41,7 +41,6 @@ #include #include "kudu/gutil/atomicops.h" -#include "kudu/gutil/basictypes.h" #include "kudu/gutil/bind.h" #include "kudu/gutil/bind_helpers.h" #include "kudu/gutil/dynamic_annotations.h" @@ -49,7 +48,6 @@ #include "kudu/gutil/once.h" #include "kudu/gutil/port.h" #include "kudu/gutil/strings/substitute.h" -#include "kudu/util/debug-util.h" #include "kudu/util/flag_tags.h" #include "kudu/util/kernel_stack_watchdog.h" #include "kudu/util/logging.h" @@ -407,9 +405,6 @@ void ThreadMgr::ThreadPathHandler(const WebCallbackRegistry::WebRequest& req, } static void InitThreading() { - // Warm up the stack trace library. This avoids a race in libunwind initialization - // by making sure we initialize it before we start any other threads.
[2/2] kudu git commit: KUDU-2191 (5/n): Add Kerberos SASL support to the HMS client
KUDU-2191 (5/n): Add Kerberos SASL support to the HMS client The bulk of this commit is adding a new Thrift transport type, SaslClientTransport, which facilitates SASL GSSAPI negotiation, as well as integrity/privacy channel protection. The new transport is based on Impala's version with some significant changes: - Impala has a client and server SASL transport, necessitating a common superclass (SaslTransport). Since we only need a client transport, I collapsed all of the logic into a single class, which I think makes the code easier to follow. - The transport uses Kudu helper types where possible, e.g., faststring buffers, and our existing SASL utility infrastructure. - Integrity and privacy channel protection are implemented. There are no standlone unit-tests for the transport, since that would require implementing the server-specific counterpart. Instead, the class is tested indirectly through using the HMS client to communicate with a Kerberos-enabled HMS instance. Change-Id: I8f217ae05fd36c8ee88fe20eeccd73d49233a345 Reviewed-on: http://gerrit.cloudera.org:8080/8692 Tested-by: Kudu Jenkins Reviewed-by: Todd LipconProject: http://git-wip-us.apache.org/repos/asf/kudu/repo Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/57fe0c3d Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/57fe0c3d Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/57fe0c3d Branch: refs/heads/master Commit: 57fe0c3db086c9fc61fbcef9b2d879422b387a7e Parents: 20ba3c7 Author: Dan Burkert Authored: Tue Nov 14 18:14:06 2017 -0800 Committer: Dan Burkert Committed: Fri Feb 23 21:09:17 2018 + -- src/kudu/hms/CMakeLists.txt | 7 +- src/kudu/hms/hms_client-test.cc | 119 +- src/kudu/hms/hms_client.cc | 54 ++- src/kudu/hms/hms_client.h | 15 +- src/kudu/hms/mini_hms.cc| 91 - src/kudu/hms/mini_hms.h | 16 + src/kudu/hms/sasl_client_transport.cc | 402 +++ src/kudu/hms/sasl_client_transport.h| 176 .../mini-cluster/external_mini_cluster-test.cc | 25 +- src/kudu/mini-cluster/external_mini_cluster.cc | 11 + src/kudu/rpc/client_negotiation.cc | 8 +- src/kudu/rpc/sasl_common.cc | 89 ++-- src/kudu/rpc/sasl_common.h | 55 ++- src/kudu/rpc/server_negotiation.cc | 9 +- 14 files changed, 994 insertions(+), 83 deletions(-) -- http://git-wip-us.apache.org/repos/asf/kudu/blob/57fe0c3d/src/kudu/hms/CMakeLists.txt -- diff --git a/src/kudu/hms/CMakeLists.txt b/src/kudu/hms/CMakeLists.txt index e449047..f50eead 100644 --- a/src/kudu/hms/CMakeLists.txt +++ b/src/kudu/hms/CMakeLists.txt @@ -24,10 +24,13 @@ target_link_libraries(hms_thrift thrift) add_dependencies(hms_thrift ${HMS_THRIFT_TGTS}) set(HMS_SRCS - hms_client.cc) + hms_client.cc + sasl_client_transport.cc) set(HMS_DEPS + gflags glog hms_thrift + krpc kudu_util) add_library(kudu_hms ${HMS_SRCS}) @@ -62,6 +65,7 @@ set(MINI_HMS_SRCS add_library(mini_hms ${MINI_HMS_SRCS}) target_link_libraries(mini_hms gutil + krpc kudu_test_util kudu_util) add_dependencies(mini_hms hms-plugin) @@ -71,6 +75,7 @@ if (NOT NO_TESTS) set(KUDU_TEST_LINK_LIBS kudu_hms mini_hms +mini_kdc ${KUDU_MIN_TEST_LIBS}) # This test has to run serially since otherwise starting the HMS can take a very http://git-wip-us.apache.org/repos/asf/kudu/blob/57fe0c3d/src/kudu/hms/hms_client-test.cc -- diff --git a/src/kudu/hms/hms_client-test.cc b/src/kudu/hms/hms_client-test.cc index 9ec0f4c..87601eb 100644 --- a/src/kudu/hms/hms_client-test.cc +++ b/src/kudu/hms/hms_client-test.cc @@ -22,12 +22,15 @@ #include #include +#include #include // IWYU pragma: keep #include #include "kudu/hms/hive_metastore_constants.h" #include "kudu/hms/hive_metastore_types.h" #include "kudu/hms/mini_hms.h" +#include "kudu/rpc/sasl_common.h" +#include "kudu/security/test/mini_kdc.h" #include "kudu/util/monotime.h" #include "kudu/util/net/net_util.h" #include "kudu/util/net/sockaddr.h" @@ -36,6 +39,8 @@ #include "kudu/util/test_macros.h" #include "kudu/util/test_util.h" +using boost::optional; +using kudu::rpc::SaslProtection; using std::make_pair; using std::string; using std::vector; @@ -43,7 +48,8 @@ using std::vector; namespace kudu { namespace hms { -class HmsClientTest : public KuduTest { +class HmsClientTest : public KuduTest, + public ::testing::WithParamInterface { public: Status
[2/2] kudu git commit: [Spark] Simplifiy creating range partitioned tables
[Spark] Simplifiy creating range partitioned tables In order to create a range partitioned table a user needs to create PartialRows representing the lower and upper bounds. In the Spark API the Kudu Schema isnât made available which could make matching the expected schema tricky. This small change makes retrieving the generated Kudu schema possible and also allows for table creation with a Kudu schema directly. Change-Id: Ied45b90b38d9e123868c4b0430dc52c888f033b3 Reviewed-on: http://gerrit.cloudera.org:8080/9379 Tested-by: Kudu Jenkins Reviewed-by: Hao HaoReviewed-by: Dan Burkert Project: http://git-wip-us.apache.org/repos/asf/kudu/repo Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/e94556ef Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/e94556ef Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/e94556ef Branch: refs/heads/master Commit: e94556ef767b11f7508d7dae3a92e124769f800e Parents: 52ca55a Author: Grant Henke Authored: Mon Feb 12 11:06:06 2018 -0600 Committer: Grant Henke Committed: Fri Feb 23 17:48:16 2018 + -- .../apache/kudu/spark/kudu/KuduContext.scala| 30 +++-- .../kudu/spark/kudu/DefaultSourceTest.scala | 35 2 files changed, 63 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/kudu/blob/e94556ef/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala -- diff --git a/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala b/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala index ece4df0..165719a 100644 --- a/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala +++ b/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala @@ -154,11 +154,38 @@ class KuduContext(val kuduMaster: String, * @param schema struct schema of table * @param keys primary keys of the table * @param options replication and partitioning options for the table +* @return the KuduTable that was created */ def createTable(tableName: String, schema: StructType, keys: Seq[String], options: CreateTableOptions): KuduTable = { +val kuduSchema = createSchema(schema, keys) +createTable(tableName, kuduSchema, options) + } + + /** +* Creates a kudu table for the given schema. Partitioning can be specified through options. +* +* @param tableName table to create +* @param schema schema of table +* @param options replication and partitioning options for the table +* @return the KuduTable that was created +*/ + def createTable(tableName: String, + schema: Schema, + options: CreateTableOptions): KuduTable = { +syncClient.createTable(tableName, schema, options) + } + + /** +* Creates a kudu schema for the given struct schema. +* +* @param schema struct schema of table +* @param keys primary keys of the table +* @return the Kudu schema +*/ + def createSchema(schema: StructType, keys: Seq[String]): Schema = { val kuduCols = new util.ArrayList[ColumnSchema]() // add the key columns first, in the order specified for (key <- keys) { @@ -171,8 +198,7 @@ class KuduContext(val kuduMaster: String, val col = createColumn(field, isKey = false) kuduCols.add(col) } - -syncClient.createTable(tableName, new Schema(kuduCols), options) +new Schema(kuduCols) } private def createColumn(field: StructField, isKey: Boolean): ColumnSchema = { http://git-wip-us.apache.org/repos/asf/kudu/blob/e94556ef/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala -- diff --git a/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala b/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala index 59997d2..3894dd0 100644 --- a/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala +++ b/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala @@ -101,6 +101,41 @@ class DefaultSourceTest extends FunSuite with TestContext with BeforeAndAfter wi assertFalse(kuduContext.tableExists(tableName)) } + test("table creation with partitioning") { +val tableName = "testcreatepartitionedtable" +if (kuduContext.tableExists(tableName)) { + kuduContext.deleteTable(tableName) +} +val df = sqlContext.read.options(kuduOptions).kudu + +val kuduSchema = kuduContext.createSchema(df.schema, Seq("key")) +val lower =
kudu git commit: fs: clarify error message at startup
Repository: kudu Updated Branches: refs/heads/master e94556ef7 -> 2a5a12fbd fs: clarify error message at startup Currently the error messages you get when you start a Kudu instance without specifying all the expected directories is: I0205 09:26:39.961467 25499 server_base.cc:229] Could not load existing FS layout: Not found: /data/01/kudu/data/instance: No such file or directory (error 2) I0205 09:26:39.961493 25499 server_base.cc:230] Creating new FS layout F0205 09:26:39.963004 25499 tablet_server_main.cc:80] Check failed: _s.ok() Bad status: Already present: Could not create new FS layout: unable to create file system roots: FSManager root is not empty. See https://kudu.apache.org/releases/1.7.0/docs/troubleshooting.html: /data/01/kudu/wal This can be misleading for users, who might think, "Well if it's not starting because this path isn't empty, I'll just delete the contents of that path!" This would be a terrible senario, as such actions could easily lead to data loss. This patch attempts to make it more obvious that Kudu may already exist in some (possibly incomplete) form at the specified locations, and that if the user truly wants to change directories, they should run the appropriate tools. Now, starting Kudu will log. I0221 18:36:14.403893 35003 server_base.cc:431] Could not load existing FS layout: Not found: /data/04/kudu/data/instance: No such file or directory (error 2) I0221 18:36:14.403903 35003 server_base.cc:432] Attempting to create new FS layout instead I0221 18:36:14.404762 35003 server_base.cc:438] To start Kudu with a different FS layout, the `kudu fs update_dirs` tool must be run first F0221 18:36:14.404858 35003 tablet_server_main.cc:80] Check failed: _s.ok() Bad status: Already present: FS layout already exists; not overwriting existing layout. See https://kudu.apache.org/releases/1.7.0/docs/troubleshooting.html: unable to create file system roots: FSManager roots already exist: /data/01/kudu/wal,/data/02/kudu/data,/data/03/kudu/data Change-Id: I72294036e9aa78b285803d5d78b685cf444d9662 Reviewed-on: http://gerrit.cloudera.org:8080/9281 Tested-by: Kudu Jenkins Reviewed-by: Andrew WongProject: http://git-wip-us.apache.org/repos/asf/kudu/repo Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/2a5a12fb Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/2a5a12fb Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/2a5a12fb Branch: refs/heads/master Commit: 2a5a12fbd69882b387f83e1da71d8a0fd19b6a63 Parents: e94556e Author: Andrew Wong Authored: Fri Feb 9 14:56:39 2018 -0800 Committer: Andrew Wong Committed: Fri Feb 23 19:09:02 2018 + -- docs/troubleshooting.adoc | 2 +- src/kudu/fs/fs_manager.cc | 11 +++ src/kudu/server/server_base.cc | 16 +--- 3 files changed, 21 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/kudu/blob/2a5a12fb/docs/troubleshooting.adoc -- diff --git a/docs/troubleshooting.adoc b/docs/troubleshooting.adoc index a0a0b23..7c48ef3 100644 --- a/docs/troubleshooting.adoc +++ b/docs/troubleshooting.adoc @@ -490,7 +490,7 @@ When Kudu starts, it checks each configured data directory, expecting either for initialized or for all to be empty. If a server fails to start with a log message like -Check failed: _s.ok() Bad status: Already present: Could not create new FS layout: FSManager root is not empty: /data0/kudu/data +Check failed: _s.ok() Bad status: Already present: FS layout already exists; not overwriting existing layout: FSManager roots already exist: /data0/kudu/data then this precondition has failed. This could be because Kudu was configured with non-empty data http://git-wip-us.apache.org/repos/asf/kudu/blob/2a5a12fb/src/kudu/fs/fs_manager.cc -- diff --git a/src/kudu/fs/fs_manager.cc b/src/kudu/fs/fs_manager.cc index 380169d..e7757f9 100644 --- a/src/kudu/fs/fs_manager.cc +++ b/src/kudu/fs/fs_manager.cc @@ -57,7 +57,6 @@ #include "kudu/util/pb_util.h" #include "kudu/util/scoped_cleanup.h" #include "kudu/util/stopwatch.h" -#include "kudu/util/website_util.h" DEFINE_bool(enable_data_block_fsync, true, "Whether to enable fsync() of data blocks, metadata, and their parent directories. " @@ -520,6 +519,7 @@ Status FsManager::CreateFileSystemRoots( CHECK(!opts_.read_only); // It's OK if a root already exists as long as there's nothing in it. + vector non_empty_roots; for (const auto& root : canonicalized_roots) { if (!root.status.ok()) { return Status::IOError("cannot create FS layout; at least one directory " @@ -533,12 +533,15 @@ Status FsManager::CreateFileSystemRoots(