[1/3] kudu git commit: KUDU-2297 (part 3): refactor process-wide stack collection out of /stacks

2018-02-23 Thread dralves
Repository: kudu
Updated Branches:
  refs/heads/master 416b3018a -> 00815045f


KUDU-2297 (part 3): refactor process-wide stack collection out of /stacks

Previously a bunch of logic to collect all the stacks from the process
was in the /stacks path handler. This logic is relatively generic and
shouldn't be intermingled with the formatting code. In particular I'd
like to use it in the diagnostics log, where a different output format
is desirable.

Change-Id: Ibb7c6edd31254f3d7e0cbef1eaf575bde3570df6
Reviewed-on: http://gerrit.cloudera.org:8080/9329
Tested-by: Todd Lipcon 
Reviewed-by: Mike Percy 


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/831483b4
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/831483b4
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/831483b4

Branch: refs/heads/master
Commit: 831483b47fd0e2165c3612811fdbf6337e3891b8
Parents: 416b301
Author: Todd Lipcon 
Authored: Wed Feb 14 14:43:32 2018 -0800
Committer: Todd Lipcon 
Committed: Fri Feb 23 23:06:30 2018 +

--
 src/kudu/server/default_path_handlers.cc | 101 +-
 src/kudu/server/diagnostics_log.cc   |   4 +-
 src/kudu/util/debug-util-test.cc |  65 +
 src/kudu/util/debug-util.cc  |  75 +++
 src/kudu/util/debug-util.h   |  59 +++
 5 files changed, 191 insertions(+), 113 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/kudu/blob/831483b4/src/kudu/server/default_path_handlers.cc
--
diff --git a/src/kudu/server/default_path_handlers.cc 
b/src/kudu/server/default_path_handlers.cc
index b857be5..7574def 100644
--- a/src/kudu/server/default_path_handlers.cc
+++ b/src/kudu/server/default_path_handlers.cc
@@ -18,17 +18,13 @@
 #include "kudu/server/default_path_handlers.h"
 
 #include 
-#include 
 
 #include 
 #include 
 #include 
-#include 
-#include 
 #include 
 #include 
 #include 
-#include 
 #include 
 
 #include 
@@ -48,13 +44,12 @@
 #include "kudu/gutil/strings/human_readable.h"
 #include "kudu/gutil/strings/numbers.h"
 #include "kudu/gutil/strings/split.h"
-#include "kudu/gutil/strings/strip.h"
 #include "kudu/gutil/strings/substitute.h"
 #include "kudu/server/pprof_path_handlers.h"
 #include "kudu/server/webserver.h"
+#include "kudu/util/array_view.h"
 #include "kudu/util/debug-util.h"
 #include "kudu/util/easy_json.h"
-#include "kudu/util/env.h"
 #include "kudu/util/faststring.h"
 #include "kudu/util/flag_tags.h"
 #include "kudu/util/flags.h"
@@ -157,90 +152,34 @@ static void FlagsHandler(const Webserver::WebRequest& req,
 // Prints out the current stack trace of all threads in the process.
 static void StacksHandler(const Webserver::WebRequest& /*req*/,
   Webserver::PrerenderedWebResponse* resp) {
-  MonoTime start = MonoTime::Now();
   std::ostringstream* output = resp->output;
-  vector tids;
-  Status s = ListThreads();
+
+  StackTraceSnapshot snap;
+  auto start = MonoTime::Now();
+  Status s = snap.SnapshotAllStacks();
   if (!s.ok()) {
-*output << "Failed to list threads: " << s.ToString();
+*output << "Failed to collect stacks: " << s.ToString();
 return;
   }
-  struct Info {
-pid_t tid;
-Status status;
-string thread_name;
-StackTraceCollector stc;
-StackTrace stack;
-  };
-
-  // Initially trigger all the stack traces.
-  vector infos(tids.size());
-  for (int i = 0; i < tids.size(); i++) {
-infos[i].tid = tids[i];
-infos[i].status = infos[i].stc.TriggerAsync(tids[i], [i].stack);
-  }
-
-  // Now collect the thread names while we are waiting on stack trace 
collection.
-  for (auto& info : infos) {
-// Get the thread's name by reading proc.
-// TODO(todd): should we have the dumped thread fill in its own name using
-// prctl to avoid having to open and read /proc? Or maybe we should use the
-// Kudu ThreadMgr to get the thread names for the cases where we are using
-// the kudu::Thread wrapper at least.
-faststring buf;
-Status s = ReadFileToString(Env::Default(),
-Substitute("/proc/self/task/$0/comm", 
info.tid),
-);
-if (!s.ok()) {
-  info.thread_name = "";
-}  else {
-  info.thread_name = buf.ToString();
-  StripTrailingNewline(_name);
-}
-  }
-
-  // Now actually collect all the stacks.
-  MonoTime deadline = MonoTime::Now() + MonoDelta::FromSeconds(1);
-  for (auto& info : infos) {
-info.status = info.status.AndThen([&] {
-return info.stc.AwaitCollection(deadline);
-  });
-  }
+  auto dur = MonoTime::Now() - start;
 
-  // And group the threads by their stack 

[2/3] kudu git commit: KUDU-2297 (part 4): periodically dump stacks to diagnostics log

2018-02-23 Thread dralves
KUDU-2297 (part 4): periodically dump stacks to diagnostics log

This modifies the diagnostics log to periodically dump stack traces.
This is slightly complicated by the fact that symbolized stack traces
can be relatively large. So, we separate the logging of symbols and
stack traces. When an address first appears in a log file, it is logged
as part of a symbol line. Later logs of the same address do not need
to re-log the symbol.

With this, a typical stack trace dump of an idle tserver is about 4KB
pre-compression, and a 'symbols' dump is about 6KB. So logging stacks
reasonably often should not use much disk space or IO.

Currently this is enabled on the same interval as the metrics log, but
only if a new experimental flag --diagnostics-log-stack-traces is
enabled. I'm planning to move it to a different flag in a later commit,
but wanted to keep this one simple and incremental.

Change-Id: Ic32abf2c48ac6a5f3c384e58838b34671bcaf147
Reviewed-on: http://gerrit.cloudera.org:8080/9330
Tested-by: Todd Lipcon 
Reviewed-by: Mike Percy 


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/09298f3b
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/09298f3b
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/09298f3b

Branch: refs/heads/master
Commit: 09298f3b6756f10a7d598b4de77676d49c38f117
Parents: 831483b
Author: Todd Lipcon 
Authored: Wed Feb 14 16:17:43 2018 -0800
Committer: Todd Lipcon 
Committed: Fri Feb 23 23:06:40 2018 +

--
 src/kudu/server/diagnostics_log.cc | 152 +++-
 src/kudu/server/diagnostics_log.h  |   6 ++
 src/kudu/util/debug-util.cc|  36 
 src/kudu/util/debug-util.h |  19 +++-
 src/kudu/util/rolling_log-test.cc  |  16 ++--
 src/kudu/util/rolling_log.cc   |  13 +--
 src/kudu/util/rolling_log.h|  36 +---
 7 files changed, 236 insertions(+), 42 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/kudu/blob/09298f3b/src/kudu/server/diagnostics_log.cc
--
diff --git a/src/kudu/server/diagnostics_log.cc 
b/src/kudu/server/diagnostics_log.cc
index 0246347..ab47d8a 100644
--- a/src/kudu/server/diagnostics_log.cc
+++ b/src/kudu/server/diagnostics_log.cc
@@ -22,14 +22,22 @@
 #include 
 #include 
 #include 
+#include 
 
+#include 
 #include 
+#include 
 
+#include "kudu/gutil/map-util.h"
 #include "kudu/gutil/ref_counted.h"
+#include "kudu/gutil/stringprintf.h"
 #include "kudu/gutil/strings/substitute.h"
 #include "kudu/gutil/walltime.h"
+#include "kudu/util/array_view.h"
 #include "kudu/util/condition_variable.h"
+#include "kudu/util/debug-util.h"
 #include "kudu/util/env.h"
+#include "kudu/util/flag_tags.h"
 #include "kudu/util/jsonwriter.h"
 #include "kudu/util/logging.h"
 #include "kudu/util/metrics.h"
@@ -41,17 +49,60 @@
 
 using std::string;
 using std::unique_ptr;
+using std::vector;
 using strings::Substitute;
 
+// GLog already implements symbolization. Just import their hidden symbol.
+namespace google {
+// Symbolizes a program counter.  On success, returns true and write the
+// symbol name to "out".  The symbol name is demangled if possible
+// (supports symbols generated by GCC 3.x or newer).  Otherwise,
+// returns false.
+bool Symbolize(void *pc, char *out, int out_size);
+}
+
+
+// TODO(todd): this is a placeholder flag. This should actually be an interval.
+// Tagging as experimental and disabling by default while this work is in flux.
+DEFINE_bool(diagnostics_log_stack_traces, false,
+"Whether to periodically log stack traces to the diagnostics 
log.");
+TAG_FLAG(diagnostics_log_stack_traces, runtime);
+TAG_FLAG(diagnostics_log_stack_traces, experimental);
+
 namespace kudu {
 namespace server {
 
+// Track which symbols have been emitted to the log already.
+class DiagnosticsLog::SymbolSet {
+ public:
+  SymbolSet() {
+set_.set_empty_key(nullptr);
+  }
+
+  // Return true if the addr was added, false if it already existed.
+  bool Add(void* addr) {
+return InsertIfNotPresent(_, addr);
+  }
+
+  void ResetIfLogRolled(int roll_count) {
+if (roll_count_ != roll_count) {
+  roll_count_ = roll_count;
+  set_.clear();
+}
+  }
+
+ private:
+  int roll_count_ = 0;
+  google::dense_hash_set set_;
+};
+
 DiagnosticsLog::DiagnosticsLog(string log_dir,
MetricRegistry* metric_registry) :
 log_dir_(std::move(log_dir)),
 metric_registry_(metric_registry),
 wake_(_),
-metrics_log_interval_(MonoDelta::FromSeconds(60)) {
+metrics_log_interval_(MonoDelta::FromSeconds(60)),
+symbols_(new SymbolSet()) {
 }
 DiagnosticsLog::~DiagnosticsLog() {
   Stop();
@@ -111,10 +162,109 @@ void 

[3/3] kudu git commit: Don't perform compactions when clean time has not been advanced

2018-02-23 Thread dralves
Don't perform compactions when clean time has not been advanced

Due to KUDU-2233 we might not advance safe time, and thus clean
time, at bootstrap causing possible corruption or crashes.

This patch adds a check to make sure that clean time has been
advanced at all before performing a compaction.

This is temporary fix until we have a more strict check like the
one proposed in https://gerrit.cloudera.org/#/c/8887/.

Change-Id: Ia74abdf7d806efc4239dc9cff4a5da28621d331a
Reviewed-on: http://gerrit.cloudera.org:8080/9436
Reviewed-by: Todd Lipcon 
Tested-by: David Ribeiro Alves 


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/00815045
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/00815045
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/00815045

Branch: refs/heads/master
Commit: 00815045fc3295c12023fd7ae7ad220645a10c3a
Parents: 09298f3
Author: David Alves 
Authored: Thu Feb 22 09:38:29 2018 -0800
Committer: David Ribeiro Alves 
Committed: Fri Feb 23 23:06:45 2018 +

--
 src/kudu/tablet/tablet.cc | 7 +++
 1 file changed, 7 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/kudu/blob/00815045/src/kudu/tablet/tablet.cc
--
diff --git a/src/kudu/tablet/tablet.cc b/src/kudu/tablet/tablet.cc
index 431de32..86182ab 100644
--- a/src/kudu/tablet/tablet.cc
+++ b/src/kudu/tablet/tablet.cc
@@ -1675,6 +1675,13 @@ Status Tablet::Compact(CompactFlags flags) {
 
 void Tablet::UpdateCompactionStats(MaintenanceOpStats* stats) {
 
+  if (mvcc_.GetCleanTimestamp() == Timestamp::kInitialTimestamp) {
+KLOG_EVERY_N_SECS(WARNING, 30) << LogPrefix() <<  "Can't schedule 
compaction. Clean time has "
+   << "not been advanced past its initial 
value.";
+stats->set_runnable(false);
+return;
+  }
+
   // TODO: use workload statistics here to find out how "hot" the tablet has
   // been in the last 5 minutes, and somehow scale the compaction quality
   // based on that, so we favor hot tablets.



[1/2] kudu git commit: KUDU-2291 (part 8): fix a TSAN issue with libunwind initialization

2018-02-23 Thread danburkert
Repository: kudu
Updated Branches:
  refs/heads/master 2a5a12fbd -> 57fe0c3db


KUDU-2291 (part 8): fix a TSAN issue with libunwind initialization

libunwind uses double-checked locking for initialization, which isn't
technically safe. We previously tried to work around this by calling into the
stack trace library before starting any kudu::Threads, but that still left us
open to races in unit tests like rw_mutex-test which uses std::thread.

This patch changes the forced single-threaded initialization to use GoogleOnce
instead.

Prior to this patch, looping rw_mutex-test on TSAN failed 12/1000 times.
With the patch it passed 1000/1000.

Change-Id: I522b6553e9cb9a30d7106ff55ad119f7df1f949c
Reviewed-on: http://gerrit.cloudera.org:8080/9409
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy 


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/20ba3c7b
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/20ba3c7b
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/20ba3c7b

Branch: refs/heads/master
Commit: 20ba3c7ba6c64c4a1521e7aab06a99a93cc01d45
Parents: 2a5a12f
Author: Todd Lipcon 
Authored: Thu Feb 22 16:59:38 2018 -0800
Committer: Todd Lipcon 
Committed: Fri Feb 23 20:42:19 2018 +

--
 src/kudu/util/debug-util.cc | 26 ++
 src/kudu/util/thread.cc |  5 -
 2 files changed, 26 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/kudu/blob/20ba3c7b/src/kudu/util/debug-util.cc
--
diff --git a/src/kudu/util/debug-util.cc b/src/kudu/util/debug-util.cc
index 0af7abe..5a142bc 100644
--- a/src/kudu/util/debug-util.cc
+++ b/src/kudu/util/debug-util.cc
@@ -44,10 +44,12 @@
 #include 
 #endif
 
+#include "kudu/gutil/basictypes.h"
 #include "kudu/gutil/dynamic_annotations.h"
 #include "kudu/gutil/hash/city.h"
 #include "kudu/gutil/linux_syscall_support.h"
 #include "kudu/gutil/macros.h"
+#include "kudu/gutil/once.h"
 #include "kudu/gutil/spinlock.h"
 #include "kudu/gutil/stringprintf.h"
 #include "kudu/gutil/strings/numbers.h"
@@ -305,6 +307,19 @@ bool InitSignalHandlerUnlocked(int signum) {
   return state == INITIALIZED;
 }
 
+#ifdef __linux__
+GoogleOnceType g_prime_libunwind_once;
+
+void PrimeLibunwind() {
+  // The first call into libunwind does some unsafe double-checked locking
+  // for initialization. So, we make sure that the first call is not concurrent
+  // with any other call.
+  unw_cursor_t cursor;
+  unw_context_t uc;
+  unw_getcontext();
+  RAW_CHECK(unw_init_local(, ) >= 0, "unw_init_local failed");
+}
+#endif
 } // anonymous namespace
 
 Status SetStackTraceSignal(int signum) {
@@ -392,6 +407,15 @@ Status StackTraceCollector::TriggerAsync(int64_t tid, 
StackTrace* stack) {
   return Status::NotSupported("unable to take thread stack: signal handler 
unavailable");
 }
   }
+  // Ensure that libunwind is primed for use before we send any signals. 
Otherwise
+  // we can hit a deadlock with the following stack:
+  //   GoogleOnceInit()   [waits on the 'once' to finish, but will never 
finish]
+  //   StackTrace::Collect()
+  //   
+  //   PrimeLibUnwind
+  //   GoogleOnceInit()   [not yet initted, so starts initializing]
+  //   StackTrace::Collect()
+  GoogleOnceInit(_prime_libunwind_once, );
 
   std::unique_ptr data(new SignalData());
   // Set the target TID in our communication structure, so if we end up with 
any
@@ -545,6 +569,8 @@ void StackTrace::Collect(int skip_frames) {
   const int kMaxDepth = arraysize(frames_);
 
 #ifdef __linux__
+  GoogleOnceInit(_prime_libunwind_once, );
+
   unw_cursor_t cursor;
   unw_context_t uc;
   unw_getcontext();

http://git-wip-us.apache.org/repos/asf/kudu/blob/20ba3c7b/src/kudu/util/thread.cc
--
diff --git a/src/kudu/util/thread.cc b/src/kudu/util/thread.cc
index b5db351..3275e6c 100644
--- a/src/kudu/util/thread.cc
+++ b/src/kudu/util/thread.cc
@@ -41,7 +41,6 @@
 #include 
 
 #include "kudu/gutil/atomicops.h"
-#include "kudu/gutil/basictypes.h"
 #include "kudu/gutil/bind.h"
 #include "kudu/gutil/bind_helpers.h"
 #include "kudu/gutil/dynamic_annotations.h"
@@ -49,7 +48,6 @@
 #include "kudu/gutil/once.h"
 #include "kudu/gutil/port.h"
 #include "kudu/gutil/strings/substitute.h"
-#include "kudu/util/debug-util.h"
 #include "kudu/util/flag_tags.h"
 #include "kudu/util/kernel_stack_watchdog.h"
 #include "kudu/util/logging.h"
@@ -407,9 +405,6 @@ void ThreadMgr::ThreadPathHandler(const 
WebCallbackRegistry::WebRequest& req,
 }
 
 static void InitThreading() {
-  // Warm up the stack trace library. This avoids a race in libunwind 
initialization
-  // by making sure we initialize it before we start any other threads.

[2/2] kudu git commit: KUDU-2191 (5/n): Add Kerberos SASL support to the HMS client

2018-02-23 Thread danburkert
KUDU-2191 (5/n): Add Kerberos SASL support to the HMS client

The bulk of this commit is adding a new Thrift transport type,
SaslClientTransport, which facilitates SASL GSSAPI negotiation, as well
as integrity/privacy channel protection. The new transport is based on
Impala's version with some significant changes:

- Impala has a client and server SASL transport, necessitating a common
  superclass (SaslTransport). Since we only need a client transport, I
  collapsed all of the logic into a single class, which I think makes the
  code easier to follow.
- The transport uses Kudu helper types where possible, e.g., faststring
  buffers, and our existing SASL utility infrastructure.
- Integrity and privacy channel protection are implemented.

There are no standlone unit-tests for the transport, since that would
require implementing the server-specific counterpart. Instead, the class
is tested indirectly through using the HMS client to communicate with a
Kerberos-enabled HMS instance.

Change-Id: I8f217ae05fd36c8ee88fe20eeccd73d49233a345
Reviewed-on: http://gerrit.cloudera.org:8080/8692
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon 


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/57fe0c3d
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/57fe0c3d
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/57fe0c3d

Branch: refs/heads/master
Commit: 57fe0c3db086c9fc61fbcef9b2d879422b387a7e
Parents: 20ba3c7
Author: Dan Burkert 
Authored: Tue Nov 14 18:14:06 2017 -0800
Committer: Dan Burkert 
Committed: Fri Feb 23 21:09:17 2018 +

--
 src/kudu/hms/CMakeLists.txt |   7 +-
 src/kudu/hms/hms_client-test.cc | 119 +-
 src/kudu/hms/hms_client.cc  |  54 ++-
 src/kudu/hms/hms_client.h   |  15 +-
 src/kudu/hms/mini_hms.cc|  91 -
 src/kudu/hms/mini_hms.h |  16 +
 src/kudu/hms/sasl_client_transport.cc   | 402 +++
 src/kudu/hms/sasl_client_transport.h| 176 
 .../mini-cluster/external_mini_cluster-test.cc  |  25 +-
 src/kudu/mini-cluster/external_mini_cluster.cc  |  11 +
 src/kudu/rpc/client_negotiation.cc  |   8 +-
 src/kudu/rpc/sasl_common.cc |  89 ++--
 src/kudu/rpc/sasl_common.h  |  55 ++-
 src/kudu/rpc/server_negotiation.cc  |   9 +-
 14 files changed, 994 insertions(+), 83 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/kudu/blob/57fe0c3d/src/kudu/hms/CMakeLists.txt
--
diff --git a/src/kudu/hms/CMakeLists.txt b/src/kudu/hms/CMakeLists.txt
index e449047..f50eead 100644
--- a/src/kudu/hms/CMakeLists.txt
+++ b/src/kudu/hms/CMakeLists.txt
@@ -24,10 +24,13 @@ target_link_libraries(hms_thrift thrift)
 add_dependencies(hms_thrift ${HMS_THRIFT_TGTS})
 
 set(HMS_SRCS
-  hms_client.cc)
+  hms_client.cc
+  sasl_client_transport.cc)
 set(HMS_DEPS
+  gflags
   glog
   hms_thrift
+  krpc
   kudu_util)
 
 add_library(kudu_hms ${HMS_SRCS})
@@ -62,6 +65,7 @@ set(MINI_HMS_SRCS
 add_library(mini_hms ${MINI_HMS_SRCS})
 target_link_libraries(mini_hms
   gutil
+  krpc
   kudu_test_util
   kudu_util)
 add_dependencies(mini_hms hms-plugin)
@@ -71,6 +75,7 @@ if (NOT NO_TESTS)
   set(KUDU_TEST_LINK_LIBS
 kudu_hms
 mini_hms
+mini_kdc
 ${KUDU_MIN_TEST_LIBS})
 
   # This test has to run serially since otherwise starting the HMS can take a 
very

http://git-wip-us.apache.org/repos/asf/kudu/blob/57fe0c3d/src/kudu/hms/hms_client-test.cc
--
diff --git a/src/kudu/hms/hms_client-test.cc b/src/kudu/hms/hms_client-test.cc
index 9ec0f4c..87601eb 100644
--- a/src/kudu/hms/hms_client-test.cc
+++ b/src/kudu/hms/hms_client-test.cc
@@ -22,12 +22,15 @@
 #include 
 #include 
 
+#include 
 #include  // IWYU pragma: keep
 #include 
 
 #include "kudu/hms/hive_metastore_constants.h"
 #include "kudu/hms/hive_metastore_types.h"
 #include "kudu/hms/mini_hms.h"
+#include "kudu/rpc/sasl_common.h"
+#include "kudu/security/test/mini_kdc.h"
 #include "kudu/util/monotime.h"
 #include "kudu/util/net/net_util.h"
 #include "kudu/util/net/sockaddr.h"
@@ -36,6 +39,8 @@
 #include "kudu/util/test_macros.h"
 #include "kudu/util/test_util.h"
 
+using boost::optional;
+using kudu::rpc::SaslProtection;
 using std::make_pair;
 using std::string;
 using std::vector;
@@ -43,7 +48,8 @@ using std::vector;
 namespace kudu {
 namespace hms {
 
-class HmsClientTest : public KuduTest {
+class HmsClientTest : public KuduTest,
+  public 
::testing::WithParamInterface {
  public:
 
   Status 

[2/2] kudu git commit: [Spark] Simplifiy creating range partitioned tables

2018-02-23 Thread granthenke
[Spark] Simplifiy creating range partitioned tables

In order to create a range partitioned table a user
needs to create PartialRows representing the
lower and upper bounds. In the Spark API
the Kudu Schema isn’t made available which
could make matching the expected schema
tricky.

This small change makes retrieving the generated
Kudu schema possible and also allows for table
creation with a Kudu schema directly.

Change-Id: Ied45b90b38d9e123868c4b0430dc52c888f033b3
Reviewed-on: http://gerrit.cloudera.org:8080/9379
Tested-by: Kudu Jenkins
Reviewed-by: Hao Hao 
Reviewed-by: Dan Burkert 


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/e94556ef
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/e94556ef
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/e94556ef

Branch: refs/heads/master
Commit: e94556ef767b11f7508d7dae3a92e124769f800e
Parents: 52ca55a
Author: Grant Henke 
Authored: Mon Feb 12 11:06:06 2018 -0600
Committer: Grant Henke 
Committed: Fri Feb 23 17:48:16 2018 +

--
 .../apache/kudu/spark/kudu/KuduContext.scala| 30 +++--
 .../kudu/spark/kudu/DefaultSourceTest.scala | 35 
 2 files changed, 63 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/kudu/blob/e94556ef/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
--
diff --git 
a/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala 
b/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
index ece4df0..165719a 100644
--- 
a/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
+++ 
b/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
@@ -154,11 +154,38 @@ class KuduContext(val kuduMaster: String,
 * @param schema struct schema of table
 * @param keys primary keys of the table
 * @param options replication and partitioning options for the table
+* @return the KuduTable that was created
 */
   def createTable(tableName: String,
   schema: StructType,
   keys: Seq[String],
   options: CreateTableOptions): KuduTable = {
+val kuduSchema = createSchema(schema, keys)
+createTable(tableName, kuduSchema, options)
+  }
+
+  /**
+* Creates a kudu table for the given schema. Partitioning can be specified 
through options.
+*
+* @param tableName table to create
+* @param schema schema of table
+* @param options replication and partitioning options for the table
+* @return the KuduTable that was created
+*/
+  def createTable(tableName: String,
+  schema: Schema,
+  options: CreateTableOptions): KuduTable = {
+syncClient.createTable(tableName, schema, options)
+  }
+
+  /**
+* Creates a kudu schema for the given struct schema.
+*
+* @param schema struct schema of table
+* @param keys primary keys of the table
+* @return the Kudu schema
+*/
+  def createSchema(schema: StructType, keys: Seq[String]): Schema = {
 val kuduCols = new util.ArrayList[ColumnSchema]()
 // add the key columns first, in the order specified
 for (key <- keys) {
@@ -171,8 +198,7 @@ class KuduContext(val kuduMaster: String,
   val col = createColumn(field, isKey = false)
   kuduCols.add(col)
 }
-
-syncClient.createTable(tableName, new Schema(kuduCols), options)
+new Schema(kuduCols)
   }
 
   private def createColumn(field: StructField, isKey: Boolean): ColumnSchema = 
{

http://git-wip-us.apache.org/repos/asf/kudu/blob/e94556ef/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
--
diff --git 
a/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
 
b/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
index 59997d2..3894dd0 100644
--- 
a/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
+++ 
b/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
@@ -101,6 +101,41 @@ class DefaultSourceTest extends FunSuite with TestContext 
with BeforeAndAfter wi
 assertFalse(kuduContext.tableExists(tableName))
   }
 
+  test("table creation with partitioning") {
+val tableName = "testcreatepartitionedtable"
+if (kuduContext.tableExists(tableName)) {
+  kuduContext.deleteTable(tableName)
+}
+val df = sqlContext.read.options(kuduOptions).kudu
+
+val kuduSchema = kuduContext.createSchema(df.schema, Seq("key"))
+val lower = 

kudu git commit: fs: clarify error message at startup

2018-02-23 Thread awong
Repository: kudu
Updated Branches:
  refs/heads/master e94556ef7 -> 2a5a12fbd


fs: clarify error message at startup

Currently the error messages you get when you start a Kudu instance
without specifying all the expected directories is:

I0205 09:26:39.961467 25499 server_base.cc:229] Could not load existing FS 
layout: Not found: /data/01/kudu/data/instance: No such file or directory 
(error 2)
I0205 09:26:39.961493 25499 server_base.cc:230] Creating new FS layout
F0205 09:26:39.963004 25499 tablet_server_main.cc:80] Check failed: _s.ok() Bad 
status: Already present: Could not create new FS layout: unable to create file 
system roots: FSManager root is not empty. See 
https://kudu.apache.org/releases/1.7.0/docs/troubleshooting.html: 
/data/01/kudu/wal

This can be misleading for users, who might think, "Well if it's not
starting because this path isn't empty, I'll just delete the contents of
that path!" This would be a terrible senario, as such actions could
easily lead to data loss. This patch attempts to make it more obvious
that Kudu may already exist in some (possibly incomplete) form at the
specified locations, and that if the user truly wants to change directories,
they should run the appropriate tools.

Now, starting Kudu will log.

I0221 18:36:14.403893 35003 server_base.cc:431] Could not load existing FS 
layout: Not found: /data/04/kudu/data/instance: No such file or directory 
(error 2)
I0221 18:36:14.403903 35003 server_base.cc:432] Attempting to create new FS 
layout instead
I0221 18:36:14.404762 35003 server_base.cc:438] To start Kudu with a different 
FS layout, the `kudu fs update_dirs` tool must be run first
F0221 18:36:14.404858 35003 tablet_server_main.cc:80] Check failed: _s.ok() Bad 
status: Already present: FS layout already exists; not overwriting existing 
layout. See https://kudu.apache.org/releases/1.7.0/docs/troubleshooting.html: 
unable to create file system roots: FSManager roots already exist: 
/data/01/kudu/wal,/data/02/kudu/data,/data/03/kudu/data

Change-Id: I72294036e9aa78b285803d5d78b685cf444d9662
Reviewed-on: http://gerrit.cloudera.org:8080/9281
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong 


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/2a5a12fb
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/2a5a12fb
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/2a5a12fb

Branch: refs/heads/master
Commit: 2a5a12fbd69882b387f83e1da71d8a0fd19b6a63
Parents: e94556e
Author: Andrew Wong 
Authored: Fri Feb 9 14:56:39 2018 -0800
Committer: Andrew Wong 
Committed: Fri Feb 23 19:09:02 2018 +

--
 docs/troubleshooting.adoc  |  2 +-
 src/kudu/fs/fs_manager.cc  | 11 +++
 src/kudu/server/server_base.cc | 16 +---
 3 files changed, 21 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/kudu/blob/2a5a12fb/docs/troubleshooting.adoc
--
diff --git a/docs/troubleshooting.adoc b/docs/troubleshooting.adoc
index a0a0b23..7c48ef3 100644
--- a/docs/troubleshooting.adoc
+++ b/docs/troubleshooting.adoc
@@ -490,7 +490,7 @@ When Kudu starts, it checks each configured data directory, 
expecting either for
 initialized or for all to be empty. If a server fails to start with a log 
message like
 
 
-Check failed: _s.ok() Bad status: Already present: Could not create new FS 
layout: FSManager root is not empty: /data0/kudu/data
+Check failed: _s.ok() Bad status: Already present: FS layout already exists; 
not overwriting existing layout: FSManager roots already exist: /data0/kudu/data
 
 
 then this precondition has failed. This could be because Kudu was configured 
with non-empty data

http://git-wip-us.apache.org/repos/asf/kudu/blob/2a5a12fb/src/kudu/fs/fs_manager.cc
--
diff --git a/src/kudu/fs/fs_manager.cc b/src/kudu/fs/fs_manager.cc
index 380169d..e7757f9 100644
--- a/src/kudu/fs/fs_manager.cc
+++ b/src/kudu/fs/fs_manager.cc
@@ -57,7 +57,6 @@
 #include "kudu/util/pb_util.h"
 #include "kudu/util/scoped_cleanup.h"
 #include "kudu/util/stopwatch.h"
-#include "kudu/util/website_util.h"
 
 DEFINE_bool(enable_data_block_fsync, true,
 "Whether to enable fsync() of data blocks, metadata, and their 
parent directories. "
@@ -520,6 +519,7 @@ Status FsManager::CreateFileSystemRoots(
   CHECK(!opts_.read_only);
 
   // It's OK if a root already exists as long as there's nothing in it.
+  vector non_empty_roots;
   for (const auto& root : canonicalized_roots) {
 if (!root.status.ok()) {
   return Status::IOError("cannot create FS layout; at least one directory "
@@ -533,12 +533,15 @@ Status FsManager::CreateFileSystemRoots(