Todd Lipcon has posted comments on this change. Change subject: WIP: Add a new TIMESTAMP type ......................................................................
Patch Set 3: (38 comments) http://gerrit.cloudera.org:8080/#/c/5819/3/src/kudu/client/scan_batch.h File src/kudu/client/scan_batch.h: Line 31: #include "kudu/util/timestamp_value.h" is it possible to move this to common/ ? it seems like it's not so general of a thing that it really belongs in util/ IMO PS3, Line 219: their 'its'? PS3, Line 220: getter getters http://gerrit.cloudera.org:8080/#/c/5819/3/src/kudu/client/scanner-internal.cc File src/kudu/client/scanner-internal.cc: PS3, Line 325: LOG(INFO) << "ScanRequest: " << scan->DebugString(); : LOG(INFO) << "PartitionKey: " << partition_key; : I don't think you meant to keep these http://gerrit.cloudera.org:8080/#/c/5819/3/src/kudu/client/schema.h File src/kudu/client/schema.h: Line 135: TIMESTAMP = 10 hrm... what does this mean for people who might have been using the deprecated name with a previous version client? I'm a little afraid of silent (non-compilation-breaking) breakage if someone upgrades from 1.2 to 1.3. http://gerrit.cloudera.org:8080/#/c/5819/3/src/kudu/client/value-internal.h File src/kudu/client/value-internal.h: Line 30: class KuduValue::Data { > error: use of undeclared identifier 'KuduValue' [clang-diagnostic-error] not your fault, but I guess we need to be including value.h or somesuch Line 46: TimestampValue timestamp_val_; Looking back at the original code review where this was implemented, slice_val_ was left out of the union because it is a non-POD type and pre-C++11 that wasn't allowed. Now with C++11 it _is_ allowed, so I think we may as well stuff slice_val and timestamp_val inside the union now, if that doesn't cause some unforeseen issues http://gerrit.cloudera.org:8080/#/c/5819/3/src/kudu/client/value.cc File src/kudu/client/value.cc: PS3, Line 90: auto nit: can you use 'auto*' here so it's more obvious it's a pointer? Line 209: *val_void = ×tamp_val_; is this also a reasonable place to be checking that the value is well-formed? I remember discussing that we should verify it on receipt by the server, but I wonder if we also need to do some verification in the client to make sure we don't end up generating any bogus encoded keys (or crashing) http://gerrit.cloudera.org:8080/#/c/5819/3/src/kudu/client/value.h File src/kudu/client/value.h: Line 54: static KuduValue* FromTimestamp(TimestampValue t); > warning: function 'kudu::client::KuduValue::FromTimestamp' has a definition add doxygen http://gerrit.cloudera.org:8080/#/c/5819/3/src/kudu/common/encoded_key-test.cc File src/kudu/common/encoded_key-test.cc: Line 276: TEST_F(EncodedKeyTest, TestTimestampKey) { what about some test with a negative day? is that a legal value? Line 299: TimestampValue tv3(tv2.date() + 1, tv2.time_duration()); some little comments for each test case would be nice. eg here // Test a larger day portion with same nanoseconds portion http://gerrit.cloudera.org:8080/#/c/5819/3/src/kudu/common/key_encoder.h File src/kudu/common/key_encoder.h: Line 131: uint32_t u_date; could we just call: KeyEncoderTraits<INT32, Buffer>::Encode(&date, dst); KeyEncoderTraits<INT64, Buffer>::Encode(&time_duration, dst); here? Line 160: u_date = BigEndian::FromHost32(u_date); same as above, could we reuse the INT32/INT64 decoder? http://gerrit.cloudera.org:8080/#/c/5819/3/src/kudu/util/timestamp_value.cc File src/kudu/util/timestamp_value.cc: Line 42: Status ToPTime(const int32_t* date_val, const int64_t* time_duration_val, ptime* pt) { can be in an anonymous namespace? (also a few things below) PS3, Line 49: *reinterpret_cast<const date*>(date_val) I think bit_cast<date>(date_val) would be safer and also inherently include the static_assert you have above Line 60: *time_duration_val)); include e.what() or something? PS3, Line 65: class why's the class keyword necessary here? that's odd Line 68: *date_val = *reinterpret_cast<int32_t*>(&date_temp); same bitcast Line 73: if (PREDICT_FALSE(gmtime_r(&unix_time, &temp_tm) == NULL)) { A few weeks back I benchmarked different ways of going from time_t to ptime and found this was the slowest. Benchmark here: https://gist.github.com/toddlipcon/db1d791e27d1399c904869330e9c67a0 Results: https://gist.github.com/b01fa7e3da1d26e08b30372a1919ba24 Line 90: Status TimestampValue::FromUnixTime(int64_t unix_time, int64_t nanos, TimestampValue* out) { it seems this is attempting to allow any range of 'nanos'. maybe the code could be more efficient if we restricted it to be valid nanos? Line 111: if (PREDICT_FALSE(!t.IsValid())) { isn't there already a DCHECK to this effect in the above constructor? seems we need test coverage of this PS3, Line 138: CHECK_OK(ToPTime(&date_, &time_duration_, &pt)); : pt -= boost::posix_time::nanoseconds(1); : if (!pt.is_not_a_date_time()) { : FromPTime(pt, &date_, &time_duration_); : return true; : } : actually wondering if this would be simpler to implement by just doing: if (--nanos_ < 0) { DCHECK_EQ(nanos_, -1); nanos_ = 1e9 - 1; days_--; } rather than doing the to/from? I'm pretty sure it would run faster at least. Line 149: ptime min = boost::posix_time::ptime(boost::posix_time::min_date_time); any idea what this actually represents? maybe clearer to just hard-code the days/nanos value so we know a little better what's going on. Line 168: if (!pt.time_of_day().is_special()) { how do we get "special" values? Line 175: bool TimestampValue::IsConsecutive(const TimestampValue& other) const { again think this might be clearer just using the underlying fields instead of the to/from http://gerrit.cloudera.org:8080/#/c/5819/3/src/kudu/util/timestamp_value.h File src/kudu/util/timestamp_value.h: Line 17: #pragma once I think with client-facing headers we have to use old-style include guards. Line 19: #include <glog/logging.h> this probably needs the various incantations that other client-facing headers do (the stubs stuff, etc) Line 26: class TimestampValue { this needs a KUDU_EXPORT, right? PS3, Line 28: local We should add lots of extra-verbose documentation about timezones with this class. Namely, that this class represents a timezone-less time, and there are no timezone corrections done, etc.. perhaps complete with examples. Line 36: // Returns a non-OK status if the conversion could not be made. should document any "well-formedness" requirements and allowable range. Is there any value of unix_time which causes overflow in either direction of the julian day? Is 'nanos' required to be between 0 and 1e9? Line 39: static Status FromDateAndTimeDuration(int32_t date, int64_t time_duration, TimestampValue* out); document the parameters here. Also the bit above about well-formedness PS3, Line 41: // The minimum possible value for a TimestampValue; : static TimestampValue Min(); : : // The maximum possible value for a TimestampValue; : static TimestampValue Max(); are these valid values to actually insert? or only valid in the sense of a predicate? Also if we don't expect them to actually be used by users, perhaps we should KUDU_NO_EXPORT them Line 47: // "Raw" ctor, for internal use only. I think there is a doxygen way to un-document this. Or maybe make it private and declare appropriate friends PS3, Line 77: strings typo PS3, Line 92: // Returns whether 'other' is consecutive to 'this'. : bool IsConsecutive(const TimestampValue& other) const; I think the naming and docs are slightly confusing here - not 100% obvious which direction 'this' and 'other' are supposed to fall. Perhaps bool IsImmediatePredecessorOf(...) ? Or invert the two and call it IsImmediateSuccessorOf(...)? Line 113: int32_t date_; document valid range Line 115: // 8 bytes to store nanoseconds within the current day. should document the invariant that this is in the range (0..1e9] -- To view, visit http://gerrit.cloudera.org:8080/5819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I63271adf6b87f048e30bf2d6d04c758884752afc Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
