Todd Lipcon has posted comments on this change.

Change subject: WIP: Add a new TIMESTAMP type
......................................................................


Patch Set 3:

(38 comments)

http://gerrit.cloudera.org:8080/#/c/5819/3/src/kudu/client/scan_batch.h
File src/kudu/client/scan_batch.h:

Line 31: #include "kudu/util/timestamp_value.h"
is it possible to move this to common/ ? it seems like it's not so general of a 
thing that it really belongs in util/ IMO


PS3, Line 219: their
'its'?


PS3, Line 220: getter
getters


http://gerrit.cloudera.org:8080/#/c/5819/3/src/kudu/client/scanner-internal.cc
File src/kudu/client/scanner-internal.cc:

PS3, Line 325:   LOG(INFO) << "ScanRequest: " << scan->DebugString();
             :   LOG(INFO) << "PartitionKey: " << partition_key;
             :  
I don't think you meant to keep these


http://gerrit.cloudera.org:8080/#/c/5819/3/src/kudu/client/schema.h
File src/kudu/client/schema.h:

Line 135:     TIMESTAMP = 10
hrm... what does this mean for people who might have been using the deprecated 
name with a previous version client? I'm a little afraid of silent 
(non-compilation-breaking) breakage if someone upgrades from 1.2 to 1.3.


http://gerrit.cloudera.org:8080/#/c/5819/3/src/kudu/client/value-internal.h
File src/kudu/client/value-internal.h:

Line 30: class KuduValue::Data {
> error: use of undeclared identifier 'KuduValue' [clang-diagnostic-error]
not your fault, but I guess we need to be including value.h or somesuch


Line 46:   TimestampValue timestamp_val_;
Looking back at the original code review where this was implemented, slice_val_ 
was left out of the union because it is a non-POD type and pre-C++11 that 
wasn't allowed. Now with C++11 it _is_ allowed, so I think we may as well stuff 
slice_val and timestamp_val inside the union now, if that doesn't cause some 
unforeseen issues


http://gerrit.cloudera.org:8080/#/c/5819/3/src/kudu/client/value.cc
File src/kudu/client/value.cc:

PS3, Line 90: auto
nit: can you use 'auto*' here so it's more obvious it's a pointer?


Line 209:   *val_void = &timestamp_val_;
is this also a reasonable place to be checking that the value is well-formed? I 
remember discussing that we should verify it on receipt by the server, but I 
wonder if we also need to do some verification in the client to make sure we 
don't end up generating any bogus encoded keys (or crashing)


http://gerrit.cloudera.org:8080/#/c/5819/3/src/kudu/client/value.h
File src/kudu/client/value.h:

Line 54:   static KuduValue* FromTimestamp(TimestampValue t);
> warning: function 'kudu::client::KuduValue::FromTimestamp' has a definition
add doxygen


http://gerrit.cloudera.org:8080/#/c/5819/3/src/kudu/common/encoded_key-test.cc
File src/kudu/common/encoded_key-test.cc:

Line 276: TEST_F(EncodedKeyTest, TestTimestampKey) {
what about some test with a negative day? is that a legal value?


Line 299:   TimestampValue tv3(tv2.date() + 1, tv2.time_duration());
some little comments for each test case would be nice. eg here // Test a larger 
day portion with same nanoseconds portion


http://gerrit.cloudera.org:8080/#/c/5819/3/src/kudu/common/key_encoder.h
File src/kudu/common/key_encoder.h:

Line 131:     uint32_t u_date;
could we just call:

KeyEncoderTraits<INT32, Buffer>::Encode(&date, dst);
KeyEncoderTraits<INT64, Buffer>::Encode(&time_duration, dst);

here?


Line 160:     u_date = BigEndian::FromHost32(u_date);
same as above, could we reuse the INT32/INT64 decoder?


http://gerrit.cloudera.org:8080/#/c/5819/3/src/kudu/util/timestamp_value.cc
File src/kudu/util/timestamp_value.cc:

Line 42: Status ToPTime(const int32_t* date_val, const int64_t* 
time_duration_val, ptime* pt) {
can be in an anonymous namespace? (also a few things below)


PS3, Line 49: *reinterpret_cast<const date*>(date_val)
I think bit_cast<date>(date_val) would be safer and also inherently include the 
static_assert you have above


Line 60:                                               *time_duration_val));
include e.what() or something?


PS3, Line 65: class
why's the class keyword necessary here? that's odd


Line 68:   *date_val = *reinterpret_cast<int32_t*>(&date_temp);
same bitcast


Line 73:   if (PREDICT_FALSE(gmtime_r(&unix_time, &temp_tm) == NULL)) {
A few weeks back I benchmarked different ways of going from time_t to ptime and 
found this was the slowest. Benchmark here: 
https://gist.github.com/toddlipcon/db1d791e27d1399c904869330e9c67a0

Results:
https://gist.github.com/b01fa7e3da1d26e08b30372a1919ba24


Line 90: Status TimestampValue::FromUnixTime(int64_t unix_time, int64_t nanos, 
TimestampValue* out) {
it seems this is attempting to allow any range of 'nanos'. maybe the code could 
be more efficient if we restricted it to be valid nanos?


Line 111:   if (PREDICT_FALSE(!t.IsValid())) {
isn't there already a DCHECK to this effect in the above constructor? seems we 
need test coverage of this


PS3, Line 138:   CHECK_OK(ToPTime(&date_, &time_duration_, &pt));
             :   pt -= boost::posix_time::nanoseconds(1);
             :   if (!pt.is_not_a_date_time()) {
             :     FromPTime(pt, &date_, &time_duration_);
             :     return true;
             :   }
             :  
actually wondering if this would be simpler to implement by just doing:

if (--nanos_ < 0) {
  DCHECK_EQ(nanos_, -1);
  nanos_ = 1e9 - 1;
  days_--;
}

rather than doing the to/from? I'm pretty sure it would run faster at least.


Line 149:   ptime min = 
boost::posix_time::ptime(boost::posix_time::min_date_time);
any idea what this actually represents? maybe clearer to just hard-code the 
days/nanos value so we know a little better what's going on.


Line 168:   if (!pt.time_of_day().is_special()) {
how do we get "special" values?


Line 175: bool TimestampValue::IsConsecutive(const TimestampValue& other) const 
{
again think this might be clearer just using the underlying fields instead of 
the to/from


http://gerrit.cloudera.org:8080/#/c/5819/3/src/kudu/util/timestamp_value.h
File src/kudu/util/timestamp_value.h:

Line 17: #pragma once
I think with client-facing headers we have to use old-style include guards.


Line 19: #include <glog/logging.h>
this probably needs the various incantations that other client-facing headers 
do (the stubs stuff, etc)


Line 26: class TimestampValue {
this needs a KUDU_EXPORT, right?


PS3, Line 28: local
We should add lots of extra-verbose documentation about timezones with this 
class. Namely, that this class represents a timezone-less time, and there are 
no timezone corrections done, etc.. perhaps complete with examples.


Line 36:   // Returns a non-OK status if the conversion could not be made.
should document any "well-formedness" requirements and allowable range. Is 
there any value of unix_time which causes overflow in either direction of the 
julian day? Is 'nanos' required to be between 0 and 1e9?


Line 39:   static Status FromDateAndTimeDuration(int32_t date, int64_t 
time_duration, TimestampValue* out);
document the parameters here. Also the bit above about well-formedness


PS3, Line 41:   // The minimum possible value for a TimestampValue;
            :   static TimestampValue Min();
            : 
            :   // The maximum possible value for a TimestampValue;
            :   static TimestampValue Max();
are these valid values to actually insert? or only valid in the sense of a 
predicate? Also if we don't expect them to actually be used by users, perhaps 
we should KUDU_NO_EXPORT them


Line 47:   // "Raw" ctor, for internal use only.
I think there is a doxygen way to un-document this. Or maybe make it private 
and declare appropriate friends


PS3, Line 77: strings
typo


PS3, Line 92:   // Returns whether 'other' is consecutive to 'this'.
            :   bool IsConsecutive(const TimestampValue& other) const;
I think the naming and docs are slightly confusing here - not 100% obvious 
which direction 'this' and 'other' are supposed to fall. Perhaps bool 
IsImmediatePredecessorOf(...) ? Or invert the two and call it 
IsImmediateSuccessorOf(...)?


Line 113:   int32_t date_;
document valid range


Line 115:   // 8 bytes to store nanoseconds within the current day.
should document the invariant that this is in the range (0..1e9]


-- 
To view, visit http://gerrit.cloudera.org:8080/5819
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I63271adf6b87f048e30bf2d6d04c758884752afc
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

Reply via email to