Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20496 )

Change subject: IMPALA-12373: Small String Optimization for StringValue
......................................................................


Patch Set 4:

(80 comments)

Thanks a lot for the comments!

http://gerrit.cloudera.org:8080/#/c/20496/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20496/4//COMMIT_MSG@15
PS4, Line 15: byte
> Nit: bytes.
Done


http://gerrit.cloudera.org:8080/#/c/20496/4//COMMIT_MSG@16
PS4, Line 16: byte
> Nit: bytes.
Done


http://gerrit.cloudera.org:8080/#/c/20496/4//COMMIT_MSG@19
PS4, Line 19: between 'ptr' and 'len'
> It also means that there is no padding _after_ 'len': if it wasn't packed,
Done


http://gerrit.cloudera.org:8080/#/c/20496/4//COMMIT_MSG@20
PS4, Line 20: byte
> Nit: bytes.
Done


http://gerrit.cloudera.org:8080/#/c/20496/4//COMMIT_MSG@40
PS4, Line 40: we can also smallify all strings in a tuple at
            : once.
> TODO: The description of this use case is not clear to me.
The remaining parts of this paragraph try to explain it.
In short, it is something we can use to optimize Tuple::SmallifyStrings(), 
because we can return early on the first small string that we encounter.


http://gerrit.cloudera.org:8080/#/c/20496/4//COMMIT_MSG@46
PS4, Line 46:  * lower memory and CPU cache consumption
> I'm not really clear where this is coming from. StringValue appears to esse
Yes, I didn't change anything when we are creating the strings during SCANs. 
Because in such cases the strings already exist in some buffer, e.g. in a 
Parquet dictionary page. So things are *mostly* good when we are in the 
fragment of the SCAN NODE.

But whenever we transfer tuples over the network, we deepcopy each string, i.e. 
every string object will have its own buffer. And at EXCHANGE receivers we 
don't de-duplicate those.

But even without EXCHANGEs, spilling operators put tuples into 
BufferedTupleStream buffers, i.e. they deepcopy each tuple. So we can save a 
lot of memory in that case. Result spooling also uses BufferedTupleStream.


http://gerrit.cloudera.org:8080/#/c/20496/4//COMMIT_MSG@59
PS4, Line 59: 6.5 columns
> It's not clear to me what 6.5 columns means? Is it that in the 7th column h
Exactly. I could explain it more, but it is not too significant, and I don't 
want the commit message to be too long. (it is quite long already).


http://gerrit.cloudera.org:8080/#/c/20496/4//COMMIT_MSG@86
PS4, Line 86: *
> My biggest concerns are not about testing small strings, but about testing
I extended my TODO-list for now.

Extending alltypes would require a lot of changes. But I can add more backend 
tests for long string values. Also some e2e tests with long strings in complex 
types.

Other e2e tests and TPC-H / TPC-DS benchmarks should also give us good coverage.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/codegen/llvm-codegen-test.cc
File be/src/codegen/llvm-codegen-test.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/codegen/llvm-codegen-test.cc@365
PS4, Line 365:   llvm::Function* str_setlen_fn = codegen->GetFunction(
> If we retrieve 'str_ptr_fn' and 'str_len_fn' at the beginning of the functi
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/exec/analytic-eval-node.cc@397
PS4, Line 397:     StringValue::SimpleString s = sv->ToSimpleString();
> Do we need this conversion? Couldn't we just use sv->Len() and sv->Ptr()?
I would only switch in the trivial cases. Otherwise, looking at the assembly, 
in most cases the ToSimpleString() variants generate less CPU instructions.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/exec/hdfs-text-table-writer.cc
File be/src/exec/hdfs-text-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/exec/hdfs-text-table-writer.cc@167
PS4, Line 167:   StringValue::SimpleString s = str_val->ToSimpleString();
> Can't we use str_val->Len() and str_val->Ptr()?
I'd rather keep it this way as the function is non-trivial.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/exec/kudu/kudu-util-ir.cc
File be/src/exec/kudu/kudu-util-ir.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/exec/kudu/kudu-util-ir.cc@59
PS4, Line 59:       StringValue::SimpleString s = sv->ToSimpleString();
> Can't we use sv->Len() and sv->Ptr()?
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/exec/kudu/kudu-util-ir.cc@72
PS4, Line 72:       StringValue::SimpleString s = sv->ToSimpleString();
> Can't we use sv->Len() and sv->Ptr()?
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/exec/orc/hdfs-orc-scanner.cc
File be/src/exec/orc/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/exec/orc/hdfs-orc-scanner.cc@1175
PS4, Line 1175:       StringValue::SimpleString s = sv->ToSimpleString();
> Can't we use sv->Len() and sv->Ptr()?
I'd rather keep it here.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/exec/parquet/parquet-column-stats.cc
File be/src/exec/parquet/parquet-column-stats.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/exec/parquet/parquet-column-stats.cc@417
PS4, Line 417:   StringValue::SimpleString value_s = value->ToSimpleString();
> Can't we use value->Len() and value->Ptr()?
For now I'd prefer to keep it as is.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/exec/parquet/parquet-common.h
File be/src/exec/parquet/parquet-common.h:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/exec/parquet/parquet-common.h@560
PS4, Line 560:   StringValue::SimpleString s = v.ToSimpleString();
> Can't we use v->Len() and v->Ptr()?
No, at L561 we would take the address of an rvalue.

Also, I checked the assembly and ToSimpleString() variant uses fewer 
instructions: https://godbolt.org/z/KxnY314cj


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/exec/parquet/parquet-common.h@573
PS4, Line 573:   v->Clear();
> Do we call Clear() to ensure the StringValue is not small?
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/exec/text-converter.inline.h
File be/src/exec/text-converter.inline.h:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/exec/text-converter.inline.h@95
PS4, Line 95:         StringValue::SimpleString s = str.ToSimpleString();
> Can't we use str->Len() and str->Ptr()?
I'd rather keep it that way here.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/exec/text-converter.inline.h@114
PS4, Line 114:         StringValue::SimpleString s = str.ToSimpleString();
> Can't we use str->Len() and str->Ptr()?
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/experiments/string-search-sse.h
File be/src/experiments/string-search-sse.h:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/experiments/string-search-sse.h@60
PS4, Line 60:     StringValue::SimpleString haystack_s = 
haystack.ToSimpleString();
> Can't we use haystack->Len() and haystack->Ptr()?
In such a complex function I would not invoke Ptr() and Len() that much.
I could introduce variables to hold ptr and len, but that would be a bigger 
change to this function.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/exprs/string-functions-ir.cc@525
PS4, Line 525:   StringValue::SimpleString haystack_s = 
haystack.ToSimpleString();
> Can't we use haystack->Len() and haystack->Ptr()?
I would keep it as is in such a complex function.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/buffered-tuple-stream-test.cc
File be/src/runtime/buffered-tuple-stream-test.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/buffered-tuple-stream-test.cc@235
PS4, Line 235: Assign
> We don't use sv.SetPtr() because it may be a small string? If so we could a
I've put it into an if stmt.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/buffered-tuple-stream.cc@1118
PS4, Line 1118:       StringValue::SimpleString s = sv->ToSimpleString();
> Can't we use sv->Len() and sv->Ptr()?
Current implementation generates much less CPU instructions: 
https://godbolt.org/z/GrP6PWEfh

Probably because of the potential aliasing between string value objects and 
other pointers.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/descriptors.cc@713
PS4, Line 713:       any_val->SetPtr(ptr);
> These pointers will now point into the fixed length part of StringValue if
AFAICT these should never outlive the underlying tuple data.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/descriptors.cc@1112
PS4, Line 1112: CodegenWriteStringOrCollectionToSlot
> Optional: we could inline this into the caller, strings and collections are
For now I'd keep it as is, unless you feel strong about it.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/descriptors.cc@1123
PS4, Line 1123:
> As far as I can see, the main difference between CodegenWriteCollectionToSl
For now I'd just remove the conditionals/comments.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/descriptors.cc@1130
PS4, Line 1130: type.IsStringType()
> Now it shouldn't be a string here. Alternatively, 'type' shouldn't need to 
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/descriptors.cc@1133
PS4, Line 1133: raw_type
> This shouldn't be needed, we know it's a collection.
I would keep it for readability.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/descriptors.cc@1134
PS4, Line 1134: str_or_coll_value
> Should be renamed to 'coll_value'.
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/descriptors.cc@1139
PS4, Line 1139:     if (type.IsCollectionType()) {
> This IF is no longer needed, we know it's a collection type.
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/descriptors.cc@1168
PS4, Line 1168: type.IsCollectionType()
> It shouldn't be a collection type. Alternatively, 'type' shouldn't need to
Done.
'type' is not passed directly, it is retrieved from read_write_info.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/raw-value.inline.h
File be/src/runtime/raw-value.inline.h:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/raw-value.inline.h@227
PS4, Line 227:     StringValue::SimpleString s = v->ToSimpleString();
> Can't we use v->Len() and v->Ptr()?
I rather keep it this way.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/raw-value.inline.h@352
PS4, Line 352:     StringValue::SimpleString s = v->ToSimpleString();
> Can't we use v->Len() and v->Ptr()?
I rather keep it this way.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h
File be/src/runtime/smallable-string.h:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@29
PS4, Line 29: length
> Nit: the length.
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@33
PS4, Line 33: /// being used.
> You could mention the differences between big endian and little endian.
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@38
PS4, Line 38:   struct SimpleString {
> I've commented on most uses of this struct and ToSimpleString() that it may
I introduced it because of two things:

* it guarantees that IsSmall() is only computed once (I think the compiler is 
smart enough in most cases, but at few places there are quite long functions 
and multiple references to len and ptr)
* it was easier to rewrite the code at a lot of places


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@48
PS4, Line 48:   SmallableString(const SmallableString& other) {
> Could add a comment. Mention that it conserves the "smallness" state of 'ot
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@52
PS4, Line 52:   SmallableString(char* ptr, int len) {
> Could add a comment.
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@60
PS4, Line 60:   explicit SmallableString(const std::string& s) :
> How much do we need the constructor with 'const std::string& s' and 'const
AFAICT we only use these constructors during backend tests.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@65
PS4, Line 65:
> Nit: unneeded space.
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@77
PS4, Line 77:     rep.long_rep.len =len;
> nit: add space after =
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@83
PS4, Line 83:     char last_char = reinterpret_cast<const 
char*>(this)[sizeof(*this) - 1];
> This seems to rely on the string size being limited to 2^31 bytes; we shoul
Added sentence about it to the class comment.
Added DCHECKs whenever we set len.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@90
PS4, Line 90:     if (rep.long_rep.len == 0) { return true; }
> Shouldn't we set the "small bit"? Now we return true but a subsequent call
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@93
PS4, Line 93:     memset(this, 0, sizeof(*this));
> This memset doesn't seem to be necessary. We're setting 'rep.small_rep.len'
We are zeroing out the object so compression algorithms will be more efficient. 
Added a comment about this.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@97
PS4, Line 97: 0b10000000
> Could extract the masks into constexpr static members.
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@100
PS4, Line 100:       rep.long_rep.ptr = s;
> When would we ever hit this else case? Doesn't line 89 guard against it?
Yes, removed this branch.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@132
PS4, Line 132:   void SetLen(int len) {
> The semantics of this method don't seem very clear.
You are right, moved the DCHECK to the beginning.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@133
PS4, Line 133:     if (!IsSmall()) {
> I'd invert the IF, i.e. "if (IsSmall()) ...". It's more difficult to unders
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/smallable-string.h@170
PS4, Line 170: int
> We could use int32_t to indicate that 'len' should be 4 bytes long. If we d
Changed it to uint32_t.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/sorter.cc@608
PS4, Line 608:       if (string_val->IsSmall()) continue;
> Optional: I think "if (!string_val->IsSmall()) { add length }" would be bet
I chose continue because I didn't want to add extra nesting.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/sorter.cc@751
PS4, Line 751:       if (value->IsSmall()) continue;
> This could go into the branch at L743. That IF could also become "if conste
Done, but I also had to move the definition of 'value' above the if. But AFAICT 
it shouldn't cause any issue.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/sorter.cc@786
PS4, Line 786:     using ptr_type = decltype(value->Ptr());
> Now that strings and collections are handled separately we don't need this
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-buffer.h
File be/src/runtime/string-buffer.h:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-buffer.h@45
PS4, Line 45:       StringValue::SimpleString s = str->ToSimpleString();
> We could use str->Ptr() and str->Len().
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-search.h
File be/src/runtime/string-search.h:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-search.h@83
PS4, Line 83:     if (str == NULL || pattern_ == NULL) {
> We could use "pattern_->Len() == 0" here. I think it's clearer to check it
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-search.h@140
PS4, Line 140:     if (str == NULL || pattern_ == NULL) {
> See L83.
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value-ir.cc
File be/src/runtime/string-value-ir.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value-ir.cc@23
PS4, Line 23: IR_ALWAYS_INLINE int StringValue::IrLen() const { return Len(); }
> Optional: instead of defining new wrapper functions, we could define String
I wanted to define them in the header so the compiler can surely inline the 
calls.

Probably LTO would also inline these, but I'm not sure how well it works with 
big executables like Impala.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.h
File be/src/runtime/string-value.h:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.h@36
PS4, Line 36: /// TODO: rename this to be less confusing with 
impala_udf::StringVal.
> Could mention SSO also here.
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.h@53
PS4, Line 53:
> Nit: extra space.
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.h@58
PS4, Line 58:
> Nit: extra space.
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.h@73
PS4, Line 73:   bool IsSmall() const { return string_impl_.IsSmall(); }
            :
            :   bool Smallify() { return string_impl_.Smallify(); }
> Add comments?
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.h@76
PS4, Line 76:
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.h@78
PS4, Line 78:   void SetLen(int len) { return string_impl_.SetLen(len); }
> Mention that we can only decrease the length if the string is small.
Now this is also true to long strings. Added a comment.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.h@82
PS4, Line 82:   void SetPtr(char* ptr) { return string_impl_.SetPtr(ptr); }
> Mention that we can only call this if the string is not small.
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.h@136
PS4, Line 136:     *sv = 
impala_udf::StringVal(reinterpret_cast<uint8_t*>(s.ptr), s.len);
> We could use sv->Ptr() and sv->Len().
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.h@170
PS4, Line 170:   StringValue::SimpleString s = v.ToSimpleString();
> We could use v->Ptr() and v->Len().
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.cc
File be/src/runtime/string-value.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.cc@38
PS4, Line 38: unsigned
> Why is it unsigned?
At some point I wanted to switch to unsigned types wherever possible, but that 
was too invasive because I had to modify too many callsites.

I don't think unsigned hurts here.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.cc@50
PS4, Line 50: unsigned
> Why unsigned?
Same as above.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.cc@54
PS4, Line 54:   int i = len - 1;
> Not relevant to this patch, but what if len==0? Shouldn't we also return an
I think so, but this should be fixed independently of this patch.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.cc@80
PS4, Line 80: unsigned
> Why unsigned?
Same as above.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.inline.h
File be/src/runtime/string-value.inline.h:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.inline.h@117
PS4, Line 117: Substring
> Calling Smallify() on the result would solve this, but that would also smal
The tuple data is also on the heap and shouldn't be moved around, unless there 
is a spilling operator. But I wouldn't expect any expressions referencing to 
StringValues in that case.

>From our wiki:
Resources attached to a row batch may be referenced by previous batches in a 
row batch stream.

So I don't think we should expect our StringValue being freed before another 
StringValue that references it (from a previous row batch).

We could call Smallify(), but that might make Tuple::SmallifyStrings() return 
too early. Though I don't think it would matter much, so probably we want to be 
at the safe-side here.

What do you think?


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.inline.h@122
PS4, Line 122: inline StringValue StringValue::Substring(int start_pos, int 
new_len) const {
> Csaba's L117 comment also applies here.
Same as above.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/string-value.inline.h@128
PS4, Line 128: inline StringValue StringValue::Trim() const {
> same as line 117
Same as above.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/tuple-ir.cc
File be/src/runtime/tuple-ir.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/tuple-ir.cc@42
PS4, Line 42: str_len
> We don't really need this variable.
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/tuple.cc
File be/src/runtime/tuple.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/tuple.cc@66
PS4, Line 66:
> Please add comment about why we can stop here.
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/runtime/tuple.cc@91
PS4, Line 91: return
> An idea for this optimization:
Having the small ones first would mean we can return immediately when the 
strings are smallified.

I think we should make sure we always smallify the strings at once in a tuple, 
so we won't miss out smallifying strings. Or maybe latter we could start 
smallifying strings eagerly.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/util/min-max-filter.cc
File be/src/util/min-max-filter.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/util/min-max-filter.cc@504
PS4, Line 504: CopyToBuffer
> In the function comment in the header we should mention that small strings
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/util/min-max-filter.cc@507
PS4, Line 507:   if (value->IsSmall()) return;
> Can 'len' be shorter than 'value->Len()'? If yes, we should also decrease t
'len' can only be smaller than 'value->Len()' when 'value->Len()' is greater 
than MAX_BOUND_LENGTH = 1024.


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/util/tuple-row-compare.cc@246
PS4, Line 246:   LOG(WARNING) << "Start CodegenLexicalCompare";
> looks temporary logging
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/util/tuple-row-compare.cc@350
PS4, Line 350:     LOG(WARNING) << "Finished CodegenLexicalCompare";
> looks temporary logging
Done


http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/util/url-parser.cc
File be/src/util/url-parser.cc:

http://gerrit.cloudera.org:8080/#/c/20496/4/be/src/util/url-parser.cc@196
PS4, Line 196:         (trimmed_url.Ptr()[key_pos - 1] != '?' && 
trimmed_url.Ptr()[key_pos - 1] != '&')) {
> line too long (91 > 90)
Done



--
To view, visit http://gerrit.cloudera.org:8080/20496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I741c3a5f12ab620b6b64b57d4c89b5f8e056efd3
Gerrit-Change-Number: 20496
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 28 Sep 2023 16:38:14 +0000
Gerrit-HasComments: Yes

Reply via email to