Peter Rozsa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/24392 )

Change subject: IMPALA-15067: Add VariantValue and basic decoding functions
......................................................................


Patch Set 3:

(14 comments)

Nice patch Zoltan! This is the first batch of my comments, I'm still working on 
the review.

http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/exprs/variant-functions-ir.cc
File be/src/exprs/variant-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/exprs/variant-functions-ir.cc@29
PS3, Line 29:   std::string json;
We could reserve value.len * [arbitrary number] for less reallocations.


http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/exprs/variant-functions-ir.cc@34
PS3, Line 34:   StringVal result(ctx, json.size());
StringVal::CopyFrom does the same


http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.h
File be/src/runtime/variant-value.h:

http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.h@108
PS3, Line 108: VariantMetadata
nit: it's more readable to store it in a separate .h and the implementations in 
a separate .cc, on the other hand, logically they belong to the same unit.


http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.h@113
PS3, Line 113: int
could be uint32_t


http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.h@124
PS3, Line 124: int
could be uint32_t


http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.h@129
PS3, Line 129: is_valid
nit: IsValid


http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.h@132
PS3, Line 132:   uint32_t ReadOffset(int index) const;
             :
             :   const uint8_t* data_ = nullptr;
             :   int len_ = 0;
             :   int version_ = 0;
             :   bool is_sorted_ = false;
             :   int offset_size_ = 0;  // 1, 2, 3, or 4 bytes per offset
             :   int dict_size_ = 0;
             :   const uint8_t* offsets_ = nullptr;  // start of offset array
             :   const uint8_t* string_data_ = nullptr;  // start of string data
These fields can be reordered and typed more strictly:
const uint8_t* offsets_ = nullptr;
const uint8_t* string_data_ = nullptr;
uint32_t dict_size_ = 0;
uint8_t version_ = 0;
uint8_t offset_size_ = 0;
bool is_sorted_ = false;

len_ and data_ is not used after Init


http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.h@158
PS3, Line 158: len
nit: int32_t


http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.h@202
PS3, Line 202:   Status ToJson(const VariantMetadata* metadata, std::string* 
out) const;
metadata is available as a member


http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.h@204
PS3, Line 204: is_valid
nit: IsValid, other common accessors could follow the naming pattern


http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.h@213
PS3, Line 213:     memcpy(&val, data_ + offset, sizeof(T));
Please add a DCHECK before the memcpy that validates that there's no OOB access


http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.cc
File be/src/runtime/variant-value.cc:

http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.cc@70
PS3, Line 70:   for (int i = 0; i < offset_size_; ++i) {
            :     val |= static_cast<uint32_t>(p[i]) << (8 * i);
            :   }
This could be unrolled to a switch statement for the 4 possbile cases


http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.cc@92
PS3, Line 92:  StringValue sv = GetFieldName(mid);
StringValue construction could be skipped by reinterpreting the string_data_ + 
start address to a const char*, also the offset calculation could be hoisted 
here to avoid calling GetFieldName at all
```  const uint8_t* p = offsets_ + mid * offset_size_;
      uint32_t start = ReadOffsetAt(p);
      uint32_t end = ReadOffsetAt(p + offset_size_);
      int sv_len = end - start;
      const char* sv_ptr = reinterpret_cast<const char*>(string_data_ + start);
```


http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/util/variant-util.cc
File be/src/util/variant-util.cc:

http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/util/variant-util.cc@30
PS3, Line 30: static void JsonEscapeString(const char* ptr, int len, string* 
out) {
Could we use rapidjson to serialize JSON strings?



--
To view, visit http://gerrit.cloudera.org:8080/24392
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I904618570e8c21d099c9a96b496d85e9246483de
Gerrit-Change-Number: 24392
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Arnab Karmakar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 11 Jun 2026 13:34:19 +0000
Gerrit-HasComments: Yes

Reply via email to