Peter Rozsa has posted comments on this change. ( http://gerrit.cloudera.org:8080/24392 )
Change subject: IMPALA-15067: Add VariantValue and basic decoding functions ...................................................................... Patch Set 3: (14 comments) Nice patch Zoltan! This is the first batch of my comments, I'm still working on the review. http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/exprs/variant-functions-ir.cc File be/src/exprs/variant-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/exprs/variant-functions-ir.cc@29 PS3, Line 29: std::string json; We could reserve value.len * [arbitrary number] for less reallocations. http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/exprs/variant-functions-ir.cc@34 PS3, Line 34: StringVal result(ctx, json.size()); StringVal::CopyFrom does the same http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.h File be/src/runtime/variant-value.h: http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.h@108 PS3, Line 108: VariantMetadata nit: it's more readable to store it in a separate .h and the implementations in a separate .cc, on the other hand, logically they belong to the same unit. http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.h@113 PS3, Line 113: int could be uint32_t http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.h@124 PS3, Line 124: int could be uint32_t http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.h@129 PS3, Line 129: is_valid nit: IsValid http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.h@132 PS3, Line 132: uint32_t ReadOffset(int index) const; : : const uint8_t* data_ = nullptr; : int len_ = 0; : int version_ = 0; : bool is_sorted_ = false; : int offset_size_ = 0; // 1, 2, 3, or 4 bytes per offset : int dict_size_ = 0; : const uint8_t* offsets_ = nullptr; // start of offset array : const uint8_t* string_data_ = nullptr; // start of string data These fields can be reordered and typed more strictly: const uint8_t* offsets_ = nullptr; const uint8_t* string_data_ = nullptr; uint32_t dict_size_ = 0; uint8_t version_ = 0; uint8_t offset_size_ = 0; bool is_sorted_ = false; len_ and data_ is not used after Init http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.h@158 PS3, Line 158: len nit: int32_t http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.h@202 PS3, Line 202: Status ToJson(const VariantMetadata* metadata, std::string* out) const; metadata is available as a member http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.h@204 PS3, Line 204: is_valid nit: IsValid, other common accessors could follow the naming pattern http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.h@213 PS3, Line 213: memcpy(&val, data_ + offset, sizeof(T)); Please add a DCHECK before the memcpy that validates that there's no OOB access http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.cc File be/src/runtime/variant-value.cc: http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.cc@70 PS3, Line 70: for (int i = 0; i < offset_size_; ++i) { : val |= static_cast<uint32_t>(p[i]) << (8 * i); : } This could be unrolled to a switch statement for the 4 possbile cases http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/runtime/variant-value.cc@92 PS3, Line 92: StringValue sv = GetFieldName(mid); StringValue construction could be skipped by reinterpreting the string_data_ + start address to a const char*, also the offset calculation could be hoisted here to avoid calling GetFieldName at all ``` const uint8_t* p = offsets_ + mid * offset_size_; uint32_t start = ReadOffsetAt(p); uint32_t end = ReadOffsetAt(p + offset_size_); int sv_len = end - start; const char* sv_ptr = reinterpret_cast<const char*>(string_data_ + start); ``` http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/util/variant-util.cc File be/src/util/variant-util.cc: http://gerrit.cloudera.org:8080/#/c/24392/3/be/src/util/variant-util.cc@30 PS3, Line 30: static void JsonEscapeString(const char* ptr, int len, string* out) { Could we use rapidjson to serialize JSON strings? -- To view, visit http://gerrit.cloudera.org:8080/24392 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I904618570e8c21d099c9a96b496d85e9246483de Gerrit-Change-Number: 24392 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Arnab Karmakar <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]> Gerrit-Reviewer: Peter Rozsa <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Thu, 11 Jun 2026 13:34:19 +0000 Gerrit-HasComments: Yes
