Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17771 )

Change subject: WiP: IMPALA-10798 : Prototype for JSON reader
......................................................................


Patch Set 9:

(13 comments)

The prototype is in good shape. Let's roll it out!

http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/CMakeLists.txt
File be/src/exec/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/CMakeLists.txt@71
PS9, Line 71:
nit: redundant whitespace


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.h
File be/src/exec/hdfs-json-scanner.h:

http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.h@38
PS9, Line 38: #include "arrow/buffer.h"
            : #include "arrow/io/type_fwd.h"
            : #include "arrow/result.h"
            : #include "arrow/type_fwd.h"
            : #include "arrow/util/macros.h"
            : #include "arrow/util/string_view.h"
            : #include "arrow/util/type_fwd.h"
nit: use <> for external includes


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.h@92
PS9, Line 92:   class ScanRangeInputStream : public arrow::io::InputStream {
I think we need the offset inside the file. When a large JSON file is split 
into several blocks, they will results in several scan ranges and thus several 
scanner instances. Each scanner only reads the portion corresponding to its 
scan range.


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.h@132
PS9, Line 132: row_read
nit: rows_read_


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.h@133
PS9, Line 133: num_rows
nit: num_rows_


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.h@136
PS9, Line 136:   const char* filename() const { return metadata_range_->file(); 
}
nit: could you move this above to be together with the methods?


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.cc
File be/src/exec/hdfs-json-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.cc@19
PS9, Line 19: #include <arrow/api.h>
            : #include <arrow/array.h>
            : #include <arrow/buffer.h>
            : #include <arrow/config.h>
            : #include <arrow/io/interfaces.h>
            : #include <arrow/json/api.h>
            : #include <arrow/json/chunked_builder.h>
            : #include <arrow/json/chunker.h>
            : #include <arrow/json/converter.h>
            : #include <arrow/json/options.h>
            : #include <arrow/json/parser.h>
            : #include <arrow/json/reader.h>
            : #include <arrow/memory_pool.h>
            : #include <arrow/table.h>
            : #include "arrow/buffer.h"
            : #include "arrow/io/interfaces.h"
nit: could you please remove headers that are included in hdfs-json-scanner.h?


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.cc@179
PS9, Line 179: std::shared_ptr<arrow::DataType> convert_type(const ColumnType 
ct) {
nit: I think we need to mark this as "static". BTW, could you rename it to sth 
like "ColumnType2ArrowType"?


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.cc@296
PS9, Line 296: for (auto column_ : columns_)
nit: for (const auto& column : table_->columns())

Our naming convention is using "_" at the end of class field names. Local 
variable names should not end with "_".


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.cc@305
PS9, Line 305:       auto ar = column_->chunk(0);
             :       auto ard = ar->data();
nit: could you rename these variables? I'm confused in what "ar" means.. BTW, 
if "ar" is only used once, we don't need a variable for it.


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.cc@331
PS9, Line 331:         memcpy(blob_, blob->data(), blob->size());
Is this copying all the binary values of a column?


http://gerrit.cloudera.org:8080/#/c/17771/9/be/src/exec/hdfs-json-scanner.cc@336
PS9, Line 336:         if (src_len == 0) {
             :           tuple->SetNull(slot_desc->null_indicator_offset());
nit: can we move this to line 326?


http://gerrit.cloudera.org:8080/#/c/17771/9/cmake_modules/FindArrow.cmake
File cmake_modules/FindArrow.cmake:

http://gerrit.cloudera.org:8080/#/c/17771/9/cmake_modules/FindArrow.cmake@18
PS9, Line 18: # - Find Orc (headers and liborc.a) with ORC_ROOT hinting a 
location
            : # This module defines
            : #  ORC_INCLUDE_DIR, directory containing headers
            : #  ORC_STATIC_LIB, path to liborc.a
            : #  ORC_FOUND
nit: please update these comments



--
To view, visit http://gerrit.cloudera.org:8080/17771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If79364a421d862d0d837f9be694911e388d4d629
Gerrit-Change-Number: 17771
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Sat, 27 Nov 2021 11:55:05 +0000
Gerrit-HasComments: Yes

Reply via email to