Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8523 )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
......................................................................


Patch Set 6:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/8523/6/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/8523/6/be/src/scheduling/scheduler.cc@515
PS6, Line 515: int* num_generated
Why can't we get that from the 'expanded_locations' vector?


http://gerrit.cloudera.org:8080/#/c/8523/6/be/src/scheduling/scheduler.cc@530
PS6, Line 530: THdfsCompression::type compression;
             :     switch (fb_desc->compression()) {
             :       case FbCompression_NONE:
             :         compression = THdfsCompression::NONE;
             :         break;
             :       case FbCompression_DEFAULT:
             :         compression =  THdfsCompression::DEFAULT;
             :         break;
             :       case FbCompression_GZIP:
             :         compression =  THdfsCompression::GZIP;
             :         break;
             :       case FbCompression_DEFLATE:
             :         compression =  THdfsCompression::DEFLATE;
             :         break;
             :       case FbCompression_BZIP2:
             :         compression =  THdfsCompression::BZIP2;
             :         break;
             :       case FbCompression_SNAPPY:
             :         compression =  THdfsCompression::SNAPPY;
             :         break;
             :       case FbCompression_SNAPPY_BLOCKED:
             :         compression =  THdfsCompression::SNAPPY_BLOCKED;
             :         break;
             :       case FbCompression_LZO:
             :         compression =  THdfsCompression::LZO;
             :         break;
             :       case FbCompression_LZ4:
             :         compression =  THdfsCompression::LZ4;
             :         break;
             :       case FbCompression_ZLIB:
             :         compression =  THdfsCompression::ZLIB;
             :         break;
             :       default:
             :         return Status(Substitute("Invalid file descriptor 
compression code: $0",
             :                                  fb_desc->compression()));
             :     }
Why don't you create a util function to convert Fb to Thrift compression?


http://gerrit.cloudera.org:8080/#/c/8523/6/be/src/scheduling/scheduler.cc@625
PS6, Line 625: split
We use both split and scan ranges interchangeably. Can we stick to one? 
preferably scan ranges.


http://gerrit.cloudera.org:8080/#/c/8523/6/be/src/scheduling/scheduler.cc@722
PS6, Line 722: // TODO: relax this when splits are generated for files with 
block locations.
Hm, not clear what this TODO is really about. My understanding is if we decide 
to push scan generation for hdfs files, no? I would remove this TODO or keep it 
and create a JIRA for the scan generation of hdfs files. I don't think we have 
to do it in a single patch.


http://gerrit.cloudera.org:8080/#/c/8523/6/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/8523/6/common/thrift/PlanNodes.thrift@205
PS6, Line 205: hdfs_file_split_spec
update name


http://gerrit.cloudera.org:8080/#/c/8523/6/common/thrift/PlanNodes.thrift@206
PS6, Line 206: .
"for filesystems that do not support blocks."


http://gerrit.cloudera.org:8080/#/c/8523/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8523/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@707
PS6, Line 707: hasBlocks
I'd prefer "supportBlocks" to avoid confusion between 'blocks not supported' 
and 'file doesn't have any blocks'.


http://gerrit.cloudera.org:8080/#/c/8523/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@742
PS6, Line 742: maxBlock size determine
couple of nits: 'maxBlock' and determines

Also, I think it would be nice to mention that in some cases maxBlockSize value 
can be overwritten in this function (e.g. for the case of non-splittable file 
formats).



--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 6
Gerrit-Owner: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Thu, 18 Jan 2018 22:40:24 +0000
Gerrit-HasComments: Yes

Reply via email to