Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8523 )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
......................................................................


Patch Set 6:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/8523/6/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/8523/6/be/src/scheduling/scheduler.cc@515
PS6, Line 515: int* num_generated
> Why can't we get that from the 'expanded_locations' vector?
expanded_locations need not be empty, so generated ranges from more than one 
generator spec may end up here. the num generated refers to the number of 
ranges generated for this invocation (one could use size difference, but I 
thought this was clearer). updated the comment to clarify.


http://gerrit.cloudera.org:8080/#/c/8523/6/be/src/scheduling/scheduler.cc@530
PS6, Line 530: THdfsCompression::type compression;
             :     switch (fb_desc->compression()) {
             :       case FbCompression_NONE:
             :         compression = THdfsCompression::NONE;
             :         break;
             :       case FbCompression_DEFAULT:
             :         compression =  THdfsCompression::DEFAULT;
             :         break;
             :       case FbCompression_GZIP:
             :         compression =  THdfsCompression::GZIP;
             :         break;
             :       case FbCompression_DEFLATE:
             :         compression =  THdfsCompression::DEFLATE;
             :         break;
             :       case FbCompression_BZIP2:
             :         compression =  THdfsCompression::BZIP2;
             :         break;
             :       case FbCompression_SNAPPY:
             :         compression =  THdfsCompression::SNAPPY;
             :         break;
             :       case FbCompression_SNAPPY_BLOCKED:
             :         compression =  THdfsCompression::SNAPPY_BLOCKED;
             :         break;
             :       case FbCompression_LZO:
             :         compression =  THdfsCompression::LZO;
             :         break;
             :       case FbCompression_LZ4:
             :         compression =  THdfsCompression::LZ4;
             :         break;
             :       case FbCompression_ZLIB:
             :         compression =  THdfsCompression::ZLIB;
             :         break;
             :       default:
             :         return Status(Substitute("Invalid file descriptor 
compression code: $0",
             :                                  fb_desc->compression()));
             :     }
> Why don't you create a util function to convert Fb to Thrift compression?
done. moved it to util/compress.h. if there's a better place for this, let me 
know. otherwise, since we're just starting with these flatbuffer <-> thrift 
conversions, I am also ok with leaving this here for now and making a separate 
util file when we need more of these.


http://gerrit.cloudera.org:8080/#/c/8523/6/be/src/scheduling/scheduler.cc@625
PS6, Line 625: split
> We use both split and scan ranges interchangeably. Can we stick to one? pre
Done


http://gerrit.cloudera.org:8080/#/c/8523/6/be/src/scheduling/scheduler.cc@722
PS6, Line 722: // TODO: relax this when splits are generated for files with 
block locations.
> Hm, not clear what this TODO is really about. My understanding is if we dec
right, its for avoiding flat buffer deserialization/serialization between FE 
and coordinator.
removed for now.


http://gerrit.cloudera.org:8080/#/c/8523/6/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/8523/6/common/thrift/PlanNodes.thrift@205
PS6, Line 205: hdfs_file_split_spec
> update name
Done


http://gerrit.cloudera.org:8080/#/c/8523/6/common/thrift/PlanNodes.thrift@206
PS6, Line 206: .
> "for filesystems that do not support blocks."
Done


http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@712
PS4, Line 712:     HdfsPartition.FileDescr
> changed it to be more explicit (based on the fs).
added a check to skip the file if it has zero fileblocks (and its fs supports 
blocks).


http://gerrit.cloudera.org:8080/#/c/8523/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8523/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@707
PS6, Line 707: hasBlocks
> I'd prefer "supportBlocks" to avoid confusion between 'blocks not supported
Done


http://gerrit.cloudera.org:8080/#/c/8523/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@742
PS6, Line 742: maxBlock size determine
> couple of nits: 'maxBlock' and determines
Done



--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 6
Gerrit-Owner: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Mon, 22 Jan 2018 07:49:26 +0000
Gerrit-HasComments: Yes

Reply via email to