Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8523 )
Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls ...................................................................... Patch Set 6: (9 comments) http://gerrit.cloudera.org:8080/#/c/8523/6/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/8523/6/be/src/scheduling/scheduler.cc@515 PS6, Line 515: int* num_generated > Why can't we get that from the 'expanded_locations' vector? expanded_locations need not be empty, so generated ranges from more than one generator spec may end up here. the num generated refers to the number of ranges generated for this invocation (one could use size difference, but I thought this was clearer). updated the comment to clarify. http://gerrit.cloudera.org:8080/#/c/8523/6/be/src/scheduling/scheduler.cc@530 PS6, Line 530: THdfsCompression::type compression; : switch (fb_desc->compression()) { : case FbCompression_NONE: : compression = THdfsCompression::NONE; : break; : case FbCompression_DEFAULT: : compression = THdfsCompression::DEFAULT; : break; : case FbCompression_GZIP: : compression = THdfsCompression::GZIP; : break; : case FbCompression_DEFLATE: : compression = THdfsCompression::DEFLATE; : break; : case FbCompression_BZIP2: : compression = THdfsCompression::BZIP2; : break; : case FbCompression_SNAPPY: : compression = THdfsCompression::SNAPPY; : break; : case FbCompression_SNAPPY_BLOCKED: : compression = THdfsCompression::SNAPPY_BLOCKED; : break; : case FbCompression_LZO: : compression = THdfsCompression::LZO; : break; : case FbCompression_LZ4: : compression = THdfsCompression::LZ4; : break; : case FbCompression_ZLIB: : compression = THdfsCompression::ZLIB; : break; : default: : return Status(Substitute("Invalid file descriptor compression code: $0", : fb_desc->compression())); : } > Why don't you create a util function to convert Fb to Thrift compression? done. moved it to util/compress.h. if there's a better place for this, let me know. otherwise, since we're just starting with these flatbuffer <-> thrift conversions, I am also ok with leaving this here for now and making a separate util file when we need more of these. http://gerrit.cloudera.org:8080/#/c/8523/6/be/src/scheduling/scheduler.cc@625 PS6, Line 625: split > We use both split and scan ranges interchangeably. Can we stick to one? pre Done http://gerrit.cloudera.org:8080/#/c/8523/6/be/src/scheduling/scheduler.cc@722 PS6, Line 722: // TODO: relax this when splits are generated for files with block locations. > Hm, not clear what this TODO is really about. My understanding is if we dec right, its for avoiding flat buffer deserialization/serialization between FE and coordinator. removed for now. http://gerrit.cloudera.org:8080/#/c/8523/6/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/8523/6/common/thrift/PlanNodes.thrift@205 PS6, Line 205: hdfs_file_split_spec > update name Done http://gerrit.cloudera.org:8080/#/c/8523/6/common/thrift/PlanNodes.thrift@206 PS6, Line 206: . > "for filesystems that do not support blocks." Done http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@712 PS4, Line 712: HdfsPartition.FileDescr > changed it to be more explicit (based on the fs). added a check to skip the file if it has zero fileblocks (and its fs supports blocks). http://gerrit.cloudera.org:8080/#/c/8523/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8523/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@707 PS6, Line 707: hasBlocks > I'd prefer "supportBlocks" to avoid confusion between 'blocks not supported Done http://gerrit.cloudera.org:8080/#/c/8523/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@742 PS6, Line 742: maxBlock size determine > couple of nits: 'maxBlock' and determines Done -- To view, visit http://gerrit.cloudera.org:8080/8523 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5 Gerrit-Change-Number: 8523 Gerrit-PatchSet: 6 Gerrit-Owner: Vuk Ercegovac <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Comment-Date: Mon, 22 Jan 2018 07:49:26 +0000 Gerrit-HasComments: Yes
