Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8523 )
Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls ...................................................................... Patch Set 6: (8 comments) http://gerrit.cloudera.org:8080/#/c/8523/6/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/8523/6/be/src/scheduling/scheduler.cc@515 PS6, Line 515: int* num_generated Why can't we get that from the 'expanded_locations' vector? http://gerrit.cloudera.org:8080/#/c/8523/6/be/src/scheduling/scheduler.cc@530 PS6, Line 530: THdfsCompression::type compression; : switch (fb_desc->compression()) { : case FbCompression_NONE: : compression = THdfsCompression::NONE; : break; : case FbCompression_DEFAULT: : compression = THdfsCompression::DEFAULT; : break; : case FbCompression_GZIP: : compression = THdfsCompression::GZIP; : break; : case FbCompression_DEFLATE: : compression = THdfsCompression::DEFLATE; : break; : case FbCompression_BZIP2: : compression = THdfsCompression::BZIP2; : break; : case FbCompression_SNAPPY: : compression = THdfsCompression::SNAPPY; : break; : case FbCompression_SNAPPY_BLOCKED: : compression = THdfsCompression::SNAPPY_BLOCKED; : break; : case FbCompression_LZO: : compression = THdfsCompression::LZO; : break; : case FbCompression_LZ4: : compression = THdfsCompression::LZ4; : break; : case FbCompression_ZLIB: : compression = THdfsCompression::ZLIB; : break; : default: : return Status(Substitute("Invalid file descriptor compression code: $0", : fb_desc->compression())); : } Why don't you create a util function to convert Fb to Thrift compression? http://gerrit.cloudera.org:8080/#/c/8523/6/be/src/scheduling/scheduler.cc@625 PS6, Line 625: split We use both split and scan ranges interchangeably. Can we stick to one? preferably scan ranges. http://gerrit.cloudera.org:8080/#/c/8523/6/be/src/scheduling/scheduler.cc@722 PS6, Line 722: // TODO: relax this when splits are generated for files with block locations. Hm, not clear what this TODO is really about. My understanding is if we decide to push scan generation for hdfs files, no? I would remove this TODO or keep it and create a JIRA for the scan generation of hdfs files. I don't think we have to do it in a single patch. http://gerrit.cloudera.org:8080/#/c/8523/6/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/8523/6/common/thrift/PlanNodes.thrift@205 PS6, Line 205: hdfs_file_split_spec update name http://gerrit.cloudera.org:8080/#/c/8523/6/common/thrift/PlanNodes.thrift@206 PS6, Line 206: . "for filesystems that do not support blocks." http://gerrit.cloudera.org:8080/#/c/8523/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8523/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@707 PS6, Line 707: hasBlocks I'd prefer "supportBlocks" to avoid confusion between 'blocks not supported' and 'file doesn't have any blocks'. http://gerrit.cloudera.org:8080/#/c/8523/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@742 PS6, Line 742: maxBlock size determine couple of nits: 'maxBlock' and determines Also, I think it would be nice to mention that in some cases maxBlockSize value can be overwritten in this function (e.g. for the case of non-splittable file formats). -- To view, visit http://gerrit.cloudera.org:8080/8523 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5 Gerrit-Change-Number: 8523 Gerrit-PatchSet: 6 Gerrit-Owner: Vuk Ercegovac <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Comment-Date: Thu, 18 Jan 2018 22:40:24 +0000 Gerrit-HasComments: Yes
