Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10165 )
Change subject: IMPALA-6941: load more text scanner compression plugins ...................................................................... Patch Set 6: (12 comments) http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.h File be/src/exec/hdfs-plugin-text-scanner.h: http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.h@63 PS6, Line 63: HdfsScanner* (*create_scanner_fn) : (HdfsScanNodeBase* scan_node, RuntimeState* state) > Maybe you could create a typedef for the function pointers, e.g.: Done http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.h@67 PS6, Line 67: Status (*issue_initial_ranges_fn)( : HdfsScanNodeBase* scan_node, const std::vector<HdfsFileDesc*>& files) > same as above Done http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.h@73 PS6, Line 73: SpinLock > Wouldn't be better to use some read/write lock like boost::shared_mutex? Yeah I think that makes sense. I was thinking that the critical section is probably short enough that it doesn't matter in practice but we don't have to reason about that at all with the shared_mutex. http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc File be/src/exec/hdfs-plugin-text-scanner.cc: http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc@42 PS6, Line 42: "(Advanced) whitelist of HDFS " : "text scanner plugins that Impala will try to dynamically load." > I think you should mention that it is comma-separated, or give an example. I briefly summarized what the plugin does (versus the builtin text parsing). I think that addresses the question but lmk if it doesn't. http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc@56 PS6, Line 56: HdfsScanner* (*create_scanner_fn) (HdfsScanNodeBase* scan_node, RuntimeState* state) > You could use the typedef here. Done http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc@71 PS6, Line 71: Status (*issue_initial_ranges_fn)( : HdfsScanNodeBase* scan_node, const std::vector<HdfsFileDesc*>& files) > and here Done http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc@123 PS6, Line 123: "GetImpalaBuildVersion", reinterpret_cast<void**>(&get_plugin_impala_build_version))); > nit: too long line Done http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc@126 PS6, Line 126: stringstream ss; : ss << "Scanner plugin " << plugin_name << " was built against Impala version " : << get_plugin_impala_build_version() << ", but the running Impala version is " : << GetDaemonBuildVersion(); > nit: I think it's a little inconsistent that we use stringstream here and S Done. Yeah I got lazy about converting this :) http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-scan-node-base.cc@643 PS6, Line 643: =_ > nit: missing space Done http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-text-scanner.cc@129 PS6, Line 129: =_ > nit: missing space Done http://gerrit.cloudera.org:8080/#/c/10165/6/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java File fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java: http://gerrit.cloudera.org:8080/#/c/10165/6/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java@41 PS6, Line 41: LZO, : LZO_INDEX, //Lzo index file. : LZ4, : ZSTD; > should we make these (and e.g. file extension) be runtime parameters as wel This and related enums (THdfsCompression, the flatbuffer version) are used in a lot of places, so it would probably require a lot of changes. I don't think it's inherently hard but it would take a while to verify that the fix in each place is correct. At least now we have all the built-in Hadoop decompression codecs: https://github.com/apache/hadoop/tree/a0a276162147e843a5a4e028abdca5b66f5118da/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress so we could just extend this list when a codec is added to Hadoop (which seems to happen at most once a year). http://gerrit.cloudera.org:8080/#/c/10165/6/tests/metadata/test_partition_metadata.py File tests/metadata/test_partition_metadata.py: http://gerrit.cloudera.org:8080/#/c/10165/6/tests/metadata/test_partition_metadata.py@168 PS6, Line 168: #unique_database = 'test_db' > nit: remove this line Done -- To view, visit http://gerrit.cloudera.org:8080/10165 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6 Gerrit-Change-Number: 10165 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Wed, 02 May 2018 22:44:26 +0000 Gerrit-HasComments: Yes
