Alexander Behm created IMPALA-6625:
--------------------------------------

             Summary: Skip dictionary and collection conjunct assignment for non-Parquet scans.
                 Key: IMPALA-6625
                 URL: https://issues.apache.org/jira/browse/IMPALA-6625
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
    Affects Versions: Impala 2.11.0, Impala 2.10.0, Impala 2.9.0
            Reporter: Alexander Behm

In HdfsScanNode.init() we try to assign dictionary and collection conjuncts even for non-Parquet scans. Such predicates only make sense for Parquet scans, so there is no point in collecting them for other scans.

The current behavior is undesirable because:
* init() can be substantially slower, because assigning dictionary filters may involve evaluating exprs in the BE, which can be expensive
* the explain plan of non-Parquet scans may have a section "parquet dictionary predicates", which is confusing/misleading

Relevant code snippet from HdfsScanNode:
{code}
  @Override
  public void init(Analyzer analyzer) throws ImpalaException {
    conjuncts_ = orderConjunctsByCost(conjuncts_);
    checkForSupportedFileFormats();

    assignCollectionConjuncts(analyzer);
    computeDictionaryFilterConjuncts(analyzer);

    // compute scan range locations with optional sampling
    Set<HdfsFileFormat> fileFormats = computeScanRangeLocations(analyzer);
    ...
    if (fileFormats.contains(HdfsFileFormat.PARQUET)) {
      // <--- assignment should go in here
      computeMinMaxTupleAndConjuncts(analyzer);
    }
    ...
  }
{code}
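
The reordering suggested by the arrow in the snippet could look roughly like the standalone sketch below. This is a simplified model and not the actual Impala code. The class ScanInitSketch, the FileFormat enum, and the idea of recording step names in a list are all hypothetical stand-ins. The set of file formats is passed in directly, whereas in HdfsScanNode it is returned by computeScanRangeLocations(). The sketch also assumes that the two assignment calls can safely run after the scan range locations are computed, which would need to be checked against the real code.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the proposed ordering; not the actual Impala code.
class ScanInitSketch {
  // Stand-in for HdfsFileFormat; only a few values are needed for the illustration.
  enum FileFormat { PARQUET, TEXT, AVRO }

  // Returns the names of the steps that init() would run for the given file formats.
  static List<String> init(Set<FileFormat> fileFormats) {
    List<String> steps = new ArrayList<>();
    steps.add("orderConjunctsByCost");
    steps.add("checkForSupportedFileFormats");
    // Assumption: the file formats are known once scan range locations are computed,
    // so this step runs before any Parquet-specific work.
    steps.add("computeScanRangeLocations");
    if (fileFormats.contains(FileFormat.PARQUET)) {
      // Parquet-only work is grouped behind a single format check.
      steps.add("assignCollectionConjuncts");
      steps.add("computeDictionaryFilterConjuncts");
      steps.add("computeMinMaxTupleAndConjuncts");
    }
    return steps;
  }

  public static void main(String[] args) {
    // A text-only scan skips all Parquet-specific steps.
    System.out.println(init(EnumSet.of(FileFormat.TEXT)));
    // A scan that includes Parquet files runs them.
    System.out.println(init(EnumSet.of(FileFormat.TEXT, FileFormat.PARQUET)));
  }
}
```

With this ordering, a scan over only non-Parquet files never reaches the dictionary or collection conjunct assignment, which would address both the extra init() cost and the misleading explain plan section described above.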