Alexander Behm created IMPALA-6625:
--------------------------------------

             Summary: Skip dictionary and collection conjunct assignment for non-Parquet scans.
                 Key: IMPALA-6625
                 URL: https://issues.apache.org/jira/browse/IMPALA-6625
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
    Affects Versions: Impala 2.11.0, Impala 2.10.0, Impala 2.9.0
            Reporter: Alexander Behm


In HdfsScanNode.init() we try to assign dictionary and collection conjuncts 
even for non-Parquet scans. Such predicates only make sense for Parquet scans, 
so there is no point in collecting them for other scans.

The current behavior is undesirable because:
* init() can be substantially slower, because assigning dictionary filters may 
involve evaluating exprs in the backend (BE), which can be expensive
* the explain plan of a non-Parquet scan may contain a "parquet dictionary 
predicates" section, which is confusing and misleading

Relevant code snippet from HdfsScanNode:
{code}
  @Override
  public void init(Analyzer analyzer) throws ImpalaException {
    conjuncts_ = orderConjunctsByCost(conjuncts_);
    checkForSupportedFileFormats();

    assignCollectionConjuncts(analyzer);
    computeDictionaryFilterConjuncts(analyzer);

    // compute scan range locations with optional sampling
    Set<HdfsFileFormat> fileFormats = computeScanRangeLocations(analyzer);
    ...
    if (fileFormats.contains(HdfsFileFormat.PARQUET)) {
      // <--- assignment should go in here
      computeMinMaxTupleAndConjuncts(analyzer);
    }
    ...
  }
{code}
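
To make the proposed change concrete, here is a minimal, self-contained sketch of the gating idea. It is not Impala code: ScanInitSketch, its FileFormat enum, and the steps list are hypothetical stand-ins that only record which planning steps would run. It also assumes the two assignment calls can safely move after computeScanRangeLocations(), which would need to be checked against the real HdfsScanNode.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for HdfsScanNode. It only records which planning
// steps ran, so the effect of the Parquet guard is easy to see.
class ScanInitSketch {
  // Hypothetical subset of file formats, for illustration only.
  enum FileFormat { PARQUET, TEXT, AVRO }

  // Names of the steps executed by init(), in order.
  final List<String> steps = new ArrayList<>();
  private final Set<FileFormat> tableFormats;

  ScanInitSketch(Set<FileFormat> tableFormats) {
    this.tableFormats = tableFormats;
  }

  // Stand-in for computeScanRangeLocations(): returns the formats scanned.
  Set<FileFormat> computeScanRangeLocations() {
    steps.add("computeScanRangeLocations");
    return tableFormats;
  }

  void init() {
    steps.add("orderConjunctsByCost");
    Set<FileFormat> fileFormats = computeScanRangeLocations();
    // Assumption: the Parquet-only steps do not have to run before the
    // scan range computation, so they can all live behind one guard.
    if (fileFormats.contains(FileFormat.PARQUET)) {
      steps.add("assignCollectionConjuncts");
      steps.add("computeDictionaryFilterConjuncts");
      steps.add("computeMinMaxTupleAndConjuncts");
    }
  }

  public static void main(String[] args) {
    ScanInitSketch text = new ScanInitSketch(EnumSet.of(FileFormat.TEXT));
    text.init();
    System.out.println("TEXT scan steps:    " + text.steps);

    ScanInitSketch parquet = new ScanInitSketch(EnumSet.of(FileFormat.PARQUET));
    parquet.init();
    System.out.println("PARQUET scan steps: " + parquet.steps);
  }
}
```

With this structure a text-only scan never reaches the dictionary or collection conjunct steps, so it would neither pay for the BE expr evaluation nor show a Parquet-specific section in its explain plan. A scan over mixed formats that includes Parquet still runs all three steps.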




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
