Hello Dimitris Tsirogiannis, Alex Behm, Dan Hecht,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9697

to look at the new patch set (#8).

Change subject: IMPALA-6670: refresh lib-cache entries from plan (WIP)
......................................................................

IMPALA-6670: refresh lib-cache entries from plan (WIP)

When an impalad is in executor-only mode, it receives no
catalog updates. As a result, lib-cache entries are never
refreshed. Consequently, queries that use UDFs can return
incorrect results or fail due to symbol resolution errors.
Both cases are caused by the executor using a stale copy
of the lib file. Incorrect results occur when an old version
of a method is executed; resolution errors occur when a
query references a method that was added to the lib file
after the executor cached it.

The solution in this change is to capture the coordinator's
view of the lib file's last modified time when planning.
This last modified time is then shipped with the plan to
executors. Executors then use both the lib file path and
the last modified time as the key for the lib-cache.
If the coordinator's last modified time is more recent than
that of the executor's lib-cache entry, the entry is refreshed.
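The following is a minimal model of that lookup. It is written in Python for brevity rather than in the C++ of be/src/runtime/lib-cache.cc. `PlanMtimeLibCache`, `LibCacheEntry` and the `fetch` callback are hypothetical names. The sketch also leaves out everything else a real lib-cache must handle, such as entries still in use by running queries. What it does show is that the executor never stats the file itself. It compares the last modified time shipped with the plan against the cached one and refreshes only when the plan's time is newer.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class LibCacheEntry:
    local_path: str  # local copy of the lib file
    mtime: int       # last modified time of the cached copy
    loads: int = 1   # how many times this entry has been (re)loaded


class PlanMtimeLibCache:
    """Hypothetical executor-side lib-cache keyed by (path, mtime).

    The coordinator's view of the lib file's last modified time arrives
    with the plan, so no filesystem lookup is needed on a cache hit.
    """

    def __init__(self, fetch: Callable[[str], str]):
        # 'fetch' stands in for copying the lib file to local disk.
        self._fetch = fetch
        self._entries: Dict[str, LibCacheEntry] = {}

    def get(self, path: str, plan_mtime: int) -> LibCacheEntry:
        entry = self._entries.get(path)
        if entry is None:
            # First use on this executor: copy the file.
            entry = LibCacheEntry(self._fetch(path), plan_mtime)
            self._entries[path] = entry
        elif plan_mtime > entry.mtime:
            # The coordinator saw a newer version: refresh the stale entry.
            entry.local_path = self._fetch(path)
            entry.mtime = plan_mtime
            entry.loads += 1
        # An equal or older plan_mtime reuses the cached copy as-is.
        return entry


# Usage: count how often the (simulated) lib file is copied.
copies = []


def fetch(path: str) -> str:
    copies.append(path)
    return f"/tmp/lib-cache/{len(copies)}.jar"


cache = PlanMtimeLibCache(fetch)
cache.get("hdfs:///udfs/my-udf.jar", plan_mtime=100)      # miss: copy #1
cache.get("hdfs:///udfs/my-udf.jar", plan_mtime=100)      # hit: no copy
e = cache.get("hdfs:///udfs/my-udf.jar", plan_mtime=200)  # newer: copy #2
older = cache.get("hdfs:///udfs/my-udf.jar", plan_mtime=150)  # older: hit
```

In this sketch, a plan carrying an older time than the cached entry leaves the entry untouched, so cached versions only move forward. One example is a plan from a coordinator that has not yet applied the latest catalog update.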

Brief discussion of alternatives:

- lib-cache always checks the last modified time
  + easy, local change to lib-cache
  - adds a filesystem lookup on every lib-cache access; rejected
    for this reason

- keep the last modified time in the catalog
  - the bound on staleness is too loose. Consider the case where
    functions f1, f2, f3 are created with last modified times of
    t1, t2, t3, and a function's last modified time is treated as a
    low-watermark: if the cache entry has a more recent time, it is
    used. Such a scheme would allow the version at t2 to persist, so
    an old function can keep the caches from converging to the latest
    version. Different executors could then use different versions
    of the lib for a single query.

    In contrast, this patch relies on the statestore to push
    versions forward at all coordinators, so the versions in all
    executor caches are pushed forward as well.
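The staleness argument against the catalog-based alternative can be checked with a small simulation. This is a hypothetical model with three assumptions. Mtimes are plain integers. The coordinator's planning-time mtime equals the lib file's current mtime. `resolve_watermark` and `resolve_plan_mtime` are illustrative names, not functions from the patch. In the scenario, executor A cached the lib while f2 was current (t2) and executor B cached it while running f3 (t3). A query over f1 then runs on both executors.

```python
from typing import Optional

T1, T2, T3 = 1, 2, 3


def resolve_watermark(cached: Optional[int], fn_mtime: int, current: int) -> int:
    """Rejected scheme: the fn's catalog mtime is only a low-watermark."""
    if cached is not None and cached >= fn_mtime:
        return cached  # cached copy counts as "new enough"; keep it
    return current     # otherwise reload the current file


def resolve_plan_mtime(cached: Optional[int], plan_mtime: int) -> int:
    """This change: refresh whenever the coordinator saw a newer version."""
    if cached is not None and cached >= plan_mtime:
        return cached
    return plan_mtime


current_mtime = T3              # the lib file now holds the t3 version
cached = {"A": T2, "B": T3}     # per-executor lib-cache state
f1_mtime = T1                   # catalog mtime recorded when f1 was created

watermark = {ex: resolve_watermark(m, f1_mtime, current_mtime)
             for ex, m in cached.items()}
plan = {ex: resolve_plan_mtime(m, current_mtime)
        for ex, m in cached.items()}

print(watermark)  # A keeps t2 while B uses t3: mixed versions in one query
print(plan)       # both executors converge on t3
```

Under these assumptions, the watermark scheme never forces executor A past t2 as long as only f1 or f2 run there. The plan-supplied time moves every executor to the version the coordinator saw.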

Testing:
- added an e2e custom cluster test

Change-Id: Icf740ea8c6a47e671427d30b4d139cb8507b7ff6
---
M be/src/codegen/codegen-callgraph.cc
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/external-data-source-executor.cc
M be/src/exprs/agg-fn.cc
M be/src/exprs/hive-udf-call.cc
M be/src/exprs/scalar-fn-call.cc
M be/src/runtime/lib-cache.cc
M be/src/runtime/lib-cache.h
M be/src/service/fe-support.cc
M common/thrift/Frontend.thrift
M common/thrift/Types.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java
M fe/src/main/java/org/apache/impala/catalog/Function.java
M fe/src/main/java/org/apache/impala/catalog/ScalarFunction.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M testdata/bin/copy-udfs-udas.sh
M tests/custom_cluster/test_coordinators.py
M tests/test-hive-udfs/src/main/java/org/apache/impala/TestUpdateUdf.java
23 files changed, 374 insertions(+), 81 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/9697/8
--
To view, visit http://gerrit.cloudera.org:8080/9697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icf740ea8c6a47e671427d30b4d139cb8507b7ff6
Gerrit-Change-Number: 9697
Gerrit-PatchSet: 8
Gerrit-Owner: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
