[Impala-ASF-CR] IMPALA-6092: avoid drop/create function interactions in e2e tests

2017-11-22 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8593 )

Change subject: IMPALA-6092: avoid drop/create function interactions in e2e 
tests
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8593/2/tests/query_test/test_udfs.py
File tests/query_test/test_udfs.py:

http://gerrit.cloudera.org:8080/#/c/8593/2/tests/query_test/test_udfs.py@422
PS2, Line 422: check_call(["hadoop", "fs", "-put", "-f", src_udf_path, 
tgt_udf_path])
> it does get left around (as with the other tests). perhaps a better pattern
changed the paths so that these tmp jar get cleaned up. a new dir is created 
for each test's db. I put the jar in that db dir (as discussed) and verified 
that the dirs + files are removed (useful for dev env where we re-use the 
state).



--
To view, visit http://gerrit.cloudera.org:8080/8593
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ica3538788b1d2ab5e361261e2ade62780b838e65
Gerrit-Change-Number: 8593
Gerrit-PatchSet: 2
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 22 Nov 2017 23:18:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1422: support a constant on LHS of IN predicates.

2017-11-30 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8322 )

Change subject: IMPALA-1422: support a constant on LHS of IN predicates.
..


Patch Set 8:

(24 comments)

http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@326
PS8, Line 326:*condition using the LHS constant. The rewrite handles 
group by,
> Not clear what "handles" here means. It doesn't really do anything special
thanks for the suggestion.. re-worded.


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@327
PS8, Line 327:*analytic functions, limit, and uncorrelated subqueries. 
Correlated subqueries
> Move last sentence into a TODO like this:
Done


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@330
PS8, Line 330:*TODO: handle correlated NOT IN case
> remove (see above comment)
Done


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@340
PS8, Line 340:*So, if C is equal to any x_i, the expression is false. 
Similarly, if any
> Nice description. I'm wondering if we directly translate this insight into
yup, simplified.


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@355
PS8, Line 355: Preconditions.checkArgument(rhs.contains(Subquery.class));
> not needed because L356-357 performs the same check
Done


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@398
PS8, Line 398: newSubquery.analyze(rhsQuery.getAnalyzer());
> This seems wrong. It might happen to work but pretty sure this will eventua
Done


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@714
PS8, Line 714: if (isCorrelatedPredicate(subqueryStmt.getWhereClause(), 
subqueryTupleIds)) {
> containsCorrelatedPredicate()?
yes, good catch. further simplified.


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java:

http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@277
PS8, Line 277:   // NOT IN subquery with a correlated non-equi predicate is 
ok if the subquery only
> [NOT] IN subquery
Done


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@292
PS8, Line 292: AnalyzesOk(String.format("select 1 from 
functional.allcomplextypes t " +
> What about queries without complex types and non-constant LHS in the IN pre
there's several above, for example L220 has a non-constant LHS and the types 
are not complex. L92 loops over a bunch of scalar types (and non-constant LHS).


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@299
PS8, Line 299:   AnalysisError(String.format("select * from 
functional.alltypestiny t where id %s " +
> These tests are pretty rough to follow, perhaps we should spent some time b
makes sense. the cut in this patch is just to get the tests to exercise the 
various branches more consistently where applicable.


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@437
PS8, Line 437:   // Tests for constant on the left hand side of [NOT] IN.
> Since we are doing a double rewrite here, it's interesting to test nested s
separated these out. I looked at making constant LHS just another dimension of 
the existing tests but I think a more comprehensive clean up here is better as 
a separate change. for now, I've added several more tests.


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@476
PS8, Line 476:
> You can omit this part if you like. Error message matching is prefix based.
Done


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@482
PS8, Line 482:   "SELECT int_col FROM functional.alltypestiny b 
WHERE b.id = a.id ORDER BY int_col ASC LIMIT 5");
> long line, you can omit this line in the expected error message
Done


http://gerrit.cloudera.org:8080/#/c/8322/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@520
PS8, Line 520: + "(select * from functional.tinyinttable x where t.id = 
x.int_col)");
> please move "+" to previous line (easier to read and more consistent with e
Done



[Impala-ASF-CR] IMPALA-1422: support a constant on LHS of IN predicates.

2017-11-30 Thread Vuk Ercegovac (Code Review)
Hello Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8322

to look at the new patch set (#9).

Change subject: IMPALA-1422: support a constant on LHS of IN predicates.
..

IMPALA-1422: support a constant on LHS of IN predicates.

Currently, constant expressions for the LHS of the IN predicate
are not supported. This patch adds this support as a rewrite in
StmtRewriter (where subqueries are rewritten to joins). Since
there is a nested-loop variant of left semijoin, support for IN
is handled by not erring out. NOT IN is handled by a rewrite to
corresponding NOT EXISTS predicate. Support for NOT IN with a
correlated subquery is not included in this change.

Re-organized the frontend subquery analysis tests to expand coverage.

Testing:
- added frontend subquery analysis tests
- added e2e tests

Change-Id: I0d69889a3c72e90be9d4ccf47d2816819ae32acb
---
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
A 
testdata/workloads/functional-query/queries/QueryTest/subquery-in-constant-lhs.test
M tests/query_test/test_queries.py
7 files changed, 903 insertions(+), 203 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/8322/9
--
To view, visit http://gerrit.cloudera.org:8080/8322
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0d69889a3c72e90be9d4ccf47d2816819ae32acb
Gerrit-Change-Number: 8322
Gerrit-PatchSet: 9
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-1422: support a constant on LHS of IN predicates.

2017-11-30 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8322 )

Change subject: IMPALA-1422: support a constant on LHS of IN predicates.
..


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8322/9/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/8322/9/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@368
PS9, Line 368: String wrapperAlias = 
rhsQuery.getTableAliasGenerator().getNextAlias();
> Please address this alias generation issue as discussed.
done. I create generators externally now since I can't create the outer-block 
before I've created this view. Then I pass them to the outer-block after its 
created.


http://gerrit.cloudera.org:8080/#/c/8322/9/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@401
PS9, Line 401:
> remove, no point in having a blank line here
Done



--
To view, visit http://gerrit.cloudera.org:8080/8322
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d69889a3c72e90be9d4ccf47d2816819ae32acb
Gerrit-Change-Number: 8322
Gerrit-PatchSet: 9
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 01 Dec 2017 06:50:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1422: support a constant on LHS of IN predicates.

2017-11-30 Thread Vuk Ercegovac (Code Review)
Hello Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8322

to look at the new patch set (#10).

Change subject: IMPALA-1422: support a constant on LHS of IN predicates.
..

IMPALA-1422: support a constant on LHS of IN predicates.

Currently, constant expressions for the LHS of the IN predicate
are not supported. This patch adds this support as a rewrite in
StmtRewriter (where subqueries are rewritten to joins). Since
there is a nested-loop variant of left semijoin, support for IN
is handled by not erring out. NOT IN is handled by a rewrite to
corresponding NOT EXISTS predicate. Support for NOT IN with a
correlated subquery is not included in this change.

Re-organized the frontend subquery analysis tests to expand coverage.

Testing:
- added frontend subquery analysis tests
- added e2e tests

Change-Id: I0d69889a3c72e90be9d4ccf47d2816819ae32acb
---
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
A 
testdata/workloads/functional-query/queries/QueryTest/subquery-in-constant-lhs.test
M tests/query_test/test_queries.py
8 files changed, 921 insertions(+), 203 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/8322/10
--
To view, visit http://gerrit.cloudera.org:8080/8322
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0d69889a3c72e90be9d4ccf47d2816819ae32acb
Gerrit-Change-Number: 8322
Gerrit-PatchSet: 10
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-4168: Adopt Oracle-style hint placement for INSERT/UPSERT

2017-12-12 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8676 )

Change subject: IMPALA-4168: Adopt Oracle-style hint placement for INSERT/UPSERT
..


Patch Set 2:

(4 comments)

thanks for the updates! couple of comments to round out the testing and improve 
the test refactor.

http://gerrit.cloudera.org:8080/#/c/8676/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8676/2//COMMIT_MSG@7
PS2, Line 7: Adopt
The jira says "adopt", but we're really not making this style the default. This 
change adds an additional  hinting style to coexist with the current placement. 
I'd just say:
Adds Oracle-style hint placement for INSERT/UPSERT

I'd also add a simple example to illustrate the new placement and update the 
testing section as needed pending the other comments.


http://gerrit.cloudera.org:8080/#/c/8676/2/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

http://gerrit.cloudera.org:8080/#/c/8676/2/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@44
PS2, Line 44:   
These tests confirm that the oracle style hints are indeed added as expected 
for inserts/select and upsert.
Looking at https://gerrit.sjc.cloudera.com/#/c/4235/, which added additional 
hinting styles, we should also add the following tests:

- to-sql: test/.../analysis/ToSqlTest
   [this patch will print the hint always in the default location.
Look at L916 of main/.../analysis/InsertStmt. It should be an
isolated change to modify toSql to print in the user specified
location... will need a member to record the location style,
an additional constructor arg, and modified copy constructor)]
- end-to-end test: 
testdata/workloads/functional-planner/queries/PlannerTest/insert.test (look at 
L450)


http://gerrit.cloudera.org:8080/#/c/8676/2/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@410
PS2, Line 410: "shuffle"
This difference is preventing you from having one loop and factoring lines such 
as L381-383. After looking at this, I'd go with the other suggestion I had to 
adjust TestInsertHints to include the loop over hint placement.
You can adjust the signature of TestInsertHints to be:
TestInsertHints(String pattern, String hint, String... expectedHints)
The pattern input can just be the string you'd pass in to String.format (so do 
the String.format in TestInsertHint).
The hint can be what you currently have where the prefix/suffix are varied.


http://gerrit.cloudera.org:8080/#/c/8676/2/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@426
PS2, Line 426: ParsesOk
pls add a TestUpsertHints method that is similar to TestInsertHints so that it 
gets equivalent test coverage as TestInsertHints (note that ParsesOk does not 
verify that hints are actually attached to the statement).



--
To view, visit http://gerrit.cloudera.org:8080/8676
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied7629d70197a0270cdc0853e00cc021fdb4dc20
Gerrit-Change-Number: 8676
Gerrit-PatchSet: 2
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 12 Dec 2017 18:10:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls (wip)

2017-12-13 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/8523 )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls (wip)
..

IMPALA-5931: Generates scan ranges in planner for s3/adls (wip)

Currently, for filesystems that do not include physical
block information (e.g., block replica locations, caching),
synthetic blocks are generated and stored in the catalog
when metadata is loaded. Example file systems for which this is done
includes S3, ADLS, and local fs.

This change avoids generating these blocks when metadata is loaded.
Instead, scan ranges are directly generated from such files by the
backend coordinator. Previously, all scan ranges were produced by
the planner in HDFSScanNode in the frontend. Now, those files without
block information are sent to the coordinator represented by a split
specification that determines how the coordinator will create scan ranges
to send to executors.

This change reduces the space needed in the catalog and reduces the
scan range datastructures that are passed from the frontend to the
backend when planning and coordinating a query.
In addition a bug is avoided where non-splittable files were being
split anyways to support the query parameter that places a limit on
scan ranges.

The "wip" status is currently to add/run more tests.

Testing:
- local filesystem tests exercise this code path
- manually tried larger local filesystem tables (tpch) with multiple
  partitions and observed the same scan ranges.
  - TODO: s3 and adls testing

Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
---
M be/src/scheduling/scheduler.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
5 files changed, 282 insertions(+), 165 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8523/2
--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 2
Gerrit-Owner: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2017-12-13 Thread Vuk Ercegovac (Code Review)
Hello Lars Volker,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8775

to look at the new patch set (#6).

Change subject: IMPALA-4993: extend dictionary filtering to collections
..

IMPALA-4993: extend dictionary filtering to collections

Currently, top-level scalar columns in parquet files can
be used at runtime to prune row-groups by evaluating certain
conjuncts over the column's dictionary (if available).

This change extends such pruning to scalar values that are
stored in collection type columns. Currently, dictionary
pruning works by finding eligible conjuncts for top-level
slots. Since only top-level slots are supported, the slots
are implicitly part of the scan node's tuple descriptor.
With this change, we track eligible conjuncts by slot as well
as the tuple that contains the slot (either top-level or
nested collection). Since collection conjuncts are already
managed by a map that associates tuple descriptors to a list
of their conjuncts, this extension follows the existing
representation.

The frontend builds the mapping of  to
conjuncts that are dictionary filterable. The backend is
adjusted to use the same representation. In addition, collection
readers are decomposed into scalar filterable columns and
other, non-dictionary filterable readers. When filtering
a row group using a conjunct associated to a (possibly)
nested collection type, an additional tuple buffer is
allocated per tuple descriptor.

Testing:
- e2e test extended to illustrate row-groups that are pruned
  by nested collection dictionary filters.

Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/runtime/collection-value-builder.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
13 files changed, 652 insertions(+), 201 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/8775/6
--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 6
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2017-12-13 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8775 )

Change subject: IMPALA-4993: extend dictionary filtering to collections
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8775/5/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8775/5/be/src/exec/hdfs-parquet-scanner.cc@799
PS5, Line 799:   back_inserter(dict_filterable_readers_));
> nit: We usually omit the std:: in cc files. Also these (and other places be
done. found some other uses so made this file consistent.



--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 5
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 14 Dec 2017 01:06:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4168: Adds Oracle-style hint placement for INSERT/UPSERT

2017-12-13 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8676 )

Change subject: IMPALA-4168: Adds Oracle-style hint placement for INSERT/UPSERT
..


Patch Set 7: Code-Review+1

lgtm, thanks for the changes!

alex, please take a look.


--
To view, visit http://gerrit.cloudera.org:8080/8676
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied7629d70197a0270cdc0853e00cc021fdb4dc20
Gerrit-Change-Number: 8676
Gerrit-PatchSet: 7
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 14 Dec 2017 01:54:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4168: Adds Oracle-style hint placement for INSERT/UPSERT

2017-12-13 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8676 )

Change subject: IMPALA-4168: Adds Oracle-style hint placement for INSERT/UPSERT
..


Patch Set 6:

(3 comments)

several more follow-up comments and we should be good. thx!

http://gerrit.cloudera.org:8080/#/c/8676/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8676/5//COMMIT_MSG@12
PS5, Line 12: rebuild
omit "query rebuild"


http://gerrit.cloudera.org:8080/#/c/8676/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

http://gerrit.cloudera.org:8080/#/c/8676/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java@56
PS5, Line 56: AtStatement
argh... I was inconsistent with my suggestion and see it here now. I have 
Start/End vs. AtStatement/AtQuery. I have a slight preference for the latter 
since its more precise. I'll let you decide that one. But it should be 
consistent.


http://gerrit.cloudera.org:8080/#/c/8676/5/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

http://gerrit.cloudera.org:8080/#/c/8676/5/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@310
PS5, Line 310: actualPlanHints, String... expectedHints) {
 : List actualHints = Lists.newArrayList();
 : for (PlanHint hint: actualPlanHints) actualHints
there's some shadowing here (actualHints is a parameter and local variable)-- 
does it do what you intend? perhaps to make it clearer, call the local one 
"stringHints".



--
To view, visit http://gerrit.cloudera.org:8080/8676
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied7629d70197a0270cdc0853e00cc021fdb4dc20
Gerrit-Change-Number: 8676
Gerrit-PatchSet: 6
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 14 Dec 2017 01:33:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2017-12-13 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8775 )

Change subject: IMPALA-4993: extend dictionary filtering to collections
..


Patch Set 6: Code-Review+1

Carry +1 from Lars


--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 6
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 14 Dec 2017 01:08:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4168: Adds Oracle-style hint placement for INSERT/UPSERT

2017-12-13 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8676 )

Change subject: IMPALA-4168: Adds Oracle-style hint placement for INSERT/UPSERT
..


Patch Set 4:

(20 comments)

thanks for the changes! the refactor for the tests looks great. most follow-up 
comments are oriented around naming.

http://gerrit.cloudera.org:8080/#/c/8676/4/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/8676/4/fe/src/main/cup/sql-parser.cup@717
PS4, Line 717: Please beware of that the both of Oracle and default hint styles 
are supported when
 : // extending INSERT/UPSERT syntax.
Replace with: "Note: when extending INSERT/UPSERT syntax, hinting is supported 
at the beginning of the statement and before the query."


http://gerrit.cloudera.org:8080/#/c/8676/4/fe/src/main/cup/sql-parser.cup@807
PS4, Line 807: Please beware of that the both of Oracle and default hint styles 
are supported when
 : // extending INSERT/UPSERT syntax.
make this consistent with the proposal on L717.


http://gerrit.cloudera.org:8080/#/c/8676/4/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/8676/4/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@86
PS4, Line 86: InsertStmt.HintLocation.DefaultLoc
more intuitive to pass in null here (what does DefaultLoc for no hints mean?)


http://gerrit.cloudera.org:8080/#/c/8676/4/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

http://gerrit.cloudera.org:8080/#/c/8676/4/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java@56
PS4, Line 56:   // A flag to determine the location of a hint
This is a good place to describe the alternatives, including examples. Also, 
I'd use more descriptive names ("oracle" and "default" don't convey much). How 
about the following:

// Determines the location of optional hints. The AtStatment option
// is motivated by Oracle's hint placement at the start of the
// statement and the AtQuery option places the hint right before
// the query (if specified).
//
// Examples:
//   Start: INSERT /* my hint */  ... SELECT ...
//   End:   INSERT  /* my hint */ SELECT ...
public enum HintLocation { Start, End };


http://gerrit.cloudera.org:8080/#/c/8676/4/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java@210
PS4, Line 210: hintLoc_ = hintLoc;
can a caller pass in null? how about we pick the default (current placement) if 
null?


http://gerrit.cloudera.org:8080/#/c/8676/4/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java@908
PS4, Line 908: hintLoc_ == HintLocation.OracleLoc && !planHints_.isEmpty()
I would flip these tests around (test for whether we have hints, then test the 
hint location).


http://gerrit.cloudera.org:8080/#/c/8676/4/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

http://gerrit.cloudera.org:8080/#/c/8676/4/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@303
PS4, Line 303: BuildQueryStmt
more specific: rename to InjectInsertHint


http://gerrit.cloudera.org:8080/#/c/8676/4/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@310
PS4, Line 310: hints
rename to actualHints


http://gerrit.cloudera.org:8080/#/c/8676/4/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@310
PS4, Line 310: TestHints
rename to VerifyHints


http://gerrit.cloudera.org:8080/#/c/8676/4/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@311
PS4, Line 311: actualHints
rename to actualStringHints


http://gerrit.cloudera.org:8080/#/c/8676/4/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@318
PS4, Line 318: Parses stmt and checks that the insert hints stmt are the 
expected hints.
How about:
Injects hints into pattern and checks that the injected hints match the 
expected hints.


http://gerrit.cloudera.org:8080/#/c/8676/4/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@320
PS4, Line 320: TestInsertHints
rename this to: TestInsertStmtHints... our InsertStmt covers both inserts and 
upserts. Update the comment to state that its used for both inserts and updates.


http://gerrit.cloudera.org:8080/#/c/8676/4/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@327
PS4, Line 327: /**
 :* Parses stmt and checks that the upsert hints stmt are the 
expected hints.
 :*/
 :   private void TestUpsertHints(String pattern, String hint, 
String... expectedHints) {
 : TestInsertHints(pattern, hint, expectedHints);
 :   }
remove-- not needed.


http://gerrit.cloudera.org:8080/#/c/8676/4/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@396
PS4, Line 396: 

[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2017-12-12 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8775 )

Change subject: IMPALA-4993: extend dictionary filtering to collections
..


Patch Set 3:

updating two more tests that core-tests found where additional dictionary 
predicates are used in plans.


--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 3
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 13 Dec 2017 00:19:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] Revert "IMPALA-3887: Wait for HDFS replication in data loading"

2017-12-19 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has abandoned this change. ( http://gerrit.cloudera.org:8080/8879 
)

Change subject: Revert "IMPALA-3887: Wait for HDFS replication in data loading"
..


Abandoned

going w/ dave's patch
--
To view, visit http://gerrit.cloudera.org:8080/8879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I4318b4fcb94aa67ac55692ae5b961915e038
Gerrit-Change-Number: 8879
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac 


[Impala-ASF-CR] Revert "IMPALA-3887: Wait for HDFS replication in data loading"

2017-12-19 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8879


Change subject: Revert "IMPALA-3887: Wait for HDFS replication in data loading"
..

Revert "IMPALA-3887: Wait for HDFS replication in data loading"

Using fsck on non-hdfs file systems causes errors.

This reverts commit 5a7c10ec3d500c1bb582e985cb299e7dbcfe5520.

Change-Id: I4318b4fcb94aa67ac55692ae5b961915e038
---
M testdata/bin/create-load-data.sh
1 file changed, 0 insertions(+), 19 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/8879/1
--
To view, visit http://gerrit.cloudera.org:8080/8879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4318b4fcb94aa67ac55692ae5b961915e038
Gerrit-Change-Number: 8879
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2017-12-13 Thread Vuk Ercegovac (Code Review)
Hello Lars Volker,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8775

to look at the new patch set (#5).

Change subject: IMPALA-4993: extend dictionary filtering to collections
..

IMPALA-4993: extend dictionary filtering to collections

Currently, top-level scalar columns in parquet files can
be used at runtime to prune row-groups by evaluating certain
conjuncts over the column's dictionary (if available).

This change extends such pruning to scalar values that are
stored in collection type columns. Currently, dictionary
pruning works by finding eligible conjuncts for top-level
slots. Since only top-level slots are supported, the slots
are implicitly part of the scan node's tuple descriptor.
With this change, we track eligible conjuncts by slot as well
as the tuple that contains the slot (either top-level or
nested collection). Since collection conjuncts are already
managed by a map that associates tuple descriptors to a list
of their conjuncts, this extension follows the existing
representation.

The frontend builds the mapping of  to
conjuncts that are dictionary filterable. The backend is
adjusted to use the same representation. In addition, collection
readers are decomposed into scalar filterable columns and
other, non-dictionary filterable readers. When filtering
a row group using a conjunct associated to a (possibly)
nested collection type, an additional tuple buffer is
allocated per tuple descriptor.

Testing:
- e2e test extended to illustrate row-groups that are pruned
  by nested collection dictionary filters.

Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/runtime/collection-value-builder.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
13 files changed, 650 insertions(+), 198 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/8775/5
--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 5
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2017-12-13 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8775 )

Change subject: IMPALA-4993: extend dictionary filtering to collections
..


Patch Set 4:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/8775/4/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/8775/4/be/src/exec/hdfs-scan-node-base.h@90
PS4, Line 90:   /// sequence-based file
> This looks like a rebase artifact. I often look at diffs between patchsets
ok, I think I ran into a conflict so wanted to merge it.


http://gerrit.cloudera.org:8080/#/c/8775/4/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8775/4/be/src/exec/hdfs-scanner.cc@90
PS4, Line 90: std::
> We usually drop the std:: prefix in cc files.
Done


http://gerrit.cloudera.org:8080/#/c/8775/4/be/src/exec/hdfs-scanner.cc@96
PS4, Line 96: std::
> same here.
Done


http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@656
PS1, Line 656: // Check to see if the conjunct evaluates to true when the 
slot is NULL
> nit: conjuncts still lacks the _ :)
whoops, done.


http://gerrit.cloudera.org:8080/#/c/8775/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8775/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@111
PS4, Line 111: the
> Duplicate word
Done


http://gerrit.cloudera.org:8080/#/c/8775/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1099
PS4, Line 1099: output.append(detailPrefix + "parquet statistics 
predicates: " +
> This one compiles the string differently than the others, probably I did th
sure, and found one more place.


http://gerrit.cloudera.org:8080/#/c/8775/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1116
PS4, Line 1116: List totalIdxList = Lists.newArrayList();
> This is more of a guess, but can't you pass entry.getValue() to newArrayLis
indeed. done.


http://gerrit.cloudera.org:8080/#/c/8775/1/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
File 
testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test:

http://gerrit.cloudera.org:8080/#/c/8775/1/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test@275
PS1, Line 275: 1
> That makes sense. Should we add a quick comment for the next person to come
yes, good point. done.



--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 4
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 13 Dec 2017 21:28:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6092: avoid drop/create function interactions in e2e tests

2017-11-20 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8593 )

Change subject: IMPALA-6092: avoid drop/create function interactions in e2e 
tests
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8593/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8593/1//COMMIT_MSG@13
PS1, Line 13: test_udf_invalid_symbol drops a function from that jar, which 
causes the use
> nit: line overflow in multiple places.
Done


http://gerrit.cloudera.org:8080/#/c/8593/1/tests/query_test/test_udfs.py
File tests/query_test/test_udfs.py:

http://gerrit.cloudera.org:8080/#/c/8593/1/tests/query_test/test_udfs.py@422
PS1, Line 422: check_call(["hadoop", "fs", "-put", "-f", src_udf_path, 
tgt_udf_path])
> Just to be sure, this works ok on local fs right? Basically we "put" from l
Couple of points. This always copies from a local source and it renames to a 
unique tgt name in local fs. Its also the way that the create/drop tests are 
also setup (they're careful not to use the same shared jar across tests).

I assume the other tests work, but just in case-- how do I run just the local 
fs version of the test to verify this?



--
To view, visit http://gerrit.cloudera.org:8080/8593
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ica3538788b1d2ab5e361261e2ade62780b838e65
Gerrit-Change-Number: 8593
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Mon, 20 Nov 2017 20:22:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4985: use parquet stats of nested types for dynamic pruning

2017-11-20 Thread Vuk Ercegovac (Code Review)
Hello Lars Volker, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8480

to look at the new patch set (#5).

Change subject: IMPALA-4985: use parquet stats of nested types for dynamic 
pruning
..

IMPALA-4985: use parquet stats of nested types for dynamic pruning

Currently, parquet row-groups can be pruned at run-time using
min/max stats when predicates (in, binary) are specified for
column scalar types. This patch extends pruning to nested types
for the same class of predicates. A nested value is an instance
of a nested type (struct, array, map). A nested value consists of
other nested and scalar values (as declared by its type).
Predicates that can be used for row-group pruning must be applied to
nested scalar values. In addition, the parent of the nested scalar
must also be required, that is, not empty. The latter requirement
is conservative: some filters that could be used for pruning are
not used for correctness reasons.

Testing:
- extended nested-types-parquet-stats e2e test cases.

Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe
---
M be/src/exec/hdfs-parquet-scanner.h
M fe/src/main/java/org/apache/impala/analysis/CollectionStructType.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test
M tests/query_test/test_nested_types.py
7 files changed, 508 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/8480/5
--
To view, visit http://gerrit.cloudera.org:8080/8480
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe
Gerrit-Change-Number: 8480
Gerrit-PatchSet: 5
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-6092: avoid drop/create function interactions in e2e tests

2017-11-18 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8593


Change subject: IMPALA-6092: avoid drop/create function interactions in e2e 
tests
..

IMPALA-6092: avoid drop/create function interactions in e2e tests

The e2e unit tests for udfs can interact via the backend lib_cache,
causing test flakes. IMPALA-6215 explains a race between the lib_cache
and UdfExecutor in the frontend which is the likely the root cause.
Two e2e tests use the same jar (test_java_udfs and test_udf_invalid_symbol),
test_udf_invalid_symbol drops a function from that jar, which causes the use
of that jar to fail in the test_java_udfs test. Since the state of lib_cache
is per process, its state causes these interactions across unit tests.
This change avoids the interactions by using separate jars for the separate 
tests.

Change-Id: Ica3538788b1d2ab5e361261e2ade62780b838e65
---
M be/src/runtime/lib-cache.h
M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java
M tests/query_test/test_udfs.py
3 files changed, 12 insertions(+), 6 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/8593/1
--
To view, visit http://gerrit.cloudera.org:8080/8593
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ica3538788b1d2ab5e361261e2ade62780b838e65
Gerrit-Change-Number: 8593
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-6092: avoid drop/create function interactions in e2e tests

2017-11-21 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8593 )

Change subject: IMPALA-6092: avoid drop/create function interactions in e2e 
tests
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8593/1/tests/query_test/test_udfs.py
File tests/query_test/test_udfs.py:

http://gerrit.cloudera.org:8080/#/c/8593/1/tests/query_test/test_udfs.py@422
PS1, Line 422: check_call(["hadoop", "fs", "-put", "-f", src_udf_path, 
tgt_udf_path])
> What I meant was that "hadoop fs -put/copyFromLocal" from what I understand
thanks for clarifying-- verified that the test_udfs.py tests pass when run 
against the local fs.



--
To view, visit http://gerrit.cloudera.org:8080/8593
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ica3538788b1d2ab5e361261e2ade62780b838e65
Gerrit-Change-Number: 8593
Gerrit-PatchSet: 2
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 21 Nov 2017 19:00:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6092: avoid drop/create function interactions in e2e tests

2017-11-21 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8593 )

Change subject: IMPALA-6092: avoid drop/create function interactions in e2e 
tests
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8593/2/tests/query_test/test_udfs.py
File tests/query_test/test_udfs.py:

http://gerrit.cloudera.org:8080/#/c/8593/2/tests/query_test/test_udfs.py@422
PS2, Line 422: check_call(["hadoop", "fs", "-put", "-f", src_udf_path, 
tgt_udf_path])
> if the test fails partway through, will this file get deleted somehow or wi
it does get left around (as with the other tests). perhaps a better pattern 
here is to put these in a dir and use the tester's teardown method to cleanup. 
not sure how many other places follow the current pattern.



--
To view, visit http://gerrit.cloudera.org:8080/8593
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ica3538788b1d2ab5e361261e2ade62780b838e65
Gerrit-Change-Number: 8593
Gerrit-PatchSet: 2
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 21 Nov 2017 19:15:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4985: use parquet stats of nested types for dynamic pruning

2017-11-21 Thread Vuk Ercegovac (Code Review)
Hello Lars Volker, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8480

to look at the new patch set (#7).

Change subject: IMPALA-4985: use parquet stats of nested types for dynamic 
pruning
..

IMPALA-4985: use parquet stats of nested types for dynamic pruning

Currently, parquet row-groups can be pruned at run-time using
min/max stats when predicates (in, binary) are specified for
column scalar types. This patch extends pruning to nested types
for the same class of predicates. A nested value is an instance
of a nested type (struct, array, map). A nested value consists of
other nested and scalar values (as declared by its type).
Predicates that can be used for row-group pruning must be applied to
nested scalar values. In addition, the parent of the nested scalar
must also be required, that is, not empty. The latter requirement
is conservative: some filters that could be used for pruning are
not used for correctness reasons.

Testing:
- extended nested-types-parquet-stats e2e test cases.

Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe
---
M be/src/exec/hdfs-parquet-scanner.h
M fe/src/main/java/org/apache/impala/analysis/CollectionStructType.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test
M tests/query_test/test_nested_types.py
8 files changed, 644 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/8480/7
--
To view, visit http://gerrit.cloudera.org:8080/8480
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe
Gerrit-Change-Number: 8480
Gerrit-PatchSet: 7
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-4985: use parquet stats of nested types for dynamic pruning

2017-11-21 Thread Vuk Ercegovac (Code Review)
Hello Lars Volker, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8480

to look at the new patch set (#6).

Change subject: IMPALA-4985: use parquet stats of nested types for dynamic 
pruning
..

IMPALA-4985: use parquet stats of nested types for dynamic pruning

Currently, parquet row-groups can be pruned at run-time using
min/max stats when predicates (in, binary) are specified for
column scalar types. This patch extends pruning to nested types
for the same class of predicates. A nested value is an instance
of a nested type (struct, array, map). A nested value consists of
other nested and scalar values (as declared by its type).
Predicates that can be used for row-group pruning must be applied to
nested scalar values. In addition, the parent of the nested scalar
must also be required, that is, not empty. The latter requirement
is conservative: some filters that could be used for pruning are
not used for correctness reasons.

Testing:
- extended nested-types-parquet-stats e2e test cases.

Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe
---
M be/src/exec/hdfs-parquet-scanner.h
M fe/src/main/java/org/apache/impala/analysis/CollectionStructType.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-stats.test
M tests/query_test/test_nested_types.py
8 files changed, 644 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/8480/6
--
To view, visit http://gerrit.cloudera.org:8080/8480
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe
Gerrit-Change-Number: 8480
Gerrit-PatchSet: 6
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-4985: use parquet stats of nested types for dynamic pruning

2017-11-21 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8480 )

Change subject: IMPALA-4985: use parquet stats of nested types for dynamic 
pruning
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8480/6/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
File fe/src/main/java/org/apache/impala/analysis/SlotRef.java:

http://gerrit.cloudera.org:8080/#/c/8480/6/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@154
PS6, Line 154:* Checks if slotRef refers to an array "pos" pseudo-column.
> Checks if this SlotRef
Done


http://gerrit.cloudera.org:8080/#/c/8480/6/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@160
PS6, Line 160:   public boolean isArrayPosReference() {
> isArrayPosRef()
Done


http://gerrit.cloudera.org:8080/#/c/8480/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8480/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@541
PS6, Line 541:   // It is assumed that analysis adds these guards such that 
they are correct, but
> guards -> filters
done. checked that this not used in the rest of this file.


http://gerrit.cloudera.org:8080/#/c/8480/5/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test:

http://gerrit.cloudera.org:8080/#/c/8480/5/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test@96
PS5, Line 96:  PLAN
> Got it. Might want to change that, but definitely not in this patch.
right, a naive change could cause too many changes/noise.

for the point about conflicts, perhaps a plan visitor to extract and test would 
be useful for checking for simple, local properties like these. that would move 
all of these tests from this text file to a plain java test.



--
To view, visit http://gerrit.cloudera.org:8080/8480
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0c99e20cb080b504442cd5376ea3e046016158fe
Gerrit-Change-Number: 8480
Gerrit-PatchSet: 6
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 22 Nov 2017 00:05:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6198: marks a test as debug-only

2017-11-16 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8573


Change subject: IMPALA-6198: marks a test as debug-only
..

IMPALA-6198: marks a test as debug-only

The test_catalog_wait test uses flags that are only compiled
for debug binaries. This change marks the test as debug-only
so that it does not break release tests.

Change-Id: I92640b8192545cccea0411c04cc5fcf59fbefed0
---
M tests/custom_cluster/test_catalog_wait.py
1 file changed, 2 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/8573/1
--
To view, visit http://gerrit.cloudera.org:8080/8573
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I92640b8192545cccea0411c04cc5fcf59fbefed0
Gerrit-Change-Number: 8573
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-6273: fixes subquery tests for functional hbase

2017-12-05 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8765 )

Change subject: IMPALA-6273: fixes subquery tests for functional_hbase
..


Patch Set 3:

reduced the number of tests that explicitly reference functional so that
test coverage is reduced less. there is one case that remains for which I've
filed a separate jira that describes the issue I ran into with nulls + hbase.


--
To view, visit http://gerrit.cloudera.org:8080/8765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibaa3a3df7362ac6d3ed07aff133dc4b3520bb4e0
Gerrit-Change-Number: 8765
Gerrit-PatchSet: 3
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 05 Dec 2017 19:48:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6273: fixes subquery tests for functional hbase

2017-12-05 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8765 )

Change subject: IMPALA-6273: fixes subquery tests for functional_hbase
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8765/1/testdata/workloads/functional-query/queries/QueryTest/subquery-in-constant-lhs.test
File 
testdata/workloads/functional-query/queries/QueryTest/subquery-in-constant-lhs.test:

http://gerrit.cloudera.org:8080/#/c/8765/1/testdata/workloads/functional-query/queries/QueryTest/subquery-in-constant-lhs.test@116
PS1, Line 116: WHERE NULL IN (SELECT d FROM functional.nulltable);
> Consider leaving a comment here about why we're specifying the db explicitl
Done



--
To view, visit http://gerrit.cloudera.org:8080/8765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibaa3a3df7362ac6d3ed07aff133dc4b3520bb4e0
Gerrit-Change-Number: 8765
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 05 Dec 2017 18:41:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6273: fixes subquery tests for functional hbase

2017-12-05 Thread Vuk Ercegovac (Code Review)
Hello Philip Zeyliger,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8765

to look at the new patch set (#2).

Change subject: IMPALA-6273: fixes subquery tests for functional_hbase
..

IMPALA-6273: fixes subquery tests for functional_hbase

IMPALA-1422 introduced tests that do not work with
the testing setup for hbase. Namely, tinyinttable is
not defined in the functional_hbase database, but
is defined in the functional database. Exhaustive
tests uncovered the issue.

This change makes two changes so that tests work with
functional_hbase:
1) use a table that is present in both functional and
   functional_hbase. the tests needed a subquery result
   with a single int column. tinyinttable is replaced
   with an inline view that provides this single int
   column in a portable manner.
2) nulls are handled differently with hbase (see IMPALA-728)
   so the nulltable used in the tests is set to
   functional.nulltable to avoid inconsistent results across
   input formats.

Testing:
- ran e2e tests with exhaustive exploration strategy for the
  broken test.

Change-Id: Ibaa3a3df7362ac6d3ed07aff133dc4b3520bb4e0
---
M 
testdata/workloads/functional-query/queries/QueryTest/subquery-in-constant-lhs.test
1 file changed, 17 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/8765/2
--
To view, visit http://gerrit.cloudera.org:8080/8765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibaa3a3df7362ac6d3ed07aff133dc4b3520bb4e0
Gerrit-Change-Number: 8765
Gerrit-PatchSet: 2
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-6286: Remove invalid runtime filter targets.

2017-12-07 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8783 )

Change subject: IMPALA-6286: Remove invalid runtime filter targets.
..


Patch Set 6:

(4 comments)

I have some questions on the different handling of that exception and a 
suggestion to make the callsites more explicit.

http://gerrit.cloudera.org:8080/#/c/8783/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8783/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1533
PS6, Line 1533: isTrueWithNullSlots
prior, this would have tripped an assertion on backend error. why is it ok to 
ignore now?


http://gerrit.cloudera.org:8080/#/c/8783/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1909
PS6, Line 1909: boolean
might be clearer to have an enum here: true, false, unknown.
instead, you are eating the error-- which presumably is rare and we may want to 
know about it-- and folding the unknown case into every call-site's catch.


http://gerrit.cloudera.org:8080/#/c/8783/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1923
PS6, Line 1923: return FeSupport.EvalPredicate(nullTuplePred, 
getQueryCtx());
just noting that prior, we explicitly asserted the exception case should not 
happen whereas now that case is silently eaten (L1535). is this what was 
tripped by the test's too low limit? if so, shouldn't the tests be modified? 
that test includes an is null predicate for the nested collection-- unclear if 
that's needed for the test.


http://gerrit.cloudera.org:8080/#/c/8783/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8783/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@640
PS6, Line 640: continue
would it be useful to know that pruning could not be done due to a backend 
error?



--
To view, visit http://gerrit.cloudera.org:8080/8783
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88153eea9f4b5117df60366fad2bd91776b95298
Gerrit-Change-Number: 8783
Gerrit-PatchSet: 6
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 08 Dec 2017 07:27:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6286: Remove invalid runtime filter targets.

2017-12-08 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8783 )

Change subject: IMPALA-6286: Remove invalid runtime filter targets.
..


Patch Set 7: Code-Review+1

(3 comments)

thanks for the explanation... looks fine to me.

http://gerrit.cloudera.org:8080/#/c/8783/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8783/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1491
PS6, Line 1491: d
nit: predicates


http://gerrit.cloudera.org:8080/#/c/8783/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1533
PS6, Line 1533: isTrueWithNullSlots
> The predicates generated in this function are for optimization purposes and
ok, that's what I was looking for. for runtime filter and dictionary pruning, 
they're clearly for optimization purposes so skipping is easy to reason about. 
its unclear for this method (getBoundPredicates)-- that a caller can expect 
some predicates to not be bound due to env/runtime issues. In particular, the 
previous behavior would have aborted this query, right? In which case, I'd 
expect additional queries to run with this change.


http://gerrit.cloudera.org:8080/#/c/8783/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1923
PS6, Line 1923: }
> Btw, I plan to add new targeted tests for these expr evaluation failure cas
makes sense-- we should then see the BE failure for TupleIsNullPredicate 
instead of the assertion failure... didn't see the TupleIsNullPredicate 
example, since this change did not require changes at that callsite.
for, getBoundPredicate-- we'd expect to run some queries that previously would 
fail the assertion.



--
To view, visit http://gerrit.cloudera.org:8080/8783
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88153eea9f4b5117df60366fad2bd91776b95298
Gerrit-Change-Number: 8783
Gerrit-PatchSet: 7
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 08 Dec 2017 17:56:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2017-12-05 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8775


Change subject: IMPALA-4993: extend dictionary filtering to collections
..

IMPALA-4993: extend dictionary filtering to collections

Currently, top-level scalar columns in parquet files can
be used at runtime to prune row-groups by evaluating certain
conjuncts over the column's dictionary (if available).

This change extends such pruning to scalar values that are
stored in collection type columns. Currently, dictionary
pruning works by finding eligible conjuncts for top-level
slots. Since only top-level slots are supported, the slots
are implicitly part of the scan node's tuple descriptor.
With this change, we track eligible conjuncts by slot as well
as the tuple that contains the slot (either top-level or
nested collection). Since collection conjuncts are already
managed by a map that associates tuple descriptors to a list
of their conjuncts, this extension follows the existing
representation.

The frontend builds the mapping of  to
conjuncts that are dictionary filterable. The backend is
adjusted to use the same representation. In addition, collection
readers are decomposed into scalar filterable columns and
other, non-dictionary filterable readers. When filtering
a row group using a conjunct associated to a (possibly)
nested collection type, an additional tuple buffer is
allocated per tuple descriptor.

Testing:
- e2e test extended to illustrate row-groups that are pruned
  by nested collection dictionary filters.

Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/runtime/collection-value-builder.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
10 files changed, 518 insertions(+), 180 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/8775/1
--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2017-12-11 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8775 )

Change subject: IMPALA-4993: extend dictionary filtering to collections
..


Patch Set 1:

(19 comments)

thx for the reviews.

http://gerrit.cloudera.org:8080/#/c/8775/1/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8775/1/be/src/exec/hdfs-parquet-scanner.h@452
PS1, Line 452: Only scalar columns can be
 :   /// dictionary filtered
> Can you mention in the comment why?
Done. re-worded


http://gerrit.cloudera.org:8080/#/c/8775/1/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/8775/1/be/src/exec/hdfs-scan-node-base.h@167
PS1, Line 167: return thrift_dict_filter_conjuncts_;
> You could make the calling code cleaner by returning an empty vector here.
its used in one place so don't think its worth it.


http://gerrit.cloudera.org:8080/#/c/8775/1/be/src/exec/hdfs-scan-node-base.h@382
PS1, Line 382:   /// Dictionary filtering eligible conjuncts for each slot.
> Can you add to the comment that this can be nullptr?
Done


http://gerrit.cloudera.org:8080/#/c/8775/1/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8775/1/be/src/exec/hdfs-scanner.cc@90
PS1, Line 90: auto
> We usually only use auto for iterators and const& in for loops. I think thi
Done


http://gerrit.cloudera.org:8080/#/c/8775/1/be/src/exec/hdfs-scanner.cc@97
PS1, Line 97: entry.tuple_id
> you could just use 'tuple_id' here (from L86).
Done


http://gerrit.cloudera.org:8080/#/c/8775/1/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/8775/1/common/thrift/PlanNodes.thrift@202
PS1, Line 202: between a TupleId, SlotId pair to a
> nit: I think it should be "mapping between ... and ..." or "mapping from ..
Done


http://gerrit.cloudera.org:8080/#/c/8775/1/common/thrift/PlanNodes.thrift@209
PS1, Line 209: optional
> Shouldn't they all be required?
Done


http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@187
PS1, Line 187:   // Map from pairs of TupleDescriptor, SlotIds to indices in 
PlanNodes.conjuncts_ and
> I think this comment could benefit from a description of what makes a slot/
min-max pruning is discussed in the header, but dictionary pruning is not. I 
added a few sentences there so that its less surprising when seeing these 
supporting data-structures for the first time.


http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@188
PS1, Line 188: collectionConjuncts
> nit: missing _
Done


http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@188
PS1, Line 188:   // collectionConjuncts that are eligible for dictionary 
filtering. The TupleDescriptor
> nit: I'd phrase it as "slots in the TupleDescriptor of this scan node map t
Done


http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@624
PS1, Line 624: Preconditions.checkState(tupleIds.size() == 1);
> It doesn't seem obvious to me why this is true. Can you add a comment?
A bit unclear to me as well. However, if I understand this literally, we only 
support single slot conjuncts for dictionary pruning, so therefore should 
expect that that single slot only refers to a single tuple. I reduced the 
multiple checks against slot id and added a comment.


http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@656
PS1, Line 656:* Walks through conjuncts and collectionConjuncts and 
populates
> nit: trailing _s
Done


http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@668
PS1, Line 668: notEmptyCollections_.contains(entry.getKey())
> This could be factored into its own method together with the comment above.
Done


http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1010
PS1, Line 1010: for (Map.Entry, 
List> entry : dictionaryFilterConjuncts_.entrySet()) {
> long line
Done


http://gerrit.cloudera.org:8080/#/c/8775/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1075
PS1, Line 1075:   if (!dictionaryFilterConjuncts_.isEmpty()) {
> This check looks like it's not needed.
whoops... missed the clear simplification. Done.


http://gerrit.cloudera.org:8080/#/c/8775/1/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
File 

[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2017-12-11 Thread Vuk Ercegovac (Code Review)
Hello Lars Volker,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8775

to look at the new patch set (#2).

Change subject: IMPALA-4993: extend dictionary filtering to collections
..

IMPALA-4993: extend dictionary filtering to collections

Currently, top-level scalar columns in parquet files can
be used at runtime to prune row-groups by evaluating certain
conjuncts over the column's dictionary (if available).

This change extends such pruning to scalar values that are
stored in collection type columns. Currently, dictionary
pruning works by finding eligible conjuncts for top-level
slots. Since only top-level slots are supported, the slots
are implicitly part of the scan node's tuple descriptor.
With this change, we track eligible conjuncts by slot as well
as the tuple that contains the slot (either top-level or
nested collection). Since collection conjuncts are already
managed by a map that associates tuple descriptors to a list
of their conjuncts, this extension follows the existing
representation.

The frontend builds the mapping of  to
conjuncts that are dictionary filterable. The backend is
adjusted to use the same representation. In addition, collection
readers are decomposed into scalar filterable columns and
other, non-dictionary filterable readers. When filtering
a row group using a conjunct associated to a (possibly)
nested collection type, an additional tuple buffer is
allocated per tuple descriptor.

Testing:
- e2e test extended to illustrate row-groups that are pruned
  by nested collection dictionary filters.

Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/runtime/collection-value-builder.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
10 files changed, 561 insertions(+), 194 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/8775/2
--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 2
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-6286: Remove invalid runtime filter targets.

2017-12-06 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8783 )

Change subject: IMPALA-6286: Remove invalid runtime filter targets.
..


Patch Set 1:

(6 comments)

still getting the hang of target/source, outer, nullable in the context of 
runtime filters, but here are some comments.

http://gerrit.cloudera.org:8080/#/c/8783/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/8783/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@278
PS1, Line 278: if (LOG.isTraceEnabled()) {
 : LOG.trace("Generating runtime filter from predicate " + 
joinPredicate);
 :   }
move this to L297 when you know that the filter will be generated.


http://gerrit.cloudera.org:8080/#/c/8783/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@284
PS1, Line 284: incorrectly alter the query results
produce incorrect results. [more direct]


http://gerrit.cloudera.org:8080/#/c/8783/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@285
PS1, Line 285: the expr
the target expression


http://gerrit.cloudera.org:8080/#/c/8783/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@285
PS1, Line 285: should
nit: must


http://gerrit.cloudera.org:8080/#/c/8783/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@287
PS1, Line 287: Literal
not following what is the Literal part of this predicate.


http://gerrit.cloudera.org:8080/#/c/8783/1/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test:

http://gerrit.cloudera.org:8080/#/c/8783/1/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test@1483
PS1, Line 1483: s
nit: produced



--
To view, visit http://gerrit.cloudera.org:8080/8783
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88153eea9f4b5117df60366fad2bd91776b95298
Gerrit-Change-Number: 8783
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 07 Dec 2017 00:42:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2017-12-11 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8775 )

Change subject: IMPALA-4993: extend dictionary filtering to collections
..


Patch Set 3:

planner tests: adds a test and updates where needed.


--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 3
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 12 Dec 2017 00:12:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4993: extend dictionary filtering to collections

2017-12-11 Thread Vuk Ercegovac (Code Review)
Hello Lars Volker,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8775

to look at the new patch set (#3).

Change subject: IMPALA-4993: extend dictionary filtering to collections
..

IMPALA-4993: extend dictionary filtering to collections

Currently, top-level scalar columns in parquet files can
be used at runtime to prune row-groups by evaluating certain
conjuncts over the column's dictionary (if available).

This change extends such pruning to scalar values that are
stored in collection type columns. Currently, dictionary
pruning works by finding eligible conjuncts for top-level
slots. Since only top-level slots are supported, the slots
are implicitly part of the scan node's tuple descriptor.
With this change, we track eligible conjuncts by slot as well
as the tuple that contains the slot (either top-level or
nested collection). Since collection conjuncts are already
managed by a map that associates tuple descriptors to a list
of their conjuncts, this extension follows the existing
representation.

The frontend builds the mapping of  to
conjuncts that are dictionary filterable. The backend is
adjusted to use the same representation. In addition, collection
readers are decomposed into scalar filterable columns and
other, non-dictionary filterable readers. When filtering
a row group using a conjunct associated to a (possibly)
nested collection type, an additional tuple buffer is
allocated per tuple descriptor.

Testing:
- e2e test extended to illustrate row-groups that are pruned
  by nested collection dictionary filters.

Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/runtime/collection-value-builder.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
11 files changed, 638 insertions(+), 194 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/8775/3
--
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 3
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] [PREVIEW] IMPALA-4886: Expose table metrics in the catalog web UI.

2017-12-11 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8529 )

Change subject: [PREVIEW] IMPALA-4886: Expose table metrics in the catalog web 
UI.
..


Patch Set 3: Code-Review+1

(6 comments)

mostly small comments and optional suggestions.

http://gerrit.cloudera.org:8080/#/c/8529/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/8529/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1395
PS3, Line 1395: metris
nit: metrics


http://gerrit.cloudera.org:8080/#/c/8529/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1395
PS3, Line 1395: pretty-prints them into a
  :* string. Returns the string representation of the table 
metrics
condense: ... and returns a pretty-printed string representation.


http://gerrit.cloudera.org:8080/#/c/8529/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8529/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@252
PS3, Line 252: functionality
nit: purposes


http://gerrit.cloudera.org:8080/#/c/8529/3/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/8529/3/fe/src/main/java/org/apache/impala/catalog/Table.java@579
PS3, Line 579: trully
nit: truly


http://gerrit.cloudera.org:8080/#/c/8529/3/fe/src/main/java/org/apache/impala/util/TopNCache.java
File fe/src/main/java/org/apache/impala/util/TopNCache.java:

http://gerrit.cloudera.org:8080/#/c/8529/3/fe/src/main/java/org/apache/impala/util/TopNCache.java@62
PS3, Line 62: function_.apply(t1).compareTo(function_.apply(t2))
nit: factor the two calls with a method called compareRanks(t1, t2) ?


http://gerrit.cloudera.org:8080/#/c/8529/3/fe/src/main/java/org/apache/impala/util/TopNCache.java@78
PS3, Line 78: if (!heap_.remove(item))
easier to read?

if (heap_.contains(item)) return;



--
To view, visit http://gerrit.cloudera.org:8080/8529
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37d407979e6d3b1a444b6b6265900b148facde9e
Gerrit-Change-Number: 8529
Gerrit-PatchSet: 3
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 12 Dec 2017 02:42:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4168: Adopt Oracle-style hint placement for INSERT/UPSERT

2017-12-11 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8676 )

Change subject: IMPALA-4168: Adopt Oracle-style hint placement for INSERT/UPSERT
..


Patch Set 1:

(7 comments)

Thanks for the change!
I have some initial feedback for the tests.

http://gerrit.cloudera.org:8080/#/c/8676/1/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

http://gerrit.cloudera.org:8080/#/c/8676/1/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@376
PS1, Line 376:   // Test insert hints.
instead of repeating the queries, how about we vary the hint location. I'm 
thinking of something like:

for (placement : {OracleStyle, DefaultStyle}) {
hint = String.format("%sshuffle%s", prefix, suffix)
defaultHint = hint;
oracleHint = ""
if (placement == oracleStyle) {
  defaultHint = "";
  oracleHint = hint;
}
TestInsertHints(String.format("insert %s into t %s select * from t", 
oracleHint, defaultHint),...)
TestInsertHints(...)
...
}

Another place to inject this variation may be in TestInsertHints (I see that 
the upsert just checks for parsing, not whether the hints were applied as 
expected-- would be useful to upgrade these while here).


http://gerrit.cloudera.org:8080/#/c/8676/1/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@392
PS1, Line 392: Adopt
nit: remove the "Adopt".


http://gerrit.cloudera.org:8080/#/c/8676/1/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@392
PS1, Line 392: IMPALA-4168:
include jira issue for bug fixes, not new features.


http://gerrit.cloudera.org:8080/#/c/8676/1/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@409
PS1, Line 409: .
we can remove redundancy here as well.


http://gerrit.cloudera.org:8080/#/c/8676/1/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@414
PS1, Line 414: IMPALA-4168:
same here


http://gerrit.cloudera.org:8080/#/c/8676/1/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@483
PS1, Line 483: .
and remove redundancy here as well.


http://gerrit.cloudera.org:8080/#/c/8676/1/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@499
PS1, Line 499: IMPALA-4168:
same here



--
To view, visit http://gerrit.cloudera.org:8080/8676
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied7629d70197a0270cdc0853e00cc021fdb4dc20
Gerrit-Change-Number: 8676
Gerrit-PatchSet: 1
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 12 Dec 2017 01:50:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1422: support a constant on LHS of IN predicates.

2017-12-01 Thread Vuk Ercegovac (Code Review)
Hello Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8322

to look at the new patch set (#12).

Change subject: IMPALA-1422: support a constant on LHS of IN predicates.
..

IMPALA-1422: support a constant on LHS of IN predicates.

Currently, constant expressions for the LHS of the IN predicate
are not supported. This patch adds this support as a rewrite in
StmtRewriter (where subqueries are rewritten to joins). Since
there is a nested-loop variant of left semijoin, support for IN
is handled by not erring out. NOT IN is handled by a rewrite to
corresponding NOT EXISTS predicate. Support for NOT IN with a
correlated subquery is not included in this change.

Re-organized the frontend subquery analysis tests to expand coverage.

Testing:
- added frontend subquery analysis tests
- added e2e tests

Change-Id: I0d69889a3c72e90be9d4ccf47d2816819ae32acb
---
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
A 
testdata/workloads/functional-query/queries/QueryTest/subquery-in-constant-lhs.test
M tests/query_test/test_queries.py
7 files changed, 908 insertions(+), 203 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/8322/12
--
To view, visit http://gerrit.cloudera.org:8080/8322
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0d69889a3c72e90be9d4ccf47d2816819ae32acb
Gerrit-Change-Number: 8322
Gerrit-PatchSet: 12
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-1422: support a constant on LHS of IN predicates.

2017-12-01 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8322 )

Change subject: IMPALA-1422: support a constant on LHS of IN predicates.
..


Patch Set 11:

updates alias generation to use outer-block's alias generators.


--
To view, visit http://gerrit.cloudera.org:8080/8322
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d69889a3c72e90be9d4ccf47d2816819ae32acb
Gerrit-Change-Number: 8322
Gerrit-PatchSet: 11
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 01 Dec 2017 21:37:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1422: support a constant on LHS of IN predicates.

2017-12-01 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8322 )

Change subject: IMPALA-1422: support a constant on LHS of IN predicates.
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8322/12/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/8322/12/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2277
PS12, Line 2277: (select int_col from functional.tinyinttable where
> Make this one more complicated or add another test like this:
done. we're getting a HJ for the inner and a nested-loop for the outer. the lhs 
constant is pushed down into the HJ of the nested join.



--
To view, visit http://gerrit.cloudera.org:8080/8322
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d69889a3c72e90be9d4ccf47d2816819ae32acb
Gerrit-Change-Number: 8322
Gerrit-PatchSet: 12
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Sat, 02 Dec 2017 00:04:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1422: support a constant on LHS of IN predicates.

2017-12-01 Thread Vuk Ercegovac (Code Review)
Hello Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8322

to look at the new patch set (#13).

Change subject: IMPALA-1422: support a constant on LHS of IN predicates.
..

IMPALA-1422: support a constant on LHS of IN predicates.

Currently, constant expressions for the LHS of the IN predicate
are not supported. This patch adds this support as a rewrite in
StmtRewriter (where subqueries are rewritten to joins). Since
there is a nested-loop variant of left semijoin, support for IN
is handled by not erring out. NOT IN is handled by a rewrite to
corresponding NOT EXISTS predicate. Support for NOT IN with a
correlated subquery is not included in this change.

Re-organized the frontend subquery analysis tests to expand coverage.

Testing:
- added frontend subquery analysis tests
- added e2e tests

Change-Id: I0d69889a3c72e90be9d4ccf47d2816819ae32acb
---
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
A 
testdata/workloads/functional-query/queries/QueryTest/subquery-in-constant-lhs.test
M tests/query_test/test_queries.py
7 files changed, 932 insertions(+), 203 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/8322/13
--
To view, visit http://gerrit.cloudera.org:8080/8322
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0d69889a3c72e90be9d4ccf47d2816819ae32acb
Gerrit-Change-Number: 8322
Gerrit-PatchSet: 13
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2017-12-20 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/8523 )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..

IMPALA-5931: Generates scan ranges in planner for s3/adls

Currently, for filesystems that do not include physical
block information (e.g., block replica locations, caching),
synthetic blocks are generated and stored in the catalog
when metadata is loaded. Example file systems for which this is done
includes S3, ADLS, and local fs.

This change avoids generating these blocks when metadata is loaded.
Instead, scan ranges are directly generated from such files by the
backend coordinator. Previously, all scan ranges were produced by
the planner in HDFSScanNode in the frontend. Now, those files without
block information are sent to the coordinator represented by a split
specification that determines how the coordinator will create scan ranges
to send to executors.

This change reduces the space needed in the catalog and reduces the
scan range datastructures that are passed from the frontend to the
backend when planning and coordinating a query.
In addition a bug is avoided where non-splittable files were being
split anyways to support the query parameter that places a limit on
scan ranges.

Testing:
- local fs tests cover the code paths in this change
- all core tests pass when configured with s3
- manually tried larger local filesystem tables (tpch) with multiple
  partitions and observed the same scan ranges.
- TODO: adls testing

Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
---
M be/src/scheduling/scheduler.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
5 files changed, 294 insertions(+), 166 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8523/3
--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 3
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2017-12-20 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8523 )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..


Patch Set 3:

all tests pass with s3


--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 3
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 20 Dec 2017 17:19:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6948: Delete catalog update topic entries upon catalog restart

2018-05-04 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10289 )

Change subject: IMPALA-6948: Delete catalog update topic entries upon catalog 
restart
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10289/1/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/10289/1/be/src/statestore/statestore.cc@213
PS1, Line 213:   topic_update_log_.clear();
> Good point. I updated the metrics and the related fields. I don't think res
makes sense to not update. pls add a comment, perhaps in the header, to state 
that this bit of history is intentionally retained.


http://gerrit.cloudera.org:8080/#/c/10289/1/be/src/statestore/statestore.cc@665
PS1, Line 665: 
subscriber->SetLastTopicVersionProcessed(topic_it->first, update.from_version);
> Hm, not following. What do you suggest we dcheck?
basically asking if from_version can be set here and would that worth checking?


http://gerrit.cloudera.org:8080/#/c/10289/2/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/10289/2/be/src/statestore/statestore.cc@216
PS2, Line 216: key_size_metric_->SetValue(0);
 :   topic_size_metric_->SetValue(0);
 :   value_size_metric_->SetValue(0);
these are shared across topics, so this should more accurately subtract what 
was cleared. since we keep track of the total per topic, perhaps subtract that 
off?



--
To view, visit http://gerrit.cloudera.org:8080/10289
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74a8ade8e498ac35cb56d3775d2c67a86988d9b6
Gerrit-Change-Number: 10289
Gerrit-PatchSet: 2
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 04 May 2018 23:07:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6948: Delete catalog update topic entries upon catalog restart

2018-05-04 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10289 )

Change subject: IMPALA-6948: Delete catalog update topic entries upon catalog 
restart
..


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/10289/1/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/10289/1/be/src/catalog/catalog-server.cc@249
PS1, Line 249: last_sent_catalog_version_
add to the header comments for this member its expected range ( [0, ...)) and 
lets explicitly state that 0 means no updates have been sent.


http://gerrit.cloudera.org:8080/#/c/10289/1/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/10289/1/be/src/statestore/statestore.cc@213
PS1, Line 213:   topic_update_log_.clear();
init the other members: total_key_size_bytes_, total_value_size_bytes_
(I see value bytes is decreased elsewhere)

decrement the gauges?
(I didn't see these decreased anywhere even though value bytes can decrease)

why should last_version_ not be reset?


http://gerrit.cloudera.org:8080/#/c/10289/1/be/src/statestore/statestore.cc@665
PS1, Line 665:   // cleared.
I didn't see that the case on L656 can happen when clearing. dcheck it?


http://gerrit.cloudera.org:8080/#/c/10289/1/tests/statestore/test_statestore.py
File tests/statestore/test_statestore.py:

http://gerrit.cloudera.org:8080/#/c/10289/1/tests/statestore/test_statestore.py@525
PS1, Line 525: sets the clear_topic_entries flag in a topic update message. 
"""
add the jira for context


http://gerrit.cloudera.org:8080/#/c/10289/1/tests/statestore/test_statestore.py@535
PS1, Line 535: num_updates=1,
 : key_template=
keep the same order of args as on L531.


http://gerrit.cloudera.org:8080/#/c/10289/1/tests/statestore/test_statestore.py@548
PS1, Line 548:
can some topic other than topic_name be called back here?
similar question with the version.

how can we ensure that this callback does not silently pass the test without 
checking the assertions on L546, 547? perhaps log and check that the log 
contains a line that proves the check was run?



--
To view, visit http://gerrit.cloudera.org:8080/10289
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74a8ade8e498ac35cb56d3775d2c67a86988d9b6
Gerrit-Change-Number: 10289
Gerrit-PatchSet: 1
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 04 May 2018 19:02:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6948,IMPALA-6962: add end-to-end tests

2018-05-08 Thread Vuk Ercegovac (Code Review)
Hello Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10291

to look at the new patch set (#3).

Change subject: IMPALA-6948,IMPALA-6962: add end-to-end tests
..

IMPALA-6948,IMPALA-6962: add end-to-end tests

Adds end-to-end tests to validate that following
various metadata operations, the catalog state
in catalogd and impalads is the same.

For IMPALA-6962, catalogd process restart for tests
is fixed.

Change-Id: Ic6c5b39e29b2885cd30fede18833cbf23fb755f5
---
M be/src/catalog/catalog-server.cc
M tests/common/impala_cluster.py
M tests/common/impala_service.py
A tests/custom_cluster/test_metadata_replicas.py
M tests/metadata/test_hms_integration.py
A tests/util/hive_utils.py
6 files changed, 243 insertions(+), 68 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/10291/3
--
To view, visit http://gerrit.cloudera.org:8080/10291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic6c5b39e29b2885cd30fede18833cbf23fb755f5
Gerrit-Change-Number: 10291
Gerrit-PatchSet: 3
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-6948,IMPALA-6962: add end-to-end tests

2018-05-08 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10291 )

Change subject: IMPALA-6948,IMPALA-6962: add end-to-end tests
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/10291/2/tests/custom_cluster/test_metadata_replicas.py
File tests/custom_cluster/test_metadata_replicas.py:

http://gerrit.cloudera.org:8080/#/c/10291/2/tests/custom_cluster/test_metadata_replicas.py@39
PS2, Line 39: def test_load_all(self, cursor):
: """ Loads metadata for all tables and validates that they're 
all the same."""
: # Use describe to issue table loads.
: c_objects = 
self.cluster.catalogd.service.get_catalog_objects()
: for obj in c_objects:
:   if obj[1] == "TABLE": cursor.execute("describe %s" % obj[0])
: # Use a sentinel ddl with sync_ddl to make sure that all 
impalads
: # have receieved the effect of the previous ddl's.
: cursor.execute("set sync_ddl=true")
: cursor.execute("invalidate metadata functional.alltypes")
:
: self.__validate_metadata()
> This test is going to be very expensive and I am not sure how much extra co
Removing this test... alex raised concerns about expense as well. It takes 
about 7-8 s, which is still less than 50% of the total time (remaining time 
needed for custom cluster setup/teardown.
However, since its not the main point of this change, I'll remove it; we can 
add something like this later on if needed.


http://gerrit.cloudera.org:8080/#/c/10291/2/tests/custom_cluster/test_metadata_replicas.py@69
PS2, Line 69: cursor.execute("invalidate metadata")
> No need for the global one, you may do "invalidate metadata x".
Done


http://gerrit.cloudera.org:8080/#/c/10291/2/tests/custom_cluster/test_metadata_replicas.py@79
PS2, Line 79: wait_time_s = specific_build_type_timeout(10, 
slow_build_timeout=20)
: sleep(wait_time_s)
> I agree with the TODO and I think we should fix it now. Essentially, in ord
Added a catalog version to the /catalog page.
For this test, I check the catalog version and the latest catalog version from 
each impalad. When they are in agreement, the rest of the test can proceed.
When starting up the catalogd, we wait until its subscribed to the statestore. 
Since subscription happens *after* the catalog is instantiated and loaded, its 
safe to access the catalogd catalog version at this point.


http://gerrit.cloudera.org:8080/#/c/10291/2/tests/custom_cluster/test_metadata_replicas.py@81
PS2, Line 81: self.cluster.refresh()
> Hm, why is this needed?
refreshes the cluster process list after restarting a new catalogd process. 
needed to kill it later on L88.


http://gerrit.cloudera.org:8080/#/c/10291/2/tests/custom_cluster/test_metadata_replicas.py@123
PS2, Line 123: def __dump_objects(self, catalog, impalads):
 : """ For debugging, prints all objects and their version."""
 : print "CATALOG OBJECTS"
 : for k, v in catalog.items():
 :   print "%s, %s, %d" % (k, v[0], v[1])
 :
 : for idx in xrange(0,len(impalads)):
 :   print "IMPALAD %d OBJECTS" % idx
 :   for k, v in impalads[idx].items():
 : print "%s, %s, %d" % (k, v[0], v[1])
> Is this used anywhere?
just using it for debugging so removed.



--
To view, visit http://gerrit.cloudera.org:8080/10291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic6c5b39e29b2885cd30fede18833cbf23fb755f5
Gerrit-Change-Number: 10291
Gerrit-PatchSet: 2
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 08 May 2018 20:32:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7017: deflake/fix test catalog restart test

2018-05-14 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10397


Change subject: IMPALA-7017: deflake/fix test_catalog_restart test
..

IMPALA-7017: deflake/fix test_catalog_restart test

The custom_cluster/test_metadata_replicas.py:test_catalog_restart
test has been recently flaky/broken for two reasons:

1) Variable support for Hive and non-hdfs filesystems. Other tests that
depend on Hive have disabled tests for non-hdfs filesystems. Since the
functionality tested is not intended for all filesystems, this change
disables this test for all filesystems other than hdfs.

2) Several builds have been flaky when looking up catalogd's version.
This change adds a retry for obtaining the version.

Change-Id: Iab6edb01f0bd7f5408cfef28fd05fdc95fb78469
---
M tests/common/impala_service.py
M tests/custom_cluster/test_metadata_replicas.py
2 files changed, 22 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/10397/1
--
To view, visit http://gerrit.cloudera.org:8080/10397
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iab6edb01f0bd7f5408cfef28fd05fdc95fb78469
Gerrit-Change-Number: 10397
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-6948: Delete catalog update topic entries upon catalog restart

2018-05-07 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10289 )

Change subject: IMPALA-6948: Delete catalog update topic entries upon catalog 
restart
..


Patch Set 3:

(2 comments)

looks good. one remaining question about tightening the protocol.

http://gerrit.cloudera.org:8080/#/c/10289/1/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/10289/1/be/src/statestore/statestore.cc@665
PS1, Line 665: 
subscriber->SetLastTopicVersionProcessed(topic_it->first, update.from_version);
> I see what you mean. Ideally, I want to keep that operation (deleting the t
makes sense. however, if I were to test these code paths, I would add a case 
where from_version is set *and* we clear_topic_entries(). current code in the 
catalog (a subscriber) does not allow this combination afaict. main question 
from the standpoint of the protocol is whether we even want to consider this 
combination. if not, then lets document it with a dcheck.


http://gerrit.cloudera.org:8080/#/c/10289/3/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/10289/3/be/src/statestore/statestore.cc@215
PS3, Line 215: -
if all is correct, the metric won't be negative. however, what happens if it 
does go negative?



--
To view, visit http://gerrit.cloudera.org:8080/10289
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74a8ade8e498ac35cb56d3775d2c67a86988d9b6
Gerrit-Change-Number: 10289
Gerrit-PatchSet: 3
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Mon, 07 May 2018 19:22:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6948: Delete catalog update topic entries upon catalog restart

2018-05-07 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10289 )

Change subject: IMPALA-6948: Delete catalog update topic entries upon catalog 
restart
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/10289
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74a8ade8e498ac35cb56d3775d2c67a86988d9b6
Gerrit-Change-Number: 10289
Gerrit-PatchSet: 4
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Mon, 07 May 2018 23:44:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6948,IMPALA-6962: add end-to-end tests

2018-05-08 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10291 )

Change subject: IMPALA-6948,IMPALA-6962: add end-to-end tests
..


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/10291/3/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/10291/3/be/src/catalog/catalog-server.cc@304
PS3, Line 304: long current_catalog_version;
> Why is this dependent on the success of GetDbs()? I think the version and t
did it this way to share the error handling and avoid multiple "error" fields. 
I could have an error field and a list of reasons?


http://gerrit.cloudera.org:8080/#/c/10291/3/tests/common/impala_service.py
File tests/common/impala_service.py:

http://gerrit.cloudera.org:8080/#/c/10291/3/tests/common/impala_service.py@151
PS3, Line 151: """ Extracts and returns the version of the catalog object's 
'thrift_txt' representation."""
> I think the JSON field is called 'thrift_string'
the value that passed in here is coming from get_catalog_object_dump, which is 
just html, not json.


http://gerrit.cloudera.org:8080/#/c/10291/3/tests/custom_cluster/test_metadata_replicas.py
File tests/custom_cluster/test_metadata_replicas.py:

http://gerrit.cloudera.org:8080/#/c/10291/3/tests/custom_cluster/test_metadata_replicas.py@39
PS3, Line 39:   @pytest.mark.xfail(reason="IMPALA-6948")
> Issue is fixed now.
Done


http://gerrit.cloudera.org:8080/#/c/10291/3/tests/custom_cluster/test_metadata_replicas.py@54
PS3, Line 54:   "create table if not exists %s.x (a string)" % db_name)
> We should not do IF NOT EXISTS here and IF EXISTS below in the drop. If the
Done


http://gerrit.cloudera.org:8080/#/c/10291/3/tests/custom_cluster/test_metadata_replicas.py@55
PS3, Line 55: cursor.execute("set SYNC_DDL=true")
> Not obvious why we need this. Can we remove it if we consistently use the s
Done


http://gerrit.cloudera.org:8080/#/c/10291/3/tests/custom_cluster/test_metadata_replicas.py@57
PS3, Line 57: assert "x" in self.client.execute("show tables in %s" % 
db_name).data
> The use of both cursor and self.client is somewhat confusing. How many conn
Done



--
To view, visit http://gerrit.cloudera.org:8080/10291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic6c5b39e29b2885cd30fede18833cbf23fb755f5
Gerrit-Change-Number: 10291
Gerrit-PatchSet: 3
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 09 May 2018 00:26:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6948,IMPALA-6962: add end-to-end tests

2018-05-08 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10291 )

Change subject: IMPALA-6948,IMPALA-6962: add end-to-end tests
..


Patch Set 4:

changed how catalogd process is restarted: now using os.system instead of the 
wrapper around popen. that change fixes an issue with zombie procs in the 
jenkins env (that unfortunately was not present in the dev env).


--
To view, visit http://gerrit.cloudera.org:8080/10291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic6c5b39e29b2885cd30fede18833cbf23fb755f5
Gerrit-Change-Number: 10291
Gerrit-PatchSet: 4
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 09 May 2018 05:35:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6948,IMPALA-6962: add end-to-end tests

2018-05-08 Thread Vuk Ercegovac (Code Review)
Hello Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10291

to look at the new patch set (#5).

Change subject: IMPALA-6948,IMPALA-6962: add end-to-end tests
..

IMPALA-6948,IMPALA-6962: add end-to-end tests

Adds end-to-end tests to validate that following
various metadata operations, the catalog state
in catalogd and impalads is the same.

For IMPALA-6962, catalogd process restart for tests
is fixed.

Change-Id: Ic6c5b39e29b2885cd30fede18833cbf23fb755f5
---
M be/src/catalog/catalog-server.cc
M tests/common/impala_cluster.py
M tests/common/impala_service.py
A tests/custom_cluster/test_metadata_replicas.py
M tests/metadata/test_hms_integration.py
A tests/util/hive_utils.py
6 files changed, 240 insertions(+), 68 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/10291/5
--
To view, visit http://gerrit.cloudera.org:8080/10291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic6c5b39e29b2885cd30fede18833cbf23fb755f5
Gerrit-Change-Number: 10291
Gerrit-PatchSet: 5
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-6948,IMPALA-6962: add end-to-end tests

2018-05-09 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10291 )

Change subject: IMPALA-6948,IMPALA-6962: add end-to-end tests
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10291/3/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/10291/3/be/src/catalog/catalog-server.cc@304
PS3, Line 304: void CatalogServer::CatalogUrlCallback(const 
Webserver::ArgumentMap& args,
> I just think it's weird that we don't show the dbs if there was an error wi
done. makes sense, I was treating it as all-or-nothing.


http://gerrit.cloudera.org:8080/#/c/10291/5/tests/custom_cluster/test_metadata_replicas.py
File tests/custom_cluster/test_metadata_replicas.py:

http://gerrit.cloudera.org:8080/#/c/10291/5/tests/custom_cluster/test_metadata_replicas.py@50
PS5, Line 50:   self.client.execute("invalidate metadata 
functional.alltypes")
> check that the catalog version is at least 50 at this point to make debuggi
Done



--
To view, visit http://gerrit.cloudera.org:8080/10291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic6c5b39e29b2885cd30fede18833cbf23fb755f5
Gerrit-Change-Number: 10291
Gerrit-PatchSet: 5
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 09 May 2018 18:23:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6948,IMPALA-6962: add end-to-end tests

2018-05-09 Thread Vuk Ercegovac (Code Review)
Hello Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10291

to look at the new patch set (#6).

Change subject: IMPALA-6948,IMPALA-6962: add end-to-end tests
..

IMPALA-6948,IMPALA-6962: add end-to-end tests

Adds end-to-end tests to validate that following
various metadata operations, the catalog state
in catalogd and impalads is the same.

For IMPALA-6962, catalogd process restart for tests
is fixed.

Change-Id: Ic6c5b39e29b2885cd30fede18833cbf23fb755f5
---
M be/src/catalog/catalog-server.cc
M tests/common/impala_cluster.py
M tests/common/impala_service.py
A tests/custom_cluster/test_metadata_replicas.py
M tests/metadata/test_hms_integration.py
A tests/util/hive_utils.py
6 files changed, 242 insertions(+), 68 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/10291/6
--
To view, visit http://gerrit.cloudera.org:8080/10291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic6c5b39e29b2885cd30fede18833cbf23fb755f5
Gerrit-Change-Number: 10291
Gerrit-PatchSet: 6
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR](2.x) IMPALA-6948,IMPALA-6962: add end-to-end tests

2018-05-09 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10363


Change subject: IMPALA-6948,IMPALA-6962: add end-to-end tests
..

IMPALA-6948,IMPALA-6962: add end-to-end tests

Adds end-to-end tests to validate that following
various metadata operations, the catalog state
in catalogd and impalads is the same.

For IMPALA-6962, catalogd process restart for tests
is fixed.

Change-Id: Ic6c5b39e29b2885cd30fede18833cbf23fb755f5
Reviewed-on: http://gerrit.cloudera.org:8080/10291
Reviewed-by: Alex Behm 
Tested-by: Impala Public Jenkins 
(cherry picked from commit 28c1f76529b79d437b66a80b954ec227e0ddc6cd)
---
M be/src/catalog/catalog-server.cc
M tests/common/impala_cluster.py
M tests/common/impala_service.py
A tests/custom_cluster/test_metadata_replicas.py
M tests/metadata/test_hms_integration.py
A tests/util/hive_utils.py
6 files changed, 242 insertions(+), 68 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/10363/1
--
To view, visit http://gerrit.cloudera.org:8080/10363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic6c5b39e29b2885cd30fede18833cbf23fb755f5
Gerrit-Change-Number: 10363
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-05-10 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8523 )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..


Patch Set 11:

(1 comment)

getting back to this change... the upcoming update includes the idea to better 
separate the split specs vs. concrete scan ranges. rebase changes caught this 
in the middle so the update also includes rebase changes (only affected 
HdfsScanNode.java).

http://gerrit.cloudera.org:8080/#/c/8523/11/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/8523/11/common/thrift/PlanNodes.thrift@228
PS11, Line 228: 4: optional THdfsFileSplitGeneratorSpec 
hdfs_file_split_generator_spec
> will try it out.
The additional changes are relatively mechanical. Since this design explicitly 
does not allow these specs to propagate further in the backend, I agree that 
we're better off going this route.



--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 11
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 10 May 2018 21:42:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-05-10 Thread Vuk Ercegovac (Code Review)
Hello Lars Volker, Dimitris Tsirogiannis, Alex Behm, Mostafa Mokhtar, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8523

to look at the new patch set (#12).

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..

IMPALA-5931: Generates scan ranges in planner for s3/adls

Currently, for filesystems that do not include physical
block information (e.g., block replica locations, caching),
synthetic blocks are generated and stored in the catalog
when metadata is loaded. Example file systems for which this is done
includes S3, ADLS, and local fs.

This change avoids generating these blocks when metadata is loaded.
Instead, scan ranges are directly generated from such files by the
backend coordinator. Previously, all scan ranges were produced by
the planner in HDFSScanNode in the frontend. Now, those files without
block information are sent to the coordinator represented by a split
specification that determines how the coordinator will create scan ranges
to send to executors.

This change reduces the space needed in the catalog and reduces the
scan range data structures that are passed from the frontend to the
backend when planning and coordinating a query.
In addition a bug is avoided where non-splittable files were being
split anyways to support the query parameter that places a limit on
scan ranges.

Testing:
- added backend scheduler tests
- mixed-filesystems test covers tables/queries with multiple fs's.
- local fs tests cover the code paths in this change
- all core tests pass when configured with s3
- manually tried larger local filesystem tables (tpch) with multiple
  partitions and observed the same scan ranges.
  - TODO: adls testing

Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
---
M be/src/scheduling/query-schedule.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/util/CMakeLists.txt
A be/src/util/flat_buffer.cc
A be/src/util/flat_buffer.h
M common/thrift/Frontend.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
21 files changed, 718 insertions(+), 233 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8523/12
--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 12
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-05-19 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8523 )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..


Patch Set 14:

running more tests...


--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 14
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Sat, 19 May 2018 20:26:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-05-15 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8523 )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8523/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8523/12//COMMIT_MSG@13
PS12, Line 13: includes S3, ADLS, and local fs.
> What about erasure coded HDFS data?
there are todo's in this change to extend to hdfs. pls see the comment in 
PlanNodes.thrift.



--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 12
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 15 May 2018 17:35:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-05-23 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8523 )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..


Patch Set 15:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8523/15/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8523/15/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@783
PS15, Line 783:   boolean checkMissingDiskIds = 
FileSystemUtil.supportsStorageIds(partitionFs);
  :   boolean supportsBlocks = 
FileSystemUtil.supportsStorageIds(partitionFs);
> these are equivalent. how about getting rid of the checkMissingDiskIds, and
see reply to last comment for naming.


http://gerrit.cloudera.org:8080/#/c/8523/15/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@812
PS15, Line 812: result.first
> was changing the polarity of this a bug fix?
yes. there was a bug here, as well as double-counting total_bytes from a prev. 
merge, and several npe's from that latest thrift refactor (java thrift list is 
null, not empty)


http://gerrit.cloudera.org:8080/#/c/8523/15/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@914
PS15, Line 914: checkMissingDiskIds && !fileDesc.getIsEc()
> this was confusing because when I read the name "checkMissingDiskIds" I tho
renamed to: fsHasBlocks



--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 15
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 23 May 2018 20:44:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-05-23 Thread Vuk Ercegovac (Code Review)
Hello Lars Volker, Tianyi Wang, Dimitris Tsirogiannis, Alex Behm, Mostafa 
Mokhtar, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8523

to look at the new patch set (#16).

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..

IMPALA-5931: Generates scan ranges in planner for s3/adls

Currently, for filesystems that do not include physical
block information (e.g., block replica locations, caching),
synthetic blocks are generated and stored in the catalog
when metadata is loaded. Example file systems for which this is done
includes S3, ADLS, and local fs.

This change avoids generating these blocks when metadata is loaded.
Instead, scan ranges are directly generated from such files by the
backend coordinator. Previously, all scan ranges were produced by
the planner in HDFSScanNode in the frontend. Now, those files without
block information are sent to the coordinator represented by a split
specification that determines how the coordinator will create scan ranges
to send to executors.

This change reduces the space needed in the catalog and reduces the
scan range data structures that are passed from the frontend to the
backend when planning and coordinating a query.
In addition a bug is avoided where non-splittable files were being
split anyways to support the query parameter that places a limit on
scan ranges.

Testing:
- added backend scheduler tests
- mixed-filesystems test covers tables/queries with multiple fs's.
- local fs tests cover the code paths in this change
- all core tests pass when configured with s3
- manually tried larger local filesystem tables (tpch) with multiple
  partitions and observed the same scan ranges.
  - TODO: adls testing

Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
---
M be/src/scheduling/query-schedule.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/util/CMakeLists.txt
A be/src/util/flat_buffer.cc
A be/src/util/flat_buffer.h
M common/thrift/Frontend.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
21 files changed, 676 insertions(+), 246 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8523/16
--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 16
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-05-22 Thread Vuk Ercegovac (Code Review)
Hello Lars Volker, Tianyi Wang, Dimitris Tsirogiannis, Alex Behm, Mostafa 
Mokhtar, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8523

to look at the new patch set (#15).

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..

IMPALA-5931: Generates scan ranges in planner for s3/adls

Currently, for filesystems that do not include physical
block information (e.g., block replica locations, caching),
synthetic blocks are generated and stored in the catalog
when metadata is loaded. Example file systems for which this is done
includes S3, ADLS, and local fs.

This change avoids generating these blocks when metadata is loaded.
Instead, scan ranges are directly generated from such files by the
backend coordinator. Previously, all scan ranges were produced by
the planner in HDFSScanNode in the frontend. Now, those files without
block information are sent to the coordinator represented by a split
specification that determines how the coordinator will create scan ranges
to send to executors.

This change reduces the space needed in the catalog and reduces the
scan range data structures that are passed from the frontend to the
backend when planning and coordinating a query.
In addition a bug is avoided where non-splittable files were being
split anyways to support the query parameter that places a limit on
scan ranges.

Testing:
- added backend scheduler tests
- mixed-filesystems test covers tables/queries with multiple fs's.
- local fs tests cover the code paths in this change
- all core tests pass when configured with s3
- manually tried larger local filesystem tables (tpch) with multiple
  partitions and observed the same scan ranges.
  - TODO: adls testing

Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
---
M be/src/scheduling/query-schedule.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/util/CMakeLists.txt
A be/src/util/flat_buffer.cc
A be/src/util/flat_buffer.h
M common/thrift/Frontend.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
21 files changed, 677 insertions(+), 245 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8523/15
--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 15
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-05-22 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8523 )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..


Patch Set 14:

Next change handles the merge. It was straightforward; the ec files are handled 
as regular hdfs files, so do not go through the prev. synthetic block 
generation.

For testing, I re-ran core and s3 tests.


--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 14
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 23 May 2018 05:39:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-05-24 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8523 )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..


Patch Set 17: Code-Review+2

rebase and fix minor conflict.
carrying dan's +2


--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 17
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 24 May 2018 06:29:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-05-24 Thread Vuk Ercegovac (Code Review)
Hello Lars Volker, Tianyi Wang, Dimitris Tsirogiannis, Alex Behm, Mostafa 
Mokhtar, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8523

to look at the new patch set (#17).

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..

IMPALA-5931: Generates scan ranges in planner for s3/adls

Currently, for filesystems that do not include physical
block information (e.g., block replica locations, caching),
synthetic blocks are generated and stored in the catalog
when metadata is loaded. Example file systems for which this is done
includes S3, ADLS, and local fs.

This change avoids generating these blocks when metadata is loaded.
Instead, scan ranges are directly generated from such files by the
backend coordinator. Previously, all scan ranges were produced by
the planner in HDFSScanNode in the frontend. Now, those files without
block information are sent to the coordinator represented by a split
specification that determines how the coordinator will create scan ranges
to send to executors.

This change reduces the space needed in the catalog and reduces the
scan range data structures that are passed from the frontend to the
backend when planning and coordinating a query.
In addition a bug is avoided where non-splittable files were being
split anyways to support the query parameter that places a limit on
scan ranges.

Testing:
- added backend scheduler tests
- mixed-filesystems test covers tables/queries with multiple fs's.
- local fs tests cover the code paths in this change
- all core tests pass when configured with s3
- manually tried larger local filesystem tables (tpch) with multiple
  partitions and observed the same scan ranges.
- TODO: adls testing

Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
---
M be/src/scheduling/query-schedule.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/util/CMakeLists.txt
A be/src/util/flat_buffer.cc
A be/src/util/flat_buffer.h
M common/thrift/Frontend.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
21 files changed, 676 insertions(+), 246 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8523/17
--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 17
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-4025: Part 1: Generalize and cleanup StmtRewriter

2018-05-24 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10495 )

Change subject: IMPALA-4025: Part 1: Generalize and cleanup StmtRewriter
..


Patch Set 1: Code-Review+2

thanks for the cleanup


--
To view, visit http://gerrit.cloudera.org:8080/10495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e7a6108d3d49be12ae032fdb54b5c3c23152a47
Gerrit-Change-Number: 10495
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 24 May 2018 06:16:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6933: Avoids db name collisions for Kudu tests

2018-05-25 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10513 )

Change subject: IMPALA-6933: Avoids db name collisions for Kudu tests
..


Patch Set 1:

went with the pid idea so to distinguish the same test method in separate 
processes. I added comments in conftest.py to state that the alternative api is 
"deprecated" and to prefer using the alternative unique_database fixture (it 
looks like these kudu tests are the only user of the other scheme).


--
To view, visit http://gerrit.cloudera.org:8080/10513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2f8a35fec90ae0dabe80237d83954668b47f6e
Gerrit-Change-Number: 10513
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 25 May 2018 23:15:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6933: Avoids db name collisions for Kudu tests

2018-05-25 Thread Vuk Ercegovac (Code Review)
Hello Michael Brown,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10513

to look at the new patch set (#2).

Change subject: IMPALA-6933: Avoids db name collisions for Kudu tests
..

IMPALA-6933: Avoids db name collisions for Kudu tests

Kudu tests generate temporary db names in a way so that its
unlikely, yet possible to collide. A recent test failure
indicates such a collision came up. The fix changes the
way that the name is generated so that it includes the
classes name for which the db name is generated. This db name
will make it easier to see which test created it and the name
will not collide with other names generated by other tests.

Testing:
- ran the updated test locally

Change-Id: I7c2f8a35fec90ae0dabe80237d83954668b47f6e
---
M tests/common/kudu_test_suite.py
M tests/conftest.py
2 files changed, 29 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/10513/2
--
To view, visit http://gerrit.cloudera.org:8080/10513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2f8a35fec90ae0dabe80237d83954668b47f6e
Gerrit-Change-Number: 10513
Gerrit-PatchSet: 2
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-05-24 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8523 )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..


Patch Set 17:

hmm. looks like some dependency order for generating flat buffers.


--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 17
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 24 May 2018 16:02:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-05-24 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8523 )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..


Patch Set 17:

Odd that this happens with gvo, but not for internal builds.


--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 17
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 24 May 2018 17:40:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-05-24 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8523 )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..


Patch Set 18:

latest change explicitly adds a dep for flat buffers for the backend (in the 
same place that the be depends on thrift and protos). afaict, this should have 
been in there for this change-- not sure why some of the builds didn't need it.


--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 18
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 24 May 2018 21:13:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-05-24 Thread Vuk Ercegovac (Code Review)
Hello Lars Volker, Tianyi Wang, Dimitris Tsirogiannis, Alex Behm, Mostafa 
Mokhtar, Dan Hecht, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8523

to look at the new patch set (#18).

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..

IMPALA-5931: Generates scan ranges in planner for s3/adls

Currently, for filesystems that do not include physical
block information (e.g., block replica locations, caching),
synthetic blocks are generated and stored in the catalog
when metadata is loaded. Example file systems for which this is done
includes S3, ADLS, and local fs.

This change avoids generating these blocks when metadata is loaded.
Instead, scan ranges are directly generated from such files by the
backend coordinator. Previously, all scan ranges were produced by
the planner in HDFSScanNode in the frontend. Now, those files without
block information are sent to the coordinator represented by a split
specification that determines how the coordinator will create scan ranges
to send to executors.

This change reduces the space needed in the catalog and reduces the
scan range data structures that are passed from the frontend to the
backend when planning and coordinating a query.
In addition a bug is avoided where non-splittable files were being
split anyways to support the query parameter that places a limit on
scan ranges.

Testing:
- added backend scheduler tests
- mixed-filesystems test covers tables/queries with multiple fs's.
- local fs tests cover the code paths in this change
- all core tests pass when configured with s3
- manually tried larger local filesystem tables (tpch) with multiple
  partitions and observed the same scan ranges.
- TODO: adls testing

Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
---
M CMakeLists.txt
M be/src/scheduling/query-schedule.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/util/CMakeLists.txt
A be/src/util/flat_buffer.cc
A be/src/util/flat_buffer.h
M common/thrift/Frontend.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
22 files changed, 677 insertions(+), 247 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8523/18
--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 18
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-6933: Avoids db name collisions for Kudu tests

2018-05-25 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10513


Change subject: IMPALA-6933: Avoids db name collisions for Kudu tests
..

IMPALA-6933: Avoids db name collisions for Kudu tests

Kudu tests generate temporary db names in a way so that its
unlikely, yet possible to collide. A recent test failure
indicates such a collision came up. The fix changes the
way that the name is generated so that it includes the
classes name for which the db name is generated. This db name
will make it easier to see which test created it and the name
will not collide with other names generated by other tests.

Testing:
- ran the updated test locally

Change-Id: I7c2f8a35fec90ae0dabe80237d83954668b47f6e
---
M tests/common/kudu_test_suite.py
1 file changed, 6 insertions(+), 4 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/10513/1
--
To view, visit http://gerrit.cloudera.org:8080/10513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7c2f8a35fec90ae0dabe80237d83954668b47f6e
Gerrit-Change-Number: 10513
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-05-18 Thread Vuk Ercegovac (Code Review)
Hello Lars Volker, Dimitris Tsirogiannis, Alex Behm, Mostafa Mokhtar, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8523

to look at the new patch set (#13).

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..

IMPALA-5931: Generates scan ranges in planner for s3/adls

Currently, for filesystems that do not include physical
block information (e.g., block replica locations, caching),
synthetic blocks are generated and stored in the catalog
when metadata is loaded. Example file systems for which this is done
includes S3, ADLS, and local fs.

This change avoids generating these blocks when metadata is loaded.
Instead, scan ranges are directly generated from such files by the
backend coordinator. Previously, all scan ranges were produced by
the planner in HDFSScanNode in the frontend. Now, those files without
block information are sent to the coordinator represented by a split
specification that determines how the coordinator will create scan ranges
to send to executors.

This change reduces the space needed in the catalog and reduces the
scan range data structures that are passed from the frontend to the
backend when planning and coordinating a query.
In addition a bug is avoided where non-splittable files were being
split anyways to support the query parameter that places a limit on
scan ranges.

Testing:
- added backend scheduler tests
- mixed-filesystems test covers tables/queries with multiple fs's.
- local fs tests cover the code paths in this change
- all core tests pass when configured with s3
- manually tried larger local filesystem tables (tpch) with multiple
  partitions and observed the same scan ranges.
  - TODO: adls testing

Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
---
M be/src/scheduling/query-schedule.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/util/CMakeLists.txt
A be/src/util/flat_buffer.cc
A be/src/util/flat_buffer.h
M common/thrift/Frontend.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
21 files changed, 665 insertions(+), 240 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8523/13
--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 13
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-05-18 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8523 )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..


Patch Set 12:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8523/12/common/thrift/Frontend.thrift
File common/thrift/Frontend.thrift:

http://gerrit.cloudera.org:8080/#/c/8523/12/common/thrift/Frontend.thrift@375
PS12, Line 375:   per_node_scan_ranges
> Thanks, the new thrift structures make more sense to me. Maybe it's simpler
yeah, it lets me revert to upfront expansion as I had couple of changes back 
(but without mixed concrete/spec) and get rid of that iterator.


http://gerrit.cloudera.org:8080/#/c/8523/12/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/8523/12/common/thrift/PlanNodes.thrift@202
PS12, Line 202: THdfsFileSplitGeneratorSpec
> update
Done



--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 12
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 18 May 2018 21:24:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4025: Part 1: Add percentile disc aggregation function

2018-05-17 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9777 )

Change subject: IMPALA-4025: Part 1: Add percentile_disc aggregation function
..


Patch Set 7:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/9777/7/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java:

http://gerrit.cloudera.org:8080/#/c/9777/7/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@99
PS7, Line 99: outputGroupingExprSmap_
is this used (or planned to be used) anywhere?


http://gerrit.cloudera.org:8080/#/c/9777/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/9777/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@187
PS7, Line 187:   void setContainsPercentile() {
public?


http://gerrit.cloudera.org:8080/#/c/9777/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@217
PS7, Line 217: boolean containsPercentile = false;
make public as well?


http://gerrit.cloudera.org:8080/#/c/9777/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@354
PS7, Line 354:   boolean selectBlockContainsPercentile() { return 
containsPercentile; }
intentional visibility?


http://gerrit.cloudera.org:8080/#/c/9777/7/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/9777/7/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@108
PS7, Line 108: rewriteSelectStatementHook(stmt, analyzer);
Any way to separate this refactor into its own change? first example I saw: 
where did the helper 'hasSubqueryInDisunction' go? I'm guessing this doesn't 
have anything to specifically do with the change for percentile. Looks like a 
good cleanup overall, but since there is some subtle existing functionality 
here, it would be helpful to separate clean-up from the actual change.


http://gerrit.cloudera.org:8080/#/c/9777/5/fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java
File fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java:

http://gerrit.cloudera.org:8080/#/c/9777/5/fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java@253
PS5, Line 253: if (isLogical_) {
change this to a precondition



--
To view, visit http://gerrit.cloudera.org:8080/9777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacef7b3fcd74c4c73d88400ce27307c3baa0121e
Gerrit-Change-Number: 9777
Gerrit-PatchSet: 7
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 17 May 2018 21:11:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-05-16 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8523 )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8523/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8523/12//COMMIT_MSG@13
PS12, Line 13: includes S3, ADLS, and local fs.
> Does this change the dependency on fs.s3a.block.size?
This change does not attempt to change the scan ranges that are produced for 
these file systems. Currently, we rely on the filestatus blocksize (see L177 of 
HdfsPartition) when synthesizing blocks to store in the catalog. This patch 
shifts that synthesis to the scheduler, so that its generated for each use 
instead of stored in memory. The block parameter used for this synthesis is set 
in this change on L784 of HdfsScanNode. Instead of filestatus blocksize, it 
uses the filesystem's default block size, which I think is the same thing for 
these file-systems (at least, from what I could tell from hadoop.fs.FileSystem).



--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 12
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 16 May 2018 06:04:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7019: Schedule EC as remote & disable failed tests

2018-05-22 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10413 )

Change subject: IMPALA-7019: Schedule EC as remote & disable failed tests
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10413/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/10413/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@118
PS6, Line 118: FileBlock.createFbFileBlock(fbb,
 :   loc.getOffset(), loc.getLength(),
 :   (short) 
hostIndex.getIndex(REMOTE_NETWORK_ADDRESS));
> The reason is not strong: Though EC reads are remote, we might still don't
ok, so you were worried that synthetic division into blocks could differ from 
how a file is chunked into blocks on hdfs. Assuming the block size provided by 
FileStatus never changes (L186), when would you see that the two different 
chunking schemes differ?



--
To view, visit http://gerrit.cloudera.org:8080/10413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I138738d3e28e5daa1718c05c04cd9dd146c4ff84
Gerrit-Change-Number: 10413
Gerrit-PatchSet: 6
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 22 May 2018 18:27:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7019: Schedule EC as remote & disable failed tests

2018-05-22 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10413 )

Change subject: IMPALA-7019: Schedule EC as remote & disable failed tests
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10413/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/10413/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@118
PS6, Line 118: FileBlock.createFbFileBlock(fbb,
 :   loc.getOffset(), loc.getLength(),
 :   (short) 
hostIndex.getIndex(REMOTE_NETWORK_ADDRESS));
I know this is merged, but just wanted to revisit why we're storing block 
locations for ec files this way rather than synthesizing them via L137?



--
To view, visit http://gerrit.cloudera.org:8080/10413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I138738d3e28e5daa1718c05c04cd9dd146c4ff84
Gerrit-Change-Number: 10413
Gerrit-PatchSet: 6
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 22 May 2018 17:58:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-05-18 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8523 )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..


Patch Set 13:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8523/13/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/8523/13/be/src/scheduling/scheduler.cc@302
PS13, Line 302: for (const TScanRangeLocationList& range : 
entry.second.concrete_range) {
  :   expanded_locations.push_back(range);
  : }
> that could be just expanded_locations.insert(concrete_range.begin(), concre
Done


http://gerrit.cloudera.org:8080/#/c/8523/13/common/thrift/Planner.thrift
File common/thrift/Planner.thrift:

http://gerrit.cloudera.org:8080/#/c/8523/13/common/thrift/Planner.thrift@108
PS13, Line 108: concrete_range
> plural since the field is a list: concrete_ranges, split_specs
Done


http://gerrit.cloudera.org:8080/#/c/8523/13/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
File fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8523/13/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java@224
PS13, Line 224: scanRangeSpecs_.getSplit_specSize()
> this seems wrong - a single spec may result in multiple hosts, no?
removed. this class never adds specs, but added a precondition check in init in 
the event that's changed for some reason.


http://gerrit.cloudera.org:8080/#/c/8523/13/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8523/13/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1126
PS13, Line 1126: scanRangeSpecs_.getSplit_specSize();
> shouldn't that do some calculation based on file size and blocks size and i
good catch. needed to change other places that directly use the size of the 
spec list.



--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 13
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Sat, 19 May 2018 01:39:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-05-18 Thread Vuk Ercegovac (Code Review)
Hello Lars Volker, Dimitris Tsirogiannis, Alex Behm, Mostafa Mokhtar, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8523

to look at the new patch set (#14).

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..

IMPALA-5931: Generates scan ranges in planner for s3/adls

Currently, for filesystems that do not include physical
block information (e.g., block replica locations, caching),
synthetic blocks are generated and stored in the catalog
when metadata is loaded. Example file systems for which this is done
includes S3, ADLS, and local fs.

This change avoids generating these blocks when metadata is loaded.
Instead, scan ranges are directly generated from such files by the
backend coordinator. Previously, all scan ranges were produced by
the planner in HDFSScanNode in the frontend. Now, those files without
block information are sent to the coordinator represented by a split
specification that determines how the coordinator will create scan ranges
to send to executors.

This change reduces the space needed in the catalog and reduces the
scan range data structures that are passed from the frontend to the
backend when planning and coordinating a query.
In addition a bug is avoided where non-splittable files were being
split anyways to support the query parameter that places a limit on
scan ranges.

Testing:
- added backend scheduler tests
- mixed-filesystems test covers tables/queries with multiple fs's.
- local fs tests cover the code paths in this change
- all core tests pass when configured with s3
- manually tried larger local filesystem tables (tpch) with multiple
  partitions and observed the same scan ranges.
  - TODO: adls testing

Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
---
M be/src/scheduling/query-schedule.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/util/CMakeLists.txt
A be/src/util/flat_buffer.cc
A be/src/util/flat_buffer.h
M common/thrift/Frontend.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
21 files changed, 676 insertions(+), 241 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8523/14
--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 14
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR](2.x) IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-06-11 Thread Vuk Ercegovac (Code Review)
Hello Impala Public Jenkins,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/10692

to review the following change.


Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..

IMPALA-5931: Generates scan ranges in planner for s3/adls

Currently, for filesystems that do not include physical
block information (e.g., block replica locations, caching),
synthetic blocks are generated and stored in the catalog
when metadata is loaded. Example file systems for which this is done
includes S3, ADLS, and local fs.

This change avoids generating these blocks when metadata is loaded.
Instead, scan ranges are directly generated from such files by the
backend coordinator. Previously, all scan ranges were produced by
the planner in HDFSScanNode in the frontend. Now, those files without
block information are sent to the coordinator represented by a split
specification that determines how the coordinator will create scan ranges
to send to executors.

This change reduces the space needed in the catalog and reduces the
scan range data structures that are passed from the frontend to the
backend when planning and coordinating a query.
In addition a bug is avoided where non-splittable files were being
split anyways to support the query parameter that places a limit on
scan ranges.

Testing:
- added backend scheduler tests
- mixed-filesystems test covers tables/queries with multiple fs's.
- local fs tests cover the code paths in this change
- all core tests pass when configured with s3
- manually tried larger local filesystem tables (tpch) with multiple
  partitions and observed the same scan ranges.
- TODO: adls testing

Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Reviewed-on: http://gerrit.cloudera.org:8080/8523
Reviewed-by: Vuk Ercegovac 
Tested-by: Impala Public Jenkins 
---
M CMakeLists.txt
M be/src/scheduling/query-schedule.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/util/CMakeLists.txt
A be/src/util/flat_buffer.cc
A be/src/util/flat_buffer.h
M common/thrift/Frontend.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
22 files changed, 677 insertions(+), 245 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/10692/1
--
To view, visit http://gerrit.cloudera.org:8080/10692
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 10692
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-7144: Re-enable TestDescribeTableResults

2018-06-08 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10643 )

Change subject: IMPALA-7144: Re-enable TestDescribeTableResults
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10643/5/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
File fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java:

http://gerrit.cloudera.org:8080/#/c/10643/5/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java@1979
PS5, Line 1979: rowIdx++; // Ignore the first empty line.
  : Preconditions.checkElementIndex(rowIdx + 1, 
rows.size());
simplify:
rowIdx += 2; // Skips over the first empty line
Preconditions.checkElementIndex(rowIdx, rows.size());
cols = rows.get(rowIdx).getColVals();


http://gerrit.cloudera.org:8080/#/c/10643/5/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java@1985
PS5, Line 1985:   Preconditions.checkElementIndex(rowIdx + 1, 
rows.size());
simplify:
++rowIdx;
Preconditions.checkElementIndex(rowIdx, rows.size());
cols = rows.get(rowIdx).getColVals();



--
To view, visit http://gerrit.cloudera.org:8080/10643
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3aeaecf5b6d906a66d338e165a6d506e3964563f
Gerrit-Change-Number: 10643
Gerrit-PatchSet: 5
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 08 Jun 2018 15:59:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls

2018-06-08 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8523 )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
..


Patch Set 19:

rebased (clean). re-ran s3 tests (pass).


--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 19
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 08 Jun 2018 06:43:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7144: Re-enable TestDescribeTableResults

2018-06-08 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10643 )

Change subject: IMPALA-7144: Re-enable TestDescribeTableResults
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/10643
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3aeaecf5b6d906a66d338e165a6d506e3964563f
Gerrit-Change-Number: 10643
Gerrit-PatchSet: 6
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 08 Jun 2018 16:06:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7144: Re-enable TestDescribeTableResults

2018-06-08 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10643 )

Change subject: IMPALA-7144: Re-enable TestDescribeTableResults
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10643/4/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
File fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java:

http://gerrit.cloudera.org:8080/#/c/10643/4/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java@1974
PS4, Line 1974: location
does it matter if location is set multiple times for a given set of rows?


http://gerrit.cloudera.org:8080/#/c/10643/4/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java@1979
PS4, Line 1979: rowIdx
that's not the index that is accessed (++rowIdx).



--
To view, visit http://gerrit.cloudera.org:8080/10643
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3aeaecf5b6d906a66d338e165a6d506e3964563f
Gerrit-Change-Number: 10643
Gerrit-PatchSet: 4
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 08 Jun 2018 06:54:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6929: Support multi-column range partitions for Kudu

2018-06-08 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10441 )

Change subject: IMPALA-6929: Support multi-column range partitions for Kudu
..


Patch Set 1:

(4 comments)

looks good, mainly nits and clarifying questions.

http://gerrit.cloudera.org:8080/#/c/10441/1/fe/src/main/java/org/apache/impala/analysis/RangePartition.java
File fe/src/main/java/org/apache/impala/analysis/RangePartition.java:

http://gerrit.cloudera.org:8080/#/c/10441/1/fe/src/main/java/org/apache/impala/analysis/RangePartition.java@46
PS1, Line 46: Multi-value:
:  *   PARTITION VALUE = (val1, val2, ..., valn)
needs to updated?


http://gerrit.cloudera.org:8080/#/c/10441/1/fe/src/main/java/org/apache/impala/analysis/RangePartition.java@81
PS1, Line 81: CREATE
nit: and ALTER


http://gerrit.cloudera.org:8080/#/c/10441/1/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test:

http://gerrit.cloudera.org:8080/#/c/10441/1/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test@528
PS1, Line 528: '1'
why was this change needed?


http://gerrit.cloudera.org:8080/#/c/10441/1/testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test:

http://gerrit.cloudera.org:8080/#/c/10441/1/testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test@575
PS1, Line 575: ('a', 0) <= values < ('b', 1))
just for my own info: what happens if the specified ranges overlap? what will 
happen to tuples that are inserted into the overlap? just wondering if analysis 
should attempt to ban such cases.



--
To view, visit http://gerrit.cloudera.org:8080/10441
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0141dd3344a4f22b186f513b7406f286668ef1e7
Gerrit-Change-Number: 10441
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 08 Jun 2018 17:35:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7128 (part 1) Refactor interfaces for Db, View, Table, Partition

2018-06-07 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10611 )

Change subject: IMPALA-7128 (part 1) Refactor interfaces for Db, View, Table, 
Partition
..


Patch Set 2:

(14 comments)

made an initial pass on what seems to be the non-mechanical parts of this 
change.

http://gerrit.cloudera.org:8080/#/c/10611/2/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java
File fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/10611/2/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@175
PS2, Line 175: SimpleCatalog
nit: stale


http://gerrit.cloudera.org:8080/#/c/10611/2/fe/src/main/java/org/apache/impala/catalog/FeCatalog.java
File fe/src/main/java/org/apache/impala/catalog/FeCatalog.java:

http://gerrit.cloudera.org:8080/#/c/10611/2/fe/src/main/java/org/apache/impala/catalog/FeCatalog.java@86
PS2, Line 86: HDFS
nit: FS ... I realize that some of the existing classes were misnamed HDFS, 
which is too specific (could be local, s3, etc.). But since we're redo'ing this 
area, perhaps we should refer to FS instead of HDFS?


http://gerrit.cloudera.org:8080/#/c/10611/2/fe/src/main/java/org/apache/impala/catalog/FeDb.java
File fe/src/main/java/org/apache/impala/catalog/FeDb.java:

http://gerrit.cloudera.org:8080/#/c/10611/2/fe/src/main/java/org/apache/impala/catalog/FeDb.java@27
PS2, Line 27: from the frontend
nit: rephrase as "Frontend interface for interacting with a database."


http://gerrit.cloudera.org:8080/#/c/10611/2/fe/src/main/java/org/apache/impala/catalog/FeDb.java@31
PS2, Line 31: @return
Both styles exist (javadoc or not) in the code base, but would be preferable to 
be consistent per file (L39 method does not use @return, for example).


http://gerrit.cloudera.org:8080/#/c/10611/2/fe/src/main/java/org/apache/impala/catalog/FeHdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/FeHdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/10611/2/fe/src/main/java/org/apache/impala/catalog/FeHdfsPartition.java@31
PS2, Line 31: HDFS
FS


http://gerrit.cloudera.org:8080/#/c/10611/2/fe/src/main/java/org/apache/impala/catalog/FeHdfsPartition.java@31
PS2, Line 31: between
remove


http://gerrit.cloudera.org:8080/#/c/10611/2/fe/src/main/java/org/apache/impala/catalog/FeHdfsPartition.java@34
PS2, Line 34: Hdfs
I'd rename this to FS (couple comments about this)


http://gerrit.cloudera.org:8080/#/c/10611/2/fe/src/main/java/org/apache/impala/catalog/FeHdfsPartition.java@61
PS2, Line 61: /**
nit: add a newline. several other cases below.


http://gerrit.cloudera.org:8080/#/c/10611/2/fe/src/main/java/org/apache/impala/catalog/FeHdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/FeHdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10611/2/fe/src/main/java/org/apache/impala/catalog/FeHdfsTable.java@36
PS2, Line 36: Note: despite the "HDFS" nomenclature, this table type is also 
used for
:  * interacting with S3 or other Hadoop-compatible filesystems.
yup! shall we adopt the more general, preferred naming now?


http://gerrit.cloudera.org:8080/#/c/10611/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/10611/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@961
PS2, Line 961: Comparison
nit: Comparator


http://gerrit.cloudera.org:8080/#/c/10611/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@964
PS2, Line 964: KeyValueComparator
this change can introduce differences depending on how partitions are dealt 
with in various collections. why is this change needed in this patch?


http://gerrit.cloudera.org:8080/#/c/10611/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10611/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@563
PS2, Line 563: @Ov
nit: add a line


http://gerrit.cloudera.org:8080/#/c/10611/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@581
PS2, Line 581: @O
nit: add a line


http://gerrit.cloudera.org:8080/#/c/10611/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2016
PS2, Line 2016: HdfsPartition.KV_COMPARATOR
how did you find all places to make this type of update, Collections.sort or 
something broader?



--
To view, visit http://gerrit.cloudera.org:8080/10611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id55f7d2e94d81e66ce720acb6315f15a89621b31
Gerrit-Change-Number: 10611
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 07 Jun 2018 17:00:35 +

[Impala-ASF-CR] IMPALA-6987: [DOCS] Refactor the INVALIDATE METADATA and REFRESH docs

2018-06-08 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10339 )

Change subject: IMPALA-6987: [DOCS] Refactor the INVALIDATE METADATA and 
REFRESH docs
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/10339
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2124e14900d0f82569c061cc46006447bb054b36
Gerrit-Change-Number: 10339
Gerrit-PatchSet: 5
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 08 Jun 2018 17:45:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7128 (part 2): add an interface for data sources

2018-06-08 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10626 )

Change subject: IMPALA-7128 (part 2): add an interface for data sources
..


Patch Set 2: Code-Review+2

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10626/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10626/2//COMMIT_MSG@12
PS2, Line 12: IMPALA-7131
pls mark these with todo's that reference this jira.


http://gerrit.cloudera.org:8080/#/c/10626/2/fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
File fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java:

http://gerrit.cloudera.org:8080/#/c/10626/2/fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java@42
PS2, Line 42: Represents a table backed by an external data source.
pull this part up into the interface.


http://gerrit.cloudera.org:8080/#/c/10626/2/fe/src/main/java/org/apache/impala/catalog/FeDataSource.java
File fe/src/main/java/org/apache/impala/catalog/FeDataSource.java:

http://gerrit.cloudera.org:8080/#/c/10626/2/fe/src/main/java/org/apache/impala/catalog/FeDataSource.java@21
PS2, Line 21:  * Interface for interacting with data sources from the frontend.
nit: use consistent phrasing with the other interfaces
"Frontend interface for interacting with data sources."



--
To view, visit http://gerrit.cloudera.org:8080/10626
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe704197dc2ad7c09b8340865f17567096aa630e
Gerrit-Change-Number: 10626
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 08 Jun 2018 18:01:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6929: Support multi-column range partitions for Kudu

2018-06-08 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10441 )

Change subject: IMPALA-6929: Support multi-column range partitions for Kudu
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/10441
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0141dd3344a4f22b186f513b7406f286668ef1e7
Gerrit-Change-Number: 10441
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 08 Jun 2018 23:45:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6802 (part 4): Clean up authorization tests

2018-06-08 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10442 )

Change subject: IMPALA-6802 (part 4): Clean up authorization tests
..


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/10442/3/fe/src/test/java/org/apache/impala/analysis/AuthorizationTestV2.java
File fe/src/test/java/org/apache/impala/analysis/AuthorizationTestV2.java:

http://gerrit.cloudera.org:8080/#/c/10442/3/fe/src/test/java/org/apache/impala/analysis/AuthorizationTestV2.java@918
PS3, Line 918:   @Test
perhaps add a comment explaining what's being tested... from a first glance, it 
looks like certain privs can see certain fields, and others cannot? if there 
are docs or jiras that explain this more, pls reference them here.


http://gerrit.cloudera.org:8080/#/c/10442/3/fe/src/test/java/org/apache/impala/analysis/AuthorizationTestV2.java@922
PS3, Line 922:  
consistent spacing (see comment below)


http://gerrit.cloudera.org:8080/#/c/10442/3/fe/src/test/java/org/apache/impala/analysis/AuthorizationTestV2.java@938
PS3, Line 938:
and several more places below


http://gerrit.cloudera.org:8080/#/c/10442/3/fe/src/test/java/org/apache/impala/analysis/AuthorizationTestV2.java@946
PS3, Line 946: allExcept(
 : TPrivilegeLevel.ALL, TPrivilegeLevel.SELECT)
factor out (repeated 3 times in this block, so it adds too much noise).


http://gerrit.cloudera.org:8080/#/c/10442/3/fe/src/test/java/org/apache/impala/analysis/AuthorizationTestV2.java@952
PS3, Line 952: ng[]{"id"},
explain what is being tested here (missing why "id" is special)


http://gerrit.cloudera.org:8080/#/c/10442/3/fe/src/test/java/org/apache/impala/analysis/AuthorizationTestV2.java@1181
PS3, Line 1181:
use consistent spacing for this... from a brief look, seems like no space is 
the preference.



--
To view, visit http://gerrit.cloudera.org:8080/10442
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic4cf3a4751b41908ef81ec35d89a2713d9fa0dc4
Gerrit-Change-Number: 10442
Gerrit-PatchSet: 3
Gerrit-Owner: Adam Holley 
Gerrit-Reviewer: Adam Holley 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 08 Jun 2018 22:02:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7140 (part 1). Support fetching schema info in LocalCatalog

2018-06-12 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10630 )

Change subject: IMPALA-7140 (part 1). Support fetching schema info in 
LocalCatalog
..


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/10630/4/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
File fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java:

http://gerrit.cloudera.org:8080/#/c/10630/4/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java@37
PS4, Line 37: @param
> why not? I agree with not slavish adherence to Javadoc when the parameters
It was my understanding that flowing the param name in the description is 
preferable to using @param. I see that the project is inconsistent... some 
places use it, some don't, even in the same file. The several places where I've 
looked, it introduces duplication (e.g., Expr.castChild). Since we're far from 
being consistent, the @param doesn't seem to buy us much, so its terser to omit 
it. Open to alternatives though.


http://gerrit.cloudera.org:8080/#/c/10630/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java:

http://gerrit.cloudera.org:8080/#/c/10630/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@78
PS4, Line 78: schemaInfo;
> the problem is I needed to use tblName on line 75, so I need to checkNotNul
I think this looks fine:
this.db_ = Preconditions.checkNotNull(db);
this.name_ = Preconditions.checkNotNull(tblName);
this.schemaInfo = Preconditions.checkNotNull(schemaInfo);
Preconditions.checkArgument(tblName.toLowerCase().equals(tblName));


http://gerrit.cloudera.org:8080/#/c/10630/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@190
PS4, Line 190: implementations
> > Got confused about whether this is supposed to factor existing code
thats fine, but pls leave a breadcrumb for where some of the lifting comes from.


http://gerrit.cloudera.org:8080/#/c/10630/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@191
PS4, Line 191: stats
> does num bytes get cached in the HMS? I think I missed that
yes, have a look at setTableStats in Table.java



--
To view, visit http://gerrit.cloudera.org:8080/10630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I640f27e36198955e057da62a3ce25a858406e496
Gerrit-Change-Number: 10630
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 13 Jun 2018 00:09:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7140 (part 1). Support fetching schema info in LocalCatalog

2018-06-12 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10630 )

Change subject: IMPALA-7140 (part 1). Support fetching schema info in 
LocalCatalog
..


Patch Set 4:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/10630/4/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
File fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java:

http://gerrit.cloudera.org:8080/#/c/10630/4/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java@37
PS4, Line 37: @param
lets not use javadocs.


http://gerrit.cloudera.org:8080/#/c/10630/4/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java@57
PS4, Line 57: full
fully qualified?


http://gerrit.cloudera.org:8080/#/c/10630/4/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java@77
PS4, Line 77: name
fully qualified?


http://gerrit.cloudera.org:8080/#/c/10630/4/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java@80
PS4, Line 80: validateClusteringColumns
based on HdfsTable::addColumnsFromFieldSchemas. pls add a todo for a potential 
refactor there (would be nice not to duplicate these types of exception 
strings).


http://gerrit.cloudera.org:8080/#/c/10630/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10630/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@119
PS4, Line 119: public
todo: pull it up to FeFsTable or comment at site of use.


http://gerrit.cloudera.org:8080/#/c/10630/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java:

http://gerrit.cloudera.org:8080/#/c/10630/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@78
PS4, Line 78: schemaInfo;
I prefer the alternative style you used for these:
this.db_ = Preconditions.checkNotNull(db)


http://gerrit.cloudera.org:8080/#/c/10630/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@175
PS4, Line 175: HdfsTable
Table? (so that the child class is not used)
In either case, this is an odd dep. Seems like SchemaInfo can be the class that 
knows specifically about HMS, so the detail about what param to pull out would 
go there explicitly (then replace that method of Table)


http://gerrit.cloudera.org:8080/#/c/10630/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@175
PS4, Line 175: getRowCount
I saw that -1 is used as a default/unknown value in catalog.Table... needed 
here as well?


http://gerrit.cloudera.org:8080/#/c/10630/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@190
PS4, Line 190: Table
FeTable?


http://gerrit.cloudera.org:8080/#/c/10630/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@190
PS4, Line 190: implementations
Got confused about whether this is supposed to factor existing code in Table. 
Looks like it partially does-- pls clarify whether the intent is to factor or 
leave the duplication.


http://gerrit.cloudera.org:8080/#/c/10630/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@191
PS4, Line 191: stats
is num bytes needed as well?


http://gerrit.cloudera.org:8080/#/c/10630/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@263
PS4, Line 263: private boolean isClusteringColumn(Column c) {
duplicates the impl in Table, pls add a todo to refactor.



--
To view, visit http://gerrit.cloudera.org:8080/10630
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I640f27e36198955e057da62a3ce25a858406e496
Gerrit-Change-Number: 10630
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 12 Jun 2018 23:26:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7128 (part 2): add an interface for data sources

2018-06-12 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10626 )

Change subject: IMPALA-7128 (part 2): add an interface for data sources
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/10626
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe704197dc2ad7c09b8340865f17567096aa630e
Gerrit-Change-Number: 10626
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 12 Jun 2018 21:01:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7128 (part 1) Refactor interfaces for Db, View, Table, Partition

2018-06-12 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10611 )

Change subject: IMPALA-7128 (part 1) Refactor interfaces for Db, View, Table, 
Partition
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/10611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id55f7d2e94d81e66ce720acb6315f15a89621b31
Gerrit-Change-Number: 10611
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 12 Jun 2018 21:02:32 +
Gerrit-HasComments: No


  1   2   3   4   5   6   7   8   >