[Impala-ASF-CR] IMPALA-5262: test analytic order by random fails with assert

2017-05-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5262: test_analytic_order_by_random fails with assert
..


IMPALA-5262: test_analytic_order_by_random fails with assert

This was a poorly written test that relies on assumptions about
the behavior of 'rand' and the order that rows get processed in
a table that Impala doesn't actually guarantee.

The new version is still sensitive to the precise behavior of
'rand()', but shouldn't be flaky unless that behavior is changed.

Change-Id: If1ba8154c2b6a8d508916d85391b95885ef915a9
Reviewed-on: http://gerrit.cloudera.org:8080/6775
Reviewed-by: Alex Behm 
Tested-by: Impala Public Jenkins
---
M tests/query_test/test_sort.py
1 file changed, 5 insertions(+), 7 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: If1ba8154c2b6a8d508916d85391b95885ef915a9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning

2017-05-02 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition 
pruning
..


Patch Set 11: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-05-02 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#5).

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..

IMPALA-4864 Speed up single slot predicates with dictionaries

When dictionaries are present we can pre-evaluate conjuncts
against the dictionary values and simply look up the result.

Status of this diff: Compiles and starts.  Bitmap tests for new
functionality pass.  FE and BE tests passing.  Query test now
passing (run not finished).

Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
---
M be/src/common/global-flags.cc
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/parquet-scratch-tuple-batch.h
M be/src/util/bitmap-test.cc
M be/src/util/bitmap.h
M be/src/util/dict-encoding.h
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
12 files changed, 446 insertions(+), 185 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6726/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Joe McDonnell 


[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning

2017-05-02 Thread Zach Amsden (Code Review)
Hello Impala Public Jenkins, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6575

to look at the new patch set (#10).

Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition 
pruning
..

IMPALA-5180: Don't use non-deterministic exprs in partition pruning

Non-deterministic exprs which evaluate as constant should not be
used during HDFS partition pruning.  We consider Exprs which have no
SlotRefs as bound by default, and thus we end up trying to apply
them indisrciminately.  Constant propagation makes this situation
easier to run into and the behavior is rather unexpected.

The fix for now is to explicitly disallow non-deterministic Exprs
in partition pruning.

Change-Id: I91054c6bf017401242259a1eff5e859085285546
---
M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
4 files changed, 40 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/6575/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-05-02 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change.

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..


Patch Set 4:

(2 comments)

A couple quick observations

http://gerrit.cloudera.org:8080/#/c/6726/4/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

PS4, Line 1454: );
The front end orders conjuncts by selectivity and cost. When we pull them out 
and attach them to column materialization, the order is not preserved. If the 
conjunct is evaluated using the dictionary, this should be fine. If the 
conjunct is not evaluated from the dictionary, then it might result in a more 
expensive evaluation.

To put numbers on it:
Suppose there are two conjuncts A and B. A is expensive (cost = 10) and super 
selective (eliminates 0.99). B is cheap (cost = 1) and moderately selective 
(eliminates 0.50). The front end might put B first, so if B eliminates 50% of 
the row, then A is called 50% of the time to eliminate the rest. This has an 
amortized cost of 1 + 0.50 * 10 = 6, which is cheaper than calling A 100% of 
the time.

We can reorder the materialization of the columns at runtime using knowledge of 
which columns are dictionary encoded and which aren't.


PS4, Line 1467: endif
It should be possible to do this up in HdfsScanNode. As an example, see 
extractKuduConjuncts in KuduScanNode. This pulls out conjuncts that will be 
evaluated by Kudu.


-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals()

2017-05-02 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals()
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6610/7/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

Line 529:   }
> let's verify (DCHECK) that all tuples are dense at this point.
After implementing this DCHECK, it turns out we hit it quite often. The reason 
is we sometimes send materialized tuples without a mem layout to the BE (which 
are basically unusable) eg, when replacing plan trees with an EmptySetNode.

There is a TODO in DescriptorTable.toThrift() regarding this.


-- 
To view, visit http://gerrit.cloudera.org:8080/6610
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I426ad244678dbfe517262dfb7bbf4adc0247a35e
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning

2017-05-02 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition 
pruning
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6575/9/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java:

Line 112:   
!conjunct.contains(Expr.IS_NONDETERMINISTIC_BUILTIN_FN_PREDICATE)) {
> Since we ignore these predicates here, they still get attached to the plan 
Yes, I think so:
* dropping the predicate would be wrong
* evaluating the predicate against partitions would arguably change the query 
semantics based on a physical design choice
* the same query against an unpartitioned table might give very different 
results

So we are left with evaluating the predicate for every row.


-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals()

2017-05-02 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#7).

Change subject: IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals()
..

IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals()

The slot descriptor vectors are not guaranteed to be sorted on the slot
index within a tuple. As a result, TupleDescriptor::LayoutEquals()
sometimes returned a wrong result.

In this patch, we sort the vectors of slot descriptors on the slot index
within the tuple before comparing the vectors.

Testing:
- ran EE tests locally.

Change-Id: I426ad244678dbfe517262dfb7bbf4adc0247a35e
---
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-query/queries/QueryTest/union.test
5 files changed, 51 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/6610/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6610
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I426ad244678dbfe517262dfb7bbf4adc0247a35e
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-5262: test analytic order by random fails with assert

2017-05-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5262: test_analytic_order_by_random fails with assert
..


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/527/

-- 
To view, visit http://gerrit.cloudera.org:8080/6775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If1ba8154c2b6a8d508916d85391b95885ef915a9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5262: test analytic order by random fails with assert

2017-05-02 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5262: test_analytic_order_by_random fails with assert
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If1ba8154c2b6a8d508916d85391b95885ef915a9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals()

2017-05-02 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals()
..


Patch Set 7:

FE changes lgtm.

-- 
To view, visit http://gerrit.cloudera.org:8080/6610
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I426ad244678dbfe517262dfb7bbf4adc0247a35e
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] Experiment: glibc strncmp/memcmp appears much faster than SSE4.2

2017-05-02 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: Experiment: glibc strncmp/memcmp appears much faster than SSE4.2
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6768/1//COMMIT_MSG
Commit Message:

PS1, Line 17: memcmp is sse4.1-based
> The nightly perf test suite.
Where can I find information about the nightly perf test suite?


-- 
To view, visit http://gerrit.cloudera.org:8080/6768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4786a4a75fdaffedd6e17cf076b5368ba4b4e3e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Experiment: glibc strncmp/memcmp appears much faster than SSE4.2

2017-05-02 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: Experiment: glibc strncmp/memcmp appears much faster than SSE4.2
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6768/1//COMMIT_MSG
Commit Message:

PS1, Line 17: memcmp is sse4.1-based
> OK. Can you be specific about the benchmarks you'd like to see run, to avoi
The nightly perf test suite.


-- 
To view, visit http://gerrit.cloudera.org:8080/6768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4786a4a75fdaffedd6e17cf076b5368ba4b4e3e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Experiment: glibc strncmp/memcmp appears much faster than SSE4.2

2017-05-02 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: Experiment: glibc strncmp/memcmp appears much faster than SSE4.2
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6768/1//COMMIT_MSG
Commit Message:

PS1, Line 17: memcmp is sse4.1-based
> Oops, yeah cmp.
OK. Can you be specific about the benchmarks you'd like to see run, to avoid to 
many iterations of this patch set, including any node configuration or scale 
factor or file type restrictions you will require?

As a heads-up: I am not in a position to spend a lot of time setting up 
benchmarks that don't work out-of-the box with Apache Impala on a single node, 
so I may have to pass this off to someone else or just submit this change to 
the benchmark and let somebody else finish it if what you need requires more 
time than I have.


-- 
To view, visit http://gerrit.cloudera.org:8080/6768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4786a4a75fdaffedd6e17cf076b5368ba4b4e3e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Experiment: glibc strncmp/memcmp appears much faster than SSE4.2

2017-05-02 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: Experiment: glibc strncmp/memcmp appears much faster than SSE4.2
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6768/1//COMMIT_MSG
Commit Message:

PS1, Line 17: memcmp is sse4.1-based
> I don't understand - what use of memcpy? I was hoping only to replace the /
Oops, yeah cmp.

There aren't many direct callers but the caller callers are used in various 
places (and in and out of the IR), and over various data sizes/patterns. Which 
is why it'd be nice to verify with the benchmarks that everything's good.


-- 
To view, visit http://gerrit.cloudera.org:8080/6768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4786a4a75fdaffedd6e17cf076b5368ba4b4e3e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals()

2017-05-02 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals()
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6610/7/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

Line 529:   }
let's verify (DCHECK) that all tuples are dense at this point.


-- 
To view, visit http://gerrit.cloudera.org:8080/6610
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I426ad244678dbfe517262dfb7bbf4adc0247a35e
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Experiment: glibc strncmp/memcmp appears much faster than SSE4.2

2017-05-02 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: Experiment: glibc strncmp/memcmp appears much faster than SSE4.2
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6768/1//COMMIT_MSG
Commit Message:

PS1, Line 17: memcmp is sse4.1-based
> The particular query test wouldn't likely cover all the places we use memcp
I don't understand - what use of memcpy? I was hoping only to replace the 
/body/ of StringCompare with a call to mem/cmp/, and there aren't that many 
places where we call StringCompare.


-- 
To view, visit http://gerrit.cloudera.org:8080/6768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4786a4a75fdaffedd6e17cf076b5368ba4b4e3e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals()

2017-05-02 Thread Taras Bobrovytsky (Code Review)
Hello Impala Public Jenkins, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6610

to look at the new patch set (#7).

Change subject: IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals()
..

IMPALA-5188: Add slot sorting in TupleDescriptor::LayoutEquals()

The slot descriptor vectors are not guaranteed to be sorted on the slot
index within a tuple. As a result, TupleDescriptor::LayoutEquals()
sometimes returned a wrong result.

In this patch, we sort the vectors of slot descriptors on the slot index
within the tuple before comparing the vectors.

Testing:
- ran EE tests locally.

Change-Id: I426ad244678dbfe517262dfb7bbf4adc0247a35e
---
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-query/queries/QueryTest/union.test
5 files changed, 51 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/6610/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6610
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I426ad244678dbfe517262dfb7bbf4adc0247a35e
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-5031: remove undefined behavior: call to strncmp with nullptr

2017-05-02 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5031: remove undefined behavior: call to strncmp with 
nullptr
..


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6767/1/be/src/runtime/string-value.inline.h
File be/src/runtime/string-value.inline.h:

Line 59:   const int result = len > 0 ? strncmp(s1, s2, len) : 0;
> Example backtrace:
How about a short comment explaining strncmp() is undefined for nullptrs.  Some 
day, we may change things at a lower level so that even when str.len == 0, we 
have a non-null str.ptr, so we can avoid these checks. And so it would be good 
to leave a hint that this branch can be removed at that time.


-- 
To view, visit http://gerrit.cloudera.org:8080/6767
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id563e81720a0a4847664fa2828ecfdcad870da5b
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Experiment: glibc strncmp/memcmp appears much faster than SSE4.2

2017-05-02 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: Experiment: glibc strncmp/memcmp appears much faster than SSE4.2
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6768/1//COMMIT_MSG
Commit Message:

PS1, Line 17: memcmp is sse4.1-based
> Yes, it looks like it:
The particular query test wouldn't likely cover all the places we use memcpy. 
Probably our targeted perf tests and tpcds/tpch in the cluster benchmarks would 
give good coverage.


-- 
To view, visit http://gerrit.cloudera.org:8080/6768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4786a4a75fdaffedd6e17cf076b5368ba4b4e3e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-05-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..


IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

Before this change:
Hive adjusts timestamps by subtracting the local time zone's offset
from all values when writing data to Parquet files. Hive is internally
inconsistent because it behaves differently for other file formats. As
a result of this adjustment, Impala may read "incorrect" timestamp
values from Parquet files written by Hive.

After this change:
Impala reads Parquet MR timestamp data and adjusts values using a time
zone from a table property (parquet.mr.int96.write.zone), if set, and
will not adjust it if the property is absent. No adjustment will be
applied to data written by Impala.

New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE
LIKE  will set the table property to UTC if the global flag
--set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true.

HDFS tables created by Impala using CREATE TABLE LIKE 
will copy the property of the table that is copied.

This change also affects the way Impala deals with
--convert_legacy_hive_parquet_utc_timestamps global flag (introduced
in IMPALA-1658). The flag will be taken into account only if
parquet.mr.int96.write.zone table property is not set and ignored
otherwise.

Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Reviewed-on: http://gerrit.cloudera.org:8080/5939
Reviewed-by: Dan Hecht 
Tested-by: Impala Public Jenkins
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
A tests/query_test/test_parquet_timestamp_compatibility.py
28 files changed, 850 insertions(+), 74 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Dan Hecht: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-05-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..


Patch Set 11: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4866: Hash join node does not apply limits correctly

2017-05-02 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4866: Hash join node does not apply limits correctly
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6778/1/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 506:   while (true) {
there are a lot of other places that this loop can exit. Can't the out_batch 
contain too many rows for all of those exit paths as well?


-- 
To view, visit http://gerrit.cloudera.org:8080/6778
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I414124f8bb6f8b2af2df468e1c23418d05a0e29f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5246: UDF's Close() should handle Expr's preparation failure

2017-05-02 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-5246: UDF's Close() should handle Expr's preparation 
failure
..


Patch Set 1: Code-Review+1

(1 comment)

This change makes sense to me.

http://gerrit.cloudera.org:8080/#/c/6757/1//COMMIT_MSG
Commit Message:

PS1, Line 10:  
Put a period here


-- 
To view, visit http://gerrit.cloudera.org:8080/6757
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id2c689246ed4f8dd38f104fa35904f3926a7039c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5266 Impala ABM / LZCNT support

2017-05-02 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5266 Impala ABM / LZCNT support
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5821/5/be/src/util/bit-util.h
File be/src/util/bit-util.h:

Line 88: return (value >> bits) | (value << (64 - bits));
> Can't DCHECK in a constexpr; is documenting the limitation sufficient?
Turns out none of this matters.  clang now refuses to define lzcnt unless 
target("lzcnt") feature is set, which means we can't use the intrinsics.  Also, 
gutil/bits seems to have been upgraded in capabilities since, so some of this 
functionality is unnecessary.


-- 
To view, visit http://gerrit.cloudera.org:8080/5821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9f6a465ab4a9ee4f582847f8e211a779bdede3d2
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] [DOCS] Mention Kerberos and TLS for Kudu security

2017-05-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: [DOCS] Mention Kerberos and TLS for Kudu security
..


[DOCS] Mention Kerberos and TLS for Kudu security

Link to Apache Kudu docs for details of new security features.

Change-Id: I1266ad38468ef2e987aff54db35e6cafdacc
Reviewed-on: http://gerrit.cloudera.org:8080/6634
Reviewed-by: Matthew Jacobs 
Tested-by: Impala Public Jenkins
---
M docs/impala_keydefs.ditamap
M docs/shared/impala_common.xml
M docs/topics/impala_kudu.xml
3 files changed, 17 insertions(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6634
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I1266ad38468ef2e987aff54db35e6cafdacc
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] [DOCS] Mention Kerberos and TLS for Kudu security

2017-05-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: [DOCS] Mention Kerberos and TLS for Kudu security
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6634
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1266ad38468ef2e987aff54db35e6cafdacc
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] Mention Kerberos and TLS for Kudu security

2017-05-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: [DOCS] Mention Kerberos and TLS for Kudu security
..


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/112/

-- 
To view, visit http://gerrit.cloudera.org:8080/6634
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1266ad38468ef2e987aff54db35e6cafdacc
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5262: test analytic order by random fails with assert

2017-05-02 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-5262: test_analytic_order_by_random fails with assert
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6775/1/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

Line 189
> How about running this query and asserting that the result is sorted?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If1ba8154c2b6a8d508916d85391b95885ef915a9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5262: test analytic order by random fails with assert

2017-05-02 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new patch set (#2).

Change subject: IMPALA-5262: test_analytic_order_by_random fails with assert
..

IMPALA-5262: test_analytic_order_by_random fails with assert

This was a poorly written test that relies on assumptions about
the behavior of 'rand' and the order that rows get processed in
a table that Impala doesn't actually guarantee.

The new version is still sensitive to the precise behavior of
'rand()', but shouldn't be flaky unless that behavior is changed.

Change-Id: If1ba8154c2b6a8d508916d85391b95885ef915a9
---
M tests/query_test/test_sort.py
1 file changed, 5 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/6775/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If1ba8154c2b6a8d508916d85391b95885ef915a9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5266 Impala ABM / LZCNT support

2017-05-02 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-5266 Impala ABM / LZCNT support
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5821/5/be/src/util/bit-util.h
File be/src/util/bit-util.h:

Line 88: return (value >> bits) | (value << (64 - bits));
> Can't DCHECK in a constexpr; is documenting the limitation sufficient?
WFM


-- 
To view, visit http://gerrit.cloudera.org:8080/5821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9f6a465ab4a9ee4f582847f8e211a779bdede3d2
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5266 Impala ABM / LZCNT support

2017-05-02 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5266 Impala ABM / LZCNT support
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5821/5/be/src/util/bit-util.h
File be/src/util/bit-util.h:

Line 88: return (value >> bits) | (value << (64 - bits));
> DCHECK that this isn't the case?
Can't DCHECK in a constexpr; is documenting the limitation sufficient?


-- 
To view, visit http://gerrit.cloudera.org:8080/5821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9f6a465ab4a9ee4f582847f8e211a779bdede3d2
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5266 Impala ABM / LZCNT support

2017-05-02 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-5266 Impala ABM / LZCNT support
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5821/5/be/src/util/bit-util.h
File be/src/util/bit-util.h:

Line 88: return (value >> bits) | (value << (64 - bits));
> DCHECK that this isn't the case?
With a comment explaining why.


-- 
To view, visit http://gerrit.cloudera.org:8080/5821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9f6a465ab4a9ee4f582847f8e211a779bdede3d2
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5266 Impala ABM / LZCNT support

2017-05-02 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5266 Impala ABM / LZCNT support
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5821/5/be/src/util/bit-util.h
File be/src/util/bit-util.h:

Line 88: return (value >> bits) | (value << (64 - bits));
> This is undefined behavior when bits is 0 or 64:
DCHECK that this isn't the case?


-- 
To view, visit http://gerrit.cloudera.org:8080/5821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9f6a465ab4a9ee4f582847f8e211a779bdede3d2
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning

2017-05-02 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition 
pruning
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6575/9/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java:

Line 112:   
!conjunct.contains(Expr.IS_NONDETERMINISTIC_BUILTIN_FN_PREDICATE)) {
Since we ignore these predicates here, they still get attached to the plan node 
- resulting in this:

Failed tests: 
  
PlannerTest.testHdfs:96->PlannerTestBase.runPlannerTestFile:764->PlannerTestBase.runPlannerTestFile:759
 
Expected failure, but query produced PLAN.
Query:
select * from functional.alltypes where rand() > year

PLAN:
PLAN-ROOT SINK
|
00:SCAN HDFS [functional.alltypes]
   partitions=24/24 files=24 size=478.45KB
   predicates: year < rand()


So we can actually evaluate randomized predicates now, they just aren't used 
for partition filtering.  Do we actually want to do that?


-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4866: Hash join node does not apply limits correctly

2017-05-02 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4866: Hash join node does not apply limits correctly
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6778/1//COMMIT_MSG
Commit Message:

PS1, Line 11: exhcnage
typo


http://gerrit.cloudera.org:8080/#/c/6778/1/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

PS1, Line 580: >
>= ?


Line 581: out_batch->CommitRows(limit_ - num_rows_returned_);
I think you need to update num_rows_returned_ as well


http://gerrit.cloudera.org:8080/#/c/6778/1/tests/common/test_dimensions.py
File tests/common/test_dimensions.py:

PS1, Line 114: # Don't run with NUM_NODES=1 due to IMPALA-561
 : # ALL_CLUSTER_SIZES = [0, 1]
update comment


-- 
To view, visit http://gerrit.cloudera.org:8080/6778
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I414124f8bb6f8b2af2df468e1c23418d05a0e29f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3973: optional 3rd and 4th arguments for instr().

2017-05-02 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-3973: optional 3rd and 4th arguments for instr().
..


Patch Set 5:

Any updates on this? The comment about line 335 was left on January 3rd.

-- 
To view, visit http://gerrit.cloudera.org:8080/5589
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I17268bdb480230938f94559fe1eabe34ac2448b7
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-Reviewer: zi+z...@cloudera.com
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-05-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..


Patch Set 11:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/526/

-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-05-02 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..


Patch Set 11: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-05-02 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..


Patch Set 11:

> Uploaded patch set 11.

Fixed WARN_UNUSED_RESULT warning that failed clang-tidy job.

-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-05-02 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#11).

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..

IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

Before this change:
Hive adjusts timestamps by subtracting the local time zone's offset
from all values when writing data to Parquet files. Hive is internally
inconsistent because it behaves differently for other file formats. As
a result of this adjustment, Impala may read "incorrect" timestamp
values from Parquet files written by Hive.

After this change:
Impala reads Parquet MR timestamp data and adjusts values using a time
zone from a table property (parquet.mr.int96.write.zone), if set, and
will not adjust it if the property is absent. No adjustment will be
applied to data written by Impala.

New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE
LIKE  will set the table property to UTC if the global flag
--set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true.

HDFS tables created by Impala using CREATE TABLE LIKE 
will copy the property of the table that is copied.

This change also affects the way Impala deals with
--convert_legacy_hive_parquet_utc_timestamps global flag (introduced
in IMPALA-1658). The flag will be taken into account only if
parquet.mr.int96.write.zone table property is not set and ignored
otherwise.

Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
A tests/query_test/test_parquet_timestamp_compatibility.py
28 files changed, 850 insertions(+), 74 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/11
-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

2017-05-02 Thread Attila Jeges (Code Review)
Hello Impala Public Jenkins, Alex Behm, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5939

to look at the new patch set (#11).

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in 
Parquet
..

IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

Before this change:
Hive adjusts timestamps by subtracting the local time zone's offset
from all values when writing data to Parquet files. Hive is internally
inconsistent because it behaves differently for other file formats. As
a result of this adjustment, Impala may read "incorrect" timestamp
values from Parquet files written by Hive.

After this change:
Impala reads Parquet MR timestamp data and adjusts values using a time
zone from a table property (parquet.mr.int96.write.zone), if set, and
will not adjust it if the property is absent. No adjustment will be
applied to data written by Impala.

New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE
LIKE  will set the table property to UTC if the global flag
--set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true.

HDFS tables created by Impala using CREATE TABLE LIKE 
will copy the property of the table that is copied.

This change also affects the way Impala deals with
--convert_legacy_hive_parquet_utc_timestamps global flag (introduced
in IMPALA-1658). The flag will be taken into account only if
parquet.mr.int96.write.zone table property is not set and ignored
otherwise.

Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
A tests/query_test/test_parquet_timestamp_compatibility.py
28 files changed, 850 insertions(+), 74 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/5939/11
-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning

2017-05-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition 
pruning
..


Patch Set 9: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/525/

-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext

2017-05-02 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5483

to look at the new patch set (#8).

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
..

IMPALA-4192: Disentangle Expr and ExprContext

This change separates Expr and ExprContext. This is a preparatory
step for factoring out static data (e.g. Exprs) of plan fragments
to be shared by multiple plan fragment instances.

This change includes the followings:

1. Include aggregate functions (AggFn) as Expr. This separates
   AggFn from its evaluator. AggFn is similar to existing Expr
   as both are represented as a tree of Expr nodes but it doesn't
   really make sense to call Get*Val() on AggFn. This change
   restructures the class hierarchy: much of the existing Expr
   class is now renamed to ScalarExpr. Expr is the parent class
   of both AggFn and ScalarExpr. Expr is defined to be a tree
   with root of either AggFn or ScalarExpr and all descendants
   being ScalarExpr.

2. ExprContext is renamed to ScalarExprEvaluator which is the
   interface for evaluating ScalarExpr; AggFnEvaluator is the
   interface for evaluating AggFn. Multiple evaluators can be
   instantiated per Expr. Expr contains static states of an
   expression while evaluator contains runtime states needed
   for execution (i.e. evaluating the expression).

3. Update all exec nodes to instantiate Expr and their evaluators
   separately. ExecNode::Init() will be responsible for creating
   all the Exprs in an ExecNode while their evaluators are created
   in ExecNode::Prepare(). Certain evaluators are also moved into
   the data structures which actually utilize them. For instance,
   HashTableCtx now owns the build and probe expression evaluators.
   Similarly, TupleRowComparator and Sorter also own the evaluators.
   ExecNode which utilizes these data structures are only responsible
   for creating the expressions used by these data structures.

4. All codegen functions take Exprs instead of evaluators.

5. The assignment of index into the FunctionContext vector is now done
   during the construction of ScalarExpr. Evaluators are only responsible
   for allocating and initializing the FunctionContexts.

6. Open(), Prepare() are now removed from Expr classes. The interface
   for creating any Expr is via either ScalarExpr::Create() or AggFn::Create()
   which will convert a thrift Expr into an initialized Expr object.
   Similarly, Create() interface is used for creating evaluators from an
   intialized Expr object.

This separation allows the future change to introduce PlanNode data structures.
The plan is to move all ExecNode::Init() logic to PlanNode and call them once
per plan fragment.

Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/common/init.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/aggregation-node-ir.cc
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregation-node.h
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-join-node-ir.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-join-node.h
M be/src/exec/hash-table-ir.cc
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hbase-table-writer.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M