[Impala-ASF-CR] IMPALA-3436: Return a decimal when rounding a double

2017-11-15 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8398 )

Change subject: IMPALA-3436: Return a decimal when rounding a double
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8398/3/common/function-registry/impala_functions.py
File common/function-registry/impala_functions.py:

http://gerrit.cloudera.org:8080/#/c/8398/3/common/function-registry/impala_functions.py@285
PS3, Line 285:   [['round_v1','dround_v1'],
> We're not keeping decimal_v1 around indefinitely though are we?
* Yes, we want to remove decimal_v1.
* The alias does not solve the limitations of our V2 functions.
* For users that do not want to switch to V2 the V1 workaround is acceptable.
* For users who want to switch to V2 it seems weird to tell them to use a 
V1-semantics function as workaround, i.e., they'd get inconsistent rounding 
behavior in their workload or even within the same query.

I think it's better to be clear that it doesn't work in V2.



--
To view, visit http://gerrit.cloudera.org:8080/8398
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6
Gerrit-Change-Number: 8398
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 15 Nov 2017 23:25:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3436: Return a decimal when rounding a double

2017-11-15 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8398 )

Change subject: IMPALA-3436: Return a decimal when rounding a double
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8398/3/common/function-registry/impala_functions.py
File common/function-registry/impala_functions.py:

http://gerrit.cloudera.org:8080/#/c/8398/3/common/function-registry/impala_functions.py@285
PS3, Line 285:   [['round_v1','dround_v1'],
> If users rely on that behavior they can use DECIMAL_V1.
Users typically aren't happy with workarounds that involve changing their 
queries.



--
To view, visit http://gerrit.cloudera.org:8080/8398
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6
Gerrit-Change-Number: 8398
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 15 Nov 2017 22:45:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3436: Return a decimal when rounding a double

2017-11-15 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8398 )

Change subject: IMPALA-3436: Return a decimal when rounding a double
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8398/3/common/function-registry/impala_functions.py
File common/function-registry/impala_functions.py:

http://gerrit.cloudera.org:8080/#/c/8398/3/common/function-registry/impala_functions.py@285
PS3, Line 285:   [['round_v1','dround_v1'],
> Might as well add it now if we're going to add it.
If users rely on that behavior they can use DECIMAL_V1.

Allowing non-constant 2nd args in DECIMAL_V2 is really a new feature. If we 
want to that then imo we should focus on fixing the current functions to allow 
that in a separate patch.



--
To view, visit http://gerrit.cloudera.org:8080/8398
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6
Gerrit-Change-Number: 8398
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 15 Nov 2017 22:41:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3887: Use dfs.namenode.replication.min=3

2017-11-14 Thread Alex Behm (Code Review)
Alex Behm has abandoned this change. ( http://gerrit.cloudera.org:8080/8426 )

Change subject: IMPALA-3887: Use dfs.namenode.replication.min=3
..


Abandoned

It's sad but I couldn't make this change work reliably. Data loading frequently 
fails in Hive due to underreplicated HDFS blocks.

Abandoning to reflect reality.
--
To view, visit http://gerrit.cloudera.org:8080/8426
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93
Gerrit-Change-Number: 8426
Gerrit-PatchSet: 3
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-6183: Fix Decimal to Double conversion

2017-11-14 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8547 )

Change subject: IMPALA-6183: Fix Decimal to Double conversion
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8547
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9bf81d039e5037f22c64a32b328832235aafe9e3
Gerrit-Change-Number: 8547
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 14 Nov 2017 22:28:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE

2017-11-13 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
..


Patch Set 4:

(24 comments)

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@91
PS3, Line 91: /**
> I think If the predicate transfer is the starting point to consider whether
Fair point. I was thinking of a definition like this:

Slot A has a value transfer to slot B if for all rows containing both slots 
slot B has the same value as slot A or the tuple containing slot B is NULL.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@102
PS4, Line 102:  * Slot value transfer:
Slot value transfers:


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@105
PS4, Line 105:  * Its symmetric closure is a equivalence relation. Its 
equivalence class is called slot
What does "Its" refer to here?

I think it's easier to state less formally:

Each slot is contained in exactly one equivalence class. A slot equivalence 
class is a set of slots where each pair of slots has a mutual value transfer. 
Equivalence classes are assigned an arbitrary id to distinguish them from 
another.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1527
PS4, Line 1527:   // select * from (select A.a, B.b from A left join B on 
A.a=B.b) v where b is null
to drive it home even further use "B.col is null" as the predicate to show that 
the NULL-checking predicate is unrelated to the slots participating in the 
equivalence


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1648
PS4, Line 1648: // A map from equivalence class IDs to equivalence 
classese. The equivalence classes
typo: classes


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1813
PS4, Line 1813:* Returns a map of slot equivalence classes on the set of 
slots in given tuples.
the given tuples


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1838
PS4, Line 1838:* propagation. Each mapping assigns every slot in srcSids to 
slot in destTid which has
Each mapping assigns every slot in srcSids to a slot in destTid which has a 
value transfer from srcSid.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1989
PS4, Line 1989: + "\n" + tc + "Condensed Graph:\n" + condensedTc);
move the first "\n" to previous line


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1998
PS4, Line 1998: // transform equality predicates into a transfer graph
remove (pretty clear from function comment)


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@691
PS4, Line 691:   interface SlotRefComparator {
Move this into SlotRef since it's SlotRef specific?


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@696
PS4, Line 696:* Returns if this expr matches 'that'. Two exprs match if:
Returns true if this expr matches 'that'


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@699
PS4, Line 699:* 2. For every pair of corresponding SlotRef, 
slotRefCmp.matches returns true.
For every pair of corresponding SlotRefs, slotRefCmp.matches() returns true.


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@700
PS4, Line 700:* 3. For every pair of corresponding non-SlotRef exprs, 
localEquals returns true.
localEquals()

(we generally use () for function names to make it clear we are referring to a 
function)


http://gerrit.cloudera.org:8080/#/c/8317/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@719
PS4, Line 719: if (fn_ == null && that.fn_ == null) return true;
I think we should separate matches() and localEquals() more cleanly. I think 
localEquals() should be non-abstract and have a default implementation that 
checks the type and fn_ of this and that. Subclasses can override and call 
super.localEquals() first.



[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-11-13 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..


Patch Set 14: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/7793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 14
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Anonymous Coward #345
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 13 Nov 2017 23:28:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4704: Turns on client connections when local catalog initialized.

2017-11-13 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8202 )

Change subject: IMPALA-4704: Turns on client connections when local catalog 
initialized.
..


Patch Set 24: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8202
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6
Gerrit-Change-Number: 8202
Gerrit-PatchSet: 24
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Mon, 13 Nov 2017 17:51:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-11-09 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..


Patch Set 14: Code-Review+1

(1 comment)

I'm happy with this change. Nice work.

http://gerrit.cloudera.org:8080/#/c/7793/14/be/src/util/min-max-filter.cc
File be/src/util/min-max-filter.cc:

http://gerrit.cloudera.org:8080/#/c/7793/14/be/src/util/min-max-filter.cc@166
PS14, Line 166: // For integer filter types, add a GetCastIntMinMax() that 
return the min/max values for
Sorry I didn't realize this was already in the .h file. Better to remove this 
comment in favor of the one in .h



--
To view, visit http://gerrit.cloudera.org:8080/7793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 14
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Anonymous Coward #345
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 10 Nov 2017 05:54:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6164: Fix stale query profile in TestAlwaysFalseFilter

2017-11-09 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8498 )

Change subject: IMPALA-6164: Fix stale query profile in TestAlwaysFalseFilter
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8498
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I04bb76d20541fa035d88167b593d1b8bc3873e89
Gerrit-Change-Number: 8498
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Comment-Date: Fri, 10 Nov 2017 01:14:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] Remove unused/defunct Maven repositories.

2017-11-08 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8497 )

Change subject: Remove unused/defunct Maven repositories.
..


Patch Set 1: Code-Review+2

Nice, thanks


--
To view, visit http://gerrit.cloudera.org:8080/8497
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I79eb6c483561726c7cbaf86874001f1979128720
Gerrit-Change-Number: 8497
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Thu, 09 Nov 2017 00:31:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6164: Fix stale query profile in TestAlwaysFalseFilter

2017-11-08 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8498 )

Change subject: IMPALA-6164: Fix stale query profile in TestAlwaysFalseFilter
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8498
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I04bb76d20541fa035d88167b593d1b8bc3873e89
Gerrit-Change-Number: 8498
Gerrit-PatchSet: 3
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Comment-Date: Thu, 09 Nov 2017 00:29:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6164: Fix stale query profile in TestAlwaysFalseFilter

2017-11-08 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8498 )

Change subject: IMPALA-6164: Fix stale query profile in TestAlwaysFalseFilter
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8498/2/tests/custom_cluster/test_always_false_filter.py
File tests/custom_cluster/test_always_false_filter.py:

http://gerrit.cloudera.org:8080/#/c/8498/2/tests/custom_cluster/test_always_false_filter.py@45
PS2, Line 45:   cursor.fetchall()
comment why we need to fetchall here



--
To view, visit http://gerrit.cloudera.org:8080/8498
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I04bb76d20541fa035d88167b593d1b8bc3873e89
Gerrit-Change-Number: 8498
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Comment-Date: Thu, 09 Nov 2017 00:23:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6173: Fix SHOW CREATE TABLE for unpartitioned Kudu tables

2017-11-08 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8506 )

Change subject: IMPALA-6173: Fix SHOW CREATE TABLE for unpartitioned Kudu tables
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icc327266cfb8b5c05efec97348528cea6904bb20
Gerrit-Change-Number: 8506
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Comment-Date: Thu, 09 Nov 2017 00:04:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-11-08 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..


Patch Set 13:

(7 comments)

Looks pretty good to me

http://gerrit.cloudera.org:8080/#/c/7793/13/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/7793/13/be/src/exec/kudu-scanner.cc@200
PS13, Line 200: if (!filter->GetCastIntMinMax(col_type, _min, 
_max)) {
Unfortunate that that every Kudu client has to do this.

We should consider filing a JIRA against Kudu to address perform this logic on 
their side.


http://gerrit.cloudera.org:8080/#/c/7793/13/be/src/util/min-max-filter.cc
File be/src/util/min-max-filter.cc:

http://gerrit.cloudera.org:8080/#/c/7793/13/be/src/util/min-max-filter.cc@139
PS13, Line 139:   return std::numeric_limits::max();
using std::numeric_limits; at the top if this file


http://gerrit.cloudera.org:8080/#/c/7793/13/be/src/util/min-max-filter.cc@165
PS13, Line 165: #define NUMERIC_MIN_MAX_FILTER_CAST(NAME)   
\
Brief comment especially about the return value would be good.


http://gerrit.cloudera.org:8080/#/c/7793/13/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/7793/13/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@604
PS13, Line 604: // must be a SlotRef pointing to a column. Wc can allow 
implicit integer casts
typo: We can allow implicit integer casts


http://gerrit.cloudera.org:8080/#/c/7793/13/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@605
PS13, Line 605: // by casting the min/max value before sending them to 
Kudu.
min/max values


http://gerrit.cloudera.org:8080/#/c/7793/13/testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test
File testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test:

http://gerrit.cloudera.org:8080/#/c/7793/13/testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test@98
PS13, Line 98: where a.tinyint_col = b.int_col
Let's make the min/max filter selective, e.g. by adding where b.int_col in 
(0,1) or something like that


http://gerrit.cloudera.org:8080/#/c/7793/13/testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test@103
PS13, Line 103: # The min/max values in the filter are outside the range 
representable for the target col.
Let's also add a non-selective case where the min/max values fall outside the 
range of the target column, something like:

select STRAIGHT_JOIN count(*)
from alltypes a join [BROADCAST]
  (values(min_int() x), (max_int())) v
where a.tinyint_col = v.x



--
To view, visit http://gerrit.cloudera.org:8080/7793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 13
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Anonymous Coward #345
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 09 Nov 2017 00:02:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3436: Return a decimal when rounding a double

2017-11-06 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8398 )

Change subject: IMPALA-3436: Return a decimal when rounding a double
..


Patch Set 3: Code-Review+1

(4 comments)

I'm happy with the FE changes. Would be good to get a second pair of eyes on 
the BE portion.

http://gerrit.cloudera.org:8080/#/c/8398/1/be/src/exprs/decimal-functions-ir.cc
File be/src/exprs/decimal-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8398/1/be/src/exprs/decimal-functions-ir.cc@111
PS1, Line 111:   DCHECK(!scale.is_null);
> Added a test.
Agree it's good to be consistent, but it might bite users when we switch 
DECIMAL_V2 to the default (their queries that have non-constant args in round() 
will not analyze). We should be sure to clearly document this new 
behavior/limitation.


http://gerrit.cloudera.org:8080/#/c/8398/1/common/function-registry/impala_functions.py
File common/function-registry/impala_functions.py:

http://gerrit.cloudera.org:8080/#/c/8398/1/common/function-registry/impala_functions.py@284
PS1, Line 284:   # TODO: Remove when DECIMAL_V1 is removed.
> Because what if someone typed in "select dround(x)". We need to be able to
Good point. I was thinking dround() vs round() in toSql() doesn't matter that 
much, but probably it's better to preserve the existing behavior like you 
suggest.


http://gerrit.cloudera.org:8080/#/c/8398/3/common/function-registry/impala_functions.py
File common/function-registry/impala_functions.py:

http://gerrit.cloudera.org:8080/#/c/8398/3/common/function-registry/impala_functions.py@289
PS3, Line 289:   [['round','dround','round_v1','dround_v1'],
This function is used both decimal_v1/v2 modes.


http://gerrit.cloudera.org:8080/#/c/8398/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java:

http://gerrit.cloudera.org:8080/#/c/8398/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java@2439
PS3, Line 2439: AnalyzesOk("select round(cast(1.123 as double), int_col) 
from functional.alltypes",
add a positive test with a non-NULL constant second argument to both v1/v2



--
To view, visit http://gerrit.cloudera.org:8080/8398
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6
Gerrit-Change-Number: 8398
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Comment-Date: Tue, 07 Nov 2017 01:26:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters

2017-11-06 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8461 )

Change subject: IMPALA-6151: add query-level fragment/backend counters
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8461/3/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8461/3/be/src/service/impala-server.cc@1052
PS3, Line 1052:   ImpaladMetrics::NUM_QUERIES_REGISTERED->Increment(-1L);
> I made the change in PS4
Oops, missed that, sorry!



--
To view, visit http://gerrit.cloudera.org:8080/8461
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9
Gerrit-Change-Number: 8461
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 06 Nov 2017 23:16:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters

2017-11-06 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8461 )

Change subject: IMPALA-6151: add query-level fragment/backend counters
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8461
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9
Gerrit-Change-Number: 8461
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 06 Nov 2017 23:16:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

2017-11-06 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber 
instances
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc@278
PS3, Line 278:   lock_guard l(subscribers_lock_);
 :   lock_guard t(topic_lock_);
> IMO, we shouldn't use spinlock for topic_lock_ since we can potentially do
My general opinion on this is "if it ain't broke, don't fix it". Do we have any 
evidence that these locks are a problem, or is this a case of premature 
optimization?

In any case, this is not the main focus of this patch, so let's avoid creeping 
in unrelated changes. I vote for reverting.

If we have evidence that a mutex is not good here, then let's change that in a 
separate patch.



--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Mon, 06 Nov 2017 22:05:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters

2017-11-06 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8461 )

Change subject: IMPALA-6151: add query-level fragment/backend counters
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8461/3/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8461/3/be/src/service/impala-server.cc@1052
PS3, Line 1052:   ImpaladMetrics::NUM_QUERIES_REGISTERED->Increment(-1L);
> At this point the query has been removed from all tracking structures so I
My understanding is that archiving can take a long time (e.g. huge profile), so 
it means this metric may be inaccurate for an extended period.

I agree it probably doesn't matter that much, but why not try to make the 
counter as accurate as possible. There is a non-trivial amount of work to be 
done at this point.



--
To view, visit http://gerrit.cloudera.org:8080/8461
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9
Gerrit-Change-Number: 8461
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 06 Nov 2017 21:54:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters

2017-11-03 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8461 )

Change subject: IMPALA-6151: add query-level fragment/backend counters
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc@993
PS2, Line 993:   RETURN_IF_ERROR(CancelInternal(query_id, check_inflight, 
cause));
> The only reason CancelInternal() can fail is if the query isn't registered
Good point


http://gerrit.cloudera.org:8080/#/c/8461/3/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8461/3/be/src/service/impala-server.cc@1052
PS3, Line 1052:   ImpaladMetrics::NUM_QUERIES_REGISTERED->Increment(-1L);
Why keep it above ArchiveQuery()?

Not your change, but the function comment on ArchiveQuery seems wrong and 
confusing with decrementing the metric here.



--
To view, visit http://gerrit.cloudera.org:8080/8461
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9
Gerrit-Change-Number: 8461
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 04 Nov 2017 04:23:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1422: support a constant on LHS of IN predicates.

2017-11-03 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8322 )

Change subject: IMPALA-1422: support a constant on LHS of IN predicates.
..


Patch Set 6:

(12 comments)

Some high-level comments before I dig deeper.

http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
File fe/src/main/java/org/apache/impala/analysis/SlotRef.java:

http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@94
PS2, Line 94: // TODO: derived slot refs (e.g., star-expanded) will not 
have rawPath set.
Why can't this be addressed in this patch?


http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@287
PS2, Line 287:* rewritten, null is returned. If 'inPred' is rewritten, the 
resulting expression
... and the RHS is a subquery.


http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@299
PS2, Line 299:* 2) Predicate is NOT IN and RHS returns a single row.
What if the RHS subquery is correlated?


http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@319
PS2, Line 319:* NOT EXISTS (RHS AND (C IS NULL OR (CASE WHEN e IS NULL 
THEN C ELSE e END) = C)
Is the transformation even correct if RHS is correlated?


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@312
PS6, Line 312:*a) No group by or analytic function in subquery.
Ideally, we should not have to distinguish between cases 3a and 3b, and we 
should always do this rewrite:

C NOT IN (RHS)

Rewrites to:

NOT EXISTS (SELECT 1 FROM (RHS) v WHERE C IS NULL OR IFNULL(v.e, C) = C)

My understanding is that such a rewrite dues not work for case 3a when there is 
correlation, so we use an alternate rewrite that does not require a new inline 
view. I think we should reorganize the comments here to reflect this line of 
thinking and the limitations. For example, something like this:


3) Predicate is NOT IN and RHS returns multiple rows.

General rewrite: 


a) No group by or analytic


b) Group by  or analytic function in subquery



http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@316
PS6, Line 316:*REWRITE:
Use IFNULL() instead of CASE to simplify.


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@319
PS6, Line 319:* NOT EXISTS (RHS AND (C IS NULL OR (CASE WHEN e IS NULL 
THEN C ELSE e END) = C)
I think there's another case where this rewrite is not correct: when the 
subquery has an order by + limit. Basically, we are reimplementing the 
"predicate push down" correctness checks here.

I'm wondering whether we should reduce the scope of this patch and only allow 
those case where the generic rewrite works. In a separate JIRA we should 
consider whether we can make the generic rewrite also work for case 3a, and if 
not what to do instead.

What do you think?


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@321
PS6, Line 321:*Example:
The following query gives me an IllegalStateException:

explain select id from functional.alltypes t1 where 10 not in (select id from 
functional.alltypestiny order by id limit 2);

Existing bug or related to this change?


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@328
PS6, Line 328:*Note: it useful to think of C NOT IN (RHS) as the 
boolean expansion:
it is useful


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@340
PS6, Line 340:*b) Group by  or analytic function in subquery.
extra space after "by"


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@345
PS6, Line 345:*  NOT EXISTS (SELECT x FROM (RHS) t WHERE C = t or t IS 
NOT NULL)
What does C=t mean here?
"t" is a table alias and cannot be compared to a C

It's not immediately clear to me why the rewrite here is equivalent to the 
generic rewrite described in 3a. Please explain (see my suggestion above). 
Alternatively, I'm also ok if you want to just use the generic rewrite here.


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@354
PS6, Line 354:* - Assumes that subquery is uncorrelated, 

[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters

2017-11-03 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8461 )

Change subject: IMPALA-6151: add query-level fragment/backend counters
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc@993
PS2, Line 993:   RETURN_IF_ERROR(CancelInternal(query_id, check_inflight, 
cause));
What if we hit an error here? Need to still maintain the metric.


http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc@1008
PS2, Line 1008:   ImpaladMetrics::NUM_QUERIES_REGISTERED->Increment(-1L);
To be most accurate, shouldn't we move this to the end of this function? 
There's still some non-trivial work in thus function after this point.


http://gerrit.cloudera.org:8080/#/c/8461/2/common/thrift/metrics.json
File common/thrift/metrics.json:

http://gerrit.cloudera.org:8080/#/c/8461/2/common/thrift/metrics.json@477
PS2, Line 477: "label": "Queries Registered",
Imo we need to be careful with terminology. Our query states are already hard 
to reason about for users.
How do the queries displayed in our "/queries" page relate to "registered" 
queries? I know the answer, but users might not due to confusion with what 
"registered", "in-flight", and "waiting-to-be-closed" mean.

I don't really have a good alternative to "registered", but we should at least 
add in the description that "registered" refers to "in-flight" and 
"waiting-to-be-closed" queries from the "/queries" page.



--
To view, visit http://gerrit.cloudera.org:8080/8461
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9
Gerrit-Change-Number: 8461
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 23:51:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-11-03 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..


Patch Set 11:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@27
PS9, Line 27:
> RUNTIME_FILTER_WAIT_TIME_MS applies the same to bloom and min-max - its the
Thanks for explaining.
Can you add this text into a Google Doc for us to keep track of the 
evolution/meaning of these query options? No need to polish, just put it 
somewhere.

I think the new types of filters will affect our query options in non-trivial 
ways and we should come up with a plan that minimizes user confusion, adding 
new options, and deprecating options.


http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@41
PS9, Line 41:
> Those results are posted in the review comments. I can include them here as
Thanks. No need to add here.


http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/ImpalaInternalService.thrift@202
PS9, Line 202:   // Maximum number of bloom runtime filters allowed per query
> There's basically two reasons for this:
Thanks. Please add this text to a Google Doc for tracking the evolution of 
these query options.

Even though the min/max filters are smaller and bounded in size, I think 
extreme queries with a large number of joins and join conditions can still 
cause havoc. Keep in mind that we now allow an *unbounded* number of such 
filters. Crazy queries will happen.

We should not hold this patch, but the lack of safeguards is concerning to me 
and we should continue thinking.


http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/PlanNodes.thrift@103
PS9, Line 103:   12: optional string kudu_col_name
case sensitive?


http://gerrit.cloudera.org:8080/#/c/7793/11/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/7793/11/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@359
PS11, Line 359: public Operator getJoinOp() { return exprCmpOp_; }
getExprCmpOp()


http://gerrit.cloudera.org:8080/#/c/7793/11/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@602
PS11, Line 602: if (!(targetExpr instanceof SlotRef)
I think only explicit casts are problematic. Implicit casts should be ok, or am 
I missing something?


http://gerrit.cloudera.org:8080/#/c/7793/11/testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
File testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test:

http://gerrit.cloudera.org:8080/#/c/7793/11/testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test@144
PS11, Line 144: on a.month = cast(b.month + 1 as int);
Was this your change? Why the change?



--
To view, visit http://gerrit.cloudera.org:8080/7793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 11
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Anonymous Coward #345
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 03 Nov 2017 23:22:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Correct log line in start-impala-cluster.py.

2017-11-02 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8432 )

Change subject: Correct log line in start-impala-cluster.py.
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8432
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ceece1c05b9a4ca9f0a08fa30d195f811490c0e
Gerrit-Change-Number: 8432
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Thu, 02 Nov 2017 22:21:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] Correct log line in start-impala-cluster.py.

2017-11-01 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8432 )

Change subject: Correct log line in start-impala-cluster.py.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8432/1/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

http://gerrit.cloudera.org:8080/#/c/8432/1/bin/start-impala-cluster.py@398
PS1, Line 398: executors = options.cluster_size - options.num_coordinators
Could executors be negative due to the same bug?



--
To view, visit http://gerrit.cloudera.org:8080/8432
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ceece1c05b9a4ca9f0a08fa30d195f811490c0e
Gerrit-Change-Number: 8432
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Comment-Date: Wed, 01 Nov 2017 23:58:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-11-01 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..


Patch Set 9:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@13
PS9, Line 13: In RuntimeFilterGenerator in the planner, each partitioned hash 
join
... each hash join node generates a bloom and min-max filter for each equi-join 
predicate, but only those ...


http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@27
PS9, Line 27: For now, min-max filters are only applied at the KuduScanner, 
which
Not specific to the code changes, and I don't expect a response here (probably 
too long :)).

How do the existing query options around runtime filters affect the new min/max 
filters on Kudu? For example, what does DISABLE_ROW_RUNTIME_FILTERING mean for 
the Kudu min/max filters?

How should users think about setting:
RUNTIME_FILTER_WAIT_TIME_MS

In particular, are min/max filters more effective against Kudu PK or partition 
columns?


http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@41
PS9, Line 41: Perf Testing:
Contrived extreme queries are good data points, but how about running the 
TPCH/DS perf suites against Kudu?


http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@44
PS9, Line 44: - Ran a contrived query with a filter that does eliminate any rows
does not eliminate


http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/ImpalaInternalService.thrift@202
PS9, Line 202:   // Maximum number of bloom runtime filters allowed per query
I think I understand why you did this, but it seems confusing from a user's 
perspective. Ok to leave, but do you have a story around the eventual meaning 
of existing query options when HDFS can do min/max and Kudu can do bloom?


http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/PlanNode.java@730
PS9, Line 730: output.append(Joiner.on(", ").join(filtersStr) + "\n");
just return the string? don't think we need the 'output' StringBuilder


http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@111
PS9, Line 111: private final Operator joinOp_;
Let's call this cmpOp_ or exprCmpOp_ or something else because "joinOp_" 
usually indicates the join type like left outer, right outer, etc.


http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@170
PS9, Line 170:   SlotRef slotRef = expr.unwrapSlotRef(false);
Add a comment stating that the validity of this is checked elsewhere (and where 
exactly)


http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@368
PS9, Line 368:   if (node instanceof HdfsScanNode && type_ != 
TRuntimeFilterType.BLOOM) {
I feel like these checks belong in the caller. Having an addTarget() function 
be a co-op in some cases seems difficult to reason about. Can be cleaned with a 
isValidTarget() helper function.


http://gerrit.cloudera.org:8080/#/c/7793/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@463
PS9, Line 463: // We only enforce a limit on the number of bloom filters as 
they are much more
This seems really confusing for users. I'm ok with checking in this version, 
and let's discuss separately.


http://gerrit.cloudera.org:8080/#/c/7793/9/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/7793/9/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test@1248
PS9, Line 1248: |  runtime filters: RF004 <- o_orderkey, RF005 <- o_clerk
To me the re-numbering is a little strange. We can think about how to address 
this in a follow-on change. I'm thinking that ideally users should be able to 
quickly determine the number of runtime filters based on the max RF id, so we 
could assign the real RF id lazily instead of eagerly.


http://gerrit.cloudera.org:8080/#/c/7793/9/testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test:


[Impala-ASF-CR] IMPALA-5976: Remove equivalence class computation in FE

2017-11-01 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalence class computation in FE
..


Patch Set 3:

(62 comments)

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
File fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java@281
PS3, Line 281: return new FunctionCallExpr("if", ifParams);
Let's please avoid code changes unrelated to the purpose of this patch as much 
as reasonable. Cleanup is nice in general, but this patch is already complex 
and huge so let's not add anything extra.

Such changes can also make backporting more difficult due to conflicts, so 
cleanup should be done in a separate change.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@91
PS3, Line 91:  * Slot A has value transfer to slot B if a predicate on A can be 
applied to B in most
We need a precise definition. The original definition was more precise.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@93
PS3, Line 93:  * It is a reflexive, transitive binary relation on the set of 
slots.
Not sure this part adds value. What's the significance of these statements with 
respect to our use of value transfers for planning purposes? If we don't make 
use of these terms/properties anywhere else in the code, then we should remove 
these statements.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@277
PS3, Line 277: // The SCC-condensed graph representation of value transfer 
relationship.
The SCC-condensed graph representation of all slot value transfers.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@278
PS3, Line 278: SccCondensedGraph valueTransferGraph;
public/private?


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1146
PS3, Line 1146:* Create and register an auxiliary predicate to express an 
mutual value transfer
a mutual value transfer


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1467
PS3, Line 1467:* by replacing the slots of a source predicate with slots of 
the destTid, if for each
how about: if every source slot has a value transfer to a slot in destTid


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1513
PS3, Line 1513:   // its referenced tuples are NULL. For example:
Let's simplify the example and make it as clear as possible:

select * from (select A.a, B.b from A left join B on A.a=B.b) v where b is null


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1618
PS3, Line 1618:* For each slot equivalence class, adds/removes predicates 
from conjuncts such that it
Need a definition of equivalence class in the Analyzer class comment. You do 
mention the term "equivalence class" but I don't think it has the same meaning 
of the "equivalence class" terminology used here.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1636
PS3, Line 1636: // A map from equivalence class ids to slot equivalence 
classes containing on slots
Garbled sentence, please clean up


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1956
PS3, Line 1956:* Compute the value transfer graph based on the registered 
value transfer relation
the registered value transfers


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1973
PS3, Line 1973: String condensedTc = 
globalState_.valueTransferGraph.print();
missing space in string


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1982
PS3, Line 1982:* Populate the value transfer relation from eq conjuncts to 
graph 'g'.
Add value-transfer edges to 'g' based on the registered equi-join conjuncts.


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2059
PS3, Line 2059:* Returns the equivalence class of slotID.
of the given slot id


http://gerrit.cloudera.org:8080/#/c/8317/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2062
PS3, Line 2062:   public List getEquivClass(SlotId slot) {
slot -> sid



[Impala-ASF-CR] IMPALA-3887: Use dfs.namenode.replication.min=3

2017-10-31 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8426 )

Change subject: IMPALA-3887: Use dfs.namenode.replication.min=3
..


Patch Set 3:

I am running into MAPREDUCE-6441 with this patch precisely during data load.


--
To view, visit http://gerrit.cloudera.org:8080/8426
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93
Gerrit-Change-Number: 8426
Gerrit-PatchSet: 3
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Tue, 31 Oct 2017 21:15:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4704: Turns on client connections when local catalog initialized.

2017-10-31 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8202 )

Change subject: IMPALA-4704: Turns on client connections when local catalog 
initialized.
..


Patch Set 19: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/8202
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6
Gerrit-Change-Number: 8202
Gerrit-PatchSet: 19
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 31 Oct 2017 19:06:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4704: Turns on client connections when local catalog initialized.

2017-10-31 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8202 )

Change subject: IMPALA-4704: Turns on client connections when local catalog 
initialized.
..


Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.h@1014
PS16, Line 1014:   /// Flag that records if backend and/or client services have 
been started.
> they're on back-to-back lines now in impala_server.cc so I think we're fine
Ahh right, I wrote this comment before seeing the other file :). lgtm



--
To view, visit http://gerrit.cloudera.org:8080/8202
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6
Gerrit-Change-Number: 8202
Gerrit-PatchSet: 16
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 31 Oct 2017 19:06:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4704: Turns on client connections when local catalog initialized.

2017-10-31 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8202 )

Change subject: IMPALA-4704: Turns on client connections when local catalog 
initialized.
..


Patch Set 18: Code-Review+1

(3 comments)

I'm happy with this change, but would be great to get another pair of eyes on 
it. Dan, can you take a look?

http://gerrit.cloudera.org:8080/#/c/8202/17/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/8202/17/be/src/service/impala-server.h@98
PS17, Line 98: ///- Start ImpalaInternalService API
> currently, the main flips this bit-- its the one orchestrating the sequence
sounds good


http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.h@1014
PS16, Line 1014:   /// an accurate expiration time, and this structure 
guarantees that we will always
> I think the thing I'm objecting to here is that the internal state will dep
Works for me. To make the relationship between 'services_started_' and 
'is_ready' clear, I think we should add a DCHECK right before setting 
'is_ready' that asserts 'services_started_' must be true.


http://gerrit.cloudera.org:8080/#/c/8202/18/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8202/18/be/src/service/impala-server.cc@2050
PS18, Line 2050: I
just set to true?



--
To view, visit http://gerrit.cloudera.org:8080/8202
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6
Gerrit-Change-Number: 8202
Gerrit-PatchSet: 18
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 31 Oct 2017 18:51:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3887: Use dfs.namenode.replication.min=3

2017-10-31 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8426 )

Change subject: IMPALA-3887: Use dfs.namenode.replication.min=3
..


Patch Set 3:

I'm still testing this change. With this change we seem to be more prone to 
hitting the Hive/mapred local executor race (MAPREDUCE-6441 MAPREDUCE-6992). 
Will sync with PhilZ to see how to proceed.


--
To view, visit http://gerrit.cloudera.org:8080/8426
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93
Gerrit-Change-Number: 8426
Gerrit-PatchSet: 3
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Comment-Date: Tue, 31 Oct 2017 18:29:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3887: Use dfs.namenode.replication.min=3

2017-10-31 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8426 )

Change subject: IMPALA-3887: Use dfs.namenode.replication.min=3
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8426/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8426/2//COMMIT_MSG@9
PS2, Line 9: and HBase configuration of our
> Update?
Done



--
To view, visit http://gerrit.cloudera.org:8080/8426
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93
Gerrit-Change-Number: 8426
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Comment-Date: Tue, 31 Oct 2017 18:26:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3887: Use dfs.namenode.replication.min=3

2017-10-31 Thread Alex Behm (Code Review)
Hello Bharath Vissapragada,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8426

to look at the new patch set (#3).

Change subject: IMPALA-3887: Use dfs.namenode.replication.min=3
..

IMPALA-3887: Use dfs.namenode.replication.min=3

Changes the HDFS config of our mini-cluster to use:
dfs.namenode.replication.min=3

Several of our tests including planner tests
expect HDFS blocks to have exactly three replicas.
With the default HDFS configuration of
dfs.namenode.replication.min=1, HDFS acks writes
as soon as the min replication is satisfied. This
can lead to test failures because HDFS replication
is racing to replicate blocks while our tests
are executing and expecting to see all replicas.

A more detailed description is in the JIRA.

Testing:
- A core/hdfs ran passed
- A core/hdfs run with data loading passed

Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93
---
M fe/src/test/resources/hbase-site.xml.template
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
2 files changed, 21 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/8426/3
--
To view, visit http://gerrit.cloudera.org:8080/8426
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93
Gerrit-Change-Number: 8426
Gerrit-PatchSet: 3
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 


[Impala-ASF-CR] IMPALA-6127: Fix timeout in TestRuntimeFilter.test wait time

2017-10-31 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8427 )

Change subject: IMPALA-6127: Fix timeout in TestRuntimeFilter.test_wait_time
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8427
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iee005bee8e0a535ce59d2e23e56be6004f2eb9de
Gerrit-Change-Number: 8427
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Comment-Date: Tue, 31 Oct 2017 16:22:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4704: Turns on client connections when local catalog initialized.

2017-10-31 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8202 )

Change subject: IMPALA-4704: Turns on client connections when local catalog 
initialized.
..


Patch Set 17:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8202/17/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/8202/17/be/src/service/impala-server.h@81
PS17, Line 81: ///
whitespace


http://gerrit.cloudera.org:8080/#/c/8202/17/be/src/service/impala-server.h@85
PS17, Line 85: /// by clients at the same time.
Might want to weave in something like this to motivate the specific startup 
sequence:

The Impala server is considered 'ready' iff it can successfully process 
requests in all of its roles. The goal is make the state of the service easy to 
reason about.


http://gerrit.cloudera.org:8080/#/c/8202/17/be/src/service/impala-server.h@87
PS17, Line 87: /// 1. Init
Init()


http://gerrit.cloudera.org:8080/#/c/8202/17/be/src/service/impala-server.h@89
PS17, Line 89: ///- Wait (indefinitelt) for local catalog to be initialized 
from statestore (if coordinator)
typo: indefinitely


http://gerrit.cloudera.org:8080/#/c/8202/17/be/src/service/impala-server.h@93
PS17, Line 93: /// 2. Start
Start()


http://gerrit.cloudera.org:8080/#/c/8202/17/be/src/service/impala-server.h@98
PS17, Line 98: /// Membership callback thread:
When is the 'is_ready' metric set?


http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.h@1014
PS16, Line 1014:   ExpirationQueue queries_by_timestamp_;
> clarified the comment.
Fair point.
My preference would be to use 'is_ready' to avoid redundant state. If metrics 
are seen as a view on the internal state (which I agree with!), then 'is_ready' 
should report the value of this 'services_started_' flag and not store a 
separate value.



--
To view, visit http://gerrit.cloudera.org:8080/8202
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6
Gerrit-Change-Number: 8202
Gerrit-PatchSet: 17
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 31 Oct 2017 16:21:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3887: Use dfs.namenode.replication.min=3

2017-10-31 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8426 )

Change subject: IMPALA-3887: Use dfs.namenode.replication.min=3
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8426/1/fe/src/test/resources/hbase-site.xml.template
File fe/src/test/resources/hbase-site.xml.template:

http://gerrit.cloudera.org:8080/#/c/8426/1/fe/src/test/resources/hbase-site.xml.template@38
PS1, Line 38: 
:   
: dfs.namenode.replication.min
: 3
:   
> Good question, let me give it a try.
I tried without the min replication here and it works fine. Removed it.



--
To view, visit http://gerrit.cloudera.org:8080/8426
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93
Gerrit-Change-Number: 8426
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Comment-Date: Tue, 31 Oct 2017 15:57:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3887: Use dfs.namenode.replication.min=3

2017-10-31 Thread Alex Behm (Code Review)
Hello Bharath Vissapragada,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8426

to look at the new patch set (#2).

Change subject: IMPALA-3887: Use dfs.namenode.replication.min=3
..

IMPALA-3887: Use dfs.namenode.replication.min=3

Changes the HDFS and HBase configuration of our
mini-cluster to use:
dfs.namenode.replication.min=3

Several of our tests including planner tests
expect HDFS blocks to have exactly three replicas.
With the default HDFS configuration of
dfs.namenode.replication.min=1, HDFS acks writes
as soon as the min replication is satisfied. This
can lead to test failures because HDFS replication
is racing to replicate blocks while our tests
are executing and expecting to see all replicas.

A more detailed description is in the JIRA.

Testing:
- A core/hdfs ran passed
- A core/hdfs run with data loading passed

Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93
---
M fe/src/test/resources/hbase-site.xml.template
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
2 files changed, 21 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/8426/2
--
To view, visit http://gerrit.cloudera.org:8080/8426
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93
Gerrit-Change-Number: 8426
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 


[Impala-ASF-CR] IMPALA-3887: Use dfs.namenode.replication.min=3

2017-10-30 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8426 )

Change subject: IMPALA-3887: Use dfs.namenode.replication.min=3
..


Patch Set 1:

(1 comment)

Don't have many data points regarding the effects on build+test runtimes.

The master-core tests have pretty high variance.
A quick examination showed times ranging roughly from 4:50 to 5:20.
The same job with this patch ran in 5:12 so at least it seems within the same 
ballpark.

I'll do an exhaustive test run and report numbers on that.

http://gerrit.cloudera.org:8080/#/c/8426/1/fe/src/test/resources/hbase-site.xml.template
File fe/src/test/resources/hbase-site.xml.template:

http://gerrit.cloudera.org:8080/#/c/8426/1/fe/src/test/resources/hbase-site.xml.template@38
PS1, Line 38: 
:   
: dfs.namenode.replication.min
: 3
:   
> Isn't this redundant? Per my understanding, this only goes into HDFS servic
Good question, let me give it a try.

I originally tried without modifying hbase-site at all and that didn't work, 
maybe setting dfs.replication is enough. I agree that setting the min 
replication here seems weird/unnecessary.



--
To view, visit http://gerrit.cloudera.org:8080/8426
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93
Gerrit-Change-Number: 8426
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Comment-Date: Tue, 31 Oct 2017 05:27:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4704: Turns on client connections when local catalog initialized.

2017-10-30 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8202 )

Change subject: IMPALA-4704: Turns on client connections when local catalog 
initialized.
..


Patch Set 16:

(9 comments)

Nice, this looks much better

http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.h@105
PS16, Line 105: /// TODO: The state of a running query is currently not cleaned 
up if the
Let's document the startup procedure and the reasoning behind it here


http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.h@1014
PS16, Line 1014:   /// Flag that records if backend and/or client services have 
been started.
Clarify the meaning of "and/or" here. The distinction seems important. Will 
this flag be set if only one service has been started or not?


I know I suggested this new flag :), but now I'm wondering whether we can use 
the existing "is_ready" metric for this purpose.


http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.h@1017
PS16, Line 1017:   boost::mutex services_started_lock_;
std::atomic_bool instead of this lock?


http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.cc@1626
PS16, Line 1626:   // Register this backend only if services have been started.
Feels clearer to me to check this at the caller. Can be hard to reason about 
functions that are no-ops depending on internal state.
Any reason to have the check in here?

In any case we should add a function comment for the new behavior (ideally in 
the membership callback assuming you agree to move this check)


http://gerrit.cloudera.org:8080/#/c/8202/16/be/src/service/impala-server.cc@1933
PS16, Line 1933: Status ImpalaServer::Init(int32_t thrift_be_port, int32_t 
beeswax_port,
Why reformat fn args? Seemed ok the way it was.


http://gerrit.cloudera.org:8080/#/c/8202/16/tests/custom_cluster/test_catalog_wait.py
File tests/custom_cluster/test_catalog_wait.py:

http://gerrit.cloudera.org:8080/#/c/8202/16/tests/custom_cluster/test_catalog_wait.py@24
PS16, Line 24:   """Impalad must wait for the catalog prior to opening up 
client ports.
Specify the waiting condition more precisely. An impalad must wait for the 
initial catalog update to arrive via the statestore.


http://gerrit.cloudera.org:8080/#/c/8202/16/tests/custom_cluster/test_catalog_wait.py@54
PS16, Line 54: self.expect_connection(self.cluster.impalads[0])
Ideally we'd also check the internal service


http://gerrit.cloudera.org:8080/#/c/8202/16/tests/custom_cluster/test_catalog_wait.py@59
PS16, Line 59:   def test_query_subset(self):
Can we combine this test wit the one above? They seem very similar


http://gerrit.cloudera.org:8080/#/c/8202/16/tests/custom_cluster/test_catalog_wait.py@71
PS16, Line 71: 
self.cluster.impalads[0].service.get_metric_value('impala-server.ready', 1);
I'm wondering whether this can be flaky. We often use 
self.wait_for_metric_value() in case we expect delays. These impalads might 
still be waiting for the initial catalog update.



--
To view, visit http://gerrit.cloudera.org:8080/8202
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6
Gerrit-Change-Number: 8202
Gerrit-PatchSet: 16
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 31 Oct 2017 05:11:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3887: Use dfs.namenode.replication.min=3

2017-10-30 Thread Alex Behm (Code Review)
Alex Behm has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8426


Change subject: IMPALA-3887: Use dfs.namenode.replication.min=3
..

IMPALA-3887: Use dfs.namenode.replication.min=3

Changes the HDFS and HBase configuration of our
mini-cluster to use:
dfs.namenode.replication.min=3

Several of our tests including planner tests
expect HDFS blocks to have exactly three replicas.
With the default HDFS configuration of
dfs.namenode.replication.min=1, HDFS acks writes
as soon as the min replication is satisfied. This
can lead to test failures because HDFS replication
is racing to replicate blocks while our tests
are executing and expecting to see all replicas.

A more detailed description is in the JIRA.

Testing:
- A core/hdfs ran passed
- A core/hdfs run with data loading passed

Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93
---
M fe/src/test/resources/hbase-site.xml.template
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
2 files changed, 26 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/8426/1
--
To view, visit http://gerrit.cloudera.org:8080/8426
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93
Gerrit-Change-Number: 8426
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm 


[Impala-ASF-CR] IMPALA-6124: Fix alter table ddl updates and test

2017-10-30 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8411 )

Change subject: IMPALA-6124: Fix alter table ddl updates and test
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8411/4/tests/metadata/test_last_ddl_time_update.py
File tests/metadata/test_last_ddl_time_update.py:

http://gerrit.cloudera.org:8080/#/c/8411/4/tests/metadata/test_last_ddl_time_update.py@25
PS4, Line 25: class TestLastDdlTimeUpdate(ImpalaTestSuite):
> Does this test still make sense? There's no code on our side that modifies
Sorry, I was thinking ahead. Nvm this comment.



--
To view, visit http://gerrit.cloudera.org:8080/8411
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3126252e7709304d3e1fa4bb06a0b847180bd6cf
Gerrit-Change-Number: 8411
Gerrit-PatchSet: 4
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Mon, 30 Oct 2017 23:43:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6124: Fix alter table ddl updates and test

2017-10-30 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8411 )

Change subject: IMPALA-6124: Fix alter table ddl updates and test
..


Patch Set 6: Code-Review+1

(1 comment)

Lgtm. Bharath, please do the +2

http://gerrit.cloudera.org:8080/#/c/8411/4/tests/metadata/test_last_ddl_time_update.py
File tests/metadata/test_last_ddl_time_update.py:

http://gerrit.cloudera.org:8080/#/c/8411/4/tests/metadata/test_last_ddl_time_update.py@25
PS4, Line 25: class TestLastDdlTimeUpdate(ImpalaTestSuite):
Does this test still make sense? There's no code on our side that modifies the 
lastDdlTime



--
To view, visit http://gerrit.cloudera.org:8080/8411
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3126252e7709304d3e1fa4bb06a0b847180bd6cf
Gerrit-Change-Number: 8411
Gerrit-PatchSet: 6
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Mon, 30 Oct 2017 23:42:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6118: Fix assertion failure in coordinator bloom filter updating

2017-10-30 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8410 )

Change subject: IMPALA-6118: Fix assertion failure in coordinator bloom filter 
updating
..


Patch Set 3: Code-Review+2

Seems fine, std::string is weird


--
To view, visit http://gerrit.cloudera.org:8080/8410
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia28f9eb8d0b3e795d798949a1cfcfeaaec7040ed
Gerrit-Change-Number: 8410
Gerrit-PatchSet: 3
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 30 Oct 2017 23:33:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4704: Turns on client connections when local catalog initialized.

2017-10-30 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8202 )

Change subject: IMPALA-4704: Turns on client connections when local catalog 
initialized.
..


Patch Set 15:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/8202/15/be/src/service/frontend.cc
File be/src/service/frontend.cc:

http://gerrit.cloudera.org:8080/#/c/8202/15/be/src/service/frontend.cc@31
PS15, Line 31: #ifndef NDEBUG
What's the harm in always compiling it in?


http://gerrit.cloudera.org:8080/#/c/8202/15/be/src/service/frontend.cc@256
PS15, Line 256:   if (result) return true;
return result?


http://gerrit.cloudera.org:8080/#/c/8202/15/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/8202/15/be/src/service/impala-server.h@125
PS15, Line 125:   /// Initializes client RPC services. Must be called after 
InitInternalServices. Is a
we typically say InitInternalServices() to make it clear it's a function


http://gerrit.cloudera.org:8080/#/c/8202/15/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8202/15/be/src/service/impala-server.cc@1541
PS15, Line 1541: // Additionally wait for the local catalog to be 
initialized if the server is a
Suggest rephrasing "wait" to clarify. "wait" could be misinterpreted as 
blocking. Maybe say something like:

Announce the availability of this impalad coordinator once the local catalog 
has been initialized.


http://gerrit.cloudera.org:8080/#/c/8202/15/be/src/service/impalad-main.cc
File be/src/service/impalad-main.cc:

http://gerrit.cloudera.org:8080/#/c/8202/15/be/src/service/impalad-main.cc@91
PS15, Line 91:   Status status = impala_server->StartInternalServices();
As discussed, ordering is not quite right yet.

It might make sense to add flags for client_services_started and 
internal_services_started to address the race between opening internal/client 
ports and announcing the availability of this daemon through the membership 
callback.


http://gerrit.cloudera.org:8080/#/c/8202/15/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

http://gerrit.cloudera.org:8080/#/c/8202/15/bin/start-impala-cluster.py@81
PS15, Line 81: # For testing: list of comma-separated delays, in milliseconds, 
that delay catalog
clarify which catalog exactly


http://gerrit.cloudera.org:8080/#/c/8202/15/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/8202/15/fe/src/main/java/org/apache/impala/service/Frontend.java@877
PS15, Line 877: }
add a final LOG.info() here declaring success (and remove the one in 
impala-server)


http://gerrit.cloudera.org:8080/#/c/8202/12/fe/src/test/java/org/apache/impala/service/FrontendTest.java
File fe/src/test/java/org/apache/impala/service/FrontendTest.java:

http://gerrit.cloudera.org:8080/#/c/8202/12/fe/src/test/java/org/apache/impala/service/FrontendTest.java@a69
PS12, Line 69:
> I was a bit surprised to see authorization configs here-- what was the inte
Sentry can configured in a service mode or with a file-based auth policy. I 
suppose this test was checking that the readiness check works regardless of 
auth policy. Not sure how much sense that makes.

For auth we use AuthorizationTest.

I'm thinking this test doesn't make sense anymore.



--
To view, visit http://gerrit.cloudera.org:8080/8202
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6
Gerrit-Change-Number: 8202
Gerrit-PatchSet: 15
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Mon, 30 Oct 2017 21:36:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6126: ASAN detects heap-use-after-free in thrift-server-test

2017-10-30 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8412 )

Change subject: IMPALA-6126: ASAN detects heap-use-after-free in 
thrift-server-test
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8412/1/be/src/rpc/thrift-server-test.cc
File be/src/rpc/thrift-server-test.cc:

http://gerrit.cloudera.org:8080/#/c/8412/1/be/src/rpc/thrift-server-test.cc@109
PS1, Line 109: string current_executable_path;
Brief comment why this needs to be here. Alternative: static member in the 
ThriftParamsTest class?



--
To view, visit http://gerrit.cloudera.org:8080/8412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2cd434757de2cd384def5b360a479e51812a
Gerrit-Change-Number: 8412
Gerrit-PatchSet: 1
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Mon, 30 Oct 2017 16:41:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6124: Make assertion in ddl update test resilient to long runtime

2017-10-28 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8411 )

Change subject: IMPALA-6124: Make assertion in ddl update test resilient to 
long runtime
..


Patch Set 1: Code-Review+2

lgtm once you complete testing


--
To view, visit http://gerrit.cloudera.org:8080/8411
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3126252e7709304d3e1fa4bb06a0b847180bd6cf
Gerrit-Change-Number: 8411
Gerrit-PatchSet: 1
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Sat, 28 Oct 2017 17:00:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6118: Fix assertion failure in coordinator bloom filter updating

2017-10-27 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8410 )

Change subject: IMPALA-6118: Fix assertion failure in coordinator bloom filter 
updating
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8410/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/8410/1/be/src/runtime/coordinator.cc@1180
PS1, Line 1180:   bloom_filter_.directory.clear();
I think the bigger problem here is actually that the memory of the string is 
not freed (meaning we have a big chunk of untracked memory!). You could create 
a new string on the stack and move() the contents into that string, so the 
memory is freed.

Then I think the old assumption of 0 capacity is valid again.



--
To view, visit http://gerrit.cloudera.org:8080/8410
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia28f9eb8d0b3e795d798949a1cfcfeaaec7040ed
Gerrit-Change-Number: 8410
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Comment-Date: Sat, 28 Oct 2017 04:41:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6123: Fix column order of a query test in test inline view limit

2017-10-27 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8409 )

Change subject: IMPALA-6123: Fix column order of a query test in 
test_inline_view_limit
..


Patch Set 2:

Any idea how this test succeeded in the pre-commit tests?


--
To view, visit http://gerrit.cloudera.org:8080/8409
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I11667872b8788a8b0040bf9252bf07b987b5d330
Gerrit-Change-Number: 8409
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 28 Oct 2017 04:01:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6123: Fix column order of a query test in test inline view limit

2017-10-27 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8409 )

Change subject: IMPALA-6123: Fix column order of a query test in 
test_inline_view_limit
..


Patch Set 2:

Actually nvm, I see it only fails i exhaustive. Makes sense.


--
To view, visit http://gerrit.cloudera.org:8080/8409
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I11667872b8788a8b0040bf9252bf07b987b5d330
Gerrit-Change-Number: 8409
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 28 Oct 2017 04:02:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3436: Return a decimal when rounding a double

2017-10-27 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8398 )

Change subject: IMPALA-3436: Return a decimal when rounding a double
..


Patch Set 1:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/8398/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8398/1//COMMIT_MSG@16
PS1, Line 16: This is implemented by introducing a "hack" where we make the 
parser
How about we say rewrite or translation instead of hack? I don't think it's 
really a terrible hack - it's a minimal change.


http://gerrit.cloudera.org:8080/#/c/8398/1/be/src/exprs/decimal-functions-ir.cc
File be/src/exprs/decimal-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8398/1/be/src/exprs/decimal-functions-ir.cc@111
PS1, Line 111:   DCHECK(!scale.is_null);
Add a FE test case in AnalyzeExprsTest.java that makes sure this is enforced. 
The existing round() functions that take a DECIMAL argument must have a 
non-NULL, constant second argument. New tests should be added to make sure that 
is also enforced for your new round() function.

There's also a subtle change in behavior we should consider. The existing 
round() function accepts non-constant arguments, should the new round 
function() also allow that or not?


http://gerrit.cloudera.org:8080/#/c/8398/1/be/src/exprs/decimal-functions-ir.cc@116
PS1, Line 116:   bool ovf = false;
overflow


http://gerrit.cloudera.org:8080/#/c/8398/1/be/src/exprs/decimal-functions.h
File be/src/exprs/decimal-functions.h:

http://gerrit.cloudera.org:8080/#/c/8398/1/be/src/exprs/decimal-functions.h@58
PS1, Line 58:   FunctionContext*, const DoubleVal&, const IntVal&);
add arg names for consistency


http://gerrit.cloudera.org:8080/#/c/8398/1/common/function-registry/impala_functions.py
File common/function-registry/impala_functions.py:

http://gerrit.cloudera.org:8080/#/c/8398/1/common/function-registry/impala_functions.py@284
PS1, Line 284:   [['round_v1','dround_v1'],
Why keep the dround_v1 alias?


http://gerrit.cloudera.org:8080/#/c/8398/1/common/function-registry/impala_functions.py@286
PS1, Line 286:   [['round','dround'], 'DECIMAL', ['DOUBLE', 'INT'], 
'impala::DecimalFunctions::RoundTo'],
Can you organize these and comment on whether these are used in v1/v2 or both? 
Seems confusing now.


http://gerrit.cloudera.org:8080/#/c/8398/1/common/function-registry/impala_functions.py@287
PS1, Line 287:   [['round','dround','round_v1','dround_v1'],
Why keep the dround_v1 alias?


http://gerrit.cloudera.org:8080/#/c/8398/1/common/function-registry/impala_functions.py@403
PS1, Line 403:   [['round','dround','round_v1','dround_v1'],
Why keep the dround_v1 alias in these?


http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/cup/sql-parser.cup@70
PS1, Line 70:   private TQueryOptions queryOptions;
add TODO to remove when decimal v1 is gone, maybe come up with a specific thing 
you can grep for later in the code, e.g. DECIMAL_V1 and use that consistently 
in the TODOs for removal


http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@100
PS1, Line 100: FunctionCallExpr functionCallExpr = new 
FunctionCallExpr(fnName, params);
newline


http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@111
PS1, Line 111: // nullif(x, y) -> if(x DISTINCT FROM y, x, NULL)
newline


http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@385
PS1, Line 385:* This can only be called for functions that return wildcard 
decimals and the first
Is this comment still accurate?


http://gerrit.cloudera.org:8080/#/c/8398/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@407
PS1, Line 407:   // The situtation where none of the parameters are 
decimal, but the return type is
* Shrink comment to:
None of the parameters are decimal but the return type is decimal.

* It's not clear to me why we'd use (38,0) in this case, can you explain? For 
example, round(double, int) does not have decimal args. Shouldn't the return 
type be (38,X) where X is the value of the second arg if it is a constant?



--
To view, visit http://gerrit.cloudera.org:8080/8398
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I240ec70da7996bbfefcf6438eba4dff8d33d7bd6
Gerrit-Change-Number: 8398
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Comment-Date: Fri, 27 Oct 2017 

[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-26 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 13: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 13
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Thu, 26 Oct 2017 17:06:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-26 Thread Alex Behm (Code Review)
Alex Behm has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..

IMPALA-3548: Prune runtime filters based on query options in the FE

Currently, the FE generates a number of runtime filters and assigns
them to the single node plan without taking the value of
RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options
into account.

The backend then removes filters from exec nodes, based on the
following rules:
1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from
   the exec nodes that are marked as targets not bound by partition
   columns.

2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from
   the exec nodes that are marked as remote targets.

This may cause some confusion to users because they may see runtime
filters in the output of explain that are not applied when the query
is executed.

This change moves the logic of runtime filter pruning to the planner
in the FE. The runtime filter assignment is done on the distributed
plan and the above constraints are enforced there directly.

Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Reviewed-on: http://gerrit.cloudera.org:8080/7564
Tested-by: Impala Public Jenkins
Reviewed-by: Alex Behm 
---
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator.cc
M be/src/service/fe-support.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
12 files changed, 775 insertions(+), 108 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Alex Behm: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 14
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-25 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 12: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java
File fe/src/test/java/org/apache/impala/testutil/TestFileParser.java:

http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@288
PS10, Line 288: if (!currentTestCase.isValid()) {
> Correct, every PlannerTest .test file ends with "". I've removed  parse
I agree. This code is really confusing and could use some cleanup - but not in 
this patch.

I believe we already have a check similar to the one you proposed. Try creating 
an invalid .test file, e.g., by omitting the last "".



--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 12
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Wed, 25 Oct 2017 18:39:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] DRAFT IMPALA-5185: Skip pages based on Parquet::Statistics

2017-10-24 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7354 )

Change subject: DRAFT IMPALA-5185: Skip pages based on Parquet::Statistics
..


Patch Set 3:

Lars, if this patch is not actively being worked on can we abandon?


--
To view, visit http://gerrit.cloudera.org:8080/7354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8eec838c5baf22167049f570dd0ef9762c5ae0a6
Gerrit-Change-Number: 7354
Gerrit-PatchSet: 3
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 25 Oct 2017 05:18:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4978 / IMPALA-5631: [DOCS] Add FQDN known issue

2017-10-24 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7388 )

Change subject: IMPALA-4978 / IMPALA-5631: [DOCS] Add FQDN known issue
..


Patch Set 2:

Documenting this seems kinda important. Adding Sailesh as reviewer to help move 
this forward. Sailesh, feel free to find another suitable person to review this.


--
To view, visit http://gerrit.cloudera.org:8080/7388
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib039d0102878f1c05470371f581cb258287b9bc0
Gerrit-Change-Number: 7388
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Wed, 25 Oct 2017 05:18:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6004: Fix test row filters failure on ASAN

2017-10-24 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8358 )

Change subject: IMPALA-6004: Fix test_row_filters failure on ASAN
..


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8358/1/testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test
File 
testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test:

http://gerrit.cloudera.org:8080/#/c/8358/1/testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test@350
PS1, Line 350: SET RUNTIME_FILTER_WAIT_TIME_MS=$RUNTIME_FILTER_WAIT_TIME_MS;
> Its not really necessary. There's one case (14) that seems to legitimately
Thanks for making this change. I agree it's maybe not strictly necessary, but 
fine-tuning the timeout on a case-by-case basis has already caused many build 
failures and subsequent fixes to bump the timeouts. I'm hoping that this 
blanket configuration will ultimately be more stable.



--
To view, visit http://gerrit.cloudera.org:8080/8358
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia098735594b36a72f02bf7edd051171689618051
Gerrit-Change-Number: 8358
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Wed, 25 Oct 2017 05:05:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Prereqs for load test system testing

2017-10-24 Thread Alex Behm (Code Review)
Alex Behm has abandoned this change. ( http://gerrit.cloudera.org:8080/5851 )

Change subject: Prereqs for load test system testing
..


Abandoned

Abandoning to reflect that this is not actively being worked on.
--
To view, visit http://gerrit.cloudera.org:8080/5851
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I38b5b4fe8c30d1c5f591bbe37b6be8bb8a5398ab
Gerrit-Change-Number: 5851
Gerrit-PatchSet: 1
Gerrit-Owner: Harrison Sheinblatt 


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-24 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 11:

(13 comments)

Looks good, final comments.

http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@455
PS10, Line 455:   TQueryOptions options;
> I've renamed the parameter.
Thanks for investigating and providing an explanation!

We should summarize your findings and add them to the function comment of 
FeSupport#NativeParseQueryOptions() to explain why we must pass in an existing 
TQueryOptions for this to work as intended.


http://gerrit.cloudera.org:8080/#/c/7564/11/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

http://gerrit.cloudera.org:8080/#/c/7564/11/be/src/service/fe-support.cc@453
PS11, Line 453: 
Java_org_apache_impala_service_FeSupport_NativeParseQueryOptions(
Nice and clean! Thanks.


http://gerrit.cloudera.org:8080/#/c/7564/11/be/src/service/fe-support.cc@454
PS11, Line 454: JNIEnv* env, jclass caller_class, jstring options_str, 
jbyteArray tquery_options) {
rename options_str to csv_query_options for consistency


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@397
PS10, Line 397: int maxNumFilters = 
ctx.getQueryOptions().getMax_num_runtime_filters();
> 0 is a valid value. MAX_NUM_RUNTIME_FILTERS=0 effectively removes every run
Sorry, my mistake. For some reason I didn't read the ">=" correctly.


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/service/FeSupport.java
File fe/src/main/java/org/apache/impala/service/FeSupport.java:

http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/service/FeSupport.java@93
PS10, Line 93:
> See my response to be/src/service/fe-support.cc L455.
Yikes! What a can of worms. Thanks for investigating and explaining.


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java
File fe/src/test/java/org/apache/impala/testutil/TestFileParser.java:

http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@288
PS10, Line 288: if (!currentTestCase.isValid()) {
> L288-L293 handles test cases found in the middle of the test file.
All PlannerTest .test files should end with "", so I think the code here in 
L228 is enough. I tried your patch locally using a .test file with 3 tests and 
the query option parsing worked as expected without L338.

I might be missing something, if so, can you point me to an example file where 
we also need the parseQueryOptions() call in L338?


http://gerrit.cloudera.org:8080/#/c/7564/11/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java
File fe/src/test/java/org/apache/impala/testutil/TestFileParser.java:

http://gerrit.cloudera.org:8080/#/c/7564/11/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@359
PS11, Line 359:   " - " + e.getMessage());
also pass in 'e' as the cause


http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test:

http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@38
PS11, Line 38:  DISTRIBUTEDPLAN
What's the value in keeping this section?


http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@174
PS11, Line 174:  DISTRIBUTEDPLAN
What's the value in keeping this section?


http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@257
PS11, Line 257:  DISTRIBUTEDPLAN
What's the value in keeping this section?


http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@392
PS11, Line 392: MAX_NUM_RUNTIME_FILTERS=4
How about setting to 3 so we can see both local and non-local filters being 
removed from the distributed plan


http://gerrit.cloudera.org:8080/#/c/7564/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@582
PS11, Line 582:  DISTRIBUTEDPLAN
What's the value in keeping this section?



[Impala-ASF-CR] IMPALA-6060: Check the return value of JNI exception handling functions

2017-10-24 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8334 )

Change subject: IMPALA-6060: Check the return value of JNI exception handling 
functions
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8334
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3ed88bf8739c56a066f2402727c8204e96aa116
Gerrit-Change-Number: 8334
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Comment-Date: Tue, 24 Oct 2017 17:24:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6060: Check the return value of JNI exception handling functions

2017-10-23 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8334 )

Change subject: IMPALA-6060: Check the return value of JNI exception handling 
functions
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8334/3/be/src/util/jni-util.cc
File be/src/util/jni-util.cc:

http://gerrit.cloudera.org:8080/#/c/8334/3/be/src/util/jni-util.cc@190
PS3, Line 190:   auto msg = 
static_cast(env->CallStaticObjectMethod(jni_util_class(),
> Same here and in other cases below: The use of auto obscures the code. This
It is somewhat obvious from the static_cast, of course. Still, auto generally 
adds cognitive burden to the reader unless it really helps readability



--
To view, visit http://gerrit.cloudera.org:8080/8334
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3ed88bf8739c56a066f2402727c8204e96aa116
Gerrit-Change-Number: 8334
Gerrit-PatchSet: 3
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Comment-Date: Tue, 24 Oct 2017 00:42:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6060: Check the return value of JNI exception handling functions

2017-10-23 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8334 )

Change subject: IMPALA-6060: Check the return value of JNI exception handling 
functions
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8334/3/be/src/util/jni-util.cc
File be/src/util/jni-util.cc:

http://gerrit.cloudera.org:8080/#/c/8334/3/be/src/util/jni-util.cc@190
PS3, Line 190:   auto msg = 
static_cast(env->CallStaticObjectMethod(jni_util_class(),
Same here and in other cases below: The use of auto obscures the code. This is 
a jstring which really is not obvious and does matter.


http://gerrit.cloudera.org:8080/#/c/8334/3/be/src/util/jni-util.cc@194
PS3, Line 194: auto oom_msg = Substitute(oom_msg_template, 
"throwableToString");
no auto please


http://gerrit.cloudera.org:8080/#/c/8334/3/be/src/util/jni-util.cc@201
PS3, Line 201: auto stack = 
static_cast(env->CallStaticObjectMethod(jni_util_class(),
no auto please


http://gerrit.cloudera.org:8080/#/c/8334/3/be/src/util/jni-util.cc@205
PS3, Line 205:   auto oom_msg = Substitute(oom_msg_template, 
"throwableToStackTrace");
no auto please



--
To view, visit http://gerrit.cloudera.org:8080/8334
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3ed88bf8739c56a066f2402727c8204e96aa116
Gerrit-Change-Number: 8334
Gerrit-PatchSet: 3
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Comment-Date: Tue, 24 Oct 2017 00:41:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6068: Fix dataload for complextypes fileformat

2017-10-23 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8350 )

Change subject: IMPALA-6068: Fix dataload for complextypes_fileformat
..


Patch Set 3: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8350/3/testdata/common/text_delims_table.py
File testdata/common/text_delims_table.py:

http://gerrit.cloudera.org:8080/#/c/8350/3/testdata/common/text_delims_table.py@60
PS3, Line 60:   print ("LOAD DATA LOCAL INPATH '%s' "
Mention that this is intended to be printed into one of the sections of our 
functional schema template.



--
To view, visit http://gerrit.cloudera.org:8080/8350
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7152306b2907198204a6d8d282a0bad561129b82
Gerrit-Change-Number: 8350
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Tue, 24 Oct 2017 00:22:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4704: Turns on client connections when local catalog initialized.

2017-10-23 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8202 )

Change subject: IMPALA-4704: Turns on client connections when local catalog 
initialized.
..


Patch Set 12:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/8202/12/be/src/service/frontend.h
File be/src/service/frontend.h:

http://gerrit.cloudera.org:8080/#/c/8202/12/be/src/service/frontend.h@171
PS12, Line 171:   Status WaitForCatalog();
Why even return anything? Success is implied by this function returning.


http://gerrit.cloudera.org:8080/#/c/8202/12/be/src/service/frontend.cc
File be/src/service/frontend.cc:

http://gerrit.cloudera.org:8080/#/c/8202/12/be/src/service/frontend.cc@253
PS12, Line 253:   JniLocalFrame jni_frame;
This JNI frame stuff is not needed here because we are not creating any local 
references.

The same is true for SetCatalogInitialized() - frame stuff not needed.


http://gerrit.cloudera.org:8080/#/c/8202/12/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8202/12/be/src/service/impala-server.cc@2045
PS12, Line 2045:   if (thrift_be_server_.get()) {
I wonder if this can cause problems. A coordinator+executor impalad might have 
registered with the statestore but not received the initial catalog update yet. 
This will not stop other coordinators from scheduling work on this impalad - 
but the InternalService port has not yet been opened, meaning those queries 
will fail.

I think we need to strictly distinguish between the internal and client facing 
services.


http://gerrit.cloudera.org:8080/#/c/8202/12/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

http://gerrit.cloudera.org:8080/#/c/8202/12/bin/start-impala-cluster.py@81
PS12, Line 81: parser.add_option("--disable_catalog", dest="disable_catalog", 
default=False,
disable_catalogd? We seem to be consistently referring to processes with their 
"d" suffix


http://gerrit.cloudera.org:8080/#/c/8202/12/bin/start-impala-cluster.py@255
PS12, Line 255: def is_catalog_ready(impala_cluster):
is_catalogd_up()?

To distinguish the other "catalog ready" which refers to the local catalog 
cache of an impalad.


http://gerrit.cloudera.org:8080/#/c/8202/12/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/8202/12/fe/src/main/java/org/apache/impala/service/Frontend.java@865
PS12, Line 865: LOG.info("Waiting for local catalog to be initialized.");
Users may not know what "initialized" means, better to state explicitly what we 
are waiting for, e.g.:

Waiting for the first catalog update from the statestore.


http://gerrit.cloudera.org:8080/#/c/8202/12/fe/src/main/java/org/apache/impala/service/Frontend.java@866
PS12, Line 866: int num_tries = 0;
numTries


http://gerrit.cloudera.org:8080/#/c/8202/12/fe/src/main/java/org/apache/impala/service/Frontend.java@905
PS12, Line 905: "Analyzing a query is not support when the local 
catalog is not initialized.");
I'd chose a different phrasing because "not supported" could be misinterpreted 
as a missing feature or might make the user think he/she did something wrong - 
but hitting this is not expected and probably a bug on our side. How about:

"Local catalog has not been initialized. Aborting query analysis."


http://gerrit.cloudera.org:8080/#/c/8202/12/fe/src/main/java/org/apache/impala/service/JniFrontend.java
File fe/src/main/java/org/apache/impala/service/JniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/8202/12/fe/src/main/java/org/apache/impala/service/JniFrontend.java@583
PS12, Line 583:   public void setCatalogInitialized() {
setCatalogIsReady() for consistency


http://gerrit.cloudera.org:8080/#/c/8202/12/fe/src/test/java/org/apache/impala/service/FrontendTest.java
File fe/src/test/java/org/apache/impala/service/FrontendTest.java:

http://gerrit.cloudera.org:8080/#/c/8202/12/fe/src/test/java/org/apache/impala/service/FrontendTest.java@a69
PS12, Line 69:
Can you think of a reason for keeping this test around? The interesting case 
was the "not ready" case which is now gone, so I'm not sure this test makes 
sense anymore.



--
To view, visit http://gerrit.cloudera.org:8080/8202
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6
Gerrit-Change-Number: 8202
Gerrit-PatchSet: 12
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 

[Impala-ASF-CR] IMPALA-6004: Fix test row filters failure on ASAN

2017-10-23 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8358 )

Change subject: IMPALA-6004: Fix test_row_filters failure on ASAN
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8358/1/testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test
File 
testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test:

http://gerrit.cloudera.org:8080/#/c/8358/1/testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test@350
PS1, Line 350: SET RUNTIME_FILTER_WAIT_TIME_MS=10;
Do you think it makes sense to have different timeouts for each different test 
case in this file? If not, we should consider using the same timeout everywhere 
in this test and set the query option based on solution like this:
https://gerrit.cloudera.org/#/c/8357



--
To view, visit http://gerrit.cloudera.org:8080/8358
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia098735594b36a72f02bf7edd051171689618051
Gerrit-Change-Number: 8358
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Comment-Date: Mon, 23 Oct 2017 18:58:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6060: Check the return value of JNI exception handling functions

2017-10-23 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8334 )

Change subject: IMPALA-6060: Check the return value of JNI exception handling 
functions
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8334/1/be/src/util/jni-util.cc
File be/src/util/jni-util.cc:

http://gerrit.cloudera.org:8080/#/c/8334/1/be/src/util/jni-util.cc@193
PS1, Line 193:   auto oom_msg_template = "$0 throws an unchecked throwable. JVM 
likely runs out of "
This case of auto obscures the code. Might not be clear to everybody that this 
is a const char* and not a std::string.

How about this err message:

"$0 threw an unchecked exception. The JVM is likely out of memory (OOM)."


http://gerrit.cloudera.org:8080/#/c/8334/1/be/src/util/jni-util.cc@197
PS1, Line 197:   if (msg == nullptr) {
> RETURN_ERROR_IF_EXC will call GetJniExceptionMsg again, which will result i
You are right that RETURN_ERROR_IF_EXC() does not make sense here.

I stand by my main point. We need to check for exceptions after every JNI 
function call, that's the standard practice and we should simply follow it 
here. Yes, checking for msg == nullptr might lead to the same result in 
practice, but that check is unnecessarily indirect.

The right way to handle this case is:

if (env->ExceptionOccurred()) {
  env->ExceptionClear();
  ...
}



--
To view, visit http://gerrit.cloudera.org:8080/8334
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3ed88bf8739c56a066f2402727c8204e96aa116
Gerrit-Change-Number: 8334
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Comment-Date: Mon, 23 Oct 2017 18:28:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4524: Batch ALTER TABLE...ADD PARTITION calls.

2017-10-23 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8238 )

Change subject: IMPALA-4524: Batch ALTER TABLE...ADD PARTITION calls.
..


Patch Set 9: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 9
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Mon, 23 Oct 2017 17:51:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6070: Parallel compute table stats.py

2017-10-23 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8354 )

Change subject: IMPALA-6070: Parallel compute_table_stats.py
..


Patch Set 2: Code-Review+1

Thanks for adding the logging. Let's MikeB do the final +2.


--
To view, visit http://gerrit.cloudera.org:8080/8354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d
Gerrit-Change-Number: 8354
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Mon, 23 Oct 2017 17:46:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6070: Parallel compute table stats.py

2017-10-23 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8354 )

Change subject: IMPALA-6070: Parallel compute_table_stats.py
..


Patch Set 1:

(2 comments)

Nice

http://gerrit.cloudera.org:8080/#/c/8354/1/tests/util/compute_table_stats.py
File tests/util/compute_table_stats.py:

http://gerrit.cloudera.org:8080/#/c/8354/1/tests/util/compute_table_stats.py@37
PS1, Line 37: print 'Executing: %s' % statement
Do these prints come out clean or all garbled due to the parallelism?


http://gerrit.cloudera.org:8080/#/c/8354/1/tests/util/compute_table_stats.py@119
PS1, Line 119:   compute_stats(client_factory, db_names=db_names,
please add newline to separate more cleanly



--
To view, visit http://gerrit.cloudera.org:8080/8354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d
Gerrit-Change-Number: 8354
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Comment-Date: Mon, 23 Oct 2017 17:09:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6068: Fix dataload for complextypes fileformat

2017-10-23 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8350 )

Change subject: IMPALA-6068: Fix dataload for complextypes_fileformat
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8350/2/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/8350/2/testdata/bin/generate-schema-statements.py@528
PS2, Line 528:   insert = eval_section(section['DEPENDENT_LOAD'])
Are these two sections intended to be mutually exclusive? If so, then we should 
throw an error if both are specified. If not, in what order are they run?



--
To view, visit http://gerrit.cloudera.org:8080/8350
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7152306b2907198204a6d8d282a0bad561129b82
Gerrit-Change-Number: 8350
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Comment-Date: Mon, 23 Oct 2017 16:58:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading

2017-10-23 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8235 )

Change subject: IMPALA-5429: Multi threaded block metadata loading
..


Patch Set 8: Code-Review+1

(1 comment)

Dimitris should +2

http://gerrit.cloudera.org:8080/#/c/8235/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8235/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@122
PS8, Line 122:   // Limits the number of errors logged when loading huge 
partitioned tables.
remove "huge"



--
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 8
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Mon, 23 Oct 2017 16:47:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6070: Parallel data load.

2017-10-23 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8320 )

Change subject: IMPALA-6070: Parallel data load.
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
Gerrit-Change-Number: 8320
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Mon, 23 Oct 2017 16:41:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading

2017-10-20 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8235 )

Change subject: IMPALA-5429: Multi threaded block metadata loading
..


Patch Set 7:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@221
PS7, Line 221: // are excluded because they are considered hidden from 
Impala's perspecitve
Shrink:

// Number of hidden files excluded from file metadata loading . More details at 
isValidDataFile().


http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@226
PS7, Line 226: // Number of files skipped while computeing the block 
metadata information.
Shrink:

// Number of files skipped from file metadata loading because the the files 
have not changed since the last load. More details at hasFileChanged().


http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@248
PS7, Line 248: // If set to true, reloads the block metadata only when the 
files in this path
reloads the file metadata (let's try to use the new "file metadata" terminology 
consistently)


http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@249
PS7, Line 249: // has changed since last load (more details in 
hasFileChanged()).
have changed


http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@785
PS7, Line 785:* much higher throughput of RPC calls for 
listStatus/listFiles. Based on our
... RPC calls for listStatus/listFiles. For simplicity, the filesystem type is 
determined based on the table's root path and not for each partition 
individually. Based on our experiments, S3 showed ...


http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@791
PS7, Line 791:* This method is not optimized for tables with mixed 
partition types (partitions mapped
I don'd think we need this much detail, how about folding a simple sentence 
into the existing paragraph above, see suggestion above.


http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@794
PS7, Line 794:* This may not work well with the mixed partition types and 
needs to fixed.
Don't think this is needed.


http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@819
PS7, Line 819: // For tables without partitions, we have no block metadata 
to load.
remove ","


http://gerrit.cloudera.org:8080/#/c/8235/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@841
PS7, Line 841:   LOG.error("Encountered an error loading block metadata 
for table: " +
Could this flood the log when HDFS is under pressure or there is an 
intermittent Kerberos issue? I'm thinking we should aggregate these into a 
single log message. Do you think having every single path helps with 
supportability? My feeling is it's more important to note how many tasks 
failed, but I'll defer to you if you think having the paths is useful.



--
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 7
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Sat, 21 Oct 2017 00:12:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6803: [DOCS] Clarify scope of STRAIGHT JOIN hint

2017-10-20 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8336 )

Change subject: IMPALA-6803: [DOCS] Clarify scope of STRAIGHT_JOIN hint
..


Patch Set 4: Code-Review+2

Thanks!


--
To view, visit http://gerrit.cloudera.org:8080/8336
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd8e4d8e5260f5a2b7b5c52dc3269bb373b9c76f
Gerrit-Change-Number: 8336
Gerrit-PatchSet: 4
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Mala Chikka Kempanna 
Gerrit-Comment-Date: Fri, 20 Oct 2017 23:55:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6803: [DOCS] Clarify scope of STRAIGHT JOIN hint

2017-10-20 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8336 )

Change subject: IMPALA-6803: [DOCS] Clarify scope of STRAIGHT_JOIN hint
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8336/3/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/8336/3/docs/shared/impala_common.xml@2869
PS3, Line 2869: in the query block containing the hint. It does not 
affect the join order
I suggest removing the "nested within those table references part" because 
WHERE-clause subqueries do not belong to "those table references"


Phrasing:
It does not affect the join order of any nested queries, such as views, inline 
views or WHERE-clause subqueries.



--
To view, visit http://gerrit.cloudera.org:8080/8336
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd8e4d8e5260f5a2b7b5c52dc3269bb373b9c76f
Gerrit-Change-Number: 8336
Gerrit-PatchSet: 3
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Mala Chikka Kempanna 
Gerrit-Comment-Date: Fri, 20 Oct 2017 16:27:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

2017-10-19 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
..


Patch Set 4:

(3 comments)

Nice cleanup! I don't have much to say - patch makes a lot of sense to me.

http://gerrit.cloudera.org:8080/#/c/8172/4/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8172/4/be/src/exec/hdfs-scanner.h@276
PS4, Line 276:   std::vector string_slot_offsets_;
What's the reason for keeping a separate vector? We also have these in the scan 
node's tuple descriptor.


http://gerrit.cloudera.org:8080/#/c/8172/4/be/src/exec/hdfs-scanner.h@341
PS4, Line 341:   /// - 'copy_out_strings': if true, strings in returned tuples 
that pass conjuncts are
copy_strings?


http://gerrit.cloudera.org:8080/#/c/8172/4/be/src/exec/scanner-context.cc
File be/src/exec/scanner-context.cc:

http://gerrit.cloudera.org:8080/#/c/8172/4/be/src/exec/scanner-context.cc@110
PS4, Line 110: parent_->scan_node_->num_owned_io_buffers_.Add(-1);
Not sure this counter is still useful. Do you think we can get rid of it?



--
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 20 Oct 2017 05:37:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6060: Check the return value of JNI exception handling functions

2017-10-19 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8334 )

Change subject: IMPALA-6060: Check the return value of JNI exception handling 
functions
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8334/1/be/src/util/jni-util.h
File be/src/util/jni-util.h:

http://gerrit.cloudera.org:8080/#/c/8334/1/be/src/util/jni-util.h@159
PS1, Line 159:   /// lives as long as this guard. jstr should not be null.
Usually such a comment indicates a precondition, but in this case a nullptr is 
handled gracefully by returning a status.

I think it's actually better to make it a precondition and use a DCHECK instead 
of returning a Status.


http://gerrit.cloudera.org:8080/#/c/8334/1/be/src/util/jni-util.cc
File be/src/util/jni-util.cc:

http://gerrit.cloudera.org:8080/#/c/8334/1/be/src/util/jni-util.cc@30
PS1, Line 30: Status JniUtfCharGuard::create(JNIEnv *env, jstring jstr, 
JniUtfCharGuard *out) {
fix pointer style while you are here: JNIEnv* env, etc.


http://gerrit.cloudera.org:8080/#/c/8334/1/be/src/util/jni-util.cc@197
PS1, Line 197:   if (msg == nullptr) {
I think the real cause of the bug is here. The general rule is that you always 
need to check for exceptions after a JNI function call. I think we are missing 
a RETURN_ERROR_IF_EXC(env); at this point.

This handles checked as well as unchecked Java exceptions, including OOM.


http://gerrit.cloudera.org:8080/#/c/8334/1/be/src/util/jni-util.cc@208
PS1, Line 208: if (stack == nullptr) {
same here, I think the right fix is to add a RETURN_ERROR_IF_EXC(env);



--
To view, visit http://gerrit.cloudera.org:8080/8334
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3ed88bf8739c56a066f2402727c8204e96aa116
Gerrit-Change-Number: 8334
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Comment-Date: Fri, 20 Oct 2017 04:41:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4918: Support getting column comments via HS2

2017-10-19 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8315 )

Change subject: IMPALA-4918: Support getting column comments via HS2
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1d33dfd031b5344d7136695b623cec76143ada5c
Gerrit-Change-Number: 8315
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Fri, 20 Oct 2017 04:11:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] Clarify scope of STRAIGHT JOIN hint

2017-10-19 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8336 )

Change subject: [DOCS] Clarify scope of STRAIGHT_JOIN hint
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8336/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8336/1//COMMIT_MSG@7
PS1, Line 7: [DOCS] Clarify scope of STRAIGHT_JOIN hint
IMPALA-6083


http://gerrit.cloudera.org:8080/#/c/8336/1/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/8336/1/docs/shared/impala_common.xml@2869
PS1, Line 2869: in the topmost query block where the hint occurs. It 
does not affect the join order
"topmost" is confusing. It is simply the query block in which the hint appears. 
How about:

... affects the join order of table references in the query block containing 
the hint.


http://gerrit.cloudera.org:8080/#/c/8336/1/docs/shared/impala_common.xml@2870
PS1, Line 2870: of any queries nested within those table references, 
such as subqueries or views.
of any nested queries, such as views, inline views or WHERE-clause subqueries.

(generally, not all nested queries are part of a table reference)


http://gerrit.cloudera.org:8080/#/c/8336/1/docs/shared/impala_common.xml@2872
PS1, Line 2872: whichever top-level or nested queries need a specific 
join order.
apply the hint to all query blocks that need a fixed join order



--
To view, visit http://gerrit.cloudera.org:8080/8336
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd8e4d8e5260f5a2b7b5c52dc3269bb373b9c76f
Gerrit-Change-Number: 8336
Gerrit-PatchSet: 1
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Mala Chikka Kempanna 
Gerrit-Comment-Date: Thu, 19 Oct 2017 19:03:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6073: Fail on misconfigured CLASSPATH.

2017-10-18 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8327 )

Change subject: IMPALA-6073: Fail on misconfigured CLASSPATH.
..


Patch Set 1:

(3 comments)

Thanks for fixing, lgtm, only minor nits

http://gerrit.cloudera.org:8080/#/c/8327/1/be/src/common/init.cc
File be/src/common/init.cc:

http://gerrit.cloudera.org:8080/#/c/8327/1/be/src/common/init.cc@178
PS1, Line 178: char *classpath = getenv("CLASSPATH");
style: char*


http://gerrit.cloudera.org:8080/#/c/8327/1/be/src/common/init.cc@179
PS1, Line 179: assert(classpath != NULL);
Why not DCHECK and add a message


http://gerrit.cloudera.org:8080/#/c/8327/1/be/src/common/init.cc@182
PS1, Line 182: assert(std::string(classpath).find("*") == 
std::string::npos);
DCHECK and include an error message



--
To view, visit http://gerrit.cloudera.org:8080/8327
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icb92fa5cad8bbc29b620bec5904e45ed7aeff13d
Gerrit-Change-Number: 8327
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Comment-Date: Thu, 19 Oct 2017 03:45:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

2017-10-18 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8238 )

Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
..


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8238/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8238/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1958
PS7, Line 1958:   Lists.partition(allHmsPartitionsToAdd, 
MAX_PARTITION_UPDATES_PER_RPC)) {
While you are here, can you also change bulkAlterPartitions() to use 
Lists.partition()?


http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py@514
PS7, Line 514:   def test_alter_table_create_many_partitions(self, vector, 
unique_database):
This test doesn't belong in the TestLibCache class.

Don't we have existing tests that test the multi-partition add alter? Seems 
like this test belongs there.


http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py@519
PS7, Line 519: self.client.execute('use {0}'.format(unique_database))
"use" can generally cause problems in tests because we sometimes run a test 
against different impalads for example to exercise sync_ddl. While I think it 
is ok in this specific instance, it's generally better practice not so issue 
"use" statements unless absolutely necessary.


http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py@527
PS7, Line 527: # 473 -1  0   0B  NOT CACHED  NOT CACHED 
 TEXTfalse   
hdfs://localhost:20500/test-warehouse/test_alter_table_create_many_partitions_5b4888f5.db/t/p=473
This comment can become stale quickly. I suggest not adding it an output row 
verbatim but mentioning that show partitions contains the partition HDFS paths 
which we expect to contain p=val sub-directories that you are going to search 
for with the regex.



--
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 7
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Thu, 19 Oct 2017 03:40:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6070: Parallel data load.

2017-10-18 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8320 )

Change subject: IMPALA-6070: Parallel data load.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh
File testdata/bin/run-hive-server.sh:

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh@75
PS1, Line 75:   HADOOP_HEAPSIZE="1024" hive --service hiveserver2 > 
${LOGDIR}/hive-server2.out 2>&1 &
> People like Alex are those whom I was most concerned about, as I know he us
:). I'm still using that good-old machine, mem should be fine (fingers crossed).



--
To view, visit http://gerrit.cloudera.org:8080/8320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
Gerrit-Change-Number: 8320
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Thu, 19 Oct 2017 00:47:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6070: Parallel data load.

2017-10-18 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8320 )

Change subject: IMPALA-6070: Parallel data load.
..


Patch Set 1:

(2 comments)

Changes like these tend to be slow and painful to test, so I'm in favor of not 
parallelizing additional things in this patch. Additional steps can be improved 
later.

http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG@33
PS1, Line 33:
What testing did you do? Does the data load still run on a non-beefy local 
machine?


http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh
File testdata/bin/run-hive-server.sh:

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh@75
PS1, Line 75:   HADOOP_HEAPSIZE="1024" hive --service hiveserver2 > 
${LOGDIR}/hive-server2.out 2>&1 &
> This looks like it will also increase HADOOP_HEAPSIZE when not doing a para
I'd prefer to keep this change. Our Hive server tends to OOM pretty easily when 
doing anything non-trivial with Hive on our mini cluster.



--
To view, visit http://gerrit.cloudera.org:8080/8320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
Gerrit-Change-Number: 8320
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Thu, 19 Oct 2017 00:07:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5976: Remove equivalent class computation in FE

2017-10-18 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalent class computation in FE
..


Patch Set 1:

(47 comments)

Very nice.

First wave of comments. I have not exhaustively gone through everything yet 
since this is a very substantial patch.

http://gerrit.cloudera.org:8080/#/c/8317/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8317/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-5976: Remove equivalent class computation in FE
Remove equivalence classes from FE


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@a2017
PS1, Line 2017:
Very satisfying to see this code deleted!


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@a2890
PS1, Line 2890:
This function was nice for debugging. Does your patch add a similar function 
that nicely prints the two graphs?


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1253
PS1, Line 1253: return !tids.isEmpty() && (tids.size() > 1 || 
isOjConjunct(e) || isFullOuterJoined(e)
This new formatting is extremely hard to read, please reformat for readability. 
The old formatting was much better.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1496
PS1, Line 1496: for (SlotId dst : getEquivSlots(srcSid)) {
Unless we are absolutely convinced that the change is correct, I think we 
should try to preserve the semantics of the original code. My understanding is 
that the new getEquivSlots() returns slots that have a mutual value transfer 
with srcSid - which is different than what the original code did.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1608
PS1, Line 1608:* For each equivalence class, adds/removes predicates from 
conjuncts such that it
Refers to "equivalence class" again which is now an undefined term. If you want 
to use that term we should define what it means.

Let's think about how to evolve the terminology. Based on my understanding of 
your patch it looks like you want to redefine "equivalence class" to be a set 
of slots where each pair of slots has a mutual value transfer.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1626
PS1, Line 1626: // Equivalence classes only containing slots belonging to 
lhsTids.
What does this mean? Is the Integer key the SCC id?


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1802
PS1, Line 1802: if (slotDesc.getId().asInt() >= g.numVertices()) 
continue;
I've seen this check in a couple of places, would be nice if that could be 
handled by 'g' instead of all callsites.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1952
PS1, Line 1952:   public void computeValueTransfer() {
computeValueTransferGraph


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1972
PS1, Line 1972:   private void constructValueTransferFromEqPredicates(Graph g) {
valueTransfers (plural)


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2047
PS1, Line 2047:* Returns the ids of slots that are in the same equivalence 
class as slotId
Comment need updating. The concept of "equivalence class" does not exist 
anymore and we should not use that term.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2055
PS1, Line 2055: for (int dst: g.getSccMembers(g.getSccId(slotId.asInt( {
In terms of the g API I think callers should have to care about the existence 
of SCCs. Can we hide those details?


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2070
PS1, Line 2070: Collections.sort(result);
Shouldn't the destinations already be sorted?


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Expr.java@702
PS1, Line 702: if (!localCmp.apply(Pair.create(this, that))) return false;
It's very unfortunate that we have to create a new pair object to do a SlotRef 
comparison. We should try to avoid that generate an excessive number of 
objects. Expr.equals() really is called quite frequently and SlotRef is the 
most common Expr.

Using a Java Comparator is not ideal because we don't provide 

[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-17 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 10:

(27 comments)

http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@455
PS10, Line 455: jbyteArray thrift_in_query_options) {
Not sure what 'thrift_in_query_options' means. How about using 'tquery_options' 
instead? Ideally, we would not need to pass in the existing query options.


http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@457
PS10, Line 457:   DeserializeThriftMsg(env, thrift_in_query_options, );
Needs error handling, e.g. using THROW_IF_ERROR_RET


http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@458
PS10, Line 458:   const char *o = env->GetStringUTFChars(options_str, NULL);
* You need to release the string UTF chars, take a look at JniUtfCharGuard in 
jni-util.h

* style: const char* o


http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@461
PS10, Line 461:   TParseQueryOptionsResult result;
I think it's simpler to convert all Status to an exception. That way we don't 
need the TParseQueryOptionsResult at all, and the error handling is consistent 
(always throws an exception in case of error).


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/Planner.java@119
PS10, Line 119: // Create runtime filters. 'singleNodePlan' now points to 
the root of the distributed
Comment is wrong. 'singleNodePlan' definitely does not point to the root of the 
distributed plan.

Use rootFragment.getPlanRoot() instead


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@354
PS10, Line 354: public void addTarget(RuntimeFilterTarget target) { 
targets_.add(target); }
add newline before


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@397
PS10, Line 397: Preconditions.checkState(maxNumFilters >= 0);
Why is 0 not a valid value?


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@423
PS10, Line 423:   filter.computeNdvEstimate();
My understanding is that IMPALA-3450 has been fixed and we can remove this code.


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@537
PS10, Line 537:* The assigned filters are the ones for which 'scanNode' can 
be used a destination
can be used as a destination node


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@539
PS10, Line 539:* 1. If DISABLE_ROW_RUNTIME_FILTERING query option is set, a 
filter is only assigned to
If the DISABLE_ROW_RUNTIME_FILTERING query option is set ...


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@541
PS10, Line 541:* 2. If RUNTIME_FILTER_MODE query option is set to LOCAL, a 
filter is only assigned to
If the RUNTIME_FILTER_MODE query option is set ...


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@542
PS10, Line 542:*'scanNode' if the filter is local to the scan node.
This doesn't really explain what the LOCAL option means. How about:

... a filter is only assigned to 'scanNode' if the filter is produced within 
the same fragment that contains the scan node.


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@563
PS10, Line 563:   if (!isSingleNodeExec && runtimeFilterMode == 
TRuntimeFilterMode.LOCAL &&
Why the !isSingleNodeExec condition? I think that !isLocalTarget implies 
!isSingleNodeExec.


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@576
PS10, Line 576:   static private boolean IsLocalTarget(RuntimeFilter filter, 
ScanNode targetNode) {
isLocalTarget()


http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@583
PS10, Line 583:   static private boolean IsBoundByPartitionColumns(Analyzer 
analyzer, Expr targetExpr,
isBoundByPartitionColumns()



[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading

2017-10-16 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8235 )

Change subject: IMPALA-5429: Multi threaded block metadata loading
..


Patch Set 6:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc@38
PS6, Line 38: DEFINE_int32(max_hdfs_parts_parallel_load, 5,
Let's spell out "parts" into "partitions" to avoid confusion.


http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc@40
PS6, Line 40: "tables. Due to HDFS architectural limitations, it is 
unlikely to get a linear "
In the sentence "Due to HDFS architectural..." what does the "it" after the 
comma refer to? Better to spell it out, e.g.,

"the response time of block metadata loading is unlikely to improve beyond 5 
threads."


http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc@42
PS6, Line 42: DEFINE_int32(max_s3_parts_parallel_load, 20,
Does ADLS fall into this or the HDFS bucket? What about other filesystems?


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@216
PS6, Line 216:   private class PathStorageMdLoadingStats {
Naming seems a little weird. I think "Storage" is too generic and "Block" is 
too specific. How about we settle on "FileMetadata", so we'd have:

FileMetadataLoadStats
FileMetadataLoadRequest


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@240
PS6, Line 240: // If set to true, reloads the block metadata only when the 
underlying file
Please clarify "the underlying file"


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@413
PS6, Line 413: ++loadStats.ignoredFiles;
I think we should distinguish the hidden file and already-up-to-date file 
cases, e.g. split ignoredFiles into 'hiddenFiles' and 'skippedFreshFiles' or 
similar


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@773
PS6, Line 773:* Returns the thread pool size to load the block metadata of 
this table.
Command on how we distinguish between HDFS, S3 and other filesystems (using the 
table root path?), and what the behavior is for mixed-filesystem tables.


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@793
PS6, Line 793: return Math.max(Math.min(numPaths, threadPoolSize), 1);
For my understanding, why is the max() needed here? Is it not be a precondition 
that numPaths > 0 and threadPoolSize > 0?


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@800
PS6, Line 800:* metadata is reloaded, else it is loaded from scratch.
Difference between "reloaded" and "loaded from scratch" is not very clear. 
Maybe say "incrementally refreshed" vs. "reloaded from scratch"?


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@804
PS6, Line 804: int numPathsToLoad = partsByPath.keySet().size();
partsByPath.size()?


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@812
PS6, Line 812:   for (Path p: partsByPath.keySet()) {
The tasks are ordered randomly. I wonder if submitting the tasks in a clustered 
fashion would be better or worse. Not relevant to this patch, but might be 
interesting as a follow-on investigation.


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@821
PS6, Line 821: } catch (ExecutionException | InterruptedException e) {
We still consider the load successful if one of these fails. What do you think 
the right behavior should be? Is a partially loaded table an acceptable outcome?


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@822
PS6, Line 822:   LOG.error("Encountered an error loading block metadata 
for table: " +
Let's also dump the path


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@898
PS6, Line 898:* the newly created HdfsPartitions in parallel.
remove "the"


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/util/ListMap.java
File fe/src/main/java/org/apache/impala/util/ListMap.java:

http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/util/ListMap.java@38
PS6, Line 38:   private List list_ = Collections.synchronizedList(new 
ArrayList());
The CC requirements and behavior are not clear to me. Why is it not sufficient 
to make all methods synchronized?



--
To 

[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala

2017-10-13 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7954 )

Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala
..


Patch Set 15:

Csaba, are those failing tests specifically require a timestamp that issues 
such a warning? If not, I suggest we just change the timestamps of those tests 
in a way that preserves test coverage and avoids the warnings problem.


--
To view, visit http://gerrit.cloudera.org:8080/7954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff
Gerrit-Change-Number: 7954
Gerrit-PatchSet: 15
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Fri, 13 Oct 2017 18:16:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] Download toolchain in parallel.

2017-10-09 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8237 )

Change subject: Download toolchain in parallel.
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8237
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46a6088bb002402c7653dbc8257dff869afb26ec
Gerrit-Change-Number: 8237
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 09 Oct 2017 21:17:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6011: Remove use of Guava Hasher.

2017-10-09 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8216 )

Change subject: IMPALA-6011: Remove use of Guava Hasher.
..


Patch Set 2:

Afaik these hashes are stored in Navigator and used for finding exact 
duplicates or something like that. I would need to confirm how these hashes are 
used. The safer route for now is to not change the hashes to avoid doing more 
damage.


--
To view, visit http://gerrit.cloudera.org:8080/8216
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I04bb436eb93a86b8db24b2a97f14047f828b
Gerrit-Change-Number: 8216
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 09 Oct 2017 18:03:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2636: HS2 GetTables() returns TABLE TYPE as TABLE for VIEW

2017-10-06 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7353 )

Change subject: IMPALA-2636: HS2 GetTables() returns TABLE_TYPE as TABLE for 
VIEW
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/7353
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I90616388e6181cf342b3de389af940214ed46428
Gerrit-Change-Number: 7353
Gerrit-PatchSet: 4
Gerrit-Owner: sandeep akinapelli 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: sandeep akinapelli 
Gerrit-Comment-Date: Fri, 06 Oct 2017 23:07:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

2017-10-06 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
..


Patch Set 5: Code-Review+2

I'm still working with Bharath on getting diagnostics for the table sizes, but 
this patch looks good.


--
To view, visit http://gerrit.cloudera.org:8080/7999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 5
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Silvius Rus 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 06 Oct 2017 23:07:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

2017-10-06 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
..


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7999/4/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/7999/4/docs/shared/impala_common.xml@1227
PS4, Line 1227: COMPUTE INCREMENTAL STATS during the 
lifetime of a table,
(or vice versa)


http://gerrit.cloudera.org:8080/#/c/7999/4/docs/shared/impala_common.xml@1243
PS4, Line 1243: be cached on every impalad host that 
is eligible to be a coordinator.
as it must be cached on the catalogd and on every ...


http://gerrit.cloudera.org:8080/#/c/7999/4/docs/shared/impala_common.xml@1244
PS4, Line 1244: If this metadata for a table exceeds 2 GB, you might 
experience service downtime.
It's worse than that. If the aggregate metadata of *all* tables combined gets 
to 2GB you may experience downtime.


http://gerrit.cloudera.org:8080/#/c/7999/4/docs/topics/impala_partitioning.xml
File docs/topics/impala_partitioning.xml:

http://gerrit.cloudera.org:8080/#/c/7999/4/docs/topics/impala_partitioning.xml@612
PS4, Line 612: as new partitions are added, Impala includes a variation 
of this statement that is intended for use with
How about:

includes a variation of this statement that allows computing statistics on a 
per-partition basis such that stats can be incrementally updated when new 
partitions are added.


http://gerrit.cloudera.org:8080/#/c/7999/4/docs/topics/impala_perf_stats.xml
File docs/topics/impala_perf_stats.xml:

http://gerrit.cloudera.org:8080/#/c/7999/4/docs/topics/impala_perf_stats.xml@361
PS4, Line 361:   COMPUTE STATS statement might take 
hours, or even days. For such tables, use
Sorry, I disagree with the "For such tables, use COMPUTE INCREMENTAL STATS" 
part. I think we need to be very careful about recommending incremental stats. 
We can document what it does, but I think we should go out of out way to not 
explicitly recommend it for any reason.



--
To view, visit http://gerrit.cloudera.org:8080/7999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 4
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Silvius Rus 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 06 Oct 2017 21:53:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

2017-10-06 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1247
PS2, Line 1247: does not affect
> fair pointer for me, but my comment is about whether this wording is clear
Fine with me to expand this to add my earlier explanation of what the 
"incremental" part of the stats actually is (to explain why dropping the 
"incremental" portion is fine).


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1248
PS2, Line 1248: optimizations such as partition pruning.
> such as partition pruning or join ordering.
Actually I would remove partition pruning because stats have nothing to do with 
that (and we don't want to imply that they do)



--
To view, visit http://gerrit.cloudera.org:8080/7999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Silvius Rus 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 06 Oct 2017 17:46:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

2017-10-06 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1228
PS2, Line 1228: DROP STATS and
  : DROP INCREMENTAL STATS)
> are these drops required?
They are not required if you *exactly* what you are doing, but that does not 
apply to most people. Dropping first is always safe and definitely the 
recommended practice.

If you do not drop and switch back-and-forth between compute stats and compute 
incremental sats, then you may end up with "unexpected" table metadata.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1243
PS2, Line 1243: be cached on every impalad host. If 
this metadata for a table exceeds
> more specifically, impalads that are also coordinators?
Yes


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1243
PS2, Line 1243: metadata for a table exceeds
  : 2 GB
> is there a diagnostic page that we can point to here that explains how to f
This is a longer story. I asked Bharath to help here.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1247
PS2, Line 1247: does not affect
> does that mean lack of stats has not affect on optimization or something el
Please see my explanation on what "incremental" stats is in previous patch 
sets. Dropping *incremental* stats has no effect on the row counts and NDVs 
used in query optimization.



--
To view, visit http://gerrit.cloudera.org:8080/7999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Silvius Rus 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 06 Oct 2017 17:34:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6021: Revert "IMPALA-6009: Upgrade Guava to 14.0.1"

2017-10-06 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8225 )

Change subject: IMPALA-6021: Revert "IMPALA-6009: Upgrade Guava to 14.0.1"
..


Patch Set 1: Code-Review+2

Lesson learned: Other components should revert immediately. PhilZ is right of 
course :)


--
To view, visit http://gerrit.cloudera.org:8080/8225
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd07d9b011ebd407d7a274725a10b6371d024186
Gerrit-Change-Number: 8225
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 06 Oct 2017 16:02:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax

2017-10-06 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8191 )

Change subject: IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8191
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7647dfe8acfdb598caf561d41d74e38a8b66ce9d
Gerrit-Change-Number: 8191
Gerrit-PatchSet: 4
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 06 Oct 2017 15:21:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

2017-10-06 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
..


Patch Set 2:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1224
PS2, Line 1224: For a particular table, use either COMPUTE 
STATS or
Yes!


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1233
PS2, Line 1233: When you run COMPUTE INCREMENTAL STATS 
on a table for the first time,
I suggest some minor rephrasing to drive home the "don't switch mantra" a 
little more, see comments.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1234
PS2, Line 1234: the statistics are computed again from scratch 
regardless of whether you previously ran
regardless of whether the table has existing stats.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1236
PS2, Line 1236: for scanning the entire table when switching from 
COMPUTE STATS to
when running COMPUTE INCREMENTAL STATS for the first time on a given table.

(do not mention switching... not supposed to do that)


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1244
PS2, Line 1244: 2 GB, a serious error can occur. If only a limited 
number of partitions are actively being
If the aggregate metadata of all tables exceeds 2 GB you may experience service 
downtime (daemon crashes).

("serious error" really isn't clear to me)


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1245
PS2, Line 1245: added or inserted into, you can run COMPUTE 
INCREMENTAL STATS for the active
Sorry my phrasing might have been misleading. By "active" partitions I meant 
those partitions that are being queried (i.e. read)... if you query some 
partitions very infrequently then there is no point in keeping incremental 
stats for them.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1248
PS2, Line 1248: optimizations such as partition pruning.
such as partition pruning or join ordering.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_partitioning.xml
File docs/topics/impala_partitioning.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_partitioning.xml@624
PS2, Line 624: subset of partitions rather than the entire table. The 
incremental nature makes it suitable for large tables
Need to be careful here because "large tables" could be misinterpreted to mean 
"tables with many partitions".

I'd prefer to avoid the word "suitable" and instead use a phrasing that states 
it enables updating the stats as partitions are added. Whether incremental 
stats is "suitable" for anything is questionable because of the huge memory 
downside.

I'd agree that incremental stats could be suitable in situations where you have 
a huge partitioned table with a small rolling window of "active" partitions, so 
you only ever need to keep incremental stats on let's say <100 partitions.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml
File docs/topics/impala_perf_stats.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml@361
PS2, Line 361:   COMPUTE STATS statement might take 
hours, or even days. That situation is where you switch
Rephrase to avoid "switch" since switching is bad



--
To view, visit http://gerrit.cloudera.org:8080/7999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Silvius Rus 
Gerrit-Comment-Date: Fri, 06 Oct 2017 06:43:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2190: [DOCS] from timestamp() and to timestamp()

2017-10-06 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8046 )

Change subject: IMPALA-2190: [DOCS] from_timestamp() and to_timestamp()
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8046
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f6dfd3fb99a70975d712bbef6c05900eddadd27
Gerrit-Change-Number: 8046
Gerrit-PatchSet: 3
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Andre Calfa 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-Comment-Date: Fri, 06 Oct 2017 06:16:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6011: Remove use of Guava Hasher.

2017-10-05 Thread Alex Behm (Code Review)
Alex Behm has abandoned this change. ( http://gerrit.cloudera.org:8080/8216 )

Change subject: IMPALA-6011: Remove use of Guava Hasher.
..


Abandoned

Guava does some weird stuff internally such that the hash it produces for the 
same query is different than the hash produced by the code in this patch. 
Probably safer not to pursue this.

We can probably fix it, but I imagine the new code will be more 
complex/controversial, so I don't think it's worth the hassle

Sorry for the noise.
--
To view, visit http://gerrit.cloudera.org:8080/8216
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I04bb436eb93a86b8db24b2a97f14047f828b
Gerrit-Change-Number: 8216
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


  1   2   3   4   5   6   7   8   9   10   >