[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms

2016-09-24 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4187: Switch RPC latency metrics to histograms
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4516/3/be/src/rpc/rpc-trace.cc
File be/src/rpc/rpc-trace.cc:

PS3, Line 179:   constexpr int32_t SIXTY_MINUTES_IN_MS = 60 * 1000 * 60;
> We'll see users with long running queries go over this (not uncommon) for h
I checked - any value > 60 minutes will be clipped to 60 minutes. That seems ok 
- beyond 60 minutes I'm not sure there's much use in knowing the distribution. 
60 minutes might be too much since most RPCs will never get anywhere close, but 
the total data structure cost is 100KB and there are only a handful of them for 
all the RPCs we support.


-- 
To view, visit http://gerrit.cloudera.org:8080/4516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3902: Scheduler improvements for running multiple fragment instances on a single backend

2016-09-24 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-3902: Scheduler improvements for running multiple 
fragment instances on a single backend
..


Patch Set 14:

(15 comments)

I think it would be good to have Alex review Frontend.java again - he said it 
looked good overall in PS3 so hopefully there won't be many comments.

http://gerrit.cloudera.org:8080/#/c/4054/12/be/src/scheduling/query-resource-mgr.cc
File be/src/scheduling/query-resource-mgr.cc:

Line 1: // Licensed to the Apache Software Foundation (ASF) under one
Remove this file.


http://gerrit.cloudera.org:8080/#/c/4054/12/be/src/scheduling/query-resource-mgr.h
File be/src/scheduling/query-resource-mgr.h:

Line 1: // Licensed to the Apache Software Foundation (ASF) under one
Did your rebase go wrong? This file should be removed.


http://gerrit.cloudera.org:8080/#/c/4054/12/be/src/scheduling/query-schedule.cc
File be/src/scheduling/query-schedule.cc:

PS12, Line 68: if (plan_node_to_fragment_idx_.size() < node.node_id + 1) {
 : plan_node_to_fragment_idx_.resize(node.node_id + 1);
 : plan_node_to_plan_node_idx_.resize(node.node_id + 1);
 :   }
I think you can find the max node ID in InitMtFragmentExecParams if you also 
loop over the plan nodes there. That seems better than all this conditional 
resizing, which is confusing to read.


PS12, Line 105: std
remove std::


http://gerrit.cloudera.org:8080/#/c/4054/12/be/src/scheduling/query-schedule.h
File be/src/scheduling/query-schedule.h:

PS12, Line 184: Returns next instance id. Instance ids are consecutive numbers 
generated from
  :   /// the query id.
shouldn't that be the opposite? Start at 0 if there's a coordinator fragment.

Might be better reworded as "The coordinator fragment instance, if there is 
one, has ID 0, all other fragment instances start from 1."


PS12, Line 278: 
what's 'static info'?


http://gerrit.cloudera.org:8080/#/c/4054/14/be/src/service/fragment-mgr.cc
File be/src/service/fragment-mgr.cc:

PS14, Line 23: #include 
not needed (see below), remove


PS14, Line 42: << hex << exec_params.fragment_instance_ctx.fragment_instance_id 
<< dec
use PrintId().


http://gerrit.cloudera.org:8080/#/c/4054/14/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 848:   random_generator uuid_generator;
remove this line


PS14, Line 1080: << hex << params.fragment_instance_id
Use PrintId() here instead of hex / dec.


http://gerrit.cloudera.org:8080/#/c/4054/14/be/src/service/query-exec-state.cc
File be/src/service/query-exec-state.cc:

Line 443:   Status status = exec_env_->scheduler()->Schedule(schedule_.get());
Leave a TODO to simplify this as discussed.


http://gerrit.cloudera.org:8080/#/c/4054/14/be/src/service/query-options.cc
File be/src/service/query-options.cc:

Line 398:   case TImpalaQueryOptions::MT_DOP: {
When the patch that added MT_NUM_CORES was committed, I asked to consider a 
mechanism for hiding query options. DId that come to anything? It's unfortunate 
that we're going to commit a query option that (AFAICT) will break clients if 
it's turned on.


http://gerrit.cloudera.org:8080/#/c/4054/14/be/src/util/container-util.h
File be/src/util/container-util.h:

PS14, Line 79: V* FindOrInsert(std::unordered_map* m, const K& key, const 
V& default_val) {
 :   typename std::unordered_map::iterator it = m->find(key);
 :   if (it == m->end()) {
 : it = m->insert(std::make_pair(key, default_val)).first;
 :   }
 :   return >second;
 : }
 : 
 : template 
 : V* FindOrInsert(boost::unordered_map* m, const K& key, 
const V& default_val) {
 :   typename boost::unordered_map::iterator it = m->find(key);
 :   if (it == m->end()) {
 : it = m->insert(std::make_pair(key, default_val)).first;
 :   }
 :   return >second;
 : }
Just add the map type as a template parameter and remove the other versions of 
this.


http://gerrit.cloudera.org:8080/#/c/4054/14/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

PS14, Line 185: backend
so this is independent of the number of fragment instances for that query on 
that backend?


PS14, Line 333: nd the id of the first fragment instance (the coordinator 
instance)
  :   // are identical.
isn't this true now only if there is a coordinator fragment? Reword to make 
that clearer - looks like first fragment instance is always 0 regardless of 
whether there is coord fragment.


-- 
To view, visit http://gerrit.cloudera.org:8080/4054
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I240445199e22f009f4e72fdb8754eb8d77e3d680
Gerrit-PatchSet: 14

[Impala-ASF-CR] IMPALA-3504: UDF for current timestamp in UTC

2016-09-24 Thread Youwei Wang (Code Review)
Youwei Wang has uploaded a new patch set (#7).

Change subject: IMPALA-3504: UDF for current timestamp in UTC
..

IMPALA-3504: UDF for current timestamp in UTC

Purpose: Returns the current date and time (in the local time zone)
as a TIMESTAMP value. Generated timestamp has been verified using
online time services.
Return type: timestamp

Example:
select utc_timestamp();
+---+
| utc_timestamp()   |
+---+
| 2016-09-23 09:42:42.931447000 |
+---+

Change-Id: I5ee6ee192aa469f77c711f27ad324696a42004d1
---
M be/src/exprs/expr-test.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M common/function-registry/impala_functions.py
M common/thrift/ImpalaInternalService.thrift
9 files changed, 51 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/4490/7
-- 
To view, visit http://gerrit.cloudera.org:8080/4490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5ee6ee192aa469f77c711f27ad324696a42004d1
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Youwei Wang 


[Impala-ASF-CR] IMPALA-3504: UDF for current timestamp in UTC

2016-09-24 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3504: UDF for current timestamp in UTC
..


Patch Set 6:

(6 comments)

Thanks for bearing with us, Youwei!

http://gerrit.cloudera.org:8080/#/c/4490/6/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

PS6, Line 4120: forward-lapsing time
forward-lapsing?


PS6, Line 4132: ut1, ut2;
can you name these to reference the values they're holding, e.g. unixtime_utc, 
unixtime_local?


PS6, Line 4136: const bool res1
This should be true unless there wasn't a value in the TimestampValue, which 
shouldn't happen here. You don't need to define res1/res2 and check after, just 
DCHECK(now_utc.ToUnixTime(...));


PS6, Line 4137: const bool res2
same


http://gerrit.cloudera.org:8080/#/c/4490/6/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

PS6, Line 843: boost::date_time::c_local_adjustor::utc_to_local
I'd still vote to not bother with the conversion, but it seems fine.


PS6, Line 845:   const time_duration td_delta = universal_time - local_time;
 :   DCHECK(td_delta >= time_duration(-12, 0, 0) && td_delta <= 
time_duration(12, 0, 0));
I'm not sure we should check this. While this seems intuitive, there might be 
weird corner cases (e.g. leap seconds/minutes/whatever?). Who knows. I'm not 
sure we need to DCHECK this


-- 
To view, visit http://gerrit.cloudera.org:8080/4490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5ee6ee192aa469f77c711f27ad324696a42004d1
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Youwei Wang 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Youwei Wang 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3902: Scheduler improvements for running multiple fragment instances on a single backend

2016-09-24 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-3902: Scheduler improvements for running multiple 
fragment instances on a single backend
..


Patch Set 14: Verified-1

Build failed: 
http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/233/

-- 
To view, visit http://gerrit.cloudera.org:8080/4054
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I240445199e22f009f4e72fdb8754eb8d77e3d680
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3902: Scheduler improvements for running multiple fragment instances on a single backend

2016-09-24 Thread Marcel Kornacker (Code Review)
Hello Matthew Jacobs,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4054

to look at the new patch set (#14).

Change subject: IMPALA-3902: Scheduler improvements for running multiple 
fragment instances on a single backend
..

IMPALA-3902: Scheduler improvements for running multiple fragment instances on
a single backend

This is an extension of the scheduler and coordinator for multi-threaded
execution. It mainly removes the assumption of having one instance per
fragment per host. The approach taken here is to create parallel data
structures and control flow functions, where necessary, and otherwise to
leave the existing single-instance logic in place. The parallel
structures' and functions' names are prefixed with "Mt" to facilitate
the enventual clean-up.
Not much of an attempt was made to factor out common functionality
between the Mt- and the single-threaded version, because the
single-threaded version will disappear in a follow-on patch and
refactoring the existing code to fit into two parallel functions from
which it's being called might end up obscuring the code more than helping
it. Also, this code is relatively stable and having two parallel paths won't
cause much extra work (in terms of having to apply the same changes/fixes
twice) in the medium term.

Changes to data structures:
- QuerySchedule: per-instance and per-fragment structs with complete
  execution parameters (instead of partially relying on TQueryExecRequest);
  the per-instance execution parameter struct is a child of the per-fragment
  parameter struct
- explicit fragment id, with range 0..#fragments-1 (instead of relying on an
  index into an array in TQueryExecRequest)

Excluded:
- runtime filter handling
- anything related to RM

Change-Id: I240445199e22f009f4e72fdb8754eb8d77e3d680
---
M be/src/common/global-types.h
M be/src/exec/exec-node.cc
M be/src/exec/union-node.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/runtime-filter-bank.cc
A be/src/scheduling/query-resource-mgr.cc
A be/src/scheduling/query-resource-mgr.h
M be/src/scheduling/query-schedule.cc
M be/src/scheduling/query-schedule.h
M be/src/scheduling/scheduler.h
M be/src/scheduling/simple-scheduler.cc
M be/src/scheduling/simple-scheduler.h
M be/src/service/fragment-exec-state.cc
M be/src/service/fragment-mgr.cc
M be/src/service/impala-server.cc
M be/src/service/query-exec-state.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/container-util.h
M be/src/util/uid-util-test.cc
M be/src/util/uid-util.h
M common/thrift/ExecStats.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Planner.thrift
M common/thrift/Types.thrift
M fe/src/main/java/com/cloudera/impala/common/TreeNode.java
M fe/src/main/java/com/cloudera/impala/planner/Planner.java
M fe/src/main/java/com/cloudera/impala/service/Frontend.java
M fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java
32 files changed, 2,034 insertions(+), 553 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/4054/14
-- 
To view, visit http://gerrit.cloudera.org:8080/4054
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I240445199e22f009f4e72fdb8754eb8d77e3d680
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 


[Impala-ASF-CR] IMPALA-3902: Scheduler improvements for running multiple fragment instances on a single backend

2016-09-24 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3902: Scheduler improvements for running multiple 
fragment instances on a single backend
..


Patch Set 13: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4054
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I240445199e22f009f4e72fdb8754eb8d77e3d680
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4101: qgen: Hive join predicates should only contains equality functions

2016-09-24 Thread Anonymous Coward (Code Review)
Hello Michael Brown,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4419

to look at the new patch set (#8).

Change subject: IMPALA-4101: qgen: Hive join predicates should only contains 
equality functions
..

IMPALA-4101: qgen: Hive join predicates should only contains equality functions

Background:

Hive only supports equi-joins in its JOIN clause, while Postgres and Impala 
support more
complex functions such as <, <=, >, >=, etc. This change modifies the
QueryGenerator._create_relational_join_condition and
QueryGenerator._create_boolean_func_tree methods to only construct equality join
conditions under certain conditions.

The _create_boolean_func_tree method is invoked via
QueryGenerator -> create_query -> _create_from_clause -> _create_join_clause ->
_create_relational_join_condition -> _create_boolean_func_tree. This method is 
invoked
when constructing the JOIN, WHERE, and HAVING clauses. It creates a tree of 
functions
that would typically be found in any of these clauses.

Changes:

The parameter "signatures" is added to the method _create_boolean_func_tree, 
and it lists
out all the allowed signatures the function is allowed to use. Previously, this 
list of
signatures was populated by calling _funcs_to_allowed_signatures(FUNCS), and if
"signatures" is not specified, then the code defaults back to the results of 
that method.
A new method in the DefaultProfile called get_allowed_join_signatures is 
introduced and
returns a list of function signatures that are allowed within a JOIN clause. The
DefaultProfile allows all given signatures, while the HiveProfile only allows 
for the
Equals and And functions, as well as any function that operates over only one 
column.
The reason for these restrictions is that Hive only allows equality joins, does 
not allow
OR operators in the join clause, and has some restrictions on functions that 
operate over
multiple different tables. This last restriction is somewhat subtle; if one 
side of the
equals operator contains a function that operates over two different tables, 
the other
side of the operator cannot contain either of those tables. While it is 
possible to have
functions that take in multiple input parameters, the inputs must be taken from 
specific
tables to prevent Hive from throwing a compile time exception. Adding support 
for this in
qgen code will require significant effort and modification to some core methods
(_create_relational_join_condition and _populate_func_with_vals), so it's best 
to disable
these for Hive altogether.

Note that the _create_boolean_func_tree still allows for OR operators due to 
some logic
around its "and_or_fill_ratio" variable. The plan is to fix this in a future 
patch that
specifically focuses on removing OR operators from Hive JOIN clauses.

Minor change to discrepancy_searcher so that the logs print out "Hive" instead 
of
"Impala" when running against Hive.

Testing:

* Added a new unit test that ensures the HiveProfile only returns equality joins
* Unit tests pass
* Tested against Hive locally
* Tested against Impala via Leopard
* Tested against Impala via the Discrepancy Checker

Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
---
M tests/comparison/discrepancy_searcher.py
M tests/comparison/query_generator.py
M tests/comparison/query_profile.py
A tests/comparison/tests/hive/test_hive_create_relational_join_condition.py
4 files changed, 140 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/4419/8
-- 
To view, visit http://gerrit.cloudera.org:8080/4419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com