[Impala-ASF-CR] IMPALA-4187: Switch RPC latency metrics to histograms
Henry Robinson has posted comments on this change. Change subject: IMPALA-4187: Switch RPC latency metrics to histograms .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/4516/3/be/src/rpc/rpc-trace.cc File be/src/rpc/rpc-trace.cc: PS3, Line 179: constexpr int32_t SIXTY_MINUTES_IN_MS = 60 * 1000 * 60; > We'll see users with long running queries go over this (not uncommon) for h I checked - any value > 60 minutes will be clipped to 60 minutes. That seems ok - beyond 60 minutes I'm not sure there's much use in knowing the distribution. 60 minutes might be too much since most RPCs will never get anywhere close, but the total data structure cost is 100KB and there are only a handful of them for all the RPCs we support. -- To view, visit http://gerrit.cloudera.org:8080/4516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9ba6d4270dd5676eeeff35ad8d9dc5dcddd95e34 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3902: Scheduler improvements for running multiple fragment instances on a single backend
Henry Robinson has posted comments on this change. Change subject: IMPALA-3902: Scheduler improvements for running multiple fragment instances on a single backend .. Patch Set 14: (15 comments) I think it would be good to have Alex review Frontend.java again - he said it looked good overall in PS3 so hopefully there won't be many comments. http://gerrit.cloudera.org:8080/#/c/4054/12/be/src/scheduling/query-resource-mgr.cc File be/src/scheduling/query-resource-mgr.cc: Line 1: // Licensed to the Apache Software Foundation (ASF) under one Remove this file. http://gerrit.cloudera.org:8080/#/c/4054/12/be/src/scheduling/query-resource-mgr.h File be/src/scheduling/query-resource-mgr.h: Line 1: // Licensed to the Apache Software Foundation (ASF) under one Did your rebase go wrong? This file should be removed. http://gerrit.cloudera.org:8080/#/c/4054/12/be/src/scheduling/query-schedule.cc File be/src/scheduling/query-schedule.cc: PS12, Line 68: if (plan_node_to_fragment_idx_.size() < node.node_id + 1) { : plan_node_to_fragment_idx_.resize(node.node_id + 1); : plan_node_to_plan_node_idx_.resize(node.node_id + 1); : } I think you can find the max node ID in InitMtFragmentExecParams if you also loop over the plan nodes there. That seems better than all this conditional resizing, which is confusing to read. PS12, Line 105: std remove std:: http://gerrit.cloudera.org:8080/#/c/4054/12/be/src/scheduling/query-schedule.h File be/src/scheduling/query-schedule.h: PS12, Line 184: Returns next instance id. Instance ids are consecutive numbers generated from : /// the query id. shouldn't that be the opposite? Start at 0 if there's a coordinator fragment. Might be better reworded as "The coordinator fragment instance, if there is one, has ID 0, all other fragment instances start from 1." PS12, Line 278: what's 'static info'? http://gerrit.cloudera.org:8080/#/c/4054/14/be/src/service/fragment-mgr.cc File be/src/service/fragment-mgr.cc: PS14, Line 23: #include not needed (see below), remove PS14, Line 42: << hex << exec_params.fragment_instance_ctx.fragment_instance_id << dec use PrintId(). http://gerrit.cloudera.org:8080/#/c/4054/14/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 848: random_generator uuid_generator; remove this line PS14, Line 1080: << hex << params.fragment_instance_id Use PrintId() here instead of hex / dec. http://gerrit.cloudera.org:8080/#/c/4054/14/be/src/service/query-exec-state.cc File be/src/service/query-exec-state.cc: Line 443: Status status = exec_env_->scheduler()->Schedule(schedule_.get()); Leave a TODO to simplify this as discussed. http://gerrit.cloudera.org:8080/#/c/4054/14/be/src/service/query-options.cc File be/src/service/query-options.cc: Line 398: case TImpalaQueryOptions::MT_DOP: { When the patch that added MT_NUM_CORES was committed, I asked to consider a mechanism for hiding query options. DId that come to anything? It's unfortunate that we're going to commit a query option that (AFAICT) will break clients if it's turned on. http://gerrit.cloudera.org:8080/#/c/4054/14/be/src/util/container-util.h File be/src/util/container-util.h: PS14, Line 79: V* FindOrInsert(std::unordered_map* m, const K& key, const V& default_val) { : typename std::unordered_map ::iterator it = m->find(key); : if (it == m->end()) { : it = m->insert(std::make_pair(key, default_val)).first; : } : return >second; : } : : template : V* FindOrInsert(boost::unordered_map * m, const K& key, const V& default_val) { : typename boost::unordered_map ::iterator it = m->find(key); : if (it == m->end()) { : it = m->insert(std::make_pair(key, default_val)).first; : } : return >second; : } Just add the map type as a template parameter and remove the other versions of this. http://gerrit.cloudera.org:8080/#/c/4054/14/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: PS14, Line 185: backend so this is independent of the number of fragment instances for that query on that backend? PS14, Line 333: nd the id of the first fragment instance (the coordinator instance) : // are identical. isn't this true now only if there is a coordinator fragment? Reword to make that clearer - looks like first fragment instance is always 0 regardless of whether there is coord fragment. -- To view, visit http://gerrit.cloudera.org:8080/4054 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I240445199e22f009f4e72fdb8754eb8d77e3d680 Gerrit-PatchSet: 14
[Impala-ASF-CR] IMPALA-3504: UDF for current timestamp in UTC
Youwei Wang has uploaded a new patch set (#7). Change subject: IMPALA-3504: UDF for current timestamp in UTC .. IMPALA-3504: UDF for current timestamp in UTC Purpose: Returns the current date and time (in the local time zone) as a TIMESTAMP value. Generated timestamp has been verified using online time services. Return type: timestamp Example: select utc_timestamp(); +---+ | utc_timestamp() | +---+ | 2016-09-23 09:42:42.931447000 | +---+ Change-Id: I5ee6ee192aa469f77c711f27ad324696a42004d1 --- M be/src/exprs/expr-test.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/exprs/timestamp-functions.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/timestamp-value.h M be/src/service/impala-server.cc M common/function-registry/impala_functions.py M common/thrift/ImpalaInternalService.thrift 9 files changed, 51 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/4490/7 -- To view, visit http://gerrit.cloudera.org:8080/4490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5ee6ee192aa469f77c711f27ad324696a42004d1 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Youwei WangGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Youwei Wang
[Impala-ASF-CR] IMPALA-3504: UDF for current timestamp in UTC
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3504: UDF for current timestamp in UTC .. Patch Set 6: (6 comments) Thanks for bearing with us, Youwei! http://gerrit.cloudera.org:8080/#/c/4490/6/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: PS6, Line 4120: forward-lapsing time forward-lapsing? PS6, Line 4132: ut1, ut2; can you name these to reference the values they're holding, e.g. unixtime_utc, unixtime_local? PS6, Line 4136: const bool res1 This should be true unless there wasn't a value in the TimestampValue, which shouldn't happen here. You don't need to define res1/res2 and check after, just DCHECK(now_utc.ToUnixTime(...)); PS6, Line 4137: const bool res2 same http://gerrit.cloudera.org:8080/#/c/4490/6/be/src/service/impala-server.cc File be/src/service/impala-server.cc: PS6, Line 843: boost::date_time::c_local_adjustor::utc_to_local I'd still vote to not bother with the conversion, but it seems fine. PS6, Line 845: const time_duration td_delta = universal_time - local_time; : DCHECK(td_delta >= time_duration(-12, 0, 0) && td_delta <= time_duration(12, 0, 0)); I'm not sure we should check this. While this seems intuitive, there might be weird corner cases (e.g. leap seconds/minutes/whatever?). Who knows. I'm not sure we need to DCHECK this -- To view, visit http://gerrit.cloudera.org:8080/4490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5ee6ee192aa469f77c711f27ad324696a42004d1 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Youwei WangGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Youwei Wang Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3902: Scheduler improvements for running multiple fragment instances on a single backend
Internal Jenkins has posted comments on this change. Change subject: IMPALA-3902: Scheduler improvements for running multiple fragment instances on a single backend .. Patch Set 14: Verified-1 Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/233/ -- To view, visit http://gerrit.cloudera.org:8080/4054 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I240445199e22f009f4e72fdb8754eb8d77e3d680 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel KornackerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3902: Scheduler improvements for running multiple fragment instances on a single backend
Hello Matthew Jacobs, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4054 to look at the new patch set (#14). Change subject: IMPALA-3902: Scheduler improvements for running multiple fragment instances on a single backend .. IMPALA-3902: Scheduler improvements for running multiple fragment instances on a single backend This is an extension of the scheduler and coordinator for multi-threaded execution. It mainly removes the assumption of having one instance per fragment per host. The approach taken here is to create parallel data structures and control flow functions, where necessary, and otherwise to leave the existing single-instance logic in place. The parallel structures' and functions' names are prefixed with "Mt" to facilitate the enventual clean-up. Not much of an attempt was made to factor out common functionality between the Mt- and the single-threaded version, because the single-threaded version will disappear in a follow-on patch and refactoring the existing code to fit into two parallel functions from which it's being called might end up obscuring the code more than helping it. Also, this code is relatively stable and having two parallel paths won't cause much extra work (in terms of having to apply the same changes/fixes twice) in the medium term. Changes to data structures: - QuerySchedule: per-instance and per-fragment structs with complete execution parameters (instead of partially relying on TQueryExecRequest); the per-instance execution parameter struct is a child of the per-fragment parameter struct - explicit fragment id, with range 0..#fragments-1 (instead of relying on an index into an array in TQueryExecRequest) Excluded: - runtime filter handling - anything related to RM Change-Id: I240445199e22f009f4e72fdb8754eb8d77e3d680 --- M be/src/common/global-types.h M be/src/exec/exec-node.cc M be/src/exec/union-node.cc M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/runtime-filter-bank.cc A be/src/scheduling/query-resource-mgr.cc A be/src/scheduling/query-resource-mgr.h M be/src/scheduling/query-schedule.cc M be/src/scheduling/query-schedule.h M be/src/scheduling/scheduler.h M be/src/scheduling/simple-scheduler.cc M be/src/scheduling/simple-scheduler.h M be/src/service/fragment-exec-state.cc M be/src/service/fragment-mgr.cc M be/src/service/impala-server.cc M be/src/service/query-exec-state.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/container-util.h M be/src/util/uid-util-test.cc M be/src/util/uid-util.h M common/thrift/ExecStats.thrift M common/thrift/Frontend.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/Planner.thrift M common/thrift/Types.thrift M fe/src/main/java/com/cloudera/impala/common/TreeNode.java M fe/src/main/java/com/cloudera/impala/planner/Planner.java M fe/src/main/java/com/cloudera/impala/service/Frontend.java M fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java 32 files changed, 2,034 insertions(+), 553 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/4054/14 -- To view, visit http://gerrit.cloudera.org:8080/4054 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I240445199e22f009f4e72fdb8754eb8d77e3d680 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel KornackerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar
[Impala-ASF-CR] IMPALA-3902: Scheduler improvements for running multiple fragment instances on a single backend
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3902: Scheduler improvements for running multiple fragment instances on a single backend .. Patch Set 13: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/4054 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I240445199e22f009f4e72fdb8754eb8d77e3d680 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel KornackerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4101: qgen: Hive join predicates should only contains equality functions
Hello Michael Brown, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4419 to look at the new patch set (#8). Change subject: IMPALA-4101: qgen: Hive join predicates should only contains equality functions .. IMPALA-4101: qgen: Hive join predicates should only contains equality functions Background: Hive only supports equi-joins in its JOIN clause, while Postgres and Impala support more complex functions such as <, <=, >, >=, etc. This change modifies the QueryGenerator._create_relational_join_condition and QueryGenerator._create_boolean_func_tree methods to only construct equality join conditions under certain conditions. The _create_boolean_func_tree method is invoked via QueryGenerator -> create_query -> _create_from_clause -> _create_join_clause -> _create_relational_join_condition -> _create_boolean_func_tree. This method is invoked when constructing the JOIN, WHERE, and HAVING clauses. It creates a tree of functions that would typically be found in any of these clauses. Changes: The parameter "signatures" is added to the method _create_boolean_func_tree, and it lists out all the allowed signatures the function is allowed to use. Previously, this list of signatures was populated by calling _funcs_to_allowed_signatures(FUNCS), and if "signatures" is not specified, then the code defaults back to the results of that method. A new method in the DefaultProfile called get_allowed_join_signatures is introduced and returns a list of function signatures that are allowed within a JOIN clause. The DefaultProfile allows all given signatures, while the HiveProfile only allows for the Equals and And functions, as well as any function that operates over only one column. The reason for these restrictions is that Hive only allows equality joins, does not allow OR operators in the join clause, and has some restrictions on functions that operate over multiple different tables. This last restriction is somewhat subtle; if one side of the equals operator contains a function that operates over two different tables, the other side of the operator cannot contain either of those tables. While it is possible to have functions that take in multiple input parameters, the inputs must be taken from specific tables to prevent Hive from throwing a compile time exception. Adding support for this in qgen code will require significant effort and modification to some core methods (_create_relational_join_condition and _populate_func_with_vals), so it's best to disable these for Hive altogether. Note that the _create_boolean_func_tree still allows for OR operators due to some logic around its "and_or_fill_ratio" variable. The plan is to fix this in a future patch that specifically focuses on removing OR operators from Hive JOIN clauses. Minor change to discrepancy_searcher so that the logs print out "Hive" instead of "Impala" when running against Hive. Testing: * Added a new unit test that ensures the HiveProfile only returns equality joins * Unit tests pass * Tested against Hive locally * Tested against Impala via Leopard * Tested against Impala via the Discrepancy Checker Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459 --- M tests/comparison/discrepancy_searcher.py M tests/comparison/query_generator.py M tests/comparison/query_profile.py A tests/comparison/tests/hive/test_hive_create_relational_join_condition.py 4 files changed, 140 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/4419/8 -- To view, visit http://gerrit.cloudera.org:8080/4419 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: stak...@cloudera.com Gerrit-Reviewer: David KnuppGerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: stak...@cloudera.com