Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3902: Scheduler improvements for running multiple 
fragment instances on a single backend
......................................................................


Patch Set 10:

(26 comments)

How much testing have you done? I'd think you should be able to run the 
existing tests in ST mode.

http://gerrit.cloudera.org:8080/#/c/4054/10/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS10, Line 294: boost
Is there any reason for the boost sets?


PS10, Line 750: instance_params
why doesn't this params need a cref, but the one in StartRemoteFragments() 
needed it?


PS10, Line 1971:  TODO-MT: remove
please mention this only happens for ST because in the MT case it is set up in 
MtInitExecSummary


Line 1975:     }
in the else case you can DCHECK that exec_summary.exec_stats is the right size.

It should be either 0 (ST case when it wasn't initialized lazily) or the total 
num_instances. (Unless I'm misunderstanding.)


PS10, Line 2083: // TODO: use FragmentInstanceState::error_log_ instead
After the coord is treated as a remote FragmentInstanceState?


PS10, Line 2253: // Make a PublishFilter rpc to 'impalad' for given 
fragment_instance_id
               : // and params.
               : // This takes by-value parameters because we cannot guarantee 
that the
               : // originating coordinator won't be destroyed while this 
executes.
nit wrap to 90chars


http://gerrit.cloudera.org:8080/#/c/4054/10/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

PS10, Line 259:   /// TODO: document lock ordering; it looks like it's:
              :   /// 1. FragmentInstanceState::lock_
              :   /// 2. lock_
this seems to contradict the wording in FragmentInstanceState::lock_

It also looks to me like lock_ gets taken first.


http://gerrit.cloudera.org:8080/#/c/4054/10/be/src/runtime/runtime-filter-bank.cc
File be/src/runtime/runtime-filter-bank.cc:

PS10, Line 84: QUERY
will this (and below) be noisy?


http://gerrit.cloudera.org:8080/#/c/4054/10/be/src/scheduling/query-schedule.h
File be/src/scheduling/query-schedule.h:

PS10, Line 90:  shared across fragment instances
for all fragment instances

otherwise it's weird for this to own a vector of FInstanceExecParams


Line 106: 
remove extra line


Line 153:   /// TODO-MT: get rid of special casing and always return the total
when coordinator is handled as a remote fragment?
If so, please specify


Line 157:   /// TODO-MT: remove
when coordinator is handled as a remote fragment?
If so, can you specify?


PS10, Line 143:   /// TODO-MT: remove; this is actually only the number of 
remote instances
              :   /// (from the coordinator's perspective)
              :   void set_num_fragment_instances(int64_t 
num_fragment_instances) {
              :     num_fragment_instances_ = num_fragment_instances;
              :   }
              : 
              :   /// Returns the number of fragment instances registered with 
this schedule.
              :   /// MT: total number of fragment instances
              :   /// ST: value set with set_num_fragment_instances(); excludes 
coord instance
              :   /// (in effect the number of remote instances)
              :   /// TODO-MT: get rid of special casing and always return the 
total
              :   int GetNumFragmentInstances() const;
              : 
              :   /// Returns the total number of fragment instances, incl. 
coordinator fragment.
              :   /// TODO-MT: remove
              :   int GetTotalFInstances() const;
              : 
              :   /// Returns the number of remote fragment instances (excludes 
coordinator).
              :   /// Works for both MT and ST.
              :   int GetNumRemoteFInstances() const;
This is all still pretty confusing.

My understanding is that after Henry's coordinator change, we can remove 2 of 
these, is that right? Is it this last one that we'll keep?


PS10, Line 172:   /// Map node ids to the index of their fragment in 
TQueryExecRequest.fragments.
              :   int32_t GetFragmentIdx(PlanNodeId id) const { return 
plan_node_to_fragment_idx_[id]; }
              : 
              :   /// Map node ids to the id of their containing fragment.
              :   FragmentId GetFragmentId(PlanNodeId id) const { return 
plan_node_to_fragment_id_[id]; }
this is very confusing. I think you can get rid of the first one if you change 
the code in SimpleScheduler::ComputeScanRangeAssignment to use 
GetContainingFragment().

Then I think you can also remove plan_node_to_fragment_idx_, I don't think 
it'll be necessary for anything else.


PS10, Line 187: 
              :   /// Map node ids to the index of the node inside their 
plan.nodes list.
              :   int32_t GetNodeIdx(PlanNodeId id) const { return 
plan_node_to_plan_node_idx_[id]; }
I think you can remove this- there's only 1 usage I can see in 
SimpleScheduler::ComputeScanRangeAssignment, and I think you can use GetNode() 
instead


PS10, Line 210:   const MtFragmentExecParams* GetCoordFragmentExecParams() 
const {
              :     const TPlanFragment& coord_fragment =  
request_.mt_plan_exec_info[0].fragments[0];
              :     if (coord_fragment.partition.type != 
TPartitionType::UNPARTITIONED) return nullptr;
              :     return &mt_fragment_exec_params_[coord_fragment.id];
              :   }
I only see this used by the fn below, and I only see the fn below used once by 
Coordinator. Can we keep 1 of these? E.g. either merge this into the fn below 
or keep this one and perform the logic in the fn below where it is needed in 
Coordinator.

There are a lot of fns in this class already, so we should consider simplifying 
by removing those that are only used once.


PS10, Line 244:   /// Maps from plan node id to its fragment index. Filled in 
c'tor.
              :   /// TODO-MT: remove
              :   std::vector<int32_t> plan_node_to_fragment_idx_;
I think we can get rid of this by changing ComputeScanRangeAssignment, and I 
think we should do it now. Too many vectors of ids/idxs.


http://gerrit.cloudera.org:8080/#/c/4054/10/be/src/scheduling/scheduler.h
File be/src/scheduling/scheduler.h:

PS10, Line 49:  whose execution is to be coordinated by coord
remove


PS10, Line 51:   /// If resource management is enabled, also reserves resources 
from the central
             :   /// resource manager (Yarn via Llama) to run the query in. 
This function blocks until
             :   /// the reservation request has been granted or denied.
whoops looks like we missed this in Henry's change. Can you change this to say 
something like:

The schedule is submitted to admission control before returning.


http://gerrit.cloudera.org:8080/#/c/4054/10/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

PS10, Line 900:  // TODO-MT: call AdmitQuery()
just to follow up on our conversation in person: 

I don't see a reason not to do this, it should at least work with the # of 
queries limit. While we plan on doing future in AC work to help MT (i.e. 
cores), maybe this is useful to limit the number of queries to 1. Were you 
thinking we would enable this only after we have support for # of cores in AC?


http://gerrit.cloudera.org:8080/#/c/4054/10/common/thrift/ExecStats.thrift
File common/thrift/ExecStats.thrift:

PS10, Line 69: 
             :   // One entry for each BE executing this plan node. True if 
this plan node is still
             :   // running.
             :   // TODO: is this used?
             :   8: optional list<bool> is_active
I don't see it used anywhere. I tested removing it and everything still 
compiles. Can you remove it?


http://gerrit.cloudera.org:8080/#/c/4054/10/common/thrift/Planner.thrift
File common/thrift/Planner.thrift:

PS10, Line 35:   // TODO: should this be called idx, to distinguish more 
clearly from a
             :   // globally unique id?
I'd prefer to call this an idx given that we have fragment_instance_id of type 
TUniqueId on TPlanFragmentInstanceCtx and TReportExecStatusParams. That alone 
might be OK, but it requires more thought than necessary to read the code (e.g. 
in query-schedule) where we use it to index into maps, sort on it, etc., and I 
have to remember it isn't the TUniqueId.


PS10, Line 37: TFragmentId
... and then maybe this would be better as an int? I'm not sure the typedef 
gets us anything.


http://gerrit.cloudera.org:8080/#/c/4054/10/fe/src/main/java/com/cloudera/impala/service/Frontend.java
File fe/src/main/java/com/cloudera/impala/service/Frontend.java:

PS10, Line 914:    * TODO-MT: if we need to apply the same updates to this 
function and
              :    * createExecRequest(), pull out the common functionality 
into a separate function
We've already accepted a split between MT & ST paths, so I think this 
conditional TODO can be removed. We know refactoring is possible in a number of 
places but we'll remove the ST code--maybe leave a TODO-MT to remove there.


PS10, Line 982:  
TODO-MT


PS10, Line 1029:     // create plan
remove


-- 
To view, visit http://gerrit.cloudera.org:8080/4054
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I240445199e22f009f4e72fdb8754eb8d77e3d680
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to