Alex Behm has posted comments on this change.

Change subject: IMPALA-4014: HEADERS ONLY: Introduce query-wide execution state.

Patch Set 1:


Mostly clarifying questions. Sorry if you and Henry have already reached 
consensus on some of these.
File be/src/runtime/fragment-instance-state.h:

Line 27: /// Tear-down happens automatically in the d'tor and frees all memory allocated for
Didn't we want to move away from implicit destruction? Why is it ok to rely on 
that here?

Line 31: /// ReportProfile() is invoked periodically to report the execution status. The
There's no ReportProfile() in here. Wouldn't reporting be driven by the 
QueryState, which would periodically collect and aggregate the profile() from 
all fragment instance states?

Line 52:   /// It is an error to delete a FragmentInstanceState before Open()/GetNext() (depending
What if Prepare() fails?

Line 94:   Status Open();
Given my comment below, should this be Exec() or similar?

Line 100:   Status GetNext(RowBatch** batch);
What's this call for? Don't all fragment instances end in a sink? I know that 
today the coord fragment is an exception, but that's going to change.

Line 120:   void ReleaseThreadToken();
When is this token acquired exactly?
File be/src/runtime/query-state.h:

Line 24: /// Central class for all backend execution state created for a particular query,
This first paragraph could use some cleanup; e.g., the grammar of the first 
sentence is weird, and "such as" seems misplaced. There also seems to be some 
redundancy with the FragmentInstanceState explanations.

Line 30: /// The lifetime of an instance of this class is dictated by a reference count.
This sounds much like a shared_ptr, i.e., implicit lifetime management. What is 
the motivation for this design? It does not seem to prevent certain interesting 
races, e.g., RPCs wanting to access the query state after other threads are 
done with it (after it has transitioned to a 0 ref count).

Incoming RPCs for data exchanges seem especially interesting, e.g. when there 
is a limit somewhere downstream that could cause cancellation.

Line 41:   class Guard {
ScopedQueryState? That makes it clear that the access is scoped, and we already 
have similarly named classes such as ScopedSessionState.

Line 66:   /// Illegal to call any other function of this class after this call 
When is it legal to call this with respect to the reference count? Will this 
call block until the reference count has reached 0?

Line 76:   Status PublishFilter(const TUniqueId& dst_finstance_id, int32_t 
Isn't a filter really published to a plan node contained in all relevant 
fragment instances? In other words, why not have:
PublishFilter(TPlanNodeId dst_plan_node_id, int32_t filter_id, const TBloomFilter& thrift_bloom_filter);

That way we'll only have to deserialize the thrift_bloom_filter once, and we 
can shield callers from knowing how many target fragment instances there are.

Another thought: aren't runtime filters really shared among all relevant 
fragment instances? It might make more sense to have the RuntimeFilterBanks in 
here instead of in the FragmentInstanceStates. Are we worried about exploding 
the memory consumption of runtime filters by the DOP?

Line 78: };
We'll deal with local filter aggregation separately, correct?


Gerrit-MessageType: comment
Gerrit-Change-Id: I962ae6b7cb7dc0d07fbb8f70317aeb01d88d400b
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Henry Robinson <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Matthew Jacobs <>
Gerrit-HasComments: Yes
