[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-11-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..

IMPALA-4063: Merge report of query fragment instances per executor

Previously, each fragment instance executing on an executor will
independently report its status to the coordinator periodically.
This creates a huge amount of RPCs to the coordinator under highly
concurrent workloads, causing lock contention in the coordinator's
backend states when multiple fragment instances send them at the
same time. In addition, due to the lack of coordination between query
fragment instances, a query may end without collecting the profiles
from all fragment instances when one of them hits an error before
another fragment instance manages to finish Prepare(), leading to
missing profiles for certain fragment instances.

This change fixes the problem above by making a thread per QueryState
(started by QueryExecMgr) to be responsible for periodically reporting
the status and profiles of all fragment instances of a query running
on a backend. As part of this refactoring, each query fragment instance
will not report their errors individually. Instead, there is a cumulative
status maintained per QueryState. It's set to the error status of the first
fragment instance which hits an error or any general error (e.g. failure
to start a thread) when starting fragment instances. With this change,
the status reporting threads are also removed.

Testing done: exhaustive tests

This patch is based on a patch by Sailesh Mukil

Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Reviewed-on: http://gerrit.cloudera.org:8080/11615
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/control-service.cc
M be/src/service/control-service.h
M common/protobuf/control_service.proto
M common/thrift/RuntimeProfile.thrift
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
A testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test
D 
testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
M tests/query_test/test_observability.py
M tests/query_test/test_udfs.py
22 files changed, 423 insertions(+), 475 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 11
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-11-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 10: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 10
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 06 Nov 2018 01:01:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-11-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3426/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 10
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 05 Nov 2018 21:09:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-11-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 10: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 10
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 05 Nov 2018 21:09:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-11-05 Thread Thomas Marshall (Code Review)
Thomas Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 9: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 05 Nov 2018 19:43:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-11-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1278/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 05 Nov 2018 19:39:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-11-05 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11615/8/testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test
File 
testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/11615/8/testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test@26
PS8, Line 26: #A fragment instance's last status report may be sent before 
calling Close() which is
> This in fact may be racy as the final profile may be sent before Close() is
So, workaround this flakiness in test by changing this to row_regex:.* for now.



--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 05 Nov 2018 18:57:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-11-05 Thread Michael Ho (Code Review)
Hello Thomas Marshall, Tim Armstrong, Joe McDonnell, Bikramjeet Vig, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11615

to look at the new patch set (#9).

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..

IMPALA-4063: Merge report of query fragment instances per executor

Previously, each fragment instance executing on an executor will
independently report its status to the coordinator periodically.
This creates a huge amount of RPCs to the coordinator under highly
concurrent workloads, causing lock contention in the coordinator's
backend states when multiple fragment instances send them at the
same time. In addition, due to the lack of coordination between query
fragment instances, a query may end without collecting the profiles
from all fragment instances when one of them hits an error before
another fragment instance manages to finish Prepare(), leading to
missing profiles for certain fragment instances.

This change fixes the problem above by making a thread per QueryState
(started by QueryExecMgr) to be responsible for periodically reporting
the status and profiles of all fragment instances of a query running
on a backend. As part of this refactoring, each query fragment instance
will not report their errors individually. Instead, there is a cumulative
status maintained per QueryState. It's set to the error status of the first
fragment instance which hits an error or any general error (e.g. failure
to start a thread) when starting fragment instances. With this change,
the status reporting threads are also removed.

Testing done: exhaustive tests

This patch is based on a patch by Sailesh Mukil

Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/control-service.cc
M be/src/service/control-service.h
M common/protobuf/control_service.proto
M common/thrift/RuntimeProfile.thrift
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
A testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test
D 
testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
M tests/query_test/test_observability.py
M tests/query_test/test_udfs.py
22 files changed, 423 insertions(+), 475 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/11615/9
--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-11-01 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11615/8/testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test
File 
testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/11615/8/testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test@26
PS8, Line 26: UDF WARNING: Memory leaked via FunctionContext::Allocate(), 100 
bytes leaked via FunctionContext::TrackAl
This in fact may be racy as the final profile may be sent before Close() is 
called on a Fragment Instance. So, this is not 100% deterministic.



--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 8
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 02 Nov 2018 00:49:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-11-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3405/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 8
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 01 Nov 2018 21:15:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-11-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 8: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 8
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 01 Nov 2018 21:15:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-11-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1248/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 01 Nov 2018 17:35:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-11-01 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 7: Code-Review+2

Carry +2


--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 01 Nov 2018 16:56:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-11-01 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11615/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11615/5//COMMIT_MSG@14
PS5, Line 14: In addition, due to the lack of coordination between query
: fragment instances, a query may end without collecting the 
profiles
: from all fragment instances when one of them hits an error before
: another fragment instance manages to finish Prepare(), leading to
: missing profiles for certain fragment instances.
> Great. Could you also remove the xfail from test_profile_fragment_instances
Done


http://gerrit.cloudera.org:8080/#/c/11615/6/tests/query_test/test_udfs.py
File tests/query_test/test_udfs.py:

http://gerrit.cloudera.org:8080/#/c/11615/6/tests/query_test/test_udfs.py@290
PS6, Line 290:   self.run_test_case('QueryTest/udf-codegen-required', 
vector, use_db=unique_database)
> This comment is outdated, and below
Done



--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 01 Nov 2018 16:55:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-11-01 Thread Michael Ho (Code Review)
Hello Thomas Marshall, Tim Armstrong, Joe McDonnell, Bikramjeet Vig, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11615

to look at the new patch set (#7).

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..

IMPALA-4063: Merge report of query fragment instances per executor

Previously, each fragment instance executing on an executor will
independently report its status to the coordinator periodically.
This creates a huge amount of RPCs to the coordinator under highly
concurrent workloads, causing lock contention in the coordinator's
backend states when multiple fragment instances send them at the
same time. In addition, due to the lack of coordination between query
fragment instances, a query may end without collecting the profiles
from all fragment instances when one of them hits an error before
another fragment instance manages to finish Prepare(), leading to
missing profiles for certain fragment instances.

This change fixes the problem above by making a thread per QueryState
(started by QueryExecMgr) to be responsible for periodically reporting
the status and profiles of all fragment instances of a query running
on a backend. As part of this refactoring, each query fragment instance
will not report their errors individually. Instead, there is a cumulative
status maintained per QueryState. It's set to the error status of the first
fragment instance which hits an error or any general error (e.g. failure
to start a thread) when starting fragment instances. With this change,
the status reporting threads are also removed.

Testing done: exhaustive tests

This patch is based on a patch by Sailesh Mukil

Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/control-service.cc
M be/src/service/control-service.h
M common/protobuf/control_service.proto
M common/thrift/RuntimeProfile.thrift
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
A testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test
D 
testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
M tests/query_test/test_observability.py
M tests/query_test/test_udfs.py
22 files changed, 420 insertions(+), 475 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/11615/7
--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-25 Thread Thomas Marshall (Code Review)
Thomas Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 6: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11615/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11615/5//COMMIT_MSG@14
PS5, Line 14: In addition, due to the lack of coordination between query
: fragment instances, a query may end without collecting the 
profiles
: from all fragment instances when one of them hits an error before
: another fragment instance manages to finish Prepare(), leading to
: missing profiles for certain fragment instances.
> Yes. Uncommented some of the tests in bloom_filters.test.
Great. Could you also remove the xfail from test_profile_fragment_instances


http://gerrit.cloudera.org:8080/#/c/11615/6/tests/query_test/test_udfs.py
File tests/query_test/test_udfs.py:

http://gerrit.cloudera.org:8080/#/c/11615/6/tests/query_test/test_udfs.py@290
PS6, Line 290: # Some tests assume determinism or non-determinism, which 
depends on expr rewrites.
This comment is outdated, and below



--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Oct 2018 22:50:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1150/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Oct 2018 02:53:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-24 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 6: Code-Review+1

Carry +1


--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Oct 2018 02:18:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-24 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11615/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11615/5//COMMIT_MSG@14
PS5, Line 14: coordinator backend states. In addition, due to the lack
: of coordinator between query fragment instances, a query may end
: without collecting the profiles from all fragment instances when
: one of them hits an error before some other fragment instance 
manages
: to finish calling Prepare(), leading to missing
> Does this mean that we've fixed the problem in IMPALA-7148 and can re-enabl
Yes. Uncommented some of the tests in bloom_filters.test.


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.cc@236
PS1, Line 236: if (overall_status_.IsCancelled()) {
> Right, I was thinking that CANCELLED_INTERNALLY shouldn't leak out of fragm
Keep it as-is for now.


http://gerrit.cloudera.org:8080/#/c/11615/5/testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test
File 
testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test:

http://gerrit.cloudera.org:8080/#/c/11615/5/testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test@18
PS5, Line 18:  QUERY
> This test doesn't really have anything to do with nondeterminism. Would you
Done



--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Oct 2018 02:13:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-24 Thread Michael Ho (Code Review)
Hello Thomas Marshall, Tim Armstrong, Joe McDonnell, Bikramjeet Vig, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11615

to look at the new patch set (#6).

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..

IMPALA-4063: Merge report of query fragment instances per executor

Previously, each fragment instance executing on an executor will
independently report its status to the coordinator periodically.
This creates a huge amount of RPCs to the coordinator under highly
concurrent workloads, causing lock contention in the coordinator's
backend states when multiple fragment instances send them at the
same time. In addition, due to the lack of coordination between query
fragment instances, a query may end without collecting the profiles
from all fragment instances when one of them hits an error before
another fragment instance manages to finish Prepare(), leading to
missing profiles for certain fragment instances.

This change fixes the problem above by making a thread per QueryState
(started by QueryExecMgr) to be responsible for periodically reporting
the status and profiles of all fragment instances of a query running
on a backend. As part of this refactoring, each query fragment instance
will not report their errors individually. Instead, there is a cumulative
status maintained per QueryState. It's set to the error status of the first
fragment instance which hits an error or any general error (e.g. failure
to start a thread) when starting fragment instances. With this change,
the status reporting threads are also removed.

Testing done: exhaustive tests

This patch is based on a patch by Sailesh Mukil

Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/control-service.cc
M be/src/service/control-service.h
M common/protobuf/control_service.proto
M common/thrift/RuntimeProfile.thrift
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
A testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test
D 
testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
M tests/query_test/test_udfs.py
21 files changed, 419 insertions(+), 473 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/11615/6
--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.cc@236
PS1, Line 236: if (overall_status_.IsCancelled()) {
> Sorry, I don't quite get this comment. In either cases, won't the state sti
Right, I was thinking that CANCELLED_INTERNALLY shouldn't leak out of fragment 
instances so should really result in an ERROR status rather than CANCELLED. Ok 
not to change this now since it's preserving the current behaviour.



--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 22 Oct 2018 18:32:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-18 Thread Thomas Marshall (Code Review)
Thomas Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11615/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11615/5//COMMIT_MSG@14
PS5, Line 14: In addition, due to the lack of coordination between query
: fragment instances, a query may end without collecting the 
profiles
: from all fragment instances when one of them hits an error before
: another fragment instance manages to finish Prepare(), leading to
: missing profiles for certain fragment instances.
Does this mean that we've fixed the problem in IMPALA-7148 and can re-enable 
those tests?

In particular, are we now guaranteed that in a query that completes 
successfully but has some fragments cancelled internally (eg. because of a 
LIMIT) the coordinator will receive and apply at least one profile from every 
fragment instance?


http://gerrit.cloudera.org:8080/#/c/11615/5/testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test
File 
testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test:

http://gerrit.cloudera.org:8080/#/c/11615/5/testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test@18
PS5, Line 18:  QUERY
This test doesn't really have anything to do with nondeterminism. Would you 
mind renaming this file to something like udf-no-expr-rewrites?



--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 18 Oct 2018 22:56:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-16 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11615/5/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/5/be/src/runtime/query-state.cc@495
PS5, Line 495: error:
 :   // This point is reached if there were general errors to start 
query fragment instances.
 :   // Wait for all running fragment instances to finish preparing 
and report status to the
 :   // coordinator to start query cancellation.
> Since we are changing some error handling for thread creation, a way to tes
Thanks for pointing that out. We already have a test to exercise the thread 
creation failure here 
(https://github.com/apache/impala/blob/master/tests/failure/test_failpoints.py#L159-L173)
 but I think it's also a good idea to exercise the paths stress option you 
mentioned. Thanks for the suggestion.



--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 17 Oct 2018 00:31:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-16 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11615/5/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/5/be/src/runtime/query-state.cc@495
PS5, Line 495: error:
 :   // This point is reached if there were general errors to start 
query fragment instances.
 :   // Wait for all running fragment instances to finish preparing 
and report status to the
 :   // coordinator to start query cancellation.
Since we are changing some error handling for thread creation, a way to test 
that is to run some tests with thread_creation_fault_injection=true and verify 
that nothing hangs/crashes.



--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 16 Oct 2018 22:26:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-16 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 5: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 16 Oct 2018 21:34:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-16 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 5:

(3 comments)

as a followup to this jira, do we have any task that will look at how this will 
impact scaling up cluster sizes?

http://gerrit.cloudera.org:8080/#/c/11615/5/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/5/be/src/runtime/coordinator-backend-state.cc@283
PS5, Line 283:
nit: can you briefly mention that every instance's instanceExecStatus in 
backend_exec_status.instance_exec_status() and its thrift profile tree in the 
'TRuntimeProfileForest' are at the same index. A reader will not have to trace 
it back to QueryState::ConstructReport to make sure its true.

On that note, does it make sense to add the instance id to the 
TRuntimeProfileTree, so that we can add a dcheck here and catch any bug that 
might be caused due to any future changes to that code? Or will that be an 
overkill?


http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/fragment-instance-state.cc@444
PS3, Line 444:   if (fragment_ctx_.fragment.output_sink.type != 
TDataSinkType::PLAN_ROOT_SINK) {
 : // if we haven't already release this thread token in 
Prepare(), release it now
 : ReleaseThreadToken();
 :   }
> This still seems like a conceptual step in the fragment life cycle so it ma
wfm


http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/query-state.cc@397
PS3, Line 397: void QueryState::ErrorDuringExecute(const Status& status, const 
TUniqueId& finst_id) {
> It's okay to skip the racy call for now until evidence suggests otherwise.
wfm



--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 16 Oct 2018 21:34:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1052/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 16 Oct 2018 01:24:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-15 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/fragment-instance-state.cc@444
PS3, Line 444:   if (fragment_ctx_.fragment.output_sink.type != 
TDataSinkType::PLAN_ROOT_SINK) {
 : // if we haven't already release this thread token in 
Prepare(), release it now
 : ReleaseThreadToken();
 :   }
> this is called at only one site and now does only one thing, should we just
This still seems like a conceptual step in the fragment life cycle so it makes 
sense to encapsulate it as a function.


http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/query-exec-mgr.h
File be/src/runtime/query-exec-mgr.h:

http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/query-exec-mgr.h@78
PS3, Line 78: an instances hit
:   /// an error or
> nit: unless an instance hits an error they are cancelled
Done


http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/query-state.cc@397
PS3, Line 397: void QueryState::ErrorDuringExecute(const Status& status, const 
TUniqueId& finst_id) {
> previous patch had a racy check on the status, was there any benefit that e
It's okay to skip the racy call for now until evidence suggests otherwise. It's 
not a big deal if multiple fragment threads have to block when setting the 
error status. We don't expect this to be a common case so it's better to keep 
the code simpler and less error prone.


http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/query-state.cc@509
PS3, Line 509:   }
> nit: retain this comment here from previously at L506:
Done



--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 16 Oct 2018 00:47:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-15 Thread Michael Ho (Code Review)
Hello Thomas Marshall, Tim Armstrong, Joe McDonnell, Bikramjeet Vig, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11615

to look at the new patch set (#5).

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..

IMPALA-4063: Merge report of query fragment instances per executor

Previously, each fragment instance executing on an executor will
independently report its status to the coordinator periodically.
This creates a huge amount of RPCs to the coordinator under highly
concurrent workloads, causing lock contention in the coordinator's
backend states when multiple fragment instances send them at the
same time. In addition, due to the lack of coordination between query
fragment instances, a query may end without collecting the profiles
from all fragment instances when one of them hits an error before
another fragment instance manages to finish Prepare(), leading to
missing profiles for certain fragment instances.

This change fixes the problem above by making a thread per QueryState
(started by QueryExecMgr) to be responsible for periodically reporting
the status and profiles of all fragment instances of a query running
on a backend. As part of this refactoring, each query fragment instance
will not report their errors individually. Instead, there is a cumulative
status maintained per QueryState. It's set to the error status of the first
fragment instance which hits an error or any general error (e.g. failure
to start a thread) when starting fragment instances. With this change,
the status reporting threads are also removed.

Testing done: exhaustive tests

This patch is based on a patch by Sailesh Mukil

Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/control-service.cc
M be/src/service/control-service.h
M common/protobuf/control_service.proto
M common/thrift/RuntimeProfile.thrift
M 
testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
18 files changed, 389 insertions(+), 436 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/11615/5
--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-13 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/11615/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11615/3//COMMIT_MSG@13
PS3, Line 13: backend states as each fragment instance tries to them at the same
nit:send


http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/fragment-instance-state.cc@444
PS3, Line 444:   if (fragment_ctx_.fragment.output_sink.type != 
TDataSinkType::PLAN_ROOT_SINK) {
 : // if we haven't already release this thread token in 
Prepare(), release it now
 : ReleaseThreadToken();
 :   }
this is called at only one site and now does only one thing, should we just get 
rid of this method ?


http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/query-exec-mgr.h
File be/src/runtime/query-exec-mgr.h:

http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/query-exec-mgr.h@78
PS3, Line 78: some instances hit
:   /// some errors
nit: unless an instance hits an error they are cancelled


http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/query-state.cc@397
PS3, Line 397: void QueryState::ErrorDuringExecute(const Status& status, const 
TUniqueId& finst_id) {
previous patch had a racy check on the status, was there any benefit that 
earlier?
If this also called by every fragment when they are cancelled, then it might be 
worth retaining the racy call


http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/query-state.cc@509
PS3, Line 509:   }
nit: retain this comment here from previously at L506:
 // Block until all the already started fragment instances finish Prepare()-ing 
to
// to transition to the next state and finally report an error.



--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sun, 14 Oct 2018 03:22:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-13 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.cc@243
PS2, Line 243: backend_exec_state_ = cur_state == BackendExecState::PREPARING ?
 :   BackendExecState::EXECUTING : 
BackendExecState::FINISHED;
> Sorry, to be clearer - I was suggesting leaving the logic around 'overall_s
There is one already at line 523.



--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sun, 14 Oct 2018 02:21:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-13 Thread Michael Ho (Code Review)
Hello Thomas Marshall, Tim Armstrong, Joe McDonnell, Bikramjeet Vig, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11615

to look at the new patch set (#4).

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..

IMPALA-4063: Merge report of query fragment instances per executor

Previously, each fragment instance executing on an executor will
independently report its status to the coordinator periodically.
This creates a huge amount of RPCs to the coordinator under highly
concurrent workloads, causing lock contention in the coordinator's
backend states as each fragment instance tries to them at the same
time. In addition, due to the lack of coordination between query
fragment instances, a query may end without collecting the profiles
from all fragment instances when one of them hits an error before
another fragment instance manages to finish calling Prepare(), leading
to missing profiles for certain fragment instances.

This change fixes the problem above by making a thread per QueryState
(started by QueryExecMgr) to be responsible for periodically reporting
the status and profiles of all fragment instances of a query running
on a backend. As part of this refactoring, each query fragment instance
will not report their errors individually. Instead, there is a cumulative
status maintained per QueryState. It's set to the error status of the first
fragment instance which hits an error or any general error (e.g. failure
to start a thread) when starting fragment instances. With this change,
the status reporting threads are also removed.

Testing done: exhaustive tests

This patch is based on a patch by Sailesh Mukil

Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/control-service.cc
M be/src/service/control-service.h
M common/protobuf/control_service.proto
M common/thrift/RuntimeProfile.thrift
M 
testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
18 files changed, 387 insertions(+), 436 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/11615/4
--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-10 Thread Thomas Marshall (Code Review)
Thomas Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.cc@243
PS2, Line 243: backend_exec_state_ = cur_state == BackendExecState::PREPARING ?
 :   BackendExecState::EXECUTING : 
BackendExecState::FINISHED;
> The state transition logic is determined by 'overall_status_'.  So, making
Sorry, to be clearer - I was suggesting leaving the logic around 
'overall_state_' here and just changing the contents of this 'else' branch.

My thinking was that for a particular call site of UpdateBackendExecState() we 
would know what state we expect to transition to if overall_state_ is OK, eg. 
the call on line 518 would just take EXECUTING as a param, which would help 
ensure we don't accidentally call UpdateBackendExecState() twice for the 
PREPARE->EXECUTING transition. Looking more closely, though, I see that this 
doesn't make sense for the call at line 510.

I think its fine as is, no need to significantly rewrite this. Maybe just add a 
DCHECK at line 520 that the state is EXECUTING.



--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 10 Oct 2018 22:51:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1017/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 10 Oct 2018 22:32:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-10 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.h@185
PS2, Line 185: status reports periodically
 :   /// to
> nit: status reports periodically to the
Done


http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.h@428
PS2, Line 428: Should only be called when
 :   /// the current state is a non-terminal state. T
> A little confusing - its required to currently be in a non-terminal state w
Done


http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.cc@227
PS2, Line 227: {
> any reason to do this here, when its only used within the lock below?
Done


http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.cc@243
PS2, Line 243: backend_exec_state_ = cur_state == BackendExecState::PREPARING ?
 :   BackendExecState::EXECUTING : 
BackendExecState::FINISHED;
> This feels a little bit weird - would it work to make this more explicit by
The state transition logic is determined by 'overall_status_'.  So, making 
UpdateBackendExecState() take a param of the next state means moving the logic 
encoded in the if-stmt above elsewhere and UpdateBackendExecState() simply 
becomes a setter of backend_exec_state_ with some DCHECKs(). I can see how it 
may be slightly clearer for the transition from PREPARING state to the next 
legal state but it may make the code slightly more hairy at the multiple call 
sites.

Will it be clearer if we update the DCHECK at line 231 to make it more explicit 
which valid states we could be at so it's easier to reason about the state 
transition logic ?


http://gerrit.cloudera.org:8080/#/c/11615/2/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/11615/2/common/protobuf/control_service.proto@150
PS2, Line 150: // The fragment instance id of the first failed fragment 
instance.
> Maybe worth making it more explicit that the reason we chose the 'first' he
It's not set for "general" error. Comments clarified to update its relationship 
with "overall_status".



--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 10 Oct 2018 21:57:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-10 Thread Thomas Marshall (Code Review)
Thomas Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.h@185
PS2, Line 185: status report periodically
 :   /// the
nit: status reports periodically to the


http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.h@428
PS2, Line 428: A state transition happens
 :   /// if the current state is a non-terminal state
A little confusing - its required to currently be in a non-terminal state when 
this is called.


http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.cc@227
PS2, Line 227: BackendExecState old_state = backend_exec_state_;
any reason to do this here, when its only used within the lock below?


http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.cc@243
PS2, Line 243: backend_exec_state_ = old_state == BackendExecState::PREPARING ?
 :   BackendExecState::EXECUTING : 
BackendExecState::FINISHED;
This feels a little bit weird - would it work to make this more explicit by 
having UpdateBackendExecState take a param of the state to transition to, and 
then DCHECK-ing that the param is the next state in the lifecycle?


http://gerrit.cloudera.org:8080/#/c/11615/2/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/11615/2/common/protobuf/control_service.proto@150
PS2, Line 150: // The fragment instance id of the first failed fragment 
instance.
Maybe worth making it more explicit that the reason we chose the 'first' here 
is that it corresponds to the value of 'overall_status'

Also, what's the expected value here if 'overall_status' is for a 
non-fragment-specific "general" error?



--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 10 Oct 2018 21:04:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-10 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11615/2/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/11615/2/common/protobuf/control_service.proto@113
PS2, Line 113: int32
int64. Will update in PS3.



--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 10 Oct 2018 18:21:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1007/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 10 Oct 2018 08:54:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-10 Thread Michael Ho (Code Review)
Hello Thomas Marshall, Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11615

to look at the new patch set (#2).

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..

IMPALA-4063: Merge report of query fragment instances per executor

Previously, each fragment instance executing on an executor will
independently report its status to the coordinator periodically.
This creates a huge amount of RPCs to the coordinator under highly
concurrent workloads, causing lock contention in the coordinator's
backend states as each fragment instance tries to them at the same
time. In addition, due to the lack of coordination between query
fragment instances, a query may end without collecting the profiles
from all fragment instances when one of them hits an error before
another fragment instance manages to finish calling Prepare(), leading
to missing profiles for certain fragment instances.

This change fixes the problem above by making a thread per QueryState
(started by QueryExecMgr) to be responsible for periodically reporting
the status and profiles of all fragment instances of a query running
on a backend. As part of this refactoring, each query fragment instance
will not report their errors individually. Instead, there is a cumulative
status maintained per QueryState. It's set to the error status of the first
fragment instance which hits an error or any general error (e.g. failure
to start a thread) when starting fragment instances. With this change,
the status reporting threads are also removed.

Testing done: exhaustive tests

This patch is based on a patch by Sailesh Mukil

Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/control-service.cc
M common/protobuf/control_service.proto
M common/thrift/RuntimeProfile.thrift
M 
testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
17 files changed, 375 insertions(+), 426 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/11615/2
--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-10 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 1:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.h
File be/src/runtime/fragment-instance-state.h:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.h@93
PS1, Line 93:   /// Called periodically to get the current status of this 
fragment instance.
> Since we're describing the usage pattern, might as well mention which threa
Done


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.h@159
PS1, Line 159:   bool final_report_sent_ = false;
> What's the thread safety story here?
Added some comments. This is exclusively accessed by the query state thread 
only.


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.cc@97
PS1, Line 97: intance
> instance
Done


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.cc@239
PS1, Line 239:
> nit: unnecessary blank line
Done


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.cc@243
PS1, Line 243: void 
FragmentInstanceState::GetStatusReport(FragmentInstanceExecStatusPB* 
instance_status,
> nit: could probably reduce vertical whitespace in this function.
Done


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-exec-mgr.cc
File be/src/runtime/query-exec-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-exec-mgr.cc@128
PS1, Line 128: void QueryExecMgr::StartQueryHelper(QueryState* qs) {
> I feel like we should rename this function since it now runs until the quer
Done


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h@353
PS1, Line 353:   /// 'overall_status_' will be set once this is unblocked and 
so will 'failed_instance_id_'
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h@418
PS1, Line 418:   bool IsStatusSet() const {
> Maybe something like "HasErrorStatus()". It's confusing that setting overal
Done


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h@419
PS1, Line 419: return !overall_status_.ok() && 
!overall_status_.IsCancelled();
> This is called without holding status_lock_, right? I don't think IsCancell
Good point. I think the optimization doesn't really matter that much given it's 
only exercised in the error path. It's okay to have a minor bit of lock 
contention if multiple fragment instances hit errors at the same time.


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.cc@a553
PS1, Line 553:
 :
The deletion of this line warrants some explanation:

In the old code, if there is any error returned from fis->Exec(), the final 
report for "fis" would have been sent already so the coordinator will already 
record the failure of "fis" before the cancellation is issued.

In the new code, the reporting is done periodically by the query state thread 
so the final report may or may not be sent at this point after "fis" hit an 
error above. Calling Cancel() here may actually cause the coordinator fragment 
(if there is one) to prematurely set the backend status at the coordinator, 
causing Coordinator::BackendState::ApplyExecStatusReport() to ignore subsequent 
updates from "fis".  As a result, the actual error status was masked.

The implicit contract in the code (even before this change) is that the final 
report of the first fragment instance to hit an (non-cancellation) error must 
have been sent before initiating the cancellation. Will add a comment here.


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.cc@236
PS1, Line 236: if (overall_status_.IsCancelled()) {
> Can we refine this with the internal/client-driven cancellation distinction
Sorry, I don't quite get this comment. In either cases, won't the state still 
transit to BackendExecState::CANCELLED ?



--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-08 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/coordinator-backend-state.cc@277
PS1, Line 277:   lock_guard l1(exec_summary->lock);
> Isn't this just reverting the change that the previous patch made?
Yes. Will revert that change in the previous patch and add a comment instead.



--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 09 Oct 2018 01:21:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 1:

(11 comments)

Had a few comments. Overall direction of the patch makes sense to me

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/coordinator-backend-state.cc@277
PS1, Line 277:   lock_guard l1(exec_summary->lock);
Isn't this just reverting the change that the previous patch made?


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.h
File be/src/runtime/fragment-instance-state.h:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.h@93
PS1, Line 93:   /// Called periodically to get the current status of this 
fragment instance.
Since we're describing the usage pattern, might as well mention which thread is 
calling it.


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.h@159
PS1, Line 159:   bool final_report_sent_ = false;
What's the thread safety story here?


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.cc@97
PS1, Line 97: intance
instance


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.cc@239
PS1, Line 239:
nit: unnecessary blank line


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.cc@243
PS1, Line 243: void 
FragmentInstanceState::GetStatusReport(FragmentInstanceExecStatusPB* 
instance_status,
nit: could probably reduce vertical whitespace in this function.


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-exec-mgr.cc
File be/src/runtime/query-exec-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-exec-mgr.cc@128
PS1, Line 128: void QueryExecMgr::StartQueryHelper(QueryState* qs) {
I feel like we should rename this function since it now runs until the query 
finishes on this backend, e.g. ExecuteQueryHelper()


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h@91
PS1, Line 91: /// the final report (it will not be overridden by the resulting 
cancellation).
Is there anything short we can say here to clarify the relationship with the 
overall state on the coordinator?


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h@418
PS1, Line 418:   bool IsStatusSet() const {
Maybe something like "HasErrorStatus()". It's confusing that setting 
overall_status_ to cancelled counts as "not set".


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h@419
PS1, Line 419: return !overall_status_.ok() && 
!overall_status_.IsCancelled();
This is called without holding status_lock_, right? I don't think IsCancelled() 
is thread-safe? Since we could could overwrite msg_ with an error status from a 
different thread and free the previous msg_ object.

I don't really like this pattern of doing unlocked reads to Status objects 
given that the interface of status is not documented as thread-safe - I get 
that .ok() works since Status is implemented as a pointer and we're on x86 but 
I think we need to clarify the interfaces - feels like it wouldn't take much 
for someone to invalidate one of the assumptions made by accident.

I wonder if the optimisation of deferring acquisition of the lock really 
matters. If we think it's necessary, it seems like we should document which 
Status methods are threadsafe, and change the implementation to use atomics so 
that the memory ordering assumptions are explicit.


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.cc@236
PS1, Line 236: if (overall_status_.IsCancelled()) {
Can we refine this with the internal/client-driven cancellation distinction 
that was added here https://gerrit.cloudera.org/#/c/11464/



--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 08 Oct 2018 22:56:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-08 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11615/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11615/1//COMMIT_MSG@12
PS1, Line 12: conurrent
concurrent


http://gerrit.cloudera.org:8080/#/c/11615/1//COMMIT_MSG@15
PS1, Line 15: coordinator
coordination



--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 08 Oct 2018 22:24:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 1:

I won't get a chance to review this for the next two weeks but please don't let 
that hold things up.


--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 08 Oct 2018 20:53:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/981/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 08 Oct 2018 18:21:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h@353
PS1, Line 353:   /// 'overall_status_' will be set once this is unblocked and 
so will 'failed_instance_id_'
line too long (92 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 08 Oct 2018 17:46:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

2018-10-08 Thread Michael Ho (Code Review)
Michael Ho has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/11615


Change subject: IMPALA-4063: Merge report of query fragment instances per 
executor
..

IMPALA-4063: Merge report of query fragment instances per executor

Previously, each fragment instance executing on an executor will
independently report its status to the coordinator periodically.
This creates a huge amount of RPCs to the coordinator under highly
conurrent workloads, causing lock contention in the coordinator's
backend states as each fragment instance independently tries to
update the coordinator backend states. In addition, due to the lack
of coordinator between query fragment instances, a query may end
without collecting the profiles from all fragment instances when
one of them hits an error before some other fragment instance manages
to finish calling Prepare(), leading to missing profile for certain
fragment instances.

This change fixes the problem above by making a thread per QueryState
(started by QueryExecMgr) to be responsible for periodically reporting
the status and profiles of all fragment instances of a query running
on a backend. As part of this refactoring, each query fragment instance
will not report their errors individually. Instead, there is a cumulative
status maintained per QueryState. It's set to the error status of the first
fragment instance which hits an error or any general error (e.g. failure
to start a thread). With this change, the status reporting thread is
also removed.

Testing done: exhaustive tests

This patch is based on a patch by Sailesh Mukil

Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/control-service.cc
M common/protobuf/control_service.proto
M common/thrift/RuntimeProfile.thrift
M 
testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
16 files changed, 377 insertions(+), 420 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/11615/1
--
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho