Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9282 )

Change subject: IMPALA-6116: Bound memory usage of DataStreamSevice's service 
queue
......................................................................


Patch Set 1:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/9282/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9282/1//COMMIT_MSG@20
PS1, Line 20: this flag is left at 0, in which case it will be set to 20% of 
process
> It seems too aggressive for small-to-medium deployments for sure :). It mig
Pushed it down to 5% for now. Until IMPALA-5859 is fixed, RPC retries may end 
up consuming quite a lot more network bandwidth.


http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/rpc/impala-service-pool.h
File be/src/rpc/impala-service-pool.h:

http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/rpc/impala-service-pool.h@43
PS1, Line 43: MemTracker* mem_tracker
> it would be good to document these parameters. Is this the mem_tracker that
Done


http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/rpc/impala-service-pool.cc
File be/src/rpc/impala-service-pool.cc:

http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/rpc/impala-service-pool.cc@150
PS1, Line 150: or
> , otherwise
Done


http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/rpc/impala-service-pool.cc@152
PS1, Line 152: std::
> nit: names.h should take care of that
Done


http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/rpc/rpc-mgr-test-base.h
File be/src/rpc/rpc-mgr-test-base.h:

http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/rpc/rpc-mgr-test-base.h@133
PS1, Line 133:   // Takes over ownership of the newly created 'service'.
> Can you explain why this method is needed? Could you just pass in a pointer
The RpcMgr no longer owns the rpc service. RpcMgr::Shutdown() will call rpc 
service's Shutdown() function so the lifetime of the service should be as long 
as the rpc_mgr_. That's why the ownership is transferred to RpcMgrTestBase 
class instead. Comments added.


http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/rpc/rpc-mgr.h
File be/src/rpc/rpc-mgr.h:

http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/rpc/rpc-mgr.h@127
PS1, Line 127: mem_tracker
> document. and this is the service's mem tracker right? if so, maybe service
Done


http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/runtime/exec-env.cc@304
PS1, Line 304:     RETURN_IF_ERROR(data_svc_->Init(rpc_mgr_.get()));
> The RpcMgr has a getter in the ExecEnv, too, so you could remove it from th
Done


http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/runtime/exec-env.cc@305
PS1, Line 305: data_svc_
> this one, too
This is a bit tricky as this is needed in data-stream-test.cc.


http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/runtime/krpc-data-stream-mgr.h
File be/src/runtime/krpc-data-stream-mgr.h:

http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/runtime/krpc-data-stream-mgr.h@234
PS1, Line 234: handed over to DataStreamService
> is that correct? Isn't incoming_request_tracker the DataStreamService's mem
DataStreamMgr / Exchange node to be more precise.


http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/runtime/krpc-data-stream-mgr.cc
File be/src/runtime/krpc-data-stream-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/runtime/krpc-data-stream-mgr.cc@83
PS1, Line 83:   mem_tracker_.reset(new MemTracker(-1, "Data Stream Manager 
Deferred RPCs",
It seems the common pattern is to have the MemTracker inside the owning object 
(e.g. exec node, DataStreamRecvr etc) so having it externally in ExecEnv seems 
to be an uncommon pattern. The lifetime of the MemTracker of DataStreamMgr 
shouldn't be longer than that of the DataStreamMgr.


http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/runtime/krpc-data-stream-mgr.cc@175
PS1, Line 175: Release
> I guess we were using the "Local" versions before since we were just transf
The "local" variant doesn't allow the tracker to have memory limit. There is a 
window but then both of these trackers have process_mem_tracker as their 
parents. I can change the order of Consume() and Release() to avoid 
under-counting if that's a big concern at all.


http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/service/data-stream-service.h
File be/src/service/data-stream-service.h:

http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/service/data-stream-service.h@58
PS1, Line 58:   MemTracker* mem_tracker() { return mem_tracker_.get(); }
> does that need to be public?
It's used in ExecEnv::Init().


http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/service/data-stream-service.cc
File be/src/service/data-stream-service.cc:

http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/service/data-stream-service.cc@37
PS1, Line 37: DEFINE_int64(datastream_service_queue_mem_limit, 0,
> It would be good to use ParseUtil::ParseMemSpec() for this for consistency
Good idea. Done.


http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/service/data-stream-service.cc@46
PS1, Line 46: DataStreamService::DataStreamService(RpcMgr* rpc_mgr)
> It seems odd to pass the rpc_mgr into both the ctor and Init().
Good point. We should be able to access it through the singleton ExecEnv.



--
To view, visit http://gerrit.cloudera.org:8080/9282
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idea4262dfb0e0aa8d58ff6ea6a8aaaa248e880b9
Gerrit-Change-Number: 9282
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Wed, 14 Feb 2018 08:24:51 +0000
Gerrit-HasComments: Yes

Reply via email to