Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/9282 )
Change subject: IMPALA-6116: Bound memory usage of DataStreamSevice's service queue ...................................................................... Patch Set 1: (14 comments) http://gerrit.cloudera.org:8080/#/c/9282/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9282/1//COMMIT_MSG@20 PS1, Line 20: this flag is left at 0, in which case it will be set to 20% of process > It seems too aggressive for small-to-medium deployments for sure :). It mig Pushed it down to 5% for now. Until IMPALA-5859 is fixed, RPC retries may end up consuming quite a lot more network bandwidth. http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/rpc/impala-service-pool.h File be/src/rpc/impala-service-pool.h: http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/rpc/impala-service-pool.h@43 PS1, Line 43: MemTracker* mem_tracker > it would be good to document these parameters. Is this the mem_tracker that Done http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/rpc/impala-service-pool.cc File be/src/rpc/impala-service-pool.cc: http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/rpc/impala-service-pool.cc@150 PS1, Line 150: or > , otherwise Done http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/rpc/impala-service-pool.cc@152 PS1, Line 152: std:: > nit: names.h should take care of that Done http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/rpc/rpc-mgr-test-base.h File be/src/rpc/rpc-mgr-test-base.h: http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/rpc/rpc-mgr-test-base.h@133 PS1, Line 133: // Takes over ownership of the newly created 'service'. > Can you explain why this method is needed? Could you just pass in a pointer The RpcMgr no longer owns the rpc service. RpcMgr::Shutdown() will call rpc service's Shutdown() function so the lifetime of the service should be as long as the rpc_mgr_. That's why the ownership is transferred to RpcMgrTestBase class instead. Comments added. http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/rpc/rpc-mgr.h File be/src/rpc/rpc-mgr.h: http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/rpc/rpc-mgr.h@127 PS1, Line 127: mem_tracker > document. and this is the service's mem tracker right? if so, maybe service Done http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/runtime/exec-env.cc@304 PS1, Line 304: RETURN_IF_ERROR(data_svc_->Init(rpc_mgr_.get())); > The RpcMgr has a getter in the ExecEnv, too, so you could remove it from th Done http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/runtime/exec-env.cc@305 PS1, Line 305: data_svc_ > this one, too This is a bit tricky as this is needed in data-stream-test.cc. http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/runtime/krpc-data-stream-mgr.h File be/src/runtime/krpc-data-stream-mgr.h: http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/runtime/krpc-data-stream-mgr.h@234 PS1, Line 234: handed over to DataStreamService > is that correct? Isn't incoming_request_tracker the DataStreamService's mem DataStreamMgr / Exchange node to be more precise. http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/runtime/krpc-data-stream-mgr.cc File be/src/runtime/krpc-data-stream-mgr.cc: http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/runtime/krpc-data-stream-mgr.cc@83 PS1, Line 83: mem_tracker_.reset(new MemTracker(-1, "Data Stream Manager Deferred RPCs", It seems the common pattern is to have the MemTracker inside the owning object (e.g. exec node, DataStreamRecvr etc) so having it externally in ExecEnv seems to be an uncommon pattern. The lifetime of the MemTracker of DataStreamMgr shouldn't be longer than that of the DataStreamMgr. http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/runtime/krpc-data-stream-mgr.cc@175 PS1, Line 175: Release > I guess we were using the "Local" versions before since we were just transf The "local" variant doesn't allow the tracker to have memory limit. There is a window but then both of these trackers have process_mem_tracker as their parents. I can change the order of Consume() and Release() to avoid under-counting if that's a big concern at all. http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/service/data-stream-service.h File be/src/service/data-stream-service.h: http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/service/data-stream-service.h@58 PS1, Line 58: MemTracker* mem_tracker() { return mem_tracker_.get(); } > does that need to be public? It's used in ExecEnv::Init(). http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/service/data-stream-service.cc File be/src/service/data-stream-service.cc: http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/service/data-stream-service.cc@37 PS1, Line 37: DEFINE_int64(datastream_service_queue_mem_limit, 0, > It would be good to use ParseUtil::ParseMemSpec() for this for consistency Good idea. Done. http://gerrit.cloudera.org:8080/#/c/9282/1/be/src/service/data-stream-service.cc@46 PS1, Line 46: DataStreamService::DataStreamService(RpcMgr* rpc_mgr) > It seems odd to pass the rpc_mgr into both the ctor and Init(). Good point. We should be able to access it through the singleton ExecEnv. -- To view, visit http://gerrit.cloudera.org:8080/9282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idea4262dfb0e0aa8d58ff6ea6a8aaaa248e880b9 Gerrit-Change-Number: 9282 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Wed, 14 Feb 2018 08:24:51 +0000 Gerrit-HasComments: Yes
