Re: Discuss about Drill's schedule policy

2017-08-27 Thread Paul Rogers
Hi Weijie, Thanks very much for the suggestions! It will take a while to digest all of this, as Drill’s existing scheduler (for fragments) is quite complex, but it works. I’ll need to map those concepts to Sparrow. We still don’t have query-level scheduling, but perhaps there is something that can

Re: Discuss about Drill's schedule policy

2017-08-27 Thread weijie tong
Maybe we need to adjust the execution phase of MajorFragments, since intermediate MajorFragments are now executed lazily. (If the intermediate fragments' tasks are allocated lazily, or not allocated at all because of resource restrictions, the running downstream fragments cannot send their data out.) We should let
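To make the failure mode above concrete, here is a minimal sketch that, given which MajorFragments have been allocated, lists the running fragments that are stuck because the fragment consuming their output was never allocated. The class name, the integer fragment ids, and the parent map are hypothetical inputs for illustration, not Drill's actual data structures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical check: a fragment is "blocked" when it is running but the
// fragment that receives its output (its parent) has not been allocated.
class BlockedSenderCheck {
    // parentOf maps each fragment id to the id of the fragment that consumes
    // its output; the root fragment has no entry.
    static List<Integer> blockedSenders(Map<Integer, Integer> parentOf,
                                        Set<Integer> allocated) {
        List<Integer> blocked = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : parentOf.entrySet()) {
            boolean running = allocated.contains(e.getKey());
            boolean receiverReady = allocated.contains(e.getValue());
            if (running && !receiverReady) {
                blocked.add(e.getKey());
            }
        }
        blocked.sort(null); // natural ordering, so the output is stable
        return blocked;
    }
}
```

For a plan where leaves 2 and 3 feed intermediate fragment 1, which feeds root 0, allocating {0, 2, 3} but not 1 reports both leaves as blocked.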

Re: Discuss about Drill's schedule policy

2017-08-27 Thread weijie tong
Hi Paul: I have read the code of Sparrow and Spark-Sparrow over the last few days. Sparrow seems to match Drill's architecture very well. Following Sparrow's Spark implementation, every MinorFragment can be treated as a Spark task, and a MajorFragment as a Spark TaskSet. We will start a
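Under the mapping above (MajorFragment as task set, MinorFragment as task), the core of Sparrow's placement is batch sampling: for a set of m tasks, probe d·m randomly chosen workers and place the tasks on the m least-loaded ones. The sketch below assumes each drillbit reports a queue length, which is a hypothetical input; it deliberately omits Sparrow's late binding (reservations) and is not Drill or Sparrow code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Batch-sampling sketch for one MajorFragment with `minorFragments` tasks.
class BatchSampler {
    static List<String> place(int minorFragments, int probeRatio,
                              Map<String, Integer> queueLengths, Random rng) {
        if (queueLengths.isEmpty()) {
            throw new IllegalArgumentException("no drillbits to probe");
        }
        // Pick d * m random drillbits to probe (capped at cluster size).
        List<String> workers = new ArrayList<>(queueLengths.keySet());
        Collections.shuffle(workers, rng);
        int probes = Math.max(1, Math.min(workers.size(), probeRatio * minorFragments));
        List<String> probed = new ArrayList<>(workers.subList(0, probes));
        // Rank probed drillbits by assumed queue length, shortest first.
        probed.sort(Comparator.comparingInt(w -> queueLengths.get(w)));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < minorFragments; i++) {
            // If fewer drillbits were probed than fragments, wrap around.
            out.add(probed.get(i % probed.size()));
        }
        return out;
    }
}
```

With a probe ratio of 2, Sparrow's paper reports good results while keeping probe traffic proportional to the task count rather than the cluster size.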

Re: Discuss about Drill's schedule policy

2017-08-23 Thread Paul Rogers
Hi Weijie, Thanks for the link. I’d seen this project a bit earlier, along with Apollo [1]. Sparrow is quite interesting, but is designed to place tasks (processes) on available nodes. This is not quite what Drill does: Drill launches multiple waves of “fragments” to all nodes across the

Re: Discuss about Drill's schedule policy

2017-08-23 Thread weijie tong
@paul, have you noticed the Sparrow project (https://github.com/radlab/sparrow) and the related paper mentioned on its GitHub page? Sparrow is a decentralized, low-latency scheduler, which seems to meet Drill's needs. I think we can first abstract a scheduler interface, as Spark does; then we can
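A scheduler abstraction of the kind suggested above might look like the following minimal sketch, in the spirit of Spark separating scheduling policy behind a `TaskScheduler` trait. Every name here (`FragmentScheduler`, `RoundRobinScheduler`, `Planner`) is hypothetical; none exists in Drill. The current round-robin behaviour is expressed as one implementation so that another policy could be substituted later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pluggable policy: choose an endpoint for each minor fragment.
interface FragmentScheduler {
    List<String> place(String majorFragmentId, int width, List<String> endpoints);
}

// Round-robin placement behind the interface.
class RoundRobinScheduler implements FragmentScheduler {
    private int next = 0; // cursor shared across calls, so placement keeps rotating

    @Override
    public List<String> place(String majorFragmentId, int width, List<String> endpoints) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < width; i++) {
            out.add(endpoints.get(next % endpoints.size()));
            next++;
        }
        return out;
    }
}

// Planning code depends only on the interface, so a load-aware or
// Sparrow-style policy can be swapped in without changing the caller.
class Planner {
    private final FragmentScheduler scheduler;

    Planner(FragmentScheduler scheduler) {
        this.scheduler = scheduler;
    }

    List<String> plan(String majorFragmentId, int width, List<String> endpoints) {
        return scheduler.place(majorFragmentId, width, endpoints);
    }
}
```

Because `FragmentScheduler` has a single abstract method, a policy can also be supplied as a lambda, which keeps experiments cheap.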

Re: Discuss about Drill's schedule policy

2017-08-21 Thread weijie tong
Thanks for all your suggestions. @paul, your analysis is impressive, and I agree with you. The current queue solution cannot solve this problem completely. Our system has a hard time whenever the cluster is under high load. I will think about this more deeply. Welcome more ideas or

Re: Discuss about Drill's schedule policy

2017-08-20 Thread Paul Rogers
Hi Weijie, Great analysis. Let’s look at a few more data points. Drill has no central scheduler (this is a feature: it makes the cluster much easier to manage and has no single point of failure. It was probably the easiest possible solution while Drill was being built.) Instead of central

Re: Discuss about Drill's schedule policy

2017-08-20 Thread Padma Penumarthy
If the control RPC to a drillbit is down, i.e., if a drillbit is not responding, ZooKeeper should detect that and notify the other drillbits to remove the dead drillbit from their active lists. Once that happens, the next query that comes in should not even see that drillbit. We need a way to differentiate
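The behaviour described above, where a dead drillbit is dropped from the active list so that later queries never see it, can be sketched as a small registry. This is a minimal sketch under stated assumptions: `ActiveEndpointRegistry` and its callbacks are hypothetical, and in a real cluster the add/remove calls would be driven by ZooKeeper session events rather than invoked directly.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry of live drillbits, safe for concurrent updates.
class ActiveEndpointRegistry {
    private final Set<String> active = ConcurrentHashMap.newKeySet();

    // Assumed to be triggered when a drillbit registers with the coordinator.
    void onDrillbitRegistered(String host) {
        active.add(host);
    }

    // Assumed to be triggered when the coordinator reports a lost session.
    void onDrillbitRemoved(String host) {
        active.remove(host);
    }

    // Planning for a new query only considers still-registered drillbits.
    List<String> endpointsForNewQuery(List<String> candidates) {
        return candidates.stream().filter(active::contains).toList();
    }
}
```

Note that this only covers a drillbit that has actually lost its session; a drillbit that is alive but stalled (for example, in a long GC pause) would still pass this filter.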

Discuss about Drill's schedule policy

2017-08-20 Thread weijie tong
Hi all: Drill's current scheduling policy seems rather simplistic. The SimpleParallelizer assigns endpoints in a round-robin fashion, which ignores system load and other factors. In a critical scenario, some drillbits suffer frequent full GCs, which block their control RPCs. Current
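The contrast raised above, round-robin placement versus placement that accounts for load, can be illustrated with a small comparison. This is a simplified, hypothetical model, not the actual SimpleParallelizer code: `Endpoint` and its `load` field (standing in for something like running fragments or GC pressure) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical endpoint with an assumed scalar load metric.
record Endpoint(String host, int load) {}

class EndpointAssigner {
    // Round-robin: minor fragment i goes to endpoint i % n, regardless of load.
    static List<String> roundRobin(List<Endpoint> endpoints, int fragments) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < fragments; i++) {
            out.add(endpoints.get(i % endpoints.size()).host());
        }
        return out;
    }

    // Load-aware: each fragment goes to the currently least-loaded endpoint,
    // whose load is then bumped by one assumed unit of work.
    static List<String> leastLoaded(List<Endpoint> endpoints, int fragments) {
        // Entries are {load, index}; ties are broken by index for determinism.
        PriorityQueue<int[]> queue = new PriorityQueue<>(
                Comparator.<int[]>comparingInt(e -> e[0]).thenComparingInt(e -> e[1]));
        for (int i = 0; i < endpoints.size(); i++) {
            queue.add(new int[] {endpoints.get(i).load(), i});
        }
        List<String> out = new ArrayList<>();
        for (int f = 0; f < fragments; f++) {
            int[] best = queue.poll();
            out.add(endpoints.get(best[1]).host());
            best[0]++; // account for the fragment just placed
            queue.add(best);
        }
        return out;
    }
}
```

With endpoints a (load 0), b (load 10), and c (load 0), round-robin still sends a fragment to the overloaded b, while the load-aware variant spreads four fragments across a and c only.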