I like this idea in general. When running under orchestrators like YARN, Marathon, or Kubernetes, it's true that whatever starts Drill "manages" memory. However, there are issues: you need to set up the variables in Drill so they do not exceed the amount the orchestrator has allocated. Once an orchestrator sees a managed process exceed its allocation, it often kills the process. In Drill, that can mean drillbits getting killed during queries, which leads to a bad user experience. Folks configuring Drill in the field have had to set the heap, direct, and other settings and hope they got them right to ensure this didn't happen.
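To make the "hope they did it right" problem concrete, here is a minimal sketch of the sanity check an administrator ends up doing by hand today. The function name and its arguments are hypothetical (nothing here is part of Drill), and it assumes all sizes are whole gigabytes with a single lump figure for everything that is neither heap nor direct memory.

```shell
# Hypothetical helper, not part of Drill. All sizes are whole gigabytes.
# Succeeds only if heap + direct + other JVM overhead fits in the limit.
fits_in_container() {
  fic_total=$(( $1 + $2 + $3 ))   # heap + direct + overhead
  fic_limit=$4                    # what the orchestrator allocated
  if [ "$fic_total" -gt "$fic_limit" ]; then
    echo "over budget: ${fic_total}G requested, ${fic_limit}G allocated"
    return 1
  fi
  echo "ok: ${fic_total}G requested, ${fic_limit}G allocated"
}
```

For example, `fits_in_container 4 8 1 16` succeeds, while `fits_in_container 8 8 1 16` fails because 17G would be requested against a 16G allocation, which is the case where the orchestrator may kill the drillbit.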
This option gives people a way to start with reasonable settings, and I like that it accepts either a percentage or an absolute value. That matters in multi-tenant environments. I think I saw in the JIRA that Drill will indicate at startup what allocation was used, based on what variables. I think this is important: log it at drillbit start, both in stdout and in the drillbit.log file. Indicate what method was used for allocation, what the user-provided values were, and, for auto allocations, the split that was chosen.

Maybe even provide it in such a way that if a user read it and wanted to tweak, they could take the auto-allocated output message and cut and paste it into drill-env.sh, i.e. print the variables and the values that got auto-allocated. That way, as an administrator, if I felt the need to tweak settings, I could take exactly what the auto-allocation output, put it into my env script, and then tweak to my heart's desire.

This is pretty cool.

John

On Fri, Feb 16, 2018 at 1:15 PM, Kunal Khatua <[email protected]> wrote:

> Hi everyone
>
> We're working on simplifying the Drill memory configuration process, where
> by, users no more have the need for getting into the specifics of Heap and
> Direct memory allocations.
> Here is the original JIRA https://issues.apache.org/jira/browse/DRILL-5741
>
> The idea is to simply provide the Drill process with a single dimensional
> of total process memory (either in absolute values, or as a percentage of
> the total system memory), while Drill's startup scripts automatically do an
> optimal allocations. This, of course, can be overridden.
>
> What I'm looking for feed back.... for which, you're welcome to try the
> Commit (1ad11ee44902c11efa69cde908002f59169f61d7) specified in the
> following
> https://github.com/apache/drill/pull/1082
>
> You can try building Drill with this private branch (to which the pull
> request is linked):
> https://github.com/kkhatua/drill/commit/1ad11ee44902c11efa69cde908002f59169f61d7
>
> Once you've done a clean setup, you should only need to edit
>
> and uncomment the line having the property - "DRILLBIT_MAX_PROC_MEM" with
> a setting like (say) 50%.
> export DRILLBIT_MAX_PROC_MEM=50%
>
> After that, Drill should start up successfully. Log messages should appear
> in drillbit.out showing that Drill has auto-configured the memory.
>
> I'm looking forward to hearing back from folks who've tried this.
>
> TIA
> ~ Kunal
>
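To illustrate the paste-ready startup message suggested above, here is a rough sketch of what such output could look like. It is not the logic in the DRILL-5741 patch: the function name is made up, the one-quarter-heap split is an arbitrary placeholder, and DRILL_HEAP and DRILL_MAX_DIRECT_MEMORY are the variable names as I recall them from the env script, so treat them as assumptions. It only handles whole-gigabyte values and ignores anything beyond heap and direct memory.

```shell
# Hypothetical sketch: split a total memory budget into heap and direct.
# The ratio (one quarter heap, the rest direct) is an arbitrary placeholder,
# not the rule used by the DRILL-5741 patch.
print_split() {
  ps_spec=$1          # e.g. "50%" or "16G"
  ps_system_gb=$2     # total system memory in whole gigabytes
  case "$ps_spec" in
    *%) ps_total_gb=$(( ps_system_gb * ${ps_spec%\%} / 100 )) ;;
    *G) ps_total_gb=${ps_spec%G} ;;
    *)  echo "unsupported value: $ps_spec" >&2; return 1 ;;
  esac
  ps_heap_gb=$(( ps_total_gb / 4 ))
  ps_direct_gb=$(( ps_total_gb - ps_heap_gb ))
  # Emit lines an administrator could paste into the env script and tweak.
  echo "# auto-allocated from DRILLBIT_MAX_PROC_MEM=$ps_spec"
  echo "export DRILL_HEAP=\"${ps_heap_gb}G\""
  echo "export DRILL_MAX_DIRECT_MEMORY=\"${ps_direct_gb}G\""
}
```

Calling `print_split 50% 64` on a 64G machine would print a comment line naming the input, followed by export lines for an 8G heap and 24G of direct memory, ready to paste and adjust.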
