Hello all, I am experiencing errors on Resize and Status. The errors come from the REST call to the AM.
Command: $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE status
Result:
Application ID: xxxxxxxxxxxxxxxx
Application State: RUNNING
Host: xxxxxxxxxxxxxxxx
Queue: root.xxxxx.default
User: xxxxxxxx
Start Time: 2019-01-14 14:56:29
Application Name: Drill-on-YARN-cluster_01
Tracking URL: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Failed to get AM status
REST request failed: https://xxxxxxxxxxxxxxx:9048/rest/status

Command: $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE resize
Result:
Resizing cluster for Application ID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Resize failed: REST request failed: https://xxxxxxxxxxxxxxx:9048/rest/shrink/1

I didn't find how to resolve this issue; maybe someone can help me. Thanks.

On Sat, Jan 12, 2019 at 8:30 AM Kwizera hugues Teddy <[email protected]> wrote:

> Hello,
>
> The other option works.
>
> As you say, an update is needed in the docs to remove the wrong information.
>
> Thanks.
>
> On Sat, Jan 12, 2019, 08:10 Abhishek Girish <[email protected]> wrote:
>
>> Hello Teddy,
>>
>> I don't recollect a restart option for the drill-on-yarn.sh script. I've always used a combination of stop and start, like Paul mentions. Could you please try that and get back to us? We could certainly have a minor enhancement to support restart - until then I'll request Bridget to update the documentation.
>>
>> Regards,
>> Abhishek
>>
>> On Fri, Jan 11, 2019 at 11:05 PM Kwizera hugues Teddy <[email protected]> wrote:
>>
>> > Hello Paul,
>> >
>> > Thank you for your response with some interesting information (files in /tmp).
>> >
>> > For my side, all the other command-line options work normally (start|stop|status...), but not restart (the option is not recognized). I searched the source code and found that the restart command is not implemented, so I wonder why the documentation does not match the source code.
>> >
>> > Thanks, Teddy
>> >
>> > On Sat, Jan 12, 2019, 02:39 Paul Rogers <[email protected]> wrote:
>> >
>> > > Let's try to troubleshoot. Does the combination of stop and start work? If so, then there could be a bug with the restart command itself.
>> > >
>> > > If neither start nor stop work, it could be that you are missing the application ID file created when you first started DoY. Some background.
>> > >
>> > > When we submit an app to YARN, YARN gives us an app ID. We need this in order to track down the app master for DoY so we can send it commands later.
>> > >
>> > > When the command line tool starts DoY, it writes the YARN app ID to a file. Can't remember the details, but it is probably in the $DRILL_SITE directory. The contents are, as I recall, a long hexadecimal string.
>> > >
>> > > When you invoke the command line, the tool reads this file to figure out how to track down the DoY app master. The tool then sends commands to the app master: in this case, a request to shut down. Then, for restart, the tool will communicate with YARN to start a new instance.
>> > >
>> > > The tool is supposed to give detailed error messages. Did you get any? That might tell us which of these steps failed.
>> > >
>> > > Can you connect to the DoY Web UI at the URL provided when you started DoY? If you can, this means that the DoY App Master is up and running.
>> > >
>> > > Are you running the client from the same node on which you started it? That file I mentioned is local to the "DoY client" machine; it is not in DFS.
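
A quick way to check the two things Paul mentions, the app ID file and the AM itself, is sketched below. This is only a sketch: the exact name of the app ID file under $DRILL_SITE is not confirmed in this thread, and the -k flag assumes the AM uses a self-signed certificate.

    # Run on the same node from which DoY was started.
    # Look for the app ID file the client wrote (exact name not confirmed here).
    ls -l $DRILL_SITE

    # Probe the REST endpoint that the status and resize commands call,
    # reusing the URL from the error message above.
    curl -vk https://xxxxxxxxxxxxxxx:9048/rest/status
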
>> > > Then, there is one more very obscure bug you can check. On some distributions, the YARN task files are written to the /tmp directory. Some Linux systems remove these files from time to time. Once the files are gone, YARN can no longer control its containers: it won't be able to stop the app master or the Drillbit containers. There are two fixes. First, go kill all the processes by hand. Then, either move the YARN state files out of /tmp, or exclude YARN's files from the periodic cleanup.
>> > >
>> > > Try some of the above and let us know what you find.
>> > >
>> > > Also, perhaps Abhishek can offer some suggestions, as he tested the heck out of the feature and may have additional suggestions.
>> > >
>> > > Thanks,
>> > > - Paul
>> > >
>> > > On Friday, January 11, 2019, 7:46:55 AM PST, Kwizera hugues Teddy <[email protected]> wrote:
>> > >
>> > > hello,
>> > >
>> > > 2 weeks ago, I began to discover DoY. Today, by reading the Drill documents (https://drill.apache.org/docs/appendix-a-release-note-issues/), I saw that we can restart the Drill cluster with:
>> > >
>> > > $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE restart
>> > >
>> > > But it doesn't work when I tested it.
>> > >
>> > > Any idea about it?
>> > >
>> > > Thanks.
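
Since restart is not implemented, the stop-then-start combination suggested earlier in the thread can be scripted. A minimal sketch, using only the drill-on-yarn.sh options already shown above; the 30-second pause is an arbitrary assumption, not a documented value.

    #!/usr/bin/env bash
    # Sketch of a "restart" built from the documented stop and start commands.
    set -euo pipefail

    "$DRILL_HOME"/bin/drill-on-yarn.sh --site "$DRILL_SITE" stop

    # Give YARN a moment to tear down the AM and Drillbit containers
    # (arbitrary wait; adjust as needed).
    sleep 30

    "$DRILL_HOME"/bin/drill-on-yarn.sh --site "$DRILL_SITE" start
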
>> > > On Wed, Jan 2, 2019 at 3:18 AM Paul Rogers <[email protected]> wrote:
>> > >
>> > > > Hi Charles,
>> > > >
>> > > > Your engineers have identified a common need, but one which is very difficult to satisfy.
>> > > >
>> > > > TL;DR: DoY gets as close to the requirements as possible within the constraints of YARN and Drill. But future projects could do more.
>> > > >
>> > > > Your engineers want resource segregation among tenants: multi-tenancy. This is very difficult to achieve at the application level. Consider Drill. It would need some way to identify users to know which tenant they belong to. Then, Drill would need a way to enqueue users whose queries would exceed the memory or CPU limit for that tenant. Plus, Drill would have to be able to limit memory and CPU for each query. Much work has been done to limit memory, but CPU is very difficult. Mature products such as Teradata can do this, but Teradata has 40 years of effort behind it.
>> > > >
>> > > > Since it is hard to build multi-tenancy in at the app level (not impossible, just very, very hard), the thought is to apply it at the cluster level. This is done in YARN by limiting the resources available to processes (typically map/reduce) and by limiting the number of running processes. This works for M/R because each map task uses disk to shuffle results to a reduce task, so map and reduce tasks can run asynchronously.
>> > > >
>> > > > For tools such as Drill, which do in-memory processing (really, across-the-network exchanges), both the sender and receiver have to run concurrently. This is much harder to schedule than async M/R tasks: it means that the entire Drill cluster (of whatever size) must be up and running to run a query.
>> > > >
>> > > > The start-up time for Drill is far, far longer than a query. So it is not feasible to use YARN to launch a Drill cluster for each query the way you would do with Spark. Instead, under YARN, Drill is a long-running service that handles many queries.
>> > > >
>> > > > Obviously, this is not ideal: I'm sure your engineers want to use a tenant's resources for Drill when running queries, and otherwise for Spark, Hive, or maybe TensorFlow. If Drill has to be long-running, I'm sure they'd like to slosh resources between tenants as is done in YARN. As noted above, this is a hard problem that DoY did not attempt to solve.
>> > > >
>> > > > One might suggest that Drill grab resources from YARN when Tenant A wants to run a query, and release them when that tenant is done, grabbing new resources when Tenant B wants to run. Impala tried this with Llama and found it did not work. (This is why DoY is quite a bit simpler; no reason to rerun a failed experiment.)
>> > > >
>> > > > Some folks are looking to Kubernetes (K8s) as a solution. But that just replaces YARN with K8s: Drill is still a long-running process.
>> > > >
>> > > > To solve the problem you identify, you'll need either:
>> > > >
>> > > > * A bunch of work in Drill to build multi-tenancy into Drill, or
>> > > > * A cloud-like solution in which each tenant spins up a Drill cluster within its budget, spinning it down, or resizing it, to stay within an overall budget.
>> > > >
>> > > > The second option can be achieved under YARN with DoY, assuming that DoY added support for graceful shutdown (or the cluster is reduced in size only when no queries are active). Longer term, a more modern solution would be Drill-on-Kubernetes (DoK?), which Abhishek started on.
>> > > >
>> > > > Engineering is the art of compromise. The question for your engineers is how to achieve the best result given the limitations of the software available today, while at the same time helping the Drill community improve the solutions over time.
>> > > >
>> > > > Thanks,
>> > > > - Paul
>> > > >
>> > > > On Sunday, December 30, 2018, 9:38:04 PM PST, Charles Givre <[email protected]> wrote:
>> > > >
>> > > > Hi Paul,
>> > > > Here's what our engineers said:
>> > > >
>> > > > From Paul's response, I understand that there is a slight confusion around how multi-tenancy has been enabled in our data lake.
>> > > >
>> > > > Some more details on this:
>> > > >
>> > > > Drill already has the concept of multi-tenancy where we can have multiple Drill clusters running on the same data lake, enabled through different ports and ZooKeeper. But all of this is launched through the same hard-coded YARN queue that we provide as a config parameter.
>> > > >
>> > > > In our data lake, each tenant has a certain amount of compute capacity allotted to them which they can use for their project work. This is provisioned through individual YARN queues for each tenant (resource caging). This restricts the tenants from using cluster resources beyond a certain limit, so they do not impact other tenants.
>> > > >
>> > > > Access to these YARN queues is provisioned through ACL memberships.
>> > > >
>> > > > ——
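
One way to approximate the resource caging described above with today's DoY is to run one Drill cluster per tenant, each launched from its own site directory that points at that tenant's YARN queue and uses its own ZooKeeper root and ports. The sketch below rests on assumptions: the tenant name, queue path, and port are invented for illustration, and the drill.yarn.am.queue key should be verified against the DoY configuration docs for your Drill version.

    # Sketch: one DoY cluster per tenant, launched from a per-tenant site dir.
    export DRILL_SITE=/etc/drill/sites/tenant-a    # hypothetical path

    # In $DRILL_SITE/drill-on-yarn.conf, point the AM at the tenant's queue:
    #   drill.yarn.am.queue: "root.tenant-a.default"
    # In $DRILL_SITE/drill-override.conf, keep the clusters separate:
    #   drill.exec.zk.root: "drill-tenant-a"
    #   drill.exec.http.port: 8047    # use a distinct port per tenant

    $DRILL_HOME/bin/drill-on-yarn.sh --site "$DRILL_SITE" start
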
>> > > > Does this make sense? Is it possible to get Drill to work in this manner, or should we look into opening up JIRAs and working on new capabilities?
>> > > >
>> > > > > On Dec 17, 2018, at 21:59, Paul Rogers <[email protected]> wrote:
>> > > > >
>> > > > > Hi Kwizera,
>> > > > > I hope my answer to Charles gave you the information you need. If not, please check out the DoY documentation or ask follow-up questions.
>> > > > > Key thing to remember: Drill is a long-running YARN service; queries DO NOT go through YARN queues, they go through Drill directly.
>> > > > >
>> > > > > Thanks,
>> > > > > - Paul
>> > > > >
>> > > > > On Monday, December 17, 2018, 11:01:04 AM PST, Kwizera hugues Teddy <[email protected]> wrote:
>> > > > >
>> > > > > Hello,
>> > > > > Same question here.
>> > > > > I would like to know how Drill deals with this YARN functionality?
>> > > > > Cheers.
>> > > > >
>> > > > > On Mon, Dec 17, 2018, 17:53 Charles Givre <[email protected]> wrote:
>> > > > >
>> > > > >> Hello all,
>> > > > >> We are trying to set up a Drill cluster on our corporate data lake. Our cluster requires dynamic YARN queue allocation for a multi-tenant environment. Is this something that Drill supports, or is there a workaround?
>> > > > >> Thanks!
>> > > > >> —C
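
As Paul notes above, a DoY cluster is a single long-running YARN application, so the queue it occupies is fixed when the cluster is launched and individual queries never pass through YARN queues. To confirm which queue a running DoY application master landed in, the standard YARN CLI can be used. A sketch, taking the application ID from the drill-on-yarn.sh status output:

    # <application_id> is the Application ID printed by drill-on-yarn.sh status.
    yarn application -status <application_id>
    # The report includes the queue, state, and tracking URL of the DoY AM.
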
