Hello all, I am experiencing errors on Resize and Status. The errors come from the REST call to the AM.
Command: $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE status
Result:
Application ID: xxxxxxxxxxxxxxxx
Application State: RUNNING
Host: xxxxxxxxxxxxxxxx
Queue: root.xxxxx.default
User: xxxxxxxx
Start Time: 2019-01-14 14:56:29
Application Name: Drill-on-YARN-cluster_01
Tracking URL: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Failed to get AM status
REST request failed: https://xxxxxxxxxxxxxxx:9048/rest/status

Command: $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE resize
Result:
Resizing cluster for Application ID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Resize failed: REST request failed: https://xxxxxxxxxxxxxxx:9048/rest/shrink/1

I didn't find how to resolve this issue; maybe someone can help me. Thanks.

On Sat, Jan 12, 2019 at 8:30 AM Kwizera hugues Teddy <[email protected]> wrote:

> Hello,
>
> The other option works.
>
> As you say, an update is needed in the docs to remove the wrong information.
>
> Thanks.
>
> On Sat, Jan 12, 2019, 08:10 Abhishek Girish <[email protected]> wrote:
>
>> Hello Teddy,
>>
>> I don't recollect a restart option for the drill-on-yarn.sh script. I've always used a combination of stop and start, like Paul mentions. Could you please try that and get back to us? We could certainly have a minor enhancement to support restart - until then I'll request Bridget to update the documentation.
>>
>> Regards,
>> Abhishek
>>
>> On Fri, Jan 11, 2019 at 11:05 PM Kwizera hugues Teddy <[email protected]> wrote:
>>
>> > Hello Paul,
>> >
>> > Thank you for your response with some interesting information (files in /tmp).
>> >
>> > For my side, all the other command-line options work normally (start|stop|status...), but not restart (the option is not recognized). I searched the source code and found that the restart command is not implemented, so I wonder why the documentation does not match the source code.
>> >
>> > Thanks, Teddy
>> >
>> > On Sat, Jan 12, 2019, 02:39 Paul Rogers <[email protected]> wrote:
>> >
>> > > Let's try to troubleshoot. Does the combination of stop and start work? If so, then there could be a bug with the restart command itself.
>> > >
>> > > If neither start nor stop work, it could be that you are missing the application ID file created when you first started DoY. Some background.
>> > >
>> > > When we submit an app to YARN, YARN gives us an app ID. We need this in order to track down the app master for DoY so we can send it commands later.
>> > >
>> > > When the command line tool starts DoY, it writes the YARN app ID to a file. Can't remember the details, but it is probably in the $DRILL_SITE directory. The contents are, as I recall, a long hexadecimal string.
>> > >
>> > > When you invoke the command line, the tool reads this file to figure out how to track down the DoY app master. The tool then sends commands to the app master: in this case, a request to shut down. Then, for restart, the tool will communicate with YARN to start a new instance.
>> > >
>> > > The tool is supposed to give detailed error messages. Did you get any? That might tell us which of these steps failed.
>> > >
>> > > Can you connect to the DoY Web UI at the URL provided when you started DoY? If you can, this means that the DoY App Master is up and running.
>> > >
>> > > Are you running the client from the same node on which you started it? That file I mentioned is local to the "DoY client" machine; it is not in DFS.
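
A quick way to check the two things Paul mentions, the app ID file and the AM itself, is sketched below. This is only a sketch: the exact name of the app ID file under $DRILL_SITE is not confirmed in this thread, and the -k flag assumes the AM uses a self-signed certificate.

    # Run on the same node from which DoY was started.
    # Look for the app ID file the client wrote (exact name not confirmed here).
    ls -l $DRILL_SITE

    # Probe the REST endpoint that the status and resize commands call,
    # reusing the URL from the error message above.
    curl -vk https://xxxxxxxxxxxxxxx:9048/rest/status
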
>> > > Then, there is one more very obscure bug you can check. On some distributions, the YARN task files are written to the /tmp directory. Some Linux systems remove these files from time to time. Once the files are gone, YARN can no longer control its containers: it won't be able to stop the app master or the Drillbit containers. There are two fixes. First, go kill all the processes by hand. Then, either move the YARN state files out of /tmp, or exclude YARN's files from the periodic cleanup.
>> > >
>> > > Try some of the above and let us know what you find.
>> > >
>> > > Also, perhaps Abhishek can offer some suggestions, as he tested the heck out of the feature and may have additional suggestions.
>> > >
>> > > Thanks,
>> > > - Paul
>> > >
>> > > On Friday, January 11, 2019, 7:46:55 AM PST, Kwizera hugues Teddy <[email protected]> wrote:
>> > >
>> > > hello,
>> > >
>> > > 2 weeks ago, I began to discover DoY. Today, by reading the Drill documents (https://drill.apache.org/docs/appendix-a-release-note-issues/), I saw that we can restart the Drill cluster with:
>> > >
>> > > $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE restart
>> > >
>> > > But it doesn't work when I tested it.
>> > >
>> > > Any idea about it?
>> > >
>> > > Thanks.
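
Since restart is not implemented, the stop-then-start combination suggested earlier in the thread can be scripted. A minimal sketch, using only the drill-on-yarn.sh options already shown above; the 30-second pause is an arbitrary assumption, not a documented value.

    #!/usr/bin/env bash
    # Sketch of a "restart" built from the documented stop and start commands.
    set -euo pipefail

    "$DRILL_HOME"/bin/drill-on-yarn.sh --site "$DRILL_SITE" stop

    # Give YARN a moment to tear down the AM and Drillbit containers
    # (arbitrary wait; adjust as needed).
    sleep 30

    "$DRILL_HOME"/bin/drill-on-yarn.sh --site "$DRILL_SITE" start
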
>> > > On Wed, Jan 2, 2019 at 3:18 AM Paul Rogers <[email protected]> wrote:
>> > >
>> > > > Hi Charles,
>> > > >
>> > > > Your engineers have identified a common need, but one which is very difficult to satisfy.
>> > > >
>> > > > TL;DR: DoY gets as close to the requirements as possible within the constraints of YARN and Drill. But future projects could do more.
>> > > >
>> > > > Your engineers want resource segregation among tenants: multi-tenancy. This is very difficult to achieve at the application level. Consider Drill. It would need some way to identify users to know which tenant they belong to. Then, Drill would need a way to enqueue users whose queries would exceed the memory or CPU limit for that tenant. Plus, Drill would have to be able to limit memory and CPU for each query. Much work has been done to limit memory, but CPU is very difficult. Mature products such as Teradata can do this, but Teradata has 40 years of effort behind it.
>> > > >
>> > > > Since it is hard to build multi-tenancy in at the app level (not impossible, just very, very hard), the thought is to apply it at the cluster level. This is done in YARN by limiting the resources available to processes (typically map/reduce) and by limiting the number of running processes. This works for M/R because each map task uses disk to shuffle results to a reduce task, so map and reduce tasks can run asynchronously.
>> > > >
>> > > > For tools such as Drill, which do in-memory processing (really, across-the-network exchanges), both the sender and receiver have to run concurrently. This is much harder to schedule than async M/R tasks: it means that the entire Drill cluster (of whatever size) must be up and running to run a query.
>> > > >
>> > > > The start-up time for Drill is far, far longer than a query. So it is not feasible to use YARN to launch a Drill cluster for each query the way you would do with Spark. Instead, under YARN, Drill is a long-running service that handles many queries.
>> > > >
>> > > > Obviously, this is not ideal: I'm sure your engineers want to use a tenant's resources for Drill when running queries, and otherwise for Spark, Hive, or maybe TensorFlow. If Drill has to be long-running, I'm sure they'd like to slosh resources between tenants as is done in YARN. As noted above, this is a hard problem that DoY did not attempt to solve.
>> > > >
>> > > > One might suggest that Drill grab resources from YARN when Tenant A wants to run a query, and release them when that tenant is done, grabbing new resources when Tenant B wants to run. Impala tried this with Llama and found it did not work. (This is why DoY is quite a bit simpler; no reason to rerun a failed experiment.)
>> > > >
>> > > > Some folks are looking to Kubernetes (K8s) as a solution. But that just replaces YARN with K8s: Drill is still a long-running process.
>> > > >
>> > > > To solve the problem you identify, you'll need either:
>> > > >
>> > > > * A bunch of work in Drill to build multi-tenancy into Drill, or
>> > > > * A cloud-like solution in which each tenant spins up a Drill cluster within its budget, spinning it down, or resizing it, to stay within an overall budget.
>> > > >
>> > > > The second option can be achieved under YARN with DoY, assuming that DoY added support for graceful shutdown (or the cluster is reduced in size only when no queries are active). Longer term, a more modern solution would be Drill-on-Kubernetes (DoK?), which Abhishek started on.
>> > > >
>> > > > Engineering is the art of compromise. The question for your engineers is how to achieve the best result given the limitations of the software available today, while at the same time helping the Drill community improve the solutions over time.
>> > > >
>> > > > Thanks,
>> > > > - Paul
>> > > >
>> > > > On Sunday, December 30, 2018, 9:38:04 PM PST, Charles Givre <[email protected]> wrote:
>> > > >
>> > > > Hi Paul,
>> > > > Here's what our engineers said:
>> > > >
>> > > > From Paul's response, I understand that there is a slight confusion around how multi-tenancy has been enabled in our data lake.
>> > > >
>> > > > Some more details on this:
>> > > >
>> > > > Drill already has the concept of multi-tenancy where we can have multiple Drill clusters running on the same data lake, enabled through different ports and ZooKeeper. But all of this is launched through the same hard-coded YARN queue that we provide as a config parameter.
>> > > >
>> > > > In our data lake, each tenant has a certain amount of compute capacity allotted to them which they can use for their project work. This is provisioned through individual YARN queues for each tenant (resource caging). This restricts the tenants from using cluster resources beyond a certain limit, so they do not impact other tenants.
>> > > >
>> > > > Access to these YARN queues is provisioned through ACL memberships.
>> > > >
>> > > > ——
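
One way to approximate the resource caging described above with today's DoY is to run one Drill cluster per tenant, each launched from its own site directory that points at that tenant's YARN queue and uses its own ZooKeeper root and ports. The sketch below rests on assumptions: the tenant name, queue path, and port are invented for illustration, and the drill.yarn.am.queue key should be verified against the DoY configuration docs for your Drill version.

    # Sketch: one DoY cluster per tenant, launched from a per-tenant site dir.
    export DRILL_SITE=/etc/drill/sites/tenant-a    # hypothetical path

    # In $DRILL_SITE/drill-on-yarn.conf, point the AM at the tenant's queue:
    #   drill.yarn.am.queue: "root.tenant-a.default"
    # In $DRILL_SITE/drill-override.conf, keep the clusters separate:
    #   drill.exec.zk.root: "drill-tenant-a"
    #   drill.exec.http.port: 8047    # use a distinct port per tenant

    $DRILL_HOME/bin/drill-on-yarn.sh --site "$DRILL_SITE" start
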
>> > > > Does this make sense? Is it possible to get Drill to work in this manner, or should we look into opening up JIRAs and working on new capabilities?
>> > > >
>> > > > > On Dec 17, 2018, at 21:59, Paul Rogers <[email protected]> wrote:
>> > > > >
>> > > > > Hi Kwizera,
>> > > > > I hope my answer to Charles gave you the information you need. If not, please check out the DoY documentation or ask follow-up questions.
>> > > > > Key thing to remember: Drill is a long-running YARN service; queries DO NOT go through YARN queues, they go through Drill directly.
>> > > > >
>> > > > > Thanks,
>> > > > > - Paul
>> > > > >
>> > > > > On Monday, December 17, 2018, 11:01:04 AM PST, Kwizera hugues Teddy <[email protected]> wrote:
>> > > > >
>> > > > > Hello,
>> > > > > Same question here.
>> > > > > I would like to know how Drill deals with this YARN functionality?
>> > > > > Cheers.
>> > > > >
>> > > > > On Mon, Dec 17, 2018, 17:53 Charles Givre <[email protected]> wrote:
>> > > > >
>> > > > >> Hello all,
>> > > > >> We are trying to set up a Drill cluster on our corporate data lake. Our cluster requires dynamic YARN queue allocation for a multi-tenant environment. Is this something that Drill supports, or is there a workaround?
>> > > > >> Thanks!
>> > > > >> —C
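
As Paul notes above, a DoY cluster is a single long-running YARN application, so the queue it occupies is fixed when the cluster is launched and individual queries never pass through YARN queues. To confirm which queue a running DoY application master landed in, the standard YARN CLI can be used. A sketch, taking the application ID from the drill-on-yarn.sh status output:

    # <application_id> is the Application ID printed by drill-on-yarn.sh status.
    yarn application -status <application_id>
    # The report includes the queue, state, and tracking URL of the DoY AM.
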
