Re: Problem running Drill in a Docker container in OpenShift

Abhishek Girish Thu, 06 Feb 2020 13:15:59 -0800

Hey Paul,

Sorry for the delayed response, and thanks for your encouragement.

So here's a brief history of my work in this area.

(1) I started with simple YAML-based deployments for Drill, using standard
Kubernetes APIs and controllers. This supported bringing up Drill in
distributed mode, used MapR ZK for cluster coordination, and included a MapR
client to connect to MapR FS. It had standard features such as resizing, so
the basic Drill-on-Kubernetes use cases were supported.

(2) I then added Helm chart support to make (1) easier to use.

(3) I began a parallel effort to build an Operator for Drill (written in Go).
It creates a Custom Resource called DrillCluster, so in YAML files you
would see the Kind as "DrillCluster" instead of Pod, StatefulSet, or
similar. The operator model in K8s is seeing more adoption because it is
more powerful and flexible, and simpler for users. For instance, compared
with approaches (1) and (2), I could add more code checks and validations,
logging for debugging, and more. This model also leaves more room for
adding features and fixes. This is what we shipped at MapR, and it is what
I'm working on for open source Drill and plan to share soon. What's
pending for an initial preview release is replacing the MapR client with,
say, an HDFS client, and MapR ZK with Apache ZK. I can also share (1) and
(2) soon after that.
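
To make the operator model in (3) concrete: a DrillCluster custom resource is just a manifest whose `kind` is `DrillCluster`, and an operator can reject a bad spec before any Pods are created. The Python sketch below is illustrative only. The actual operator is written in Go, and the API group `drill.example.com/v1alpha1` and the `spec.version` / `spec.count` fields are hypothetical placeholders, not the real operator's schema. The supported versions are the two mentioned in this thread.

```python
import json

# Hypothetical API group/version and field names: the real operator's
# schema is not given in this thread.
API_VERSION = "drill.example.com/v1alpha1"
SUPPORTED_VERSIONS = {"1.15.0", "1.16.0"}  # versions mentioned in the thread


def make_drillcluster(name: str, version: str, drillbits: int) -> dict:
    """Build a DrillCluster custom resource, rejecting invalid specs early."""
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported Drill version: {version}")
    if drillbits < 1:
        raise ValueError("a DrillCluster needs at least one Drillbit")
    return {
        "apiVersion": API_VERSION,
        "kind": "DrillCluster",  # the custom Kind, instead of Pod/StatefulSet
        "metadata": {"name": name},
        "spec": {"version": version, "count": drillbits},
    }


if __name__ == "__main__":
    cr = make_drillcluster("drillcluster1", "1.16.0", 3)
    print(json.dumps(cr, indent=2))
```

Running it prints the manifest as JSON, which `kubectl apply -f` (or `oc apply -f`) also accepts once the corresponding CRD is installed.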

I'll definitely count on your vast experience and knowledge for help in
this regard.

Regards,
Abhishek

On Tue, Feb 4, 2020 at 2:00 AM Paul Rogers wrote:

> Hi Abhishek,
>
> Thanks for the update! It seems to make more sense to wait for you to open
> source your work than to spend time duplicating your effort. And people who
> want a short-term solution can perhaps work with MapR/HPE as you suggest.
> It sounds like you have access to the various systems and have worked through
> the myriad details involved in creating a good integration.
>
> Does your work include Helm integration?
>
> The key challenge for any K8s integration is that Drill needs access to
> data, which requires some kind of distributed storage. This has long been a
> K8s weakness. But it is, of course, a MapR strength.
>
> Please let us know if you need help with the open source efforts.
>
> Thanks,
> - Paul
>
>
>
> On Monday, February 3, 2020, 3:13:28 AM PST, Abhishek Girish wrote:
>
>  Hey Ron,
>
> As part of MapR (now HPE), I've created a native operator for Apache
> Drill that works on multiple variants of Kubernetes, including
> OpenShift. It introduces a new Kind called "DrillCluster" via a
> Custom Resource Definition (CRD), along with a Custom Controller (written
> in Golang) that manages resources of this kind. Users can easily deploy
> Drill clusters by submitting Custom Resource YAML files (CRs) of the
> DrillCluster kind. It supports creating multiple Drill clusters (multiple
> Drillbits launched in distributed mode), multiple versions (such as
> 1.15.0 and 1.16.0), auto-scaling the number of Drillbits (based on CPU
> utilization), and more. I can share more details if
> anyone's interested.
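
The thread does not say exactly how the operator picks the number of Drillbits. Assuming it follows the standard Kubernetes HorizontalPodAutoscaler rule, the desired count is ceil(current * currentCPU / targetCPU), left unchanged when that ratio is within a tolerance of 1.0 (0.1 by default), and then clamped to the configured bounds. A minimal Python sketch; the function name and the min/max defaults are hypothetical:

```python
import math


def desired_drillbits(current: int, cpu_util: float, target_util: float,
                      min_bits: int = 1, max_bits: int = 10,
                      tolerance: float = 0.1) -> int:
    """Drillbit count per the HPA scaling rule, clamped to [min_bits, max_bits].

    cpu_util is the average CPU utilization (percent) across current Drillbits;
    target_util is the desired average utilization (percent).
    """
    if target_util <= 0:
        raise ValueError("target utilization must be positive")
    ratio = cpu_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        desired = current  # close enough to target: do not scale
    else:
        desired = math.ceil(current * ratio)
    return max(min_bits, min(max_bits, desired))
```

For example, `desired_drillbits(3, 90, 60)` returns 5: three Drillbits at 90% CPU against a 60% target gives ceil(4.5).
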
>
> While vanilla K8s and GKE worked out of the box, I had to make some
> changes to support OpenShift (related to Service Accounts, Security Context
> Constraints, etc.). Perhaps you ran into similar issues (I have yet to read
> this thread fully).
>
> We recently had a v1.0.0 GA release [1], [2], [3]. One thing to note is
> that the current release has dependencies on and integrations with MapR's
> distribution of Apache Drill, and is closed source at the moment (there are
> plans to open source it in the near future).
>
> I have an open source variant of this in the works, to support vanilla
> Apache Drill. In its current state, it has all the same features but
> removes the MapR-specific integration (reliance on MapR-FS instead of HDFS,
> MapR ZooKeeper, and such). I plan to add Apache HDFS and ZooKeeper
> integration shortly. Let me know if you're interested, and I can share the
> GitHub branch.
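
With Apache ZooKeeper in place of MapR ZooKeeper, each Drillbit finds its peers through the `drill.exec.cluster-id` and `drill.exec.zk.connect` settings in drill-override.conf; Drillbits that share both values join the same cluster. As a sketch of what an operator or Helm template might generate, the Python function below renders that file. The function name is illustrative, and the ZooKeeper host names in the example (`zk-0.zk-hs`, `zk-1.zk-hs`) are hypothetical StatefulSet DNS names.

```python
from typing import Sequence


def render_drill_override(cluster_id: str, zk_hosts: Sequence[str],
                          zk_port: int = 2181) -> str:
    """Render a minimal drill-override.conf (HOCON) for distributed mode."""
    if not zk_hosts:
        raise ValueError("at least one ZooKeeper host is required")
    # zk.connect is a comma-separated host:port list for the ZooKeeper quorum
    zk_connect = ",".join(f"{host}:{zk_port}" for host in zk_hosts)
    return (
        "drill.exec: {\n"
        f'  cluster-id: "{cluster_id}",\n'
        f'  zk.connect: "{zk_connect}"\n'
        "}\n"
    )


if __name__ == "__main__":
    print(render_drill_override("drillbits1", ["zk-0.zk-hs", "zk-1.zk-hs"]))
```
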
>
> Regards,
> Abhishek
>
> [1] https://mapr.com/blog/mapr-releases-kubernetes-ecosystem-operators-for-apache-spark-and-apache-drill/
> [2] https://mapr.com/docs/home/PersistentStorage/running_drillbits_in_compute_space.html
> [3] https://github.com/mapr/mapr-operators
>
> On Wed, Jan 29, 2020 at 11:11 AM Ron Cecchini wrote:
>
> >
> > Hi, all.  Drill and OpenShift newbie here.
> >
> > Has anyone successfully deployed a Drill Docker container to an OpenShift
> > environment?
> >
> > While there is information about Drill Docker, there seems to be zero
> > information about OpenShift in particular.
> >
> > Per the instructions at drill.apache.org/docs/running-drill-on-docker, I
> > pulled the Drill Docker image from Docker Hub, and then pushed it to our
> > OpenShift environment.  But when I tried to deploy it, I immediately ran
> > into an error about /opt/drill/conf/drill-override.conf not being
> > readable.
> >
> > I understand why the problem is happening (because of who OpenShift runs
> > the container as), so I downloaded the source from GitHub and modified the
> > Dockerfile to include:
> >
> >    RUN chgrp -R 0 /opt/drill && chmod -R g=u /opt/drill
> >
> > so that all of /opt/drill would be available to everyone.  But then
> > 'docker build' kept failing, giving the error:
> >
> >    Non-resolvable parent POM for org.apache.drill:drill-root:1.18.0-SNAPSHOT:
> >    Could not transfer artifact org.apache:apache:pom:21
> >
> > I tried researching that error but couldn't figure out what was going on.
> > So I finally decided to start trying to mount persistent volumes, creating
> > one PV for /opt/drill/conf (and then copying the default
> > drill-override.conf there) and one PV for /opt/drill/log.
> >
> > Now the container gets much further, but eventually fails on something
> > Hadoop related.  I'm not trying to do anything with Hadoop, so I don't know
> > what that's about, but it says I don't have HADOOP_HOME set.
> >
> > Hopefully I can figure out the remaining steps I need (an environment
> > variable?  more configs?), but I was wondering if anyone else had already
> > successfully figured out how to deploy to OpenShift, or might know why the
> > 'docker build' fails with that error?
> >
> > For what it's worth, I copied over only that drill-override.conf and
> > nothing else.  And I did not set any Drill environment variables in
> > OpenShift.  I'm basically trying to run the "vanilla" Drill Docker as-is.
> >
> > Thanks for any help!
> >
> > Ron
> >
>
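
Some background on Ron's permissions error: by default, OpenShift runs containers under an arbitrarily assigned, non-root UID that is a member of the root group (GID 0). That is why `chgrp -R 0 /opt/drill && chmod -R g=u /opt/drill` helps: it makes the files group-owned by GID 0 and gives that group the same permissions as the file owner. The Python sketch below emulates only the `chmod -R g=u` half (changing group ownership to 0 needs root, which a Dockerfile `RUN` step normally has); the function names are hypothetical.

```python
import os
import stat


def copy_user_bits_to_group(mode: int) -> int:
    """Return mode with the group rwx bits replaced by the user rwx bits (g=u)."""
    user_bits_as_group = (mode & stat.S_IRWXU) >> 3  # move u-rwx into g-rwx slot
    return (mode & ~stat.S_IRWXG) | user_bits_as_group


def chmod_g_equals_u(root: str) -> None:
    """Recursively apply g=u under root, similar to `chmod -R g=u root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        # Each directory is handled when it is visited as dirpath.
        paths = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in paths:
            if os.path.islink(path):
                continue  # like GNU chmod -R, skip symlinks met during the walk
            mode = os.lstat(path).st_mode
            os.chmod(path, stat.S_IMODE(copy_user_bits_to_group(mode)))
```

For example, a drill-override.conf with mode 0600 ends up as 0660, so any UID in GID 0 can read it.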
