John,

That is a great writeup. If Drill were running from Docker, with the
container referencing a local path (see "Mount a host directory as a data
volume": https://docs.docker.com/userguide/dockervolumes/), I would expect
the same performance with all the flexibility you are seeking. In case it
is helpful to you or others, there is a Docker image with Drill here:
https://registry.hub.docker.com/u/mkieboom/apache-drill-docker/
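
For example, mounting a host data directory into that image would look
roughly like this (the host path is just an example, and the image's
startup options may differ from what I show):

    docker run -d \
      -v /data/drill:/data/drill \
      mkieboom/apache-drill-docker

The -v flag maps the host directory into the container, so Drill reads the
data through the node's local filesystem rather than a copy inside the
container.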

The nice thing about combining the approach you are taking with a Docker
deployment of something like Drill is that you really don't care where
those containers land in your cluster: you can build your configuration
into your Docker image, and then you are off and running, with no problem
dynamically spinning up a few more instances whenever you want. It should
hopefully simplify administration.
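
To illustrate the "build your configuration into your Docker image" part,
a minimal sketch (base image from the link above; the config path inside
the image is hypothetical):

    FROM mkieboom/apache-drill-docker
    # Bake site-specific Drill config into the image (path is hypothetical)
    COPY drill-override.conf /opt/drill/conf/drill-override.conf

Every container started from that image then comes up preconfigured, no
matter which node it lands on.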

Jim

On Thu, Jul 16, 2015 at 2:08 PM, John Omernik <[email protected]> wrote:

> Timothy -
>
> I played with that, and the performance I was getting in Docker was about
> half what I was getting natively. I think that was happening because when
> I ran it in Docker, I needed to install the MapR client in the container
> too, whereas when I run it via Marathon, it uses the node's own access to
> the disks. In cases like this where performance suffers, I am comfortable
> not Dockerizing all the things and allowing for the tarball method
> instead. Perhaps Mesos could find a way to cache the tarball locally?
> (Note: putting it in MapR FS still has it loading pretty quickly.)
>
> John
>
>
> On Thu, Jul 16, 2015 at 11:44 AM, Timothy Chen <[email protected]> wrote:
>
> > Also, it will be nice to launch Drill with a Docker image so that no
> > tarball is needed and the image can be cached much more easily on each
> > slave.
> >
> > Tim
> >
> >
> > > On Jul 16, 2015, at 9:37 AM, John Omernik <[email protected]> wrote:
> > >
> > > Awesome thanks for the update on memory!
> > >
> > > On Thu, Jul 16, 2015 at 10:48 AM, Andries Engelbrecht <
> > > [email protected]> wrote:
> > >
> > >> Great write up and information! Will be interesting to see how this
> > >> evolves.
> > >>
> > >> A quick note: memory allocation is additive, so you have to allocate
> > >> for direct plus heap memory. Drill uses direct memory for data
> > >> structures/operations, and that is the allocation that will grow with
> > >> larger data sets, etc.
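> > >>
> > >> To put numbers on it with the settings quoted below: 6GB direct + 3GB
> > >> heap means sizing the container for roughly 9GB (plus a little JVM
> > >> overhead), not 6GB.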
> > >>
> > >> —Andries
> > >>
> > >>> On Jul 16, 2015, at 5:23 AM, John Omernik <[email protected]> wrote:
> > >>>
> > >>> I have been working on getting various frameworks working on my MapR
> > >>> cluster that is also running Mesos. Basically, while I know that
> > >>> there is a package from MapR (for Drill), I am trying to find a way
> > >>> to better separate the storage layer from the compute layer.
> > >>>
> > >>> This isn't a dig on MapR, or any of the Hadoop distributions; it's
> > >>> only that I want the flexibility to try things, to have an R&D team
> > >>> working with the data in an environment where they can try out new
> > >>> frameworks, etc. This combination has been very good to me (maybe not
> > >>> to MapR support, who have received lots of quirky questions from me.
> > >>> They have been helpful in furthering my understanding of this space!)
> > >>>
> > >>> The next project I wanted to play with was Drill. I found
> > >>> https://github.com/mhausenblas/dromedar (thanks Michael!) as a basic
> > >>> start on a Drill-on-Mesos approach. I read through the code and I
> > >>> understand it, but I wanted to see it work at a more basic level.
> > >>>
> > >>> So I just figured out how to run drillbits in Marathon (manually for
> > >>> now). For anyone wanting to play along at home: this actually works
> > >>> VERY well. I used MapR FS to host my Drill package, and I set up a
> > >>> conf directory (multiple conf directories, actually; I set it up so I
> > >>> could launch different "sized" drillbits). I have been able to get
> > >>> things running and performing well on my small test cluster.
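> > >>>
> > >>> For the curious, my layout on MapR FS looks roughly like this (the
> > >>> second conf directory is just an example of a smaller profile; the
> > >>> name is made up):
> > >>>
> > >>>   /mapr/brewpot/mesos/drill/apache-drill-1.2.0-SNAPSHOT/  # the build
> > >>>   /mapr/brewpot/mesos/drill/conf/        # default drillbit profile
> > >>>   /mapr/brewpot/mesos/drill/conf-small/  # hypothetical small profile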
> > >>>
> > >>> For those who may be interested here are some of my notes.
> > >>>
> > >>> - I compiled Drill 1.2.0-SNAPSHOT from source. I ran into some
> > >>> compilation issues that Jacques was able to help me through.
> > >>> Basically, Java 1.8 isn't supported for building yet (it fails some
> > >>> tests), but there is a workaround for that.
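> > >>>
> > >>> For anyone hitting the same thing: since the failures are test
> > >>> failures, skipping them should be enough to get a package out (this
> > >>> may not be the exact workaround Jacques suggested):
> > >>>
> > >>>   mvn clean install -DskipTests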
> > >>>
> > >>> - I took the built package and placed it in MapR FS. I have every
> > >>> node mounting MapR FS at the same NFS location. I could be using an
> > >>> HDFS (MapR FS) based tarball, but I haven't done that yet; I am just
> > >>> playing around, and the NFS mounting of MapR FS sure is handy in this
> > >>> regard.
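> > >>>
> > >>> (With the NFS mount, "placing it in MapR FS" is just a plain copy;
> > >>> the paths here match the Marathon JSON below:)
> > >>>
> > >>>   cp -r apache-drill-1.2.0-SNAPSHOT /mapr/brewpot/mesos/drill/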
> > >>>
> > >>> - At first I created a single size of drillbit; the Marathon JSON
> > >>> looks like this:
> > >>>
> > >>> {
> > >>>   "cmd": "/mapr/brewpot/mesos/drill/apache-drill-1.2.0-SNAPSHOT/bin/runbit --config /mapr/brewpot/mesos/drill/conf",
> > >>>   "cpus": 2.0,
> > >>>   "mem": 6144,
> > >>>   "id": "drillpot",
> > >>>   "instances": 1,
> > >>>   "constraints": [["hostname", "UNIQUE"]]
> > >>> }
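> > >>>
> > >>> To launch it, I save that JSON to a file and POST it to Marathon's
> > >>> REST API (the Marathon host and port here are whatever yours are):
> > >>>
> > >>>   curl -X POST http://marathon.mesos:8080/v2/apps \
> > >>>     -H "Content-Type: application/json" \
> > >>>     -d @drillpot.json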
> > >>>
> > >>>
> > >>> Let me walk you through the JSON. The first field is the command,
> > >>> obviously. I use runbit instead of "drillbit.sh start" because I want
> > >>> this process to stay running (from Marathon's perspective).
> > >>> drillbit.sh uses nohup and backgrounds the process, so Mesos/Marathon
> > >>> thinks it died and tries to start another.
> > >>>
> > >>> cpus: obvious, maybe a bit small, but I have a small cluster.
> > >>>
> > >>> mem: I set mem to 6144 (6GB). In my drill-env.sh, I set max direct
> > >>> memory to 6GB and max heap to 3GB. I wasn't sure if I needed to set
> > >>> my Marathon memory to 9GB or if the heap was counted inside the
> > >>> direct memory. I could use some pointers here.
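> > >>>
> > >>> (For reference, the memory lines in my drill-env.sh look roughly like
> > >>> this; the variable names come from the stock drill-env.sh:)
> > >>>
> > >>>   DRILL_MAX_DIRECT_MEMORY="6G"
> > >>>   DRILL_HEAP="3G"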
> > >>>
> > >>> id: This is the id of my cluster in drill-override.conf. I did this
> > >>> so HAProxy would let me connect to the cluster via
> > >>> drillpot.marathon.mesos, and it worked pretty well!
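> > >>>
> > >>> The relevant bit of my drill-override.conf looks roughly like this
> > >>> (the ZooKeeper hosts are placeholders):
> > >>>
> > >>>   drill.exec: {
> > >>>     cluster-id: "drillpot",
> > >>>     zk.connect: "zk1:2181,zk2:2181,zk3:2181"
> > >>>   }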
> > >>>
> > >>> instances: I started with one, but could scale up with Marathon.
> > >>>
> > >>> constraints: I only wanted one drillbit per node because of port
> > >>> conflicts. If I want to be multi-tenant and have more than one
> > >>> drillbit per node, I will need to figure out how to abstract the
> > >>> ports. That is something I could potentially do in a framework for
> > >>> Mesos. But at the same time, I wonder: when a drillbit registers with
> > >>> a cluster, could it just "report" its ports in the ZooKeeper
> > >>> information? That is intriguing, because if it did, we could let it
> > >>> pull random ports offered to it by Mesos, register the information,
> > >>> and away we go.
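> > >>>
> > >>> In the meantime, if I ever do want two bits on one node, the ports
> > >>> can be overridden per conf directory in drill-override.conf, roughly
> > >>> like this (key names from memory, so double-check them against the
> > >>> defaults):
> > >>>
> > >>>   drill.exec: {
> > >>>     rpc.user.server.port: 32010,
> > >>>     rpc.bit.server.port: 32011,
> > >>>     http.port: 9047
> > >>>   }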
> > >>>
> > >>>
> > >>> Once I posted this to Marathon, all was good: bits started, and
> > >>> queries were had by all! It worked well. Some challenges:
> > >>>
> > >>>
> > >>> 1. Ports (as mentioned above): I am not managing those, so port
> > >>> conflicts could occur.
> > >>>
> > >>> 2. I should use a tarball for Marathon; this would allow Drill to
> > >>> work on Mesos without the MapR requirement.
> > >>>
> > >>> 3. Logging: I have the default logback.xml in the conf directory,
> > >>> and I am getting file-not-found issues in my stderr on the Mesos
> > >>> tasks. This isn't killing Drill, and it still works, but I should
> > >>> organize my logging better.
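> > >>>
> > >>> On that last one: pointing logback at an explicit, writable path
> > >>> should quiet the file-not-found errors. A minimal sketch of the
> > >>> relevant part of logback.xml (the path is just an example):
> > >>>
> > >>>   <appender name="DRILL" class="ch.qos.logback.core.FileAppender">
> > >>>     <file>/tmp/drill/log/drillbit.log</file>
> > >>>     <encoder>
> > >>>       <pattern>%date %level [%thread] %logger{10} %msg%n</pattern>
> > >>>     </encoder>
> > >>>   </appender>
> > >>>   <root level="info">
> > >>>     <appender-ref ref="DRILL" />
> > >>>   </root>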
> > >>>
> > >>>
> > >>> Hopeful for the future:
> > >>>
> > >>> 1. It would be neat to have a framework that did the actual running
> > >>> of the bits, perhaps something that could scale up and down based on
> > >>> query usage. I played around with some smaller drillbits (similar to
> > >>> how Myriad defines profiles) so I could have a Drill cluster of 2
> > >>> large bits and 2 small bits on my 5-node cluster. That worked, but it
> > >>> took lots of manual work; a framework would be handy for managing
> > >>> that.
> > >>>
> > >>> 2. Other?
> > >>>
> > >>>
> > >>> I know this isn't a production thing, but I could see going from
> > >>> this to something a subset of production users could use on
> > >>> MapR/Mesos (or just Mesos). I just wanted to share some of my thought
> > >>> process and show a way that various tools can integrate. Always happy
> > >>> to talk shop with folks on this stuff if anyone has any questions.
> > >>>
> > >>>
> > >>> John
> > >>
> > >>
> >
>



-- 
Jim Scott
Director, Enterprise Strategy & Architecture
+1 (347) 746-9281

MapR Technologies <http://www.mapr.com>

