Timothy - I played with that, and the performance I was getting in Docker
was about half what I was getting natively. I think that, for me, this
happened because running it in Docker meant I needed to install the MapR
Client in the container too, whereas when I run it in Marathon, it uses the
node's access to the disk. In places where performance issues like this
come up, I am comfortable not Dockerizing all the things and allowing for
the tarball method. Perhaps Mesos could find a way to cache locally? (Note:
putting it in MapR-FS still has it loading pretty quickly.)
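For what it's worth, the tarball method maps fairly directly onto the Mesos fetcher: a Marathon app can list the archive under "uris", each slave downloads and auto-extracts a .tar.gz into the task sandbox, and recent Mesos releases (0.23 added a fetcher cache) can keep the downloaded artifact on each slave so it is only pulled once. A rough sketch, not a tested config; the artifact URL is made up, and it assumes the conf directory is packaged inside the tarball:

```json
{
  "id": "drillpot",
  "cmd": "cd apache-drill-1.2.0-SNAPSHOT && bin/runbit --config ./conf",
  "cpus": 2.0,
  "mem": 6144,
  "instances": 1,
  "constraints": [["hostname", "UNIQUE"]],
  "uris": ["http://artifact-host.example/apache-drill-1.2.0-SNAPSHOT.tar.gz"]
}
```

Shaped like this, the slaves fetch the bits themselves and no MapR client or NFS mount is needed just to distribute Drill.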
John

On Thu, Jul 16, 2015 at 11:44 AM, Timothy Chen <[email protected]> wrote:

> Also, it would be nice to launch Drill with a Docker image, so that no
> tarball is needed and it is much easier to cache on each slave.
>
> Tim
>
> On Jul 16, 2015, at 9:37 AM, John Omernik <[email protected]> wrote:
>
>> Awesome, thanks for the update on memory!
>>
>> On Thu, Jul 16, 2015 at 10:48 AM, Andries Engelbrecht
>> <[email protected]> wrote:
>>
>>> Great write-up and information! It will be interesting to see how this
>>> evolves.
>>>
>>> A quick note: memory allocation is additive, so you have to allocate
>>> for direct plus heap memory. Drill uses direct memory for data
>>> structures/operations, and this is the one that will grow with larger
>>> data sets, etc.
>>>
>>> —Andries
>>>
>>> On Jul 16, 2015, at 5:23 AM, John Omernik <[email protected]> wrote:
>>>
>>>> I have been working on getting various frameworks running on my MapR
>>>> cluster that is also running Mesos. Basically, while I know that there
>>>> is a package from MapR (for Drill), I am trying to find a way to
>>>> better separate the storage layer from the compute layer.
>>>>
>>>> This isn't a dig on MapR, or on any of the Hadoop distributions; it's
>>>> only that I want the flexibility to try things, and to have an R&D
>>>> team working with the data in an environment that can try out new
>>>> frameworks, etc. This combination has been very good to me (maybe not
>>>> to MapR support, who have received lots of quirky questions from me;
>>>> they have been helpful in furthering my understanding of this space!).
>>>>
>>>> The next project I wanted to play with was Drill. I found
>>>> https://github.com/mhausenblas/dromedar (thanks Michael!) as a basic
>>>> start to a Drill-on-Mesos approach. I read through the code and I
>>>> understand it, but I wanted to see it at a more basic level.
>>>>
>>>> So I just figured out how to run drillbits in Marathon (manually for
>>>> now).
>>>> Basically, for anyone wanting to play along at home: this actually
>>>> works VERY well. I used MapR-FS to host my Drill package, and I set up
>>>> a conf directory (multiple conf directories, actually; I set it up so
>>>> I could launch different "sized" drillbits). I have been able to get
>>>> things running, and performing well, on my small test cluster.
>>>>
>>>> For those who may be interested, here are some of my notes.
>>>>
>>>> - I compiled Drill 1.2.0-SNAPSHOT from source. I ran into some build
>>>> issues that Jacques was able to help me through. Basically, Java 1.8
>>>> isn't supported for building yet (it fails some tests), but there is a
>>>> workaround for that.
>>>>
>>>> - I took the built package and placed it in MapR-FS. I have every node
>>>> mounting MapR-FS at the same NFS location. I could be using an HDFS
>>>> (MapR-FS) based tarball, but I haven't done that yet. I am just
>>>> playing around, and the NFS mounting of MapR-FS sure is handy in this
>>>> regard.
>>>>
>>>> - At first I created a single-sized drillbit; the Marathon JSON looks
>>>> like this:
>>>>
>>>> {
>>>>   "cmd": "/mapr/brewpot/mesos/drill/apache-drill-1.2.0-SNAPSHOT/bin/runbit --config /mapr/brewpot/mesos/drill/conf",
>>>>   "cpus": 2.0,
>>>>   "mem": 6144,
>>>>   "id": "drillpot",
>>>>   "instances": 1,
>>>>   "constraints": [["hostname", "UNIQUE"]]
>>>> }
>>>>
>>>> So I can walk you through this. The first field is the command,
>>>> obviously. I use runbit instead of "drillbit.sh start" because I want
>>>> this process to stay running (from Marathon's perspective). If I used
>>>> drillbit.sh, it uses nohup and backgrounds the process, so
>>>> Mesos/Marathon thinks it died and tries to start another.
>>>>
>>>> cpus: obvious; maybe a bit small, but I have a small cluster.
>>>>
>>>> mem: I set mem to 6144 (6 GB); in my drill-env.sh, I set max direct
>>>> memory to 6 GB and max heap to 3 GB.
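To make the sizing concrete, Drill's shipped drill-env.sh exposes the two limits as separate variables; the sketch below uses those variable names with the 6 GB direct / 3 GB heap figures from this post, and follows Andries's note elsewhere in this thread that the two pools are additive when budgeting the Marathon task:

```shell
# drill-env.sh sizing sketch; variable names match Drill's stock
# drill-env.sh, values are the ones described in this post.
export DRILL_MAX_DIRECT_MEMORY="6G"
export DRILL_HEAP="3G"

# Direct and heap memory are additive (per Andries's reply in this
# thread), so the Marathon "mem" field should cover at least their sum:
direct_gb=6
heap_gb=3
echo "mem should be at least $(( (direct_gb + heap_gb) * 1024 ))"
# prints: mem should be at least 9216
```

In other words, a task sized at mem=6144 only covers the direct pool, and the JVM heap (plus overhead) lands on top of it.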
>>>> I wasn't sure if I needed to set my Marathon memory to 9 GB or if the
>>>> heap was allocated inside the direct memory. I could use some pointers
>>>> here.
>>>>
>>>> id: This is the id of my cluster in drill-override.conf. I did this so
>>>> HAProxy would let me connect to the cluster via
>>>> drillpot.marathon.mesos, and it worked pretty well!
>>>>
>>>> instances: I started with one, but I could scale up with Marathon.
>>>>
>>>> constraints: I only wanted one drillbit per node because of port
>>>> conflicts. If I wanted to be multi-tenant and have more than one
>>>> drillbit per node, I would need to figure out how to abstract the
>>>> ports. This is something that could potentially be done in a framework
>>>> for Mesos. But at the same time, I wonder: when a drillbit registers
>>>> with a cluster, could it just "report" its ports in the ZooKeeper
>>>> information? This is intriguing, because if it did that, we could let
>>>> it pull random ports offered to it by Mesos, register the information,
>>>> and away we go.
>>>>
>>>> Once I posted this to Marathon, all was good; bits started, and
>>>> queries were had by all! It worked well. Some challenges:
>>>>
>>>> 1. Ports (as mentioned above): I am not managing those, so port
>>>> conflicts could occur.
>>>>
>>>> 2. I should use a tarball for Marathon; this would allow Drill to work
>>>> on Mesos without the MapR requirement.
>>>>
>>>> 3. Logging: I have the default logback.xml in the conf directory, and
>>>> I am getting file-not-found issues in my stderr on the Mesos tasks.
>>>> This isn't killing Drill, and it still works, but I should organize my
>>>> logging better.
>>>>
>>>> Hopeful for the future:
>>>>
>>>> 1. It would be neat to have a framework that did the actual running of
>>>> the bits, perhaps something that could scale up and down based on
>>>> query usage.
>>>> I played around with some smaller drillbits (similar to how Myriad
>>>> defines profiles) so I could have a Drill cluster of 2 large bits and
>>>> 2 small bits on my 5-node cluster. That worked, but it was lots of
>>>> manual work. A framework would be handy for managing that.
>>>>
>>>> 2. Other?
>>>>
>>>> I know this isn't a production thing, but I could see going from this
>>>> to something a subset of production users could use on MapR/Mesos (or
>>>> just Mesos). I just wanted to share some of my thought processes and
>>>> show a way that various tools can integrate. Always happy to talk shop
>>>> with folks on this stuff if anyone has any questions.
>>>>
>>>> John
