Re: Drill on Mesos - A Story

Timothy Chen Thu, 16 Jul 2015 09:45:51 -0700

It would also be nice to launch Drill from a Docker image, so that no tarball
is needed and the image is easily cached on each slave.


Tim
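
As a rough illustration of the Docker idea, a Marathon app definition could
carry a "container" section instead of pointing at a tarball or a shared
mount. The sketch below builds one in Python. The image name
example/drill:1.2.0-SNAPSHOT and the /opt/drill paths are hypothetical
placeholders, host networking is an assumption so the drillbit keeps its usual
ports, and the 9216 MB default assumes the 6GB direct plus 3GB heap sizing
discussed in the quoted thread.

```python
import json


def docker_drill_app(image, instances=1, cpus=2.0, mem_mb=9216):
    """Build a Marathon app definition that runs drillbits from a Docker image.

    The image name and the paths in "cmd" are placeholders, not a published image.
    """
    return {
        "id": "drillpot",
        # runbit stays in the foreground, so Marathon can supervise the process.
        "cmd": "/opt/drill/bin/runbit --config /opt/drill/conf",  # hypothetical paths
        "cpus": cpus,
        "mem": mem_mb,  # assumed: 6 GB direct + 3 GB heap, in MB
        "instances": instances,
        # One drillbit per host, to avoid port conflicts.
        "constraints": [["hostname", "UNIQUE"]],
        "container": {
            "type": "DOCKER",
            "docker": {"image": image, "network": "HOST"},
        },
    }


app = docker_drill_app("example/drill:1.2.0-SNAPSHOT")
print(json.dumps(app, indent=2))
```

The printed JSON could then be posted to Marathon in the same way as the
definition quoted below.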


> On Jul 16, 2015, at 9:37 AM, John Omernik <[email protected]> wrote:
> 
> Awesome, thanks for the update on memory!
> 
> On Thu, Jul 16, 2015 at 10:48 AM, Andries Engelbrecht <
> [email protected]> wrote:
> 
>> Great write up and information! Will be interesting to see how this
>> evolves.
>> 
>> A quick note: memory allocation is additive, so you have to allocate for
>> direct plus heap memory. Drill uses direct memory for data
>> structures/operations, and this is the one that will grow with larger data
>> sets, etc.
>> 
>> —Andries
>> 
>>> On Jul 16, 2015, at 5:23 AM, John Omernik <[email protected]> wrote:
>>> 
>>> I have been working on getting various frameworks working on my MapR
>>> cluster that is also running Mesos. Basically, while I know that there is
>>> a package from MapR (for Drill), I am trying to find a way to better
>>> separate the storage layer from the compute layer.
>>> 
>>> This isn't a dig on MapR or any of the Hadoop distributions; it's only that
>>> I want flexibility to try things, to have an R&D team working with the data
>>> in an environment that can try out new frameworks, etc. This combination
>>> has been very good to me (maybe not to MapR support, who received lots of
>>> quirky questions from me; they have been helpful in furthering my
>>> understanding of this space!).
>>> 
>>> The next project I wanted to play with was Drill. I found
>>> https://github.com/mhausenblas/dromedar (thanks, Michael!) as a basic start
>>> to a Drill on Mesos approach. I read through the code and I understand it,
>>> but I wanted to see it at a more basic level.
>>> 
>>> So I just figured out how to run drillbits in Marathon (manually for
>>> now). For anyone wanting to play along at home, this actually works VERY
>>> well. I used MapR FS to host my Drill package, and I set a conf directory
>>> (multiple conf directories, actually; I set it up so I could launch
>>> different "sized" drillbits). I have been able to get things running, and
>>> performing well, on my small test cluster.
>>> 
>>> For those who may be interested here are some of my notes.
>>> 
>>> - I compiled Drill 1.2.0-SNAPSHOT from source. I ran into some compile
>>> issues that Jacques was able to help me through. Basically, Java 1.8 isn't
>>> supported for building yet (it fails some tests), but there is a workaround
>>> for that.
>>> 
>>> - I took the built package and placed it in MapR FS. I have every node
>>> mounting MapR FS at the same NFS location. I could be using an HDFS
>>> (maprfs) based tarball, but I haven't done that yet. I am just playing
>>> around, and the NFS mounting of MapR FS sure is handy in this regard.
>>> 
>>> - At first I created a single size of drillbit. The Marathon JSON looks
>>> like this:
>>> 
>>> {
>>>   "cmd": "/mapr/brewpot/mesos/drill/apache-drill-1.2.0-SNAPSHOT/bin/runbit --config /mapr/brewpot/mesos/drill/conf",
>>>   "cpus": 2.0,
>>>   "mem": 6144,
>>>   "id": "drillpot",
>>>   "instances": 1,
>>>   "constraints": [["hostname", "UNIQUE"]]
>>> }
>>> 
>>> 
>>> Let me walk you through this. The first field is the command, obviously. I
>>> use runbit instead of drillbit.sh start because I want this process to stay
>>> running (from Marathon's perspective). drillbit.sh uses nohup and
>>> backgrounds the process, so Mesos/Marathon thinks it died and tries to
>>> start another.
>>> 
>>> cpus: obvious, maybe a bit small, but I have a small cluster.
>>> 
>>> mem: I set mem to 6144 (6GB). In my drill-env.sh, I set max direct memory
>>> to 6GB and max heap to 3GB. I wasn't sure if I needed to set my Marathon
>>> memory to 9GB or if the heap was used inside the direct memory. I could
>>> use some pointers here.
>>> 
>>> id: This is the id of my cluster in drill-override.conf. I did this so
>>> HAProxy would let me connect to the cluster via drillpot.marathon.mesos,
>>> and it worked pretty well!
>>> 
>>> instances: I started with one, but could scale up with Marathon.
>>> 
>>> constraints: I only wanted one drillbit per node because of port
>>> conflicts. If I want to be multi-tenant and have more than one drillbit
>>> per node, I would need to figure out how to abstract the ports. This is
>>> something that I could potentially do in a framework for Mesos. But at the
>>> same time, I wonder if, when a drillbit registers with a cluster, it
>>> could just "report" its ports in the ZooKeeper information. This is
>>> intriguing because, if it did, we could let it pull random ports
>>> offered to it from Mesos, register the information, and away we go.
>>> 
>>> 
>>> Once I posted this to Marathon, all was good: bits started, queries were
>>> had by all! It worked well. Some challenges:
>>> 
>>> 
>>> 1. Ports (as mentioned above): I am not managing those, so port conflicts
>>> could occur.
>>>
>>> 2. I should use a tarball for Marathon. This would allow Drill to work on
>>> Mesos without the MapR requirement.
>>>
>>> 3. Logging: I have the default logback.xml in the conf directory and I am
>>> getting file-not-found issues in my stderr on the Mesos tasks. This doesn't
>>> kill Drill, and it still works, but I should organize my logging better.
>>> 
>>> 
>>> Hopeful for the future:
>>> 
>>> 1. It would be neat to have a framework that did the actual running of the
>>> bits, perhaps something that could scale up and down based on query usage.
>>> I played around with some smaller drillbits (similar to how Myriad defines
>>> profiles) so I could have a Drill cluster of 2 large bits and 2 small bits
>>> on my 5 node cluster. That worked, but it took lots of manual work. A
>>> framework would be handy for managing that.
>>> 
>>> 2. Other?
>>> 
>>> 
>>> I know this isn't a production thing, but I could see being able to go from
>>> this to something a subset of production users could use in MapR/Mesos (or
>>> just Mesos). I just wanted to share some of my thought processes and show
>>> a way that various tools can integrate. Always happy to talk shop with
>>> folks on this stuff if anyone has any questions.
>>> 
>>> 
>>> John
>> 
>>
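
To make the memory point in the thread concrete (direct and heap memory are
additive), here is a minimal sketch that derives the Marathon "mem" value from
the drill-env.sh settings and generates one app definition per drillbit size.
Only the 6GB direct / 3GB heap pair, the runbit path, and the hostname
constraint come from the thread. The "small" profile, the per-profile app ids,
and the conf-<profile> directory names are hypothetical.

```python
import json

DRILL_HOME = "/mapr/brewpot/mesos/drill"


def marathon_mem_mb(direct_gb, heap_gb):
    """Direct and heap memory are additive, so request their sum (in MB)."""
    return int((direct_gb + heap_gb) * 1024)


# Hypothetical profiles; only the "large" memory figures appear in the thread.
PROFILES = {
    "large": {"direct_gb": 6, "heap_gb": 3, "cpus": 2.0},
    "small": {"direct_gb": 2, "heap_gb": 1, "cpus": 1.0},
}


def drill_app(profile, instances=1):
    """Build a Marathon app definition for one drillbit size."""
    p = PROFILES[profile]
    return {
        "id": "drillpot-" + profile,  # hypothetical naming scheme
        "cmd": "{0}/apache-drill-1.2.0-SNAPSHOT/bin/runbit --config {0}/conf-{1}".format(
            DRILL_HOME, profile  # conf-<profile> directories are assumed
        ),
        "cpus": p["cpus"],
        "mem": marathon_mem_mb(p["direct_gb"], p["heap_gb"]),
        "instances": instances,
        "constraints": [["hostname", "UNIQUE"]],
    }


print(marathon_mem_mb(6, 3))  # 9216, i.e. 9GB rather than the original 6144
print(json.dumps(drill_app("small", instances=2), indent=2))
```

Each conf directory would still need its own drill-env.sh with matching direct
and heap limits; this sketch only keeps the Marathon request in step with them.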
