Hi John,

Fetcher cache is going to be in 0.23, so that's something you can leverage.

You'll find more documentation about it in the docs folder once we have 0.23 released.
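
For illustration, a rough sketch of how a cached tarball fetch could look
from a Marathon app definition once 0.23 is out, assuming a per-URI cache
flag gets exposed through Marathon (the "fetch"/"cache" fields and the URL
below are assumptions, not a confirmed API; only the relevant fragment of an
app definition is shown):

  "fetch": [
    {
      "uri": "https://repo.example.com/apache-drill-1.2.0-SNAPSHOT.tar.gz",
      "extract": true,
      "cache": true
    }
  ]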

Tim

On Thu, Jul 16, 2015 at 12:08 PM, John Omernik <[email protected]> wrote:
> Timothy -
>
> I played with that, and the performance I was getting in Docker was about
> half of what I was getting natively. I think that was occurring because if
> I ran it in Docker, I needed to install the MapR Client in the container
> too, whereas when I run it in marathon, it's using the node's access to the
> disk.  In places where performance issues like this occur, I am comfortable
> not dockering all the things and allowing for the tar ball method.  Perhaps
> Mesos could find a way to cache locally?  (Note, putting it in MapR FS
> still has it load pretty quick)
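>
> For what it's worth, if the Docker route gets another look, one way to
> avoid installing the MapR Client inside the image might be to mount the
> host's /mapr NFS mount into the container via Marathon's volume support.
> A minimal sketch (the image name is just a placeholder, and this still
> goes through NFS rather than the native client, so the performance gap may
> remain):
>
> {
>   "id": "drillpot-docker",
>   "cmd": "/mapr/brewpot/mesos/drill/apache-drill-1.2.0-SNAPSHOT/bin/runbit --config /mapr/brewpot/mesos/drill/conf",
>   "cpus": 2.0,
>   "mem": 6144,
>   "container": {
>     "type": "DOCKER",
>     "docker": { "image": "java:7" },
>     "volumes": [
>       { "containerPath": "/mapr", "hostPath": "/mapr", "mode": "RW" }
>     ]
>   }
> }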
>
> John
>
>
> On Thu, Jul 16, 2015 at 11:44 AM, Timothy Chen <[email protected]> wrote:
>
>> Also it will be nice to launch Drill with a docker image so no tar ball is
>> needed, and it's much easier to cache on each slave.
>>
>> Tim
>>
>>
>> > On Jul 16, 2015, at 9:37 AM, John Omernik <[email protected]> wrote:
>> >
>> > Awesome thanks for the update on memory!
>> >
>> > On Thu, Jul 16, 2015 at 10:48 AM, Andries Engelbrecht <
>> > [email protected]> wrote:
>> >
>> >> Great write up and information! Will be interesting to see how this
>> >> evolves.
>> >>
>> >> A quick note, memory allocation is additive so you have to allocate for
>> >> direct plus heap memory. Drill uses direct memory for data
>> >> structures/operations and this is the one that will grow with larger
>> >> data sets, etc.
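>> >>
>> >> So for the numbers below (6GB max direct, 3GB max heap in drill-env.sh),
>> >> the Marathon "mem" would need to cover roughly 6144 + 3072 = 9216 MB,
>> >> plus some headroom for the JVM itself; something like the fragment below
>> >> (the exact headroom is a guess):
>> >>
>> >> "mem": 10240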
>> >>
>> >> —Andries
>> >>
>> >>> On Jul 16, 2015, at 5:23 AM, John Omernik <[email protected]> wrote:
>> >>>
>> >>> I have been working on getting various frameworks working on my MapR
>> >>> Cluster that is also running Mesos. Basically, while I know that there
>> >>> is a package from MapR (for Drill), I am trying to find a way to better
>> >>> separate the storage layer from the compute layer.
>> >>>
>> >>> This isn't a dig on MapR, or any of the Hadoop distributions; it's only
>> >>> that I want flexibility to try things, to have an R&D team working with
>> >>> the data in an environment that can try out new frameworks, etc.  This
>> >>> combination has been very good to me (maybe not to MapR support, who
>> >>> received lots of quirky questions from me.  They have been helpful in
>> >>> furthering my understanding of this space!)
>> >>>
>> >>> My next project I wanted to play with was Drill. I found
>> >>> https://github.com/mhausenblas/dromedar (Thanks Michael!) as a basic
>> >>> start to a Drill on Mesos approach. I read through the code, I
>> >>> understand it, but I wanted to see it at a more basic level.
>> >>>
>> >>> So I just figured out how to run Drill bits in Marathon (manually for
>> >>> now).  Basically, for anyone wanting to play along at home, this
>> >>> actually works VERY well.  I used MapR FS to host my Drill package, and
>> >>> I set up a conf directory (multiple conf directories actually; I set it
>> >>> up so I could launch different "sized" drillbits).  I have been able to
>> >>> get things running, and be performant, on my small test cluster.
>> >>>
>> >>> For those who may be interested, here are some of my notes.
>> >>>
>> >>> - I compiled Drill 1.2.0-SNAPSHOT from source. I ran into some
>> >>> compilation issues that Jacques was able to help me through. Basically,
>> >>> Java 1.8 isn't supported for building yet (it fails some tests), but
>> >>> there is a workaround for that.
>> >>>
>> >>> - I took the built package and placed it in MapR FS.  Now, I have every
>> >>> node mounting MapR FS at the same NFS location.  I could be using an
>> >>> hdfs (maprfs) based tarball but I haven't done that yet. I am just
>> >>> playing around, and the NFS mounting of MapR FS sure is handy in this
>> >>> regard.
>> >>>
>> >>> - At first I created a single sized Drill bit, the Marathon JSON is
>> >>> like this:
>> >>>
>> >>> {
>> >>>   "cmd": "/mapr/brewpot/mesos/drill/apache-drill-1.2.0-SNAPSHOT/bin/runbit --config /mapr/brewpot/mesos/drill/conf",
>> >>>   "cpus": 2.0,
>> >>>   "mem": 6144,
>> >>>   "id": "drillpot",
>> >>>   "instances": 1,
>> >>>   "constraints": [["hostname", "UNIQUE"]]
>> >>> }
>> >>>
>> >>>
>> >>> So I can walk you through this.  The first is the command, obviously.
>> >>> I use runbit instead of drillbit.sh start because I want this process to
>> >>> stay running (from Marathon's perspective).  If I used drillbit.sh, it
>> >>> uses nohup and backgrounds the process, so Mesos/Marathon thinks it died
>> >>> and tries to start another.
>> >>>
>> >>> cpus: obvious, maybe a bit small, but I have a small cluster.
>> >>>
>> >>> mem: I set mem to 6144 (6GB); in my drill-env.sh, I set max direct
>> >>> memory to 6GB and max heap to 3GB.  I wasn't sure if I needed to set my
>> >>> marathon memory to 9GB or if the heap was used inside the direct
>> >>> memory.  I could use some pointers here.
>> >>>
>> >>> id: This is the id of my cluster in drill-override.conf. I did this so
>> >>> HA proxy would let me connect to the cluster via drillpot.marathon.mesos
>> >>> and it worked pretty well!
>> >>>
>> >>> instances: I started with one, but could scale up with marathon.
>> >>>
>> >>> constraints: I only wanted one drill bit per node because of port
>> >>> conflicts.  If I want to be multi-tenant and have more than one drill
>> >>> bit per node, I would need to figure out how to abstract the ports. This
>> >>> is something that I could potentially do in a framework for Mesos. But
>> >>> at the same time, I wonder whether, when a drill bit registers with a
>> >>> cluster, it could just "report" its ports in the zookeeper information.
>> >>> This is intriguing because if it did this, we could allow it to pull
>> >>> random ports offered to it from Mesos, register the information, and
>> >>> away we go.
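>> >>>
>> >>> As a sketch of the Marathon side of that idea: Marathon can be asked for
>> >>> dynamically assigned ports, which it exposes to the task as $PORT0,
>> >>> $PORT1, $PORT2 environment variables; the open question is still how to
>> >>> wire those into drill-override.conf or ZooKeeper (the fragment below
>> >>> only covers the Marathon piece):
>> >>>
>> >>> "ports": [0, 0, 0]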
>> >>>
>> >>>
>> >>> Once I posted this to marathon, all was good, bits started, queries
>> >>> were had by all!  It worked well. Some challenges:
>> >>>
>> >>>
>> >>> 1.  Ports (as mentioned above): I am not managing those, so port
>> >>> conflicts could occur.
>> >>>
>> >>> 2. I should use a tarball for Marathon; this would allow drill to work
>> >>> on Mesos without the MapR requirement (see the sketch after this list).
>> >>>
>> >>> 3. Logging. I have the default logback.xml in the conf directory and I
>> >>> am getting file not found issues in my stderr on the Mesos tasks. This
>> >>> isn't killing Drill, and it still works, but I should organize my
>> >>> logging better.
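>> >>>
>> >>> On challenge 2, a rough sketch of the tarball route using Marathon's
>> >>> uris field (the URL is hypothetical, and this assumes a conf directory
>> >>> is shipped inside the tarball; the Mesos fetcher pulls the tarball into
>> >>> the task sandbox and extracts it):
>> >>>
>> >>> {
>> >>>   "cmd": "cd apache-drill-1.2.0-SNAPSHOT && bin/runbit --config conf",
>> >>>   "uris": ["http://repo.example.com/apache-drill-1.2.0-SNAPSHOT.tar.gz"],
>> >>>   "cpus": 2.0,
>> >>>   "mem": 6144,
>> >>>   "id": "drillpot",
>> >>>   "instances": 1,
>> >>>   "constraints": [["hostname", "UNIQUE"]]
>> >>> }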
>> >>>
>> >>>
>> >>> Hopeful for the future:
>> >>>
>> >>> 1. It would be neat to have a framework that did the actual running of
>> >>> the bits.  Perhaps something that could scale up and down based on query
>> >>> usage.  I played around with some smaller drillbits (similar to how
>> >>> myriad defines profiles) so I could have a drill cluster of 2 large bits
>> >>> and 2 small bits on my 5 node cluster.  That worked, but lots of manual
>> >>> work. A framework would be handy for managing that (a sketch of a
>> >>> smaller profile follows below).
>> >>>
>> >>> 2. Other?
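>> >>>
>> >>> For reference, the manual version of a smaller profile is roughly a
>> >>> second Marathon app pointing at a different conf directory; the numbers
>> >>> and the conf-small path here are illustrative:
>> >>>
>> >>> {
>> >>>   "cmd": "/mapr/brewpot/mesos/drill/apache-drill-1.2.0-SNAPSHOT/bin/runbit --config /mapr/brewpot/mesos/drill/conf-small",
>> >>>   "cpus": 1.0,
>> >>>   "mem": 3072,
>> >>>   "id": "drillpot-small",
>> >>>   "instances": 2,
>> >>>   "constraints": [["hostname", "UNIQUE"]]
>> >>> }
>> >>>
>> >>> (The UNIQUE constraint only applies within one app, so a small and a
>> >>> large bit can still land on the same node and hit the port conflicts
>> >>> mentioned above.)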
>> >>>
>> >>>
>> >>> I know this isn't a production thing, but I could see being able to go
>> >>> from this to something a subset of production users could use in
>> >>> MapR/Mesos (or just Mesos).  I just wanted to share some of my thought
>> >>> processes and show a way that various tools can integrate.  Always happy
>> >>> to talk shop with folks on this stuff if anyone has any questions.
>> >>>
>> >>>
>> >>> John
>> >>
>> >>
>>
