Awesome, thanks for the update on memory!

On Thu, Jul 16, 2015 at 10:48 AM, Andries Engelbrecht <[email protected]> wrote:

> Great write-up and information! It will be interesting to see how this
> evolves.
>
> A quick note: memory allocation is additive, so you have to allocate for
> direct plus heap memory. Drill uses direct memory for data
> structures/operations, and that is the pool that will grow with larger data
> sets, etc.
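A minimal sketch of the additive rule, applied to the numbers in John's message (Marathon mem 6144, 6GB max direct, 3GB max heap). The 512 MB allowance for JVM memory outside both pools (metaspace, thread stacks, code cache) is an assumed figure, not a Drill recommendation, and `required_mem_mb` / `check_marathon_mem` are hypothetical helper names.

```python
def required_mem_mb(heap_mb: int, direct_mb: int, overhead_mb: int = 512) -> int:
    """Memory (MB) a container must allow for one drillbit JVM.

    Heap and direct memory are separate pools, so they add up.
    overhead_mb is an ASSUMED allowance for JVM memory outside both pools.
    """
    return heap_mb + direct_mb + overhead_mb


def check_marathon_mem(marathon_mem_mb: int, heap_mb: int, direct_mb: int,
                       overhead_mb: int = 512) -> tuple[bool, int]:
    """Return (fits, needed_mb) for a Marathon 'mem' value."""
    needed = required_mem_mb(heap_mb, direct_mb, overhead_mb)
    return marathon_mem_mb >= needed, needed


# John's settings: Marathon mem 6144, max direct 6GB, max heap 3GB.
fits, needed = check_marathon_mem(6144, heap_mb=3 * 1024, direct_mb=6 * 1024)
print(fits, needed)  # False 9728
```

Even with no overhead allowance at all, these settings need at least 9216 (9GB) in Marathon, not 6144.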
>
> —Andries
>
> On Jul 16, 2015, at 5:23 AM, John Omernik <[email protected]> wrote:
>
> > I have been working on getting various frameworks running on my MapR
> > cluster that is also running Mesos. Basically, while I know that there is a
> > package from MapR (for Drill), I am trying to find a way to better separate
> > the storage layer from the compute layer.
> >
> > This isn't a dig on MapR or any of the Hadoop distributions; it's only that
> > I want flexibility to try things, to have an R&D team working with the data
> > in an environment where it can try out new frameworks, etc. This
> > combination has been very good to me (maybe not to MapR support, who
> > received lots of quirky questions from me. They have been helpful in
> > furthering my understanding of this space!)
> >
> > The next project I wanted to play with was Drill. I found
> > https://github.com/mhausenblas/dromedar (thanks Michael!) as a basic start
> > to a Drill on Mesos approach. I read through the code and I understand it,
> > but I wanted to see it at a more basic level.
> >
> > So I just figured out how to run drillbits in Marathon (manually for
> > now). For anyone wanting to play along at home, this actually works VERY
> > well. I used MapR FS to host my Drill package, and I set a conf directory
> > (multiple conf directories, actually; I set it up so I could launch
> > different "sized" drillbits). I have been able to get things running, and
> > they perform well on my small test cluster.
> >
> > For those who may be interested here are some of my notes.
> >
> > - I compiled Drill 1.2.0-SNAPSHOT from source. I ran into some compiling
> > issues that Jacques was able to help me through. Basically, Java 1.8 isn't
> > supported for building yet (it fails some tests), but there is a
> > workaround for that.
> >
> > - I took the built package and placed it in MapR FS. Every node mounts
> > MapR FS at the same NFS location. I could be using an HDFS (maprfs) based
> > tarball, but I haven't done that yet. I am just playing around, and the NFS
> > mounting of MapR FS sure is handy in this regard.
> >
> > - At first I created a single-sized drillbit; the Marathon JSON looks like
> > this:
> >
> > {
> >   "cmd": "/mapr/brewpot/mesos/drill/apache-drill-1.2.0-SNAPSHOT/bin/runbit --config /mapr/brewpot/mesos/drill/conf",
> >   "cpus": 2.0,
> >   "mem": 6144,
> >   "id": "drillpot",
> >   "instances": 1,
> >   "constraints": [["hostname", "UNIQUE"]]
> > }
> >
> >
> > Let me walk you through this. The first field is the command, obviously. I
> > use runbit instead of "drillbit.sh start" because I want this process to
> > stay running (from Marathon's perspective). drillbit.sh uses nohup and
> > backgrounds the process, so Mesos/Marathon thinks it died and tries to
> > start another.
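The foreground/background distinction can be reproduced with a small Python sketch. This is not Marathon's code: `run_like_supervisor` is a hypothetical stand-in for any supervisor that tracks the process it launched and treats that process exiting as the task ending. It assumes a POSIX `sh` with `nohup` on the path.

```python
import subprocess
import time


def run_like_supervisor(cmd: list[str]) -> tuple[int, float]:
    """Launch cmd and block until that exact process exits.

    Returns (exit_code, seconds_elapsed). The supervisor only waits on the
    process it started, not on children that process detached.
    """
    start = time.monotonic()
    proc = subprocess.Popen(cmd)
    code = proc.wait()
    return code, time.monotonic() - start


# Foreground (runbit-style): the launched process lives as long as the work.
fg_code, fg_secs = run_like_supervisor(["sh", "-c", "sleep 1"])

# Background (nohup ... & style): the launcher exits at once, while the
# real work keeps running in a detached child the supervisor never sees.
bg_code, bg_secs = run_like_supervisor(
    ["sh", "-c", "nohup sleep 1 >/dev/null 2>&1 &"])

print(f"foreground: exit {fg_code} after {fg_secs:.2f}s")
print(f"background: exit {bg_code} after {bg_secs:.2f}s")
```

In the background case the supervisor sees a clean exit almost immediately, which is the moment a supervisor for a long-running app would consider the task gone and start another.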
> >
> > cpus: obvious, maybe a bit small, but I have a small cluster.
> >
> > mem: With mem set to 6144 (6GB), in my drill-env.sh I set max direct
> > memory to 6GB and max heap to 3GB. I wasn't sure if I needed to set my
> > Marathon memory to 9GB or if the heap was used inside the direct memory. I
> > could use some pointers here.
> >
> > id: This is the id of my cluster (the cluster-id) in drill-override.conf.
> > I did this so HAProxy would let me connect to the cluster via
> > drillpot.marathon.mesos, and it worked pretty well!
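Clients can reach the cluster either through ZooKeeper, where the cluster-id is part of the connection string, or through a single host name. A minimal sketch, assuming Drill's JDBC URL forms `jdbc:drill:zk=<quorum>/<zk root>/<cluster-id>` (default root `drill`) and `jdbc:drill:drillbit=<host>:<port>`; the ZooKeeper host names are hypothetical, and the direct form assumes the proxy forwards Drill's default user port, 31010.

```python
def drill_zk_url(zk_hosts: list[str], cluster_id: str, zk_root: str = "drill") -> str:
    """JDBC URL that discovers the drillbits of one cluster via ZooKeeper."""
    return f"jdbc:drill:zk={','.join(zk_hosts)}/{zk_root}/{cluster_id}"


def drill_direct_url(host: str, user_port: int = 31010) -> str:
    """JDBC URL that connects to one endpoint (e.g. a proxy name)."""
    return f"jdbc:drill:drillbit={host}:{user_port}"


# Hypothetical ZooKeeper hosts; 5181 is MapR's usual ZooKeeper port.
print(drill_zk_url(["zk1:5181", "zk2:5181"], "drillpot"))
print(drill_direct_url("drillpot.marathon.mesos"))
```

The ZooKeeper form keeps working as Marathon moves or scales drillbits, since the client looks up whatever bits are currently registered under that cluster-id.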
> >
> > instances: I started with one, but could scale up with Marathon.
> >
> > constraints: I only wanted one drillbit per node because of port
> > conflicts. If I want to be multi-tenant and have more than one drillbit
> > per node, I would need to figure out how to abstract the ports. This is
> > something I could potentially do in a framework for Mesos. At the same
> > time, I wonder if, when a drillbit registers with a cluster, it could just
> > "report" its ports in the ZooKeeper information. This is intriguing
> > because if it did, we could let it pick random ports offered to it by
> > Mesos, register that information, and away we go.
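One way to try the random-port idea without a full framework: Marathon can assign host ports when an app requests them (for example `"ports": [0, 0, 0]`) and exposes them to the task as `PORT0`, `PORT1`, and so on. The sketch below turns those into Java system-property overrides. The Drill property names are assumptions to check against `drill-module.conf` in your build, as is the idea that the launch script passes such flags through to the JVM; `drill_port_flags` is a hypothetical helper. Drill may also bind a data port adjacent to the control port, so treat this as a starting point rather than a complete fix.

```python
import os
from typing import Mapping, Optional

# ASSUMED mapping from Marathon-assigned port variables to Drill config keys;
# verify the key names against drill-module.conf for your Drill version.
PORT_KEYS = {
    "PORT0": "drill.exec.rpc.user.server.port",  # client (user) port
    "PORT1": "drill.exec.rpc.bit.server.port",   # drillbit-to-drillbit control port
    "PORT2": "drill.exec.http.port",             # web UI / REST port
}


def drill_port_flags(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Build -D overrides from the ports Marathon handed this task."""
    env = os.environ if env is None else env
    flags = []
    for var, key in PORT_KEYS.items():
        if var not in env:
            raise KeyError(f"{var} is not set; did the Marathon app request ports?")
        flags.append(f"-D{key}={int(env[var])}")
    return flags


# Example with a fake environment instead of a real Mesos task.
print(" ".join(drill_port_flags({"PORT0": "31500", "PORT1": "31501", "PORT2": "31502"})))
```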
> >
> >
> > Once I posted this to Marathon, all was good: bits started, and queries
> > were had by all! It worked well. Some challenges:
> >
> >
> > 1. Ports (as mentioned above): I am not managing those, so port conflicts
> > could occur.
> >
> > 2. I should use a tarball for Marathon; this would allow Drill to work on
> > Mesos without the MapR requirement.
> >
> > 3. Logging. I have the default logback.xml in the conf directory, and I am
> > getting file-not-found errors in stderr on the Mesos tasks. This isn't
> > killing Drill, and it still works, but I should organize my logging better.
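For the tarball challenge, Marathon's `uris` field asks the Mesos fetcher to download files into the task sandbox, and archives such as `.tar.gz` are extracted there, so `cmd` can use relative paths. A minimal sketch that builds such an app definition and can submit it to Marathon's `/v2/apps` endpoint; the archive URLs and the Marathon address are hypothetical placeholders, and it assumes the conf directory is packaged as its own archive that extracts to `conf/`.

```python
import json
import urllib.request

# Hypothetical locations; replace with wherever the archives are published.
DRILL_TGZ = "http://repo.example.com/apache-drill-1.2.0-SNAPSHOT.tar.gz"
CONF_TGZ = "http://repo.example.com/drill-conf.tar.gz"  # assumed to unpack to conf/


def drill_app(app_id: str, cpus: float, mem_mb: int, instances: int) -> dict:
    """Marathon app definition that fetches Drill instead of reading MapR FS."""
    return {
        "id": app_id,
        # The task starts in its sandbox, where the fetched archives are extracted.
        "cmd": "apache-drill-1.2.0-SNAPSHOT/bin/runbit --config conf",
        "cpus": cpus,
        "mem": mem_mb,
        "instances": instances,
        "constraints": [["hostname", "UNIQUE"]],
        "uris": [DRILL_TGZ, CONF_TGZ],
    }


def submit(app: dict, marathon: str = "http://marathon.example.com:8080") -> int:
    """POST the app to Marathon's REST API and return the HTTP status."""
    req = urllib.request.Request(
        f"{marathon}/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# 9216 MB = 3GB heap + 6GB direct, following the additive rule above.
print(json.dumps(drill_app("drillpot", 2.0, 9216, 1), indent=2))
```

Nothing in the app definition refers to /mapr paths any more, so the same definition should work on a Mesos cluster without MapR FS mounted, provided the URLs are reachable from every agent.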
> >
> >
> > Hopeful for the future:
> >
> > 1. It would be neat to have a framework that did the actual running of the
> > bits, perhaps something that could scale up and down based on query usage.
> > I played around with some smaller drillbits (similar to how Myriad defines
> > profiles) so I could have a Drill cluster of 2 large bits and 2 small bits
> > on my 5-node cluster. That worked, but it took lots of manual work. A
> > framework would be handy for managing that.
> >
> > 2. Other?
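The "sized" drillbits can be expressed as profiles, loosely in the spirit of Myriad's, with one Marathon app per profile so each size scales independently. A minimal sketch; the profile names, sizes, `conf-<profile>` directory layout, and `apps_for_profiles` helper are hypothetical, and both conf directories are assumed to set the same cluster-id so all bits join one Drill cluster even though their Marathon app ids differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    cpus: float
    heap_mb: int
    direct_mb: int

    @property
    def mem_mb(self) -> int:
        # Heap and direct memory are separate JVM pools, so both count.
        return self.heap_mb + self.direct_mb


# Hypothetical sizes; each profile's drill-env.sh is assumed to match these.
PROFILES = {
    "large": Profile("large", cpus=4.0, heap_mb=4096, direct_mb=8192),
    "small": Profile("small", cpus=1.0, heap_mb=1024, direct_mb=2048),
}


def apps_for_profiles(counts: dict[str, int],
                      base: str = "/mapr/brewpot/mesos/drill") -> list[dict]:
    """One Marathon app per profile so each size can be scaled on its own."""
    apps = []
    for name, instances in counts.items():
        p = PROFILES[name]
        apps.append({
            "id": f"drillpot-{name}",
            "cmd": (f"{base}/apache-drill-1.2.0-SNAPSHOT/bin/runbit"
                    f" --config {base}/conf-{name}"),
            "cpus": p.cpus,
            "mem": p.mem_mb,
            "instances": instances,
            "constraints": [["hostname", "UNIQUE"]],
        })
    return apps


# The 2 large + 2 small layout from the message.
for app in apps_for_profiles({"large": 2, "small": 2}):
    print(app["id"], app["instances"], app["mem"])
```

Marathon evaluates the `hostname UNIQUE` constraint per app, so a large and a small drillbit could still land on the same node and collide on ports; that is the port problem from earlier, and one more thing a framework would need to handle.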
> >
> >
> > I know this isn't a production thing, but I could see going from this to
> > something a subset of production users could use in MapR/Mesos (or just
> > Mesos). I just wanted to share some of my thought processes and show a way
> > that various tools can integrate. Always happy to talk shop with folks on
> > this stuff if anyone has any questions.
> >
> >
> > John
>
>
