Awesome, thanks for the update on memory!
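So if I follow that, my Marathon "mem" needs to cover direct plus heap,
i.e. 9GB rather than 6GB for my setup. For anyone playing along, a rough
sketch of how I believe the two sides line up (the drill-env.sh variable
names are what my 1.2.0-SNAPSHOT build ships with; double-check yours):

  # drill-env.sh -- what the drillbit JVM actually gets
  DRILL_MAX_DIRECT_MEMORY="6G"
  DRILL_HEAP="3G"

  # Marathon app JSON -- allocate at least direct + heap, plus a
  # little headroom for the JVM itself:
  # 6144 MB (direct) + 3072 MB (heap) = 9216 MB
  "mem": 9216,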
On Thu, Jul 16, 2015 at 10:48 AM, Andries Engelbrecht <[email protected]> wrote:

> Great write up and information! Will be interesting to see how this
> evolves.
>
> A quick note: memory allocation is additive, so you have to allocate
> for direct plus heap memory. Drill uses direct memory for data
> structures/operations, and this is the one that will grow with larger
> data sets, etc.
>
> —Andries
>
> On Jul 16, 2015, at 5:23 AM, John Omernik <[email protected]> wrote:
>
> > I have been working on getting various frameworks working on my MapR
> > cluster that is also running Mesos. Basically, while I know that
> > there is a package from MapR (for Drill), I am trying to find a way
> > to better separate the storage layer from the compute layer.
> >
> > This isn't a dig on MapR, or any of the Hadoop distributions; it's
> > only that I want the flexibility to try things, to have an R&D team
> > working with the data in an environment that can try out new
> > frameworks, etc. This combination has been very good to me (maybe not
> > to MapR support, who received lots of quirky questions from me. They
> > have been helpful in furthering my understanding of this space!)
> >
> > My next project I wanted to play with was Drill. I found
> > https://github.com/mhausenblas/dromedar (Thanks Michael!) as a basic
> > start to a Drill-on-Mesos approach. I read through the code and I
> > understand it, but I wanted to see it at a more basic level.
> >
> > So I just figured out how to run drillbits in Marathon (manually for
> > now). Basically, for anyone wanting to play along at home, this
> > actually works VERY well. I used MapR FS to host my Drill package,
> > and I set up a conf directory. (Multiple conf directories, actually;
> > I set it up so I could launch different "sized" drillbits.) I have
> > been able to get things running, and performing well, on my small
> > test cluster.
> >
> > For those who may be interested, here are some of my notes.
> >
> > - I compiled Drill 1.2.0-SNAPSHOT from source. I ran into some
> > compile issues that Jacques was able to help me through. Basically,
> > Java 1.8 isn't supported for building yet (it fails some tests), but
> > there is a workaround for that.
> >
> > - I took the built package and placed it in MapR FS. Every node
> > mounts MapR FS at the same NFS location. I could be using an hdfs
> > (maprfs) based tarball, but I haven't done that yet. I am just
> > playing around, and the NFS mounting of MapR FS sure is handy in this
> > regard.
> >
> > - At first I created a single-sized drillbit; the Marathon JSON looks
> > like this:
> >
> >   {
> >     "cmd": "/mapr/brewpot/mesos/drill/apache-drill-1.2.0-SNAPSHOT/bin/runbit --config /mapr/brewpot/mesos/drill/conf",
> >     "cpus": 2.0,
> >     "mem": 6144,
> >     "id": "drillpot",
> >     "instances": 1,
> >     "constraints": [["hostname", "UNIQUE"]]
> >   }
> >
> > So I can walk you through this. The first is the command, obviously.
> > I use runbit instead of drillbit.sh start because I want this process
> > to stay running (from Marathon's perspective). If I used drillbit.sh,
> > it nohups and backgrounds the process, so Mesos/Marathon thinks it
> > died and tries to start another.
> >
> > cpus: obvious. Maybe a bit small, but I have a small cluster.
> >
> > mem: I set mem to 6144 (6GB) in Marathon; in my drill-env.sh, I set
> > max direct memory to 6GB and max heap to 3GB. I wasn't sure if I
> > needed to set my Marathon memory to 9GB or if the heap was used
> > inside the direct memory. I could use some pointers here.
> >
> > id: This is the id of my cluster in drill-override.conf. I did this
> > so HAProxy would let me connect to the cluster via
> > drillpot.marathon.mesos, and it worked pretty well!
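> >
> > For reference, the relevant piece of my drill-override.conf is
> > roughly this (the ZooKeeper hosts here are placeholders, not my real
> > ones; the point is just that cluster-id matches the Marathon app id):
> >
> >   drill.exec: {
> >     cluster-id: "drillpot",
> >     zk.connect: "zk1:2181,zk2:2181,zk3:2181"
> >   }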
> >
> > instances: I started with one, but I could scale up with Marathon.
> >
> > constraints: I only wanted one drillbit per node because of port
> > conflicts. If I want to be multi-tenant and have more than one
> > drillbit per node, I would need to figure out how to abstract the
> > ports. This is something that I could potentially do in a framework
> > for Mesos. But at the same time, I wonder if, when a drillbit
> > registers with a cluster, it could just "report" its ports in the
> > ZooKeeper information. This is intriguing because if it did that, we
> > could allow it to pull random ports offered to it by Mesos, register
> > the information, and away we go.
> >
> > Once I posted this to Marathon, all was good: bits started, queries
> > were had by all! It worked well. Some challenges:
> >
> > 1. Ports (as mentioned above). I am not managing those, so port
> > conflicts could occur.
> >
> > 2. I should use a tarball for Marathon; this would allow Drill to
> > work on Mesos without the MapR requirement (see the sketch after this
> > list).
> >
> > 3. Logging. I have the default logback.xml in the conf directory, and
> > I am getting file-not-found issues in my stderr on the Mesos tasks.
> > This isn't killing Drill, and it still works, but I should organize
> > my logging better.
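> >
> > For #2, I think the Mesos fetcher could pull and unpack a tarball
> > into the task sandbox via Marathon's "uris" field. A totally untested
> > sketch (the tarball URL is made up):
> >
> >   {
> >     "id": "drillpot",
> >     "cmd": "apache-drill-1.2.0-SNAPSHOT/bin/runbit --config apache-drill-1.2.0-SNAPSHOT/conf",
> >     "cpus": 2.0,
> >     "mem": 6144,
> >     "instances": 1,
> >     "constraints": [["hostname", "UNIQUE"]],
> >     "uris": ["http://repo.example.com/apache-drill-1.2.0-SNAPSHOT.tar.gz"]
> >   }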
> >
> > Hopeful for the future:
> >
> > 1. It would be neat to have a framework that did the actual running
> > of the bits. Perhaps something that could scale up and down based on
> > query usage. I played around with some smaller drillbits (similar to
> > how Myriad defines profiles) so I could have a Drill cluster of 2
> > large bits and 2 small bits on my 5-node cluster. That worked, but it
> > took lots of manual work. A framework would be handy for managing
> > that.
> >
> > 2. Other?
> >
> > I know this isn't a production thing, but I could see going from this
> > to something a subset of production users could use in MapR/Mesos (or
> > just Mesos). I just wanted to share some of my thought processes and
> > show a way that various tools can integrate. Always happy to talk
> > shop with folks on this stuff if anyone has any questions.
> >
> > John
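
Back to the port question from my original note: Marathon can hand a
task dynamically allocated ports as $PORT0, $PORT1, ... if you ask for
"ports": [0, 0, 0] in the app JSON, so in theory the cmd could pass
those through to the drillbit. A totally untested sketch, assuming
Drill's HOCON config honors -D system-property overrides the way
Typesafe Config usually does (and that drill-env.sh appends to
DRILL_JAVA_OPTS instead of overwriting it):

  "ports": [0, 0, 0],
  "cmd": "export DRILL_JAVA_OPTS=\"-Ddrill.exec.http.port=$PORT0 -Ddrill.exec.rpc.user.server.port=$PORT1 -Ddrill.exec.rpc.bit.server.port=$PORT2\" && /mapr/brewpot/mesos/drill/apache-drill-1.2.0-SNAPSHOT/bin/runbit --config /mapr/brewpot/mesos/drill/conf"

Since each drillbit registers its endpoint (address and ports) in
ZooKeeper anyway, the rest of the cluster and the clients should be able
to discover whatever ports Mesos handed out.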