Timothy - I played with that, and the performance I was getting in Docker
was about half what I was getting natively. I think that, for me, this
happened because running it in Docker meant I needed to install the MapR
Client in the container too, whereas when I run it in Marathon, it uses the
node's access to the disk. In places where performance issues like this
come up, I am comfortable not Dockerizing all the things and allowing for
the tarball method. Perhaps Mesos could find a way to cache locally? (Note:
putting it in MapR-FS still has it loading pretty quickly.)
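For what it's worth, the tarball method maps fairly directly onto the Mesos fetcher: a Marathon app can list the archive under "uris", each slave downloads and auto-extracts a .tar.gz into the task sandbox, and recent Mesos releases (0.23 added a fetcher cache) can keep the downloaded artifact on each slave so it is only pulled once. A rough sketch, not a tested config; the artifact URL is made up, and it assumes the conf directory is packaged inside the tarball:

```json
{
  "id": "drillpot",
  "cmd": "cd apache-drill-1.2.0-SNAPSHOT && bin/runbit --config ./conf",
  "cpus": 2.0,
  "mem": 6144,
  "instances": 1,
  "constraints": [["hostname", "UNIQUE"]],
  "uris": ["http://artifact-host.example/apache-drill-1.2.0-SNAPSHOT.tar.gz"]
}
```

Shaped like this, the slaves fetch the bits themselves and no MapR client or NFS mount is needed just to distribute Drill.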
John

On Thu, Jul 16, 2015 at 11:44 AM, Timothy Chen <[email protected]> wrote:

> Also, it would be nice to launch Drill with a Docker image, so that no
> tarball is needed and it is much easier to cache on each slave.
>
> Tim
>
> On Jul 16, 2015, at 9:37 AM, John Omernik <[email protected]> wrote:
>
>> Awesome, thanks for the update on memory!
>>
>> On Thu, Jul 16, 2015 at 10:48 AM, Andries Engelbrecht
>> <[email protected]> wrote:
>>
>>> Great write-up and information! It will be interesting to see how this
>>> evolves.
>>>
>>> A quick note: memory allocation is additive, so you have to allocate
>>> for direct plus heap memory. Drill uses direct memory for data
>>> structures/operations, and this is the one that will grow with larger
>>> data sets, etc.
>>>
>>> —Andries
>>>
>>> On Jul 16, 2015, at 5:23 AM, John Omernik <[email protected]> wrote:
>>>
>>>> I have been working on getting various frameworks running on my MapR
>>>> cluster that is also running Mesos. Basically, while I know that there
>>>> is a package from MapR (for Drill), I am trying to find a way to
>>>> better separate the storage layer from the compute layer.
>>>>
>>>> This isn't a dig on MapR, or on any of the Hadoop distributions; it's
>>>> only that I want the flexibility to try things, and to have an R&D
>>>> team working with the data in an environment that can try out new
>>>> frameworks, etc. This combination has been very good to me (maybe not
>>>> to MapR support, who have received lots of quirky questions from me;
>>>> they have been helpful in furthering my understanding of this space!).
>>>>
>>>> The next project I wanted to play with was Drill. I found
>>>> https://github.com/mhausenblas/dromedar (thanks Michael!) as a basic
>>>> start to a Drill-on-Mesos approach. I read through the code and I
>>>> understand it, but I wanted to see it at a more basic level.
>>>>
>>>> So I just figured out how to run drillbits in Marathon (manually for
>>>> now).
>>>> Basically, for anyone wanting to play along at home: this actually
>>>> works VERY well. I used MapR-FS to host my Drill package, and I set up
>>>> a conf directory (multiple conf directories, actually; I set it up so
>>>> I could launch different "sized" drillbits). I have been able to get
>>>> things running, and performing well, on my small test cluster.
>>>>
>>>> For those who may be interested, here are some of my notes.
>>>>
>>>> - I compiled Drill 1.2.0-SNAPSHOT from source. I ran into some build
>>>> issues that Jacques was able to help me through. Basically, Java 1.8
>>>> isn't supported for building yet (it fails some tests), but there is a
>>>> workaround for that.
>>>>
>>>> - I took the built package and placed it in MapR-FS. I have every node
>>>> mounting MapR-FS at the same NFS location. I could be using an HDFS
>>>> (MapR-FS) based tarball, but I haven't done that yet. I am just
>>>> playing around, and the NFS mounting of MapR-FS sure is handy in this
>>>> regard.
>>>>
>>>> - At first I created a single-sized drillbit; the Marathon JSON looks
>>>> like this:
>>>>
>>>> {
>>>>   "cmd": "/mapr/brewpot/mesos/drill/apache-drill-1.2.0-SNAPSHOT/bin/runbit --config /mapr/brewpot/mesos/drill/conf",
>>>>   "cpus": 2.0,
>>>>   "mem": 6144,
>>>>   "id": "drillpot",
>>>>   "instances": 1,
>>>>   "constraints": [["hostname", "UNIQUE"]]
>>>> }
>>>>
>>>> So I can walk you through this. The first field is the command,
>>>> obviously. I use runbit instead of "drillbit.sh start" because I want
>>>> this process to stay running (from Marathon's perspective). If I used
>>>> drillbit.sh, it uses nohup and backgrounds the process, so
>>>> Mesos/Marathon thinks it died and tries to start another.
>>>>
>>>> cpus: obvious; maybe a bit small, but I have a small cluster.
>>>>
>>>> mem: I set mem to 6144 (6 GB); in my drill-env.sh, I set max direct
>>>> memory to 6 GB and max heap to 3 GB.
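To make the sizing concrete, Drill's shipped drill-env.sh exposes the two limits as separate variables; the sketch below uses those variable names with the 6 GB direct / 3 GB heap figures from this post, and follows Andries's note elsewhere in this thread that the two pools are additive when budgeting the Marathon task:

```shell
# drill-env.sh sizing sketch; variable names match Drill's stock
# drill-env.sh, values are the ones described in this post.
export DRILL_MAX_DIRECT_MEMORY="6G"
export DRILL_HEAP="3G"

# Direct and heap memory are additive (per Andries's reply in this
# thread), so the Marathon "mem" field should cover at least their sum:
direct_gb=6
heap_gb=3
echo "mem should be at least $(( (direct_gb + heap_gb) * 1024 ))"
# prints: mem should be at least 9216
```

In other words, a task sized at mem=6144 only covers the direct pool, and the JVM heap (plus overhead) lands on top of it.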
>>>> I wasn't sure if I needed to set my Marathon memory to 9 GB or if the
>>>> heap was allocated inside the direct memory. I could use some pointers
>>>> here.
>>>>
>>>> id: This is the id of my cluster in drill-override.conf. I did this so
>>>> HAProxy would let me connect to the cluster via
>>>> drillpot.marathon.mesos, and it worked pretty well!
>>>>
>>>> instances: I started with one, but I could scale up with Marathon.
>>>>
>>>> constraints: I only wanted one drillbit per node because of port
>>>> conflicts. If I wanted to be multi-tenant and have more than one
>>>> drillbit per node, I would need to figure out how to abstract the
>>>> ports. This is something that could potentially be done in a framework
>>>> for Mesos. But at the same time, I wonder: when a drillbit registers
>>>> with a cluster, could it just "report" its ports in the ZooKeeper
>>>> information? This is intriguing, because if it did that, we could let
>>>> it pull random ports offered to it by Mesos, register the information,
>>>> and away we go.
>>>>
>>>> Once I posted this to Marathon, all was good; bits started, and
>>>> queries were had by all! It worked well. Some challenges:
>>>>
>>>> 1. Ports (as mentioned above): I am not managing those, so port
>>>> conflicts could occur.
>>>>
>>>> 2. I should use a tarball for Marathon; this would allow Drill to work
>>>> on Mesos without the MapR requirement.
>>>>
>>>> 3. Logging: I have the default logback.xml in the conf directory, and
>>>> I am getting file-not-found issues in my stderr on the Mesos tasks.
>>>> This isn't killing Drill, and it still works, but I should organize my
>>>> logging better.
>>>>
>>>> Hopeful for the future:
>>>>
>>>> 1. It would be neat to have a framework that did the actual running of
>>>> the bits, perhaps something that could scale up and down based on
>>>> query usage.
>>>> I played around with some smaller drillbits (similar to how Myriad
>>>> defines profiles) so I could have a Drill cluster of 2 large bits and
>>>> 2 small bits on my 5-node cluster. That worked, but it was lots of
>>>> manual work. A framework would be handy for managing that.
>>>>
>>>> 2. Other?
>>>>
>>>> I know this isn't a production thing, but I could see going from this
>>>> to something a subset of production users could use on MapR/Mesos (or
>>>> just Mesos). I just wanted to share some of my thought processes and
>>>> show a way that various tools can integrate. Always happy to talk shop
>>>> with folks on this stuff if anyone has any questions.
>>>>
>>>> John
