1. Yes. You can run Drill & ZK separately from a Hadoop environment, and it works on AWS.
2. I have not used it with Amazon EMR; maybe others can comment. Why are you looking at EMR with Drill rather than spinning up instances with Drill & ZK? Unlike Hive, Drill does not need underlying MapReduce to execute queries; Drill is an execution engine itself.
3. Locality of the Drill instances and the S3 storage will be key. It is advisable to keep both in the same AWS region and data center to get good performance. I have not experimented enough with JSON data file sizes, but you probably want to balance JSON file size against the number of files for best behavior. Perhaps start with 32MB JSON files, scale to 64/128MB (and maybe 256MB), and see how each size performs with S3.

-- Andries

On Feb 25, 2015, at 11:17 AM, Mihai Stoicescu <[email protected]> wrote:

> Hello,
>
> My name is Mihai Stoicescu and I am trying to experiment with Apache
> Drill.
>
> I have multiple questions that I hope you can help me find the answers:
>
> 1. Can Drill & Zookeper work outside Hadoop environment?
>
> 2. What would be the configuration steps I would need to make to
> enable Drill with Amazon EMR?
>
> 3. If I want to keep the data inside S3 as JSON files, do you have
> any recommendations in terms of setup and performance?
>
>
> Thank you,
>
> Mihai Stoicescu
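
On the file-size suggestion in point 3, here is a minimal Python sketch of one way to produce JSON files of roughly a chosen size before uploading them to S3. It is an illustration, not something tested against Drill: the function name `split_json_records`, the `part-NNNNN.json` naming, and the one-object-per-line layout are all assumptions, and the 32MB default is only the starting point mentioned above.

```python
import json
import os


def split_json_records(records, out_dir, target_bytes=32 * 1024 * 1024, prefix="part"):
    """Write records as one JSON object per line, rolling over to a new
    file whenever the next record would push the current file past
    target_bytes. Returns the list of file paths written.

    Hypothetical helper for illustration; names and layout are assumptions.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    out = None
    written = 0
    try:
        for record in records:
            line = (json.dumps(record) + "\n").encode("utf-8")
            # Start a new file if none is open yet, or if this record would
            # exceed the target size of a file that already holds data.
            if out is None or (written > 0 and written + len(line) > target_bytes):
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, f"{prefix}-{len(paths):05d}.json")
                out = open(path, "wb")
                paths.append(path)
                written = 0
            out.write(line)
            written += len(line)
    finally:
        if out is not None:
            out.close()
    return paths
```

Re-running this with `target_bytes` set to 64, 128, and 256MB gives comparable file sets, so the same query can be timed against each one on S3. A single record larger than the target still gets written, in a file of its own.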
