Re: Using Drill with EMR

Andries Engelbrecht Fri, 27 Feb 2015 11:01:09 -0800

1. Yes. You can run Drill & ZK separately from a Hadoop environment, and it
works on AWS.

2. I have not used it with Amazon EMR; maybe others can comment. Why are you
looking at EMR and Drill rather than spinning up instances with Drill & ZK?
Drill does not work like Hive, which needs underlying MR to execute queries.
Drill is an execution engine itself.

3. Locality of the Drill instances and the S3 storage will be key. It is
advisable to keep both in the same AWS region and DC to get good
performance. I have not experimented enough with JSON data file size, but
you probably want to balance JSON file size against the number of files for
best behavior. Perhaps start with 32MB JSON files, scale to 64/128MB (and
maybe 256MB), and see how each size performs with S3.
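
For experimenting with the file sizes in point 3, here is a minimal Python
sketch that splits one large newline-delimited JSON file into chunks of
roughly a target size. It is only an illustration: the function names
(chunk_json_lines, write_chunks), the output naming scheme, and the
assumption that the data holds one JSON record per line are mine, not
anything from Drill itself.

```python
from typing import Iterable, Iterator, List


def chunk_json_lines(lines: Iterable[str], target_bytes: int) -> Iterator[List[str]]:
    """Group newline-delimited JSON records into chunks of at most
    target_bytes each (a single oversized record still gets its own chunk)."""
    chunk: List[str] = []
    size = 0
    for line in lines:
        record = line.rstrip("\n")
        if not record:
            continue  # skip blank lines
        record_bytes = len(record.encode("utf-8")) + 1  # +1 for the newline
        if chunk and size + record_bytes > target_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(record)
        size += record_bytes
    if chunk:
        yield chunk


def write_chunks(src_path: str, out_prefix: str,
                 target_bytes: int = 32 * 1024 * 1024) -> List[str]:
    """Write each chunk to <out_prefix>-NNNNN.json (hypothetical naming)
    and return the list of paths written."""
    paths: List[str] = []
    with open(src_path, encoding="utf-8") as src:
        for i, chunk in enumerate(chunk_json_lines(src, target_bytes)):
            path = f"{out_prefix}-{i:05d}.json"
            with open(path, "w", encoding="utf-8") as out:
                out.write("\n".join(chunk) + "\n")
            paths.append(path)
    return paths
```

Rerunning write_chunks with target_bytes set to 64, 128, or 256 MB gives
comparable file sets to upload to S3 and query, so the sizes can be compared
against each other.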

—Andries

On Feb 25, 2015, at 11:17 AM, Mihai Stoicescu <[email protected]> wrote:

> Hello,
> 
> My name is Mihai Stoicescu and I am trying to experiment with Apache
> Drill.
> 
> I have multiple questions that I hope you can help me answer:
> 
>       1. Can Drill & ZooKeeper work outside a Hadoop environment?
> 
>       2. What configuration steps would I need to take to enable Drill
> with Amazon EMR?
> 
>       3. If I want to keep the data inside S3 as JSON files, do you have
> any recommendations in terms of setup and performance?
> 
> 
>   Thank you,
> 
> Mihai Stoicescu
