Hi,

My name is Manu and I work as a Big Data architect at a small startup
in Kochi, India. Our new project involves visualizing large volumes of
unstructured data held in cloud storage (S3, Azure Blob Storage, or Google
Cloud Storage). We are planning to use Apache Drill as the SQL query execution
engine so that we remain cloud agnostic. However, a few key questions remain
unanswered before we can move ahead with Drill as our platform. I hope
you can provide some clarity; it would be much appreciated.


  1.  When setting up a Drill cluster in a production environment to query data
ranging from several gigabytes to a few terabytes hosted in S3, Azure Blob
Storage, or Google Cloud Storage, what are the considerations for disk space?
I understand that Drillbits make use of data locality, but how does that work
with cloud storage such as S3? Will the entire data set be copied from S3 to
the Drill cluster before query processing starts?
  2.  Is it possible to use S3 or another cloud storage service, rather than
local disk, for the data spilled by the Sort, Hash Aggregate, and Hash Join
operators?
  3.  Is it OK to run a production Drill cluster without Hadoop? Is a
ZooKeeper quorum alone enough?


I understand how busy you must be, but if you get a chance, please help
me get clarity on these items. It would be really helpful.

Thanks again!
Manu Mukundan
Bigdata Architect,
Prevalent AI,
manu.mukun...@prevalent.ai

