Hi Abhishek,

Thanks for your response. I will try the approach you have suggested and come back if I need any further help.
Best regards,
________
Tilak

-----Original Message-----
From: Abhishek Girish [mailto:[email protected]]
Sent: Monday, July 30, 2018 9:43 PM
To: user <[email protected]>
Subject: Re: Drill Configuration Requirements To Query Data in Tera Bytes

Hey Tilak,

We don't have any official sizing guidelines for planning a Drill cluster. A lot depends on the type of queries being executed (simple look-ups vs. complex joins), the data format (columnar data such as Parquet gives the best performance), and the system load (e.g., running a single query on nodes dedicated to Drill). It also depends on the type of machines you have: for example, with beefy nodes that have lots of RAM and CPU, you'll need fewer nodes running Drill.

I would recommend getting started with a 4-10 node cluster with a good amount of memory you can spare, and based on the results, working out your own sizing guideline (either add more nodes or increase memory [1]). If you share more details, it may be possible to suggest more.

[1] http://drill.apache.org/docs/configuring-drill-memory/

On Mon, Jul 30, 2018 at 1:57 AM Surneni Tilak <[email protected]> wrote:
> Hi Team,
>
> May I know the ideal configuration requirements to query data of size
> 10 TB with a query time under 5 minutes. Please advise me on the
> number of Drillbits that I have to use and the RAM (direct memory &
> heap memory) that each Drillbit should consist of to complete the
> queries within the desired time.
>
> Best regards,
> _________
> Tilak
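For anyone following this thread: the per-Drillbit memory limits referenced in [1] are set in conf/drill-env.sh on each node. A minimal sketch is below; the values shown are illustrative placeholders, not a recommendation, and should be tuned per the linked memory-configuration docs.

```shell
# conf/drill-env.sh -- per-node memory limits for the Drillbit JVM
# (illustrative values only; tune using http://drill.apache.org/docs/configuring-drill-memory/)

# JVM heap, used mainly for planning and metadata.
export DRILL_HEAP="8G"

# Off-heap direct memory, used by query execution (sorts, joins, buffers).
# This is usually the setting to grow first for large scans/joins.
export DRILL_MAX_DIRECT_MEMORY="32G"
```

After editing the file, restart the Drillbit on that node for the new limits to take effect.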
