Thanks Divya. Why don't you go ahead and create a JIRA (with the above info) for this and assign it to Bridget Bevens?
I can create one but I would rather have someone from the community ask for it.

Best,
Saurabh

On Mon, Oct 23, 2017 at 7:14 PM, Divya Gehlot <[email protected]> wrote:

> Yes, this is very good info that helps a lot of people like me who are
> using Drill in a production environment.
> Can't we share this information as a recommendation to Drill users on the
> Apache Drill KB?
>
> On 20 October 2017 at 01:58, Saurabh Mahapatra <[email protected]> wrote:
>
> > I do not think you will get such information about benchmarks from
> > customers on production workloads. But from the customers I have worked
> > with who have taken Drill to production, here is some information that
> > may be of use to you:
> >
> > 1. The trend universally has been to use beefier machines for in-memory
> > query engines. We see 256GB RAM and 32 cores as the most frequent
> > configuration. On the network side, it is 2x10GbE.
> >
> > 2. The most commonly sized dedicated cluster for starting out with Drill
> > in production has been around 16-20 nodes with the above configuration.
> > I have several customers who have deployed this on 200+ nodes as well,
> > but in those scenarios it is a service among many.
> >
> > 3. The concurrency we see in the above settings is a function of the size
> > of the dataset and the complexity of the customer query. In general,
> > Little's law holds. The smaller the chunk of work to be processed, the
> > higher the throughput. Our understanding of this changes further with the
> > new releases of Drill, where spill-to-disk features will make it more of
> > a pessimistic execution engine. The use of queues can also change this
> > understanding.
> >
> > 4. From my company's side, we do have TPCH and TPCDS benchmarks that I do
> > share with customers. But such benchmarks are flawed because they come
> > from the world of traditional warehousing, where the competition was
> > among general-purpose query engines. For example, our tests show that at
> > higher and higher data scale, Drill beats Impala on these benchmarks. The
> > same is touted by the Hive LLAP folks as well. But they do not
> > necessarily imply that it is the best tool choice for the production
> > environment. It is a reason why I am resistant to getting into the war of
> > the query engines, in which every query engine beats the other under a
> > given set of primed conditions.
> >
> > 5. It is an absolute must that you understand the query patterns that the
> > system will have to withstand, with the data characteristics specific to
> > your use case. I would only trust that. Big data systems are going to be
> > application specific and will require tuning. Which also means that you
> > have to revisit the kinds of analytics you would like your end users to
> > have. Which again raises the question: what kinds of analytics truly
> > generate value for the BI user?
> >
> > Best,
> > Saurabh
> >
> > On Wed, Oct 18, 2017 at 10:26 PM, PROJJWAL SAHA <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > Is there any public performance benchmark that users have achieved
> > > using Drill in production scenarios? It would be useful if someone can
> > > pass me any links for customer user stories.
> > >
> > > Regards
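
A note on point 3 in the quoted reply: Little's law relates the average number of queries in flight (L) to the average arrival rate (lambda) and the average time a query spends in the system (W), as L = lambda * W. The sketch below only illustrates that arithmetic. The function names and all numbers are hypothetical and are not Drill measurements or Drill APIs.

```python
def queries_in_flight(arrival_rate_per_s: float, avg_latency_s: float) -> float:
    """Little's law: L = lambda * W (average number of queries in the system)."""
    return arrival_rate_per_s * avg_latency_s


def sustainable_throughput(concurrency_limit: float, avg_latency_s: float) -> float:
    """Little's law rearranged: lambda = L / W, for a fixed concurrency limit."""
    if avg_latency_s <= 0:
        raise ValueError("average latency must be positive")
    return concurrency_limit / avg_latency_s


if __name__ == "__main__":
    # Illustrative numbers only (assumed, not measured on any cluster).
    print(queries_in_flight(arrival_rate_per_s=2.0, avg_latency_s=5.0))
    # Same concurrency limit, two different average query latencies.
    print(sustainable_throughput(concurrency_limit=20, avg_latency_s=4.0))
    print(sustainable_throughput(concurrency_limit=20, avg_latency_s=0.5))
```

With the concurrency limit held fixed, cutting the average time per query raises the sustainable arrival rate in proportion, which is one way to read the remark that smaller chunks of work give higher throughput.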
