I have to admit, I didn't realize columnar was such a big part of Drill. I guess that's consistent with Dremel, so it makes sense. I always thought the emphasis was on heterogeneous data access, not on perf. Cool!
So with that in mind, does Drill do much with vector processing/SIMD operations?

-----Original Message-----
From: Jacques Nadeau [mailto:[email protected]]
Sent: Monday, August 17, 2015 1:17 AM
To: [email protected]
Subject: Re: Benchmarks for Apache Drill

Drill is very fast. This is because nearly everybody on the Drill team is focused on performance.

We haven't published any formal benchmarks yet. That being said, there are a few out there. I see that Ted mentioned the Intel one. Another is here [1]. As Ted mentioned, these blogs test older and pre-release versions of Drill. Nonetheless, Drill already outshines nearly all of the competition.

That being said, the reality is that most benchmarks are very skewed and poorly executed so I strongly recommend you try out Drill on your workload. Once you get set up, ask the community for help to tune the system. Many others are finding it to be incredibly fast and it has repeatedly displaced commercial MPP databases and older open source technologies.

Drill is the only open source pure columnar in-memory execution engine today. This means that Drill has the right architecture to continue to increase its lead over other engines. (Think of this as future-proofing.) We'll be enhancing the engine with items including columnar functions, compilation optimizations and customized relational operators in the coming months. This will simply extend Drill's performance lead.

thanks,
Jacques

[1] http://allegro.tech/fast-data-hackathon.html

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Sun, Aug 16, 2015 at 1:47 AM, Ming Han Teh <[email protected]> wrote:

> Hi,
>
> Are there any benchmarks on Apache Drill?
> (standalone benchmarks OR vs Impala/Presto)
>
> Thanks,
> Ming Han
>
