Re: [DISCUSS] Resurrect support for Table Statistics in Drill

2018-11-14 Thread Vitalii Diravka
You have described a design similar to the one that Drill Metastore is going to provide. I'm talking about an API to put and get statistics, with different storages for metadata, aka Metastore Plugins. Since the project is in progress, it is possible to determine the API for storing and obtaining stats and

Re: [DISCUSS] Resurrect support for Table Statistics in Drill

2018-11-10 Thread Paul Rogers
Hi Gautam, You touched on the key issue: storage. You mention that the Drill stats implementation learned from Oracle. Very wise: Oracle is the clear expert in this space. There is a very important difference, however, between Drill and Oracle. Oracle is a complete database including both

Re: [DISCUSS] Resurrect support for Table Statistics in Drill

2018-11-09 Thread Gautam Parai
Hi Paul, Thanks so much for the feedback. Please see my responses below.
> To take a step back, recall that Arina is working on a metadata proposal. A key aspect of that proposal is that it provides an API so that Drill can connect (via an adapter) to any metadata system.
The gist of my

Re: [DISCUSS] Resurrect support for Table Statistics in Drill

2018-11-09 Thread Paul Rogers
Hi Gautam, One follow-up clarification: I realize one point was a bit unclear. I suggested that stats be gathered by a query. By this, I simply mean that stats use the existing query mechanism with, perhaps, an enhanced UDAF (user-defined aggregate function) API. Your work includes a new SQL

Re: [DISCUSS] Resurrect support for Table Statistics in Drill

2018-11-08 Thread Paul Rogers
Hi Gautam, Thanks much for the explanations. You raise some interesting points. I noticed that Boaz has just filed a JIRA ticket to tackle the inefficient count distinct case. To take a step back, recall that Arina is working on a metadata proposal. A key aspect of that proposal is that it

Re: [DISCUSS] Resurrect support for Table Statistics in Drill

2018-11-08 Thread Gautam Parai
Hi Paul, Thank you so much for the feedback. Please see my responses below.
> First, the code to gather the stats is rather complex; it is the evolution of some work an intern did way back when. We'd be advised to find a simpler implementation, ideally one that uses mechanisms we already have.

Re: [DISCUSS] Resurrect support for Table Statistics in Drill

2018-11-06 Thread Paul Rogers
Hi All, Stats would be a great addition. Here are a couple of issues that came up in the earlier code review, revisited in light of recent proposed work. First, the code to gather the stats is rather complex; it is the evolution of some work an intern did way back when. We'd be advised to find

Re: [DISCUSS] Resurrect support for Table Statistics in Drill

2018-11-06 Thread Vitalii Diravka
+1 It will help to rely on that code in the process of implementing Drill Metastore, DRILL-6552. @Gautam Please address all current comments and rebase onto the latest master, then Vova and I will do an additional review of it. Just for clarification, am I right that the state of the changes is the same as in

Re: [DISCUSS] Resurrect support for Table Statistics in Drill

2018-11-05 Thread Parth Chandra
+1 I'd say go for it. If the option to use enhanced stats can be turned on per session, then users can experiment and choose to turn it on for queries where they do not experience performance degradation.
On Fri, Nov 2, 2018 at 3:25 PM Gautam Parai wrote:
> Hi all,
>
> I had an initial

[DISCUSS] Resurrect support for Table Statistics in Drill

2018-11-02 Thread Gautam Parai
Hi all, I had an initial implementation for statistics support for Drill [DRILL-1328]. This JIRA has links to the design spec as well as the PR. Unfortunately, because of some regressions on performance benchmarks (TPCH/TPCDS) we decided to