I will see what I can do to set up a test. On Mon, Apr 30, 2018, 08:10 Vitalii Diravka <[email protected]> wrote:
> Ted, > > The rules are enabled and DRILL-3855 [1] is resolved. > Please try your queries with latest Drill master version. > > [1] https://issues.apache.org/jira/browse/DRILL-3855 > > Kind regards > Vitalii > > > On Mon, Apr 30, 2018 at 4:31 PM Nicolas Paris <[email protected]> wrote: > > > Hi > > > > This looks an interesting design. > > > > Am I correct such view > > would hit the RDBMS for every query ? > > However such view would hit the parquet file only when > > the timestamp predicate would match a partition ? > > > > Any news on a recent test to confirm the design ? > > > > Thanks > > > > 2018-03-20 6:49 GMT+01:00 Ted Dunning <[email protected]>: > > > > > Aman, > > > > > > That is exactly the clarification that I needed. I had a hazy memory > of a > > > problem in this area, but not enough to actually figure out the current > > > state. > > > > > > In case anybody cares, being able to do this is really handy. The basic > > > idea is to keep long history in files and recent history in a DB. That > > > allows you to create files with data that is advantageously sorted in > > order > > > to get excellent compression. You can get nearly atomic switch-over to > > > newly created files with lazy deletion of database entries by using a > > > reference to a cutoff date in a database row. The file side would only > > look > > > for data before the cutoff and the DB would only look for data after > the > > > cut. By positioning new files (created by CTAS on an about to be > obsolete > > > part of the DB) before changing the cutoff date, we get apparent > > atomicity. > > > > > > After the switch, and after a reasonable delay beyond that (to let all > > > pending queries finish), the DB can be trimmed. > > > > > > Without a working pushdown through unions, this is all kind of > pointless. > > > If that is working now, it would be fabulous. > > > > > > An example of how big a win this can be, consider a use case where we > > want > > > to keep all old states of customer preferences and context (say for a > > > mobile phone). Almost all of the hundreds of settings for an individual > > > would be unchanged even if a few do change. That means that if you > could > > > arrange a day (or more) of data by user id, the columnar compression of > > > parquet would crush the data size. This only works, however, if you can > > > collect a fair number of rows for each user. Thus the idea of a hybrid > > > setup. > > > > > > > > > > > > On Mon, Mar 19, 2018 at 11:57 PM, Aman Sinha <[email protected]> > > wrote: > > > > > > > Due to an infinite loop occurring in Calcite planning, we had to > > disable > > > > the filter pushdown past the union (SetOps). See > > > > https://issues.apache.org/jira/browse/DRILL-3855. > > > > Now that we have rebased on Calcite 1.15.0, we should re-enable this > > and > > > > test and if the pushdown works then the partition pruning on both > sides > > > of > > > > the union should automatically work after that. > > > > > > > > Will follow-up on this.. > > > > > > > > -Aman > > > > > > > > On Mon, Mar 19, 2018 at 3:02 PM, Kunal Khatua <[email protected] > > > > > > wrote: > > > > > > > > > I think Ted's question is 2 fold, with the former being more > > important. > > > > > 1. Can we push filters past a union. > > > > > 2. Will Drill push filters down to the source. > > > > > > > > > > For the latter, it depends on the source. > > > > > For the former, it depends primarily on whether Calcite supports > > this. > > > I > > > > > haven't tried it, so I can't say. > > > > > > > > > > On 3/19/2018 2:22:54 PM, rahul challapalli < > > [email protected] > > > > > > > > > wrote: > > > > > First I would suggest to ignore the view and try out a query which > > has > > > > the > > > > > required filters as part of the subqueries on both sides of the > union > > > > (for > > > > > both the database and partitioned parquet data). The plan for such > a > > > > query > > > > > should have the answers to your question. If both the subqueries > > > > > independently prune out un-necessary data, using partitions or > > > indexes, I > > > > > don't think adding a union between them would alter that behavior. > > > > > > > > > > -Rahul > > > > > > > > > > On Mon, Mar 19, 2018 at 1:44 PM, Ted Dunning wrote: > > > > > > > > > > > IF I create a view that is a union of partitioned parquet files > > and a > > > > > > database that has secondary indexes, will Drill be able to > properly > > > > push > > > > > > down query limits into both parts of the union? > > > > > > > > > > > > In particular, if I have lots of archival data and parquet > > > partitioned > > > > by > > > > > > time but my query only asks for recent data that is in the > > database, > > > > will > > > > > > the query avoid the parquet files entirely (as you would wish)? > > > > > > > > > > > > Conversely, if the data I am asking for is entirely in the > archive, > > > > will > > > > > > the query make use of the partitioning on my parquet files > > correctly? > > > > > > > > > > > > > > > > > > > > >
