Re: question about views

Ted Dunning Mon, 30 Apr 2018 22:34:29 -0700

I will see what I can do to set up a test.

On Mon, Apr 30, 2018, 08:10 Vitalii Diravka <[email protected]>
wrote:


> Ted,
>
> The rules are enabled and DRILL-3855 [1] is resolved.
> Please try your queries with the latest Drill master version.
>
> [1] https://issues.apache.org/jira/browse/DRILL-3855
>
> Kind regards
> Vitalii
>
>
> On Mon, Apr 30, 2018 at 4:31 PM Nicolas Paris <[email protected]> wrote:
>
> > Hi
> >
> > This looks like an interesting design.
> >
> > Am I correct that such a view would hit the RDBMS for every query,
> > but would hit the parquet files only when the timestamp predicate
> > matches a partition?
> >
> > Any news on a recent test to confirm the design?
> >
> > Thanks
> >
> > 2018-03-20 6:49 GMT+01:00 Ted Dunning <[email protected]>:
> >
> > > Aman,
> > >
> > > That is exactly the clarification that I needed. I had a hazy memory
> > > of a problem in this area, but not enough to actually figure out the
> > > current state.
> > >
> > > In case anybody cares, being able to do this is really handy. The basic
> > > idea is to keep long history in files and recent history in a DB. That
> > > allows you to create files with data that is advantageously sorted in
> > > order to get excellent compression. You can get nearly atomic
> > > switch-over to newly created files with lazy deletion of database
> > > entries by using a reference to a cutoff date in a database row. The
> > > file side would only look for data before the cutoff and the DB would
> > > only look for data after the cutoff. By positioning new files (created
> > > by CTAS on an about-to-be-obsolete part of the DB) before changing the
> > > cutoff date, we get apparent atomicity.
> > >
> > > After the switch, and after a reasonable delay beyond that (to let all
> > > pending queries finish), the DB can be trimmed.
> > >
> > > Without a working pushdown through unions, this is all kind of
> > > pointless. If that is working now, it would be fabulous.
> > >
> > > As an example of how big a win this can be, consider a use case where
> > > we want to keep all old states of customer preferences and context (say
> > > for a mobile phone). Almost all of the hundreds of settings for an
> > > individual would be unchanged even if a few do change. That means that
> > > if you could arrange a day (or more) of data by user id, the columnar
> > > compression of parquet would crush the data size. This only works,
> > > however, if you can collect a fair number of rows for each user. Thus
> > > the idea of a hybrid setup.
> > >
> > >
> > >
> > > On Mon, Mar 19, 2018 at 11:57 PM, Aman Sinha <[email protected]> wrote:
> > >
> > > > Due to an infinite loop occurring in Calcite planning, we had to
> > > > disable the filter pushdown past the union (SetOps). See
> > > > https://issues.apache.org/jira/browse/DRILL-3855.
> > > > Now that we have rebased on Calcite 1.15.0, we should re-enable this
> > > > and test it. If the pushdown works, then partition pruning on both
> > > > sides of the union should automatically work after that.
> > > >
> > > > Will follow up on this.
> > > >
> > > > -Aman
> > > >
> > > > On Mon, Mar 19, 2018 at 3:02 PM, Kunal Khatua <[email protected]> wrote:
> > > >
> > > > > I think Ted's question is two-fold, with the former being more
> > > > > important.
> > > > > 1. Can we push filters past a union?
> > > > > 2. Will Drill push filters down to the source?
> > > > >
> > > > > For the latter, it depends on the source.
> > > > > For the former, it depends primarily on whether Calcite supports
> > > > > this. I haven't tried it, so I can't say.
> > > > >
> > > > > On 3/19/2018 2:22:54 PM, rahul challapalli <[email protected]> wrote:
> > > > > First I would suggest ignoring the view and trying out a query which
> > > > > has the required filters as part of the subqueries on both sides of
> > > > > the union (for both the database and partitioned parquet data). The
> > > > > plan for such a query should have the answers to your question. If
> > > > > both the subqueries independently prune out unnecessary data, using
> > > > > partitions or indexes, I don't think adding a union between them
> > > > > would alter that behavior.
> > > > >
> > > > > -Rahul
> > > > >
> > > > > On Mon, Mar 19, 2018 at 1:44 PM, Ted Dunning wrote:
> > > > >
> > > > > > If I create a view that is a union of partitioned parquet files
> > > > > > and a database that has secondary indexes, will Drill be able to
> > > > > > properly push down query limits into both parts of the union?
> > > > > >
> > > > > > In particular, if I have lots of archival data and parquet
> > > > > > partitioned by time but my query only asks for recent data that
> > > > > > is in the database, will the query avoid the parquet files
> > > > > > entirely (as you would wish)?
> > > > > >
> > > > > > Conversely, if the data I am asking for is entirely in the
> > > > > > archive, will the query make use of the partitioning on my
> > > > > > parquet files correctly?
> > > > > >
> > > > >
> > > >
> > >
> >
>
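
A minimal sketch of the cutoff-based switch-over described in the thread, for anyone who wants to see the mechanics. It is not Drill: it uses Python's built-in sqlite3 module, with two ordinary tables standing in for the parquet archive and the database, and it says nothing about whether Drill pushes filters through the union. The table, view, and column names (archive, recent, cutoff, history, ts, user_id, setting) are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins: 'archive' plays the parquet files,
    -- 'recent' plays the database, 'cutoff' is the single-row reference.
    CREATE TABLE archive (ts INTEGER, user_id INTEGER, setting TEXT);
    CREATE TABLE recent  (ts INTEGER, user_id INTEGER, setting TEXT);
    CREATE TABLE cutoff  (ts INTEGER);
    INSERT INTO cutoff VALUES (100);

    -- The archive side only sees rows before the cutoff,
    -- the recent side only sees rows at or after it.
    CREATE VIEW history AS
        SELECT ts, user_id, setting FROM archive
         WHERE ts <  (SELECT ts FROM cutoff)
        UNION ALL
        SELECT ts, user_id, setting FROM recent
         WHERE ts >= (SELECT ts FROM cutoff);
""")

conn.executemany("INSERT INTO archive VALUES (?, ?, ?)",
                 [(10, 1, "a"), (50, 2, "b")])
conn.executemany("INSERT INTO recent VALUES (?, ?, ?)",
                 [(100, 1, "c"), (150, 2, "d"), (200, 1, "e")])


def snapshot():
    """Return everything the view currently exposes, ordered by time."""
    return conn.execute(
        "SELECT ts, user_id, setting FROM history ORDER BY ts").fetchall()


before = snapshot()
new_cutoff = 200

# Step 1: position the new archive data first (the CTAS step in the thread).
# The copied rows are still hidden on the archive side by the old cutoff.
conn.execute("INSERT INTO archive SELECT * FROM recent WHERE ts < ?",
             (new_cutoff,))
after_copy = snapshot()

# Step 2: move the cutoff. This single-row update is the switch-over.
conn.execute("UPDATE cutoff SET ts = ?", (new_cutoff,))
after_switch = snapshot()

# Step 3: lazily trim rows that the recent side no longer serves.
conn.execute("DELETE FROM recent WHERE ts < ?", (new_cutoff,))
after_trim = snapshot()

print(before == after_copy == after_switch == after_trim)
```

The view returns the same rows after every step, with no duplicates and no gaps, because the only change readers can observe is the single-row cutoff update. In a real deployment the trim in step 3 would wait until pending queries have finished, as noted above.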
