Hi,

It's here: https://issues.apache.org/jira/browse/DRILL-3838

Hopefully this can be accommodated soon :).

Regards,
-Stefan

On Wed, Sep 23, 2015 at 5:21 PM, Jacques Nadeau <[email protected]> wrote:

> Hey Stefan,
>
> Yes, this makes a lot of sense and seems reasonable. We've talked about
> providing the simple filename as a virtual attribute. It seems like we
> should also provide a full path attribute (from the root of the workspace).
> Can you open a JIRA for this? It isn't something that is supported now but
> should be fairly trivial to do while we are adding the filename virtual
> attribute.
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Tue, Sep 22, 2015 at 1:51 PM, Stefán Baxter <[email protected]>
> wrote:
>
> > Jacques,
> >
> > Is this something you think makes sense and could be accommodated?
> >
> > Regards,
> > -Stefan
> >
> > On Fri, Sep 18, 2015 at 12:13 PM, Stefán Baxter <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > The short back story is this:
> > >
> > >    - We are serving multiple tenants with vastly different data volume
> > >      and needs
> > >       - there is no such thing as fixed period segment sizes (to get to
> > >         approx. volume per segment)
> > >
> > >    - We do queries that combine information from historical and fresh
> > >      (streaming) data (parquet and json/avro respectively) using joins
> > >       - currently we are using loggers to emit the streaming data but
> > >         this will be replaced
> > >
> > >    - The "fresh" data (json/avro) files live in a single directory
> > >       - 1 file per day
> > >
> > >    - Fresh data is occasionally transformed from json/avro to parquet
> > >       - the frequency of this is set on a tenant/volume basis
> > >
> > > This is why we need/like to*:
> > >
> > >    - Use directory structure and file names as flexible chronological
> > >      partitions (via UDFs)
> > >    - Use parquet partitions for "logical data separation" based on
> > >      attributes other than time
> > >
> > > * Please remember that adding new data to parquet files would
> > > eliminate the need for much of this
> > > ** The same is true if we would move this whole thing to some metadata
> > > driven environment like Hive
> > >
> > > The Historical (parquet) directory structure might look something like
> > > this:
> > >
> > >    1. /<tenant>/<source>/streaming/2015/09/10
> > >       - high volume :: data transformed daily
> > >
> > >    2. /<tenant>/<source>/streaming/2015/W10
> > >       - medium volume :: data transformed weekly
> > >
> > >    3. /<tenant>/<source>/streaming/2015/09
> > >       - low(er) volume :: data transformed monthly
> > >
> > > So yes, we think that having the ability to evaluate full paths and
> > > file names, where we can affect the pruning/scanning with appropriate
> > > exceptions, would help us gain some sanity :).
> > >
> > > I realize that pruning should preferably be done in the planning phase
> > > but this would allow for a not-too-messy interception of the scanning
> > > process.
> > >
> > > Best regards,
> > > -Stefan
> > >
> > >
> > > On Fri, Sep 18, 2015 at 6:01 AM, Jacques Nadeau <[email protected]>
> > > wrote:
> > >
> > >> Can you also provide some examples of what you are trying to
> > >> accomplish?
> > >>
> > >> It seems like you might be saying that you want a virtual attribute
> > >> for the entire path rather than individual pieces? Also remember that
> > >> partition pruning can also be done if you're using Parquet files
> > >> without all the dirN syntax.
> > >>
> > >> --
> > >> Jacques Nadeau
> > >> CTO and Co-Founder, Dremio
> > >>
> > >> On Thu, Sep 17, 2015 at 10:42 AM, Stefán Baxter <[email protected]>
> > >> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > I have been writing a few simple utility functions for Drill and
> > >> > staring at the cumbersome dirN conditions required to take
> > >> > advantage of directory pruning.
> > >> >
> > >> > Would it be possible to allow UDFs to throw fileOutOfScope and
> > >> > directoryOutOfScope exceptions that would allow me to a) write a
> > >> > fairly clever inRange(from, to, dirN...) function and would b)
> > >> > allow for additional pruning during execution?
> > >> >
> > >> > Maybe I'm seeing this all wrong but the process of complicating all
> > >> > queries with a, sometimes quite complicated, dirN tail just seems
> > >> > like too much redundancy.
> > >> >
> > >> > Regards,
> > >> > -Stefan
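
For readers following the thread, here is a minimal sketch of the kind of path-based range check that a full path attribute would make possible for the three directory layouts listed above (daily, weekly, monthly). It is plain Python, not Drill code and not a Drill UDF. The names `segment_range` and `in_range` are hypothetical, and reading `W10` as ISO week 10 is an assumption; the thread does not say which week numbering is used.

```python
import calendar
import re
from datetime import date, timedelta

# Hypothetical layout, taken from the three examples in the thread:
#   .../2015/09/10 (daily), .../2015/W10 (weekly), .../2015/09 (monthly)
_SEGMENT = re.compile(r"/(\d{4})/(?:W(\d{1,2})|(\d{2})(?:/(\d{2}))?)/?$")


def segment_range(path):
    """Return the (first_day, last_day) a directory covers, or None."""
    m = _SEGMENT.search(path)
    if m is None:
        return None
    year, week, month, day = m.groups()
    year = int(year)
    if week is not None:
        # Assumption: W10 means ISO week 10 (Monday through Sunday).
        start = date.fromisocalendar(year, int(week), 1)
        return start, start + timedelta(days=6)
    if day is not None:
        d = date(year, int(month), int(day))
        return d, d
    last = calendar.monthrange(year, int(month))[1]
    return date(year, int(month), 1), date(year, int(month), last)


def in_range(path, start, end):
    """True if the directory may hold data in [start, end], inclusive."""
    covered = segment_range(path)
    if covered is None:
        return True  # unrecognized layout: do not prune, scan to be safe
    first, last = covered
    return first <= end and last >= start
```

With this sketch, `in_range("/t/s/streaming/2015/09", date(2015, 10, 1), date(2015, 10, 5))` is false, so that monthly directory could be skipped, while a path that does not match any of the three layouts is always kept.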
