Re: storage structure - querying directories - sanity check and UDF assistance

2015-07-24 Thread Jacques Nadeau
Two quick notes:

- If you switch to internal null handling, you have to define separate UDFs for each possible combination of nullable and non-nullable values.
- isSet is an integer, so your if clause would actually be:

    if (! (yearDir.isSet == 1) ) {
      // yearDir is NULL, handle this here
    }
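To illustrate the isSet point, here is a minimal, self-contained sketch. NullableHolder is a hypothetical stand-in written for this example, not Drill's real holder class; the only thing taken from the message above is that isSet is an int compared against 1 rather than a boolean.

```java
public class IsSetCheck {
    // Hypothetical stand-in for a nullable value holder (assumption:
    // a real UDF would receive a holder supplied by Drill instead).
    static final class NullableHolder {
        int isSet;      // 1 when a value is present, 0 when the input is NULL
        String value;   // only meaningful when isSet == 1
    }

    // Returns the held value, or the given fallback when the holder is NULL.
    static String valueOrDefault(NullableHolder yearDir, String fallback) {
        if (!(yearDir.isSet == 1)) {
            // yearDir is NULL, handle this here
            return fallback;
        }
        return yearDir.value;
    }
}
```

Writing the check as `if (!yearDir.isSet)` would not compile in Java, because isSet is not a boolean.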

Re: storage structure - querying directories - sanity check and UDF assistance

2015-07-24 Thread Stefán Baxter
Hi, I have this running now:

    select occurred_at, dir0, dir1, dir2
    from dfs.tmp.`/analytics/processed/test/events` as t
    where dir0 = dirInRange(cast('2015-04-10' as timestamp),
                            cast('2015-07-11' as timestamp),
                            COALESCE(dir0,'-'), COALESCE(dir1,'-'), COALESCE(dir2,'-'))
    order by occurred_at;
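The body of dirInRange is not shown in this thread, so the following is only a guess at the kind of logic such a function might contain, written as plain Java rather than as a Drill UDF. It assumes a year/month/day directory layout for dir0/dir1/dir2 (only the year level is hinted at by the yearDir name mentioned elsewhere in the thread), treats '-' as the COALESCE placeholder for a missing level, and returns dir0 when the directory's period overlaps the requested range so that the `dir0 = dirInRange(...)` comparison holds.

```java
import java.time.LocalDate;

public class DirInRangeSketch {
    static final String MISSING = "-";   // placeholder produced by COALESCE(dirN, '-')
    static final String NO_MATCH = "";   // never equal to a real directory name

    // Assumed semantics: return dir0 if the period covered by the
    // directory (a year, a month, or a single day) overlaps [from, to].
    static String dirInRange(LocalDate from, LocalDate to,
                             String dir0, String dir1, String dir2) {
        if (MISSING.equals(dir0)) {
            return NO_MATCH;
        }
        int year = Integer.parseInt(dir0);
        LocalDate start;
        LocalDate end;
        if (MISSING.equals(dir1)) {
            // Only the year level is known: the directory spans the whole year.
            start = LocalDate.of(year, 1, 1);
            end = LocalDate.of(year, 12, 31);
        } else if (MISSING.equals(dir2)) {
            // Year and month are known: the directory spans that month.
            start = LocalDate.of(year, Integer.parseInt(dir1), 1);
            end = start.withDayOfMonth(start.lengthOfMonth());
        } else {
            // Full year/month/day path: a single day.
            start = LocalDate.of(year, Integer.parseInt(dir1), Integer.parseInt(dir2));
            end = start;
        }
        boolean overlaps = !end.isBefore(from) && !start.isAfter(to);
        return overlaps ? dir0 : NO_MATCH;
    }
}
```

With the range from the query (2015-04-10 to 2015-07-11), a call with ("2015", "05", "01") returns "2015", while ("2015", "03", "-") returns the empty marker because March ends before the range starts.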

Re: storage structure - querying directories - sanity check and UDF assistance

2015-07-24 Thread Jason Altekruse
I'm not sure. It is possible that it is being evaluated during planning to prune the scan, but the filter above the scan is not being removed as it should be. I'll try to re-create the case to take a look. Stefan, earlier you had mentioned that it was not only inefficient, but it was also

Re: storage structure - querying directories - sanity check and UDF assistance

2015-07-24 Thread Stefán Baxter
Hi Jason, I will share this code tomorrow on GitHub so you can review it there if that helps. When I was testing this earlier today, I saw, to my surprise, that the query sometimes returned results. This was not consistent, and I could run exactly the same statement with two different results

Re: storage structure - querying directories - sanity check and UDF assistance

2015-07-24 Thread Jacques Nadeau
- This is being called for *every record for every file in every directory* Are you sure? Constant reduction should take care of this. @Jason, any ideas why it might be failing? -- Jacques Nadeau CTO and Co-Founder, Dremio On Fri, Jul 24, 2015 at 10:45 AM, Stefán Baxter

Re: storage structure - querying directories - sanity check and UDF assistance

2015-07-24 Thread Jason Altekruse
A little clarification on that point. The directory filters are not syntactically separated from filters on regular columns that we read out of the files themselves. Without optimization, the easiest way to think about the directory columns is as just data that is added to each record coming out of the

Re: storage structure - querying directories - sanity check and UDF assistance

2015-07-24 Thread Ted Dunning
I think that constant reduction isn't entirely working in the presence of joins. For example, I removed the isRandom annotation from my random number generator. You can see constant reduction working if I give a literal number: 0: jdbc:drill:zk=local> select b.x,a.y,random(1, 3) from (values

Re: storage structure - querying directories - sanity check and UDF assistance

2015-07-24 Thread Stefán Baxter
Hi, I understand how this can be useful to deal with both row/record and directory should for a result but then there is huge optimization potential left unexploited. (I'm not fully understanding if this directory failing happens with more proof or not). - If this does not eventually fail

storage structure - querying directories - sanity check and UDF assistance

2015-07-24 Thread Stefán Baxter
Hi, I would like to share our intentions for organizing our data and how we plan to construct queries for it. There are four main reasons for sharing this: a) I would like to sanity check the approach b) I'm having a hard time writing a UDF to optimize this and need a bit of help. c) This can