Two quick notes:
- If you switch to internal null handling, you have to define separate UDFs
for each possible combination of nullable and non-nullable values.
- isSet is an integer, so your if clause would actually be:
if (!(yearDir.isSet == 1)) {
// yearDir is NULL, handle this here
}
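The pattern above can be sketched outside Drill. The `NullableHolder` class below is a hypothetical stand-in for Drill's real `Nullable*Holder` classes (e.g. `NullableVarCharHolder`), which expose the same `int isSet` field, and `describe()` plays the role of a UDF `eval()` body that does its own null check under internal null handling:

```java
// Minimal sketch of internal null handling. NullableHolder is a stand-in
// for a Drill Nullable*Holder; it is not Drill's actual class.
public class NullCheckSketch {

    // isSet == 1 means a value is present; isSet == 0 means SQL NULL.
    static class NullableHolder {
        int isSet;
        String value;
    }

    // Mirrors a UDF eval() body that must test isSet itself rather than
    // relying on the engine to short-circuit NULL inputs.
    static String describe(NullableHolder yearDir) {
        if (!(yearDir.isSet == 1)) {
            // yearDir is NULL, handle this here
            return "NULL";
        }
        return yearDir.value;
    }

    public static void main(String[] args) {
        NullableHolder set = new NullableHolder();
        set.isSet = 1;
        set.value = "2015";
        NullableHolder unset = new NullableHolder(); // isSet defaults to 0

        System.out.println(describe(set));   // prints "2015"
        System.out.println(describe(unset)); // prints "NULL"
    }
}
```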
--
Hi,
I have this running now:
select occurred_at, dir0, dir1, dir2
from dfs.tmp.`/analytics/processed/test/events` as t
where dir0 = dirInRange(
    cast('2015-04-10' as timestamp),
    cast('2015-07-11' as timestamp),
    COALESCE(dir0,'-'), COALESCE(dir1,'-'), COALESCE(dir2,'-'))
order by occurred_at;
I'm not sure; it is possible that it is being evaluated during planning to
prune the scan, but the filter above the scan is not being removed as it
should be. I'll try to re-create the case and take a look.
Stefan,
Earlier you had mentioned that it was not only inefficient, but it was also
Hi Jason,
I will share this code tomorrow on github so you can review this using that
if it helps.
When I was testing this earlier today, I saw, to my surprise, that the
query sometimes returned results. This was not consistent, and I could run
exactly the same statement and get two different results.
- This is being called for *every record for every file in every
directory*
Are you sure? Constant reduction should take care of this. @Jason, any
ideas why it might be failing?
--
Jacques Nadeau
CTO and Co-Founder, Dremio
On Fri, Jul 24, 2015 at 10:45 AM, Stefán Baxter
A little clarification on that point. The directory filters are not
syntactically separated from filters on regular columns that we read out of
the files themselves. Without optimization, the easiest way to think about
the directory columns is as data that is added to each record coming out of
the
I think that constant reduction isn't entirely working in the presence of
joins. For example, I removed the isRandom annotation from my random
number generator.
You can see constant reduction working if I give a literal number:
0: jdbc:drill:zk=local select b.x,a.y,random(1, 3) from (values
Hi,
I understand how this can be useful to deal with both row/record and
directory values for a result, but then there is huge optimization potential
left unexploited. (I'm not fully sure whether this directory pruning
failure happens in other cases as well.)
- If this does not eventually fail
Hi,
I would like to share our intentions for organizing our data and how we
plan to construct queries for it.
There are four main reasons for sharing this:
a) I would like to sanity check the approach
b) I'm having a hard time writing a UDF to optimize this and need a bit of
help.
c) This can