This is actually a known issue: constant folding is not working in the
select clause because of a costing problem. Constant folding currently
only works in the where clause.
https://issues.apache.org/jira/browse/DRILL-2218
On Fri, Jul 24, 2015 at 4:13 PM, Ted Dunning wrote:
> I think that
I think that constant reduction isn't entirely working in the presence of
joins. For example, I removed the isRandom annotation from my random
number generator.
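For reference, the attribute in question sits on @FunctionTemplate; a rough
sketch of such a UDF follows (names and body are illustrative, not the actual
code; only the isRandom attribute is the point here):

import org.apache.drill.exec.expr.DrillSimpleFunc;
import org.apache.drill.exec.expr.annotations.FunctionTemplate;
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;
import org.apache.drill.exec.expr.holders.Float8Holder;

// isRandom = true tells the planner the function is non-deterministic,
// so calls to it should not be folded into constants at plan time.
@FunctionTemplate(name = "random",
    scope = FunctionTemplate.FunctionScope.SIMPLE,
    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL,
    isRandom = true)
public class RandomInRange implements DrillSimpleFunc {

  @Param Float8Holder min;
  @Param Float8Holder max;
  @Output Float8Holder out;

  public void setup() { }

  public void eval() {
    // pick a value uniformly between min and max for every row
    out.value = min.value + Math.random() * (max.value - min.value);
  }
}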
You can see constant reduction working if I give a literal number:
0: jdbc:drill:zk=local> select b.x,a.y,random(1, 3) from (values
> (
Hi,
I understand how this can be useful to deal with both row/record and
directory values for a result, but then there is huge optimization potential
left unexploited. (I'm not fully sure whether this "directory failing"
happens or not; I would need more proof.)
- If this does not eventually fail direc
A little clarification on that point. The directory filters are not
syntactically separated from filters on regular columns that we read out of
files themselves. Without optimization, the easiest way to think about the
directory columns is that they are just data added to each record coming out
of the s
Hi Jason,
I will share this code tomorrow on github so you can review this using that
if it helps.
When I was testing this earlier today I saw, to my surprise, that the
query sometimes returned results. This was not consistent, and I could run
exactly the same statement and get two different results (
I'm not sure; it is possible that it is being evaluated during planning to
prune the scan, but the filter above the scan is not being removed as it
should be. I'll try to re-create the case and take a look.
Stefan,
Earlier you had mentioned that it was not only inefficient, but it was also
givin
- This is being called for *every record for every file in every
directory*
Are you sure? Constant reduction should take care of this. @Jason, any
ideas why it might be failing?
--
Jacques Nadeau
CTO and Co-Founder, Dremio
On Fri, Jul 24, 2015 at 10:45 AM, Stefán Baxter
wrote:
> Hi,
>
>
thank you!
On Fri, Jul 24, 2015 at 3:23 PM, Jim Scott wrote:
> let me clarify...
>
> If you were grouping by household, you may want to group on the left side.
> If it is stored in a single valued field, then you would have to manipulate
> the value in some way to get the portion you want to g
Hi,
thanks for the tips.
Observation:
- This is being called for *every record for every file in every
directory*
Can you please tell me what needs to be done to make sure this is only
called once for each directory, preferably before the files in that directory
are opened/scanned.
Regards,
-S
Two quick notes:
- If you switch to internal null handling, you have to define separate UDFs
for each possible combination of nullable and non-nullable values.
- isSet is an integer, so your if clause would actually be:
if (!(yearDir.isSet == 1)) {
  // yearDir is NULL, handle this here
}
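To make that concrete, here is a rough sketch of what an internal-null-handling
UDF could look like (the function and class names are made up; only the
NullHandling.INTERNAL setting and the isSet check reflect the notes above):

import org.apache.drill.exec.expr.DrillSimpleFunc;
import org.apache.drill.exec.expr.annotations.FunctionTemplate;
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;
import org.apache.drill.exec.expr.holders.BitHolder;
import org.apache.drill.exec.expr.holders.NullableVarCharHolder;

@FunctionTemplate(name = "is_year_dir_set",   // hypothetical name
    scope = FunctionTemplate.FunctionScope.SIMPLE,
    nulls = FunctionTemplate.NullHandling.INTERNAL)   // UDF handles NULLs itself
public class IsYearDirSet implements DrillSimpleFunc {

  @Param NullableVarCharHolder yearDir;   // nullable holder exposes the isSet flag
  @Output BitHolder out;

  public void setup() { }

  public void eval() {
    if (yearDir.isSet == 0) {   // isSet is an int: 0 = NULL, 1 = set
      out.value = 0;            // treat a NULL directory value as "not set"
      return;
    }
    out.value = 1;
  }
}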
--
J
let me clarify...
If you were grouping by household, you may want to group on the left side.
If it is stored in a single valued field, then you would have to manipulate
the value in some way to get the portion you want to group by. Thus,
storing it in two parts would be optimal for the use case.
Hi Stefan,
I think when you specify your UDF as NULL_IF_NULL it means Drill will
handle null values automatically: if any argument passed to your UDF is
NULL, the UDF won't be evaluated and Drill will return NULL instead.
In your case your UDF needs to handle NULL values itself by setting:
nulls = NullHandling.INTERNAL
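To illustrate the NULL_IF_NULL behaviour described above, a minimal example
could look like this (add_one is purely illustrative, not a function from
this thread):

import org.apache.drill.exec.expr.DrillSimpleFunc;
import org.apache.drill.exec.expr.annotations.FunctionTemplate;
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;
import org.apache.drill.exec.expr.holders.BigIntHolder;

// With NULL_IF_NULL, Drill skips eval() when any argument is NULL and
// returns NULL itself, so plain (non-nullable) holders can be used.
@FunctionTemplate(name = "add_one",
    scope = FunctionTemplate.FunctionScope.SIMPLE,
    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
public class AddOne implements DrillSimpleFunc {

  @Param BigIntHolder in;
  @Output BigIntHolder out;

  public void setup() { }

  public void eval() {
    out.value = in.value + 1;   // never sees a NULL input under NULL_IF_NULL
  }
}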
Well, that is only true if you don't have a BigInteger to hold it :)
see:
https://java-ipv6.googlecode.com/svn/artifacts/0.14/doc/apidocs/com/googlecode/ipv6/IPv6Address.html
Regards,
-Stefan
On Fri, Jul 24, 2015 at 2:39 PM, Jim Scott wrote:
> an IPv6 address is actually two longs. Depending o
an IPv6 address is actually two longs. Depending on the type of analysis
you are doing you may prefer to store them that way.
e.g. the range on the left side is a home / location and the range on the
right side is sub-values (devices within the home).
Depending on your use case you may want to s
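For what it's worth, splitting the 128 bits into two longs needs nothing
beyond the standard library; a quick sketch (the example address is
arbitrary):

import java.net.InetAddress;
import java.nio.ByteBuffer;

public class Ipv6ToLongs {
  public static void main(String[] args) throws Exception {
    // getAddress() returns 16 bytes for an IPv6 address
    byte[] bytes = InetAddress.getByName("2001:db8::1").getAddress();
    ByteBuffer bb = ByteBuffer.wrap(bytes);
    long high = bb.getLong();   // upper 64 bits - network / "home" part
    long low  = bb.getLong();   // lower 64 bits - interface / device part
    System.out.println(high + " / " + low);
  }
}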
Hi,
I have this running now:
"select occurred_at, dir0, dir1, dir2 from
dfs.tmp.`/analytics/processed/test/events` as t where dir0 =
dirInRange(cast('2015-04-10' as timestamp),cast('2015-07-11' as
timestamp),COALESCE(dir0,'-'),COALESCE(dir1,'-'),COALESCE(dir2,'-')) order
by occurred_at;"
Observ
Hi,
I would like to share our intentions for organizing our data and how we
plan to construct queries for it.
There are four main reasons for sharing this:
a) I would like to sanity check the approach
b) I'm having a hard time writing a UDF to optimize this and need a bit of
help.
c) This can p
Hi,
Does anyone here have opinions/ideas on how IPv6 addresses might be stored
efficiently in Parquet via Drill?
The Java BigInteger class handles the 128-bit variant but the BigIntHolder in
Drill relies on a long. Storing it in two longs is not optimal and it would
surprise me if the variable binary field
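If the variable binary route is taken, the conversion is plain standard-library
Java; a rough sketch (the example address is arbitrary, and whether this is a
good fit for Parquet/Drill is exactly the open question here):

import java.math.BigInteger;
import java.net.InetAddress;

public class Ipv6AsBinary {
  public static void main(String[] args) throws Exception {
    byte[] addr = InetAddress.getByName("2001:db8::1").getAddress(); // 16 bytes
    // Store the 16 raw bytes in a variable-length binary column; read them
    // back into a non-negative BigInteger when 128-bit math is needed.
    BigInteger asNumber = new BigInteger(1, addr);
    byte[] roundTrip = asNumber.toByteArray(); // may carry a leading 0x00 sign byte
    System.out.println(asNumber + " (" + roundTrip.length + " bytes)");
  }
}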