let me clarify...
If you were grouping by household, you may want to group on the left side.
If it is stored in a single-valued field, then you would have to manipulate
the value in some way to get the portion you want to group by. Thus,
storing it in two parts would be optimal for this use case.
Two quick notes:
- If you switch to internal null handling, you have to define separate UDFs
for each possible combination of nullable and non-nullable values.
- isSet is an integer, so your if clause would actually be:
if (! (yearDir.isSet == 1) ) {
// yearDir is NULL, handle this here
}
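To illustrate those semantics outside of Drill, here is a minimal, self-contained mimic (the holder class below is a hypothetical stand-in, not Drill's actual holder from org.apache.drill.exec.expr.holders): isSet is an int flag, so null checks compare against 0/1 rather than treating it as a boolean.

```java
// Hypothetical stand-in for a Drill nullable holder: isSet is an int,
// where 0 means the value is NULL and 1 means it is present.
class NullableBigIntHolder {
    int isSet;   // 0 => NULL, 1 => value is set
    long value;
}

public class IsSetDemo {
    static String describe(NullableBigIntHolder yearDir) {
        if (yearDir.isSet == 0) {
            // yearDir is NULL, handle this here
            return "NULL";
        }
        return Long.toString(yearDir.value);
    }

    public static void main(String[] args) {
        NullableBigIntHolder h = new NullableBigIntHolder(); // isSet defaults to 0
        System.out.println(describe(h));   // prints "NULL"
        h.isSet = 1;
        h.value = 2015;
        System.out.println(describe(h));   // prints "2015"
    }
}
```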
--
Well, that is only true if you don't have a BigInteger to hold it :)
see:
https://java-ipv6.googlecode.com/svn/artifacts/0.14/doc/apidocs/com/googlecode/ipv6/IPv6Address.html
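As a sketch of that point, using only the JDK rather than the java-ipv6 library linked above (toBigInteger is a hypothetical helper name): the full 128 bits of an IPv6 address fit in a single unsigned BigInteger.

```java
import java.math.BigInteger;
import java.net.InetAddress;
import java.net.UnknownHostException;

public class Ipv6AsBigInteger {
    // Parse an IPv6 literal into an unsigned 128-bit BigInteger.
    // getByName does not hit DNS when given an address literal.
    static BigInteger toBigInteger(String literal) throws UnknownHostException {
        byte[] bytes = InetAddress.getByName(literal).getAddress(); // 16 bytes for IPv6
        return new BigInteger(1, bytes); // signum 1 => bytes treated as unsigned magnitude
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(toBigInteger("::1"));        // prints "1"
        System.out.println(toBigInteger("2001:db8::")); // top 32 bits set, rest zero
    }
}
```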
Regards,
-Stefan
On Fri, Jul 24, 2015 at 2:39 PM, Jim Scott jsc...@maprtech.com wrote:
an IPv6 address is actually two
Hi,
I have this running now:
select occurred_at, dir0, dir1, dir2 from
dfs.tmp.`/analytics/processed/test/events` as t where dir0 =
dirInRange(cast('2015-04-10' as timestamp),cast('2015-07-11' as
timestamp),COALESCE(dir0,'-'),COALESCE(dir1,'-'),COALESCE(dir2,'-')) order
by occurred_at;
an IPv6 address is actually two longs. Depending on the type of analysis
you are doing you may prefer to store them that way.
e.g. the range on the left side is a home/location and the range on the
right side holds sub-values (devices within the home).
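A minimal sketch of that layout, assuming hypothetical helper names and only the JDK: split the 16 address bytes into a high and a low long, so households can be compared (or grouped) via the high half alone.

```java
import java.math.BigInteger;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.ByteBuffer;

public class Ipv6TwoLongs {
    // Split an IPv6 literal into {high, low} longs: the left 64 bits
    // identify the home/location, the right 64 bits the device within it.
    static long[] split(String literal) throws UnknownHostException {
        ByteBuffer buf = ByteBuffer.wrap(InetAddress.getByName(literal).getAddress());
        return new long[] { buf.getLong(), buf.getLong() }; // ByteBuffer is big-endian by default
    }

    public static void main(String[] args) throws UnknownHostException {
        long[] a = split("2001:db8::1");
        long[] b = split("2001:db8::2");
        // Same high half => same household; different low half => different device.
        System.out.println(a[0] == b[0]); // prints "true"
        System.out.println(a[1] == b[1]); // prints "false"
    }
}
```

Storing the two longs as separate columns makes "group by household" a plain group-by on the high column, with no per-row string manipulation.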
Depending on your use case you may want to
I'm not sure; it is possible that it is being evaluated during planning to
prune the scan, but the filter above the scan is not being removed as it
should be. I'll try to re-create the case to take a look.
Stefan,
Earlier you had mentioned that it was not only inefficient, but it was also
Hi Jason,
I will share this code on GitHub tomorrow so you can review it, if that
helps.
When I was testing this earlier today, I saw, to my surprise, that the
query sometimes returned results. This was not consistent, and I could run
exactly the same statement and get two different results.
- This is being called for *every record for every file in every
directory*
Are you sure? Constant reduction should take care of this. @Jason, any
ideas why it might be failing?
--
Jacques Nadeau
CTO and Co-Founder, Dremio
On Fri, Jul 24, 2015 at 10:45 AM, Stefán Baxter
thank you!
On Fri, Jul 24, 2015 at 3:23 PM, Jim Scott jsc...@maprtech.com wrote:
A little clarification on that point. The directory filters are not
syntactically separated from filters on regular columns that we read out of
the files themselves. Without optimization, the easiest way to think about
the directory columns is that they are just data added to each record
coming out of the
I think that constant reduction isn't entirely working in the presence of
joins. For example, I removed the isRandom annotation from my random
number generator.
You can see constant reduction working if I give a literal number:
0: jdbc:drill:zk=local select b.x,a.y,random(1, 3) from (values
Hi,
I understand how this can be useful to deal with both row/record and
directory values in a result, but then there is huge optimization potential
left unexploited. (I'm not fully sure whether this directory pruning
failure happens in more cases or not.)
- If this does not eventually fail
Hi,
I would like to share our intentions for organizing our data and how we
plan to construct queries for it.
There are four main reasons for sharing this:
a) I would like to sanity check the approach
b) I'm having a hard time writing a UDF to optimize this and need a bit of
help.
c) This can