Right() function in 0.7

2015-02-13 Thread Minnow Noir
right() is documented on the wiki ( https://cwiki.apache.org/confluence/display/DRILL/SQL+Functions, last edited 6 weeks ago), but doesn't seem to be a valid function: use sys; 0: jdbc:drill:zk=local> select right("blahblah",2) from version; Query failed: Query failed: Failure parsing SQL. Encount

Re: Drill - MapR-DB table - error

2015-02-13 Thread Sudhakar Thota
Aditaya, Thanks for working on this. Here is the drillbit log. 2015-02-13 14:28:51,498 [2b218572-dd78-a538-e9bb-547410c18cea:frag:1:0] ERROR o.a.d.e.w.f.AbstractStatusReporter - Error 69c63002-9e29-471a-aacb-a073a421d37f: Failure while running fragment. org.joda.time.IllegalFieldValueException:

Re: Drill & Adjunct Data Warehouse

2015-02-13 Thread Jason Altekruse
Almost all of the heavy lifting has been done for us by calcite. See the discussion here for a little bit of background and the parts we need to still implement. http://mail-archives.apache.org/mod_mbox/drill-dev/201501.mbox/%3CCAMpYv7APxne4JzM_wBrAtBd5Emkogj1jpnPeQQ3bA1E-7RKf=w...@mail.gmail.com%

Re: Drill & Adjunct Data Warehouse

2015-02-13 Thread Jim Scott
I completely agree with that sentiment. Given the Mongo and Cassandra plugin work that is being done, adding a JDBC data source seems like it might be about the next most important to the community as a whole. On Fri, Feb 13, 2015 at 3:23 PM, Christopher Matta wrote: > The potential for a JDBC

Re: Drill - MapR-DB table - error

2015-02-13 Thread Sudhakar Thota
Andries, Good thinking. It works on csv file but not with MapR-DB table. Here is the file showing that. Thanks Sudhakar Thota On Feb 13, 2015, at 8:14 AM, Andries Engelbrecht wrote: > Does the CSV file work? > Just not when in MaprDB? > If so try varchar before to date. > > --Andries

Re: Drill & Adjunct Data Warehouse

2015-02-13 Thread Christopher Matta
The potential for a JDBC storage plugin has come up in discussions a lot lately and would be a very positive addition to the project. I would love to know if there's been any work on this, or if not how something like this could get bootstrapped. Chris Matta cma...@mapr.com 215-701-3146 On Fri, F

Memory Settings

2015-02-13 Thread Christopher Matta
I’m attempting to see if increased available memory to Drill has a positive effect on certain queries, but I’m having trouble determining if changed memory settings are being respected. After setting DRILL_MAX_DIRECT_MEMORY="8G" and DRILL_MAX_HEAP="4G" I restarted drill. Checking the *metrics* pag

Re: Drill & Adjunct Data Warehouse

2015-02-13 Thread Yousef Lasi
I recall reading about development work on a JDBC storage plugin. Is this still being worked on? If so, how can we get current status and/or contribute? Thanks. February 13 2015 7:31 AM, "Uli Bethke" wrote: > The use case of the adjunct data warehouse requires a data federation > la

Re: Drill & Adjunct Data Warehouse

2015-02-13 Thread Ted Dunning
Drill definitely can serve as a database virtualization layer. Calcite was used this way when it was just Optiq and Drill provides interesting additional capabilities. The emerging view of user needs seems to be tilting more towards the semi-structured data capabilities of Drill rather than the v

Re: Large Table Joins

2015-02-13 Thread Jacques Nadeau
You've hit the nail on the head in terms of challenges. The HDFS interface doesn't provide an ability to specifically request certain data placement strategies for a file. While placing the workload on a particular node will likely create the first replica of data on that node, the secondary repl

Re: Large Table Joins

2015-02-13 Thread Uli Bethke
Thanks Aman. This answers my question. I suppose as a workaround for the time being I could denormalize the smaller of the large tables into the bigger one. I would also be interested in the opinions of the group on data co-locality (as implemented by Teradata). This is not so much a Drill qu

Re: Large Table Joins

2015-02-13 Thread Aman Sinha
Drill joins (either hash join or merge join) currently generate 3 types of plans: hash distribute both sides of the join, hash distribute left side and broadcast right side, broadcast right side and don't distribute left side. These are cost-based decisions based on cardinalities. However, the

Re: Large Table Joins

2015-02-13 Thread Uli Bethke
Thanks Jason. Just a bit more background on my question. Modern MPPs such as Teradata allow for full data co-locality via hash distribution of keys. This ensures that join data of two large tables will always end up on the same node and data co-locality is always ensured (no network overhead),

Re: Large Table Joins

2015-02-13 Thread Jason Altekruse
I don't think this actually answers your question. You can limit your filters by directory to avoid reads from the filesystem, and some of the storage plugins like HBase and Hive implement scan-level pushdown, but I do not know if this is sophisticated enough that a join would be aware of the parti

Re: Drill - MapR-DB table - error

2015-02-13 Thread Andries Engelbrecht
Does the CSV file work? Just not when in MaprDB? If so try varchar before to date. --Andries > On Feb 13, 2015, at 8:06 AM, Sudhakar Thota wrote: > > Andries, > > The order date looks good I think as per the format. > >> | 3421989| Clerk#00601 | O | 1996-01-10 | >> | 34229

Re: Drill - MapR-DB table - error

2015-02-13 Thread Sudhakar Thota
Andries, The order date looks good I think as per the format. > | 3421989| Clerk#00601 | O | 1996-01-10 | > | 3422915| Clerk#00058 | O | 1996-01-11 | > | 3423106| Clerk#00266 | O | 1996-01-11 | If you look at the output, it pulls the records for

Re: Large Table Joins

2015-02-13 Thread Carol McDonald
Yes, you can read about it here: https://cwiki.apache.org/confluence/display/DRILL/Partition+Pruning On Fri, Feb 13, 2015 at 6:42 AM, Uli Bethke wrote: > I have two large tables in Hive (both partitioned and bucketed). Using Map > side joins I can take advantage of data locality for the hash join

Drill & Adjunct Data Warehouse

2015-02-13 Thread Uli Bethke
The use case of the adjunct data warehouse requires a data federation layer between production warehouse analytics on Hadoop and the rest of the EDW on an RDBMS. The incumbents (Teradata, Oracle, SAS etc.) have proprietary offerings in this space. PrestoDB also allows for federation between Ha

Large Table Joins

2015-02-13 Thread Uli Bethke
I have two large tables in Hive (both partitioned and bucketed). Using map-side joins I can take advantage of data locality for the hash join table. When using Drill, does the optimizer take the partitioning and bucketing into consideration? Thanks, Uli