Re: Using Drill with EMR

2014-12-04 Thread Jason Altekruse
current version and report back your results. I will also try to contact Timothy Chen, the committer who drove the effort, to see if he would be interested in helping to update the script if need be. http://tnachen.wordpress.com/2013/12/24/drill-on-aws-emr/ - Jason Altekruse On Thu, Dec 4, 2014

Re: Drill & PHP

2015-01-01 Thread Jason Altekruse
While it is not part of the open source Drill project, MapR Technologies provides an ODBC driver for Drill. A quick search on the PHP docs seems to indicate that PHP can connect to an ODBC provider. The REST interface is also available, and would be bundled with the default open source build, altho

Re: Parquet and filtering

2015-01-05 Thread Jason Altekruse
Hi Adam, I have a few thoughts that might explain the difference in query times. Drill is able to read a subset of the data from a parquet file, when selecting only a few columns out of a large file. Drill will give you faster results if you ask for 3 columns instead of 10 in terms of read perform

Re: Parquet and filtering

2015-01-07 Thread Jason Altekruse
Just made one, I put some comments there from the design discussions we have had in the past. https://issues.apache.org/jira/browse/DRILL-1950 - Jason Altekruse On Tue, Jan 6, 2015 at 11:04 PM, Adam Gilmore wrote: > Just a quick follow up on this - is there a JIRA item for implementing p

Re: create table as default to parquet?

2015-01-07 Thread Jason Altekruse
As we currently use file suffixes to determine file types on read, I think it would make sense to have the same behavior on write (obviously with the option to define overrides as users need them). Thoughts on the best user experience here? -Jason Altekruse On Tue, Jan 6, 2015 at 1:01 PM

Re: Parquet and filtering

2015-01-07 Thread Jason Altekruse
ments have been made to the parquet mainline that may give us the performance we are looking for in these cases. We haven't had time to revisit it so far. -Jason Altekruse On Wed, Jan 7, 2015 at 4:04 PM, Adam Gilmore wrote: > Out of interest, is there a reason Drill implemented e

Re: Parquet and filtering

2015-01-08 Thread Jason Altekruse
attachments is prohibited. The recipient should check > > this email and any attachments for viruses and other defects. The Company > > disclaims any liability for loss or damage arising in any way from this > > communication including any file attachments. > > > > On

Re: Parquet and filtering

2015-01-09 Thread Jason Altekruse
-mail and > delete the original transmission and its contents. Any unauthorised use, > dissemination, forwarding, printing, or copying of this communication > including any file attachments is prohibited. The recipient should check > this email and any attachments for viruses and other defect

Re: question about JSON query

2015-01-09 Thread Jason Altekruse
I believe that Jim may have given the appropriate query to satisfy the needs of the original question, but for anyone who finds this thread I wanted to give a quick clarification about kvgen. The purpose of this function is to allow queries against maps where the keys themselves represent data rath

Re: Varying Execution Times For The Same Query On The Same File

2015-01-16 Thread Jason Altekruse
I do not think we currently consider JSON files splittable. If we do treat them as such, it would depend on the file size and the available read locality available on the nodes. Especially with a select * (or a count(*)) query there is nothing to parallelize except for the read operation and a simp

Re: Drill Specific Performance Monitoring Utilities

2015-01-16 Thread Jason Altekruse
We do not currently have information gathered during execution. There was a discussion at some point about gathering and exposing information that is usually reported at the end of the query in the Web UI query profile view during execution and updating that interface to track progress. I'm not su

Re: Connection timeout on port 2181 - apache-drill-0.7.0

2015-01-19 Thread Jason Altekruse
Ralph, The most common reason for not being able to start a Drillbit is that the port used for the Web UI is still being consumed by another instance of Drill. If you have previously tried to start a Drillbit outside of embedded mode there might still be some portion of the process still running.

Please join us tomorrow for our Google Hangout weekly meeting - 10am Pacific time

2015-01-19 Thread Jason Altekruse
Hello Drillers, Please join us tomorrow at 10am Pacific for our community meeting. If you are new to Drill, have questions about the current work being done throughout the community, or you just want to listen in, anyone is welcome to participate. The link is always available from the website unde

Re: Filter out empty arrays in JSON

2015-01-21 Thread Jason Altekruse
Currently this will fail on a repeated list or repeated map. This function has only been defined for lists of scalars. There is a JIRA open for the enhancement request, the priority on this should probably be bumped up with the use cases we have seen. Currently it is marked "Future". https://issue

Re: Unable to root into Drill sandbox.

2015-01-26 Thread Jason Altekruse
Hi Jen, Unfortunately the mailing list does not allow attachments. Please feel free to upload your image to a service like IMGUR and share a link. http://imgur.com/ -Jason On Mon, Jan 26, 2015 at 8:05 AM, Andries Engelbrecht < aengelbre...@maprtech.com> wrote: > Hi Jen, > > Not sure if the mai

Re: Filter out empty arrays in JSON

2015-01-26 Thread Jason Altekruse
As Aditya commented before this will work if the lists only contain scalars repeated_count('entities.urls') > 0 If the lists contain maps unfortunately this is not available today. There is an enhancement request open for this feature. I have marked it for a fix in 0.9 as it is more of a feature

Join us for the Drill Google hangout tomorrow 10am Pacific

2015-02-02 Thread Jason Altekruse
under "Community -> Get Involved", I have copied it below as well. https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc As I have done for the last few sessions, I encourage anyone thinking of attending to suggest topics they would like to discuss. Thanks, Jason Altekruse

Re: Drill - Flatten function - help please

2015-02-03 Thread Jason Altekruse
Sudhakar opened an issue for this, so I responded there. Steven is right, this is the current expected functionality, but I discuss there the reasons for it and opened the discussion for use cases that need this functionality. On Tue, Feb 3, 2015 at 2:28 PM, Steven Phillips wrote: > I think flat

Re: Drill - Flatten function - help please

2015-02-03 Thread Jason Altekruse
. I haven’t used Big Query for > this, I used it for flat tables, but I can check. > > Thanks > Sudhakar Thota > > > > On Feb 3, 2015, at 2:45 PM, Jason Altekruse > wrote: > > > Sudhakar opened an issue for this, so I responded there. Steven is right, > > t

Re: Directory pruning with Drill

2015-02-03 Thread Jason Altekruse
Hao, The dir columns are always added to the records coming out of a scan. The issue is with trying to avoid unneeded reads altogether. If you look at the query plan you should see that the scan is going to read all of the files and the filter against the directory column will be applied in a sepa

Cancelling hangout this week

2015-02-10 Thread Jason Altekruse
Sorry about the last minute notice, but unfortunately I'm going to have to cancel the community hangout today. Yash, I saw both of your messages, which I assume were in anticipation of a meeting today. I will find someone today to review the Cassandra storage plugin. I have some thoughts on the Py

Re: Large Table Joins

2015-02-13 Thread Jason Altekruse
I don't think this actually answers your question. You can limit your filters by directory to avoid reads from the filesystem, and some of the storage plugins like Hbase and Hive implement scan level pushdown, but I do not know if this is sophisticated enough that a join would be aware of the parti

Re: Drill & Adjunct Data Warehouse

2015-02-13 Thread Jason Altekruse
Almost all of the heavy lifting has been done for us by calcite. See the discussion here for a little bit of background and the parts we need to still implement. http://mail-archives.apache.org/mod_mbox/drill-dev/201501.mbox/%3CCAMpYv7APxne4JzM_wBrAtBd5Emkogj1jpnPeQQ3bA1E-7RKf=w...@mail.gmail.com%

Hangout happening now!

2015-02-17 Thread Jason Altekruse
Sorry about the lack of a reminder this week. We were off for Presidents' Day yesterday and I didn't think about it. If anyone is available, feel free to join the hangout! https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc -Jason

Re: Storage Plugin Config for XML

2015-02-23 Thread Jason Altekruse
gh for some members of the community to see the broader need for a use case like their own. http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/ -Jason Altekruse On Mon, Feb 23, 2015 at 11:45 AM, Steven Phillips wrote: > To the best of my knowledge, no one has started working on this yet. >

Re: inverse of kvgen and flatten?

2015-02-27 Thread Jason Altekruse
back into the root of the schema currently. -Jason Altekruse On Fri, Feb 27, 2015 at 8:25 AM, Ted Dunning wrote: > I was just looking through the documentation and I don't see a way to group > data and then create a list. Flatten turns a list into individual > records. I woul

Re: Storage Plugin Config for XML

2015-03-02 Thread Jason Altekruse
Even beyond the issue of types, there are structures that are expressible in XML that do not fit into a database model well, even one like Drill that supports complex data. The primary issue is text stored between opening and closing tags. I don't think these features of XML are commonly used by sy

Would someone be able to lead the hangout today?

2015-03-03 Thread Jason Altekruse
m the attendees, go around and introduce any new people and jump right into the discussions. Notes can be high level with a quick description of the issue, features, etc. https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc Thanks, Jason Altekruse

Re: Would someone be able to lead the hangout today?

2015-03-03 Thread Jason Altekruse
I did not see Yash's message before I sent this, please see his message for more info. On Tue, Mar 3, 2015 at 8:15 AM, Jason Altekruse wrote: > I will not have time this morning to lead the hangout, anyone with topics > to discuss is welcome to still attend and post minutes to t

Join us tomorrow (Tuesday) for the Drill weekly hangout!

2015-03-09 Thread Jason Altekruse
re. Having an agenda encourages newcomers and mailing list lurkers to come and discuss topics they think sound interesting. Thanks, Jason Altekruse

Re: Join us tomorrow (Tuesday) for the Drill weekly hangout!

2015-03-10 Thread Jason Altekruse
Hangout happening now! https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc On Mon, Mar 9, 2015 at 11:01 AM, Jason Altekruse wrote: > Hello Drillers, > > Please join us tomorrow at 10am Pacific for our community meeting. If you > are new to Drill, have questio

Re: Join us tomorrow (Tuesday) for the Drill weekly hangout!

2015-03-12 Thread Jason Altekruse
work with tools with incomplete UTF-8 support On Tue, Mar 10, 2015 at 9:58 AM, Jason Altekruse wrote: > Hangout happening now! > > https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc > > On Mon, Mar 9, 2015 at 11:01 AM, Jason Altekruse > wrote: > >> Hello

Hangout happening now!

2015-03-24 Thread Jason Altekruse
Come join the hangout to talk about what's happening with Drill, the recent 0.8 release candidate and the upcoming schedule. https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc - Jason

Re: Trying to wrap my head around using WHERE with flatten

2015-03-27 Thread Jason Altekruse
myself for producing a better error message if this kind of planning issue comes up in the future. Thanks, Jason Altekruse On Fri, Mar 27, 2015 at 11:01 AM, Andries Engelbrecht < aengelbre...@maprtech.com> wrote: > I would recommend to not use a count(*) but rather pick a column to use &

Re: WHERE clause with nested JSON data

2015-03-31 Thread Jason Altekruse
The error message indicates that this is a planning bug. Please try to look to see if you can find an open JIRA for the issue and add any information about your case there. If there is not one already filed, please open a new one and try to provide as much explanation as you can about the data invo

Re: Nested or Array JSON

2015-04-02 Thread Jason Altekruse
Hi Muthu, Welcome to the Drill community! Unfortunately the mailing list does not allow attachments, please send along the error log copied into a mail message. If you are working with the 0.7 version of Drill, I would recommend upgrading to the new 0.8 release that just came out, there were a

Re: Nested or Array JSON

2015-04-02 Thread Jason Altekruse
good performance for further analysis. -Jason On Thu, Apr 2, 2015 at 8:49 AM, Jason Altekruse wrote: > Hi Muthu, > > Welcome to the Drill community! > > Unfortunately the mailing list does not allow attachments, please send > along the error log copied into a mail message. > >

Re: Nested or Array JSON

2015-04-03 Thread Jason Altekruse
status. > > > > > > Error message got from ODBC is > > > > > > "ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query: > > > SELECT * FROM `HDFS`.`root`.`./user/hadoop2/unclaimedaccount.json` > LIMIT > > 100 > > >

Re: Counting large numbers of unique values

2015-04-09 Thread Jason Altekruse
@Adam This is something that has come up on the list before, you may be thinking of http://blinkdb.org/. This is something that would definitely be interesting to explore once we are stable and past 1.0. We certainly can try to help you along if you would like to start some of this work. @Marci

Re: Unable to query data from hdfs

2015-04-08 Thread Jason Altekruse
Hi Latha, Unfortunately the mailing list does not support attachments, could you possibly throw the file onto a file sharing service and share a link? If the file is below 20 MB you should be able to file a JIRA issue and upload it there as an attachment if you don't have another host available.

Re: Flatten function limit on large nested JSON array

2015-04-11 Thread Jason Altekruse
Hello Phil, Unfortunately this was a bug that was in flatten all along that ended up being exposed when we fixed another system-wide issue with supporting large lists and very wide strings. I have posted a patch that fixes this issue that is in review, and I want to do a little additional cleanup

Re: Announcing new committer: Hanifi Gunes

2015-04-16 Thread Jason Altekruse
Congrats! Certainly well deserved! On Thu, Apr 16, 2015 at 3:35 PM, Ellen Friedman wrote: > Congrats Hanifi and thanks for all your work > > Ellen > > On Thu, Apr 16, 2015 at 2:29 PM, Jacques Nadeau > wrote: > > > The Apache Drill PMC is very pleased to announce Hanifi Gunes as a new > > commi

Re: Querying fixed-length files

2015-04-20 Thread Jason Altekruse
Just as a bit of explanation for anyone who finds the thread, what is happening here is that the csv parser will read files with no commas in them as a series of records with one value each. This method is a bit of a clever hack, but it will not work if any of your values have commas in them. It is

Re: Documentation for Query Profile page in Web UI

2015-04-21 Thread Jason Altekruse
The attachment for the json profile made it to the list because it is ASCII, but the screenprint was blocked as a binary file. We can take a look at the profile by loading the json into an instance of Drill, but just a reminder about binary attachments for everyone, please upload to a public host a

Hangout happening now!

2015-05-05 Thread Jason Altekruse
If you have some time, join us for our weekly hangout to talk about what is happening in the Drill community, everyone is welcome. Stop in to introduce yourself! https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc

Re: Drill 0.9 slowness/hanging

2015-05-11 Thread Jason Altekruse
You can fix the Web UI slowness issue for now by deleting this jar, it was pulled in as a transitive dependency, but we don't actually need it and it is causing intermittent class-loading conflicts with classes we are actually using for the Web UI. As stated before, the permanent fix is already in

Hangout happening now

2015-05-26 Thread Jason Altekruse
Come join the Drill community as we discuss what has been happening lately and what is in the pipeline. All are welcome, if you know about Drill, want to know more or just want to listen in. https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc

Re: Monitoring long / stuck CTAS

2015-05-28 Thread Jason Altekruse
There should be no upper limit on the size of the tables you can create with Drill. Be advised that Drill does currently operate entirely optimistically in regards to available resources. If a network connection between two drillbits fails during a query, we will not currently re-schedule the work

Re: Monitoring long / stuck CTAS

2015-05-28 Thread Jason Altekruse
ure I am adjusting the correct config, these are heap parameters > within the Drill configure path, not for Hadoop or Zookeeper? > > > > On May 28, 2015, at 12:08 PM, Jason Altekruse > wrote: > > > > There should be no upper limit on the size of the tables you can create

Re: Monitoring long / stuck CTAS

2015-05-28 Thread Jason Altekruse
allocation in the drill-env.sh files, and have to > restart the drill bits. > >> > >> How large is the data set you are working with, and your cluster/nodes? > >> > >> —Andries > >> > >> > >> On May 28, 2015, at 9:17 AM, Matt

Re: Drill logical plan optimization

2015-05-28 Thread Jason Altekruse
Currently Drill does not allow submission of logical plans. I think that the web interface is out of date and claims you can submit a logical plan, but this is not correct. We would like to allow for modification of plans at the logical level, but we just haven't implemented the feature currently.

Re: Drill logical plan optimization

2015-05-28 Thread Jason Altekruse
arameter of the submit_plan script? Is > this broken as well? > > -- > Piotr Sokólski > > > On Friday 29 May 2015 at 00:29, Jason Altekruse wrote: > > > Currently Drill does not allow submission of logical plans. I think that > > the web interface is out of dat

Re: convert unix timestamp

2015-05-29 Thread Jason Altekruse
Tried taking a look at the function to see what the issue was, from_unixtime is actually a Hive UDF that is just on the classpath and available by default in Drill. It does look like Andries said that it is returning var16char, which might be a bug. The fact that it is trying to cast to bigInt desp

Re: dfs.local local filesystem created in multiple nodes and read from single node

2015-06-01 Thread Jason Altekruse
It sounds like we should not have written to the filesystem if we were not connected to a single host or a distributed filesystem. The problem is that the files we wrote will not be associated together the way they would be in a single filesystem (even a distributed one that would have a common nam

Re: MapR Drill - mongodb collections does not show up

2015-06-01 Thread Jason Altekruse
Hi Mano, Unfortunately the apache mail lists aren't very good with attachments, can you upload it to a public host and share a link? - Jason Altekruse On Mon, Jun 1, 2015 at 3:01 PM, Rangaswamy, Manoharan wrote: > Hi Hanifi, > > As you can see in the log file, I am unable to

Reminder: Weekly hangout tomorrow(Tuesday) 10am Pacific

2015-06-01 Thread Jason Altekruse
/_/event/ci4rdiju8bv04a64efj5fedd0lc - Jason Altekruse

[Discuss] Hive - Smallint and Tinyint

2015-06-08 Thread Jason Altekruse
Hello Drillers, I have been working on DRILL-3209, which aims to speed up reading from hive tables by re-planning them as native Drill reads in the case where the tables are backed by files that have available native readers. This will begin with parquet and delimited text files. To provide the s

Re: from_unixtime in drill explorer/ODBC

2015-06-09 Thread Jason Altekruse
This is pretty well implied with Christopher's message, but Drill ships with a Hive storage plugin which puts Hive jars on the default class path. Just as with Drill native UDFs we pick up the default Hive functions in these jars and register them. Another one that was causing some issues was the H

Re: Query on setting up Apache Drill and nested query for json file

2015-06-09 Thread Jason Altekruse
Hi Rob, Thanks for putting so much effort into getting Drill set up for your use case, we know that there are still some sharp edges in Drill and detailed information about use cases that are hard to set up help us to improve the docs and core project. As a quick answer, I think you might have ru

Re: What is the best way to use Apache-drill with rails app?

2015-06-15 Thread Jason Altekruse
If you are not going to be reading a lot of data, in terms of final results (i.e. your app will consume filtered and/or aggregated results), the REST API should serve your purposes. For better throughput the JDBC and ODBC interfaces will be your best bet. Please note that the ODBC driver is not a p

Re: Exception thrown when float field has integer precedent

2015-06-18 Thread Jason Altekruse
Be aware, this will work if you turn on all_text_mode, but obviously it will have some overhead reading as varchar and casting numeric types as strings. If you turn on read_numbers_as_double this will also "work", but be aware we are not using these cast statements as hints about how to do the scan

Re: Reposting the Error on Flattening

2015-06-19 Thread Jason Altekruse
The patch is currently in review, I don't think that it is going to necessarily fix this issue. I have been looking into issues with flatten and I just opened a new JIRA that I think will actually address your issue. This is a little bit of a low level issue with how the flatten is currently bei

Re: Reposting the Error on Flattening

2015-06-19 Thread Jason Altekruse
estation of the original problem? > > -Hanifi > > On Fri, Jun 19, 2015 at 10:17 AM, Jason Altekruse < > altekruseja...@gmail.com> > wrote: > > > The patch is currently in review, I don't think that it is going to > > necessarily fix this issue. I am have bee

Re: Cannot start drillbit

2015-06-23 Thread Jason Altekruse
If you have started Drill previously and there is already a configuration stored in zookeeper, we will not pick up the bootstrap-storage-plugins.json file upon starting Drill. This is only for the first time starting it. You can modify the entry in zookeeper yourself by uploading the json file for t

Hangout starting a little late this morning, should start around 10:10

2015-06-23 Thread Jason Altekruse
Join us at our weekly hangout to discuss what has been happening in the Drill community! https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc

Re: Hangout starting a little late this morning, should start around 10:10

2015-06-23 Thread Jason Altekruse
That is pacific time, the meeting will start in about 10 minutes. On Tue, Jun 23, 2015 at 9:59 AM, Jason Altekruse wrote: > Join us at our weekly hangout to discuss what has been happening in the > Drill community! > > https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc >

Re: Cannot start drillbit

2015-06-23 Thread Jason Altekruse
You're welcome, let us know if it works for you. On Tue, Jun 23, 2015 at 10:41 AM, Ganesh Muthuraman wrote: > Thanks Jason. That helps. > Thanks,G > > > From: altekruseja...@gmail.com > > Date: Tue, 23 Jun 2015 09:14:11 -0700 > > Subject: Re: Cannot start drillbit > > To: user@drill.apache.org >

Re: Quotes not being recognized in tab delimited (tsv) files

2015-06-26 Thread Jason Altekruse
This is a reasonable hack for some cases, but I'm pretty sure this is going to break the most common purpose of having quotes at all. If you put the delimiter (tab) between quotes you are going to have it splitting on those characters where it shouldn't be. There is also the issue that the quotes

Re: drill 1.0.0 and hive 0.13 with mapr security

2015-06-30 Thread Jason Altekruse
Venki can give more specifics on the status, but Hive impersonation was implemented in 1.1.0, which should be going out for a vote soon. I think there might have been some limitations on the scope, but I know we have verified the functionality with several security models. On Tue, Jun 30, 2015 at

Re: drill 1.0.0 and hive 0.13 with mapr security

2015-06-30 Thread Jason Altekruse
zalez wrote: > That's great! Thanks. Presumably I could pull down a nightly and try it > out? Will drill still be expecting hive .13? > > On Tuesday, June 30, 2015, Jason Altekruse > wrote: > > > Venki can give more specifics on the status, but Hive impersonation was >

Hangout starting in 30 minutes

2015-06-30 Thread Jason Altekruse
Come join the Apache Drill hangout to find out what is new in the upcoming 1.1 release. Anyone with an interest in Drill is welcome to attend. https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc

Apache Drill Hangout Information

2015-06-30 Thread Jason Altekruse
Hello Drillers, I have created a Google spreadsheet to track leaders of the weekly hangout (Tuesday at 10am Pacific time) to make sure we always have someone able to attend the meeting and facilitate discussion. Commitment is pretty low, for anyone who has attended the hangout it should be easy to

Re: Custom Functions

2015-07-01 Thread Jason Altekruse
If the general idea of what you are trying to accomplish with this function is not private, it might be useful to ask about your use case more generally. Although we are still working to integrate the JDBC plugin into the Drill mainline and it still requires a thorough testing cycle, this might be

Re: (noob) performance of queries against csv files

2015-07-02 Thread Jason Altekruse
Just one additional note here, I would strongly advise against converting csv files using a select * query out of a csv. The reason for this is two-fold. Currently we read csv files into a list of varchars, rather than individual columns. While parquet supports lists and we will read them, the rea

Re: (noob) performance of queries against csv files

2015-07-02 Thread Jason Altekruse
, Larry White wrote: > so the solution is to use select, but with columns specifically defined. is > that right? > > On Thu, Jul 2, 2015 at 4:48 PM, Jason Altekruse > wrote: > > > Just one additional note here, I would strongly advise against converting > > csv files using

Re: Querying parquet files

2015-07-07 Thread Jason Altekruse
While the format is columnar and we are taking advantage of certain aspects of the layout, we do not split the read between columns, but instead by the block abstraction in Parquet which they call Row Groups. Each of these blocks will contain data from each column, forming a complete set of rows.

Re: Strange Error when Querying

2015-07-08 Thread Jason Altekruse
It certainly does look like an issue with encoding, but you can see in his code the query he is trying to run. There are not unicode characters that I can see. It is possible that this is getting corrupted somehow in the ODBC driver. Please file a JIRA with this case, I don't have a suggestion for

Re: Drill data storage plugin for Azure blob

2015-07-08 Thread Jason Altekruse
I'm not very experienced with configuring the various filesystems that implement the HDFS API, but there is not a need for an Azure specific plugin. The blob storage exposes the HDFS API, similar to S3 and other storage systems. If you can get the hadoop client to run an 'ls' or other filesystem co

Re: Optimizing S3 access for Drill using Parquet files

2015-07-14 Thread Jason Altekruse
I am not aware of anyone doing something like this today, but it seems like something best handled outside of Drill right now. Drill considers itself essentially stateless, we do not manage indexes, table constraints or caching data for any of our current storage systems. There was some work being

Re: Recursive CTE Support in Drill

2015-07-16 Thread Jason Altekruse
@Alexander If you want to test the speed of the ODBC driver you can do that without a new storage plugin. If you get the entire dataset into memory, it will be returned from Drill as quickly as we can possibly send it to the client. One way to do this is to insert a sort; we cannot send along any o

Re: Recursive CTE Support in Drill

2015-07-17 Thread Jason Altekruse
give you a billion records with negligible I/o. > > > > Sent from my iPhone > > > > > On Jul 16, 2015, at 15:43, Jason Altekruse > > wrote: > > > > > > @Alexander If you want to test the speed of the ODBC driver you can do > > that > > > with

Re: Foreman Parallelizer not working with compressed csv file?

2015-07-23 Thread Jason Altekruse
I could be wrong, but I believe that gzip is not a compression that can be split, you must read and decompress the file from start to end. In this case we can not parallelize the read. This stackoverflow article mentions bzip2 as an alternative compression used by hadoop to solve this problem and a

Re: Foreman Parallelizer not working with compressed csv file?

2015-07-23 Thread Jason Altekruse
g and processing it. On Thu, Jul 23, 2015 at 12:08 PM, Juergen Kneissl wrote: > Hi Jason, > > On 07/23/15 18:53, Jason Altekruse wrote: > > I could be wrong, but I believe that gzip is not a compression that can > be > > split, you must read and decompress the file from start to

Re: storage structure - querying directories - sanity check and UDF assistance

2015-07-24 Thread Jason Altekruse
I'm not sure, it is possible that it is being evaluated during planning to prune the scan, but the filter above the scan is not being removed as it should be. I'll try to re-create it the case to take a look. Stefan, Earlier you had mentioned that it was not only inefficient, but it was also givin

Re: storage structure - querying directories - sanity check and UDF assistance

2015-07-24 Thread Jason Altekruse
directory" (hope that makes sense). > > I'll let you know when the code is in a good-enough state and I have pushed > it to github. > > Thanks for all the help guys, it's appreciated. > > Regards, > -Stefan > > > > On Fri, Jul 24, 2015 at 8:46 PM, J

Re: storage structure - querying directories - sanity check and UDF assistance

2015-07-24 Thread Jason Altekruse
This is actually a known issue, constant folding is not working in the select clause because of a costing problem. Constant folding currently only works in the where clause. https://issues.apache.org/jira/browse/DRILL-2218 On Fri, Jul 24, 2015 at 4:13 PM, Ted Dunning wrote: > I think that

Re: geeting schema from a csv file in json response instead of columns array

2015-07-26 Thread Jason Altekruse
Hafiz, One other note that will likely help you with your use case. Drill allows you to skip reading the header row in a csv file. Without this feature configured you will likely get number or date format exceptions trying to cast your csv data to particular data types, as your column names will b

Re: Querying partitioned Parquet files

2015-07-29 Thread Jason Altekruse
Put a little more simply, the node that we end up planning the query on is going to enumerate the files we will be reading in the query so that we can assign work to given nodes. This currently assumes we are going to know at planning time (on the single node) all of the files to be read. This happ

Re: Querying partitioned Parquet files

2015-07-30 Thread Jason Altekruse
t; > Just to clarify this, Jason - you don't necessarily need HDFS or the like > > for this, if you had say a NFS volume (for example, Amazon Elastic File > > System), you can still accomplish it, right? Or merely if you had all > > files duplicated on every node locally.

Re: querying a specific time period

2015-07-31 Thread Jason Altekruse
You also could use the date-part function. http://drill.apache.org/docs/date-time-functions-and-arithmetic/#date_part-syntax On Fri, Jul 31, 2015 at 9:47 AM, Jacques Nadeau wrote: > I would think you could cast to time and then provide a time boundary. > > I don't remember the exact syntax but

Re: pending queries jamming the system

2015-08-03 Thread Jason Altekruse
We are going to have a lot of users with less perseverance and black box debugging skills than you have been showing in your evaluation of Drill. I would not consider this a stupid user issue at all, we need to be clear about the state of the system to users. If you have some time to record how you

Re: Resetting an option

2015-08-10 Thread Jason Altekruse
I don't know if I missed something, but the Postgres docs seem to indicate that there is no equivalent to the concept of a SYSTEM option that exists in Drill, which can be set with a query. Options can be set at server startup, either in a configuration file or with a command line parameter [2]. On

Re: Need help in querying HDFS from drill

2015-08-20 Thread Jason Altekruse
If files are available through the HDFS API, which includes remote reads, Drill is able to read the files. A good use case for Drill is actually installing on a subset of your nodes to save the overhead of running the server everywhere while still being able to query all of your data. I have not se

Re: Case Sensitivity: LIKE

2015-08-25 Thread Jason Altekruse
Drill supports ILIKE for case insensitive matching. Be aware that it is treated like a regular function, as Steven notes here: https://issues.apache.org/jira/browse/DRILL-3301 This doc page should be changed to include it, I'll open a pull request. https://drill.apache.org/docs/operators On Tue,

Re: xml files with Drill

2015-09-04 Thread Jason Altekruse
One of the times this came up I asked about what the requirements would be, because pure XML is actually not well suited for placing in a standard SQL table, and some of the constructs are even hard to map into the JSON/protobuf model we are currently using for complex data in Drill. I actually do

Re: Naming directories

2015-09-08 Thread Jason Altekruse
One thing you can do to speed up the expression evaluation is to use this expression instead of regex_replace. This will avoid copying each value into a short lived String object which unfortunately is the only interface available on the java regex library we are using within the function. We shoul

Re: Naming directories

2015-09-08 Thread Jason Altekruse
ira/browse/DRILL-1441 On Tue, Sep 8, 2015 at 11:22 AM, Jason Altekruse wrote: > One thing you can do to speed up the expression evaluation is to use this > expression instead of regex_replace. This will avoid copying each value > into a short lived String object which unfortunately is

Re: NullPointers in type conversions

2015-09-10 Thread Jason Altekruse
A SQL level null is different than a null at the Java level that would be giving this exception (we don't represent nulls with an actual null Java object). There might be a way to work around it, but this is a bug in Drill. You should be able to make a cast between compatible types even if there ar

Re: repeated_contains - intended behaviour?

2015-09-23 Thread Jason Altekruse
I think it is reasonable to consider that a bug. We should implement the function both as it works today and as you were originally expecting it. Any ideas about a good naming scheme for the two? Unfortunately the regular contains() method does substring matching, but I think the name repeat

Re: NullPointers in type conversions

2015-09-23 Thread Jason Altekruse
es > >> >>> determined > >> >>>> the error to be invalid. Is trying to cast an empty string, or null > >> value > >> >>>> to an integer invalid? What's the workaround? > >> >>>> > >> >>
