Re: We're now a TLP

2014-12-02 Thread Ted Dunning
This change in names will probably lead eventually to confusion with people trying to change their subscription status based on the old mailing list names. Be patient when it comes. On Tue, Dec 2, 2014 at 9:15 AM, Jacques Nadeau wrote: > I forgot to mention that the emails have changed (now ev

Re: I want to subscribe to Drill users list

2014-12-08 Thread Ted Dunning
Follow the directions found here: http://drill.apache.org/community/#mailinglists On Sat, Dec 6, 2014 at 7:08 PM, Ajay wrote: > Hi, > > I want to subscribe to Drill users list > > - > Ajay >

Re: I want to subscribe to Drill users list

2014-12-09 Thread Ted Dunning
ran wrote: > > > You will need to email user-subscr...@drill.apache.org > > > > The website is showing the old address (incubator) as there is currently > an > > Apache infrastructure issue preventing us from updating the site. > > > > On Mon, Dec 8, 2014 at

Re: drill rest interface

2014-12-10 Thread Ted Dunning
Carol, I think that you need to put links to images in apache emails. Attachments are typically stripped for security reasons. On Tue, Dec 9, 2014 at 1:46 PM, Carol McDonald wrote: > Here is an example using the drill sandbox > > curl \ > > --header "Content-type: application/json" \ > > --r

Re: Looking For Drill REST API Doc.

2014-12-11 Thread Ted Dunning
I just added this to the wiki. On Wed, Dec 10, 2014 at 9:48 PM, Neeraja Rentachintala < nrentachint...@maprtech.com> wrote: > Refer to below > https://issues.apache.org/jira/browse/DRILL-77 > The google doc in this JIRA has info on the REST APIs. > -Neeraja > > On Wed, Dec 10, 2014 at 9:42 PM,

Re: String for HBase row key

2014-12-17 Thread Ted Dunning
I think that Carol (version 2) meant to say that a variety of types can be pushed down into the HBase key comparison. The proviso is that the serialization of the key has to be the same as the serialization used by Drill. On Wed, Dec 17, 2014 at 8:37 AM, Carol McDonald wrote: > > can drill

Re: CLI/REST To Register a Workspace?

2014-12-29 Thread Ted Dunning
On Mon, Dec 29, 2014 at 1:56 PM, Chad Smykay wrote: > What I wanted to do know is there was a way to create a work space via CLI > or REST calls within Drill? Do you mean you want to update the configuration with a definition of a workspace?

Re: json data format

2014-12-31 Thread Ted Dunning
On Wed, Dec 31, 2014 at 10:22 AM, Sungwook Yoon wrote: > What will be the right approach here? > Use JSON objects on each line. > Is there a way to produce a legit json file and Drill sees separate objects > in the list of the top level json? > You can read the single record and flatten it.
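
The first suggestion above (one JSON object per line) can be sketched as a small pre-processing step. This is only an illustration, not part of Drill: the function name `array_to_json_lines` is hypothetical, and it assumes the whole input file is a single top-level JSON array that fits in memory.

```python
import json


def array_to_json_lines(text):
    """Turn a top-level JSON array into one JSON object per line."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # One compact object per line; sort_keys keeps the output stable.
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


if __name__ == "__main__":
    print(array_to_json_lines('[{"a": 1}, {"a": 2}]'))
```

The alternative mentioned in the reply, reading the single record and flattening it, needs no pre-processing but treats the whole array as one record first.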

Re: question about JSON query

2015-01-01 Thread Ted Dunning
Flatten is the right way to destructure lists which is what you have. The proper use of kvgen is for destructuring objects. For instance, with OpenTSDB you can retrieve rows as maps. Since the column names are the keys for this map, kvgen will pull it apart very nicely. On Thu, Jan 1, 2015 at
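
The distinction between the two functions can be mimicked in a few lines of Python. This is a sketch of the behaviour described above, not Drill's implementation; the names `kvgen` and `flatten` are reused here only for readability, and the sample row is made up.

```python
def kvgen(mapping):
    """Destructure an object (map) into a list of key/value pairs."""
    return [{"key": k, "value": v} for k, v in mapping.items()]


def flatten(row, column):
    """Destructure a list: emit one output row per element of row[column]."""
    for item in row[column]:
        out = dict(row)
        out[column] = item
        yield out


if __name__ == "__main__":
    # Hypothetical OpenTSDB-style row: column names are time offsets.
    row = {"rowkey": "r1", "t": {"0": 1.5, "60": 1.7}}
    pairs = {"rowkey": row["rowkey"], "t": kvgen(row["t"])}
    for out in flatten(pairs, "t"):
        print(out)
```

Applying kvgen first and flatten second gives one row per original column, which is the pattern the OpenTSDB remark points at.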

Re: Drill & PHP

2015-01-01 Thread Ted Dunning
Yes. You can use the REST interface which only requires that you be able to formulate your query in a JSON object and send it to a Drill process from the PHP server. On Thu, Jan 1, 2015 at 3:38 PM, Sharhabeel Hamdan < sharhabeel.ham...@gmail.com> wrote: > Hello Great Team :) > > I would like t
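
A minimal sketch of what "formulate your query in a JSON object and send it" can look like, written in Python for illustration; the same request can be built from PHP. The host, port 8047, the `/query.json` path, and the `queryType`/`query` field names are assumptions based on Drill's documented REST interface and should be checked against the version in use. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: a drillbit's web server is reachable at localhost:8047.
DRILL_URL = "http://localhost:8047/query.json"


def build_query_request(sql, url=DRILL_URL):
    """Wrap a SQL string in the JSON object that is POSTed to Drill."""
    payload = {"queryType": "SQL", "query": sql}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, url=DRILL_URL):
    """Send the query and return the decoded JSON response."""
    with urllib.request.urlopen(build_query_request(sql, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `run_query("SELECT * FROM sys.version")` would need a running drillbit; `build_query_request` alone can be inspected without one.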

Re: How to contribute to Drill

2015-01-08 Thread Ted Dunning
On Thu, Jan 8, 2015 at 8:43 PM, Maisnam Ns wrote: > 1. Is it fine with other Drill contributors that if I find an unassigned > bug , can I assign it to myself and start working on it or do I need to > send a specific mail to any specific person if so to whom ? > Don't worry about assigning the b

Re: question about JSON query

2015-01-09 Thread Ted Dunning
Great example. This also comes up in open TSDB where column names are time offsets within a window. Reading data from HBase or MapR DB gives you a map and having kvgen makes everything slick as a whistle. On Fri, Jan 9, 2015 at 12:10 PM, Jason Altekruse wrote: > I believe that Jim may have giv

Re: Varying Execution Times For The Same Query On The Same File

2015-01-16 Thread Ted Dunning
If you do want to have more parallelism, use several input files. On Fri, Jan 16, 2015 at 9:13 AM, Jason Altekruse wrote: > I do not think we currently consider JSON files splittable. If we do treat > them as such, it would depend on the file size and the available read > locality available on

Re: Drill Specific Performance Monitoring Utilities

2015-01-16 Thread Ted Dunning
I thought that the diagnostics were emitted as they were produced (recent change). But query by query performance monitoring would be a very cool thing. And the swim-lane visualization that Jacques showed the other day was fabulous. On Fri, Jan 16, 2015 at 1:14 PM, Jason Altekruse wrote: > We

Re: Varying Execution Times For The Same Query On The Same File

2015-01-17 Thread Ted Dunning
On Fri, Jan 16, 2015 at 6:25 PM, George Chow wrote: > Are you saying that Drill will serialize one file to one DrillBit? > For unsplittable files, yes.

Re: Drill Specific Performance Monitoring Utilities

2015-01-17 Thread Ted Dunning
Moderately nice I would say. It might be nice to see records per second or memory usage. Not sure how those would be collected from per query diagnostics. On Sat, Jan 17, 2015 at 7:28 AM, Jim Scott wrote: > How useful would it be if there was an OpenTSDB tcollector for drill > queries? > > O

Re: String for HBase row key

2015-01-20 Thread Ted Dunning
What about filter pushdown in these cases? I know that some filter ops push down through convert calls. What about through byte_substr? On Tue, Jan 20, 2015 at 12:39 PM, Jacques Nadeau wrote: > I believe there is byte_substr (or similar) which you could use before > handing the value to conv

Re: JSON file size vs number of files

2015-02-03 Thread Ted Dunning
Finding 50K files given only a directory name is unlikely to ever be efficient. Reading small files is also unlikely unless the contents are linearized (huge luck if so, unlikely to always work). Caching the contents of recursive directory structures would make things go faster, but it is easy to

Re: Drill & Adjunct Data Warehouse

2015-02-13 Thread Ted Dunning
Drill definitely can serve as a database virtualization layer. Calcite was used this way when it was just Optiq and Drill provides interesting additional capabilities. The emerging view of user needs seems to be tilting more towards the semi-structured data capabilities of Drill rather than the v

Re: best way to query hbase dynamic columns

2015-02-20 Thread Ted Dunning
Would kvgen work on t.price? On Thu, Feb 19, 2015 at 12:59 PM, Carol McDonald wrote: > What is the best way to query an hbase table that has dynamic column names > ? For example this table is similar to the opentsdb table, the rowkey is a > stocksymbol followed by the date and hour , the Price

Re: Storage Plugin Config for XML

2015-02-24 Thread Ted Dunning
To help with this, I just added some pretty ratty capability to log-synth to generate XML data. You might try generating some data for the customer and then they can point and say "more like this" and "less like that". This will give you pretty realistic sample data without security issues. On

inverse of kvgen and flatten?

2015-02-27 Thread Ted Dunning
I was just looking through the documentation and I don't see a way to group data and then create a list. Flatten turns a list into individual records. I would like to turn some fields from a grouped set of records into a list of objects or a list of values.

Re: Using Drill with EMR

2015-02-27 Thread Ted Dunning
On Wed, Feb 25, 2015 at 8:17 PM, Mihai Stoicescu wrote: > Hello, > > My name is Mihai Stoicescu and I am trying to experiment with Apache > Drill. > > I have multiple questions that I hope you can help me find the answers: > >1. Can Drill & Zookeper work outside Hadoop environment? >

Re: UTF coding in JSON docs

2015-03-02 Thread Ted Dunning
The right solution is to go into the JSON format and somehow let character encoding be defined there. On Tue, Mar 3, 2015 at 3:23 AM, Andries Engelbrecht < aengelbre...@maprtech.com> wrote: > How can I convert JSON data with various characters in a text field to a > more usable UTF8 encoding? (

Re: UTF coding in JSON docs

2015-03-09 Thread Ted Dunning
This is dangerous if the source file does not contain UTF-8 but you treat it as if it does. The problem is that characters with high bit set will be treated as leading characters of a multi-byte UTF-8 character encoding. On Mon, Mar 9, 2015 at 11:51 AM, Andries Engelbrecht < aengelbre...@maprtech
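
The failure mode described above is easy to reproduce outside Drill. The sketch below uses only Python's built-in codecs; the sample word is made up. The byte 0xE9 is "é" in Latin-1, but a UTF-8 decoder reads it as the lead byte of a three-byte sequence.

```python
def decode_strict(data):
    """Return the UTF-8 decoding of data, or None if it is not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    raw = "café".encode("latin-1")  # last byte is 0xE9, high bit set
    print(decode_strict(raw))  # strict decoding rejects the bytes
    print(raw.decode("utf-8", errors="replace"))  # lossy: é becomes U+FFFD
    print(raw.decode("latin-1"))  # correct only when the real encoding is known
```

Strict decoding at least fails loudly; the lossy mode silently replaces the character, which is the kind of damage the reply warns about.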

Re: Drill file encondig

2015-03-12 Thread Ted Dunning
That would be an excellent enhancement! On Thu, Mar 12, 2015 at 1:08 PM, Steven Phillips wrote: > Drill doesn't support encodings besides utf-8 for the text reader, I > believe. > > On Thu, Mar 12, 2015 at 11:07 AM, Andries Engelbrecht < > aengelbre...@maprtech.com> wrote: > > > Any place to set

Re: Convert UTC to specific timezone?

2015-03-31 Thread Ted Dunning
The original poster wasn't very clear. What they said could mean what Andries provided (which is to determine which timezone that data refers to). The way that I read the question was that they wanted to translate times to be represented as the string formatted version of the same time in a diffe

Re: Convert UTC to specific timezone?

2015-03-31 Thread Ted Dunning
es > > > > > > > > On Mar 31, 2015, at 5:57 AM, Christopher Matta wrote: > > > > > Ted's correct, it would be nice to be able to convert the UTC datetime > > > column to whichever timezone I'm interested in, say 'America/New_York'

Re: Report issues with sensitive data

2015-04-01 Thread Ted Dunning
One idea is to post a log-synth [1] schema that generates data the same shape as your real data. If you can generate fake data that causes the same problem you give developers a huge head start in solving your problem. For the record, are you using the recently announced 0.8 version of Drill? [

Re: Counting large numbers of unique values

2015-04-07 Thread Ted Dunning
How precise do your counts need to be? Can you accept a fraction of a percent statistical error? On Tue, Apr 7, 2015 at 8:11 AM, Aman Sinha wrote: > Drill already does most of this type of transformation. If you do an > 'EXPLAIN PLAN FOR ' > you will see that it first does a grouping on the

Re: Counting large numbers of unique values

2015-04-07 Thread Ted Dunning
On Tue, Apr 7, 2015 at 9:19 AM, Marcin Karpinski wrote: > @ Ted, ideally, I'd like to get exact results, but in case of real > problems, we could perhaps settle on approximate counting. Is there already > such a functionality in Drill? > No. But it is very easy to incorporate existing libraries

Re: Drill to query Client-side encrypted data from S3

2015-04-07 Thread Ted Dunning
Yes. You can integrate the decryption code into a UDF that operates on the elements. On Tue, Apr 7, 2015 at 2:41 PM, Ganesha Muthuraman wrote: > I am trying to use Drill to read from Amazon S3 where the data is > Client-side encrypted, meaning the keys to decrypt the data are custom > controlle

Re: Counting large numbers of unique values

2015-04-07 Thread Ted Dunning
function but haven't gotten to it yet. As Ted mentions, using > > this technique would substantially reduce data shuffling and could be > done > > with a moderate level of effort since our UDAF interface is pluggable. > > > > > > > > On Tue,

Re: Drill to query Client-side encrypted data from S3

2015-04-07 Thread Ted Dunning
Ahh... There is no magic that will handle decryption that you can plug into (at this time). On Tue, Apr 7, 2015 at 3:02 PM, Ganesha Muthuraman wrote: > The situation is this: > There is client side encrypted data on S3. There is an EMR cluster that > uses this as EMRFS. The EMR client reaches

Re: Drill to query Client-side encrypted data from S3

2015-04-07 Thread Ted Dunning
Looking at the link that you provided, it appears that you are encrypting entire data files. That probably makes it better to implement this as a layer in the file access path. Drill doesn't do this just now, but it would be relatively easy to add, I think. On Tue, Apr 7, 2015 at 3:26 PM

Re: Counting large numbers of unique values

2015-04-11 Thread Ted Dunning
to 2 complete code examples of UDFs: >> >> >> https://cwiki.apache.org/confluence/display/DRILL/Custom+Function+Interfaces >> >>> On Thu, Apr 9, 2015 at 9:41 PM, Ted Dunning wrote: >>> >>> On Thu, Apr 9, 2015 at 1:38 AM, Adam Gilmore >>&

Re: Accessing multiple hadoop clusters

2015-04-15 Thread Ted Dunning
Jacques, It sounds like you are implying that the Drill cluster could span both MapR clusters. I think that is true. But I also think that most practical situations, as well as what Jim was asking about, is a case which will have all the drillbits in question next to one cluster and accessing a

Re: Querying OpenTDSB data stored in HBase

2015-04-21 Thread Ted Dunning
If you blow apart the data as a list, you have scanned the data. Flattening from there will give you sample per row representation. There won't be any pushdown of filtering into the UDF, but this should be really, really fast anyway. On Tue, Apr 21, 2015 at 12:24 PM, Christopher Matta wrote: >

Re: New Drillbits joining cluster causes severe performance spike

2015-04-22 Thread Ted Dunning
Adam, There has been some auto-scaling experimentation done outside the list in which drillbits stay alive, but don't accept work and don't allocate memory until they are needed. That avoids startup transients for the most part. This scaling work is still quite immature, but I will encourage tho

Re: Memory error

2015-04-22 Thread Ted Dunning
You have allocated 4GB to Java's heap and the rest of the 4GB RAM (i.e. zero) you have allocated to data storage. Try 1) running on a larger machine. Having >8G memory will make these worries go away. 2) decreasing memory requirements. Here is one possibility that may or may not work out well:

Re: Drill & approximate query

2015-04-23 Thread Ted Dunning
Uli, I think that the current plans include approximate operators for some aggregations, but not anything on the level, say, BlinkDB. That said, Drill's optimizer could easily have rules that allow you to explicitly down-sample data to different degrees and then have queries choose between versio

Re: Design documents

2015-04-28 Thread Ted Dunning
I don't think that such a diagram exists. There are a number of slide shows around and there is some discussion of the architecture of Drill. Understanding how the system converts SQL to a logical plan, to a physical plan and then to an execution plan is the first step of understanding. Can you

Re: Drill In the Enterprise

2015-04-28 Thread Ted Dunning
There is a JIRA open right now, I believe to support impersonation. With this, a single Drill bit can function on behalf of multiple users, adopting the permissions of each user so that the file system can enforce security constraints. There is also research work going on to experiment with ways

Re: Drill In the Enterprise

2015-04-29 Thread Ted Dunning
Likely the answer would be to have separate drill bits to correspond to each separate scheduling policy. You still need impersonation but whoever handles resource allocation can determine which set of drill bits would be allowed to be rehydrated at any given moment. Sent from my iPhone > On

Re: Questions on drill execution

2015-05-04 Thread Ted Dunning
Is that video linked from the web site? Looks like a really great resource. On Mon, May 4, 2015 at 6:23 PM, Hao Zhu wrote: > https://www.youtube.com/watch?v=kG6vzsk8T7E > This presentation contains lots of details for query execution plan. > > On Mon, May 4, 2015 at 7:10 AM, Carol McDonald >

Re: Questions on drill execution

2015-05-05 Thread Ted Dunning
Cool! On Tue, May 5, 2015 at 3:39 AM, Tomer Shiran wrote: > http://drill.apache.org/docs/drill-introduction/#videos > > > > > On May 4, 2015, at 3:10 PM, Ted Dunning wrote: > > > > Is that video linked from the web site? Looks like a really great > resourc

Re: How to deploy Drill to achieve optimal performance

2015-05-05 Thread Ted Dunning
George, That sounds much too slow. Can you provide some samples of the data and queries? How about actual data counts? Millions? Hundreds of millions? On Tue, May 5, 2015 at 8:54 AM, George Lu wrote: > Hi all, > > These days, I am trying Drill to see whether Drill fits the realtime/nea

Re: Query planning cost

2015-05-07 Thread Ted Dunning
On Fri, May 8, 2015 at 12:30 AM, Adam Gilmore wrote: > We're getting about a 350ms delay for 70 files, about 200ms for 35 files, > about 20-30ms for 1 file. > That is impressively linear. 25ms + files * 4.7 with only 5-10ms error. R^2 = 0.997
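
The fit quoted above can be checked with an ordinary least-squares line through the three reported points. This is a sketch of the arithmetic, not necessarily how the original figures were produced; it assumes 25 ms (the midpoint of the quoted 20-30 ms) for the single-file case, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    files = [1, 35, 70]
    delay_ms = [25, 200, 350]  # 25 is the assumed midpoint of "20-30ms"
    slope, intercept, r_squared = fit_line(files, delay_ms)
    print(slope, intercept, r_squared)
    for x, y in zip(files, delay_ms):
        print(x, y - (intercept + slope * x))  # residual per point
```

With these inputs the slope comes out near 4.7 ms per file, the intercept near 25 ms, and R^2 near 0.997, with residuals of roughly 5 to 10 ms, in line with the figures in the reply.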

Re: Auto-splitting delimitted files

2015-05-20 Thread Ted Dunning
Drill loses locality information on anything but an HDFS oriented file system. That might be part of what you are observing. Having pre-split files should allow parallelism. Can you describe your experiments in more detail? Also, what specifically do you mean by CFS and GFS? Ceph and Gluster?

Re: Auto-splitting delimitted files

2015-05-21 Thread Ted Dunning
it > the files into smaller chunks we would gain increasing efficiency. However, > this doesn't appear to be the case as we're not really getting much > improvement beyond the 30 minute range despite increasing parallelization > by adding additional drill bits and file partit

Re: To EMRFS or not to EMRFS?

2015-05-22 Thread Ted Dunning
The variation will have less to do with Drill (which can read all these options such as EMR resident MapR FS or HDFS or persistent MapR FS or HDFS or S3). The biggest differences will have to do with whether your clusters providing storage are permanent or ephemeral. If they are ephemeral, you ca

Re: [newbie]: how to query HDFS

2015-05-22 Thread Ted Dunning
As a special case, with MapR, you can access all clusters in an administrative group by making sure that you have /mapr/ at the beginning of your path names. This means that you can simply use different workspaces, or a workspace with a path consisting only of /mapr and still access files and tabl

Re: [newbie]: how to query HDFS

2015-05-22 Thread Ted Dunning
I should have added that there is nothing wrong with the dual plugin approach on MapR. Works fine and it is up to you as a matter of personal choice which is better. On Fri, May 22, 2015 at 4:46 PM, Ted Dunning wrote: > > As a special case, with MapR, you can access all clusters

Re: Custom UDFS slow

2015-05-26 Thread Ted Dunning
On Tue, May 26, 2015 at 7:26 PM, Adam Gilmore wrote: > The code for the WEEK() function is not far from the code from the source > for the EXTRACT(DAY) function. Furthermore, even if I copy the exact code > for the EXTRACT(DAY) function into that, it has the same performance > detriments. > > My

Re: what's the differenct between drill and optiq

2015-05-27 Thread Ted Dunning
Andrew, What Hive does not have is the extensions that Drill has that allow SQL to be type flexible. The ALL type and all of the implications both in terms of implementation and user impact it has are a really big deal. On Wed, May 27, 2015 at 6:08 AM, Andrew Brust < andrew.br...@bluebadgeinsi

Re: what's the differenct between drill and optiq

2015-05-27 Thread Ted Dunning
queries. Drill executes in a very > flexible distributed columnar fashion with late binding. > > On Wed, May 27, 2015 at 8:34 AM, Ted Dunning > wrote: > > > Andrew, > > > > What Hive does not have is the extensions that Drill has that allow > > SQL to be type fl

Re: what's the differenct between drill and optiq

2015-05-28 Thread Ted Dunning
Answers in-line. On Thu, May 28, 2015 at 8:08 AM, Andrew Brust < andrew.br...@bluebadgeinsights.com> wrote: > Absolutely nothing to apologize for, and the below explanation is very > helpful. > You are too kind. > FWIW, I certainly understood that Hive's use of Calcite offered relatively >

Re: what's the differenct between drill and optiq

2015-05-28 Thread Ted Dunning
While an end user could use Calcite, the most common use is as an embedded > library in a broader system. > > The great news is that the community is working together collaborate on an > amazing shared library and framework. > > -Jacques > > > > On Wed, May 27, 2015 at 10:10

Re: Monitoring long / stuck CTAS

2015-05-29 Thread Ted Dunning
Apologies for the plug, but using MapR FS would help you a lot here. The trick is that you can run an NFS server on every node and mount that server as localhost. The benefits are: 1) the entire cluster appears as a conventional POSIX style file system in addition to being available via HDFS API

Re: Monitoring long / stuck CTAS

2015-05-29 Thread Ted Dunning
t snapshots and mirrors carry over to tables as well. Oh, and it tends to be a lot faster and failure tolerant as well. On Fri, May 29, 2015 at 7:00 AM, Yousef Lasi wrote: > Could you expand on the HBase table integration? How does that work? > > On Fri, May 29, 2015 at 5:55 AM, Ted Dunni

Re: avro in HBase columns?

2015-05-29 Thread Ted Dunning
Additional converters would be a great starter contribution! On Fri, May 29, 2015 at 3:32 PM, andrew wrote: > Hi, > > DRILL-1512 only works for reading Avro files off a filesystem such as > HDFS. As far as I know the convert_from() function only supports JSON at > the moment. > > Hope that hel

Re: Interaction between seperate Drill clusters possible?

2015-06-02 Thread Ted Dunning
Drill will make efforts to execute portions of queries locally, but that doesn't look like a powerful enough mechanism for your use case since S3 isn't really local to anything. Also, as a philosophy, Drill delegates all handling of materialized views to you rather than taking responsibility for i

Re: Interaction between seperate Drill clusters possible?

2015-06-02 Thread Ted Dunning
powerful and makes the mental overhead very small > for the enduser. > > I'll first try your suggestion before I dive into any plugin details > though. Thanks again > > > On Tue, Jun 2, 2015 at 11:17 AM, Ted Dunning > wrote: > >> >> Happy to help.

Re: Query on setting up Apache Drill and nested query for json file

2015-06-09 Thread Ted Dunning
On Tue, Jun 9, 2015 at 11:42 AM, Jason Altekruse wrote: > *We do not currently have a shortcut to read files in the directory where > you launched Drill.* > This has made me grumpy in the past, but I really think that Drill got it right here. The real problem is that with a large parallel progr

Re: What is the best way to use Apache-drill with rails app?

2015-06-14 Thread Ted Dunning
What about using ODBC? Sent from my iPhone > On Jun 14, 2015, at 0:00, Hosang Jeon wrote: > > > Hi everyone. > > My current application is built on top of rails framework and I want to > integrate some parts of the application with Apache-drill. > I could see that there is no gems for that

Re: Any way to use https to access Drill web ui

2015-06-15 Thread Ted Dunning
That is on the short-term roadmap. Related issues include universal access to configuration pages. Both will be addressed shortly. On Mon, Jun 15, 2015 at 12:32 AM, George Lu wrote: > For production usage, I want the data to be encrypted from Drill server and > query client? > > Is Drill sup

Re: timestamp string to epoch time

2015-06-15 Thread Ted Dunning
The better solution would be to make the unix_timestamp function ignore the milliseconds (or round off). That may run into the HIVE versus Drill On Mon, Jun 15, 2015 at 8:48 AM, Christopher Matta wrote: > That's kind of annoying, would it make sense to support casting a timestamp > to an INT?

Re: timestamp string to epoch time

2015-06-16 Thread Ted Dunning
-dd HH:mm:ss.SSS') from `sys`.`version`; > ++-+ > | EXPR$0 | EXPR$1| > ++-+ > | 2015-05-29 08:18:53.0 | 1432912733 | > ++-+ > >>

Re: To EMRFS or not to EMRFS?

2015-06-18 Thread Ted Dunning
On Thu, Jun 18, 2015 at 8:24 AM, Paul Mogren wrote: > Following up. Ted gave sound advice regarding reading S3 vs HDFS, but > didn't address EMRFS specifically. Here is what I have learned. > Great summary. Very useful when people help by feeding back what they have learned.

Re: JSON/Join/Dynamic schema : java.lang.IllegalStateException: Failure while reading vector.

2015-06-22 Thread Ted Dunning
Andries, That sounds like a reasonable suggestion, but the real problem is that it appears that having the field initially and then having the field be missing is OK, but if it is missing first and then present Drill blows a gasket. I think it looks like a bug. Very good and simple demo. On M

Re: Cannot start drillbit

2015-06-23 Thread Ted Dunning
Try using Drill 1.0. Lots of improvements since 0.8. On Tue, Jun 23, 2015 at 7:06 AM, Ganesh Muthuraman wrote: > Hi, > Anybody know what this error is? This is drill 0.8. I am unable to start > Drillbit and cannot go the UI to configure the plugin. I see this in > /var/log/drill/drillbit.out >

Re: Custom Functions

2015-06-23 Thread Ted Dunning
Yes and no. It would be pretty easy to build a Drill function that calls out to Jython code. It should be relatively easy to pass a Python function name in as one of the arguments as well. The issues with this approach are: 1) kinda ugly because you aren't calling your code directly 2) Jython

Re: Custom Functions

2015-06-24 Thread Ted Dunning
Failed to execute goal on project simple-drill-function: Could > not > > resol > > ve dependencies for project > > com.mapr:simple-drill-function:jar:1.0-SNAPSHOT: Cou > > ld not find artifact org.apache.drill.exec:drill-java-exec:jar:1.0.0 in > > central > > (htt

Re: Custom Functions

2015-06-24 Thread Ted Dunning
; > I tried running the Simple Drill function . > > >> > https://github.com/mapr-demos/simple-drill-functions > > >> > > > >> > But then when I am trying to run the package , I am getting below > > error > > >> . > > >>

Re: Custom Functions

2015-06-24 Thread Ted Dunning
tailed error file . > > > > > > > > [WARNING] The POM for org.apache.drill.exec:drill-java-exec:jar:1.0.0 > > is > > > > missing > > > > , no dependency information available > > > > > > > > [ERROR] Failed to execute goal on project si

Re: drill configuration setting - rows overwriting one another

2015-06-25 Thread Ted Dunning
Not an answer, but sqlline lets you log results to a file. You can probably then view the file using a better tool. On Thu, Jun 25, 2015 at 4:12 PM, Jim Scott wrote: > Can anyone point me to a configuration setting (not maxwidth) that can > prevent having the drill output in the CLI from ov

Re: drill configuration setting - rows overwriting one another

2015-06-25 Thread Ted Dunning
Could also be a terminal setting bug. On Thu, Jun 25, 2015 at 9:24 PM, Jacques Nadeau wrote: > Sounds like a bug in sqlline's output format. Try changing the output > format from table to csv to work around this. > On Jun 25, 2015 1:13 PM, "Jim Scott" wrote: > > > Can anyone point me to a co

Re: Custom Functions

2015-06-25 Thread Ted Dunning
ec2-user/simple-drill-functions/src/main/java/com/mapr/drill/ListS >um.java:[9,35] error: package > org.apache.drill.exec.vector does not exist > [ERROR] > /home/ec2-user/simple-drill-functions/src/main/java/com/mapr/drill/ListS >

Re: Custom Functions

2015-06-29 Thread Ted Dunning
On Mon, Jun 29, 2015 at 1:03 PM, Alok Tanna wrote: > I was using latest version of Maven , guess that > was the issue. > What was the version?

Re: Custom Functions

2015-06-29 Thread Ted Dunning
Thanks. Very interesting input. On Mon, Jun 29, 2015 at 2:09 PM, Alok Tanna wrote: > Maven 3.3.3 -- Latest version , it worked with 3.0.5 > > On Mon, Jun 29, 2015 at 4:23 PM, Ted Dunning > wrote: > > > On Mon, Jun 29, 2015 at 1:03 PM, Alok Tanna > > wrote: &

Re: Custom Functions

2015-07-01 Thread Ted Dunning
ferring to > > > > > > > > > https://github.com/vicenteg/DrillJDBCExample/blob/master/src/main/java/com/mapr/drill/DrillJDBCExample.java > > > > > > Thanks, > > > Alok Tanna > > > > > > > > > On Mon, Jun 29, 2015 at 6:00 PM, Te

Re: (noob) performance of queries against csv files

2015-07-02 Thread Ted Dunning
Hey Larry, Drill transforms your CSV data into an internal memory-resident format for processing, but does not change the structure of your original data. If you want to convert your file to parquet, you can do this: create table `foo.parquet` as select * from `foo.csv` This will, however, not

Re: (noob) performance of queries against csv files

2015-07-02 Thread Ted Dunning
e read overhead storing data in the varchars, you would also be adding > overhead as your future queries would require a cast anyway to actually > analyze the data. > > > > On Thu, Jul 2, 2015 at 1:27 PM, Larry White wrote: > > > Great. Thanks much > > > > On

Re: test

2015-07-03 Thread Ted Dunning
It is likely that you didn't follow the procedure for unsubscribing. I have sent an unsubscribe request on your behalf which should stop you from receiving any more email. On Fri, Jul 3, 2015 at 4:25 PM, Emer Natalio wrote: > I have opted out 3 months ago and still getting daily threads? > >

Re: Querying parquet files

2015-07-06 Thread Ted Dunning
How many columns do you have? Do you understand about columnar data stores and how selecting only a single column means that much less data needs to be read? If your data consists, say, of integers, then Drill only needs to read 160MB to satisfy your query which is quite reasonable to be read in

Re: Querying parquet files

2015-07-07 Thread Ted Dunning
No. A very simple model like that breaks down on many levels. The most important level that reality intrudes in is the fact that your I/O probably can't really be threaded so widely. What kind of storage are you using? How big is your data? Sent from my iPhone > On Jul 7, 2015, at 6:38, "

Re: Recursive CTE Support in Drill

2015-07-09 Thread Ted Dunning
Are you hard set on using common table expressions? I have discussed a bit off-list creating a data format that would allow tables to be read from a log-synth [1] schema. That would let you read as much data as you might like with an arbitrarily complex (or simple) query. Operationally, you woul

Re: Recursive CTE Support in Drill

2015-07-10 Thread Ted Dunning
wrote: > @Ted, the log-synth storage format would be really useful. I'm already > seeing many unit tests that could benefit from this. Do you have a github > repo for your ongoing work ? > > Thanks! > > On Thu, Jul 9, 2015 at 10:56 PM, Ted Dunning > wrote: > >

Re: Recursive CTE Support in Drill

2015-07-10 Thread Ted Dunning
gin. > > On Fri, Jul 10, 2015 at 9:10 AM, Ted Dunning > wrote: > > > Hakim, > > > > Not yet. Still very much in the stage of gathering feedback. > > > > I would think it very simple. The biggest obstacles are > > > > 1) no documentation on h

Re: Recursive CTE Support in Drill

2015-07-10 Thread Ted Dunning
mat like log-synth should be even simpler to implement. > > On Fri, Jul 10, 2015 at 10:58 AM, Ted Dunning > wrote: > > > I don't think we need a full on storage plugin. I think a data format > > should be sufficient, basically CSV on steroids. > > > > &g

Re: Various ramblings of a newbie

2015-07-11 Thread Ted Dunning
The problem being referred to was one where the type of the data changed and the order in which it was encountered made a difference. For files where the schema is known early (the only thing that ordinary SQL can handle), this won't happen. Also, the problem only occurred in nested data in which

Re: Drill with S3 without hardcoding credentials into core-site

2015-07-12 Thread Ted Dunning
On Sun, Jul 12, 2015 at 9:41 PM, Hafiz Mujadid wrote: > I successfully connected Drill to S3 by placing access and secret keys in > core-site.xml. > > Is it possible to use Drill with S3 without hardcoding credentials into > core-site like defining credentials for multiple users on the fly? > No

Re: Recursive CTE Support in Drill

2015-07-16 Thread Ted Dunning
ting. However, I'd need a ram only option which hopefully provides a >> higher throughput. >> >> @Jacques How involved is it to write a dummy plugin that returns one >> hardcoded row repeatedly 12 million times? >> >> Thanks, >> Alex >> >

Re: Recursive CTE Support in Drill

2015-07-18 Thread Ted Dunning
working on extending the MockStoragePlugin to > support SQL which would provide the outcome you requested. > > [1] > > https://github.com/apache/drill/blob/master/exec/java-exec/src/test/resources/mock-scan.json > > On Thu, Jul 16, 2015 at 10:16 PM, Ted Dunning > wrote: >

Re: Drill not picking up a UDF

2015-07-19 Thread Ted Dunning
Sounds like a fine example, not because of sophistication but because it deals with dates. Check the drill logs. It is likely that drill is grumpy about something in your udf or packaging. Also, feel free to snitch the pom from the simple examples in order to get the pieces assembled and

Re: Drill not picking up a UDF

2015-07-19 Thread Ted Dunning
ning? > > Regards, > -Stefan > > On Sun, Jul 19, 2015 at 6:46 PM, Ted Dunning > wrote: > > > > > Sounds like a fine example, not because of sophistication but because it > > deals with dates. > > > > Check the drill logs. It is likely that drill

Re: Drill not picking up a UDF

2015-07-19 Thread Ted Dunning
On Sun, Jul 19, 2015 at 4:57 PM, Stefán Baxter wrote: > Perhaps this was mentioned in the documentation but this is, at the very > least, not straight forward and super-inviting. > You are very correct on this. The purpose of the simple drill function is as a start at making some of these obscu

Re: Drill not picking up a UDF

2015-07-20 Thread Ted Dunning
On Mon, Jul 20, 2015 at 8:22 AM, Andrew Brust < andrew.br...@bluebadgeinsights.com> wrote: > 1. It seems to me like Drill is at a point where, if you thread the needle > perfectly, things generally work as advertised. That’s certainly an > advance over the old, old days, where stuff that should h

Re: Drill not picking up a UDF

2015-07-20 Thread Ted Dunning
ting same in my > comment. > > > > > On 7/20/15, 4:10 PM, "Ted Dunning" wrote: > > >On Mon, Jul 20, 2015 at 8:22 AM, Andrew Brust < > >andrew.br...@bluebadgeinsights.com> wrote: > > > >> 1. It seems to me like Drill is at a point w

Re: Several questions...

2015-07-22 Thread Ted Dunning
Yes. Just right on that. Regarding the integer conversion, can you say what format your data is in? Is it exactly 4 bytes, big endian? On Wed, Jul 22, 2015 at 5:34 AM, Alex Ott wrote: > Ok, answering my first question - I need to take the only the column name > into the backquotes, instead o
