Re: I want to subscribe to Drill users list

2014-12-09 Thread Ted Dunning
...@gmail.com wrote: You will need to email user-subscr...@drill.apache.org The website is showing the old address (incubator) as there is currently an Apache infrastructure issue preventing us from updating the site. On Mon, Dec 8, 2014 at 7:09 PM, Ted Dunning ted.dunn...@gmail.com wrote

Re: String for HBase row key

2014-12-17 Thread Ted Dunning
I think that Carol (version 2) meant to say that a variety of types can be pushed down into the HBase key comparison. The proviso is that the serialization of the key has to be the same as the serialization used by Drill. On Wed, Dec 17, 2014 at 8:37 AM, Carol McDonald

Re: CLI/REST To Register a Workspace?

2014-12-29 Thread Ted Dunning
On Mon, Dec 29, 2014 at 1:56 PM, Chad Smykay csmy...@mapr.com wrote: What I wanted to know is whether there is a way to create a workspace via CLI or REST calls within Drill? Do you mean you want to update the configuration with a definition of a workspace?

Re: JSON file size vs number of files

2015-02-03 Thread Ted Dunning
Finding 50K files given only a directory name is unlikely to ever be efficient. Reading small files is also unlikely to be efficient unless the contents are linearized (huge luck if so, unlikely to always work). Caching the contents of recursive directory structures would make things go faster, but it is easy

Re: Varying Execution Times For The Same Query On The Same File

2015-01-16 Thread Ted Dunning
If you do want to have more parallelism, use several input files. On Fri, Jan 16, 2015 at 9:13 AM, Jason Altekruse altekruseja...@gmail.com wrote: I do not think we currently consider JSON files splittable. If we do treat them as such, it would depend on the file size and the available read

Re: Varying Execution Times For The Same Query On The Same File

2015-01-17 Thread Ted Dunning
On Fri, Jan 16, 2015 at 6:25 PM, George Chow geo...@overcoil.com wrote: Are you saying that Drill will serialize one file to one DrillBit? For unsplittable files, yes.

Re: Drill Specific Performance Monitoring Utilities

2015-01-17 Thread Ted Dunning
Moderately nice I would say. It might be nice to see records per second or memory usage. Not sure how those would be collected from per query diagnostics. On Sat, Jan 17, 2015 at 7:28 AM, Jim Scott jsc...@maprtech.com wrote: How useful would it be if there was an OpenTSDB tcollector for

Re: best way to query hbase dynamic columns

2015-02-20 Thread Ted Dunning
Would kvgen work on t.price? On Thu, Feb 19, 2015 at 12:59 PM, Carol McDonald cmcdon...@maprtech.com wrote: What is the best way to query an hbase table that has dynamic column names ? For example this table is similar to the opentsdb table, the rowkey is a stocksymbol followed by the date

Re: question about JSON query

2015-01-09 Thread Ted Dunning
Great example. This also comes up in OpenTSDB where column names are time offsets within a window. Reading data from HBase or MapR DB gives you a map and having kvgen makes everything slick as a whistle. On Fri, Jan 9, 2015 at 12:10 PM, Jason Altekruse altekruseja...@gmail.com wrote: I

Re: Drill Adjunct Data Warehouse

2015-02-13 Thread Ted Dunning
Drill definitely can serve as a database virtualization layer. Calcite was used this way when it was just Optiq and Drill provides interesting additional capabilities. The emerging view of user needs seems to be tilting more towards the semi-structured data capabilities of Drill rather than the

Re: Convert UTC to specific timezone?

2015-03-31 Thread Ted Dunning
The original poster wasn't very clear. What they said could mean what Andries provided (which is to determine which timezone that data refers to). The way that I read the question was that they wanted to translate times to be represented as the string formatted version of the same time in a

Re: Convert UTC to specific timezone?

2015-04-01 Thread Ted Dunning
interested in, say 'America/New_York', as a timestamp so I can compare datasets that don't have UTC timestamps with those that do. Is this in the roadmap? Chris Matta cma...@mapr.com 215-701-3146 On Tue, Mar 31, 2015 at 3:11 AM, Ted Dunning ted.dunn...@gmail.com wrote

Re: UTF coding in JSON docs

2015-03-02 Thread Ted Dunning
The right solution is to go into the JSON format and somehow let character encoding be defined there. On Tue, Mar 3, 2015 at 3:23 AM, Andries Engelbrecht aengelbre...@maprtech.com wrote: How can I convert JSON data with various characters in a text field to a more usable UTF8 encoding?

inverse of kvgen and flatten?

2015-02-27 Thread Ted Dunning
I was just looking through the documentation and I don't see a way to group data and then create a list. Flatten turns a list into individual records. I would like to turn some fields from a grouped set of records into a list of objects or a list of values.
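To make the question concrete, here is a minimal Python sketch of the three operations involved. It is only an analogy for the semantics being discussed, not Drill code: `kvgen` and `flatten` mimic the behaviour of the Drill functions of the same name, while `collect` is a hypothetical name for the missing inverse (group records, then gather one field into a list).

```python
from collections import defaultdict


def kvgen(mapping):
    """Turn a map into a list of {"key": ..., "value": ...} entries."""
    return [{"key": k, "value": v} for k, v in mapping.items()]


def flatten(records, field):
    """Emit one output record per element of the list stored in `field`."""
    out = []
    for rec in records:
        for item in rec[field]:
            row = dict(rec)      # copy the other fields unchanged
            row[field] = item    # replace the list with a single element
            out.append(row)
    return out


def collect(records, group_field, value_field):
    """Hypothetical inverse of flatten: group by one field, gather another into a list."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_field]].append(rec[value_field])
    return [{group_field: k, value_field: v} for k, v in groups.items()]


nested = [
    {"symbol": "AAA", "prices": [10, 11]},
    {"symbol": "BBB", "prices": [20]},
]
flat = flatten(nested, "prices")                # three records, one price each
regrouped = collect(flat, "symbol", "prices")   # back to one record per symbol
pairs = kvgen({"a": 1, "b": 2})                 # map to key/value list
```

Here `regrouped` has the same shape as `nested`, which is the round trip the post is asking for.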

Re: Using Drill with EMR

2015-02-27 Thread Ted Dunning
On Wed, Feb 25, 2015 at 8:17 PM, Mihai Stoicescu stoice...@gmail.com wrote: Hello, My name is Mihai Stoicescu and I am trying to experiment with Apache Drill. I have multiple questions that I hope you can help me find the answers: 1. Can Drill ZooKeeper work outside Hadoop

Re: Drill approximate query

2015-04-23 Thread Ted Dunning
Uli, I think that the current plans include approximate operators for some aggregations, but not anything on the level of, say, BlinkDB. That said, Drill's optimizer could easily have rules that allow you to explicitly down-sample data to different degrees and then have queries choose between

Re: New Drillbits joining cluster causes severe performance spike

2015-04-22 Thread Ted Dunning
Adam, There has been some auto-scaling experimentation done outside the list in which drillbits stay alive, but don't accept work and don't allocate memory until they are needed. That avoids startup transients for the most part. This scaling work is still quite immature, but I will encourage

Re: Memory error

2015-04-22 Thread Ted Dunning
You have allocated 4GB to Java's heap and the rest of the 4GB RAM (i.e. zero) you have allocated to data storage. Try 1) running on a larger machine. Having 8G memory will make these worries go away. 2) decreasing memory requirements. Here is one possibility that may or may not work out well:

Re: Drill In the Enterprise

2015-04-29 Thread Ted Dunning
Likely the answer would be to have separate drill bits to correspond to each separate scheduling policy. You still need impersonation but whoever handles resource allocation can determine which set of drill bits would be allowed to be rehydrated at any given moment. Sent from my iPhone On

Re: Design documents

2015-04-28 Thread Ted Dunning
I don't think that such a diagram exists. There are a number of slide shows around and there is some discussion of the architecture of Drill. Understanding how the system converts SQL to a logical plan, to a physical plan and then to an execution plan is the first step of understanding. Can you

Re: Drill In the Enterprise

2015-04-28 Thread Ted Dunning
There is a JIRA open right now, I believe, to support impersonation. With this, a single Drill bit can function on behalf of multiple users, adopting the permissions of each user so that the file system can enforce security constraints. There is also research work going on to experiment with ways

Re: Query planning cost

2015-05-07 Thread Ted Dunning
On Fri, May 8, 2015 at 12:30 AM, Adam Gilmore a...@pharmadata.net.au wrote: We're getting about a 350ms delay for 70 files, about 200ms for 35 files, about 20-30ms for 1 file. That is impressively linear. 25ms + files * 4.7 with only 5-10ms error. R^2 = 0.997
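The linear fit quoted above can be reproduced with an ordinary least-squares calculation. The sketch below uses only the three measurements from the quoted message; the one assumption is that 25 ms stands in for the reported "20-30ms" for a single file.

```python
# Observations from the quoted message: (number of files, planning delay in ms).
# Assumption: 25 ms is used as the midpoint of the reported "20-30ms" range.
observations = [(1, 25.0), (35, 200.0), (70, 350.0)]

n = len(observations)
mean_x = sum(x for x, _ in observations) / n
mean_y = sum(y for _, y in observations) / n

# Sums of squares and cross products around the means.
sxx = sum((x - mean_x) ** 2 for x, _ in observations)
syy = sum((y - mean_y) ** 2 for _, y in observations)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in observations)

slope = sxy / sxx                      # ms of planning time per additional file
intercept = mean_y - slope * mean_x    # fixed overhead in ms
r_squared = sxy ** 2 / (sxx * syy)     # goodness of fit for simple linear regression

# Absolute error of the fitted line at each observation.
residuals = [abs(y - (intercept + slope * x)) for x, y in observations]

print(f"delay = {intercept:.1f} ms + {slope:.2f} ms * files, R^2 = {r_squared:.3f}")
```

Under that assumption the slope, intercept and R^2 come out close to the figures in the reply, and the residuals fall in the 5-10 ms range it mentions.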

Re: Questions on drill execution

2015-05-04 Thread Ted Dunning
Is that video linked from the web site? Looks like a really great resource. On Mon, May 4, 2015 at 6:23 PM, Hao Zhu h...@maprtech.com wrote: https://www.youtube.com/watch?v=kG6vzsk8T7E This presentation contains lots of details for query execution plan. On Mon, May 4, 2015 at 7:10 AM,

Re: Questions on drill execution

2015-05-05 Thread Ted Dunning
Cool! On Tue, May 5, 2015 at 3:39 AM, Tomer Shiran tshi...@gmail.com wrote: http://drill.apache.org/docs/drill-introduction/#videos On May 4, 2015, at 3:10 PM, Ted Dunning ted.dunn...@gmail.com wrote: Is that video linked from the web site? Looks like a really great resource

Re: How to deploy Drill to achieve optimal performance

2015-05-05 Thread Ted Dunning
George, That sounds much too slow. Can you provide some samples of the data and queries? How about actual data counts? Millions? Hundreds of millions? On Tue, May 5, 2015 at 8:54 AM, George Lu luwenbin...@gmail.com wrote: Hi all, These days, I am trying Drill to see whether Drill

Re: Counting large numbers of unique values

2015-04-11 Thread Ted Dunning
://cwiki.apache.org/confluence/display/DRILL/Custom+Function+Interfaces On Thu, Apr 9, 2015 at 9:41 PM, Ted Dunning ted.dunn...@gmail.com wrote: On Thu, Apr 9, 2015 at 1:38 AM, Adam Gilmore dragoncu...@gmail.com wrote: Ted - I'd be really interested in doing something like that (approximate aggregation

Re: Counting large numbers of unique values

2015-04-07 Thread Ted Dunning
On Tue, Apr 7, 2015 at 9:19 AM, Marcin Karpinski mkarpin...@opera.com wrote: @ Ted, ideally, I'd like to get exact results, but in case of real problems, we could perhaps settle on approximate counting. Is there already such a functionality in Drill? No. But it is very easy to incorporate

Re: Drill to query Client-side encrypted data from S3

2015-04-07 Thread Ted Dunning
Looking at the link that you provided, it appears that you are encrypting entire data files. That probably makes it better to implement this as a layer in the file access path. Drill doesn't do this just now, but it would be relatively easy to add, I think. On Tue, Apr 7, 2015 at 3:26 PM, Ted

Re: Drill to query Client-side encrypted data from S3

2015-04-07 Thread Ted Dunning
Ahh... There is no magic that will handle decryption that you can plug into (at this time). On Tue, Apr 7, 2015 at 3:02 PM, Ganesha Muthuraman mganesh...@outlook.com wrote: The situation is this: There is client side encrypted data on S3. There is an EMR cluster that uses this as EMRFS.

Re: Report issues with sensitive data

2015-04-01 Thread Ted Dunning
One idea is to post a log-synth [1] schema that generates data the same shape as your real data. If you can generate fake data that causes the same problem you give developers a huge head start in solving your problem. For the record, are you using the recently announced 0.8 version of Drill?

Re: Auto-splitting delimitted files

2015-05-20 Thread Ted Dunning
Drill loses locality information on anything but an HDFS oriented file system. That might be part of what you are observing. Having pre-split files should allow parallelism. Can you describe your experiments in more detail? Also, what specifically do you mean by CFS and GFS? Ceph and Gluster?

Re: Interaction between seperate Drill clusters possible?

2015-06-02 Thread Ted Dunning
Drill will make efforts to execute portions of queries locally, but that doesn't look like a powerful enough mechanism for your use case since S3 isn't really local to anything. Also, as a philosophy, Drill delegates all handling of materialized views to you rather than taking responsibility for

Re: Query on setting up Apache Drill and nested query for json file

2015-06-09 Thread Ted Dunning
On Tue, Jun 9, 2015 at 11:42 AM, Jason Altekruse altekruseja...@gmail.com wrote: *We do not currently have a shortcut to read files in the directory where you launched Drill.* This has made me grumpy in the past, but I really think that Drill got it right here. The real problem is that with

Re: JSON/Join/Dynamic schema : java.lang.IllegalStateException: Failure while reading vector.

2015-06-22 Thread Ted Dunning
Andries, That sounds like a reasonable suggestion, but the real problem is that it appears that having the field initially and then having the field be missing is OK, but if it is missing first and then present Drill blows a gasket. I think it looks like a bug. Very good and simple demo. On

Re: Cannot start drillbit

2015-06-23 Thread Ted Dunning
Try using Drill 1.0. Lots of improvements since 0.8. On Tue, Jun 23, 2015 at 7:06 AM, Ganesh Muthuraman mganesh...@outlook.com wrote: Hi, Anybody know what this error is? This is drill 0.8. I am unable to start Drillbit and cannot go to the UI to configure the plugin. I see this in

Re: To EMRFS or not to EMRFS?

2015-06-18 Thread Ted Dunning
On Thu, Jun 18, 2015 at 8:24 AM, Paul Mogren pmog...@commercehub.com wrote: Following up. Ted gave sound advice regarding reading S3 vs HDFS, but didn't address EMRFS specifically. Here is what I have learned. Great summary. Very useful when people help by feeding back what they have

Re: drill configuration setting - rows overwriting one another

2015-06-26 Thread Ted Dunning
Could also be a terminal setting bug. On Thu, Jun 25, 2015 at 9:24 PM, Jacques Nadeau jacq...@apache.org wrote: Sounds like a bug in sqlline's output format. Try changing the output format from table to csv to work around this. On Jun 25, 2015 1:13 PM, Jim Scott jsc...@maprtech.com wrote:

Re: Custom Functions

2015-06-26 Thread Ted Dunning
://www.exertdigital.com/ https://www.facebook.com/exertdigital https://www.linkedin.com/company/exert-digital On Thu, Jun 25, 2015 at 2:24 AM, Ted Dunning ted.dunn...@gmail.com wrote: As Vince says, get a full JDK, version 7 or above. On Wed, Jun 24, 2015 at 2:17 PM, Vince Gonzalez vince.gonza

Re: Custom Functions

2015-06-25 Thread Ted Dunning
-exec:jar:1.0.0 in central (https://repo.maven.apache.org/maven2) - [Help 1] Thanks, Alok Tanna eXertDigital On Wed, Jun 24, 2015 at 12:42 AM, Ted Dunning ted.dunn...@gmail.com wrote: Yes and no. It would be pretty easy to build a Drill

Re: Custom Functions

2015-06-25 Thread Ted Dunning
at 12:57 PM, Ted Dunning ted.dunn...@gmail.com wrote: Vince is correct. Drill isn't in central so you have to install it locally. On Wed, Jun 24, 2015 at 12:47 PM, Vince Gonzalez vince.gonza...@gmail.com wrote: Alok, I had to install the Drill dependencies locally

Re: Custom Functions

2015-06-24 Thread Ted Dunning
not find artifact org.apache.drill.exec:drill-java-exec:jar:1.0.0 in central (https://repo.maven.apache.org/maven2) - [Help 1] Thanks, Alok Tanna eXertDigital On Wed, Jun 24, 2015 at 12:42 AM, Ted Dunning ted.dunn...@gmail.com wrote: Yes and no. It would be pretty easy

Re: What is the best way to use Apache-drill with rails app?

2015-06-14 Thread Ted Dunning
What about using ODBC? Sent from my iPhone On Jun 14, 2015, at 0:00, Hosang Jeon hosang.j...@braincommerce.com wrote: Hi everyone. My current application is built on top of rails framework and I want to integrate some parts of the application with Apache-drill. I could see that

Re: timestamp string to epoch time

2015-06-16 Thread Ted Dunning
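HH:mm:ss.SSS') from `sys`.`version`;
+------------------------+-------------+
| EXPR$0                 | EXPR$1      |
+------------------------+-------------+
| 2015-05-29 08:18:53.0  | 1432912733  |
+------------------------+-------------+
On Mon, Jun 15, 2015 at 11:49 AM, Ted Dunning ted.dunn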

Re: timestamp string to epoch time

2015-06-15 Thread Ted Dunning
The better solution would be to make the unix_timestamp function ignore the milliseconds (or round off). That may run into the HIVE versus Drill On Mon, Jun 15, 2015 at 8:48 AM, Christopher Matta cma...@mapr.com wrote: That's kind of annoying, would it make sense to support casting a
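As an illustration of "ignore the milliseconds", here is a small Python sketch (not Drill's `unix_timestamp`) that parses a timestamp string with a fractional part, drops the fraction, and returns epoch seconds. The helper name `to_epoch_seconds` and the explicit UTC offset parameter are assumptions made for the example. The 2015-06-16 reply in this thread shows `2015-05-29 08:18:53.0` mapping to `1432912733`; that pair is consistent with the string being interpreted in a UTC-7 session, which is an inference from the arithmetic rather than something stated in the thread.

```python
from datetime import datetime, timedelta, timezone


def to_epoch_seconds(text, utc_offset_hours):
    """Parse 'YYYY-MM-DD HH:MM:SS.fff', discard the fraction, return epoch seconds.

    utc_offset_hours is a hypothetical parameter: the fixed offset of the zone
    in which the string should be interpreted.
    """
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")
    zone = timezone(timedelta(hours=utc_offset_hours))
    # Truncate the fractional seconds, then attach the zone before converting.
    local = parsed.replace(microsecond=0, tzinfo=zone)
    return int(local.timestamp())


as_utc = to_epoch_seconds("2015-05-29 08:18:53.0", 0)
as_utc_minus_7 = to_epoch_seconds("2015-05-29 08:18:53.0", -7)
with_millis = to_epoch_seconds("2015-05-29 08:18:53.999", -7)
```

Truncating rather than rounding means `with_millis` equals `as_utc_minus_7`; rounding would instead move `.999` to the next second.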

Re: Any way to use https to access Drill web ui

2015-06-15 Thread Ted Dunning
That is on the short-term roadmap. Related issues include universal access to configuration pages. Both will be addressed shortly. On Mon, Jun 15, 2015 at 12:32 AM, George Lu luwenbin...@gmail.com wrote: For production usage, I want the data to be encrypted from Drill server and query

Re: what's the differenct between drill and optiq

2015-05-28 Thread Ted Dunning
Answers in-line. On Thu, May 28, 2015 at 8:08 AM, Andrew Brust andrew.br...@bluebadgeinsights.com wrote: Absolutely nothing to apologize for, and the below explanation is very helpful. You are too kind. FWIW, I certainly understood that Hive's use of Calcite offered relatively little

Re: what's the differenct between drill and optiq

2015-05-28 Thread Ted Dunning
in a broader system. The great news is that the community is working together to collaborate on an amazing shared library and framework. -Jacques On Wed, May 27, 2015 at 10:10 PM, Ted Dunning ted.dunn...@gmail.com wrote: Andrew, Sorry for being cryptic. Hanifi is more clear. My point

Re: Monitoring long / stuck CTAS

2015-05-29 Thread Ted Dunning
Apologies for the plug, but using MapR FS would help you a lot here. The trick is that you can run an NFS server on every node and mount that server as localhost. The benefits are: 1) the entire cluster appears as a conventional POSIX style file system in addition to being available via HDFS

Re: what's the differenct between drill and optiq

2015-05-27 Thread Ted Dunning
Andrew, What Hive does not have is the extensions that Drill has that allow SQL to be type flexible. The ALL type and all of the implications both in terms of implementation and user impact it has are a really big deal. On Wed, May 27, 2015 at 6:08 AM, Andrew Brust

Re: Custom UDFS slow

2015-05-26 Thread Ted Dunning
On Tue, May 26, 2015 at 7:26 PM, Adam Gilmore dragoncu...@gmail.com wrote: The code for the WEEK() function is not far from the code from the source for the EXTRACT(DAY) function. Furthermore, even if I copy the exact code for the EXTRACT(DAY) function into that, it has the same performance

Re: avro in HBase columns?

2015-05-30 Thread Ted Dunning
Additional converters would be a great starter contribution! On Fri, May 29, 2015 at 3:32 PM, andrew and...@primer.org wrote: Hi, DRILL-1512 only works for reading Avro files off a filesystem such as HDFS. As far as I know the convert_from() function only supports JSON at the moment.

Re: what's the differenct between drill and optiq

2015-05-27 Thread Ted Dunning
binding. On Wed, May 27, 2015 at 8:34 AM, Ted Dunning ted.dunn...@gmail.com wrote: Andrew, What Hive does not have is the extensions that Drill has that allow SQL to be type flexible. The ALL type and all of the implications both in terms of implementation and user impact it has

Re: Interaction between seperate Drill clusters possible?

2015-06-02 Thread Ted Dunning
for the enduser. I'll first try your suggestion before I dive into any plugin details though. Thanks again On Tue, Jun 2, 2015 at 11:17 AM, Ted Dunning ted.dunn...@gmail.com wrote: Happy to help. Let me know when you find the ways that my answer was incomplete or not according to your needs

Re: Monitoring long / stuck CTAS

2015-05-29 Thread Ted Dunning
and mirrors carry over to tables as well. Oh, and it tends to be a lot faster and failure tolerant as well. On Fri, May 29, 2015 at 7:00 AM, Yousef Lasi yousef.l...@gmail.com wrote: Could you expand on the HBase table integration? How does that work? On Fri, May 29, 2015 at 5:55 AM, Ted Dunning

Re: Querying parquet files

2015-07-07 Thread Ted Dunning
How many columns do you have? Do you understand about columnar data stores and how selecting only a single column means that much less data needs to be read? If your data consists, say, of integers, then Drill only needs to read 160MB to satisfy your query which is quite reasonable to be read in

Re: (noob) performance of queries against csv files

2015-07-02 Thread Ted Dunning
Hey Larry, Drill transforms your CSV data into an internal memory-resident format for processing, but does not change the structure of your original data. If you want to convert your file to parquet, you can do this: create table `foo.parquet` as select * from `foo.csv` This will, however,

Re: [newbie]: how to query HDFS

2015-05-22 Thread Ted Dunning
As a special case, with MapR, you can access all clusters in an administrative group by making sure that you have /mapr/cluster-name at the beginning of your path names. This means that you can simply use different workspaces, or a workspace with a path consisting only of /mapr and still access

Re: To EMRFS or not to EMRFS?

2015-05-22 Thread Ted Dunning
The variation will have less to do with Drill (which can read all these options such as EMR resident MapR FS or HDFS or persistent MapR FS or HDFS or S3). The biggest differences will have to do with whether your clusters providing storage are permanent or ephemeral. If they are ephemeral, you

Re: [newbie]: how to query HDFS

2015-05-22 Thread Ted Dunning
I should have added that there is nothing wrong with the dual plugin approach on MapR. Works fine and it is up to you as a matter of personal choice which is better. On Fri, May 22, 2015 at 4:46 PM, Ted Dunning ted.dunn...@gmail.com wrote: As a special case, with MapR, you can access all

Re: Auto-splitting delimitted files

2015-05-21 Thread Ted Dunning
we would gain increasing efficiency. However, this doesn't appear to be the case as we're not really getting much improvement beyond the 30 minute range despite increasing parallelization by adding additional drill bits and file partitions. May 21 2015 12:55 AM, Ted Dunning ted.dunn

Re: Benchmarks for Apache Drill

2015-08-17 Thread Ted Dunning
. Drill can also have special purpose operators that recognize the potential for vectorization and insert those vectorized operators as practical. -Original Message- From: Ted Dunning [mailto:ted.dunn...@gmail.com] Sent: Monday, August 17, 2015 3:37 AM To: user user@drill.apache.org

Re: Need help in querying HDFS from drill

2015-08-20 Thread Ted Dunning
Some specific answers here. On Thu, Aug 20, 2015 at 4:13 AM, Malathi malu@gmail.com wrote: I have the following questions: 1) Is it possible to run drill in a machine outside hadoop cluster and query the hdfs files in the cluster? Yes. Absolutely. 2) If yes, is there any need of

Re: Show Files Command

2015-08-23 Thread Ted Dunning
The cleanest fix would be to make the INFORMATION schema return information about file system objects. Then you could do clean selects with whatever you needed to do. https://drill.apache.org/docs/querying-the-information-schema/ On Sun, Aug 23, 2015 at 8:31 AM, USC hsua...@usc.edu wrote: Hi

Re: Benchmarks for Apache Drill

2015-08-16 Thread Ted Dunning
The Drill project itself has not focussed on performance, other than in the basic architecture. There have been some external benchmarks independent of the Drill project by Intel and another group whose name escapes me. The Intel work is presented here:

Re: Benchmarks for Apache Drill

2015-08-17 Thread Ted Dunning
has Have a good link where I could read more about that? -Original Message- From: Ted Dunning [mailto:ted.dunn...@gmail.com] Sent: Monday, August 17, 2015 1:52 AM To: user Subject: Re: Benchmarks for Apache Drill On Sun, Aug 16, 2015 at 10:22 PM, Andrew Brust andrew.br

Re: Benchmarks for Apache Drill

2015-08-16 Thread Ted Dunning
On Sun, Aug 16, 2015 at 10:22 PM, Andrew Brust andrew.br...@bluebadgeinsights.com wrote: I have to admit, I didn't realize columnar was such a big part of Drill. I guess that's consistent with Dremel, so it makes sense. I always thought the emphasis was on heterogenous data access, not on

Re: Custom Functions

2015-06-29 Thread Ted Dunning
Thanks. Very interesting input. On Mon, Jun 29, 2015 at 2:09 PM, Alok Tanna ata...@exertdigital.com wrote: Maven 3.3.3 -- Latest version , it worked with 3.0.5 On Mon, Jun 29, 2015 at 4:23 PM, Ted Dunning ted.dunn...@gmail.com wrote: On Mon, Jun 29, 2015 at 1:03 PM, Alok Tanna ata

Re: Convert Hbase Values from Bytes to Actual Strings

2015-08-06 Thread Ted Dunning
On Sat, Aug 1, 2015 at 5:53 PM, John Omernik j...@omernik.com wrote: I am working with this as well and I find the abstraction frustrating from a user perspective as well. Yes. It can be a pain. I understand we can create views, I've done that, but it's a tedious process of ensuring my

Re: Foreman Parallelizer not working with compressed csv file?

2015-07-23 Thread Ted Dunning
On Thu, Jul 23, 2015 at 2:19 PM, Juergen Kneissl her...@gmx.net wrote: On 07/23/15 22:04, Jason Altekruse wrote: I'm very glad to hear that it exceeded your expectations. An important point I would like to add, when you unzipped the file you likely allowed drill to read not only on both

Re: Several questions...

2015-07-22 Thread Ted Dunning
Yes. Just right on that. Regarding the integer conversion, can you say what format your data is in? Is it exactly 4 bytes, big endian? On Wed, Jul 22, 2015 at 5:34 AM, Alex Ott alex...@gmail.com wrote: Ok, answering my first question - I need to take only the column name into the
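The "exactly 4 bytes, big endian" question matters because the same four bytes decode to very different integers depending on byte order. The following Python sketch uses the standard `struct` module to show the two layouts for the value 1; it illustrates the encoding only and is not how Drill or HBase implement the conversion.

```python
import struct

value = 1

# 4-byte signed integer, most significant byte first (big endian).
big_endian = struct.pack(">i", value)

# The same integer, least significant byte first (little endian).
little_endian = struct.pack("<i", value)

# Decoding with the matching byte order recovers the original value.
decoded = struct.unpack(">i", big_endian)[0]

# Decoding big-endian bytes as little endian silently gives a wrong number.
misread = struct.unpack("<i", big_endian)[0]

print(big_endian.hex(), little_endian.hex(), decoded, misread)
```

If a stored key decodes to something like `misread` rather than the expected small integer, a byte-order mismatch between writer and reader is a likely cause.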

Re: storage structure - querying directories - sanity check and UDF assistance

2015-07-24 Thread Ted Dunning
I think that constant reduction isn't entirely working in the presence of joins. For example, I removed the isRandom annotation from my random number generator. You can see constant reduction working if I give a literal number: 0: jdbc:drill:zk=local select b.x,a.y,random(1, 3) from (values

Re: Several questions...

2015-07-22 Thread Ted Dunning
On Wed, Jul 22, 2015 at 5:44 PM, Jacques Nadeau jacq...@dremio.com wrote: Good point. It is because in one case the expression is evaluated after the data is materialized (yours) and in the other the expression is evaluated at the same time the data is materialized (mine). In the case that they are

Re: Several questions...

2015-07-22 Thread Ted Dunning
On Wed, Jul 22, 2015 at 5:35 PM, Jacques Nadeau jacq...@dremio.com wrote: So this works: SELECT CONVERT_TO('[ [1, 2], [3, 4], [5]]' ,'UTF8') AS MYCOL1 FROM sys.version;
+--------------+
| MYCOL1       |
+--------------+
| [B@7e308c04  |
+--------------+
OK. So the difference here is

Re: Several questions...

2015-07-22 Thread Ted Dunning
Cool. On Wed, Jul 22, 2015 at 6:54 PM, Jacques Nadeau jacq...@dremio.com wrote: I'm sorry I wasn't clearer. The fact that the error is incomprehensible has already been fixed by Parth and will be part of 1.2 On Wed, Jul 22, 2015 at 6:42 PM, Ted Dunning ted.dunn...@gmail.com wrote

Re: Several questions...

2015-07-22 Thread Ted Dunning
have no problem converting value back via Bytes/toInt. On Wed, Jul 22, 2015 at 7:00 PM, Ted Dunning ted.dunn...@gmail.com wrote: Yes. Just right on that. Regarding the integer conversion, can you say what format your data is in? Is it exactly 4 bytes, big endian? On Wed, Jul 22, 2015

Re: Several questions...

2015-07-23 Thread Ted Dunning
On Thu, Jul 23, 2015 at 8:18 AM, Jacques Nadeau jacq...@dremio.com wrote: The good news is, Drill does provide a nice simple way to abstract these details away. You simply create a view on top of HBase [1]. The view can contain the physical conversions. Then users can interact with the view

Re: Several questions...

2015-07-22 Thread Ted Dunning
Jacques, I just spent an hour or more trying to read the docs on convert_from/to. I had no success. There are plenty of examples of converting to or from UTF-8, but none describing conversions to do with integers. In doing (lots of) experiments, I have failed to 1) create a constant of binary

Re: Several questions...

2015-07-22 Thread Ted Dunning
On Wed, Jul 22, 2015 at 4:51 PM, Jacques Nadeau jacq...@dremio.com wrote: SELECT STRING_BINARY(CONVERT_TO(1, 'INT')) as i, STRING_BINARY(CONVERT_TO(1, 'INT_BE')) as i_be, STRING_BINARY(CONVERT_TO(1, 'BIGINT')) as l, STRING_BINARY(CONVERT_TO(1, 'BIGINT_BE')) as l_be,

Re: Several questions...

2015-07-22 Thread Ted Dunning
On Wed, Jul 22, 2015 at 4:51 PM, Jacques Nadeau jacq...@dremio.com wrote: SELECT CONVERT_TO('[ [1, 2], [3, 4], [5]]','UTF-8') AS MYCOL1 FROM sys.version; File a bug. This should work. I would love to but I don't know what it should do. Note that the error messages in all of these cases

Re: Recursive CTE Support in Drill

2015-07-16 Thread Ted Dunning
is it to write a dummy plugin that returns one hardcoded row repeatedly 12 million times? Thanks, Alex On Fri, Jul 10, 2015 at 12:56 PM, Ted Dunning ted.dunn...@gmail.com wrote: It may be easy, but it is completely opaque about what really needs to happen. For instance, 1) how

Re: Recursive CTE Support in Drill

2015-07-18 Thread Ted Dunning
which would provide the outcome you requested. [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/test/resources/mock-scan.json On Thu, Jul 16, 2015 at 10:16 PM, Ted Dunning ted.dunn...@gmail.com wrote: Also, just doing a Cartesian join of three copies of 1000 records

Re: Drill not picking up a UDF

2015-07-19 Thread Ted Dunning
On Sun, Jul 19, 2015 at 4:57 PM, Stefán Baxter ste...@activitystream.com wrote: Perhaps this was mentioned in the documentation but this is, at the very least, not straight forward and super-inviting. You are very correct on this. The purpose of the simple drill function is as a start at

Re: Drill not picking up a UDF

2015-07-19 Thread Ted Dunning
Sounds like a fine example, not because of sophistication but because it deals with dates. Check the drill logs. It is likely that drill is grumpy about something in your udf or packaging. Also, feel free to snitch the pom from the simple examples in order to get the pieces assembled and

Re: Drill with S3 without hardcoding credentials into core-site

2015-07-13 Thread Ted Dunning
On Sun, Jul 12, 2015 at 9:41 PM, Hafiz Mujadid hafizmujadi...@gmail.com wrote: I successfully connected Drill to S3 by placing access and secret keys in core-site.xml. Is it possible to use Drill with S3 without hardcoding credentials into core-site like defining credentials for multiple

Re: Drill not picking up a UDF

2015-07-20 Thread Ted Dunning
On Mon, Jul 20, 2015 at 8:22 AM, Andrew Brust andrew.br...@bluebadgeinsights.com wrote: 1. It seems to me like Drill is at a point where, if you thread the needle perfectly, things generally work as advertised. That’s certainly an advance over the old, old days, where stuff that should have

Re: query plan ....

2015-08-24 Thread Ted Dunning
or anything that goes with it ? It is difficult to figure out things without the full picture. thanks, Aman On Mon, Aug 24, 2015 at 5:10 PM, Ted Dunning ted.dunn...@gmail.com wrote: On Mon, Aug 24, 2015 at 4:50 PM, Sungwook Yoon sy...@maprtech.com wrote: Still, the performance drop

Re: query plan ....

2015-08-24 Thread Ted Dunning
On Mon, Aug 24, 2015 at 4:50 PM, Sungwook Yoon sy...@maprtech.com wrote: Still, the performance drop down due to OR filtering is just astounding... That is what query optimizers are for and why getting them to work well is important. The difference in performance that you are observing is not

Re: drill generated Parquet file compatibility with SparkSQL?

2015-08-24 Thread Ted Dunning
Can you supply some sample data and example queries? (see log-synth https://github.com/tdunning/log-synth for help synthesizing the data) On Mon, Aug 24, 2015 at 3:50 PM, Sungwook Yoon sy...@maprtech.com wrote: Hi, I generated Parquet files from Drill with CTAS. SparkSQL is throwing error

Re: Security with Storage Plugins

2015-10-28 Thread Ted Dunning
Saving workspace definitions into the file system has the desirable effect of making them very flexible and it also has the virtue that you can use file system permissions to control access to the underlying data. If you have different database accounts, you can embed the account definition into

Re: [Design Document] Support the Ability to Identify And Skip Records when Function Evaluations Fail

2015-10-23 Thread Ted Dunning
On Thu, Oct 22, 2015 at 4:06 AM, ganesh wrote: > To make it appear more good looking, I started with QLIK SENSE .. but was > unable to connect it with hadoop file system. It only showed me HIVE FILES. > Then I downloaded TABLEAU Trial version ... but I am unable to get

Re: Overriding delimiter at runtime

2015-11-01 Thread Ted Dunning
On Sun, Nov 1, 2015 at 7:26 PM, Jacques Nadeau wrote: > In the meantime, you'll have to create two different > workspaces. Each workspace can set its own defaultInputFormat. > Note also that multiple workspaces can refer to the same file locations. (I have seen a few folks

Re: Drill + gzipped-CSV performance

2015-10-07 Thread Ted Dunning
On Wed, Oct 7, 2015 at 2:03 PM, Jason Altekruse wrote: > Here is a presentation with some helpful information (I haven't read all > of it, but the table on slide 7 gives a nice overview of features in each > codec). > >

Re: Issue with Mongo Drill

2015-10-15 Thread Ted Dunning
MapR has pushed an independent advanced release of Drill 1.2-ish that likely has this fix. On Thu, Oct 15, 2015 at 7:56 AM, Kamesh wrote: > Hi James, > Which version of Drill are you using?. Also whether any of the documents > contain field of data type timestamp or

Re: Empty Array in JSON

2015-10-19 Thread Ted Dunning
; 415-497-8107 @krishahn skype:krishahn > > > On Mon, Oct 19, 2015 at 4:03 PM, Ted Dunning <ted.dunn...@gmail.com> > wrote: > > > Kristine, > > > > I just tried working with that data file > (sf-city-logs-json/citylogs.json). > > > > First, I tr

Re: Apache Drill connecting to OpenTSDB hbase

2015-10-20 Thread Ted Dunning
Chad's right. This is complicated stuff to pull apart for most users. On Tue, Oct 20, 2015 at 9:49 AM, Chad Smykay wrote: > On this note we could REALLY use a UDF for OpenTSDB tables from Drill. > -- > Kind Regards, > Chad Smykay | Solutions Architect | M:

Re: Querying parquet files

2015-07-07 Thread Ted Dunning
No. A very simple model like that breaks down on many levels. The most important level at which reality intrudes is the fact that your I/O probably can't really be threaded so widely. What kind of storage are you using? How big is your data? Sent from my iPhone On Jul 7, 2015, at 6:38,

Re: Recursive CTE Support in Drill

2015-07-09 Thread Ted Dunning
Are you hard set on using common table expressions? I have discussed a bit off-list creating a data format that would allow tables to be read from a log-synth [1] schema. That would let you read as much data as you might like with an arbitrarily complex (or simple) query. Operationally, you

Re: Various ramblings of a newbie

2015-07-11 Thread Ted Dunning
The problem being referred to was one where the type of the data changed and the order in which it was encountered made a difference. For files where the schema is known early (the only thing that ordinary SQL can handle), this won't happen. Also, the problem only occurred in nested data in

Re: Recursive CTE Support in Drill

2015-07-10 Thread Ted Dunning
, 2015 at 9:10 AM, Ted Dunning ted.dunn...@gmail.com wrote: Hakim, Not yet. Still very much in the stage of gathering feedback. I would think it very simple. The biggest obstacles are 1) no documentation on how to write a data format 2) I need to release a jar for log-synth

Re: Recursive CTE Support in Drill

2015-07-10 Thread Ted Dunning
adene...@maprtech.com wrote: @Ted, the log-synth storage format would be really useful. I'm already seeing many unit tests that could benefit from this. Do you have a github repo for your ongoing work ? Thanks! On Thu, Jul 9, 2015 at 10:56 PM, Ted Dunning ted.dunn...@gmail.com wrote
