Re: [DISCUSS] Add schema support for the XML format

2022-04-06 Thread Ted Dunning
XML will never die. The Cobol programmers were reincarnated and built similarly long-lasting generators of XML. If you have a schema, then it is a reasonable format for Drill to parse, if only to turn around and write to another format. On Wed, Apr 6, 2022 at 7:31 PM Paul Rogers wrote: > Hi L

Re: [DISCUSS] Add schema support for the XML format

2022-04-06 Thread Ted Dunning
That example: dog > cat can also convert to ["pet":"dog", "pet":"dog"] XML is rife with problems like this. As you say. But worse than can be imagined unless you have been hit by these problems. On Wed, Apr 6, 2022 at 11:39 AM Lee, David wrote: > TO_JSON won't work in cases where.. > > One

Re: [DISCUSS] Refactoring Drill's CSV (Text) Reader

2021-11-17 Thread Ted Dunning
I think that these would be significant improvements. The current behavior is pretty painful on average. Better defaults and just a bit of deduction could pay off big. I even think that the presence of headers might be pretty reliably inferred. On Wed, Nov 17, 2021 at 4:31 PM Charles Givre wro

Re: Very slow parquet write performance due to single threaded write

2021-09-28 Thread Ted Dunning
Ashu, Did you send this same message to a different list (possibly dev@drill?)? I remember answering it with some timing information, but see that you don't have an answer here. On 2021/09/23 15:08:08, Ashu Pachauri wrote: > Hi, > > I have been trying to load a medium sized csv file (22 mill

Re: New Docker images published automatically

2021-09-20 Thread Ted Dunning
This is great news. Makes me think that these might be the best way to try Drill out as well, especially where containers have low overhead (i.e. on Linux) On Mon, Sep 20, 2021 at 4:32 AM luoc wrote: > Hello James, > Great work. Is it possible to add this NOTICE to Github wiki or docs of > we

Re: Query the HBase data in Drill

2021-08-24 Thread Ted Dunning
I know somebody who is querying a very large table and has trouble with pushdown. They are looking for values indexed by primary key with a query like "select * from table where key in s". If s has a very small number of values, this turns into primary key access, but if there are more than just

Re: how do I get the connected drillbit's hostname or IP when using zookeeper?

2021-06-26 Thread Ted Dunning
Mark, That's a great trick to grep for a unique string in the query! You don't have to grep logs, however, when you can grep the profiles or profiles_json system table for the same information. That grepping is simple enough to do in SQL, so you should be good to go. On 2021/06/26 20:28:26,

Re: [ANNOUNCE] Apache Drill 1.19.0 Released

2021-06-14 Thread Ted Dunning
Congratulations to Laurent as a first time release manager! Well done. On Mon, Jun 14, 2021 at 5:56 PM Laurent Goujon wrote: > On behalf of the Apache Drill community, I am happy to announce the release > of Apache Drill 1.19.0. > > Drill is an Apache open-source SQL query engine for Big Data

Re: [VOTE] Release Apache Drill 1.19.0 - RC1

2021-06-05 Thread Ted Dunning
+1 I checked signatures on the source and binary tar files. I extracted the binary tar file and ran some simple queries and validated that the web UI came up when I started embedded (found a dead link in the docs ... filed a JIRA for that). I ran `mvn package` at the root level of the source as

Re: Feature/Question

2021-06-04 Thread Ted Dunning
On Fri, Jun 4, 2021 at 10:04 AM Akshay Bhasin (BLOOMBERG/ 731 LEX) < abhasi...@bloomberg.net> wrote: > Hi Ted, > > Yes - I was able to achieve parallelization like you mentioned below. My > queries were probably not large enough before. Thank you :) > That's great to hear. > On the s3 front -

drill queries from Python?

2021-05-27 Thread Ted Dunning
What is the currently accepted best way to run queries from Python?

Re: [DISCUSSION] ARM-based compatibility tests

2021-01-26 Thread Ted Dunning
I did some minimal testing in embedded mode way back, but nothing serious. I saw no issues at all. On Tue, Jan 26, 2021 at 2:53 AM luoc wrote: > Hi all, > > I have some ARM-based machines (Not X86 architecture), and then want to do > ARM-based compatibility tests. I know that Netty must bump

Re: [DISCUSSION] Roles and Privileges, Security, Secrets

2021-01-20 Thread Ted Dunning
I think that pushing too much of this kind of authentication and authorization logic into Drill has a large complexity risk. Anything to do with kerberos magnifies that complexity. I also think that it is a mistake to depend on user identity if authorization tokens are likely to need to be embedde

Re: Important Message about Bay Area Apache Drill User Group

2020-04-20 Thread Ted Dunning
group? > -- C > > > On Apr 20, 2020, at 2:27 PM, Ted Dunning wrote: > > > > Does anybody want to change this? > > > > -- Forwarded message - > > From: Meetup > > Date: Mon, Apr 20, 2020 at 8:43 AM > > Subject: Important Message about B

Fwd: Important Message about Bay Area Apache Drill User Group

2020-04-20 Thread Ted Dunning
Does anybody want to change this? -- Forwarded message - From: Meetup Date: Mon, Apr 20, 2020 at 8:43 AM Subject: Important Message about Bay Area Apache Drill User Group To: [image: Meetup]

Re: Drill large data build up in fragment by using join

2020-04-15 Thread Ted Dunning
Can you give a sample query? On Wed, Apr 15, 2020 at 12:32 PM Shashank Sharma < shashank.sha...@jungleworks.com> wrote: > Hi folks, > > I have a two large big json data set and querying on distributed apache > drill system, can anyone explain why it is making or build billion of > records to s

Re: Apache Drill Sizing guide

2020-04-13 Thread Ted Dunning
Navin, Your specification of 40 concurrent users and data size are only a bit less than half the story. Without the rest of the story, nobody will be able to give you even general guidance beyond a useless estimate that it will take between roughly 1 and 40 drillbits with a gob of memory. To

Re: Querying encrypted JSON file

2020-04-11 Thread Ted Dunning
Yes. You need to write a special file format for that, though. On Sat, Apr 11, 2020 at 6:58 AM Prabhakar Bhosaale wrote: > Hi All, > I have a encrypted JSON file. is there any way in drill to query the > encrypted JSON file? Thanks > > Regards > Prabhakar >

Re: Apache Drill Support concurrent parallel Request

2020-04-08 Thread Ted Dunning
Another thing that users will see when they start trying to use Drill for concurrent queries is that Drill assumes that it is OK to spend quite a bit of time optimizing a query before running it. Taking 500 ms to optimize the query can be a really bad trade-off if your query only takes 100ms to ru

Re: Linux versions supported for Apache drill

2020-04-03 Thread Ted Dunning
I doubt that normal Java programs could even tell the difference very easily between ARM and Intel architectures. On Fri, Apr 3, 2020 at 12:31 PM Paul Rogers wrote: > .. > > > So, I'm guessing if Drill runs on your Raspberry Pi (ARM-based), it will > probably run on just about any i64 Linux. >

Re: Linux versions supported for Apache drill

2020-04-03 Thread Ted Dunning
Paul, My Raspberry Pi4's run Drill with no problem. They have 4GB of RAM. On Fri, Apr 3, 2020 at 10:41 AM Paul Rogers wrote: > Hi Prabhakar, > > Drill is written in Java and should support just about any Linux version; > certainly all the major versions. It's been run on MacOS, Ubuntu, CentOS

Re: Patterns for data updating?

2020-02-27 Thread Ted Dunning
Yes. I have seen things like this before. Typically, if you have short time-to-visibility requirements, some kind of database is required. If you have large data and long retention requirements, it can be advantageous to roll out to a columnar compressed form like parquet. The design that I have

Re: Timestamp Issue

2020-02-06 Thread Ted Dunning
That is really frustrating because that timestamp is literally in an ISO 8601 format. https://en.wikipedia.org/wiki/ISO_8601 It would be nice if these formats just worked by default. On Thu, Feb 6, 2020 at 5:05 AM Charles Givre wrote: > Hi Drill Devs > I'm having a small issue interpreting

Re: [DISCUSS]: Thoughts

2020-01-30 Thread Ted Dunning
Igor, Good documentation and first 5-minute experience are very important, but not because a long-term contributor will see it and commit their spare time for the next five years on that basis. It is more about preventing early attrition of contributors who might find the project very exciting due

Re: Embedding Drill as a distributed query engine

2020-01-21 Thread Ted Dunning
Hmmm I disagree with a lot of what Paul says. Here is where I agree fully: 1) collocating processes in the same JVM increases the blast radius of failures. If either the DB or the Drill threads go south, it will take the other out. This is a relatively low probability event, but increasing t

Re: Embedding Drill as a distributed query engine

2020-01-21 Thread Ted Dunning
Benjamin, I can't answer your precise question, but this is definitely a viable use case. MapR does the same thing in the OJAI interface to MapR DB. The significant difference is that the drill exec and the software handling the db bits are segregated into separate processes and I think that there

Re: Regarding EXCEPT clause of drill

2020-01-16 Thread Ted Dunning
I think that set subtraction is considerably harder than union. It is unlikely to be a simple hack to an operator. But it should be possible to build planner rules that expand an except expression into a plan to do an outer join and filter away results that have non-null right hand sides. The rea
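The rewrite described above (EXCEPT expressed as an outer join plus a filter on the right-hand side) can be sketched outside Drill. This is a minimal illustration using Python's built-in sqlite3 module, not Drill's planner; the tables `a` and `b` and the column `k` are hypothetical names invented for the example.

```python
import sqlite3

# Hypothetical tables a and b; SQLite stands in for any SQL engine here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (k INTEGER);
CREATE TABLE b (k INTEGER);
INSERT INTO a VALUES (1), (2), (2), (3), (4);
INSERT INTO b VALUES (2), (4), (5);
""")

# Native set subtraction.
except_rows = [r[0] for r in conn.execute(
    "SELECT k FROM a EXCEPT SELECT k FROM b ORDER BY k")]

# Same result via a left outer join: keep only the rows of a that found
# no partner in b (the right-hand side is NULL), then de-duplicate.
join_rows = [r[0] for r in conn.execute("""
    SELECT DISTINCT a.k
    FROM a LEFT OUTER JOIN b ON a.k = b.k
    WHERE b.k IS NULL
    ORDER BY a.k
""")]
```

The two forms agree on this data, but they are not equivalent when the key itself can be NULL: EXCEPT treats two NULLs as the same value, while the join condition `a.k = b.k` never matches them. A planner rule would have to account for that.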

Re: About integration of drill and arrow

2020-01-15 Thread Ted Dunning
On Wed, Jan 15, 2020 at 2:58 PM Paul Rogers wrote: > ... > > For example, Ted, you mention lack of nullability on structure members. > But, Drill represents structures as MAPs, and MAPs can have nullable > members. So, there is likely more to your request than the short summary > suggests. Perhap

Re: About integration of drill and arrow

2020-01-15 Thread Ted Dunning
Jiang, It is sooo cool to hear from actual users in the real world. I would confirm that I have had real problems using drill on nested data. My particular problem wasn't lack of functions, however. It had to do with the fact that without nullable members of structures, I couldn't tell when field

Re: JSON API

2019-12-16 Thread Ted Dunning
Anton, It wouldn't be so hard to add such a capability, but most applications/users so far are OK with a synchronous API. The interface is pretty simple, however. Would you be interested in contributing to an asynchronous alternative? Or at least the design of such a thing? On Mon, Dec 16, 2019

Re: Apache Drill 1.15 EOL Inquiry

2019-12-05 Thread Ted Dunning
Hi, The mailing list that you sent your question to goes to the open source community that develops Apache Drill. This community is supported/a part of the Apache Software Foundation which is a public benefit charity whose mission is the development of software for the public good. None of the con

Re: Check presence of field in json file

2019-09-15 Thread Ted Dunning
ter clause. However it actually does not work with "mytag is > not null and mytag = 'hello'" because I get the following error for files > where mytag is not present. > > SYSTEM ERROR: NumberFormatException: hello > > The physical plan shows that the query optimi

Re: Check presence of field in json file

2019-09-15 Thread Ted Dunning
Keep in mind the danger of testing Foo != null. That doesn't work and catches me by surprise all the time. Foo is null and variants are what you need. On Sat, Sep 14, 2019, 4:56 PM hanu mapr wrote: > Hello Sebastian, > > By default Drill sets the field 'foo' to null for the files that don't > cont
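The pitfall above is standard SQL three-valued logic rather than anything specific to Drill, so it can be reproduced with Python's built-in sqlite3 module. The table `t` and its columns are hypothetical names chosen for this sketch.

```python
import sqlite3

# Hypothetical table with one missing value in column foo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, foo TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "hello"), (2, None), (3, "world")],
)

def ids(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]

# A comparison with NULL evaluates to NULL, which a WHERE clause treats
# as "not true", so this filter rejects every row.
not_equal_null = ids("SELECT id FROM t WHERE foo != NULL ORDER BY id")

# IS NULL / IS NOT NULL are the predicates that actually test for NULL.
is_not_null = ids("SELECT id FROM t WHERE foo IS NOT NULL ORDER BY id")
is_null = ids("SELECT id FROM t WHERE foo IS NULL ORDER BY id")
```

Here `not_equal_null` comes back empty even though two rows have a non-null `foo`, which is exactly the surprise described.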

Re: EMC ECS Configuration with Apache Drill

2019-08-21 Thread Ted Dunning
> simple while configuring drill, but this seems to be far beyond that. > > I'm not sure whether I can get a proxy and also just in case if any other > issues occur as well, is there a way I can debug the code to understand > what values are being passed ? > > On Tue, Aug 2

Re: EMC ECS Configuration with Apache Drill

2019-08-19 Thread Ted Dunning
On Mon, Aug 19, 2019 at 11:33 AM Prabu Mohan wrote: > but i am able to connect to ECS via python using boto3 libraries without > any issues, I am able to write files to the bucket and read them back .. > > not sure why i am facing issues with drill though with the same credentials > The key her

Re: EMC ECS Configuration with Apache Drill

2019-08-19 Thread Ted Dunning
> > org.apache.calcite.jdbc.DynamicRootSchema.getImplicitSubSchema(DynamicRootSchema.java:69) > ~[drill-java-exec-1.16.0.jar:1.18.0-drill-r0] > > at > org.apache.calcite.jdbc.CalciteSchema.getSubSchema(CalciteSchema.java:262) > ~[calcite-core-1.18.0-drill-r0.jar:1.18.0-dril

Re: EMC ECS Configuration with Apache Drill

2019-08-18 Thread Ted Dunning
Did you see anything in any logs? On Sun, Aug 18, 2019 at 10:16 PM Prabu Mohan wrote: > I am able to connect to the http endpoint using boto3 from python (able to > retrieve files/store files), from IE with https and port 9021 , it comes > back with 403 Forbidden indicating that it was able to

Re: Clarification regarding Apache drill setup

2019-08-16 Thread Ted Dunning
My guess is that spilling to S3 will be disastrously slow. On Fri, Aug 16, 2019 at 9:37 AM Paul Rogers wrote: > Hi Manu, > > To add a bit more background... Drill uses local storage only for spilling > result sets when they are too large for memory. Otherwise, data never > touches disk once re

Re: Percentile window function in apache drill

2019-08-12 Thread Ted Dunning
Currently, the way to do this is with window queries where you sort each sub-group and grab the pertinent rows as an approximation of the quantiles you want. Another way would be to use an approximate data structure like a t-digest via an aggregating user-defined function (UDF). Last I checked, t
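The first approach (sort each sub-group, then grab the row at the relevant position) can be sketched in plain Python. This is only an illustration of the idea using the nearest-rank rule; it is not how Drill evaluates window queries, and the function names are made up for the example.

```python
import math
from collections import defaultdict

def nearest_rank(sorted_values, q):
    """Return the value at the nearest-rank position for quantile q (0 < q <= 1)."""
    n = len(sorted_values)
    rank = max(1, math.ceil(q * n))  # 1-based row number within the sorted group
    return sorted_values[rank - 1]

def group_quantiles(rows, q):
    """rows is an iterable of (group_key, value); returns {group_key: quantile}."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    # Sort each sub-group and pick the pertinent row.
    return {key: nearest_rank(sorted(vals), q) for key, vals in groups.items()}

rows = [("a", v) for v in range(1, 101)] + [("b", 10), ("b", 30), ("b", 20)]
medians = group_quantiles(rows, 0.5)
```

In SQL the same shape would use a row number and a row count per partition and keep the rows whose number matches the computed rank. Sorting every group is exact but costly on large data, which is the motivation for an approximate structure such as a t-digest.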

Re: Apache Drill Hangout - July 9, 2019

2019-07-10 Thread Ted Dunning
It won't be possible to find a time that works for Kiev, Germany, US (east and west) and Asia. Traditionally, Drill has mostly had contributors from EU and US so that made it possible to find an overlapping time. I would suggest that Hangout times be varied so that some hit the traditional EU+US

Re: Superset and Drill

2019-06-02 Thread Ted Dunning
Nice. I started on this ages ago and got stalled. So very nice that others had more stickem than me and actually followed through on this. On Sun, Jun 2, 2019 at 7:34 AM Charles Givre wrote: > Hello Everyone, > I wanted to send this note to the Drill aliases but as of this weekend, > the Dri

Re: Question current/future functionality

2019-05-29 Thread Ted Dunning
Various people have done pretty much what you suggest. Can you say just a bit more about what you want to do? Are you looking to push queries down into the web service? Is the web service basically exposing a limited number of datasets that are not subject to filtering inside the web service

Re: strange planning error

2019-05-25 Thread Ted Dunning
Filed https://issues.apache.org/jira/browse/DRILL-7277 On Fri, May 24, 2019 at 11:56 PM Ted Dunning wrote: > > Good eye to spot the issue with redundant ordering. No, I don't need both. > The reason I wound up with two is due to my building and rebuilding the > query many t

Re: strange planning error

2019-05-24 Thread Ted Dunning
26* |* > > *| *4 * | *222* |* > > *| *5 * | *100* |* > > *| *6 * | *32 * |* > > *| *7 * | *16 * |* > > *| *8 * | *9 * |* > > *| *9 * | *7 * |* > > *| *10* | *5 * |* > > *| *11* | *4 * |* > > *| *12* | *2 * |* > > *++-+* > > 12

strange planning error

2019-05-24 Thread Ted Dunning
I have a bunch of data that I want to group up and look at the counts. In order to get a row number for plotting, I tried a window function. The data consists of about 7.2 million rows accessed via a view[1]. The columns are pretty much all untyped[2]. This query works great: *with * *t0 as (*

Re: Query Question

2019-04-12 Thread Ted Dunning
On Fri, Apr 12, 2019 at 9:51 AM Paul Rogers wrote: > Hi All, > > The trick here, of course, is that Drill does not have the tuple concept > of Python: > > zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)] > > > There is no good way in Drill, to represent the array of (1, 4), (2, 5) > pairs.

Re: Query Question

2019-04-11 Thread Ted Dunning
The semantics for zip with different length arguments tend to be either ignore tail of longer argument as Python does with zip or to reuse shorter arguments to fill out to the length of the longest argument as R does with cbind and rbind
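The two conventions can be put side by side in Python. `zip_truncate` is just the built-in `zip`; `zip_recycle` is a hypothetical helper written for this sketch that loosely mirrors R's recycling by repeating shorter inputs up to the length of the longest one.

```python
from itertools import cycle, islice

def zip_truncate(*seqs):
    """Python convention: stop at the shortest input, ignoring longer tails."""
    return list(zip(*seqs))

def zip_recycle(*seqs):
    """Recycling convention: reuse shorter inputs to match the longest one."""
    if not seqs or any(len(s) == 0 for s in seqs):
        return []  # nothing to recycle from an empty input
    longest = max(len(s) for s in seqs)
    return list(zip(*(islice(cycle(s), longest) for s in seqs)))

truncated = zip_truncate([1, 2, 3], [4, 5])
recycled = zip_recycle([1, 2, 3], [4, 5])
```

With inputs of length 3 and 2, truncation yields two pairs while recycling yields three, the last one reusing the first element of the shorter list. Any zip-like function for Drill would have to pick one of these behaviours (or reject mismatched lengths).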

Re: [DISCUSS]: Additional Formats for Drill

2019-04-02 Thread Ted Dunning
I have no idea how much uptake these would have, but if the library can give all the formats all at once for modest effort, that would be great. On Tue, Apr 2, 2019 at 9:22 AM Charles Givre wrote: > Hello everyone, > I recently presented a talk at the ASF DC Roadshow (shameless plug[1] ) > but h

Re: Import drill sources in eclipse

2019-02-23 Thread Ted Dunning
Yes. There is a trick here. You can compile plugins separately and just drop jar files into a special directory. When you restart drill (very quick when using embedded mode), these jars are loaded. You may need to rename the pcap format or even remove it entirely from the main drill source code w

Re: Drill fails to query pcap files

2019-02-10 Thread Ted Dunning
could send a few more examples, I’d like to test this on other > files to make sure it works with them. We’re also going to have to do the > same thing for the PCAP-NG format I would assume. > > > On Feb 10, 2019, at 03:07, Ted Dunning wrote: > > > > On Sat, Feb 9, 2019

Re: Drill fails to query pcap files

2019-02-10 Thread Ted Dunning
On Sat, Feb 9, 2019 at 2:25 PM Bob Rudis wrote: > ... > And, I did indeed find a few and am just waiting for a formal review so I > can submit them for the Drill dev & tests. > Awesome!

Re: Drill fails to query pcap files

2019-02-09 Thread Ted Dunning
> > > On Feb 7, 2019, at 11:45, Ted Dunning wrote: > > > > Giovanni, > > > > A critical thing to help progress here is sample corrupted data. Even > just > > information about what kind of corruption you are seeing is important. > > > > Packet corr

Re: Drill fails to query pcap files

2019-02-07 Thread Ted Dunning
Bob, That would be an awesome contribution! On Thu, Feb 7, 2019 at 5:45 PM Bob Rudis wrote: > Sir Givre: > > I'll be able to (likely this weekend) go back ~18mos and re-test a > bunch of our honeypot PCAP files (I remember various ones failing at > the time). If I do find "bad" ones, they'll

Re: Drill fails to query pcap files

2019-02-07 Thread Ted Dunning
Giovanni, A critical thing to help progress here is sample corrupted data. Even just information about what kind of corruption you are seeing is important. Packet corruption is a key technique of malware so handling bad records well is of great importance. On Thu, Feb 7, 2019 at 3:54 PM Giovan

Re: Is it possible to evaluate different physical plans for a given query with Apache Drill?

2018-12-07 Thread Ted Dunning
If your data is separated from drill by a high latency / high cost link then it's probably better to move the data closer to drill before starting the query. The rationale behind this is that when certain costs absolutely dominate then it's really better to optimize the overall process essentially

Re: Time for a fun Query Question

2018-12-05 Thread Ted Dunning
> Regards, Joel > > On Tue, Dec 4, 2018 at 10:03 PM Ted Dunning wrote: > > > I would parse the timestamp into seconds since epoch. Then divide by use > > floor(ts/600) as the key to group on 10 minute boundaries. > > > > This works because: > > > > - al

Re: Time for a fun Query Question

2018-12-04 Thread Ted Dunning
I would parse the timestamp into seconds since epoch. Then use floor(ts/600) as the key to group on 10 minute boundaries. This works because: - all timezones are multiples of 10 minutes away from UTC - all leap seconds are hidden in the seconds since epoch conversions - the epoch was
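The floor(ts/600) key can be tried out in a few lines of Python. This is a sketch of the arithmetic only, not a Drill query; the function names are invented for the example, and the timestamps are assumed to be ISO 8601 strings that carry an explicit UTC offset.

```python
from collections import Counter
from datetime import datetime, timezone

BUCKET_SECONDS = 600  # 10 minutes

def bucket_key(ts_seconds):
    """floor(ts / 600): the same key for every instant in one 10-minute window."""
    return int(ts_seconds // BUCKET_SECONDS)

def bucket_start(key):
    """Convert a bucket key back to the UTC start time of its window."""
    return datetime.fromtimestamp(key * BUCKET_SECONDS, tz=timezone.utc)

def count_by_bucket(iso_timestamps):
    """Count events per 10-minute bucket."""
    counts = Counter()
    for text in iso_timestamps:
        ts = datetime.fromisoformat(text).timestamp()  # seconds since epoch
        counts[bucket_key(ts)] += 1
    return counts

events = [
    "2018-12-04T22:03:10+00:00",
    "2018-12-04T22:09:59+00:00",
    "2018-12-04T22:10:00+00:00",
    "2018-12-05T03:33:10+05:30",  # same instant as 22:03:10 UTC
]
counts = count_by_bucket(events)
first_key = min(counts)
```

The event written with a +05:30 offset lands in the same bucket as the 22:03 UTC events, because the key is computed from seconds since epoch rather than from the local wall-clock text.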

Re: Drill performance - Waiting time

2018-11-29 Thread Ted Dunning
Matthias, Kunal gives very good information about how to start from the high level to debug this, but you should also be suspicious of the lower levels. For instance, are you sure that your file system is working correctly? Is the file actually stored on MapR? How long does it take to run someth

Re: Drill support for SQLPad

2018-11-29 Thread Ted Dunning
That's cool. On Thu, Nov 29, 2018, 07:33 Charles Givre All, > There is a really nice open source tool out there called SQLPad. In > addition to executing basic SQL Queries, SQLPad enables to to export > results and produce basic visualizations. Until recently, SQLPad did not > support Drill how

Re: Love Drill - Hate Key Has String Token

2018-08-27 Thread Ted Dunning
Can you post a sample file with, say, 5-10 lines? Is it the file names? Or the data values that are giving you fits? On Mon, Aug 27, 2018, 12:51 John Folkers wrote: > Hello, I downloaded Drill over the weekend, and I love it. > > > Problem: $ string token in a key. > > > Question: How can I ge

Re: query performance with unequal drillbits

2018-08-27 Thread Ted Dunning
PDT, John Omernik < > j...@omernik.com> wrote: > > I will +1 Ted's idea. By doing small drillbits, it does take a bit more > overhead, but you also have an ability to scale your Drill cluster size > (especially using the Drillbit shutdown features added recently).

Re: query performance with unequal drillbits

2018-08-22 Thread Ted Dunning
Cool On Wed, Aug 22, 2018, 17:07 scott wrote: > Thanks Ted and Paul. I've been experimenting with the "hack" method. It > works somewhat, and I guess will have to do. > > On Tue, Aug 21, 2018 at 2:50 PM Ted Dunning wrote: > > > A cheap hack is to use

Re: query performance with unequal drillbits

2018-08-21 Thread Ted Dunning
A cheap hack is to use multiple smaller drillbits. Put more drillbits on the hefty machines and fewer on the weaker ones. This increases overheads, but it might help you out. On Tue, Aug 21, 2018 at 1:48 PM scott wrote: > Hi community, > I am trying to find a way to tune Drill so that weaker

Re: Array Index Out of Bounds in String Binary

2018-07-13 Thread Ted Dunning
There are bounds for acceptable behavior for a function like this. Array index out of bounds is not acceptable. Aborting with a clean message about the true problem might be fine, as would be to return a null. On Fri, Jul 13, 2018, 13:46 John Omernik wrote: > So, as to the actual problem, I open

Re: help drill down in production

2018-07-12 Thread Ted Dunning
On Thu, Jul 12, 2018 at 1:36 PM jose luis wrote: > No, I Start Drill in embedded mode using the drill-embedded command: > bin/drill-embedded. > As others have said, that is a mistake. Don't do that. > And, I installed drill in ubuntu in an amazon aws free instance that also > has only 1 gigaby

Re: Which perform better JSON or convert JSON to parquet format ?

2018-06-11 Thread Ted Dunning
Yes. Drill is good at JSON. But Parquet will be faster during a scan. Faster may be better. Or other things may be more important. You have to decide what is important to you. The great virtue of drill is that you have the choice. On Mon, Jun 11, 2018 at 11:06 AM Divya Gehlot wrote: > Thank

Re: Which perform better JSON or convert JSON to parquet format ?

2018-06-11 Thread Ted Dunning
I am going to play the contrarian here. Parquet is not *always* faster than JSON. The (almost unique) case where it is better to leave data as JSON (or whatever) is when the average number of times that a file is read is equal to or less than roughly 1. The point is that to convert read the file
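The break-even argument can be made concrete with a toy cost model. Everything here is an assumption for illustration: the function names, the idea that conversion costs one JSON read plus one Parquet write, and the sample numbers are not measurements of Drill.

```python
def total_cost_json(reads, read_json):
    """Total cost of leaving the data as JSON and scanning it `reads` times."""
    return reads * read_json

def total_cost_parquet(reads, read_json, write_parquet, read_parquet):
    """Total cost of converting once (read JSON, write Parquet), then scanning Parquet."""
    return read_json + write_parquet + reads * read_parquet

def break_even_reads(read_json, write_parquet, read_parquet):
    """Number of reads at which converting starts to pay off."""
    if read_parquet >= read_json:
        return float("inf")  # conversion never pays off in this model
    return (read_json + write_parquet) / (read_json - read_parquet)

# Hypothetical costs in arbitrary units: JSON scan 10, Parquet write 2, Parquet scan 2.
n_star = break_even_reads(10, 2, 2)
```

Because the numerator always exceeds the denominator, the break-even point in this model is always above one read, which matches the claim that data read about once or less is better left in its original format.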

Re: Apache Drill in 10 Minutes - Problems

2018-05-30 Thread Ted Dunning
ries (I had put it into Program Files). > > Unfortunately I still get the same problem. > > Dave. > > > > -Original Message- > From: Ted Dunning [mailto:ted.dunn...@gmail.com] > Sent: 30 May 2018 10:08 > To: user > Subject: Re: Apache Drill in 10 Minutes - Pr

Re: Error Joining Two Tables In Apache Drill

2018-05-30 Thread Ted Dunning
Also, do you have some data that looks like 'XXX'? On Wed, May 30, 2018 at 1:18 PM Charles Givre wrote: > Three questions… > 1. Have you tried this with the format string ‘#’ (A single #) > 2. Have you tried the join w/o any function wrapper around the field? > 3. I notice that the number of

Re: Is Drill site down ?

2018-05-30 Thread Ted Dunning
https://drill.apache.org/ works for me. What URL were you trying? Can you ping that host? On Wed, May 30, 2018 at 10:03 AM Divya Gehlot wrote: > Hi, > I am getting 404 error when opening the drill page. > Has any body else facing the issue ? > > Thanks, > Divya >

Re: Apache Drill in 10 Minutes - Problems

2018-05-30 Thread Ted Dunning
I see no evidence that this is the problem other than people are saying "problem on windows", but is there a space in a directory name somewhere? This is a classic problem when moving programs from Linux/OSX and has been a problem with Drill in the past. It isn't a technical issue so much as direc

Re: question about views

2018-04-30 Thread Ted Dunning
ew would hit the parquet file only when > > the timestamp predicate would match a partition ? > > > > Any news on a recent test to confirm the design ? > > > > Thanks > > > > 2018-03-20 6:49 GMT+01:00 Ted Dunning : > > > > > Aman, > > > >

Re: question about views

2018-03-19 Thread Ted Dunning
prune out un-necessary data, using partitions or indexes, I > > don't think adding a union between them would alter that behavior. > > > > -Rahul > > > > On Mon, Mar 19, 2018 at 1:44 PM, Ted Dunning wrote: > > > > > IF I create a view that is a union

question about views

2018-03-19 Thread Ted Dunning
If I create a view that is a union of partitioned parquet files and a database that has secondary indexes, will Drill be able to properly push down query limits into both parts of the union? In particular, if I have lots of archival data and parquet partitioned by time but my query only asks for r

Re: Participate in the Apache Drill Poll on Twitter

2018-03-16 Thread Ted Dunning
Did it. On Mar 16, 2018 5:37 AM, "Saurabh Mahapatra" wrote: > Participate in the Apache Drill Poll and have your voice heard through ONE > vote: https://lnkd.in/gfWWXGd > > >

Re: Way to "pivot"

2018-03-06 Thread Ted Dunning
Arjun's approach works even if the timestamps are not unique. Especially if you use avg instead of max. On Mar 6, 2018 8:47 AM, "Arjun kr" wrote: > If each timestamp has only one set of values for (x,y,z) , you can try > something like below. > > select dt , > max(case when source='X' THEN `val
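The conditional-aggregation pivot discussed above can be sketched with Python's built-in sqlite3 module, using avg so that duplicate timestamps are averaged rather than reduced to their maximum. The table `readings` and the columns `dt`, `source` and `val` are hypothetical stand-ins, since the quoted query is cut off.

```python
import sqlite3

# Hypothetical long-format data: one row per (timestamp, source) reading.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (dt TEXT, source TEXT, val REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("t1", "X", 1.0), ("t1", "Y", 2.0), ("t1", "Z", 3.0),
        ("t2", "X", 4.0), ("t2", "X", 6.0),  # two X readings at the same time
        ("t2", "Y", 5.0),                    # and no Z reading at all
    ],
)

# One CASE per output column; the aggregate skips the NULLs that the
# CASE produces for rows belonging to other sources.
pivot = conn.execute("""
    SELECT dt,
           AVG(CASE WHEN source = 'X' THEN val END) AS x,
           AVG(CASE WHEN source = 'Y' THEN val END) AS y,
           AVG(CASE WHEN source = 'Z' THEN val END) AS z
    FROM readings
    GROUP BY dt
    ORDER BY dt
""").fetchall()
```

The two X readings at `t2` are averaged into a single cell, and the missing Z reading shows up as NULL (None in Python) instead of breaking the query.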

Re: Zookeeper distribution

2018-02-24 Thread Ted Dunning
Should have no effect at all. Drill only uses zookeeper to coordinate itself, not to coordinate with other systems. On Feb 24, 2018 15:25, "Tom Barber" wrote: > Hi folks, > > Random question I've seen a few different answers to... > > If I have a Hadoop cluster, but run Drill on a separate Zook

Re: Which Hadoop File Format Should I Use?

2018-02-07 Thread Ted Dunning
Carbondata does look very cool, but I haven't seen any significant user adoption which means that I haven't heard very many war stories. On Wed, Feb 7, 2018 at 11:58 AM, Saurabh Mahapatra < saurabhmahapatr...@gmail.com> wrote: > ... > The Carbondata project looks quite promising. > > Any though

Re: PCAP files with Apache Drill and Sergeant R

2018-02-07 Thread Ted Dunning
On Wed, Feb 7, 2018 at 10:18 AM, Bob Rudis wrote: > ... > I just wish I had time to PR into the project to have it not totally bork > on imperfect packets, support more PCAP formats and add in/port some helper > UDF decoders. > That is super frustrating. I just helped John Omernik fix a big clas

Re: PCAP files with Apache Drill and Sergeant R

2018-02-07 Thread Ted Dunning
On Tue, Feb 6, 2018 at 1:08 AM, Arjun kr wrote: > ... > I don't have any clue about using Drill with 'R Sergeant library' library. > Hopefully, others can throw any lights on this question. > I just looked this up and in their own words: Jul 17, 2017 - *sergeant*: Tools to Transform and Query D

Re: Using a plugin on files with wrong extensions

2018-01-04 Thread Ted Dunning
Set the default file type for a workspace to be pcap. Then, if the extension doesn't make sense you get the right result. On Thu, Jan 4, 2018 at 9:52 AM, John Omernik wrote: > Hello all - > > I was looking at using the PCAP plugin here, and I setup the type for pcap > in the storage plugin. I

Re: Can Apache Drill perform streaming queries?

2017-11-09 Thread Ted Dunning
Confluent has a non-Apache product, I think, for streaming SQL. On Thu, Nov 9, 2017 at 4:50 PM, Saurabh Mahapatra wrote: > Isn't there the new Kafka plugin? What does that exactly do? > > Best, > Saurabh > > Sent from my iPhone > > > > > On Nov 9, 2017, at 5:15 AM, kant kodali wrote: > > > > H

Re: Drill Capacity

2017-11-02 Thread Ted Dunning
What happens if you split your large file into 5 smaller files? On Thu, Nov 2, 2017 at 12:52 PM, Yun Liu wrote: > Yes- I increased planner.memory.max_query_memory_per_node to 10GB > HEAP to 12G > Direct memory to 16G > And Perm to 1024M > > It didn't have any schema changes. As with the same f

Re: Time series storage with parquet

2017-11-01 Thread Ted Dunning
Rahul, CTAS plus some file moves are what you need. Do a query against the new file to force the meta data cache to be updated. Also, consider not building the weekly files. You might measure their impact but I would expect no gain and possibly some loss of performance due to less parallelism. In

Re: Drill performance question

2017-10-30 Thread Ted Dunning
Also, on a practical note, Parquet will likely crush CSV on performance. Columnar. Compressed. Binary. All that. On Mon, Oct 30, 2017 at 9:30 AM, Saurabh Mahapatra < saurabhmahapatr...@gmail.com> wrote: > Hi Charles, > > Can you share some query patterns on this data? More specifically, the >

Re: Querying Parquet files in 2.0 format

2017-09-25 Thread Ted Dunning
Oscar, Is parquet 2.0 required? Is there a particular feature that makes 2.0 especially attractive in spite of the interoperability difficulties it will cause? The answers to these questions are important for the community to prioritize support for 2.0. On Sep 25, 2017 2:40 AM, "Oscar Torreño"

Re: Query Error on PCAP over MapR FS

2017-09-14 Thread Ted Dunning
throw UserException.dataReadError(io) > .addContext("File name:", inputPath) > .build(logger); > } > } > > Thanks, > > Arjun > > From: Ted Dunning > Sent: Thursday, September 14, 2017 1:1

Re: Query Error on PCAP over MapR FS

2017-09-14 Thread Ted Dunning
PCAP shouldn't care at all about the underlying file system. On Thu, Sep 14, 2017 at 9:38 AM, Takeo Ogawara wrote: > I’m not sure PcapRecordReader supports HDFS/MapR FS. > Supporting HDFS/MapR FS or not is different by which file format? > I don’t understand Drill architecture well... > > > If

Re: ***UNCHECKED*** Re: Query Error on PCAP over MapR FS

2017-09-12 Thread Ted Dunning
e read in parallel (e.g. rows are restricted to one line each), then the > plugin can be designed to read the file in parallel. > > > Are PCAP records single-line records? > > > Thanks. > > > --Robert > > > From: Ted Dunning >

Re: ***UNCHECKED*** Re: Query Error on PCAP over MapR FS

2017-09-12 Thread Ted Dunning
y small > files rather than one large file. This is the preferred approach, and it > seems that this approach works for you. Please let me know if you think > otherwise, that you need to access your data in one large PCAP file. > > > Thanks. > > > --Robert > > > _

Re: ***UNCHECKED*** Re: Query Error on PCAP over MapR FS

2017-09-12 Thread Ted Dunning
Don't thank me. Thank the Drill community that made that work! On Tue, Sep 12, 2017 at 6:03 AM, Takeo Ogawara wrote: > Query on directory and wildcard works well! > Thank you so much. > > > 2017/09/12 12:13、Ted Dunning のメール: > > > > On Tue, Sep 12, 2017 at 4:

Re: ***UNCHECKED*** Re: Query Error on PCAP over MapR FS

2017-09-11 Thread Ted Dunning
nel.java:131) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > > at > > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > > at io.netty.channel.nio.NioEventLoop. > processSelectedKeysOpt

Re: ***UNCHECKED*** Re: Query Error on PCAP over MapR FS

2017-09-11 Thread Ted Dunning
On Tue, Sep 12, 2017 at 4:53 AM, Takeo Ogawara wrote: > > > > Is it absolutely required to query large files like this? Would it be > > acceptable to split the file first by making a quick scan over it? > No,loading large file isn’t necessarily required. > In fact, this large PCAP file is created

Re: Query Error on PCAP over MapR FS

2017-09-11 Thread Ted Dunning
On Mon, Sep 11, 2017 at 11:23 AM, Takeo Ogawara wrote: > ... > > 1. Query error when cluster-name is not specified > ... > > With this setting, the following query failed. > > select * from mfs.`x.pcap` ; > > Error: DATA_READ ERROR: /x.pcap (No such file or directory) > > > > File name: /x.pcap >

Re: Does Drill Use Apache Struts

2017-09-08 Thread Ted Dunning
eferenced as a possibility) > > http://thehackernews.com/2017/09/apache-struts-vulnerability.html > > > > > On Fri, Sep 8, 2017 at 9:07 AM, Ted Dunning wrote: > > > Almost certainly not. > > > > What issues are you referring to? I don't follow struts.

Re: Does Drill Use Apache Struts

2017-09-08 Thread Ted Dunning
Almost certainly not. What issues are you referring to? I don't follow struts. On Sep 8, 2017 16:00, "John Omernik" wrote: Hey all, given the recent issues related to Struts, can we confirm that Drill doesn't use this Apache component for anything? I am not good enough at code reviews to see w

Re: R interface to Apache Drill now on CRAN

2017-07-17 Thread Ted Dunning
Cool! On Mon, Jul 17, 2017 at 7:33 PM, Bob Rudis wrote: > Hey folks, > > Those using R or contemplating dabbling in R can now grab the > 'sergeant' package — an R interface to Apache Drill — directly from > CRAN : https://cran.r-project.org/package=sergeant > > As stated previously, the packag

Re: Drill Summit/Conference Proposal

2017-06-18 Thread Ted Dunning
On Sat, Jun 17, 2017 at 11:03 PM, Charles Givre wrote: > I've never been but what about OsCon? > Great option. It is bigger and better attended than ApacheCon (lately). And they allow specialized tracks.

Re: CTAS to wait till the time table is created

2017-06-08 Thread Ted Dunning
On Thu, Jun 8, 2017 at 7:07 AM, Sing, Jasbir wrote: > I am using CTAS command to copy one parquet file from another. But my > threads are not waiting for the task completion and are moving forward. I > want my tread to wait till the time my parquet file is created. > How can I achieve this? > Wh

Re: Reg; Apache Drill

2017-06-07 Thread Ted Dunning
Pritam, Let me rephrase what Bob has to say. It has some merit, but it also probably has a bit more sting than it needs to have. The first question that you need to look at in any kind of textual analysis project is what kind of data you are likely to have. How will the data be presented to you?
