Re: [ANNOUNCE] Apache Drill 1.17.0 Released

2019-12-30 Thread Aman Sinha
Congratulations on a great release ! Seems quite feature rich. Also thanks to Volodymyr for shepherding the release even during the holiday season. On Fri, Dec 27, 2019 at 2:34 AM Arina Yelchiyeva wrote: > Congrats everyone, great job! > > Kind regards, > Arina > > > On 26 Dec 2019, at 20:32,

Re: Drill Resources Information

2019-10-14 Thread Aman Sinha
Hi Charles, Resource provisioning is a broad area and workload specific but perhaps the following presentations and doc links might help: [1] https://www.slideshare.net/MapRTechnologies/putting-apache-drill-into-production [2] MapR specific but the concepts should be generally applicable :

Re: [ANNOUNCE] New PMC Chair of Apache Drill

2019-08-23 Thread Aman Sinha
Congratulations Charles ! And thank you Arina ! -Aman On Thu, Aug 22, 2019 at 9:11 PM Divya Gehlot wrote: > Congratulations Charles ! > Looking forward for much better Drill and more addition in your book as > well :) > > Thanks , > Divya > > On Fri, 23 Aug 2019 at 12:07 PM, Bhargava

Re: Column alias in group by behavior in 1.16

2019-08-20 Thread Aman Sinha
The change in behavior occurred in Drill 1.15 when the group-by alias support was added [1]. Before this, we could not even group by an alias in the SELECT list. However, as Arina mentioned, the behavior is dependent on Calcite which is used by Drill. Does MySQL or other systems behave the same

Re: Percentile window function in apache drill

2019-08-12 Thread Aman Sinha
Drill currently supports the ranking window functions [1] but not specifically the percentile function. [1] https://drill.apache.org/docs/ranking-window-functions/ On Mon, Aug 12, 2019 at 7:34 AM Ted Dunning wrote: > Currently, the way to do this is with window queries where you sort each >

Re: Apache Drill Hangout July 23rd

2019-07-23 Thread Aman Sinha
arm and > overslept. Now I can't join the meeting, maybe it has finished. I will > issue the ParallelHashJoin PR recently. > > On Tue, Jul 23, 2019 at 10:14 AM Aman Sinha wrote: > > > Hi Drillers, > > > > We will have our bi-weekly hangout tomorrow, July 23rd, at 10 AM PST >

Re: [ANNOUNCE] New Committer: Bohdan Kazydub

2019-07-23 Thread Aman Sinha
Congratulations Bohdan and thanks much for your contributions ! On Tue, Jul 23, 2019 at 3:13 AM Igor Guzenko wrote: > Congratulations, Bohdan! Great job !!! > > On Tue, Jul 23, 2019 at 12:33 PM Volodymyr Vysotskyi > > wrote: > > > Congratulations, Bohdan! Thanks for your contributions! > > > >

Re: [ANNOUNCE] New Committer: Igor Guzenko

2019-07-23 Thread Aman Sinha
Congratulations Igor and thanks for your contributions to Drill ! On Tue, Jul 23, 2019 at 3:33 AM Anton Gozhiy wrote: > Congratulations Igor, well deserved! > > On Tue, Jul 23, 2019, 12:31 Volodymyr Vysotskyi > wrote: > > > Congratulations, Ihor! Thanks for your contributions! > > > > Kind

Apache Drill Hangout July 23rd

2019-07-22 Thread Aman Sinha
Hi Drillers, We will have our bi-weekly hangout tomorrow, July 23rd, at 10 AM PST (link: https://meet.google.com/yki-iqdf-tai ). If there are any topics you would like to discuss during the hangout please respond to this email. I believe last time Weijie mentioned he could talk about the hash

Re: strange planning error

2019-05-24 Thread Aman Sinha
The changing of the column order in the SELECT should not materially affect the planning, so this does look like a bug. I was able to repro it with a simpler example (without the group-by): select row_number() over (order by department_id desc) r, department_id from

Re: May Apache Drill board report

2019-05-03 Thread Aman Sinha
+1 On Fri, May 3, 2019 at 1:40 PM Volodymyr Vysotskyi wrote: > Looks good, +1 > > > Fri, May 3, 2019, 23:32 Arina Ielchiieva > wrote: > > > Hi all, > > > > please take a look at the draft board report for the last quarter and let > > me know if you have any comments. > > > > Thanks, >

Re: [RESULT] [VOTE] Apache Drill Release 1.16.0 - RC2

2019-05-01 Thread Aman Sinha
Great ! Thanks for managing this release Sorabh ! On Wed, May 1, 2019 at 9:22 AM SorabhApache wrote: > Hi All, > RC2 candidate for 1.16.0 passes the voting criteria. Thanks to everyone who > has tested and voted for release candidate. The summary of voting is: > > Total Votes: 8 > 5x +1

Re: [VOTE] Apache Drill Release 1.16.0 - RC2

2019-04-29 Thread Aman Sinha
Downloaded binary tarball on my Mac and ran in embedded mode. Verified Sorabh's release signature and the tar file's checksum Did a quick glance through maven artifacts Did some manual tests with TPC-DS Web_Sales table and ran REFRESH METADATA command against the same table Checked runtime query

Re: [VOTE] Apache Drill Release 1.16.0 - RC1

2019-04-26 Thread Aman Sinha
>> > >>>>> On Wed, Apr 24, 2019 at 9:52 AM SorabhApache wrote: > >>>>> > >>>>>> Hi Volodymyr/Anton, > >>>>>> I can verify that I am seeing both the below issues as reported by > >>> Anton > >>>>>

Re: [VOTE] Apache Drill Release 1.16.0 - RC1

2019-04-24 Thread Aman Sinha
. > > > >> > > Tested new features of metadata caching by creating v4 cache > files > > > >> using > > > >> > > new Refresh Metadata commands and manually verified the cache > > files. > > > >> > Tried > > >

Re: [VOTE] Apache Drill Release 1.16.0 - RC1

2019-04-23 Thread Aman Sinha
Hi Vova, I added some thoughts in the DRILL-7195 JIRA. Aman On Tue, Apr 23, 2019 at 6:06 AM Volodymyr Vysotskyi wrote: > Hi all, > > I did some checks and found the following issues: > - DRILL-7195 > - DRILL-7194

Re: Blocker on drill upgrade path

2019-04-19 Thread Aman Sinha
> select max(last_name) last_name from > cp.`employee.json` group by > . . . . . . . . . . . > last_name limit 5; > ++ > | last_name | > ++ > | Nowmer | > | Whelply| > | Spence | > | Gutierrez | > | Damstra| > +

Re: Blocker on drill upgrade path

2019-04-19 Thread Aman Sinha
This is legal: select max(last_name) from cp.`employee.json` group by last_name limit 5; But this is not: select max(last_name) last_name from cp.`employee.json` group by last_name limit 5; The reason is the second query is aliasing the max() output to 'last_name' which is being referenced

Re: Query Question

2019-04-11 Thread Aman Sinha
> I thought flatten() would be the answer, however, if I flatten the columns, I get the following result: Regarding the flatten() output, this is expected because doing a 'SELECT flatten(a), flatten(b) FROM T' is equivalent to doing a cross-product of the 2 arrays. In your example, both arrays
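The cross-product behavior described above can be illustrated outside Drill with a small Python sketch. This is not Drill's implementation; the sample row and the helper name `flatten_two` are assumptions made purely for illustration.

```python
from itertools import product

def flatten_two(row, col_a, col_b):
    """Emit one output row per (a, b) pair, mimicking what
    SELECT flatten(a), flatten(b) FROM T does for one input row."""
    return [{col_a: a, col_b: b} for a, b in product(row[col_a], row[col_b])]

# Hypothetical input row with two array columns.
row = {"a": [1, 2], "b": ["x", "y", "z"]}
result = flatten_two(row, "a", "b")
print(len(result))  # 2 * 3 = 6 rows
```

With arrays of length m and n, each input row expands to m * n output rows, which is why flattening two columns in the same SELECT can produce far more rows than expected.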

Re: Issue faced in Apache drill

2019-04-09 Thread Aman Sinha
The last suggestion from Paul about CASTing to desired type should work: SELECT a, SUM(CAST(b as INT) ) FROM dfs.`C:\\Users\\user\\Desktop sample.json` group by a; I suggest filing a JIRA for the original query because for some reason if all values are NULLs, (and this is with group-by),

Re: January Apache Drill board report

2019-01-31 Thread Aman Sinha
Thanks for putting this together, Arina. The Drill Developer Day and Meetup were separate events, so you can split them up. - A half day Drill Developer Day was held on Nov 14. A variety of technical design issues were discussed. - A Drill user meetup was held on the same evening. 2

Re: Nested Window Queries

2019-01-03 Thread Aman Sinha
John, what's the full SQL query that you submitted ? On Thu, Jan 3, 2019 at 6:45 AM John Omernik wrote: > Is there a limitation on nesting of Window Queries? I have a query > where I am using an event stream, and the changing of a value to indicate > an event. (The state goes from

Re: [VOTE] Apache Drill release 1.15.0 - RC0

2018-12-18 Thread Aman Sinha
@vita...@apache.org any idea why there's an extraneous directory in the source ? drwxrwxr-x vitalii/vitalii 0 2018-12-18 03:48 apache-drill-1.15.0-src/${project.*basedir*}/ drwxrwxr-x vitalii/vitalii 0 2018-12-18 03:48 apache-drill-1.15.0-src/${project.*basedir*}/src/ drwxrwxr-x

Re: [ANNOUNCE] New Committer: Salim Achouche

2018-12-17 Thread Aman Sinha
Congratulations Salim ! Thanks for your contributions ! Aman On Mon, Dec 17, 2018 at 3:20 AM Vitalii Diravka wrote: > Congratulations Salim! > Well deserved! > > Kind regards > Vitalii > > > On Mon, Dec 17, 2018 at 12:40 PM Arina Ielchiieva > wrote: > > > The Project Management Committee

Re: Is there any plan for drill to support Parquet Format version 2.5 added column indexes?

2018-12-12 Thread Aman Sinha
This seems quite interesting. Drill does row group pruning, but doing the page level pruning based on indexes would be a big win. Also, as you may know, Drill recently added a feature to leverage secondary indexes in NoSQL databases [1]. However, we have to see whether that capability applies to

Re: Apache Drill Meetup on Nov 14th!

2018-11-20 Thread Aman Sinha
che Drill! The next meet up will be on Nov > 14th at 6:30 PM at the MapR Headquarters. > > We will have two speakers for the meetup > - Nitin Sharma @ Netflix who will talk about Netflix's Personalization > Infrastructure > - Aman Sinha @ MapR who will talk about a brand

Re: Hangout Discussion Topics

2018-11-12 Thread Aman Sinha
Since we are having the Drill Developer day on Wednesday, perhaps we can skip the hangout tomorrow ? Aman On Mon, Nov 12, 2018 at 10:13 AM Timothy Farkas wrote: > Hi All, > > Does anyone have any topics to discuss during the hangout tomorrow? > > Thanks, > Tim >

Re: [ANNOUNCE] New Committer: Hanumath Rao Maduri

2018-11-01 Thread Aman Sinha
Congratulations Hanumath ! Aman On Thu, Nov 1, 2018 at 11:39 AM Paul Rogers wrote: > Congratulations Hanu! > > - Paul > > Sent from my iPhone > > > On Nov 1, 2018, at 11:09 AM, Kunal Khatua wrote: > > > > Congratulations, Hanu! > > On 11/1/2018 11:04:58 AM, Abhishek Girish wrote: > >

Re: November Apache Drill board report

2018-11-01 Thread Aman Sinha
Docket container ==> 'Docker' November 14, 2019 ==> 2018 :) (this is wrong in email that was sent out) Rest LGTM. On Thu, Nov 1, 2018 at 6:42 AM Arina Ielchiieva wrote: > Hi all, > > please take a look at the draft board report for the last quarter and let > me know if you have any

Re: [HANGOUT] [new link] Topics for October 02 2018

2018-10-13 Thread Aman Sinha
Please use this link: https://www.slideshare.net/secret/zMZIrpM5qKV5pI I forgot the apache mailing lists block the attachments. Aman On Sat, Oct 13, 2018 at 5:20 PM Aman Sinha wrote: > On my gmail account it shows the attachment was sent. I am re-attaching > and sending. > > Ama

Re: [HANGOUT] [new link] Topics for October 02 2018

2018-10-13 Thread Aman Sinha
On my gmail account it shows the attachment was sent. I am re-attaching and sending. Aman On Sat, Oct 13, 2018 at 3:38 PM Chunhui Shi wrote: > Hi Aman, are you going to send out the slides in another email? > > Regards, > Chunhui >

Re: [HANGOUT] [new link] Topics for October 02 2018

2018-10-12 Thread Aman Sinha
Attached is a PDF version of the slides. Unfortunately, I don't have a recording. thanks, Aman On Thu, Oct 11, 2018 at 9:39 AM Pritesh Maker wrote: > Divya - anyone is welcome to join the hangout! Aman will be sharing the > slides shortly. We use Google Hangouts which doesn't have the

Re: [HANGOUT] [new link] Topics for October 02 2018

2018-09-30 Thread Aman Sinha
I can talk about the index planning and execution feature [1] that is currently in review [2]. [1] https://issues.apache.org/jira/browse/DRILL-6381 [2] https://github.com/apache/drill/pull/1466 On Fri, Sep 28, 2018 at 2:13 PM Karthikeyan Manivannan wrote: > Hi, > > We will have a Drill Hangout

Re: Drill Hangout tomorrow 06/26

2018-06-26 Thread Aman Sinha
lanning, TestTpchExplain) to the > SlowTest category. > > Is there other solution for this issue? What are other tests are executed > very slowly? > > Kind regards > Vitalii > > > On Tue, Jun 26, 2018 at 3:34 AM Aman Sinha wrote: > > > We'll have the Drill hangout tomo

Drill Hangout tomorrow 06/26

2018-06-25 Thread Aman Sinha
We'll have the Drill hangout tomorrow Jun26th, 2018 at 10:00 PDT. If you have any topics to discuss, send a reply to this post or just join the hangout. ( Drill hangout link )

Re: [DISCUSS] case insensitive storage plugin and workspaces names

2018-06-12 Thread Aman Sinha
plugins table names must be case > sensitive, since under table name we imply directory / file name and their > case sensitivity depends on file system. > > Kind regards, > Arina > > On Tue, Jun 12, 2018 at 6:13 PM Aman Sinha wrote: > > > Drill is dependent on

Re: [DISCUSS] case insensitive storage plugin and workspaces names

2018-06-12 Thread Aman Sinha
Drill is dependent on the underlying file system's case sensitivity. On HDFS one can create 'hadoop fs -mkdir /tmp/TPCH' and /tmp/tpch which are separate directories. These could be set as workspace in Drill's storage plugin configuration and we would want the ability to query both. If we

Re: question about views

2018-03-19 Thread Aman Sinha
Due to an infinite loop occurring in Calcite planning, we had to disable the filter pushdown past the union (SetOps). See https://issues.apache.org/jira/browse/DRILL-3855. Now that we have rebased on Calcite 1.15.0, we should re-enable this and test and if the pushdown works then the partition

Re: Accessing underlying scheme of input

2018-03-02 Thread Aman Sinha
lyze using parquet-tools [root@aman1 ~]# java -jar parquet-tools/target/parquet-tools-1.9.0.jar schema tt3/0_0_0.parquet message root { optional int64 a1; optional group b1 { repeated int64 id; } optional group c1 { optional int64 d1; optional group e1 { repeated bin

Re: Accessing underlying scheme of input

2018-03-02 Thread Aman Sinha
Erol, yes indeed Drill is internally creating the schema for Json data. The top level field's data type can be found using the TYPEOF(column) function that Gautam mentioned earlier. However, I understand you are looking for the nested schema as well, so I would recommend the following approach:

Re: Which Hadoop File Format Should I Use?

2018-02-07 Thread Aman Sinha
The multi-level indexing feature in Carbondata seems very interesting...it will allow persisting OLAP cubes and provide efficient access; virtually providing the capability that specialized OLAP engines provide. The ORC format also provides indexing but it seems not multi-level indexing.

Re: Google Hangouts: Lateral Join High Level Design Presentation

2018-02-06 Thread Aman Sinha
It looks like Volodymyr also had a topic: Decimal types support. He is starting with that. I am not sure if there is going to be sufficient time to cover 2 topics today... On Tue, Feb 6, 2018 at 10:00 AM, Timothy Farkas wrote: > Google Hangout Reminder. > >

Re: Does Drill support Full-text search as in Elasticsearch

2017-12-06 Thread Aman Sinha
Hi, no, currently Drill does not support full-text search and it is unlikely it will do so natively in the near future. Instead, since other products such as Elasticsearch are specialized for it, it would be better to add ES storage plugins in Drill to push down such full-text search filters into

Re: How to verify predicate pushdown

2017-10-26 Thread Aman Sinha
For simple AND-ed predicates, if the entire filter is eligible for pushdown then after pushdown the filter would be dropped, so the Explain plan will not show a Filter node. In this case, I would expect the row count of the Scan to be reduced. For complex predicates, e.g combination of ANDs and

No Drill hangout next Tuesday 19th Sept

2017-09-15 Thread Aman Sinha
Drillers, Due to developers attending the hackathon, we won't be having the Drill hangout next Tuesday. Next one will be on Tuesday Oct 3rd. See you then ! -Aman

[HANGOUT] Suggestions for topics for 08/22

2017-08-21 Thread Aman Sinha
We will have the hangout at 10 AM PST tomorrow 08/22. Please send in suggestions for topics or provide it at the beginning of the hangout. Hangout link: https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc Thanks.

Re: Drill Summit/Conference Proposal

2017-06-17 Thread Aman Sinha
Agree with Julian about ApacheCon being a better venue with a track (or sub-conference ?) for Drill. The size and scale of the wider conference is likely to be beneficial despite other competing Apache projects. Perhaps the schedule could be managed in such a way that the conflicts with similar

Re: Getting Exception while running Drill-mondrian cube

2017-06-13 Thread Aman Sinha
Can you provide more details ? e.g what query did you run ? The schema entry * indicates this is a Hive table. Is the Hive plugin enabled in the Drill storage plugin UI ? Does the simple query 'Select count(*) from hive.`customer_w_ter` ' run ? On Tue, Jun 13, 2017 at 12:18 PM, Sateesh

Re: Drill data and database locality

2017-06-11 Thread Aman Sinha
Ivan, yes, the scans for the various data sources are expected to use locality information to perform the table scan. If you only run the query against mongodb (the right side of union-all) and the foreman is on server B, does it do the table scan on server A which is hosting the mongodb table ?

Re: Running cartesian joins on Drill

2017-05-11 Thread Aman Sinha
ian join condition. Drill leverages Calcite for this (you can see CALCITE-1200 for some background). Can you file a JIRA for this ? -Aman From: "Aman Sinha (asi...@mapr.com)" <asi...@mapr.com> Date: Thursday, May 11, 2017 at 4:29 PM To: dev <d...@drill.apache.org>, user <

Re: Running cartesian joins on Drill

2017-05-11 Thread Aman Sinha
I think Muhammad may be trying to run his original query with IS NOT DISTINCT FROM. That discussion got side-tracked into Cartesian joins because his query was not getting planned and the error was about Cartesian join. Muhammad, can you try with the equivalent version below ? You mentioned

Hangout starting at 10am PST

2017-04-04 Thread Aman Sinha
Hangout link: https://hangouts.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc

Re: Drill query planning taking a LONG time

2017-02-13 Thread Aman Sinha
In your drillbit.log file, can you look for the entries for the foreman node to see where the time is being spent ? e.g entries of the following type: [275dec51-fcc1-f1bf-cb2f-57a838805a82:foreman] INFO o.a.d.exec.store.parquet.Metadata - Took 64 ms to read metadata from cache file Each

Apache Drill Hangout Minutes (1/10/2017)

2017-01-10 Thread Aman Sinha
Attendees: Arina, Aman, Khurram, Karthik, Paul, Roman 1. Khurram wanted to know the status of Calcite rebasing. Roman indicated it was in progress but currently on temporary hold due to couple of higher priority issues. 2. Lazy initialization for UDFs: Paul wanted more

Re: Suggestion for topics for Drill hangout tomorrow

2017-01-10 Thread Aman Sinha
Hangout starting .. link: https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc On 1/9/17, 5:24 PM, "Aman Sinha" <asi...@mapr.com> wrote: Hi all, The bi-weekly hangout is tomorrow (01/10/17, 10 AM PST). If you have any suggestions for topics for

Suggestion for topics for Drill hangout tomorrow

2017-01-09 Thread Aman Sinha
Hi all, The bi-weekly hangout is tomorrow (01/10/17, 10 AM PST). If you have any suggestions for topics for tomorrow, please add to this thread. We will also ask for topics at the beginning of the hangout. Thanks.

Re: [1.9.0] : UserException: SYSTEM ERROR: IllegalReferenceCountException: refCnt: 0 and then SYSTEM ERROR: IOException: Failed to shutdown streamer

2016-12-08 Thread Aman Sinha
Hi Anup, since your original query was working on 1.6 and failed in 1.9, could you pls file a JIRA for this ? It sounds like a regression related to evaluation of a Project expression (based on the stack trace). Since there are several CASE exprs, quite likely something related to its

Re: querying rest services

2016-12-06 Thread Aman Sinha
Pls see: https://drill.apache.org/docs/rest-api/ and see if it satisfies your needs. On Tue, Dec 6, 2016 at 5:51 AM, Remzi Düzağaç wrote: > Hi, > > I would like to query rest services like solr or elasticsearch rest > interfaces (or any rest service) > is it possible via

Re: Table Metadata Question

2016-12-04 Thread Aman Sinha
Charles, Drill does not have a metastore for tables, so unless you have defined a view with CAST or are querying Hive tables (Hive has a metastore), the column types are determined at run-time. Have you tried the typeof() function ? SELECT typeof(column) FROM dfs.`data.json` LIMIT 1; On

Re: Building a LogicalPlan

2016-12-02 Thread Aman Sinha
Unfortunately, the 'LogicalPlan' structure that is created by the LogicalPlanBuilder does not go through the full Drill query optimization process. You are better off starting with a Calcite Rel and then building a Drill logical plan with 'Rels' (e.g DrillFilterRel, DrillProjectRel etc.). On

Hangout starting now

2016-09-20 Thread Aman Sinha
Hangout link - https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc

Suggest topics for hangout tomorrow (9/20)

2016-09-19 Thread Aman Sinha
I'll start the hangout tomorrow at the usual time. I don't have a set agenda yet but if there are any topics folks wish to discuss, please respond on this thread such that others who might be interested can also join. Thanks.

Re: LIMIT push down to parquet row group

2016-09-19 Thread Aman Sinha
Adding to what Jinfeng said, the LIMIT handling relies on the downstream operator sending a 'kill incoming input stream' api which is called by the parent operator on its child once the parent (Limit) has received the required number of rows. Since the unit of processing in Drill is record

Re: Table value functions in Drill

2016-07-28 Thread Aman Sinha
Tushar, Drill does not currently support the table functions as described (it looks like your example is close to MS SQL Server syntax). Drill has support for VALUES list in the FROM clause and a table with options (see [1]) to interpret text data but neither of these match your requirement. [1]

Re: Tableau Web Data Connector

2016-07-25 Thread Aman Sinha
Steve, As far as I know, this has not been written (or maybe someone has written but not yet contributed). Agree that it would certainly be a useful functionality. -Aman On Mon, Jul 25, 2016 at 11:17 AM, Steve Warren wrote: > Has anyone written a Tableau Web Data Connector

[ANNOUNCE] Apache Drill 1.7.0 released

2016-06-28 Thread Aman Sinha
On behalf of the Apache Drill community, I am happy to announce the release of Apache Drill 1.7.0. The source and binary artifacts are available at [1] Review a complete list of fixes and enhancements at [2] This release of Drill fixes many issues and introduces a number of enhancements,

Re: DRILL-4199: Add Support for HBase 1.X - planning to merge

2016-06-21 Thread Aman Sinha
Scan(groupscan=[HBaseGroupScan > >> [HBaseScanSpec=HBaseScanSpec [tableName=offers_ref0, startRow=null, > >> stopRow=null, filter=null], columns=[`*`]]]) > >> 01-06Project(row_key0=[$0], v0=[$1], ITEM=[$2]) > >> 01-08

Re: DRILL-4199: Add Support for HBase 1.X - planning to merge

2016-06-20 Thread Aman Sinha
uce when I added > 5 nodes. Can anyone help me solve this issue? > > > > > 2016-06-17 4:39 GMT+08:00 Aditya <adityakish...@gmail.com>: > > > https://issues.apache.org/jira/browse/DRILL-4727 > > > > On Thu, Jun 16, 2016 at 11:39 AM, Aman Sinha <amansi...@

Re: DRILL-4199: Add Support for HBase 1.X - planning to merge

2016-06-16 Thread Aman Sinha
Qiang/Aditya can you create a JIRA for this and mark it for 1.7. thanks. On Thu, Jun 16, 2016 at 11:25 AM, Aditya wrote: > Thanks for reporting, I'm looking into it and will post a patch soon. > > On Wed, Jun 15, 2016 at 7:27 PM, qiang li wrote:

drill hangout link..

2016-06-14 Thread Aman Sinha
hangout starting now: https://hangouts.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc

Suggestions for hangout topics for 06/14

2016-06-13 Thread Aman Sinha
If you have any suggestions for Drill hangout topics for tomorrow, you can add it to this thread. We will also ask around at the beginning of the hangout for any topics. The goal is to try to cover whatever possible during the 1 hr. Couple of pending topics: 1. (leftover from last time)

Re: Hangout Frequency

2016-05-20 Thread Aman Sinha
Every other week sounds good to me. It is a substantial commitment to do one every week. Many useful discussions already happen on the dev and user mailing lists. On Fri, May 20, 2016 at 12:44 PM, Parth Chandra wrote: > Drill Users, Devs, > > Attendance at the hangouts

Re: query from hbase issue

2016-05-19 Thread Aman Sinha
Khurram, DRILL-4686 seems like a different issue...it is reporting an error whereas the original problem from qiang was an incorrect result. Can you use the same version (1.6) that he was using. Also, is the data set similar ? If you are unable to repro the exact same issue, perhaps qiang

Re: Re: join fail

2016-05-11 Thread Aman Sinha
why the broadcast join failed in the condition > that the size of the table on the right side of the join is smaller than the size > of cluster's total memory > > > > lizhenm...@163.com > > From: Aman Sinha > Date: 2016-05-10 23:35 > To: user > Subject: Re: join fail > It's di

Re: Partition reading problem (like operator) while using hive partition table in drill

2016-05-10 Thread Aman Sinha
The Drill test team was able to repro this and is now filed as: https://issues.apache.org/jira/browse/DRILL-4665 On Tue, May 10, 2016 at 8:16 AM, Aman Sinha <amansi...@apache.org> wrote: > This is supposed to work, especially since LIKE predicate is not even on > the partiti

Re: join fail

2016-05-10 Thread Aman Sinha
It's difficult to debug this type of issue over an email thread. However, 2 observations: 1. The following Scan which is the table that is broadcast shows a rowcount of 1.3M rows whereas your original email says the rowcount is 32M rows. Can you confirm what the correct row count is ?

Re: Partition reading problem (like operator) while using hive partition table in drill

2016-05-10 Thread Aman Sinha
This is supposed to work, especially since LIKE predicate is not even on the partitioning column (it should work either way). I did a quick test with file system tables and it works for LIKE conditions. Not sure yet about Hive tables. Could you pls file a JIRA and we'll follow up. Thanks.

Re: Performance querying a single column out of a parquet file

2016-04-11 Thread Aman Sinha
There is a JIRA related to one aspect of this: DRILL-1950 (filter pushdown into parquet scan). This is still work in progress I believe. Once that is implemented, the scan will produce the filtered rows only. Regarding column projections, currently in Drill, the columns referenced anywhere in

Re: Query Planning and Directory Pruning

2016-02-09 Thread Aman Sinha
At a glance, John's query does not have a WHERE clause..it is querying the subdirectory directly in the FROM clause..in this case Drill will only look at the files within that subdirectory. Directory pruning only comes into the picture when there is a WHERE condition on dir0, dir1 etc. On Tue,

Re: Efficient joins in Drill - avoiding the massive overhead of scan based joins

2016-01-16 Thread Aman Sinha
DRILL-3929 covers the related discussion regarding secondary indexing.

Re: issue with where clause

2015-12-22 Thread Aman Sinha
Nirav, we would need some more details but this looks like a bug ... could you pls create a JIRA along with the stack trace for this error? (it should be in the drillbit.log file). Alternatively, set the following in your sqlline session: alter session set `exec.errors.verbose` = true; and

Re: Column aliases lost when using Dates + GROUP BY in SQL

2015-12-21 Thread Aman Sinha
The aliases work for me in the following query similar to yours. However, I am using latest master branch and running directly through sqlline command, not through Tableau. Can you confirm what Drill version you are using and check if you can repro the behavior through sqlline ? If so, you

Re: Drill and Parquet - Best practices - part 1

2015-11-02 Thread Aman Sinha
> >- If I have multiple files containing a days worth of logging, in > >chronological order, will all the irrelevant files be ignored when > looking > >for a data or a date range? > >- AKA - Will the min-max headers in Parquet be used to prevent > >scanning of data outside the

Re: directory structure containing multiple file types

2015-10-19 Thread Aman Sinha
With regard to the last comment on directory based pruning, please watch DRILL-3759 (https://issues.apache.org/jira/browse/DRILL-3759). I don't have a timeline for it yet but hopefully in the next Drill release. Aman On Mon, Oct 19, 2015 at 3:50 AM, Dhruv Gohil

Re: Performance issue when setting dir0 & dir1 in where clause

2015-10-13 Thread Aman Sinha
This is related to partition pruning and is being addressed as part of DRILL-3759 (https://issues.apache.org/jira/browse/DRILL-3759). Unfortunately, this issue did not make into the 1.2 version but will likely be available with the next release. Could you please add your above use case in the

Re: Naming directories

2015-09-07 Thread Aman Sinha
possible with > > Hive loaded tables, that would allow us to use drill to query, and hive > to > > load. (Using complex transforms, longer running queries etc). > > > > I love drill :) > > > > > > > > On Mon, Sep 7, 2015 at 2:23 PM, Aman Sinha

Re: query plan ....

2015-08-24 Thread Aman Sinha
Indeed, it is not efficient. We are doing 16 invocations of CONVERT_FROMUTF8($1) and 16 invocations of CONVERT_FROMUTF8($2). Can you pls file a JIRA ? We should ideally be doing projection pushdown in conjunction with the filter pushdown in to the HBase scan and computing these functions only

Re: Drill dir0 issue

2015-08-23 Thread Aman Sinha
Sungwook, do you have the latest master build which has the fix for Hive partition pruning (DRILL-3121) ? On Sun, Aug 23, 2015 at 12:15 PM, Sungwook Yoon sy...@maprtech.com wrote: Will do, Thanks, Sungwook On Sun, Aug 23, 2015 at 2:14 PM, Jacques Nadeau jacq...@dremio.com wrote: It

Re: Help with optimizing a query

2015-08-10 Thread Aman Sinha
Parquet files of size 20GB seem too large... could you split it up into smaller chunks ? Drill's Scan parallelism for Parquet is at the granularity of files, so if you have several 256MB files you will get very good parallelism as opposed to fewer large files. The tradeoff is that planning

Re: Drill Hangout (2015-08-04) minutes

2015-08-04 Thread Aman Sinha
The hangout notes refer to dot drill file but I think that may be either misrepresentation or mis-statement during the hangout. For the INSERT discussion, the best source is the separate thread titled '[DISCUSS] Insert Into Table Support'. In fact, we are intending to keep the merged schema in

Re: Drill making wrong type decision on comparison in where clause

2015-07-30 Thread Aman Sinha
it in the near future. Aman On Thu, Jul 30, 2015 at 3:48 PM, Aman Sinha asi...@maprtech.com wrote: Hi John, you cannot use aliases in the WHERE condition. Drill is not unique in this restriction...since the WHERE condition is evaluated before the alias is done in the SELECT clause. Did

Re: Drill making wrong type decision on comparison in where clause

2015-07-30 Thread Aman Sinha
Hi John, you cannot use aliases in the WHERE condition. Drill is not unique in this restriction...since the WHERE condition is evaluated before the alias is done in the SELECT clause. Did you try WHERE t.app.hcc.event_name IN ('logout') ? Aman On Thu, Jul 30, 2015 at 3:42 PM, John Schneider
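The evaluation order mentioned above (WHERE first, SELECT aliases afterwards) can be sketched in Python. This is a toy model of logical SQL evaluation order, not Drill code; the sample rows, the alias `evt`, and the helpers `where` and `select` are hypothetical.

```python
# Hypothetical input rows; only the original column names exist here.
rows = [
    {"event_name": "logout", "user": "a"},
    {"event_name": "login", "user": "b"},
]

def where(rows, predicate):
    # Filtering sees only the input columns; no SELECT alias exists yet.
    return [r for r in rows if predicate(r)]

def select(rows, aliases):
    # Projection runs afterwards and is what introduces the alias names.
    return [{alias: r[col] for alias, col in aliases.items()} for r in rows]

filtered = where(rows, lambda r: r["event_name"] in ("logout",))
output = select(filtered, {"evt": "event_name"})
print(output)  # [{'evt': 'logout'}]
```

A predicate written against `evt` would fail in the `where` step because that name is only created by `select`, which mirrors why the filter has to reference the underlying column.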

Re: Counting large numbers of unique values

2015-04-07 Thread Aman Sinha
Drill already does most of this type of transformation. If you do an 'EXPLAIN PLAN FOR your count(distinct) query' you will see that it first does a grouping on the column and then applies the COUNT(column). The first level grouping can be done either based on sorting or hashing and this is
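The two-step shape described above (group on the column, then count) can be sketched in Python. This is only an illustration of the idea using hash-based grouping, not Drill's operators; the function name `count_distinct` and the sample values are assumptions.

```python
def count_distinct(values):
    # Phase 1: hash-based grouping on the column, one group per distinct value.
    groups = set()
    for v in values:
        if v is not None:  # SQL COUNT ignores NULLs
            groups.add(v)
    # Phase 2: count the values that survived the grouping.
    return len(groups)

print(count_distinct(["a", "b", "a", None, "c", "b"]))  # 3
```

A sort-based variant would sort the values first and count the points where the value changes; it gives the same answer with different memory and ordering trade-offs.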

Re: CSV header issue

2015-04-02 Thread Aman Sinha
Hi Mahesh, Please see https://issues.apache.org/jira/browse/DRILL-951 for the issue of CSV headers. It is a feature that will be addressed in an upcoming release (currently tagged for 1.0). Aman On Wed, Apr 1, 2015 at 10:52 PM, Mahesh Sankaran sankarmahes...@gmail.com wrote: Hi , I

Re: Directory pruning with Drill

2015-02-04 Thread Aman Sinha
, based on current observations. It may be good to make it a bit friendlier for a better user experience, will file an enhancement request. —Andries On Feb 3, 2015, at 5:35 PM, Aman Sinha asi...@maprtech.com wrote: Yes, that's the expected behavior for now. Directory pruning where