Re: Good DB theory references

2019-01-22 Thread rahul challapalli
The Red Book [1] deserves a mention. It also has a chapter (collection of
papers) dedicated to query optimization [2].

[1] http://www.redbook.io/
[2] http://www.redbook.io/ch7-queryoptimization.html

On Tue, Jan 22, 2019 at 4:16 AM Joel Pfaff  wrote:

>  Hello,
>
> Thanks for this initiative.
> A couple of years ago I found this page of links from Reynold Xin:
> https://github.com/rxin/db-readings
>
> And it is full of nice things.
>
> Regards, Joel
>
> On Tue, Jan 22, 2019 at 9:01 AM weijie tong 
> wrote:
>
> > Hi Paul:
> > Thanks for sharing. I would like to share another good recent paper
> > here: "Everything you always wanted to know about compiled and
> > vectorized queries but were afraid to ask":
> > http://www.vldb.org/pvldb/vol11/p2209-kersten.pdf
> >
> > It explains the two kinds of database execution architecture: vectorized
> > and compiled. It can also answer the oft-asked question of what the
> > difference is between Spark's whole-stage codegen and Drill's codegen.
> >
> >
> >
> > On Tue, Jan 22, 2019 at 10:51 AM Paul Rogers 
> > wrote:
> >
> > > Hi All,
> > >
> > > Wanted to pass along some good foundational material about databases.
> > > We find ourselves immersed day-to-day in the details of Drill's
> > > implementation. It is helpful to occasionally step back and look at the
> > > larger DB tradition in which Drill resides. This material is especially
> > > good for anyone who didn't study DB theory in college.
> > >
> > > "Architecture of a Database System":
> > > http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf - By
> > > Stonebraker et al. While focused on "classic" DB systems, the ideas
> > readily
> > > apply to "Big Data" distributed engines such as Drill. Walks through
> many
> > > of the basic architectural choices. You'll find yourself saying, "I
> see,
> > > Drill chose the shared-nothing, OS thread model but random heap
> > allocation
> > > rather than a buffer pool." That is, you can see Drill's design choices
> > in
> > > the context of the overall DB solution space.
> > >
> > > "Database Management Systems", 3e by Ramakrishnan & Gehrke. A
> > > textbook-length overview of DB theory. I used the second edition years
> > ago
> > > to design and build a complete embedded hybrid DB and object store. I
> > keep
> > > returning to the book any time I need a refresher on some topic or
> other.
> > >
> > > What other favorites do people have? Anyone know of any good references
> > > that explain the rule-based architecture of a planner such as Calcite?
> > > (R&G, 2e, mostly discuss the classic "dynamic programming" style of
> > > planner.)
> > >
> > > Thanks,
> > > - Paul
> > >
> > >
> >
>
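As a small illustration of what "rule-based" means in a planner like Calcite: a
rule declares an operator pattern, and the planner fires the rule wherever that
pattern appears so the rule can register an equivalent (ideally cheaper)
expression. The sketch below is illustrative only, not Drill or Calcite source;
it merges two adjacent filters, much like Calcite's own FilterMergeRule.
{code}
// A minimal sketch of a Calcite planner rule (illustrative, not Drill code).
import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.plan.RelOptRuleCall;
import org.apache.calcite.rel.logical.LogicalFilter;
import org.apache.calcite.rex.RexBuilder;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;

public class MergeFiltersRule extends RelOptRule {
  public static final MergeFiltersRule INSTANCE = new MergeFiltersRule();

  private MergeFiltersRule() {
    // Pattern to match: a Filter whose direct input is another Filter.
    super(operand(LogicalFilter.class, operand(LogicalFilter.class, any())));
  }

  @Override
  public void onMatch(RelOptRuleCall call) {
    LogicalFilter top = call.rel(0);
    LogicalFilter bottom = call.rel(1);
    RexBuilder rexBuilder = top.getCluster().getRexBuilder();
    // Combine the two predicates with AND and collapse to a single Filter.
    RexNode combined = rexBuilder.makeCall(SqlStdOperatorTable.AND,
        top.getCondition(), bottom.getCondition());
    call.transformTo(LogicalFilter.create(bottom.getInput(), combined));
  }
}
{code}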


Re: [ANNOUNCE] New Committer: Chunhui Shi

2018-09-29 Thread rahul challapalli
Congratulations Chunhui!

On Sat, Sep 29, 2018, 11:39 AM Kunal Khatua  wrote:

> Congratulations, Chunhui !!
> On 9/28/2018 7:31:44 PM, Chunhui Shi 
> wrote:
> Thank you Arina, the PMC, and every Driller friend! I deeply appreciate
> the opportunity to be part of this growing global community of awesome
> developers.
>
> Best regards,
> Chunhui
>
>
> --
> From:Arina Ielchiieva
> Send Time:2018 Sep 28 (Fri) 02:17
> To:dev ; user
> Subject:[ANNOUNCE] New Committer: Chunhui Shi
>
> The Project Management Committee (PMC) for Apache Drill has invited Chunhui
> Shi to become a committer, and we are pleased to announce that he has
> accepted.
>
> Chunhui Shi has been a contributor since 2016, making changes in various
> areas of Drill. He has shown profound knowledge of the Drill planning side
> during his work to support lateral join. He is also one of the contributors
> to the upcoming feature to support index-based planning and execution.
>
> Welcome Chunhui, and thank you for your contributions!
>
> - Arina
> (on behalf of Drill PMC)
>
>


Re: [ANNOUNCE] New Committer: Padma Penumarthy

2018-06-18 Thread rahul challapalli
Congratulations Padma!

On Mon, Jun 18, 2018 at 1:35 PM Khurram Faraaz  wrote:

> Congratulations Padma! Well deserved.
>
>
> Thanks,
>
> Khurram
>
> 
> From: Paul Rogers 
> Sent: Friday, June 15, 2018 7:50:05 PM
> To: dev@drill.apache.org
> Subject: Re: [ANNOUNCE] New Committer: Padma Penumarthy
>
> Congratulations! Well deserved, if just from the number of times you've
> reviewed my code.
>
> Thanks,
> - Paul
>
>
>
> On Friday, June 15, 2018, 9:36:44 AM PDT, Aman Sinha <
> amansi...@apache.org> wrote:
>
>  The Project Management Committee (PMC) for Apache Drill has invited Padma
> Penumarthy to become a committer, and we are pleased to announce that she
> has
> accepted.
>
> Padma has been contributing to Drill for about 1 1/2 years.  She has made
> improvements to work-unit assignment in the parallelizer, to the performance
> of the filter operator for pattern matching, and (more recently) to batch
> sizing for several operators: Flatten, MergeJoin, HashJoin, UnionAll.
>
> Welcome Padma, and thank you for your contributions.  Keep up the good work
> !
>
> -Aman
> (on behalf of Drill PMC)
>
>


Re: [ANNOUNCE] New Committer: Sorabh Hamirwasia

2018-04-30 Thread rahul challapalli
Congratulations Sorabh!

On Mon, Apr 30, 2018 at 11:07 AM, Khurram Faraaz  wrote:

> Congratulations Sorabh!
>
> 
> From: Andries Engelbrecht 
> Sent: Monday, April 30, 2018 11:04:11 AM
> To: dev@drill.apache.org
> Subject: Re: [ANNOUNCE] New Committer: Sorabh Hamirwasia
>
> Congrats Sorabh!!!
>
> --Andries
>
> On 4/30/18, 8:35 AM, "Aman Sinha"  wrote:
>
> The Project Management Committee (PMC) for Apache Drill has invited
> Sorabh
> Hamirwasia  to become a committer, and we are pleased to announce that
> he
> has accepted.
>
> Over the last 1 1/2 years Sorabh's contributions have been in a few
> different areas. He took the lead in designing and implementing network
> encryption support for Drill. He has contributed to the web server and UI
> side. More recently, he has been involved in the design and implementation
> of the lateral join operator.
>
> Welcome Sorabh, and thank you for your contributions.  Keep up the good
> work !
>
> -Aman
> (on behalf of Drill PMC)
>
>
>


Re: [ANNOUNCE] New Committer: Kunal Khatua

2018-02-27 Thread rahul challapalli
Congratulations Kunal!

On Tue, Feb 27, 2018 at 10:52 AM, Pritesh Maker  wrote:

> Congratulations, Kunal!!
>
> -Original Message-
> From: Aman Sinha 
> Sent: February 27, 2018 8:43 AM
> To: dev@drill.apache.org
> Subject: [ANNOUNCE] New Committer: Kunal Khatua
>
> The Project Management Committee (PMC) for Apache Drill has invited Kunal
> Khatua  to become a committer, and we are pleased to announce that he has
> accepted.
>
> Over the last couple of years, Kunal has made substantial contributions to
> the process of creating and interpreting query profiles, among other code
> contributions. He has led the efforts for Drill performance evaluation and
> benchmarking.  He is a prolific writer on the user mailing list, providing
> detailed responses.
>
> Welcome Kunal, and thank you for your contributions.  Keep up the good
> work !
>
> - Aman
> (on behalf of the Apache Drill PMC)
>


Re: [ANNOUNCE] New Committer: Boaz Ben-Zvi

2017-12-13 Thread rahul challapalli
Congratulations Boaz!

On Wed, Dec 13, 2017 at 12:38 PM, Arina Yelchiyeva <
arina.yelchiy...@gmail.com> wrote:

> Congratulations!
>
> Kind regards
> Arina
>
> > On Dec 13, 2017, at 10:20 PM, Jinfeng Ni  wrote:
> >
> > Congratulations and welcome, Boaz!
> >
> >
> > Jinfeng
> >
> >
> >> On Wed, Dec 13, 2017 at 11:17 AM, Robert Hou  wrote:
> >>
> >> Congratulations, Boaz!
> >>
> >>
> >> --Robert
> >>
> >> 
> >> From: Paul Rogers 
> >> Sent: Wednesday, December 13, 2017 11:02 AM
> >> To: dev@drill.apache.org
> >> Subject: Re: [ANNOUNCE] New Committer: Boaz Ben-Zvi
> >>
> >> Congrats! Well deserved.
> >>
> >> - Paul
> >>
> >>> On Dec 13, 2017, at 11:00 AM, Timothy Farkas  wrote:
> >>>
> >>> Congrats!
> >>>
> >>> 
> >>> From: Kunal Khatua 
> >>> Sent: Wednesday, December 13, 2017 10:47:14 AM
> >>> To: dev@drill.apache.org
> >>> Subject: RE: [ANNOUNCE] New Committer: Boaz Ben-Zvi
> >>>
> >>> Congratulations, Boaz!!
> >>>
> >>> -Original Message-
> >>> From: Abhishek Girish [mailto:agir...@apache.org]
> >>> Sent: Wednesday, December 13, 2017 10:25 AM
> >>> To: dev@drill.apache.org
> >>> Subject: Re: [ANNOUNCE] New Committer: Boaz Ben-Zvi
> >>>
> >>> Congratulations Boaz!
>  On Wed, Dec 13, 2017 at 10:23 AM Aman Sinha 
> wrote:
> 
>  The Project Management Committee (PMC) for Apache Drill has invited
>  Boaz Ben-Zvi  to become a committer, and we are pleased to announce
>  that he has accepted.
> 
>  Boaz has been an active contributor to Drill for more than a year.
>  He designed and implemented the Hash Aggregate spilling and is leading
>  the efforts for Hash Join spilling.
> 
>  Welcome Boaz, and thank you for your contributions.  Keep up the good
>  work !
> 
>  - Aman
>  (on behalf of the Apache Drill PMC)
> 
> >>
> >>
>


Re: [ANNOUNCE] New Committer: Vitalii Diravka

2017-12-10 Thread rahul challapalli
Congratulations Vitalii!

On Sun, Dec 10, 2017 at 3:05 PM, Kunal Khatua  wrote:

> Congratulations!!
>
> -Original Message-
> From: Aman Sinha [mailto:amansi...@apache.org]
> Sent: Sunday, December 10, 2017 11:06 AM
> To: dev@drill.apache.org
> Subject: [ANNOUNCE] New Committer: Vitalii Diravka
>
> The Project Management Committee (PMC) for Apache Drill has invited
> Vitalii Diravka  to become a committer, and we are pleased to announce that
> he has accepted.
>
> Vitalii has been an active contributor to Drill over the last 1 1/2 years.
> His contributions have spanned areas such as CASTing issues with
> Date/Timestamp, Parquet metadata, and SQL enhancements, among others.
>
> Welcome Vitalii, and thank you for your contributions.  Keep up the good
> work !
>
> - Aman
> (on behalf of the Apache Drill PMC)
>


Re: Convert CSV to nested JSON

2017-09-18 Thread rahul challapalli
Can you give an example? Converting CSV into nested JSON does not make
sense to me.

On Mon, Sep 18, 2017 at 3:54 PM, Ted Dunning  wrote:

> What is the ultimate purpose here?
>
>
>
> On Mon, Sep 18, 2017 at 3:21 PM, Kunal Khatua  wrote:
>
> > I'm curious about whether there are any implementations of converting CSV
> > to a nested JSON format  "automagically".
> >
> > Within Drill, I know that the CTAS route will basically convert each row
> > into a JSON document with depth=1, which is pretty much an obese CSV data
> > format.
> >
> > Is it worth having something like this, or is it too hard a problem, such
> > that it's best to let users explicitly define and write the documents?
> >
> > ~ Kunal
> >
> >
>
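To make the question concrete, one common convention for "automagic" nesting is
to treat dotted column headers as paths into nested objects. The Jackson-based
sketch below illustrates only that convention; it is not an existing Drill
feature, and the headers and row values are invented for the example.
{code}
// Illustrative only: build nested JSON from dotted CSV headers with Jackson.
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class CsvToNestedJson {
  public static void main(String[] args) throws Exception {
    String[] headers = {"id", "user.name", "user.address.city"}; // assumed input
    String[] row = {"1", "alice", "Oslo"};

    ObjectMapper mapper = new ObjectMapper();
    ObjectNode root = mapper.createObjectNode();
    for (int i = 0; i < headers.length; i++) {
      String[] path = headers[i].split("\\.");
      ObjectNode node = root;
      for (int p = 0; p < path.length - 1; p++) {
        // Descend the dotted path, creating intermediate objects as needed.
        node = node.has(path[p]) ? (ObjectNode) node.get(path[p])
                                 : node.putObject(path[p]);
      }
      node.put(path[path.length - 1], row[i]);
    }
    // Prints: {"id":"1","user":{"name":"alice","address":{"city":"Oslo"}}}
    System.out.println(mapper.writeValueAsString(root));
  }
}
{code}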


Re: [ANNOUNCE] New PMC member: Arina Ielchiieva

2017-08-02 Thread rahul challapalli
Congratulations Arina!

On Wed, Aug 2, 2017 at 11:27 AM, Kunal Khatua  wrote:

> Congratulations, Arina!!
>
>
> Thank you for your contributions to Drill !
>
>
> ~ Kunal
>
> 
> From: Aman Sinha 
> Sent: Wednesday, August 2, 2017 11:23:23 AM
> To: dev@drill.apache.org
> Subject: [ANNOUNCE] New PMC member: Arina Ielchiieva
>
> I am pleased to announce that Drill PMC invited Arina Ielchiieva to the PMC
> and she has accepted the invitation.
>
> Congratulations Arina and thanks for your contributions !
>
> -Aman
> (on behalf of Drill PMC)
>


[jira] [Created] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector

2017-07-12 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5670:


 Summary: Varchar vector throws an assertion error when allocating 
a new vector
 Key: DRILL-5670
 URL: https://issues.apache.org/jira/browse/DRILL-5670
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.11.0
Reporter: Rahul Challapalli


I am running this test on a private branch of [Paul's 
repository|https://github.com/paul-rogers/drill]. Below is the commit info:
{code}
git.commit.id.abbrev=d86e16c
git.commit.user.email=prog...@maprtech.com
git.commit.message.full=DRILL-5601\: Rollup of external sort fixes an 
improvements\n\n- DRILL-5513\: Managed External Sort \: OOM error during the 
merge phase\n- DRILL-5519\: Sort fails to spill and results in an OOM\n- 
DRILL-5522\: OOM during the merge and spill process of the managed external 
sort\n- DRILL-5594\: Excessive buffer reallocations during merge phase of 
external sort\n- DRILL-5597\: Incorrect "bits" vector allocation in nullable 
vectors allocateNew()\n- DRILL-5602\: Repeated List Vector fails to initialize 
the offset vector\n\nAll of the bugs have to do with handling low-memory 
conditions, and with\ncorrectly estimating the sizes of vectors, even when 
those vectors come\nfrom the spill file or from an exchange. Hence, the changes 
for all of\nthe above issues are interrelated.\n
git.commit.id=d86e16c551e7d3553f2cde748a739b1c5a7a7659
git.commit.message.short=DRILL-5601\: Rollup of external sort fixes an 
improvements
git.commit.user.name=Paul Rogers
git.build.user.name=Rahul Challapalli
git.commit.id.describe=0.9.0-1078-gd86e16c
git.build.user.email=challapallira...@gmail.com
git.branch=d86e16c551e7d3553f2cde748a739b1c5a7a7659
git.commit.time=05.07.2017 @ 20\:34\:39 PDT
git.build.time=12.07.2017 @ 14\:27\:03 PDT
git.remote.origin.url=g...@github.com\:paul-rogers/drill.git
{code}

The below query fails with an AssertionError
{code}
0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET 
`exec.sort.disable_managed` = false;
+---+-+
|  ok   |   summary   |
+---+-+
| true  | exec.sort.disable_managed updated.  |
+---+-+
1 row selected (1.044 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
`planner.memory.max_query_memory_per_node` = 482344960;
+---++
|  ok   |  summary   |
+---++
| true  | planner.memory.max_query_memory_per_node updated.  |
+---++
1 row selected (0.372 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
`planner.width.max_per_node` = 1;
+---+--+
|  ok   |   summary|
+---+--+
| true  | planner.width.max_per_node updated.  |
+---+--+
1 row selected (0.292 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
`planner.width.max_per_query` = 1;
+---+---+
|  ok   |summary|
+---+---+
| true  | planner.width.max_per_query updated.  |
+---+---+
1 row selected (0.25 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from (select * from 
dfs.`/drill/testdata/resource-manager/3500cols.tbl` order by 
columns[450],columns[330],columns[230],columns[220],columns[110],columns[90],columns[80],columns[70],columns[40],columns[10],columns[20],columns[30],columns[40],columns[50],
 
columns[454],columns[413],columns[940],columns[834],columns[73],columns[140],columns[104],columns[],columns[30],columns[2420],columns[1520],
 columns[1410], 
columns[1110],columns[1290],columns[2380],columns[705],columns[45],columns[1054],columns[2430],columns[420],columns[404],columns[3350],
 
columns[],columns[153],columns[356],columns[84],columns[745],columns[1450],columns[103],columns[2065],columns[343],columns[3420],columns[530],
 columns[3210] ) d where d.col433 = 'sjka skjf';
Error: RESOURCE ERROR: External Sort encountered an error while spilling to disk

Fragment 2:0

[Error Id: 26b55576-1a5c-4756-96d6-fbec25eecf03 on qa-node190.qa.lab:31010]

  (java.lang.AssertionError) null
org.apache.drill.exec.vector.VarCharVector.allocateNew():400
org.apache.drill.exec.vector.RepeatedVarCharVector.allocateNew():272

org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount():37
org.apache.drill.exec.vector.AllocationHelper.allocate():44
org.apache.drill.exec.record.Sm

[jira] [Created] (DRILL-5633) IOOBE in HashTable setup during the execution of HashJoin

2017-06-29 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5633:


 Summary: IOOBE in HashTable setup during the execution of HashJoin
 Key: DRILL-5633
 URL: https://issues.apache.org/jira/browse/DRILL-5633
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.11.0
Reporter: Rahul Challapalli


Env
{code}
Commit Id : 6446e56f292a5905d646462c618c056839ad5198
No of Nodes in the cluster : 1
File System : MapRFS
DRILL_MAX_DIRECT_MEMORY="32G"
DRILL_MAX_HEAP="4G"
Assertions Enabled : true
{code}

The below query fails with an IOOBE
{code}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.memory.max_query_memory_per_node` = 7607483648;
select * from (
  select columns[0] col1 from dfs.`/drill/testdata/hash-agg/seq/seqaa.tbl`
  union all
  select distinct columns[0] col1 from 
dfs.`/drill/testdata/hash-agg/seq/seqaa.tbl`
  order by col1 desc
) d1
inner join (
  select distinct columns[0] col2 from dfs.`/drill/testdata/hash-agg/uuid.tbl`
  union all
  select max(dir0) col2 from 
dfs.`/drill/testdata/resource-manager/small_large_parquet` group by col1
) d2
on d1.col1 = d2.col2;
{code}

Exception from the logs
{code}
2017-06-29 13:23:00,541 [BitServer-4] DEBUG o.a.drill.exec.work.foreman.Foreman 
- 26aaa18e-e67f-4fad-f65f-c806cc2faa44: State change requested RUNNING --> 
FAILED
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
IndexOutOfBoundsException: Index: 0, Size: 0

Fragment 1:0
  
[Error Id: 5027dec4-b762-4828-b0e1-b280ba8f7831 on qa-node190.qa.lab:31010]
at 
org.apache.drill.exec.work.foreman.QueryManager$1.statusUpdate(QueryManager.java:521)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.rpc.control.WorkEventBus.statusUpdate(WorkEventBus.java:71)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.batch.ControlMessageHandler.handle(ControlMessageHandler.java:94)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.batch.ControlMessageHandler.handle(ControlMessageHandler.java:55)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at org.apache.drill.exec.rpc.BasicServer.handle(BasicServer.java:157) 
[drill-rpc-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at org.apache.drill.exec.rpc.BasicServer.handle(BasicServer.java:53) 
[drill-rpc-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274) 
[drill-rpc-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:244) 
[drill-rpc-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
 [netty-codec-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.handler.timeout.ReadTimeoutHandler.channelRead(ReadTimeoutHandler.java:150)
 [netty-handler-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
 [netty-codec-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
 [netty-codec-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerCo

[jira] [Created] (DRILL-5604) Possible performance degradation with hash aggregate when number of distinct keys increase

2017-06-23 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5604:


 Summary: Possible performance degradation with hash aggregate when 
number of distinct keys increase
 Key: DRILL-5604
 URL: https://issues.apache.org/jira/browse/DRILL-5604
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.11.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=90f43bf

I tried to track the runtime as we gradually increase the number of distinct 
keys without increasing the total number of records. Below is one such test on 
top of the TPC-DS SF1000 dataset.

{code}
0: jdbc:drill:zk=10.10.100.190:5181> select count(distinct ss_list_price) from 
store_sales;
+-+
| EXPR$0  |
+-+
| 19736   |
+-+
1 row selected (163.345 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> select count(distinct ss_net_profit) from 
store_sales;
+--+
|  EXPR$0  |
+--+
| 1525675  |
+--+
1 row selected (2094.962 seconds)
{code}

In both of the above queries, the hash agg code processed 2879987999 records, 
so the time difference is due to overheads like hash table resizing. The second 
query took ~30 minutes longer than the first, raising doubts about whether 
there is an issue somewhere.

The dataset is too large to attach to a JIRA, and so are the logs.
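One way to test the resizing hypothesis (a suggested experiment, not a
confirmed fix) is to pre-size the hash table with Drill's existing
`exec.min_hash_table_size` option and re-run the slower query; the value below
is illustrative.
{code}
-- Illustrative experiment: start the hash table large enough that few (or no)
-- resizes occur, then compare against the timings above.
alter session set `exec.min_hash_table_size` = 1048576;
select count(distinct ss_net_profit) from store_sales;
{code}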



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5600) Using the convert_to function on top of a map gives random errors

2017-06-20 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5600:


 Summary: Using the convert_to function on top of a map gives random 
errors
 Key: DRILL-5600
 URL: https://issues.apache.org/jira/browse/DRILL-5600
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.11.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=3858bee
git.commit.user.email=boazben-zvi@BBenZvi-E754-MBP13.local

Error 1 :
{code}
select convert_to(s1.rms.rptd, 'utf8') from (select uid_rnd1, uid_rnd2, d.type 
type, d.uid uid, flatten(d.map.rm) rms from 
dfs.`/drill/testdata/nested_large_rand` d) s1;
Error: SYSTEM ERROR: IllegalStateException: next() returned NONE without first 
returning OK_NEW_SCHEMA [#989, UnorderedReceiverBatch]

Fragment 0:0

[Error Id: e12cb7c4-8ddf-45ba-8f51-34834a3d04c2 on qa-node190.qa.lab:31010] 
(state=,code=0)
{code}

Error 2 :
{code}
select convert_to(s1.rms.rptd, 'utf8') from (select uid_rnd1, uid_rnd2, d.type 
type, d.uid uid, flatten(d.map.rm) rms from 
dfs.`/drill/testdata/nested_large_rand` d) s1;
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize 
incoming schema.  Errors:
 
Error in expression at index 0.  Error: Missing function implementation: 
[convert_toutf8(MAP-REPEATED)].  Full expression: null..

Fragment 1:0

[Error Id: 8e7ea6f4-b6f0-495a-9403-0ceda78a9572 on qa-node190.qa.lab:31010] 
(state=,code=0)
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Performance issue with 2 phase hash-agg design

2017-06-20 Thread rahul challapalli
Thanks for sharing the link, Aman.

On Tue, Jun 20, 2017 at 3:26 PM, Aman Sinha  wrote:

> See [1] which talks about this behavior for unique keys and suggests
> manually setting the single phase agg.
> We would need NDV statistics on the group-by keys to have the optimizer
> pick the more efficient scheme.
>
> [1] https://drill.apache.org/docs/guidelines-for-optimizing-aggregation/
>
> On Tue, Jun 20, 2017 at 2:30 PM, Chun Chang  wrote:
>
> > I also noticed that if the keys are mostly unique, the first phase
> > aggregation effort is mostly wasted. This can and should be improved.
> >
> >
> > One idea is to detect unique keys while processing. When the percentage
> > of unique keys exceeds a certain threshold after processing a certain
> > percentage of data, skip the rest and send directly to the downstream
> > second phase aggregation.
> >
> > 
> > From: rahul challapalli 
> > Sent: Tuesday, June 20, 2017 1:36:31 PM
> > To: dev
> > Subject: Performance issue with 2 phase hash-agg design
> >
> > During the first phase, the hash agg operator is not protected from skew
> > in the data (e.g., the data contains 2 files where the number of records
> > in one file is very large compared to the other). Assuming there are only
> > 2 fragments, the hash-agg operator in one fragment handles more records,
> > and it aggregates until the memory available to it gets exhausted, at
> > which point it sends the record batches downstream to the
> > hash-partitioner.
> >
> > Because the hash-partitioner normalizes the skew in the data, the work is
> > evenly divided and the 2 minor fragments running the second phase
> > hash-aggregate take a similar amount of processing time.
> >
> > So what is the problem here? During the first phase one minor fragment
> > takes a long time, which affects the runtime of the query. Instead, if
> > the first phase did not do any aggregation or only used low memory
> > (thereby limiting the aggregations performed), then the query would have
> > completed faster. However, the advantage of doing 2-phase aggregation is
> > reduced traffic on the network. But if the keys used in the group by are
> > mostly unique, then we lose this advantage as well.
> >
> > I was playing with the new spillable hash-agg code and observed that
> > increasing memory did not improve the runtime.  This behavior can be
> > explained by the above reasoning.
> >
> > Aggregating on mostly unique keys may not be a common use case, but any
> > thoughts in general about this?
> >
>
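For reference, the manual workaround Aman points to comes down to one session
option; `planner.enable_multiphase_agg` is an existing Drill option, though
whether it helps depends on how unique the keys actually are.
{code}
-- Skip the first (local) aggregation phase for this session, so mostly-unique
-- keys are not aggregated twice; re-enable it afterwards for normal workloads.
alter session set `planner.enable_multiphase_agg` = false;
{code}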


Performance issue with 2 phase hash-agg design

2017-06-20 Thread rahul challapalli
During the first phase, the hash agg operator is not protected from skew in
the data (e.g., the data contains 2 files where the number of records in one
file is very large compared to the other). Assuming there are only 2
fragments, the hash-agg operator in one fragment handles more records, and it
aggregates until the memory available to it gets exhausted, at which point it
sends the record batches downstream to the hash-partitioner.

Because the hash-partitioner normalizes the skew in the data, the work is
evenly divided and the 2 minor fragments running the second phase
hash-aggregate take a similar amount of processing time.

So what is the problem here? During the first phase one minor fragment takes
a long time, which affects the runtime of the query. Instead, if the first
phase did not do any aggregation or only used low memory (thereby limiting
the aggregations performed), then the query would have completed faster.
However, the advantage of doing 2-phase aggregation is reduced traffic on the
network. But if the keys used in the group by are mostly unique, then we lose
this advantage as well.

I was playing with the new spillable hash-agg code and observed that
increasing memory did not improve the runtime.  This behavior can be
explained by the above reasoning.

Aggregating on mostly unique keys may not be a common use case, but any
thoughts in general about this?
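A minimal sketch of the adaptive idea discussed in this thread: sample the
incoming group keys, and switch the first phase to pass-through once they look
mostly unique. This is illustrative only; the class, sample size, and threshold
are invented for the example and are not Drill's implementation.
{code}
import java.util.HashSet;
import java.util.Set;

class AdaptiveFirstPhaseAgg {
  private static final long SAMPLE_SIZE = 100_000;     // records to observe (assumed)
  private static final double UNIQUENESS_CUTOFF = 0.9; // threshold (assumed)

  private final Set<Object> seen = new HashSet<>();
  private long observed;
  private boolean passThrough;

  /** Returns true once local aggregation should be skipped for this fragment. */
  boolean shouldPassThrough(Object groupKey) {
    if (passThrough) {
      return true;
    }
    observed++;
    seen.add(groupKey);
    if (observed >= SAMPLE_SIZE) {
      // Mostly-unique keys: stop aggregating and send batches straight to the
      // hash partitioner; the second phase does the real aggregation.
      passThrough = (double) seen.size() / observed > UNIQUENESS_CUTOFF;
      seen.clear(); // release the sampling memory either way
    }
    return passThrough;
  }
}
{code}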


Re: [ANNOUNCE] New Committer: Laurent Goujon

2017-06-09 Thread rahul challapalli
Congratulations Laurent!

On Fri, Jun 9, 2017 at 9:49 AM, Paul Rogers  wrote:

> Congratulations and welcome!
>
> - Paul
>
> > On Jun 9, 2017, at 3:33 AM, Khurram Faraaz  wrote:
> >
> > Congratulations Laurent.
> >
> > 
> > From: Parth Chandra 
> > Sent: Friday, June 9, 2017 3:14:00 AM
> > To: dev@drill.apache.org
> > Subject: [ANNOUNCE] New Committer: Laurent Goujon
> >
> > The Project Management Committee (PMC) for Apache Drill has invited
> Laurent
> > Goujon to become a committer, and we are pleased to announce that he has
> > accepted.
> >
> > Laurent has a long list of contributions, many in the client-side
> > interfaces and metadata queries.
> >
> > Welcome Laurent, and thank you for your contributions.  Keep up the good
> > work !
> >
> > - Parth
> > (on behalf of the Apache Drill PMC)
>
>


Re: A possible regression 1.9 / 1.10 when querying Parquet with complex types /nested structures (Map)

2017-06-03 Thread rahul challapalli
A JIRA is always the preferable approach. Thank you.

On Sat, Jun 3, 2017 at 1:38 PM, Stefán Baxter 
wrote:

> Hi Rahul,
>
> Sure, but can I perhaps get the files to you directly?
>
> Regards,
>  -Stefán
>
> On Sat, Jun 3, 2017 at 8:13 PM, rahul challapalli <
> challapallira...@gmail.com> wrote:
>
> > Can you please raise a jira and attach the required files? I can try to
> > reproduce it.
> >
> > Rahul
> >
> > On Jun 3, 2017 6:19 AM, "Stefán Baxter" 
> wrote:
> >
> > > Hi,
> > >
> > > I have a sample data set (a few million records) that is saved to
> > > Parquet in 2 ways: a simple flat structure with primitive types to
> > > store dimensions and metrics (String, Double), and one using nested
> > > maps (String,String and String,Double) respectively.
> > >
> > > Querying the data set with the simple types only:
> > >
> > > select roundTimeStamp(s.occurred_at,'PT1H') as `at`,
> sum(metrics_price)
> > as
> > > price, sum(metrics_kwh) as kwh from
> > > dfs.asa.`/processed/etactica-dev-p1/entitysamples/metrics/D2017*` as s
> > > group by roundTimeStamp(s.occurred_at,'PT1H')
> > >
> > >
> > > takes: *28.442 *sec. (dev. laptop x 1)
> > >
> > >
> > > Same query against the nested structure:
> > >
> > > select roundTimeStamp(s.occurred_at,'PT1H') as `at`,
> > sum(s.metrics.price)
> > > as price, sum(s.metricss.kwh) as kwh from
> > > dfs.asa.`/processed/etactica-dev-p1/entitysamples/metrics/D2017*` as s
> > > group by roundTimeStamp(s.occurred_at,'PT1H')
> > >
> > > takes: *719.810* sec.
> > >
> > > Even counting the number of records takes very, very long if there is
> > > a nested structure involved (select count(*) from ...).
> > > It does not behave like this on our production servers (1.8), but I
> > > have not run this particular test on them (their performance has never
> > > been an issue).
> > > I have these sample files available if anyone wishes to reproduce this
> > > consistently.
> > > Regards,
> > >  -Stefán
> > >
> >
>


Re: A possible regression 1.9 / 1.10 when querying Parquet with complex types /nested structures (Map)

2017-06-03 Thread rahul challapalli
Can you please raise a jira and attach the required files? I can try to
reproduce it.

Rahul

On Jun 3, 2017 6:19 AM, "Stefán Baxter"  wrote:

> Hi,
>
> I have a sample data set (a few million records) that is saved to Parquet
> in 2 ways: a simple flat structure with primitive types to store dimensions
> and metrics (String, Double), and one using nested maps (String,String and
> String,Double) respectively.
>
> Querying the data set with the simple types only:
>
> select roundTimeStamp(s.occurred_at,'PT1H') as `at`, sum(metrics_price) as
> price, sum(metrics_kwh) as kwh from
> dfs.asa.`/processed/etactica-dev-p1/entitysamples/metrics/D2017*` as s
> group by roundTimeStamp(s.occurred_at,'PT1H')
>
>
> takes: *28.442 *sec. (dev. laptop x 1)
>
>
> Same query against the nested structure:
>
> select roundTimeStamp(s.occurred_at,'PT1H') as `at`, sum(s.metrics.price)
> as price, sum(s.metricss.kwh) as kwh from
> dfs.asa.`/processed/etactica-dev-p1/entitysamples/metrics/D2017*` as s
> group by roundTimeStamp(s.occurred_at,'PT1H')
>
> takes: *719.810* sec.
>
> Even counting the number of records takes very, very long if there is a
> nested structure involved (select count(*) from ...).
> It does not behave like this on our production servers (1.8), but I have
> not run this particular test on them (their performance has never been an
> issue).
> I have these sample files available if anyone wishes to reproduce this
> consistently.
> Regards,
>  -Stefán
>
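One possible way to narrow this down (a suggestion only; the paths are taken
from the queries above) is to compare a bare count against a count that forces
the nested map to be read, and to compare the plans of the fast and slow
variants:
{code}
-- Baseline: count without touching the nested map.
select count(1) from dfs.asa.`/processed/etactica-dev-p1/entitysamples/metrics/D2017*`;

-- Force materialization of one nested field.
select count(s.metrics.price)
from dfs.asa.`/processed/etactica-dev-p1/entitysamples/metrics/D2017*` as s;

-- Compare the physical plans of the fast and slow variants.
explain plan for select sum(s.metrics.price)
from dfs.asa.`/processed/etactica-dev-p1/entitysamples/metrics/D2017*` as s;
{code}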


Re: Upgrading Calcite's version

2017-06-02 Thread rahul challapalli
Yes, Drill has its own fork of Calcite. You have 2 options here:

1. Hand-pick the specific changes from Calcite into Drill
2. Upgrade Drill to use the latest Calcite version

I believe there is already an ongoing effort to upgrade Drill to use the
latest version of Calcite. I couldn't find the relevant JIRA though.

- Rahul

On Fri, Jun 2, 2017 at 8:51 AM, Muhammad Gelbana 
wrote:

> Was the currently used version of Calcite (based on v1.4?) modified in
> any way before it was used in building Drill?
>
> I'm considering creating a new build of Drill with the latest version of
> Calcite and I need to understand the amount of effort needed.
>
> The reason I want to do that is that I need a feature that exists in a more
> recent version of Calcite: pushing down aggregates without using
> subqueries.
>
> *-*
> *Muhammad Gelbana*
> http://www.linkedin.com/in/mgelbana
>
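For a rough sense of where the swap happens, assuming the fork's version is
managed as a Maven property in Drill's root pom.xml (the property name and
version strings below are illustrative, not the actual values):
{code}
<!-- Illustrative only: point the Calcite coordinate at a different release. -->
<properties>
  <calcite.version>1.4.0-drill-rXX</calcite.version>  <!-- forked version (example) -->
</properties>

<dependency>
  <groupId>org.apache.calcite</groupId>
  <artifactId>calcite-core</artifactId>
  <version>${calcite.version}</version>
</dependency>
{code}
The hard part is not the coordinate change but reconciling the fork-only
patches with upstream, which is why hand-picking changes (option 1) can be the
smaller effort.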


Re: Apache Drill takes 5-6 secs in fetching 1000 records from PostgreSQL table

2017-05-30 Thread rahul challapalli
5-6 seconds is a lot of time for the query and dataset size you mentioned.
Did you check the profile to see where the time is being spent?

On Tue, May 30, 2017 at 2:53 AM,  wrote:

> Hi,
>
> I am creating an UNLOGGED table in PostgreSQL and reading it using Apache
> Drill. The table contains just one column with 1000 UUID entries.
> It is taking 5-6 secs for me to read those records.
>
> I am fetching data using below query,
>
> Select uuidColumn from pgPlugin.public.uuidTable
>
>
> Is there anything that I am missing, or is any Drill-level tweaking
> required so that queries can be executed in milliseconds?
>
> Thanks in advance.
>
> Regards,
> Jasbir singh
>
>
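A quick first check (illustrative; the table name comes from the question) is
whether the scan is pushed down to PostgreSQL at all, along with the
per-operator timings in the profile (web UI at http://<drillbit-host>:8047/profiles):
{code}
-- If pushdown works, the plan should show a JDBC sub-scan containing generated
-- PostgreSQL SQL rather than a generic Drill scan followed by Drill operators.
explain plan for select uuidColumn from pgPlugin.public.uuidTable;
{code}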


[jira] [Created] (DRILL-5534) convert_from on a json map with null value produces an NPE

2017-05-23 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5534:


 Summary: convert_from on a json map with null value produces an NPE
 Key: DRILL-5534
 URL: https://issues.apache.org/jira/browse/DRILL-5534
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=d11aba2

The below query fails with an NPE. Surprisingly, there is no information about 
the error in the logs.
{code}
0: jdbc:drill:zk=10.10.100.190:5181> select convert_from('{kpi : null}' 
,'json') as kpi from cp.`tpch/lineitem.parquet` limit 1;
Error: Unexpected RuntimeException: java.lang.NullPointerException 
(state=,code=0)
{code}

Adding one more column to the above query gets rid of the NPE
{code}
select '' as rk, convert_from('{kpi : null}' ,'json') as kpi from 
cp.`tpch/lineitem.parquet` limit 1;
+-+--+
| rk  | kpi  |
+-+--+
| | {}   |
+-+--+
1 row selected (1.013 seconds)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DRILL-5528) Sorting 19GB data with 14GB memory in a single fragment takes ~150 minutes

2017-05-19 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5528:


 Summary: Sorting 19GB data with 14GB memory in a single fragment 
takes ~150 minutes
 Key: DRILL-5528
 URL: https://issues.apache.org/jira/browse/DRILL-5528
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.10.0
Reporter: Rahul Challapalli
Assignee: Paul Rogers


Configuration :
{code}
git.commit.id.abbrev=1e0a14c
DRILL_MAX_DIRECT_MEMORY="32G"
DRILL_MAX_HEAP="4G"
{code}

Based on the runtime of the below query, I suspect there is a performance 
bottleneck somewhere
{code}
[root@qa-node190 external-sort]# /opt/drill/bin/sqlline -u 
jdbc:drill:zk=10.10.100.190:5181
apache drill 1.11.0-SNAPSHOT
"start your sql engine"
0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET 
`exec.sort.disable_managed` = false;
+---+-+
|  ok   |   summary   |
+---+-+
| true  | exec.sort.disable_managed updated.  |
+---+-+
1 row selected (0.975 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
`planner.width.max_per_node` = 1;
+---+--+
|  ok   |   summary|
+---+--+
| true  | planner.width.max_per_node updated.  |
+---+--+
1 row selected (0.371 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
`planner.disable_exchanges` = true;
+---+-+
|  ok   |   summary   |
+---+-+
| true  | planner.disable_exchanges updated.  |
+---+-+
1 row selected (0.292 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
`planner.memory.max_query_memory_per_node` = 14106127360;
+---++
|  ok   |  summary   |
+---++
| true  | planner.memory.max_query_memory_per_node updated.  |
+---++
1 row selected (0.316 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from (select * from 
dfs.`/drill/testdata/resource-manager/250wide.tbl` order by columns[0])d where 
d.columns[0] = 'ljdfhwuehnoiueyf';
+-+
| EXPR$0  |
+-+
| 0   |
+-+
1 row selected (8530.719 seconds)
{code}

I attached the logs and profile files. The data is too large to attach to a 
JIRA. Reach out to me if you need any more information.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [ANNOUNCE] New Committer: Paul Rogers

2017-05-19 Thread rahul challapalli
Congratulations Paul. Well deserved.

On Fri, May 19, 2017 at 8:46 AM, Gautam Parai  wrote:

> Congratulations Paul and thank you for your contributions!
>
>
> Gautam
>
> 
> From: Abhishek Girish 
> Sent: Friday, May 19, 2017 8:27:05 AM
> To: dev@drill.apache.org
> Subject: Re: [ANNOUNCE] New Committer: Paul Rogers
>
> Congrats Paul!
>
> On Fri, May 19, 2017 at 8:23 AM, Charles Givre  wrote:
>
> > Congrats Paul!!
> >
> > On Fri, May 19, 2017 at 11:22 AM, Aman Sinha 
> wrote:
> >
> > > The Project Management Committee (PMC) for Apache Drill has invited
> Paul
> > > Rogers to become a committer, and we are pleased to announce that he
> has
> > > accepted.
> > >
> > > Paul has a long list of contributions that have touched many aspects of
> > the
> > > product.
> > >
> > > Welcome Paul, and thank you for your contributions.  Keep up the good
> > work
> > > !
> > >
> > > - Aman
> > >
> > > (on behalf of the Apache Drill PMC)
> > >
> >
>


[jira] [Created] (DRILL-5522) OOM during the merge and spill process of the managed external sort

2017-05-17 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5522:


 Summary: OOM during the merge and spill process of the managed 
external sort
 Key: DRILL-5522
 URL: https://issues.apache.org/jira/browse/DRILL-5522
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.10.0
Reporter: Rahul Challapalli
Assignee: Paul Rogers


git.commit.id.abbrev=1e0a14c

The below query fails with an OOM
{code}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.memory.max_query_memory_per_node` = 1552428800;
create table dfs.drillTestDir.xsort_ctas3_multiple partition by (type, aCol) as 
select type, rptds, rms, s3.rms.a aCol, uid from (
  select * from (
select s1.type type, flatten(s1.rms.rptd) rptds, s1.rms, s1.uid
from (
  select d.type type, d.uid uid, flatten(d.map.rm) rms from 
dfs.`/drill/testdata/resource-manager/nested-large.json` d order by d.uid
) s1
  ) s2
  order by s2.rms.mapid, s2.rptds.a
) s3;
{code}

Stack trace
{code}
2017-05-17 15:15:35,027 [26e334aa-1afa-753f-3afe-862f76b80c18:frag:4:2] INFO  
o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred: One or more nodes 
ran out of memory while executing the query. (Unable to allocate buffer of size 
2097152 due to memory limit. Current allocation: 29229064)
org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more 
nodes ran out of memory while executing the query.

Unable to allocate buffer of size 2097152 due to memory limit. Current 
allocation: 29229064

[Error Id: 619e2e34-704c-4964-a354-1348fb33ce8a ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
 ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:244)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_111]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_111]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
allocate buffer of size 2097152 due to memory limit. Current allocation: 
29229064
at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:220) 
~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:195) 
~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.vector.BigIntVector.reAlloc(BigIntVector.java:212) 
~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.vector.BigIntVector.copyFromSafe(BigIntVector.java:324) 
~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.vector.NullableBigIntVector.copyFromSafe(NullableBigIntVector.java:367)
 ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.vector.NullableBigIntVector$TransferImpl.copyValueSafe(NullableBigIntVector.java:328)
 ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.copyValueSafe(RepeatedMapVector.java:360)
 ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.vector.complex.MapVector$MapTransferPair.copyValueSafe(MapVector.java:220)
 ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.vector.complex.MapVector.copyFromSafe(MapVector.java:82) 
~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.test.generated.PriorityQueueCopierGen49.doCopy(PriorityQueueCopierTemplate.java:34)
 ~[na:na]
at 
org.apache.drill.exec.test.generated.PriorityQueueCopierGen49.next(PriorityQueueCopierTemplate.java:76)
 ~[na:na]
at 
org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder$BatchMerger.next(CopierHolder.java:234)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeSpilledRuns(ExternalSortBatch.java:1214)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:689)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:559)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java

[jira] [Created] (DRILL-5519) Sort fails to spill and results in an OOM

2017-05-16 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5519:


 Summary: Sort fails to spill and results in an OOM
 Key: DRILL-5519
 URL: https://issues.apache.org/jira/browse/DRILL-5519
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Rahul Challapalli
Assignee: Paul Rogers


Setup :
{code}
git.commit.id.abbrev=1e0a14c
DRILL_MAX_DIRECT_MEMORY="32G"
DRILL_MAX_HEAP="4G"
No of nodes in the drill cluster : 1
{code}

The below query fails with an OOM in the "in-memory sort" code, which means the 
logic that decides when to spill is flawed.
{code}
0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET 
`exec.sort.disable_managed` = false;
+---+-+
|  ok   |   summary   |
+---+-+
| true  | exec.sort.disable_managed updated.  |
+---+-+
1 row selected (1.022 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
`planner.memory.max_query_memory_per_node` = 334288000;
+---++
|  ok   |  summary   |
+---++
| true  | planner.memory.max_query_memory_per_node updated.  |
+---++
1 row selected (0.369 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from (select * from 
(select flatten(flatten(lst_lst)) num from 
dfs.`/drill/testdata/resource-manager/nested-large.json`) d order by d.num) d1 
where d1.num < -1;
Error: RESOURCE ERROR: One or more nodes ran out of memory while executing the 
query.

Unable to allocate buffer of size 4194304 (rounded from 320) due to memory 
limit. Current allocation: 16015936
Fragment 2:2

[Error Id: 4d9cc59a-b5d1-4ca9-9b26-69d9438f0bee on qa-node190.qa.lab:31010] 
(state=,code=0)
{code}

Below is the exception from the logs
{code}
2017-05-16 13:46:33,233 [26e49afc-cf45-637b-acc1-a70fee7fe7e2:frag:2:2] INFO  
o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred: One or more nodes 
ran out of memory while executing the query. (Unable to allocate buffer of size 
4194304 (rounded from 320) due to memory limit. Current allocation: 
16015936)
org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more 
nodes ran out of memory while executing the query.

Unable to allocate buffer of size 4194304 (rounded from 320) due to memory 
limit. Current allocation: 16015936

[Error Id: 4d9cc59a-b5d1-4ca9-9b26-69d9438f0bee ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
 ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:244)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_111]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_111]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
allocate buffer of size 4194304 (rounded from 320) due to memory limit. 
Current allocation: 16015936
at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:220) 
~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:195) 
~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.test.generated.MSorterGen44.setup(MSortTemplate.java:91) 
~[na:na]
at 
org.apache.drill.exec.physical.impl.xsort.managed.MergeSort.merge(MergeSort.java:110)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.sortInMemory(ExternalSortBatch.java:1159)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:687)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:559)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIt

[jira] [Created] (DRILL-5513) Managed External Sort : OOM error during the merge phase

2017-05-15 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5513:


 Summary: Managed External Sort : OOM error during the merge phase
 Key: DRILL-5513
 URL: https://issues.apache.org/jira/browse/DRILL-5513
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Rahul Challapalli
Assignee: Paul Rogers


git.commit.id.abbrev=1e0a14c
No of nodes in cluster : 1
DRILL_MAX_DIRECT_MEMORY="32G"
DRILL_MAX_HEAP="4G"

The below query fails with an OOM
{code}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_query` = 100;
alter session set `planner.memory.max_query_memory_per_node` = 652428800;
select count(*) from (select s1.type type, flatten(s1.rms.rptd) rptds from 
(select d.type type, d.uid uid, flatten(d.map.rm) rms from 
dfs.`/drill/testdata/resource-manager/nested-large.json` d order by d.uid) s1 
order by s1.rms.mapid);
{code}

Exception from the logs
{code}
2017-05-15 12:58:46,646 [BitServer-4] DEBUG o.a.drill.exec.work.foreman.Foreman 
- 26e5f7b8-71e8-afca-e72e-fad7be2b2416: State change requested RUNNING --> 
FAILED
org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: One or 
more nodes ran out of memory while executing the query.

Unable to allocate buffer of size 2097152 due to memory limit. Current 
allocation: 19791880
Fragment 5:2

[Error Id: bb67176f-a780-400d-88c9-06fea131ea64 on qa-node190.qa.lab:31010]

  (org.apache.drill.exec.exception.OutOfMemoryException) Unable to allocate 
buffer of size 2097152 due to memory limit. Current allocation: 19791880
org.apache.drill.exec.memory.BaseAllocator.buffer():220
org.apache.drill.exec.memory.BaseAllocator.buffer():195
org.apache.drill.exec.vector.BigIntVector.reAlloc():212
org.apache.drill.exec.vector.BigIntVector.copyFromSafe():324
org.apache.drill.exec.vector.NullableBigIntVector.copyFromSafe():367

org.apache.drill.exec.vector.NullableBigIntVector$TransferImpl.copyValueSafe():328

org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.copyValueSafe():360

org.apache.drill.exec.vector.complex.MapVector$MapTransferPair.copyValueSafe():220
org.apache.drill.exec.vector.complex.MapVector.copyFromSafe():82
org.apache.drill.exec.test.generated.PriorityQueueCopierGen4494.doCopy():34
org.apache.drill.exec.test.generated.PriorityQueueCopierGen4494.next():76

org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder$BatchMerger.next():234

org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeSpilledRuns():1214

org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load():689

org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext():559
org.apache.drill.exec.record.AbstractRecordBatch.next():162

org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51

org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
org.apache.drill.exec.record.AbstractRecordBatch.next():162

org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
org.apache.drill.exec.physical.impl.BaseRootExec.next():104

org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
org.apache.drill.exec.physical.impl.BaseRootExec.next():94
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():415
org.apache.hadoop.security.UserGroupInformation.doAs():1595
org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1145
java.util.concurrent.ThreadPoolExecutor$Worker.run():615
java.lang.Thread.run():745

at 
org.apache.drill.exec.work.foreman.QueryManager$1.statusUpdate(QueryManager.java:537)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.rpc.control.WorkEventBus.statusUpdate(WorkEventBus.java:71)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.batch.ControlMessageHandler.handle(ControlMessageHandler.java:94)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.batch.ControlMessageHandler.handle(ControlMessageHandler.java:55)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at org.apache.drill.exec.rpc.BasicServer.handle(BasicServer.java:159) 
[drill-

[jira] [Created] (DRILL-5505) Enabling exchanges increased the external sort's spill count by 2 times

2017-05-11 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5505:


 Summary: Enabling exchanges increased the external sort's spill 
count by 2 times
 Key: DRILL-5505
 URL: https://issues.apache.org/jira/browse/DRILL-5505
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Rahul Challapalli
Assignee: Paul Rogers


git.commit.id.abbrev=1e0a14c

Based on the profile, the below query spilled 32 times
{code}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.disable_exchanges` = true;
alter session set `planner.memory.max_query_memory_per_node` = 62914560;
select count(*) from (select * from 
dfs.`/drill/testdata/resource-manager/250wide-small.tbl` order by columns[0])d 
where d.columns[0] = 'ljdfhwuehnoiueyf';
{code}

Now when I enabled exchanges, all else being the same, the same query spilled 
66 times. I attached the 2 profiles and the log file.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DRILL-5503) Disabling exchanges results in "Unable to allocate sv2 buffer" error within the managed external sort code

2017-05-10 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5503:


 Summary: Disabling exchanges results in "Unable to allocate sv2 
buffer" error within the managed external sort code
 Key: DRILL-5503
 URL: https://issues.apache.org/jira/browse/DRILL-5503
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.10.0
Reporter: Rahul Challapalli
Assignee: Paul Rogers


Setup :
{code}
git.commit.id.abbrev=1e0a14c
No of drillbits : 1
DRILL_MAX_DIRECT_MEMORY="32G"
DRILL_MAX_HEAP="4G"
{code}

The below query successfully completes
{code}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.memory.max_query_memory_per_node` = 6260;
alter session set `planner.width.max_per_query` = 17;
select count(*) from (select * from 
dfs.`/drill/testdata/resource-manager/5kwidecolumns_500k.tbl` order by 
columns[0]) d where d.columns[0] = '4041054511';
+-+
| EXPR$0  |
+-+
| 0   |
+-+
1 row selected (814.104 seconds)
{code}

However, if I disable exchanges, the same query fails with the "Unable to 
allocate sv2 buffer" error from the summary
{code}
alter session set `planner.disable_exchanges` = true;
select count(*) from (select * from 
dfs.`/drill/testdata/resource-manager/5kwidecolumns_500k.tbl` order by 
columns[0]) d where d.columns[0] = '4041054511';
{code}

I attached the profile and the log file. The data set used is too large to 
attach here. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DRILL-5502) Parallelized external sort is slower compared to the single fragment scenario on some data sets

2017-05-10 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5502:


 Summary: Parallelized external sort is slower compared to the 
single fragment scenario on some data sets
 Key: DRILL-5502
 URL: https://issues.apache.org/jira/browse/DRILL-5502
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.10.0
Reporter: Rahul Challapalli
Assignee: Paul Rogers


git.commit.id.abbrev=1e0a14c

The below query runs in a single fragment and completes in ~13 minutes
{code}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.memory.max_query_memory_per_node` = 6260;
alter session set `planner.width.max_per_query` = 17;
select count(*) from (select * from 
dfs.`/drill/testdata/resource-manager/5kwidecolumns_500k.tbl` order by 
columns[0]) d where d.columns[0] = '4041054511';
+-+
| EXPR$0  |
+-+
| 0   |
+-+
1 row selected (832.705 seconds)
{code}

Now I increased the parallelization to 10 and also increased the memory 
allocated to the sort by 10 times, so that each individual fragment still ends 
up getting a similar amount of memory. In this case, however, the query takes 
~30 minutes to complete, which is strange.
{code}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 10;
alter session set `planner.memory.max_query_memory_per_node` = 62600;
alter session set `planner.width.max_per_query` = 17;
select count(*) from (select * from 
dfs.`/drill/testdata/resource-manager/5kwidecolumns_500k.tbl` order by 
columns[0]) d where d.columns[0] = '4041054511';
+-+
| EXPR$0  |
+-+
| 0   |
+-+
1 row selected (1845.508 seconds)
{code}
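For reference, the arithmetic I expected to hold across the two runs: 
per-fragment sort memory is the per-node budget divided by the number of 
fragments on the node. A minimal sketch (plain Java; the even split is my 
assumption, not Drill's exact accounting):
{code}
// Sketch of the per-fragment memory arithmetic assumed above. The even
// split across fragments is an assumption, not Drill's exact accounting.
public class PerFragmentMemory {
    static long perFragment(long maxQueryMemoryPerNode, int fragmentsPerNode) {
        return maxQueryMemoryPerNode / fragmentsPerNode;
    }

    public static void main(String[] args) {
        System.out.println(perFragment(6260L, 1));    // run 1: 6260
        System.out.println(perFragment(62600L, 10));  // run 2: 6260 as well
    }
}
{code}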

My data set contains wide columns (5k chars wide). I will try to reproduce this 
with a data set where the column width is < 256 bytes. 

Attached are the query profile and log file from both scenarios. The data set 
is too large to attach to a jira





[jira] [Created] (DRILL-5500) Query hung in CANCELLATION_REQUESTED state

2017-05-10 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5500:


 Summary: Query hung in CANCELLATION_REQUESTED state
 Key: DRILL-5500
 URL: https://issues.apache.org/jira/browse/DRILL-5500
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=1e0a14c

I cancelled the below query after ~17 minutes and the profiles indicated that 
the cancellation did not complete
{code}
alter session set `planner.enable_decimal_data_type` = true;
alter session set `planner.slice_target` = 300;
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 17;
alter session set `planner.width.max_per_query` = 17;
alter session set `planner.memory.max_query_memory_per_node` = 1192600;
select count(*) from (select * from 
dfs.`/drill/testdata/resource-manager/5kwidecolumns_500k.tbl` order by 
columns[0]) d where d.columns[0] = '4041054511';
{code}

About a minute after the cancellation was issued, I took a jstack dump, which 
indicated that some fragments were still running. The sort operator in the 
query had not yet started spilling by the time I issued the cancel request.

I attached the query profile, log files and the jstack output





[jira] [Created] (DRILL-5497) External sort's logic for memory allotment to each fragment should also consider width.max_per_query

2017-05-09 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5497:


 Summary: External sort's logic for memory allotment to each 
fragment should also consider  width.max_per_query 
 Key: DRILL-5497
 URL: https://issues.apache.org/jira/browse/DRILL-5497
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.10.0
Reporter: Rahul Challapalli
Assignee: Paul Rogers


git.commit.id.abbrev=1e0a14c

Currently the external sort considers only the 'planner.width.max_per_node' 
option when computing the memory allocation for each fragment. It should take 
'planner.width.max_per_query' into account as well, since that option lets 
users externally control the desired parallelization for a query across the 
cluster.
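A rough sketch of what I have in mind (hypothetical names; it assumes the 
per-node budget is split evenly across the sort fragments the node is actually 
allowed to run):
{code}
// Hypothetical sketch: cap the fragment count used for the memory split
// by both width options, so the query-wide width limit is honored too.
public class SortMemoryAllotment {
    static long memoryPerSortFragment(long maxQueryMemoryPerNode,
                                      int widthMaxPerNode,
                                      int widthMaxPerQuery,
                                      int numNodes) {
        // A node cannot run more sort fragments than its per-node width,
        // nor more than its share of the query-wide width (ceiling division).
        int shareOfQueryWidth = (widthMaxPerQuery + numNodes - 1) / numNodes;
        int fragments = Math.max(1, Math.min(widthMaxPerNode, shareOfQueryWidth));
        return maxQueryMemoryPerNode / fragments;
    }
}
{code}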





[jira] [Created] (DRILL-5495) convert_from function on top of int96 data results in ArrayIndexOutOfBoundsException

2017-05-09 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5495:


 Summary: convert_from function on top of int96 data results in 
ArrayIndexOutOfBoundsException
 Key: DRILL-5495
 URL: https://issues.apache.org/jira/browse/DRILL-5495
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=1e0a14c

The data set used was generated from Spark and contains a timestamp stored as 
int96
{code}
[root@qa-node190 framework]# /home/parquet-tools-1.5.1-SNAPSHOT/parquet-meta 
/home/framework/framework/resources/Datasources/parquet_date/spark_generated/d4/part-r-0-08c5c621-62ea-4fee-b690-11576eddc39c.snappy.parquet
 
creator: parquet-mr (build 32c46643845ea8a705c35d4ec8fc654cc8ff816d) 
extra:   org.apache.spark.sql.parquet.row.metadata = 
{"type":"struct","fields":[{"name":"a","type":"integer","nullable":true,"metadata":{}},{"name":"b","type":"strin
 [more]...

file schema: spark_schema 
---
a:   OPTIONAL INT32 R:0 D:1
b:   OPTIONAL BINARY O:UTF8 R:0 D:1
c:   OPTIONAL INT32 O:DATE R:0 D:1
d:   OPTIONAL INT96 R:0 D:1

row group 1: RC:1 TS:8661 
---
a:INT32 SNAPPY DO:0 FPO:4 SZ:2367/2571/1.09 VC:1 
ENC:RLE,PLAIN,BIT_PACKED
b:BINARY SNAPPY DO:0 FPO:2371 SZ:2329/2843/1.22 VC:1 
ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
c:INT32 SNAPPY DO:0 FPO:4700 SZ:1374/1507/1.10 VC:1 
ENC:RLE,PLAIN,BIT_PACKED
d:INT96 SNAPPY DO:0 FPO:6074 SZ:1597/1740/1.09 VC:1 
ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
{code}

The below query fails with an ArrayIndexOutOfBoundsException
{code}
select convert_from(d, 'TIMESTAMP_IMPALA') from 
dfs.`/drill/testdata/resource-manager/d4`;

Fails with the below error after displaying a bunch of records
Error: SYSTEM ERROR: ArrayIndexOutOfBoundsException: 0

Fragment 1:0

[Error Id: f963f6c0-3306-49a6-9d98-a193c5e7cfee on qa-node190.qa.lab:31010] 
(state=,code=0)
{code}

Attached the logs, profiles and data files





[jira] [Created] (DRILL-5493) Managed External Sort + CTAS partition by results in "Unable to allocate sv2 vector" error

2017-05-09 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5493:


 Summary: Managed External Sort + CTAS partition by results in 
"Unable to allocate sv2 vector" error
 Key: DRILL-5493
 URL: https://issues.apache.org/jira/browse/DRILL-5493
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.10.0
Reporter: Rahul Challapalli
Assignee: Paul Rogers


Config:
{code}
git.commit.id.abbrev=1e0a14c
No of nodes : 1
DRILL_MAX_DIRECT_MEMORY="32G"
DRILL_MAX_HEAP="4G"
Assertions Enabled : true
{code}

The below query fails during the CTAS phase (the explicit order by in the query 
runs fine)
{code}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_query` = 17;
create table dfs.drillTestDir.xsort_ctas4 partition by (col1) as select 
columns[0] as col1 from (select * from 
dfs.`/drill/testdata/resource-manager/wide-to-zero` order by columns[0]);

Error: RESOURCE ERROR: Unable to allocate sv2 buffer

Fragment 0:0

[Error Id: 24ae2ec8-ac2a-45c3-b550-43c12764165d on qa-node190.qa.lab:31010] 
(state=,code=0)
{code}

I attached the logs and profiles. The data is too large to attach to a jira.





[jira] [Created] (DRILL-5482) Improve the error message when drill process does not have access to the spill directory

2017-05-05 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5482:


 Summary: Improve the error message when drill process does not 
have access to the spill directory
 Key: DRILL-5482
 URL: https://issues.apache.org/jira/browse/DRILL-5482
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.10.0
Reporter: Rahul Challapalli
Assignee: Paul Rogers


git.commit.id.abbrev=1e0a14c

When the drillbit process does not have write permissions to the spill 
directory, we get the below generic error message
{code}
Error: RESOURCE ERROR: External Sort encountered an error while spilling to disk

Fragment 0:0

[Error Id: 49addefc-2fb2-467c-a524-b09e7344e9f1 on qa-node190.qa.lab:31010] 
(state=,code=0)
{code}

We can certainly improve the error message so that the software becomes more 
self-serviceable
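
One concrete option is an up-front writability probe that names the directory 
and the config knob. A sketch below, using the local filesystem for brevity 
(Drill's spill path actually goes through the Hadoop FileSystem API, and the 
message text here is only illustrative):
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: fail fast, before any spill is attempted, with a message that
// names the directory and the config knob. Illustrative only.
public class SpillDirCheck {
    static void verifyWritable(String dir) {
        Path p = Paths.get(dir);
        if (!Files.isDirectory(p) || !Files.isWritable(p)) {
            throw new IllegalStateException(
                "External sort cannot spill: " + dir + " is missing or not "
                + "writable by the drillbit process. Check "
                + "drill.exec.sort.external.spill.directories.");
        }
        try {
            // A create/delete probe catches setups that isWritable() misses.
            Files.delete(Files.createTempFile(p, "spill_probe", ".tmp"));
        } catch (IOException e) {
            throw new IllegalStateException(
                "External sort cannot spill to " + dir + ": " + e.getMessage(), e);
        }
    }
}
{code}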





[jira] [Created] (DRILL-5478) Spill file size parameter is not honored by the managed external sort

2017-05-05 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5478:


 Summary: Spill file size parameter is not honored by the managed 
external sort
 Key: DRILL-5478
 URL: https://issues.apache.org/jira/browse/DRILL-5478
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.10.0
Reporter: Rahul Challapalli
Assignee: Paul Rogers


git.commit.id.abbrev=1e0a14c

Query:
{code}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.disable_exchanges` = true;
alter session set `planner.width.max_per_query` = 1;
alter session set `planner.memory.max_query_memory_per_node` = 1052428800;
alter session set `planner.enable_decimal_data_type` = true;
select count(*) from (
  select * from dfs.`/drill/testdata/resource-manager/all_types_large` d1
  order by d1.map.missing
) d;
{code}

Boot Options (spill file size is set to 256MB)
{code}
0: jdbc:drill:zk=10.10.100.190:5181> select * from sys.boot where name like '%spill%';
+--------------------------------------------------+---------+-------+---------+----------+-------------------------------------------+-----------+------------+
|                       name                       |  kind   | type  | status  | num_val  |                string_val                 | bool_val  | float_val  |
+--------------------------------------------------+---------+-------+---------+----------+-------------------------------------------+-----------+------------+
| drill.exec.sort.external.spill.directories       | STRING  | BOOT  | BOOT    | null     | ["/tmp/test"] (drill-override.conf: 26)   | null      | null       |
| drill.exec.sort.external.spill.file_size         | STRING  | BOOT  | BOOT    | null     | "256M"                                    | null      | null       |
| drill.exec.sort.external.spill.fs                | STRING  | BOOT  | BOOT    | null     | "maprfs:///"                              | null      | null       |
| drill.exec.sort.external.spill.group.size        | LONG    | BOOT  | BOOT    | 4        | null                                      | null      | null       |
| drill.exec.sort.external.spill.merge_batch_size  | STRING  | BOOT  | BOOT    | null     | "16M"                                     | null      | null       |
| drill.exec.sort.external.spill.spill_batch_size  | STRING  | BOOT  | BOOT    | null     | "8M"                                      | null      | null       |
| drill.exec.sort.external.spill.threshold         | LONG    | BOOT  | BOOT    | 4        | null                                      | null      | null       |
+--------------------------------------------------+---------+-------+---------+----------+-------------------------------------------+-----------+------------+
{code}

Below are the spill files while the query is still executing. The completed 
spill files are ~34 MB each, well below the configured 256 MB.
{code}
-rwxr-xr-x   3 root root   34957815 2017-05-05 11:26 
/tmp/test/26f33c36-4235-3531-aeaa-2c73dc4ddeb5_major0_minor0_op5_sort/run1
-rwxr-xr-x   3 root root   34957815 2017-05-05 11:27 
/tmp/test/26f33c36-4235-3531-aeaa-2c73dc4ddeb5_major0_minor0_op5_sort/run2
-rwxr-xr-x   3 root root  0 2017-05-05 11:27 
/tmp/test/26f33c36-4235-3531-aeaa-2c73dc4ddeb5_major0_minor0_op5_sort/run3
{code}
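
For comparison, honoring the limit would mean rolling over to a new run file 
once the configured size is crossed, roughly as sketched below (names are 
mine; a local stream stands in for the spill filesystem):
{code}
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Sketch of size-capped spill writing: once a run file would exceed the
// configured file_size, close it and start the next run. Names are mine.
public class RollingSpillWriter implements AutoCloseable {
    private final String baseName;
    private final long maxFileBytes;
    private OutputStream out;
    private long written;
    private int fileIndex;

    RollingSpillWriter(String baseName, long maxFileBytes) throws IOException {
        this.baseName = baseName;
        this.maxFileBytes = maxFileBytes;
        roll();
    }

    void writeBatch(byte[] serializedBatch) throws IOException {
        if (written > 0 && written + serializedBatch.length > maxFileBytes) {
            roll();  // start run N+1 instead of growing the current file
        }
        out.write(serializedBatch);
        written += serializedBatch.length;
    }

    private void roll() throws IOException {
        if (out != null) {
            out.close();
        }
        out = new FileOutputStream(baseName + "_run" + (++fileIndex));
        written = 0;
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
{code}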

The data set is too large to attach here. Reach out to me if you need anything





[jira] [Created] (DRILL-5474) Spill directory is not being cleaned up immediately after cancellation

2017-05-04 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5474:


 Summary: Spill directory is not being cleaned up immediately after 
cancellation
 Key: DRILL-5474
 URL: https://issues.apache.org/jira/browse/DRILL-5474
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.10.0
Reporter: Rahul Challapalli
Assignee: Paul Rogers


I allowed the below query to run until it started spilling, then cancelled it; 
the sqlline prompt returned immediately. From that point, it took ~50 seconds 
for the spill directory to be deleted, and during this time the spill file 
constantly grew in size
{code}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.disable_exchanges` = true;
alter session set `planner.width.max_per_query` = 1;
alter session set `planner.memory.max_query_memory_per_node` = 1052428800;
alter session set `planner.enable_decimal_data_type` = true;
select count(*) from (
  select * from dfs.`/drill/testdata/resource-manager/all_types_large` d1
  order by d1.map.missing, d1.missing12.x, d1.missing1, d1.missing2, 
d1.missing3, d1.missing4,
d1.missing5, d1.missing6, d1.missing7, d1.missing8, d1.missing9, 
d1.missing10, d1.missing11,
d1.missing12.x, d1.missing13, d1.missing14, d1.missing15, d1.missing16, 
d1.missing17, d1.missing18,
d1.missing19, d1.missing20, d1.missing21, d1.missing22, d1.missing23, 
d1.missing24, d1.missing25,
d1.missing26, d1.missing27, d1.missing28, d1.`missing 29`, d1.missing30, 
d1.missing31, d1.missing32,
d1.missing33, d1.missing34, d1.m1
) d where d.missing3 is false;
{code}
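
The ~50-second lag, and the file that keeps growing, suggest the spill loop 
only notices cancellation between long phases. Checking the fragment status 
between batches would stop the writes promptly; a sketch (shouldContinue() is 
a stand-in for Drill's fragment-status check):
{code}
import java.io.IOException;
import java.io.OutputStream;

// Sketch: poll the cancellation flag between batches so a cancelled query
// stops writing, letting cleanup delete the spill files right away.
public class CancellableSpill {
    interface FragmentStatus {
        boolean shouldContinue();  // stand-in for Drill's status check
    }

    static void spillRun(Iterable<byte[]> batches, OutputStream spillFile,
                         FragmentStatus status) throws IOException {
        for (byte[] batch : batches) {
            if (!status.shouldContinue()) {
                return;  // abandon the run; caller deletes the partial file
            }
            spillFile.write(batch);
        }
    }
}
{code}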





[jira] [Created] (DRILL-5472) Parquet reader generating low-density batches causing Sort operator to spill unnecessarily

2017-05-04 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5472:


 Summary: Parquet reader generating low-density batches causing 
Sort operator to spill unnecessarily
 Key: DRILL-5472
 URL: https://issues.apache.org/jira/browse/DRILL-5472
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators, Storage - Parquet
Reporter: Rahul Challapalli
Assignee: Paul Rogers


git.commit.id.abbrev=1e0a14c

The parquet file used in the below query is ~20 MB; its uncompressed size is 
~1.2 GB. The query's sort is given ~6 GB of memory for a single fragment, and 
yet it spills.
{code}
select * from (select * from 
dfs.`/drill/testdata/resource-manager/all_types_large` s order by 
s.missing12.x) d where d.missing3 is false;
{code}

The profile indicates that the above query spilled twice. Attached are the 
profile and the logs.
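
By "density" I mean the ratio of payload bytes to allocated bytes in a batch: 
when the reader hands the sort mostly-empty vectors, the sort's memory budget 
fills up long before anything close to 6 GB of real data has arrived. A sketch 
of the measurement (numbers are illustrative):
{code}
// Sketch: batch "density" = bytes of real data / bytes allocated for the
// batch's vectors. A low ratio means the sort budget is consumed by
// mostly-empty buffers. Numbers below are illustrative.
public class BatchDensity {
    static double density(long payloadBytes, long allocatedBytes) {
        return (double) payloadBytes / allocatedBytes;
    }

    public static void main(String[] args) {
        // e.g. 1 MB of values carried in vectors rounded up to 16 MB
        System.out.printf("density = %.2f%n", density(1L << 20, 16L << 20));
    }
}
{code}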





[jira] [Created] (DRILL-5469) Meaningless error message when trying to use a constant in an order by statement

2017-05-03 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5469:


 Summary: Meaningless error message when trying to use a constant 
in an order by statement
 Key: DRILL-5469
 URL: https://issues.apache.org/jira/browse/DRILL-5469
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=1e0a14c

While I agree that it makes no sense to use a constant in an ORDER BY clause, 
users can easily end up with such a typo in a large query. It would be really 
helpful if Drill pointed out precisely where the issue is. This is what I 
currently get:
{code}
0: jdbc:drill:zk=10.10.100.190:5181> select * from cp.`tpch/lineitem.parquet` 
order by 5;
Error: VALIDATION ERROR: At line 1, column 51: Ordinal out of range

SQL Query null

[Error Id: e5cf4f6c-3d2c-412e-90a1-863897587399 on qa-node190.qa.lab:31010] 
(state=,code=0)
{code}
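
Part of the fix could be as simple as echoing the query text instead of "null" 
and stating the valid range. A sketch of the message construction (hypothetical 
names, not Drill's actual error path):
{code}
// Sketch: a validation message that names the offending ordinal, the
// valid range, and echoes the query. Hypothetical names.
public class OrdinalErrorMessage {
    static String build(String sql, int line, int col, int ordinal, int selectItems) {
        return String.format(
            "VALIDATION ERROR: At line %d, column %d: ORDER BY ordinal %d is out "
            + "of range; the SELECT list has %d item(s).%n%nSQL Query: %s",
            line, col, ordinal, selectItems, sql);
    }
}
{code}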





[jira] [Created] (DRILL-5466) Cancelling a query caused "DrillBuf refCnt has gone negative" error

2017-05-02 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5466:


 Summary: Cancelling a query caused "DrillBuf refCnt has gone 
negative" error
 Key: DRILL-5466
 URL: https://issues.apache.org/jira/browse/DRILL-5466
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Rahul Challapalli
 Attachments: createViewsParquet.sql, ref_negative.log, 
ref_negative.profile

git.commit.id.abbrev=38ef562

The below query was running on a TPC-DS SF1 data set. I cancelled it after 9 
seconds, and below is the message I got:
{code}
alter session set `planner.enable_decimal_data_type` = true;
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.disable_exchanges` = true;
alter session set `planner.width.max_per_query` = 1;
alter session set `planner.memory.max_query_memory_per_node` = 200435456;
alter session set `planner.enable_hashjoin` = false;
SELECT dt.d_year,
   item.i_brand_id  brand_id,
   item.i_brand brand,
   Sum(ss_ext_discount_amt) sum_agg
FROM   date_dim dt,
   store_sales,
   item
WHERE  dt.d_date_sk = store_sales.ss_sold_date_sk
   AND store_sales.ss_item_sk = item.i_item_sk
   AND item.i_manufact_id = 427
   AND dt.d_moy = 11
GROUP  BY dt.d_year,
  item.i_brand,
  item.i_brand_id
ORDER  BY dt.d_year,
  sum_agg DESC,
  brand_id;
  
Error: Unexpected RuntimeException: java.lang.IllegalStateException: 
DrillBuf[7530] refCnt has gone negative. Buffer Info: ledger[3661] allocator: 
ROOT), isOwning: false, size: 1, references: -1, life: 
8258809446482809..8258817912668200, allocatorManager: [3661, life: 
8258809446399747..8258817913039953] (state=,code=0)

{code}

Surprisingly there is no trace of the exception in the logs. I attached the 
log, profile, and view definition files. Let me know if you have any questions

The data set can be downloaded from 
https://s3.amazonaws.com/apache-drill/files/tpcds/tpcds_sf1_parquet.tar.gz





[jira] [Created] (DRILL-5465) Managed external sort results in an OOM

2017-05-02 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5465:


 Summary: Managed external sort results in an OOM
 Key: DRILL-5465
 URL: https://issues.apache.org/jira/browse/DRILL-5465
 Project: Apache Drill
  Issue Type: Bug
Reporter: Rahul Challapalli


git.commit.id.abbrev=1e0a14c

The below query fails with an OOM on top of TPC-DS SF1 parquet data. Since the 
sort already spilled once, I assume there is sufficient memory to handle the 
spill/merge batches. The view definition file is attached and the data can be 
downloaded from [1]
{code}
use dfs.tpcds_sf1_parquet_views;
alter session set `planner.enable_decimal_data_type` = true;
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.disable_exchanges` = true;
alter session set `planner.width.max_per_query` = 1;
alter session set `planner.memory.max_query_memory_per_node` = 200435456;
alter session set `planner.enable_hashjoin` = false;
SELECT dt.d_year,
   item.i_brand_id  brand_id,
   item.i_brand brand,
   Sum(ss_ext_discount_amt) sum_agg
FROM   date_dim dt,
   store_sales,
   item
WHERE  dt.d_date_sk = store_sales.ss_sold_date_sk
   AND store_sales.ss_item_sk = item.i_item_sk
   AND item.i_manufact_id = 427
   AND dt.d_moy = 11
GROUP  BY dt.d_year,
  item.i_brand,
  item.i_brand_id
ORDER  BY dt.d_year,
  sum_agg DESC,
  brand_id;
{code}

Exception from the logs
{code}
[Error Id: 676ff6ad-829d-4920-9d4f-5132601d27b4 ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
 ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:617)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:425)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.RecordIterator.nextBatch(RecordIterator.java:99) 
[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.RecordIterator.next(RecordIterator.java:185) 
[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.RecordIterator.prepare(RecordIterator.java:169) 
[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.join.JoinStatus.prepare(JoinStatus.java:87) 
[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.join.MergeJoinBatch.innerNext(MergeJoinBatch.java:160)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 [drill-java-exec-1.11.0-SNAPSHOT.jar

[jira] [Created] (DRILL-5463) Managed External Sort : Unable to allocate sv2 for 32768 records, and not enough batchGroups to spill

2017-05-02 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5463:


 Summary: Managed External Sort : Unable to allocate sv2 for 32768 
records, and not enough batchGroups to spill
 Key: DRILL-5463
 URL: https://issues.apache.org/jira/browse/DRILL-5463
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=1e0a14c

The below query fails with an OOM after running for ~9 minutes. The memory 
given to an individual sort is very low, but it should still be sufficient to 
accommodate two incoming batches plus one outgoing batch plus some overhead 
(see the sketch below). The data set used in the query can be found at [1]
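A sketch of that lower bound (illustrative; Drill's actual accounting differs):
{code}
// Sketch of the progress floor assumed above: a sort can make progress if
// it can hold two incoming batches plus one outgoing batch plus overhead.
// The overhead term is illustrative, not Drill's actual accounting.
public class SortMemoryFloor {
    static boolean canMakeProgress(long budgetBytes, long incomingBatchBytes,
                                   long outgoingBatchBytes, long overheadBytes) {
        return budgetBytes >= 2 * incomingBatchBytes + outgoingBatchBytes + overheadBytes;
    }
}
{code}
The failing query: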
{code}
alter session set `planner.enable_decimal_data_type` = true;
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.disable_exchanges` = true;
alter session set `planner.width.max_per_query` = 1;
alter session set `planner.memory.max_query_memory_per_node` = 200435456;
alter session set `planner.enable_hashjoin` = false;
WITH year_total 
 AS (SELECT c_customer_id   customer_id, 
c_first_namecustomer_first_name, 
c_last_name customer_last_name, 
c_preferred_cust_flag   
customer_preferred_cust_flag 
, 
c_birth_country 
customer_birth_country, 
c_login customer_login, 
c_email_address customer_email_address, 
d_year  dyear, 
Sum(( ( ss_ext_list_price - ss_ext_wholesale_cost 
- ss_ext_discount_amt 
  ) 
  + 
  ss_ext_sales_price ) / 2) year_total, 
's' sale_type 
 FROM   customer, 
store_sales, 
date_dim 
 WHERE  c_customer_sk = ss_customer_sk 
AND ss_sold_date_sk = d_date_sk 
 GROUP  BY c_customer_id, 
   c_first_name, 
   c_last_name, 
   c_preferred_cust_flag, 
   c_birth_country, 
   c_login, 
   c_email_address, 
   d_year 
 UNION ALL 
 SELECT c_customer_id customer_id, 
c_first_name  customer_first_name, 
c_last_name   customer_last_name, 
c_preferred_cust_flag 
customer_preferred_cust_flag, 
c_birth_country   
customer_birth_country 
, 
c_login 
customer_login, 
c_email_address   
customer_email_address 
, 
d_yeardyear 
, 
Sum(( ( ( cs_ext_list_price 
  - cs_ext_wholesale_cost 
  - cs_ext_discount_amt 
) + 
  cs_ext_sales_price ) / 2 )) year_total, 
'c'   sale_type 
 FROM   customer, 
catalog_sales, 
date_dim 
 WHERE  c_customer_sk = cs_bill_customer_sk 
AND cs_sold_date_sk = d_date_sk 
 GROUP  BY c_customer_id, 
   c_first_name, 
   c_last_name, 
   c_preferred_cust_flag, 
   c_birth_country, 
   c_login, 
   c_email_address, 
   d_year 
 UNION ALL 
 SELECT c_customer_id customer_id, 
c_first_name  customer_first_name, 
c_last_name   customer_last_name, 
c_preferred_cust_flag 
customer_preferred_cust_flag, 
c_birth_country   
customer_birth_country 
, 
c_login 
customer_login, 
c_email_address   
customer_email_address 
, 
d_yeardyear 
, 
Sum(( ( ( ws_ext_list_price 
  - ws_ext_wholesale_cost 
  - ws_ext_discount_amt 
) + 
  ws_ext_sales_price ) / 2 )) year_total, 
'w' 

[jira] [Created] (DRILL-5453) Managed External Sort : Sorting on a lot of columns is taking an unreasonably long time

2017-04-28 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5453:


 Summary: Managed External Sort : Sorting on a lot of columns is 
taking an unreasonably long time
 Key: DRILL-5453
 URL: https://issues.apache.org/jira/browse/DRILL-5453
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


The below query ran for ~16hrs before I cancelled it.

{code}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.memory.max_query_memory_per_node` = 482344960;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.width.max_per_query` = 1;
select count(*) from (select * from 
dfs.`/drill/testdata/resource-manager/3500cols.tbl` order by 
columns[450],columns[330],columns[230],columns[220],columns[110],columns[90],columns[80],columns[70],columns[40],columns[10],columns[20],columns[30],columns[40],columns[50],
 
columns[454],columns[413],columns[940],columns[834],columns[73],columns[140],columns[104],columns[],columns[30],columns[2420],columns[1520],
 columns[1410], 
columns[1110],columns[1290],columns[2380],columns[705],columns[45],columns[1054],columns[2430],columns[420],columns[404],columns[3350],
 
columns[],columns[153],columns[356],columns[84],columns[745],columns[1450],columns[103],columns[2065],columns[343],columns[3420],columns[530],
 columns[3210] ) d where d.col433 = 'sjka skjf';
alter session set `planner.memory.max_query_memory_per_node` = 2147483648;
{code}

The data set and the logs are too large to attach to a jira. But below is a 
description of the data
{code}
No of records : 1,000,000
No of columns : 3500
Length of each column : < 50
{code}

The profile is attached, and I will follow up soon with my analysis of why I 
think this is an unreasonable amount of time.
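
As a rough yardstick until then: even a pessimistic estimate of pure 
comparison work is orders of magnitude short of 16 hours, which is why I 
suspect something other than raw sorting cost. A back-of-envelope sketch (all 
constants illustrative):
{code}
// Back-of-envelope: ~n*log2(n) comparisons, each touching up to ~50 sort
// keys of <50 bytes. All constants are illustrative.
public class SortCostEstimate {
    public static void main(String[] args) {
        double n = 1_000_000;
        double comparisons = n * (Math.log(n) / Math.log(2));  // ~2.0e7
        double bytesPerComparison = 50 * 50;                   // keys * width
        double gbTouched = comparisons * bytesPerComparison / 1e9;
        // Even at a modest 1 GB/s of compare throughput this is roughly a
        // minute of work, nowhere near 16 hours.
        System.out.printf("~%.0f GB touched by comparisons%n", gbTouched);
    }
}
{code}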





[jira] [Created] (DRILL-5447) Managed External Sort : Unable to allocate sv2 vector

2017-04-26 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5447:


 Summary: Managed External Sort : Unable to allocate sv2 vector
 Key: DRILL-5447
 URL: https://issues.apache.org/jira/browse/DRILL-5447
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.10.0
Reporter: Rahul Challapalli
Assignee: Paul Rogers


git.commit.id.abbrev=3e8b01d

Dataset :
{code}
Every record contains a repeated type with 2000 elements.
The repeated type contains varchars of length 250 for the first 2000 records
and single-character strings for the next 2000 records.
The above pattern repeats a few times.
{code}
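
For anyone trying to recreate the shape of this file, a small generator along 
the lines below should work (a sketch: the field names and counts come from 
the repro query and the description above, everything else is an assumption):
{code}
import java.io.IOException;
import java.io.PrintWriter;

// Sketch: emit JSON records whose repeated field holds 2000 elements,
// alternating every 2000 records between 250-char strings and
// single-char strings, per the dataset description. Path is illustrative.
public class FlattenLargeSmallGen {
    public static void main(String[] args) throws IOException {
        String big = "a".repeat(250);  // requires Java 11+
        try (PrintWriter out = new PrintWriter("flatten-large-small.json")) {
            for (int id = 0; id < 8000; id++) {  // "a few" cycles of the pattern
                String element = (id / 2000) % 2 == 0 ? big : "b";
                StringBuilder sb = new StringBuilder();
                sb.append("{\"id\":").append(id).append(",\"str_list\":[");
                for (int i = 0; i < 2000; i++) {
                    if (i > 0) sb.append(',');
                    sb.append('"').append(element).append('"');
                }
                sb.append("]}");
                out.println(sb);
            }
        }
    }
}
{code}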

The below query fails
{code}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.disable_exchanges` = true;
alter session set `planner.width.max_per_query` = 1;
select count(*) from (select * from (select id, flatten(str_list) str from 
dfs.`/drill/testdata/resource-manager/flatten-large-small.json`) d order by 
d.str) d1 where d1.id=0;

Error: RESOURCE ERROR: Unable to allocate sv2 buffer

Fragment 0:0

[Error Id: 9e45c293-ab26-489d-a90e-25da96004f15 on qa-node190.qa.lab:31010] 
(state=,code=0)
{code}

Exception from the logs
{code}
[Error Id: 9e45c293-ab26-489d-a90e-25da96004f15 ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
 ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.newSV2(ExternalSortBatch.java:1463)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.makeSelectionVector(ExternalSortBatch.java:799)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.processBatch(ExternalSortBatch.java:856)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch(ExternalSortBatch.java:618)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:660)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:559)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next

[jira] [Created] (DRILL-5445) Assertion Error in Managed External Sort when dealing with repeated maps

2017-04-25 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5445:


 Summary: Assertion Error in Managed External Sort when dealing 
with repeated maps
 Key: DRILL-5445
 URL: https://issues.apache.org/jira/browse/DRILL-5445
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Rahul Challapalli
Assignee: Paul Rogers


git.commit.id.abbrev=3e8b01d

The below query fails with an Assertion Error (I am running with assertions 
enabled)
{code}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.disable_exchanges` = true;
alter session set `planner.width.max_per_query` = 1;
alter session set `planner.memory.max_query_memory_per_node` = 152428800;
select count(*) from (
select * from (
select event_info.uid, transaction_info.trans_id, event_info.event.evnt_id
from (
 select userinfo.transaction.trans_id trans_id, max(userinfo.event.event_time) 
max_event_time
 from (
 select uid, flatten(events) event, flatten(transactions) transaction from 
dfs.`/drill/testdata/resource-manager/nested-large.json`
 ) userinfo
 where userinfo.transaction.trans_time >= userinfo.event.event_time
 group by userinfo.transaction.trans_id
) transaction_info
inner join
(
 select uid, flatten(events) event
 from dfs.`/drill/testdata/resource-manager/nested-large.json`
) event_info
on transaction_info.max_event_time = event_info.event.event_time) d order by 
features[0].type) d1 where d1.uid < -1;
{code}

Below is the error from the logs
{code}
[Error Id: 26983344-dee3-4a33-8508-ad125f01fee6 on qa-node190.qa.lab:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
 ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:295)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:264)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_111]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_111]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
Caused by: java.lang.RuntimeException: java.lang.AssertionError
at 
org.apache.drill.common.DeferredException.addThrowable(DeferredException.java:101)
 ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.fail(FragmentExecutor.java:409)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:250)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
... 4 common frames omitted
Caused by: java.lang.AssertionError: null
at 
org.apache.drill.exec.vector.complex.RepeatedMapVector.load(RepeatedMapVector.java:444)
 ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.cache.VectorAccessibleSerializable.readFromStream(VectorAccessibleSerializable.java:118)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.BatchGroup$SpilledRun.getBatch(BatchGroup.java:222)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.BatchGroup$SpilledRun.getNextIndex(BatchGroup.java:196)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.test.generated.PriorityQueueCopierGen23.setup(PriorityQueueCopierTemplate.java:60)
 ~[na:na]
at 
org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder.createCopier(CopierHolder.java:116)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder.access$200(CopierHolder.java:45)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder$BatchMerger.(CopierHolder.java:210)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder$BatchMerger.(CopierHolder.java:171)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder.startFinalMerge(CopierHolder.java:85)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SN

[jira] [Created] (DRILL-5443) Managed External Sort fails with OOM while spilling to disk

2017-04-24 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5443:


 Summary: Managed External Sort fails with OOM while spilling to 
disk
 Key: DRILL-5443
 URL: https://issues.apache.org/jira/browse/DRILL-5443
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.10.0, 1.11.0
Reporter: Rahul Challapalli
Assignee: Paul Rogers


git.commit.id.abbrev=3e8b01d

The below query fails with an OOM
{code}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.disable_exchanges` = true;
alter session set `planner.width.max_per_query` = 1;
alter session set `planner.memory.max_query_memory_per_node` = 52428800;
select s1.type type, flatten(s1.rms.rptd) rptds from (select d.type type, d.uid 
uid, flatten(d.map.rm) rms from 
dfs.`/drill/testdata/resource-manager/nested-large.json` d order by d.uid) s1 
order by s1.rms.mapid;
{code}

Exception from the logs
{code}
2017-04-24 17:22:59,439 [27016969-ef53-40dc-b582-eea25371fa1c:frag:0:0] INFO  
o.a.d.e.p.i.x.m.ExternalSortBatch - User Error Occurred: External Sort 
encountered an error while spilling to disk (Unable to allocate buffer of size 
524288 (rounded from 307197) due to memory limit. Current allocation: 25886728)
org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: External Sort 
encountered an error while spilling to disk


[Error Id: a64e3790-3a34-42c8-b4ea-4cb1df780e63 ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
 ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.doMergeAndSpill(ExternalSortBatch.java:1445)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:1376)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeRuns(ExternalSortBatch.java:1372)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.consolidateBatches(ExternalSortBatch.java:1299)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeSpilledRuns(ExternalSortBatch.java:1195)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:689)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:559)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at

Re: Submitting physical plans

2017-04-11 Thread rahul challapalli
Thanks Jinfeng and Paul. I will play with the submit_plan script and see how it
goes.

On Wed, Mar 29, 2017 at 11:48 AM, Paul Rogers  wrote:

> We have unit tests that submit physical plans via the Drill client.
>
> My guess is that submit_plan may need a bit of work: some bit-rot may have
> set in when we updated the Drill scripts a while back. If you try it and
> find issues, let me know and I’ll fix them.
>
> - Paul
>
> > On Mar 29, 2017, at 11:00 AM, Jinfeng Ni  wrote:
> >
> > There is a "submit_plan" script, under bin/ directory. You may want to
> > take a look and have a try. You are right that we used to use that
> > tool to submit physical plan directly. I'm not sure if that tool is
> > maintained throughout all the releases.
> >
> > On Wed, Mar 29, 2017 at 10:47 AM, rahul challapalli
> >  wrote:
> >> Drillers,
> >>
> >> I know sometime in the past folks were using/discussing the ability to
> >> submit physical plans to drill. Any pointers on how to do that now?
> >>
> >> - Rahul
>
>


[jira] [Created] (DRILL-5406) Flatten produces a random ClassCastException

2017-03-31 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5406:


 Summary: Flatten produces a random ClassCastException
 Key: DRILL-5406
 URL: https://issues.apache.org/jira/browse/DRILL-5406
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - JSON
Affects Versions: 1.9.0
Reporter: Rahul Challapalli


I hit a random error on Drill 1.9.0. I will try to reproduce the issue on the 
latest master.

The below query did not fail when I ran it in isolation. However, when I ran 
the test suite at [1] (which also contains the below query) with 50 threads 
submitting queries concurrently, I hit the below error.
{code}
select flatten(convert_from(columns[0], 'JSON')) from 
`json_kvgenflatten/convert4783_2.tbl` where 1=2

[Error Id: 1b5f4aef-ae34-4af4-9f2f-8349f8dd97c2 on qa-node183.qa.lab:31010]

  (java.lang.ClassCastException) 
org.apache.drill.common.expression.TypedNullConstant cannot be cast to 
org.apache.drill.exec.expr.ValueVectorReadExpression

org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.setupNewSchema():307
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78

org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.innerNext():120
org.apache.drill.exec.record.AbstractRecordBatch.next():162

org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
org.apache.drill.exec.physical.impl.BaseRootExec.next():104
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
org.apache.drill.exec.physical.impl.BaseRootExec.next():94
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1595
org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745

at 
oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
at 
oadd.org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:144)
at 
oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46)
at 
oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31)
at oadd.org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:65)
at 
oadd.org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:363)
at 
oadd.org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89)
at 
oadd.org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:240)
at 
oadd.org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123)
at 
oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274)
at 
oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:245)
at 
oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
oadd.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)

[jira] [Created] (DRILL-5404) kvgen function only supports Simple maps as input

2017-03-30 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5404:


 Summary: kvgen function only supports Simple maps as input
 Key: DRILL-5404
 URL: https://issues.apache.org/jira/browse/DRILL-5404
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - JSON
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=38ef562

The below query did not fail when I ran it in isolation. However, when I ran 
the test suite at [1] (which also contains the below query) with 50 threads 
submitting queries concurrently, I hit the below error.
{code}
select boolcol, bigintegercol, varcharcol, kvgen(bigintegercol), 
kvgen(boolcol), kvgen(varcharcol) from `json_kvgenflatten/kvgen1.json`
Failed with exception
java.sql.SQLException: SYSTEM ERROR: DrillRuntimeException: kvgen function only 
supports Simple maps as input

Fragment 0:0

[Error Id: 953541c2-cf67-4d29-8d1c-ac3ff3c18f1f on qa-node182.qa.lab:31010]

  (org.apache.drill.common.exceptions.DrillRuntimeException) kvgen function 
only supports Simple maps as input
org.apache.drill.exec.expr.fn.impl.MappifyUtility.mappify():46
org.apache.drill.exec.test.generated.ProjectorGen10361.doEval():45
org.apache.drill.exec.test.generated.ProjectorGen10361.projectRecords():67
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():199
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
org.apache.drill.exec.record.AbstractRecordBatch.next():162

org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
org.apache.drill.exec.record.AbstractRecordBatch.next():162

org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
org.apache.drill.exec.physical.impl.BaseRootExec.next():104
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
org.apache.drill.exec.physical.impl.BaseRootExec.next():94
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1595
org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745

at 
org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
at 
org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:561)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1895)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:61)
at 
oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:473)
at 
org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1100)
at 
oadd.org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477)
at 
org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:180)
at 
oadd.org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109)
at 
oadd.org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:130)
at 
org.apache.drill.jdbc.impl.DrillStatementImpl.executeQuery(DrillStatementImpl.java:112)
at 
org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:177)
at 
org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:101)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
ERROR: DrillRuntimeException: kvgen function only supports Simple maps as input

Fragment 0:0

[Error Id: 953541c2-cf67-4d29-8d1c-ac3ff3c18f1f on qa-node182.qa.lab:31010]

  (org.apache.drill.common.exceptions.DrillRuntimeException) kvgen function 
only

[jira] [Created] (DRILL-5400) Random IndexOutOfBoundsException when running a kvgen query on top of nested json data

2017-03-29 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5400:


 Summary: Random IndexOutOfBoundsException when running a kvgen 
query on top of nested json data
 Key: DRILL-5400
 URL: https://issues.apache.org/jira/browse/DRILL-5400
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - JSON
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=38ef562
The below query did not fail when I ran it in isolation. However, when I ran 
the test suite at [1] (which also contains the below query) with 50 threads 
submitting queries concurrently, I hit the below error.
{code}
select geo.features[0].location.bldgs, kvgen(geo.features[0].location.bldgs) 
from `json_kvgenflatten/nested.json` geo
Failed with exception
java.sql.SQLException: SYSTEM ERROR: IndexOutOfBoundsException: index: 0, 
length: 4 (expected: range(0, 0))

Fragment 0:0

[Error Id: 9bf434d1-2199-498d-b0a5-b487bbc7690b on qa-node182.qa.lab:31010]

  (java.lang.IndexOutOfBoundsException) index: 0, length: 4 (expected: range(0, 
0))
io.netty.buffer.DrillBuf.checkIndexD():123
io.netty.buffer.DrillBuf.chk():147
io.netty.buffer.DrillBuf.getInt():520
org.apache.drill.exec.vector.UInt4Vector$Accessor.get():358
org.apache.drill.exec.vector.VarCharVector$Mutator.setSafe():534
org.apache.drill.exec.vector.NullableVarCharVector$Mutator.fillEmpties():480

org.apache.drill.exec.vector.NullableVarCharVector$Mutator.setValueCount():591
org.apache.drill.exec.vector.complex.MapVector$Mutator.setValueCount():346

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setValueCount():273
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():206
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
org.apache.drill.exec.record.AbstractRecordBatch.next():162

org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
org.apache.drill.exec.physical.impl.BaseRootExec.next():104
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
org.apache.drill.exec.physical.impl.BaseRootExec.next():94
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1595
org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745

at 
org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
at 
org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:561)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1895)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:61)
at 
oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:473)
at 
org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1100)
at 
oadd.org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477)
at 
org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:180)
at 
oadd.org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109)
at 
oadd.org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:130)
at 
org.apache.drill.jdbc.impl.DrillStatementImpl.executeQuery(DrillStatementImpl.java:112)
at 
org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:177)
at 
org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:101)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
ERROR: IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 0))

Fragment 0:0

[Error Id: 9bf434d1-2199-498d-b0a5-b487bbc7690b on qa-node182.qa.lab:31010]

  (java.lang.IndexOutOfBoundsException) index: 0, length: 4 (expected: range(0, 
0))
io.netty.buffer.DrillBuf.checkIndexD():123
io.netty.buffer.DrillBuf.chk

[jira] [Created] (DRILL-5399) Random Error : Flatten does not support inputs of non-list values.

2017-03-29 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5399:


 Summary: Random Error : Flatten does not support inputs of 
non-list values.
 Key: DRILL-5399
 URL: https://issues.apache.org/jira/browse/DRILL-5399
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types, Storage - JSON
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=38ef562

The below query did not fail when I ran it in isolation. However, when I ran 
the test suite at [1] (which also contains the below query) with 50 threads 
submitting queries concurrently, I hit the below error.
{code}
select flatten(sub.fk.`value`) from (select flatten(kvgen(map)) fk from 
`json_kvgenflatten/nested3.json`) sub
Failed with exception
java.sql.SQLException: UNSUPPORTED_OPERATION ERROR: Flatten does not support 
inputs of non-list values.

Fragment 0:0

[Error Id: 90026283-0b95-4bda-948e-54ed57a62edf on qa-node183.qa.lab:31010]


at 
org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
at 
org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:561)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1895)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:61)
at 
oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:473)
at 
org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1100)
at 
oadd.org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477)
at 
org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:180)
at 
oadd.org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109)
at 
oadd.org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:130)
at 
org.apache.drill.jdbc.impl.DrillStatementImpl.executeQuery(DrillStatementImpl.java:112)
at 
org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:177)
at 
org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:101)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: 
UNSUPPORTED_OPERATION ERROR: Flatten does not support inputs of non-list values.

Fragment 0:0

[Error Id: 90026283-0b95-4bda-948e-54ed57a62edf on qa-node183.qa.lab:31010]


at 
oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
at 
oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:343)
at 
oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:88)
at 
oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274)
at 
oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:244)
at 
oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
oadd.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at

[jira] [Created] (DRILL-5398) Memory Allocator randomly throws an IllegalStateException when reading nested json data

2017-03-29 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5398:


 Summary: Memory Allocator randomly throws an IllegalStateException 
when reading nested json data
 Key: DRILL-5398
 URL: https://issues.apache.org/jira/browse/DRILL-5398
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types, Storage - JSON
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=38ef562

The below query did not fail when I ran it in isolation. However, when I ran the 
test suite at [1], which also contains this query, with 50 threads 
submitting queries concurrently, I hit the below error.

{code}
select kvgen(bigintegercol), kvgen(float8col) from 
`json_kvgenflatten/kvgen1.json`
Failed with exception
java.sql.SQLException: SYSTEM ERROR: IllegalStateException: 
Allocator[op:0:0:2:Project] closed with outstanding buffers allocated (6).
Allocator(op:0:0:2:Project) 100/110592/434176/100 
(res/actual/peak/limit)
  child allocators: 0
  ledgers: 6
ledger[2747] allocator: op:0:0:2:Project), isOwning: true, size: 32768, 
references: 1, life: 13372865831733586..0, allocatorManager: [694, life: 
13372864956215117..0] holds 1 buffers. 
DrillBuf[3165], udle: [694 0..32768]
ledger[2756] allocator: op:0:0:2:Project), isOwning: true, size: 4096, 
references: 1, life: 13372865832378236..0, allocatorManager: [702, life: 
13372864958396200..0] holds 1 buffers. 
DrillBuf[3176], udle: [703 0..4096]
ledger[2775] allocator: op:0:0:2:Project), isOwning: true, size: 32768, 
references: 1, life: 13372865833922897..0, allocatorManager: [706, life: 
13372864959861722..0] holds 1 buffers. 
DrillBuf[3196], udle: [707 0..32768]
ledger[2761] allocator: op:0:0:2:Project), isOwning: true, size: 32768, 
references: 1, life: 13372865833009204..0, allocatorManager: [700, life: 
13372864957931156..0] holds 1 buffers. 
DrillBuf[3181], udle: [701 0..32768]
ledger[2769] allocator: op:0:0:2:Project), isOwning: true, size: 4096, 
references: 1, life: 13372865833502205..0, allocatorManager: [708, life: 
13372864960352194..0] holds 1 buffers. 
DrillBuf[3189], udle: [709 0..4096]
ledger[2741] allocator: op:0:0:2:Project), isOwning: true, size: 4096, 
references: 1, life: 13372865831092092..0, allocatorManager: [696, life: 
13372864956764681..0] holds 1 buffers. 
DrillBuf[3160], udle: [697 0..4096]
  reservations: 0


Fragment 0:0

[Error Id: f8d274e2-8119-495c-8a38-017c834f9931 on qa-node183.qa.lab:31010]

  (java.lang.IllegalStateException) Allocator[op:0:0:2:Project] closed with 
outstanding buffers allocated (6).
Allocator(op:0:0:2:Project) 100/110592/434176/100 
(res/actual/peak/limit)
  child allocators: 0
  ledgers: 6
ledger[2747] allocator: op:0:0:2:Project), isOwning: true, size: 32768, 
references: 1, life: 13372865831733586..0, allocatorManager: [694, life: 
13372864956215117..0] holds 1 buffers. 
DrillBuf[3165], udle: [694 0..32768]
ledger[2756] allocator: op:0:0:2:Project), isOwning: true, size: 4096, 
references: 1, life: 13372865832378236..0, allocatorManager: [702, life: 
13372864958396200..0] holds 1 buffers. 
DrillBuf[3176], udle: [703 0..4096]
ledger[2775] allocator: op:0:0:2:Project), isOwning: true, size: 32768, 
references: 1, life: 13372865833922897..0, allocatorManager: [706, life: 
13372864959861722..0] holds 1 buffers. 
DrillBuf[3196], udle: [707 0..32768]
ledger[2761] allocator: op:0:0:2:Project), isOwning: true, size: 32768, 
references: 1, life: 13372865833009204..0, allocatorManager: [700, life: 
13372864957931156..0] holds 1 buffers. 
DrillBuf[3181], udle: [701 0..32768]
ledger[2769] allocator: op:0:0:2:Project), isOwning: true, size: 4096, 
references: 1, life: 13372865833502205..0, allocatorManager: [708, life: 
13372864960352194..0] holds 1 buffers. 
DrillBuf[3189], udle: [709 0..4096]
ledger[2741] allocator: op:0:0:2:Project), isOwning: true, size: 4096, 
references: 1, life: 13372865831092092..0, allocatorManager: [696, life: 
13372864956764681..0] holds 1 buffers. 
DrillBuf[3160], udle: [697 0..4096]
  reservations: 0

org.apache.drill.exec.memory.BaseAllocator.close():486
org.apache.drill.exec.ops.OperatorContextImpl.close():149
org.apache.drill.exec.ops.FragmentContext.suppressingClose():422
org.apache.drill.exec.ops.FragmentContext.close():411
org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():318
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():155
org.apache.drill.exec.work.fragment.FragmentExecutor.run():262
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745

at

[jira] [Created] (DRILL-5397) Random Error : Unable to get holder type for minor type [LATE] and mode [OPTIONAL]

2017-03-29 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5397:


 Summary: Random Error : Unable to get holder type for minor type 
[LATE] and mode [OPTIONAL]
 Key: DRILL-5397
 URL: https://issues.apache.org/jira/browse/DRILL-5397
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types, Storage - JSON
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=38ef562

The below query did not fail when running sequentially. However, when I ran the 
test suite at [1], which contains this query, with 50 threads 
submitting queries concurrently, I hit the below error.

{code}
select kvgen(bldgs[0]) from (select kvgen(geo.features[0].location.bldgs) bldgs 
from `json_kvgenflatten/nested.json` geo)
Failed with exception
java.sql.SQLException: SYSTEM ERROR: UnsupportedOperationException: Unable to 
get holder type for minor type [LATE] and mode [OPTIONAL]

Fragment 0:0

[Error Id: 67223a94-b24b-4bde-a87a-743b093b23a6 on qa-node183.qa.lab:31010]

  (java.lang.UnsupportedOperationException) Unable to get holder type for minor 
type [LATE] and mode [OPTIONAL]
org.apache.drill.exec.expr.TypeHelper.getHolderType():602
org.apache.drill.exec.expr.ClassGenerator.getHolderType():666
org.apache.drill.exec.expr.ClassGenerator.declare():368
org.apache.drill.exec.expr.ClassGenerator.declare():364
org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitUnknown():349

org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitUnknown():1320
org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitUnknown():1026
org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitUnknown():795

org.apache.drill.common.expression.visitors.AbstractExprVisitor.visitNullConstant():162

org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitNullConstant():1003

org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitNullConstant():795
org.apache.drill.common.expression.TypedNullConstant.accept():46

org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitFunctionHolderExpression():193

org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitFunctionHolderExpression():1077

org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression():815

org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression():795
org.apache.drill.common.expression.FunctionHolderExpression.accept():47
org.apache.drill.exec.expr.EvaluationVisitor.addExpr():104
org.apache.drill.exec.expr.ClassGenerator.addExpr():261

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema():458
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
org.apache.drill.exec.record.AbstractRecordBatch.next():162

org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
org.apache.drill.exec.physical.impl.BaseRootExec.next():104
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
org.apache.drill.exec.physical.impl.BaseRootExec.next():94
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1595
org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745

at 
org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
at 
org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:561)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1895)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:61)
at 
oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:473)
at 
org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1100)
at 
oadd.org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477)
at 
org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:180)
at 
oadd.org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109)
at 
oadd.org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:130)
at 
org.apache.drill.jdbc.impl.DrillStatementImpl.executeQuery(DrillStatementImpl.java:112

[jira] [Created] (DRILL-5396) A flatten query on top of 2 files with one record each causes oversize allocation error randomly

2017-03-29 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5396:


 Summary: A flatten query on top of 2 files with one record each 
causes oversize allocation error randomly
 Key: DRILL-5396
 URL: https://issues.apache.org/jira/browse/DRILL-5396
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types, Storage - JSON
Reporter: Rahul Challapalli


As part of verifying DRILL-3562, I came up with the below 2 files

File 1:
{code}
{ "a": { "b": { "c": [] } } }
{code}

File 2:
{code}
{ "a": { "b": { "c": [1] } } }
{code}

Now the below query works on each file individually; however, when I run it on a 
directory containing both files, I randomly hit the below error
{code}
select FLATTEN(t.a.b.c) AS c from 
dfs.`/drill/testdata/json_kvgenflatten/drill3562` t;
Error: SYSTEM ERROR: OversizedAllocationException: Unable to expand the buffer. 
Max allowed buffer size is reached.

Fragment 0:0

[Error Id: e556243b-7ad6-4131-81f7-e8b225c9c8bc on qa-node182.qa.lab:31010] 
(state=,code=0)
{code}

This could be related to the fix for DRILL-3562





Submitting physical plans

2017-03-29 Thread rahul challapalli
Drillers,

I know that sometime in the past folks were using/discussing the ability to
submit physical plans to Drill. Any pointers on how to do that now?

- Rahul


[jira] [Created] (DRILL-5390) Casting as decimal does not make drill use the decimal value vector

2017-03-27 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5390:


 Summary: Casting as decimal does not make drill use the decimal 
value vector
 Key: DRILL-5390
 URL: https://issues.apache.org/jira/browse/DRILL-5390
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.11.0
Reporter: Rahul Challapalli



The below query should be using the decimal value vector. However, it looks like 
it is using the float8 vector. If we feed the output of the below query to a 
CTAS statement, then the created parquet file has a double type instead of a 
decimal type (see the CTAS sketch after the query output below).

{code}
alter session set `planner.enable_decimal_data_type` = true;
+-------+---------------------------------------------+
|  ok   |                   summary                   |
+-------+---------------------------------------------+
| true  | planner.enable_decimal_data_type updated.   |
+-------+---------------------------------------------+
1 row selected (0.39 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> select typeof(col2) from (select 1 as 
col1, cast(2.0 as decimal(9,2)) as col2, cast(3.0 as decimal(9,2)) as col3 from 
cp.`tpch/lineitem.parquet` limit 1) d;
+---------+
| EXPR$0  |
+---------+
| FLOAT8  |
+---------+
{code}
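To illustrate the downstream effect described above, a minimal CTAS sketch (table 
name and workspace hypothetical; assumes the decimal option set above is still 
enabled). Given the FLOAT8 result, the created parquet file ends up with a double 
column rather than a decimal one:
{code}
create table dfs.tmp.`decimal_ctas` as
select cast(2.0 as decimal(9,2)) as col2
from cp.`tpch/lineitem.parquet` limit 1;
{code}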





[jira] [Created] (DRILL-5380) Document the usage of drill's parquet "date auto correction" flag

2017-03-24 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5380:


 Summary: Document the usage of drill's parquet "date auto 
correction" flag
 Key: DRILL-5380
 URL: https://issues.apache.org/jira/browse/DRILL-5380
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation, Storage - Parquet
    Reporter: Rahul Challapalli
Assignee: Bridget Bevens


Drill used a wrong format for storing dates in parquet before the 1.8.0 release, 
and as a result it had compatibility issues with other parquet reader/writer 
tools. DRILL-4203 fixes that issue by providing an auto-correction capability 
in Drill's parquet reader. However, if someone really intends to use dates 
that Drill thinks are wrong, they can disable Drill's auto-correction:

{code}
select l_shipdate from table(cp.`tpch/lineitem.parquet` (type => 'parquet', 
autoCorrectCorruptDates => false));
{code}

This needs to be documented





[jira] [Created] (DRILL-5377) Drill returns weird characters when parquet date auto-correction is turned off

2017-03-23 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5377:


 Summary: Drill returns weird characters when parquet date 
auto-correction is turned off
 Key: DRILL-5377
 URL: https://issues.apache.org/jira/browse/DRILL-5377
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=38ef562

Below is the output I get from the test framework when I disable auto-correction 
for date fields:
{code}
select l_shipdate from table(cp.`tpch/lineitem.parquet` (type => 'parquet', 
autoCorrectCorruptDates => false)) order by l_shipdate limit 10;

^@356-03-19
^@356-03-21
^@356-03-21
^@356-03-23
^@356-03-24
^@356-03-24
^@356-03-26
^@356-03-26
^@356-03-26
^@356-03-26
{code}
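For comparison, a sketch of the same query with auto-correction left enabled (the 
default), which should render the dates normally:
{code}
select l_shipdate from table(cp.`tpch/lineitem.parquet` (type => 'parquet', 
autoCorrectCorruptDates => true)) order by l_shipdate limit 10;
{code}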





[jira] [Resolved] (DRILL-5001) Join only supports implicit casts error even when I have explicit cast

2017-03-22 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli resolved DRILL-5001.
--
Resolution: Not A Bug

Ok... this is not a bug. The underlying parquet data actually contains a 
varchar type where I wrongly assumed it was a date type.
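Given that finding, a hedged sketch of the join condition with an explicit cast on 
the varchar side (assuming the varchar holds dates in 'yyyy-MM-dd' form):
{code}
on a.date_col = cast(date_add(cast(b.date_col as date), 5) as date)
{code}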

> Join only supports implicit casts error even when I have explicit cast
> --
>
> Key: DRILL-5001
> URL: https://issues.apache.org/jira/browse/DRILL-5001
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Rahul Challapalli
> Attachments: error.log, fewtypes_null_large.tgz
>
>
> git.commit.id.abbrev=190d5d4
> The below query fails even though I have an explicit cast on the right hand side 
> of the join condition. The data also contains a metadata cache.
> {code}
> select
>   a.int_col,
>   b.date_col 
> from
>   dfs.`/drill/testdata/parquet_date/metadata_cache/mixed/fewtypes_null_large` a 
>   inner join
> (
>   select
> * 
>   from
>     dfs.`/drill/testdata/parquet_date/metadata_cache/mixed/fewtypes_null_large` 
>   where
> dir0 = '1.2' 
> and date_col > '1996-03-07' 
> )
> b 
> on a.date_col = cast(date_add(b.date_col, 5) as date) 
> where
>   a.int_col = 7 
>   and a.dir0 = '1.9' 
> group by
>   a.int_col,
>   b.date_col;
> Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts 
> between 1. Numeric data
>  2. Varchar, Varbinary data 3. Date, Timestamp data Left type: DATE, Right 
> type: VARCHAR. Add explicit casts to avoid this error
> Fragment 2:0
> [Error Id: a1b26420-af35-4892-9a87-d9b04e4423dc on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> I attached the data and the log file.





Re: Having some trouble locating a file referenced in the Advanced Regression tests

2017-03-16 Thread rahul challapalli
I already did :)

On Thu, Mar 16, 2017 at 5:31 PM, Aman Sinha  wrote:

> I am guessing Rahul Challapalli might have created that data file.  Rahul,
> can you comment ?
>
> -Aman
>
> On 3/16/17, 11:57 AM, "Jason Altekruse"  wrote:
>
> Hey Drillers,
>
> I am working to set up a test environment to run the Advanced
> Regression
> suites. I have been successful getting most of the tests running, but
> I am
> unable to locate the file "widestrings" referenced by the tests in the
> Advanced/data-shapes/wide-columns/5000/10rows/parquet suite. It
> does
> not appear to be in the list of files available on S3 specified in the
> framework pom.xml file. This test suite also does not declare any
> necessary
> data preparation step in its test description JSON file.
>
> I do see that there is a bash file under
> resources/Datasources/data-shapes/wide-strings.sh, but this is
> producing a
> json file, not a parquet file and is not referenced as a data-prep
> prerequisite for any of the tests.
>
> Any help tracking down the file, or a description of the process
> necessary
> to re-create the file would be appreciated.
>
> Thanks,
> Jason
>
>
>


Re: Having some trouble locating a file referenced in the Advanced Regression tests

2017-03-16 Thread rahul challapalli
Hmm... somehow that entry is missing from the pom file. I went back in the
git history and found the link below [1]. I was able to download it. Let me
know if you have any problems.

[1] http://apache-drill.s3.amazonaws.com/files/widestrings-10rows.tgz

- Rahul
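(If the download link ever goes stale, a hedged way to re-create the data is to run 
the wide-strings.sh generator mentioned below and convert its JSON output to parquet 
with a CTAS along these lines; paths and workspace are hypothetical, and the target 
workspace's default store.format is assumed to be parquet:)
{code}
use dfs.tmp;
create table `widestrings` as select * from dfs.`/tmp/widestrings.json`;
{code}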

On Thu, Mar 16, 2017 at 11:57 AM, Jason Altekruse 
wrote:

> Hey Drillers,
>
> I am working to set up a test environment to run the Advanced Regression
> suites. I have been successful getting most of the tests running, but I am
> unable to locate the file "widestrings" referenced by the tests in the
> Advanced/data-shapes/wide-columns/5000/10rows/parquet suite. It does
> not appear to be in the list of files available on S3 specified in the
> framework pom.xml file. This test suite also does not declare any necessary
> data preparation step in its test description JSON file.
>
> I do see that there is a bash file under
> resources/Datasources/data-shapes/wide-strings.sh, but this is producing a
> json file, not a parquet file and is not referenced as a data-prep
> prerequisite for any of the tests.
>
> Any help tracking down the file, or a description of the process necessary
> to re-create the file would be appreciated.
>
> Thanks,
> Jason
>


[jira] [Resolved] (DRILL-4024) CTAS with auto partition gives an NPE when the partition column has null values in it

2017-03-14 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli resolved DRILL-4024.
--
Resolution: Duplicate

> CTAS with auto partition gives an NPE when the partition column has null 
> values in it
> -
>
> Key: DRILL-4024
> URL: https://issues.apache.org/jira/browse/DRILL-4024
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Writer
>    Reporter: Rahul Challapalli
> Attachments: error.log, fewtypes_null.parquet
>
>
> git.commit.id.abbrev=522eb81
> The data set used contains null values in the partition column. This causes 
> the below query to fail with an NPE
> {code}
> create table fewtypesnull_varcharpartition partition by (varchar_col) as 
> select * from dfs.`/drill/testdata/cross-sources/fewtypes_null.parquet`;
> Error: SYSTEM ERROR: NullPointerException
> Fragment 0:0
> [Error Id: 6ef352c0-a12d-477c-bba8-e4a747a6b78e on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> I attached the data set and error log. Let me know if more information is 
> needed





[jira] [Resolved] (DRILL-4681) ChannelClosedException causes all queries which are communicating on that channel to fail

2017-03-14 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli resolved DRILL-4681.
--
Resolution: Not A Bug

> ChannelClosedException causes all queries which are communicating on that 
> channel to fail 
> --
>
> Key: DRILL-4681
> URL: https://issues.apache.org/jira/browse/DRILL-4681
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow, Execution - RPC
>Affects Versions: 1.7.0
>    Reporter: Rahul Challapalli
>
> commit # : 09b262776e965ea17a6a863801f7e1ee3e5b3d5a
> Below is what I am describing:
> 1. One of the fragments cause a channel closed exception (due to an OOM 
> or some other condition)
> 2. Drill fails all other fragments which are running at that time even 
> though the fragments themselves eventually run to completion. At a high 
> concurrency this could lead to a lot of query failures.





[jira] [Resolved] (DRILL-4318) Drill hangs with a malformed Tpcds query

2017-03-14 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli resolved DRILL-4318.
--
Resolution: Cannot Reproduce
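For reference, the query quoted below is malformed because the subquery qualifies 
its columns with the outer alias `i`; a corrected sketch of the predicates (also 
assuming the address join column should be `ca_address_sk`):
{code}
WHERE  i.i_manufact_id IN (SELECT item.i_manufact_id
                           FROM   item
                           WHERE  item.i_category IN ( 'Books' ))
       AND ss.ss_item_sk = i.i_item_sk
       AND ss.ss_sold_date_sk = dd.d_date_sk
       AND dd.d_year = 1999
       AND dd.d_moy = 3
       AND ss.ss_addr_sk = ca.ca_address_sk
       AND ca.ca_gmt_offset = -5
{code}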

> Drill hangs with a malformed Tpcds query
> 
>
> Key: DRILL-4318
> URL: https://issues.apache.org/jira/browse/DRILL-4318
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.5.0
>    Reporter: Rahul Challapalli
> Attachments: error.log
>
>
> git.commit.id.abbrev=3d0b4b0
> The below query never returns anything. The query is malformed because the columns 
> in the subquery are improperly qualified.
> {code}
> SELECT i.i_manufact_id,
> Sum(ss.ss_ext_sales_price) total_sales
>  FROM   store_sales ss,
> date_dim dd,
> customer_address ca,
> item i
>  WHERE  i.i_manufact_id IN (SELECT i.i_manufact_id
>   FROM   item
>   WHERE  i.i_category IN ( 'Books' ))
> AND ss.ss_item_sk = i.i_item_sk
> AND ss.ss_sold_date_sk = dd.d_date_sk
> AND dd.d_year = 1999
> AND dd.d_moy = 3
> AND ss.ss_addr_sk = ca.ca_address.ss_sk
> AND ca.ca_gmt_offset = -5
>  GROUP  BY i.i_manufact_id
> {code}
> Below is the information from the logs
> {code}
> 2016-01-28 00:45:17,997 [29569b62-73b4-f270-98db-7eedf1803895:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29569b62-73b4-f270-98db-7eedf1803895: SELECT i.i_manufact_id,
> Sum(ss.ss_ext_sales_price) total_sales
>  FROM   store_sales ss,
> date_dim dd,
> customer_address ca,
> item i
>  WHERE  i.i_manufact_id IN (SELECT i.i_manufact_id
>   FROM   item
>   WHERE  i.i_category IN ( 'Books' ))
> AND ss.ss_item_sk = i.i_item_sk
> AND ss.ss_sold_date_sk = dd.d_date_sk
> AND dd.d_year = 1999
> AND dd.d_moy = 3
> AND ss.ss_addr_sk = ca.ca_address.ss_sk
> AND ca.ca_gmt_offset = -5
>  GROUP  BY i.i_manufact_id
> 2016-01-28 00:45:18,616 [29569b62-73b4-f270-98db-7eedf1803895:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Took 0 ms to get file statuses
> 2016-01-28 00:45:18,720 [29569b62-73b4-f270-98db-7eedf1803895:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 
> 1 using 1 threads. Time: 101ms total, 101.682269ms avg, 101ms max.
> 2016-01-28 00:45:18,721 [29569b62-73b4-f270-98db-7eedf1803895:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 
> 1 using 1 threads. Earliest start: 11.148000 μs, Latest start: 11.148000 μs, 
> Average start: 11.148000 μs .
> 2016-01-28 00:45:18,721 [29569b62-73b4-f270-98db-7eedf1803895:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Took 104 ms to read file metadata
> 2016-01-28 00:45:18,909 [29569b62-73b4-f270-98db-7eedf1803895:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Took 0 ms to get file statuses
> 2016-01-28 00:45:18,936 [29569b62-73b4-f270-98db-7eedf1803895:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 2 out of 
> 2 using 2 threads. Time: 25ms total, 21.543874ms avg, 24ms max.
> 2016-01-28 00:45:18,937 [29569b62-73b4-f270-98db-7eedf1803895:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 2 out of 
> 2 using 2 threads. Earliest start: 1205.628000 μs, Latest start: 1455.366000 
> μs, Average start: 1330.497000 μs .
> 2016-01-28 00:45:18,937 [29569b62-73b4-f270-98db-7eedf1803895:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Took 27 ms to read file metadata
> 2016-01-28 00:45:18,943 [29569b62-73b4-f270-98db-7eedf1803895:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Took 0 ms to get file statuses
> 2016-01-28 00:45:18,954 [29569b62-73b4-f270-98db-7eedf1803895:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 
> 1 using 1 threads. Time: 10ms total, 10.416335ms avg, 10ms max.
> 2016-01-28 00:45:18,954 [29569b62-73b4-f270-98db-7eedf1803895:foreman] INFO  
> o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 
> 1 using 1 threads. Earliest start: 1.002000 μs, Latest star

Re: [ANNOUNCE] New Committer: Arina Ielchiieva

2017-02-24 Thread rahul challapalli
Congrats Arina!

On Fri, Feb 24, 2017 at 9:42 AM, Julian Hyde  wrote:

> Congratulations, and welcome!
>
> On Fri, Feb 24, 2017 at 9:17 AM, Abhishek Girish 
> wrote:
> > Congratulations Arina!
> >
> > On Fri, Feb 24, 2017 at 9:06 AM, Sudheesh Katkam 
> > wrote:
> >
> >> The Project Management Committee (PMC) for Apache Drill has invited
> Arina
> >> Ielchiieva to become a committer, and we are pleased to announce that
> she
> >> has accepted.
> >>
> >> Arina has a long list of contributions [1] that have touched many
> aspects
> >> of the product. Her work includes features such as dynamic UDF support
> and
> >> temporary tables support.
> >>
> >> Welcome Arina, and thank you for your contributions.
> >>
> >> - Sudheesh, on behalf of the Apache Drill PMC
> >>
> >> [1] https://github.com/apache/drill/commits/master?author=
> arina-ielchiieva
> >>
>


[jira] [Created] (DRILL-5294) Managed External Sort throws an OOM during the merge and spill phase

2017-02-22 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5294:


 Summary: Managed External Sort throws an OOM during the merge and 
spill phase
 Key: DRILL-5294
 URL: https://issues.apache.org/jira/browse/DRILL-5294
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Reporter: Rahul Challapalli


commit # : 38f816a45924654efd085bf7f1da7d97a4a51e38

The below query fails with the managed sort while it succeeds with the old sort:
{code}
select * from (select columns[433] col433, columns[0], 
columns[1],columns[2],columns[3],columns[4],columns[5],columns[6],columns[7],columns[8],columns[9],columns[10],columns[11]
 from dfs.`/drill/testdata/resource-manager/3500cols.tbl` order by 
columns[450],columns[330],columns[230],columns[220],columns[110],columns[90],columns[80],columns[70],columns[40],columns[10],columns[20],columns[30],columns[40],columns[50])
 d where d.col433 = 'sjka skjf';
Error: RESOURCE ERROR: External Sort encountered an error while spilling to disk

Fragment 1:11

[Error Id: 0aa20284-cfcc-450f-89b3-645c280f33a4 on qa-node190.qa.lab:31010] 
(state=,code=0)
{code}

Env : 
{code}
No of Drillbits : 1
DRILL_MAX_DIRECT_MEMORY="32G"
DRILL_MAX_HEAP="4G"
{code}

Attached the logs and profile. Data is too large for a jira





[jira] [Created] (DRILL-5289) Drill should handle OOM due to insufficient heap type of errors more gracefully

2017-02-22 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5289:


 Summary: Drill should handle OOM due to insufficient heap type of 
errors more gracefully
 Key: DRILL-5289
 URL: https://issues.apache.org/jira/browse/DRILL-5289
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow, Execution - RPC
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


[Git Commit ID will be updated soon]

The below query, which uses the managed sort, causes an OOM error due to 
insufficient heap, which is a bug in itself. 
{code}
ALTER SESSION SET `exec.sort.disable_managed` = false;
+-------+-------------------------------------+
|  ok   |               summary               |
+-------+-------------------------------------+
| true  | exec.sort.disable_managed updated.  |
+-------+-------------------------------------+
1 row selected (1.096 seconds)
0: jdbc:drill:zk=10.10.100.183:5181> alter session set 
`planner.memory.max_query_memory_per_node` = 14106127360;
+-------+----------------------------------------------------+
|  ok   |                      summary                       |
+-------+----------------------------------------------------+
| true  | planner.memory.max_query_memory_per_node updated.  |
+-------+----------------------------------------------------+
1 row selected (0.253 seconds)
0: jdbc:drill:zk=10.10.100.183:5181> alter session set 
`planner.width.max_per_node` = 1;
+-------+--------------------------------------+
|  ok   |               summary                |
+-------+--------------------------------------+
| true  | planner.width.max_per_node updated.  |
+-------+--------------------------------------+
1 row selected (0.184 seconds)
0: jdbc:drill:zk=10.10.100.183:5181> select * from (select * from 
dfs.`/drill/testdata/resource-manager/250wide.tbl` order by columns[0])d where 
d.columns[0] = 'ljdfhwuehnoiueyf';
{code}
Once the OOM happens, chaos follows:
{code}
1. Dangling fragments are left behind
2. Query fails but zookeeper thinks its still running
3. Client connection timeouts
4. Profile page shows the same query as both running and failed.
{code}

We should be handling this situation more gracefully, as this could be perceived 
as a drillbit stability issue. I attached the jstack. The logs and data set 
used are too big to upload here. Reach out to me if you need more information.





Re: [ANNOUNCE] New Apache Drill Committer - Rahul Challapalli

2017-02-15 Thread rahul challapalli
Thank you Ramana and Khurram

On Feb 15, 2017 11:19 AM, "Khurram Faraaz"  wrote:

> Congrats Rahul!
>
> 
> From: Parth Chandra 
> Sent: Wednesday, February 15, 2017 9:48:15 AM
> To: dev@drill.apache.org
> Subject: [ANNOUNCE] New Apache Drill Committer - Rahul Challapalli
>
> The Project Management Committee (PMC) for Apache Drill has invited Rahul
> Challapalli to become a committer and we are pleased  to announce that he
> has accepted.
>
> Welcome Rahul and thanks for your great contributions!
>
> Parth
>


[jira] [Created] (DRILL-5268) SYSTEM ERROR: UnsupportedOperationException: Unable to get size for minor type [MAP] and mode [REQUIRED]

2017-02-15 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5268:


 Summary: SYSTEM ERROR: UnsupportedOperationException: Unable to 
get size for minor type [MAP] and mode [REQUIRED]
 Key: DRILL-5268
 URL: https://issues.apache.org/jira/browse/DRILL-5268
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=300e934

With the managed external sort turned on, I get the below error
{code}
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.disable_exchanges` = true;
alter session set `planner.memory.max_query_memory_per_node` = 52428800;
select * from (select d1.type, d1.evnt, d1.transaction from (select d.type 
type, flatten(d.events) evnt, flatten(d.transactions) transaction from 
dfs.`/drill/testdata/resource-manager/10rows/data.json` d) d1 order by 
d1.evnt.event_time, d1.transaction.trans_time) d2 where d2.type='web' and 
d2.evnt.evnt_type = 'cmpgn4';
Error: SYSTEM ERROR: UnsupportedOperationException: Unable to get size for 
minor type [MAP] and mode [REQUIRED]

Fragment 0:0

[Error Id: a9dc1de5-2ff7-44db-bdd2-b166b1f0cea8 on qa-node183.qa.lab:31010] 
(state=,code=0)
{code}

If we do not enable the managed sort, then we end up with DRILL-5234





[jira] [Created] (DRILL-5265) External Sort consumes more memory than allocated

2017-02-14 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5265:


 Summary: External Sort consumes more memory than allocated
 Key: DRILL-5265
 URL: https://issues.apache.org/jira/browse/DRILL-5265
 Project: Apache Drill
  Issue Type: Bug
Reporter: Rahul Challapalli


git.commit.id.abbrev=300e934

Based on the profile for the below query, the external sort has a peak memory 
usage of ~126MB when only ~100MB was allocated 
{code}
alter session set `planner.memory.max_query_memory_per_node` = 104857600;
alter session set `planner.width.max_per_node` = 1;
select * from dfs.`/drill/testdata/md1362` order by c_email_address;
{code}
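For reference, the configured limit of 104,857,600 bytes is exactly 100 MiB, so the 
observed ~126 MB peak exceeds the configured budget by roughly a quarter.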

I attached the profile and the log files





[jira] [Created] (DRILL-5264) Managed External Sort fails with OOM

2017-02-14 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5264:


 Summary: Managed External Sort fails with OOM
 Key: DRILL-5264
 URL: https://issues.apache.org/jira/browse/DRILL-5264
 Project: Apache Drill
  Issue Type: Bug
Reporter: Rahul Challapalli
Assignee: Paul Rogers


git.commit.id.abbrev=300e934

The below query fails with an OOM
{code}
alter session set `planner.disable_exchanges` = true;
alter session set `planner.memory.max_query_memory_per_node` = 104857600;
alter session set `planner.width.max_per_node` = 1;
select * from dfs.`/drill/testdata/md1362` order by c_email_address;
Error: RESOURCE ERROR: One or more nodes ran out of memory while executing the 
query.

Unable to allocate buffer of size 1048576 due to memory limit. Current 
allocation: 103972896
Fragment 0:0

[Error Id: ba3d1ea7-9bf6-498d-a62a-2ca4e742beea on qa-node190.qa.lab:31010] 
(state=,code=0)
{code}
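A quick check of the numbers in the error: the current allocation of 103,972,896 
bytes plus the requested 1,048,576-byte (1 MiB) buffer comes to 105,021,472 bytes, 
which exceeds the 104,857,600-byte (100 MiB) limit set above, hence the OOM.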

Exception from the logs
{code}
2017-02-14 15:24:17,911 [275c7003-0e06-42b8-6874-db28bca06d14:frag:0:0] INFO  
o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred: One or more nodes 
ran out of memory while executing the query. (Unable to allocate buffer of size 
1048576 due to memory limit. Current allocation: 103972896)
org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more 
nodes ran out of memory while executing the query.

Unable to allocate buffer of size 1048576 due to memory limit. Current 
allocation: 103972896

[Error Id: ba3d1ea7-9bf6-498d-a62a-2ca4e742beea ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
 ~[drill-common-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:242)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_111]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_111]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
allocate buffer of size 1048576 due to memory limit. Current allocation: 
103972896
at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:217) 
~[drill-memory-base-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:192) 
~[drill-memory-base-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.vector.VarCharVector.reAlloc(VarCharVector.java:401) 
~[vector-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.vector.VarCharVector.copyFromSafe(VarCharVector.java:278) 
~[vector-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.vector.NullableVarCharVector.copyFromSafe(NullableVarCharVector.java:355)
 ~[vector-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.test.generated.PriorityQueueCopierGen4.doCopy0$(PriorityQueueCopierTemplate.java:290)
 ~[na:na]
at 
org.apache.drill.exec.test.generated.PriorityQueueCopierGen4.doCopy(PriorityQueueCopierTemplate.java:280)
 ~[na:na]
at 
org.apache.drill.exec.test.generated.PriorityQueueCopierGen4.next(PriorityQueueCopierTemplate.java:76)
 ~[na:na]
at 
org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder$BatchMerger.next(CopierHolder.java:232)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeSpilledRuns(ExternalSortBatch.java:1140)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:626)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:506)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at

[jira] [Created] (DRILL-5262) NPE in managed external sort while spilling to disk

2017-02-13 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5262:


 Summary: NPE in managed external sort while spilling to disk
 Key: DRILL-5262
 URL: https://issues.apache.org/jira/browse/DRILL-5262
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Reporter: Rahul Challapalli


git.commit.id.abbrev=300e934

The parquet data set used in the below query contains 1000 files, each holding a 
single row with one integer column, plus 1 large file of ~37 MB. The query 
fails during spilling:
{code}
alter session set `planner.memory.max_query_memory_per_node` = 37127360;
alter session set `planner.width.max_per_node` = 1;
select count(*) from (select * from small_large_parquet order by col1 desc) d; 
Error: RESOURCE ERROR: External Sort encountered an error while spilling to disk

Fragment 2:0

[Error Id: 50859d9e-373c-4a97-b270-09f1aae74e3b on qa-node183.qa.lab:31010] 
(state=,code=0)
{code}

Exception from the logs
{code}
2017-02-13 17:01:06,430 [275da989-3005-1c5f-a40c-2415e6d4e89f:frag:2:0] INFO  
o.a.d.e.p.i.x.m.ExternalSortBatch - User Error Occurred: External Sort 
encountered an error while spilling to disk (null)
org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: External Sort 
encountered an error while spilling to disk


[Error Id: 50859d9e-373c-4a97-b270-09f1aae74e3b ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
 ~[drill-common-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.doMergeAndSpill(ExternalSortBatch.java:1336)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:1266)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.consolidateBatches(ExternalSortBatch.java:1221)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeSpilledRuns(ExternalSortBatch.java:1122)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:626)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:506)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) 
[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:92)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) 
[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:232)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:226)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at java.security.AccessController.doPrivileged(Native Method) 
[na:1.8.0_92]
at javax.security.auth.Subject.doAs(Subject.java:422) [na:1.8.0_92]
at 
org.apache.hadoop.security.UserGroupInformation.doAs

[jira] [Created] (DRILL-5253) External sort fails with OOM error (Fails to allocate sv2)

2017-02-10 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5253:


 Summary: External sort fails with OOM error (Fails to allocate sv2)
 Key: DRILL-5253
 URL: https://issues.apache.org/jira/browse/DRILL-5253
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=2af709f

The data set used in the below query has the same value for every column in 
every row. The query fails with an OOM as it exceeds the allocated memory
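(For context: the "sv2" is the two-byte selection vector that the sort allocates 
per batch to order rows in place, so its size grows with the batch's row count.)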

{code}
 select count(*) from (select * from identical order by col1, col2, col3, col4, 
col5, col6, col7, col8, col9, col10);
Error: RESOURCE ERROR: One or more nodes ran out of memory while executing the 
query.

org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate sv2 
buffer after repeated attempts
Fragment 2:0

[Error Id: aed43fa1-fd8b-4440-9426-0f35d055aabb on qa-node190.qa.lab:31010] 
(state=,code=0)
{code}

Exception from the logs
{code}
org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more 
nodes ran out of memory while executing the query.

org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate sv2 
buffer after repeated attempts

[Error Id: aed43fa1-fd8b-4440-9426-0f35d055aabb ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
 ~[drill-common-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:242)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_111]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_111]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
Caused by: org.apache.drill.exec.exception.OutOfMemoryException: 
org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate sv2 
buffer after repeated attempts
at 
org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:371)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) 
~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:92)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) 
~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:232)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:226)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at java.security.AccessController.doPrivileged(Native Method) 
~[na:1.7.0_111]
at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_111]
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
 ~[hadoop-common-2.7.0-mapr-1607.jar:na]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:226)
 [drill-java-exec-1.10.0

[jira] [Created] (DRILL-5249) Optimizer should remove the sort from the plan when the order by statement does not impact the output of the query

2017-02-09 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5249:


 Summary: Optimizer should remove the sort from the plan when the 
order by statement does not impact the output of the query
 Key: DRILL-5249
 URL: https://issues.apache.org/jira/browse/DRILL-5249
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=2af709f

The below query should be optimized to get rid of the "sort" operation:
{code}
0: jdbc:drill:zk=10.10.100.190:5181> explain plan for select count(*) from 
. . . . . . . . . . . . . . . . . .> (
. . . . . . . . . . . . . . . . . .>   select * from customer_demographics 
order by cd_marital_status
. . . . . . . . . . . . . . . . . .> ) d1;
+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  Project(EXPR$0=[$0])
00-02StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
00-03  UnionExchange
01-01StreamAgg(group=[{}], EXPR$0=[COUNT()])
01-02  Project($f0=[0])
01-03SingleMergeExchange(sort0=[2 ASC])
02-01  SelectionVectorRemover
02-02Sort(sort0=[$2], dir0=[ASC])
02-03  Project(cd_demo_sk=[$0], cd_gender=[$1], 
cd_marital_status=[$2], cd_education_status=[$3], cd_purchase_estimate=[$4], 
cd_credit_rating=[$5], cd_dep_count=[$6], cd_dep_employed_count=[$7], 
cd_dep_college_count=[$8])
02-04HashToRandomExchange(dist0=[[$2]])
03-01  UnorderedMuxExchange
04-01Project(cd_demo_sk=[$0], cd_gender=[$1], 
cd_marital_status=[$2], cd_education_status=[$3], cd_purchase_estimate=[$4], 
cd_credit_rating=[$5], cd_dep_count=[$6], cd_dep_employed_count=[$7], 
cd_dep_college_count=[$8], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($2)])
04-02  Project(cd_demo_sk=[CAST($0):INTEGER], 
cd_gender=[CAST($1):VARCHAR(200) CHARACTER SET "ISO-8859-1" COLLATE 
"ISO-8859-1$en_US$primary"], cd_marital_status=[CAST($2):VARCHAR(200) CHARACTER 
SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"], 
cd_education_status=[CAST($3):VARCHAR(200) CHARACTER SET "ISO-8859-1" COLLATE 
"ISO-8859-1$en_US$primary"], cd_purchase_estimate=[CAST($4):INTEGER], 
cd_credit_rating=[CAST($5):VARCHAR(200) CHARACTER SET "ISO-8859-1" COLLATE 
"ISO-8859-1$en_US$primary"], cd_dep_count=[CAST($6):INTEGER], 
cd_dep_employed_count=[CAST($7):INTEGER], 
cd_dep_college_count=[CAST($8):INTEGER])
04-03Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath 
[path=maprfs:///drill/testdata/tpcds_sf1/parquet/customer_demographics]], 
selectionRoot=maprfs:/drill/testdata/tpcds_sf1/parquet/customer_demographics, 
numFiles=1, usedMetadataFile=false, columns=[`cd_demo_sk`, `cd_gender`, 
`cd_marital_status`, `cd_education_status`, `cd_purchase_estimate`, 
`cd_credit_rating`, `cd_dep_count`, `cd_dep_employed_count`, 
`cd_dep_college_count`]]])
{code}
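Since the outer count(*) does not depend on the order of its input, the plan above 
should collapse to the equivalent of this sketch, with no Sort or SingleMergeExchange:
{code}
select count(*) from customer_demographics;
{code}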





[jira] [Created] (DRILL-5245) Using filter and offset could lead to an assertion error in Calcite

2017-02-08 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5245:


 Summary: Using filter and offset could lead to an assertion error 
in Calcite
 Key: DRILL-5245
 URL: https://issues.apache.org/jira/browse/DRILL-5245
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=2af709f

Based on the filter selectivity, the planner might estimate that the number of 
records coming from upstream is less than the "OFFSET" value and fail with an 
assertion error, even though in reality the selectivity-based estimate 
could be wrong.

Below is one such example where I am hitting this issue
{code}
select * from (
  select * from (
select d.*, concat(d.c_first_name, d.c_last_name) as name from (
  SELECT 
*
  FROM   catalog_sales,
   customer
  WHERE  cs_bill_customer_sk = c_customer_sk
) as d 
order by d.c_email_address nulls first 
  ) as d1 
  where d1.name is not null
) d2
OFFSET 1434510;
{code}

Exception from the logs
{code}
2017-02-08 11:42:39,925 [27648b4f-98e5-22a9-f7d7-eccb587854a6:foreman] ERROR 
o.a.drill.exec.work.foreman.Foreman - SYSTEM ERROR: AssertionError


[Error Id: d026ab7f-9e11-4854-b39c-66a7846b6a3a on qa-node190.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: AssertionError


[Error Id: d026ab7f-9e11-4854-b39c-66a7846b6a3a on qa-node190.qa.lab:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
 ~[drill-common-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:825)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:945) 
[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:281) 
[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_111]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_111]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected 
exception during fragment initialization: null
... 4 common frames omitted
Caused by: java.lang.AssertionError: null
at 
org.apache.calcite.rel.metadata.RelMetadataQuery.isNonNegative(RelMetadataQuery.java:524)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.rel.metadata.RelMetadataQuery.validateResult(RelMetadataQuery.java:543)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:87)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.rel.externalize.RelWriterImpl.explain_(RelWriterImpl.java:103)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.rel.externalize.RelWriterImpl.done(RelWriterImpl.java:160) 
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:283) 
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.plan.RelOptUtil.toString(RelOptUtil.java:1927) 
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.log(DefaultSqlHandler.java:138)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.log(DefaultSqlHandler.java:132)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:411)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:343)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:240)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:290)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:168)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:117)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSH

Re: Column ordering is incorrect when ORDER BY is used with LIMIT clause in query over parquet data

2017-02-07 Thread rahul challapalli
I don't think a "select *..." query is guaranteed to maintain column ordering.

A similar scenario : https://issues.apache.org/jira/browse/DRILL-1259

On Tue, Feb 7, 2017 at 8:36 AM, Khurram Faraaz  wrote:

> Can someone please look at this. Is this a bug ?
>
>
> Thanks,
>
> Khurram
>
> 
> From: Khurram Faraaz 
> Sent: Monday, February 6, 2017 2:52:25 PM
> To: dev@drill.apache.org
> Subject: Column ordering is incorrect when ORDER BY is used with LIMIT
> clause in query over parquet data
>
> All,
>
>
> This looks incorrect.
>
>
> Query with order by + limit clause, the ordering of the columns returned
> in the query results is NOT the same as the column ordering in the parquet
> file.
>
>
> {noformat}
>
> 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM typeall_l ORDER BY col_int
> limit 1;
> +--+--+-++--
> +-++---+
> ++-+
> | col_bln  | col_chr  |   col_dt|  col_flt   | col_int  |
> col_intrvl_day  | col_intrvl_yr  |  col_tim  |   col_tmstmp   |
>  col_vrchr1   | col_vrchr2  |
> +--+--+-++--
> +-++---+
> ++-+
> | false| MI   | 1967-05-01  | 32.901897  | 0| P12DT20775S
>| P196M  | 19:50:17  | 2004-10-15 17:49:36.0  | Felecia Gourd  |
> NLBQMg9 |
> +--+--+-++--
> +-++---+
> ++-+
> 1 row selected (0.279 seconds)
>
> {noformat}
>
> Without the ORDER BY clause the columns are returned in correct order,
> same as the ordering in the parquet file.
>
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM typeall_l limit 1;
> +--+--++
> 
> -+-+
> ---++++-
> +--+
> | col_int  | col_chr  |   col_vrchr1   |
>col_vrchr2
> |   col_dt|  col_tim  |   col_tmstmp   |  col_flt   |
> col_intrvl_yr  | col_intrvl_day  | col_bln  |
> +--+--++
> 
> -+-+
> ---++++-
> +--+
> | 45436| WV   | John Mcginity  | Rhbf6VFLJguvH9ejrWNkY1CDO8Qqum
> TZAGjwa9cHfjBnLmNIWvo9YfcGObxbeXwa1NkemW9ULxsq5293wEA2v5FFCduwt03D7ysI3RlH8b4B0XAPKY
> | 2011-11-04  | 18:02:26  | 1988-09-23 16:58:42.0  | 10.193293  | P314M
>   | P26DT27386S | false|
> +--+--++
> 
> -+-+
> ---++++-
> +--+
> 1 row selected (0.22 seconds)
>
>
> {noformat}
>
>
> Thanks,
>
> Khurram
>


[jira] [Created] (DRILL-5239) Drill text reader reports wrong results when column value starts with '#'

2017-02-02 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5239:


 Summary: Drill text reader reports wrong results when column value 
starts with '#'
 Key: DRILL-5239
 URL: https://issues.apache.org/jira/browse/DRILL-5239
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Text & CSV
Affects Versions: 1.10.0
Reporter: Rahul Challapalli
Priority: Blocker


git.commit.id.abbrev=2af709f

Data Set :
{code}
D|32
8h|234
;#|3489
^$*(|308
#|98
{code}

Wrong Result : (Last row is missing)
{code}
select columns[0] as col1, columns[1] as col2 from 
dfs.`/drill/testdata/wtf2.tbl`;
+---+---+
| col1  | col2  |
+---+---+
| D | 32|
| 8h| 234   |
| ;#| 3489  |
| ^$*(  | 308   |
+---+---+
4 rows selected (0.233 seconds)
{code}

The issue does not, however, happen with a parquet file.
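
A possible explanation, and a workaround sketch: Drill's text reader treats '#' 
as a comment character by default, which would explain why the last row is 
silently dropped. Assuming that is the cause, overriding the comment character 
in the text format configuration should work around it. The format entry below 
is untested, the format name is illustrative, and the replacement character 
must be one that can never start a data line.
{code}
"formats": {
  "tbl": {
    "type": "text",
    "extensions": ["tbl"],
    "delimiter": "|",
    "comment": "\u0001"
  }
}
{code}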





[jira] [Created] (DRILL-5228) Several operators in the attached query profile take more time than expected

2017-01-26 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5228:


 Summary: Several operators in the attached query profile take more 
time than expected
 Key: DRILL-5228
 URL: https://issues.apache.org/jira/browse/DRILL-5228
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


Environment :
{code}
git.commit.id.abbrev=2af709f
DRILL_MAX_DIRECT_MEMORY="32G"
DRILL_MAX_HEAP="4G"
{code}

Data Set : 
{code}
Size : ~18 GB
No Of Columns : 1
Column Width : 256 bytes
{code}

Query ( took ~127 minutes to complete)
{code}
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.disable_exchanges` = true;
alter session set `planner.memory.max_query_memory_per_node` = 14106127360;
select * from (select * from dfs.`/drill/testdata/resource-manager/250wide.tbl` 
order by columns[0])d where d.columns[0] = 'ljdfhwuehnoiueyf';
{code}

*Selection Vector Remover*
{code}
Time Spent based on profile : 7m58s
Problem : Since the external sort spilled to disk in this case, the 
selection vector remover should have been a no-op. There is no clear 
justification for the time spent.
{code}

*Text Sub Scan*
{code}
Time spent based on profile : 13m25s
Problem : I captured the profile screenshot (before-spill.png) once the memory 
allocation for the sort reached its limit. Based on this, the scan took 2m13s 
to read the first 12.48 GB of data before sorting/spilling began. For the 
remaining ~5.5 GB it took ~11 minutes.
{code}

*Projects*
{code}
Timings for the 4 projects based on the profile. While I do not have a 
concrete reason to suspect a problem, these numbers seem high.
Project 1 : 4m54s
Project 2 : 3m07s
Project 3 : 4m10s
Project 4 : 0.003s
{code}

The time reported in the profile for the external sort is wrong; DRILL-5227 
tracks this.





[jira] [Created] (DRILL-5227) Wrong time reported in the query profile for the external sort

2017-01-26 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5227:


 Summary: Wrong time reported in the query profile for the external 
sort
 Key: DRILL-5227
 URL: https://issues.apache.org/jira/browse/DRILL-5227
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators, Web Server
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=2af709f

Data Set :
{code}
Size : ~18 GB
No Of Columns : 1
Column Width : 256 bytes
{code}

The below query took ~127 minutes. However, the profile indicated that the 
External Sort itself took 17h27m. Something is wrong.

{code}
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.disable_exchanges` = true;
alter session set `planner.memory.max_query_memory_per_node` = 14106127360;
select * from (select * from dfs.`/drill/testdata/resource-manager/250wide.tbl` 
order by columns[0])d where d.columns[0] = 'ljdfhwuehnoiueyf'
{code}

I attached the query profile. The data set and the logs are too large to attach 
to a JIRA.





[jira] [Created] (DRILL-5226) External Sort encountered an error while spilling to disk

2017-01-26 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5226:


 Summary: External Sort encountered an error while spilling to disk
 Key: DRILL-5226
 URL: https://issues.apache.org/jira/browse/DRILL-5226
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


Environment : 
{code}
git.commit.id.abbrev=2af709f
DRILL_MAX_DIRECT_MEMORY="32G"
DRILL_MAX_HEAP="4G"
Nodes in Mapr Cluster : 1
Data Size : ~ 0.35 GB
No of Columns : 1
Width of column : 256 chars
{code}

The below query fails before spilling to disk due to wrong estimates of the 
record batch size.
{code}
0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
`planner.width.max_per_node` = 1;
+---+--+
|  ok   |   summary|
+---+--+
| true  | planner.width.max_per_node updated.  |
+---+--+
1 row selected (1.11 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
`planner.memory.max_query_memory_per_node` = 62914560;
+---++
|  ok   |  summary   |
+---++
| true  | planner.memory.max_query_memory_per_node updated.  |
+---++
1 row selected (0.362 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
`planner.disable_exchanges` = true;
+---+-+
|  ok   |   summary   |
+---+-+
| true  | planner.disable_exchanges updated.  |
+---+-+
1 row selected (0.277 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> select * from (select * from 
dfs.`/drill/testdata/resource-manager/250wide-small.tbl` order by columns[0])d 
where d.columns[0] = 'ljdfhwuehnoiueyf';
Error: RESOURCE ERROR: External Sort encountered an error while spilling to disk

Unable to allocate buffer of size 1048576 (rounded from 618889) due to memory 
limit. Current allocation: 62736000
Fragment 0:0

[Error Id: 1bb933c8-7dc6-4cbd-8c8e-0e095baac719 on qa-node190.qa.lab:31010] 
(state=,code=0)
{code}

Exception from the logs
{code}
2017-01-26 15:33:09,307 [277578d5-8bea-27db-0da1-cec0f53a13df:frag:0:0] INFO  
o.a.d.e.p.i.xsort.ExternalSortBatch - User Error Occurred: External Sort 
encountered an error while spilling to disk (Unable to allocate buffer of size 
1048576 (rounded from 618889) due to memory limit. Current allocation: 62736000)
org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: External Sort 
encountered an error while spilling to disk

Unable to allocate buffer of size 1048576 (rounded from 618889) due to memory 
limit. Current allocation: 62736000

[Error Id: 1bb933c8-7dc6-4cbd-8c8e-0e095baac719 ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
 ~[drill-common-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:603)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:411)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(

Re: Storage Plugin for accessing Hive ORC Table from Drill

2017-01-22 Thread rahul challapalli
As Chunhui mentioned, this could very well be a compatibility issue between
Drill and Hive 2.0. Since Drill has never been tested against Hive 2.0, this is
not a total surprise. Can you try the two things below?

1. Make sure you can read the table with Hive.
2. Create a very simple Hive ORC table with a single column (use "stored as
orc" instead of explicitly mentioning the input and output formats in your
DDL), along the lines of the sketch below. Then try reading this simple table
from Drill.
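
For example, a minimal sketch (untested; the table and column names are just
placeholders):

create table orc_smoke_test (c1 string)
stored as orc;

insert into orc_smoke_test values ('hello');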

On Jan 22, 2017 9:55 AM, "Anup Tiwari"  wrote:

> can you point me to any specific line or sentence on that link?
>
> Also please correct me if i am misinterpreting, but as written in 1st
> line "*Drill
> 1.1 and later supports Hive 1.0*", does that mean Drill 1.1 and later
> doesn't support OR partially support Hive 2.x?
>
> Regards,
> *Anup Tiwari*
>
> On Sat, Jan 21, 2017 at 8:48 PM, Zelaine Fong  wrote:
>
> > Have you taken a look at http://drill.apache.org/docs/
> hive-storage-plugin/
> > ?
> >
> > -- Zelaine
> >
> > On 1/20/17, 10:07 PM, "Anup Tiwari"  wrote:
> >
> > @Andries, We are using Hive 2.1.1 with Drill 1.9.0.
> >
> > @Zelaine, Could this be a problem in your Hive metastore?--> As i
> > mentioned
> > earlier, i am able to read hive parquet tables in Drill through hive
> > storage plugin. So can you tell me a bit more like which type of
> > configuration i am missing in metastore?
> >
> > Regards,
> > *Anup Tiwari*
> >
> > On Sat, Jan 21, 2017 at 4:56 AM, Zelaine Fong 
> wrote:
> >
> > > The stack trace shows the following:
> > >
> > > Caused by: org.apache.drill.common.exceptions.
> DrillRuntimeException:
> > > java.io.IOException: Failed to get numRows from HiveTable
> > >
> > > The Drill optimizer is trying to read rowcount information from
> Hive.
> > > Could this be a problem in your Hive metastore?
> > >
> > > Has anyone else seen this before?
> > >
> > > -- Zelaine
> > >
> > > On 1/20/17, 7:35 AM, "Andries Engelbrecht" 
> > wrote:
> > >
> > > What version of Hive are you using?
> > >
> > >
> > > --Andries
> > >
> > > 
> > > From: Anup Tiwari 
> > > Sent: Friday, January 20, 2017 3:00:43 AM
> > > To: u...@drill.apache.org; dev@drill.apache.org
> > > Subject: Re: Storage Plugin for accessing Hive ORC Table from
> > Drill
> > >
> > > Hi,
> > >
> > > Please find below Create Table Statement and subsequent Drill
> > Error :-
> > >
> > > *Table Structure :*
> > >
> > > CREATE TABLE `logindetails_all`(
> > >   `sid` char(40),
> > >   `channel_id` tinyint,
> > >   `c_t` bigint,
> > >   `l_t` bigint)
> > > PARTITIONED BY (
> > >   `login_date` char(10))
> > > CLUSTERED BY (
> > >   channel_id)
> > > INTO 9 BUCKETS
> > > ROW FORMAT SERDE
> > >   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> > > STORED AS INPUTFORMAT
> > >   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> > > OUTPUTFORMAT
> > >   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> > > LOCATION
> > >   'hdfs://hostname1:9000/usr/hive/warehouse/logindetails_all'
> > > TBLPROPERTIES (
> > >   'compactorthreshold.hive.compactor.delta.num.threshold'='6',
> > >   'compactorthreshold.hive.compactor.delta.pct.threshold'
> ='0.5',
> > >   'transactional'='true',
> > >   'transient_lastDdlTime'='1484313383');
> > > ;
> > >
> > > *Drill Error :*
> > >
> > > *Query* : select * from hive.logindetails_all limit 1;
> > >
> > > *Error :*
> > > 2017-01-20 16:21:12,625 [277e145e-c6bc-3372-01d0-
> > 6c5b75b92d73:foreman]
> > > INFO  o.a.drill.exec.work.foreman.Foreman - Query text for
> > query id
> > > 277e145e-c6bc-3372-01d0-6c5b75b92d73: select * from
> > > hive.logindetails_all
> > > limit 1
> > > 2017-01-20 16:21:12,831 [277e145e-c6bc-3372-01d0-
> > 6c5b75b92d73:foreman]
> > > ERROR o.a.drill.exec.work.foreman.Foreman - SYSTEM ERROR:
> > > NumberFormatException: For input string: "004_"
> > >
> > >
> > > [Error Id: 53fa92e1-477e-45d2-b6f7-6eab9ef1da35 on
> > > prod-hadoop-101.bom-prod.aws.games24x7.com:31010]
> > > org.apache.drill.common.exceptions.UserException: SYSTEM
> ERROR:
> > > NumberFormatException: For input string: "004_"
> > >
> > >
> > > [Error Id: 53fa92e1-477e-45d2-b6f7-6eab9ef1da35 on
> > > prod-hadoop-101.bom-prod.aws.games24x7.com:31010]
> > > at
> > > org.apache.drill.common.exceptions.UserException$
> > > Builder.build(UserException.java:543)
> > > ~[drill-common-1.9.0.jar:1.9.0]
> > > at
> > > org.apache.drill.exec.work.foreman.Foreman$Fore

Re: Where can I find enough information to help me fix these issues ?

2017-01-16 Thread rahul challapalli
It helps if you can give an example of a specific class of which you found
multiple copies. On your second question: if memory serves, the "<#if
entry.hiveType == "BOOLEAN">" syntax is a FreeMarker template directive. Drill
generates a good chunk of its source code from FreeMarker templates (under
src/main/codegen), so the multiple copies of the same .java file you see are
most likely the generated copies under target/generated-sources; edit the
template or the hand-written source, not the generated file.

- Rahul

On Mon, Jan 16, 2017 at 4:30 AM, Muhammad Gelbana 
wrote:

> Everyone,
>
> I'm facing 2 issues with Apache Drill:
>
>- DRILL-5197 
>- DRILL-5193 
>
>
> And it's urgent for me to have them fixed so I tried fixing them myself. I
> cloned this repository  and
> successfully built the project using maven (i.e. mvn clean package)
>
> Now I can't decide were or how to start ! If I attempt to open a class I
> found in a thrown exception, I find multiple copies of the same .java file
> !
>
>
>- So how can I decide which one I should edit ?
>- In some classes, I found this syntax "*<#if entry.hiveType ==
>"BOOLEAN">*" what is this syntax and what is it for ?!
>- Is there a document or a set of documents that would answer
>development questions so I can easily start contributing if I can ?
>
>
> *-*
> *Muhammad Gelbana*
> http://www.linkedin.com/in/mgelbana
>


[jira] [Created] (DRILL-5185) Union all not passing type info when the output contains 0 rows

2017-01-09 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5185:


 Summary: Union all not passing type info when the output contains 
0 rows
 Key: DRILL-5185
 URL: https://issues.apache.org/jira/browse/DRILL-5185
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators, Query Planning & 
Optimization
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


Version : 1.10.0
git.commit.id.abbrev=4d4e0c2

The below query fails without an explicit cast
{code}
0: jdbc:drill:zk=10.10.100.190:5181> select t1.l_partkey, t2.o_orderdate from (
. . . . . . . . . . . . . . . . . .> select l_orderkey, l_partkey, l_comment 
from cp.`tpch/lineitem.parquet` where l_quantity is null
. . . . . . . . . . . . . . . . . .> union 
. . . . . . . . . . . . . . . . . .> select l_orderkey, l_partkey, l_comment 
from cp.`tpch/lineitem.parquet` where l_quantity is null
. . . . . . . . . . . . . . . . . .> ) as t1,
. . . . . . . . . . . . . . . . . .> cp.`tpch/orders.parquet` as t2
. . . . . . . . . . . . . . . . . .> where t1.l_comment = t2.o_comment;
Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts 
between 1. Numeric data
 2. Varchar, Varbinary data 3. Date, Timestamp data Left type: VARCHAR, Right 
type: INT. Add explicit casts to avoid this error

Fragment 0:0

[Error Id: e09bb8ee-cb1c-48bc-9dce-42ace2d4b80b on qa-node190.qa.lab:31010] 
(state=,code=0)
{code}
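
The error text itself points at the workaround; an untested sketch that should 
sidestep the INT type reported for the empty union output (the varchar length 
is arbitrary):
{code}
select t1.l_partkey, t2.o_orderdate from (
  select l_orderkey, l_partkey, cast(l_comment as varchar(44)) as l_comment
  from cp.`tpch/lineitem.parquet` where l_quantity is null
  union
  select l_orderkey, l_partkey, cast(l_comment as varchar(44)) as l_comment
  from cp.`tpch/lineitem.parquet` where l_quantity is null
) as t1,
cp.`tpch/orders.parquet` as t2
where t1.l_comment = t2.o_comment;
{code}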






Re: TPC-DS query 72 takes forever, it appears to be hung!

2017-01-09 Thread rahul challapalli
What does the query profile say? Is it stuck in planning? Also, is there any
activity in the logs?

- Rahul

On Mon, Jan 9, 2017 at 11:46 AM, Khurram Faraaz  wrote:

> Hi All,
>
>
> TPC-DS query 72 appears to be in running state for ever (it appears to be
> hung). I am on Drill 1.10.0 on a 4 node CentOS cluster, can someone please
> take a look. This is seen over SF1 data.
>
>
> Query plan for TPC-DS query 72
>
> {noformat}
> 00-00Screen : rowType = RecordType(VARCHAR(200) i_item_desc,
> VARCHAR(200) w_warehouse_name, INTEGER d_week_seq, BIGINT no_promo, BIGINT
> promo, BIGINT total_cnt): rowcount = 100.0, cumulative cost =
> {1.2742944455E8 rows, 1.4918508997879562E9 cpu, 0.0 io, 1.98207277056E11
> network, 1.5168530696E8 memory}, id = 5366578
> 00-01  Project(i_item_desc=[$0], w_warehouse_name=[$1],
> d_week_seq=[$2], no_promo=[$3], promo=[$4], total_cnt=[$5]) : rowType =
> RecordType(VARCHAR(200) i_item_desc, VARCHAR(200) w_warehouse_name, INTEGER
> d_week_seq, BIGINT no_promo, BIGINT promo, BIGINT total_cnt): rowcount =
> 100.0, cumulative cost = {1.2742943455E8 rows, 1.4918508897879562E9 cpu,
> 0.0 io, 1.98207277056E11 network, 1.5168530696E8 memory}, id = 5366577
> 00-02SelectionVectorRemover : rowType = RecordType(VARCHAR(200)
> i_item_desc, VARCHAR(200) w_warehouse_name, INTEGER d_week_seq, BIGINT $f3,
> BIGINT $f4, BIGINT total_cnt): rowcount = 100.0, cumulative cost =
> {1.2742943455E8 rows, 1.4918508897879562E9 cpu, 0.0 io, 1.98207277056E11
> network, 1.5168530696E8 memory}, id = 5366576
> 00-03  Limit(fetch=[100]) : rowType = RecordType(VARCHAR(200)
> i_item_desc, VARCHAR(200) w_warehouse_name, INTEGER d_week_seq, BIGINT $f3,
> BIGINT $f4, BIGINT total_cnt): rowcount = 100.0, cumulative cost =
> {1.2742933455E8 rows, 1.4918507897879562E9 cpu, 0.0 io, 1.98207277056E11
> network, 1.5168530696E8 memory}, id = 5366575
> 00-04SelectionVectorRemover : rowType =
> RecordType(VARCHAR(200) i_item_desc, VARCHAR(200) w_warehouse_name, INTEGER
> d_week_seq, BIGINT $f3, BIGINT $f4, BIGINT total_cnt): rowcount = 29362.5,
> cumulative cost = {1.2742923455E8 rows, 1.4918503897879562E9 cpu, 0.0 io,
> 1.98207277056E11 network, 1.5168530696E8 memory}, id = 5366574
> 00-05  TopN(limit=[100]) : rowType = RecordType(VARCHAR(200)
> i_item_desc, VARCHAR(200) w_warehouse_name, INTEGER d_week_seq, BIGINT $f3,
> BIGINT $f4, BIGINT total_cnt): rowcount = 29362.5, cumulative cost =
> {1.2739987205E8 rows, 1.4918210272879562E9 cpu, 0.0 io, 1.98207277056E11
> network, 1.5168530696E8 memory}, id = 5366573
> 00-06Project(i_item_desc=[$0], w_warehouse_name=[$1],
> d_week_seq=[$2], $f3=[$3], $f4=[$4], total_cnt=[$5]) : rowType =
> RecordType(VARCHAR(200) i_item_desc, VARCHAR(200) w_warehouse_name, INTEGER
> d_week_seq, BIGINT $f3, BIGINT $f4, BIGINT total_cnt): rowcount = 29362.5,
> cumulative cost = {1.2737050955E8 rows, 1.48869974365E9 cpu, 0.0 io,
> 1.98207277056E11 network, 1.5168530696E8 memory}, id = 5366572
> 00-07  HashToRandomExchange(dist0=[[$5]], dist1=[[$0]],
> dist2=[[$1]], dist3=[[$2]]) : rowType = RecordType(VARCHAR(200)
> i_item_desc, VARCHAR(200) w_warehouse_name, INTEGER d_week_seq, BIGINT $f3,
> BIGINT $f4, BIGINT total_cnt, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount =
> 29362.5, cumulative cost = {1.2737050955E8 rows, 1.48869974365E9 cpu, 0.0
> io, 1.98207277056E11 network, 1.5168530696E8 memory}, id = 5366571
> 01-01UnorderedMuxExchange : rowType =
> RecordType(VARCHAR(200) i_item_desc, VARCHAR(200) w_warehouse_name, INTEGER
> d_week_seq, BIGINT $f3, BIGINT $f4, BIGINT total_cnt, ANY
> E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 29362.5, cumulative cost =
> {1.2734114705E8 rows, 1.48840611865E9 cpu, 0.0 io, 1.97365395456E11
> network, 1.5168530696E8 memory}, id = 5366570
> 02-01  Project(i_item_desc=[$0],
> w_warehouse_name=[$1], d_week_seq=[$2], $f3=[$3], $f4=[$4], total_cnt=[$5],
> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($2, hash32AsDouble($1,
> hash32AsDouble($0, hash32AsDouble($5]) : rowType =
> RecordType(VARCHAR(200) i_item_desc, VARCHAR(200) w_warehouse_name, INTEGER
> d_week_seq, BIGINT $f3, BIGINT $f4, BIGINT total_cnt, ANY
> E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 29362.5, cumulative cost =
> {1.2731178455E8 rows, 1.48837675615E9 cpu, 0.0 io, 1.97365395456E11
> network, 1.5168530696E8 memory}, id = 5366569
> 02-02HashAgg(group=[{0, 1, 2}], agg#0=[$SUM0($3)],
> agg#1=[$SUM0($4)], total_cnt=[$SUM0($5)]) : rowType =
> RecordType(VARCHAR(200) i_item_desc, VARCHAR(200) w_warehouse_name, INTEGER
> d_week_seq, BIGINT $f3, BIGINT $f4, BIGINT total_cnt): rowcount = 29362.5,
> cumulative cost = {1.2728242205E8 rows, 1.48825930615E9 cpu, 0.0 io,
> 1.97365395456E11 network, 1.5168530696E8 memory}, id = 5366568
> 02-03  Project(i_item_desc=[$0],
> w_warehouse_name=[$1], d_week_seq=[$2], $f3=[$3], $f4=[$4], total_cnt=[$5])
> : rowType = RecordType(VARCHAR(200

[jira] [Created] (DRILL-5154) OOM error in external sort on top of 400GB data set generated using terasort benchamark

2016-12-22 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5154:


 Summary: OOM error in external sort on top of 400GB data set 
generated using terasort benchamark
 Key: DRILL-5154
 URL: https://issues.apache.org/jira/browse/DRILL-5154
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=cf2b7c7

The below query fails with an OOM in external sort
{code}
No of drillbits : 1
Nodes in Mapr cluster : 2
DRILL_MAX_DIRECT_MEMORY="16G"
DRILL_MAX_HEAP="4G"
select * from (select * from 
dfs.`/drill/testdata/resource-manager/terasort-data/part-m-0.tbl` order by 
columns[0]) d where d.columns[0] = 'null';
Error: RESOURCE ERROR: One or more nodes ran out of memory while executing the 
query.

Unable to allocate buffer of size 8388608 due to memory limit. Current 
allocation: 8441872
Fragment 1:6

[Error Id: 87ede736-b480-4286-b472-7694fdd2f7da on qa-node183.qa.lab:31010] 
(state=,code=0)
{code}

I attached the logs and the query profile





[jira] [Created] (DRILL-5153) RESOURCE ERROR: Waited for 15000ms, but tasks for 'Get block maps' are not complete

2016-12-22 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5153:


 Summary: RESOURCE ERROR: Waited for 15000ms, but tasks for 'Get 
block maps'  are not complete
 Key: DRILL-5153
 URL: https://issues.apache.org/jira/browse/DRILL-5153
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - RPC, Query Planning & Optimization
Reporter: Rahul Challapalli


git.commit.id.abbrev=cf2b7c7

The below query consistently fails on my 2-node cluster. I used the data set 
from the terasort benchmark.
{code}
select * from dfs.`/drill/testdata/resource-manager/terasort-data` limit 1;
Error: RESOURCE ERROR: Waited for 15000ms, but tasks for 'Get block maps' are 
not complete. Total runnable size 2, parallelism 2.


[Error Id: 580e6c04-7096-4c09-9c7a-63e70c71d574 on qa-node182.qa.lab:31010] 
(state=,code=0)
{code}







[jira] [Created] (DRILL-5149) Planner Optimization : Filter should get pushed into the sub-query

2016-12-21 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5149:


 Summary: Planner Optimization : Filter should get pushed into the 
sub-query
 Key: DRILL-5149
 URL: https://issues.apache.org/jira/browse/DRILL-5149
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=cf2b7c7

The below plan can be optimized to push the filter into the subquery and also 
to eliminate redundant projects
{code}
explain plan for select * from (select * from 
dfs.`/drill/testdata/resource-manager/5kwidecolumns_500k.tbl` order by 
columns[0]) d where d.columns[0] = '4041054511';

00-00Screen : rowType = RecordType(ANY *): rowcount = 1.436392845E7, 
cumulative cost = {8.776360282950001E8 rows, 1.4059092422298168E10 cpu, 0.0 io, 
1.96115503104E12 network, 1.532152368E9 memory}, id = 11452
00-01  Project(*=[$0]) : rowType = RecordType(ANY *): rowcount = 
1.436392845E7, cumulative cost = {8.7619963545E8 rows, 1.4057656029453169E10 
cpu, 0.0 io, 1.96115503104E12 network, 1.532152368E9 memory}, id = 11451
00-02SelectionVectorRemover : rowType = RecordType(ANY T18¦¦*): 
rowcount = 1.436392845E7, cumulative cost = {8.7619963545E8 rows, 
1.4057656029453169E10 cpu, 0.0 io, 1.96115503104E12 network, 1.532152368E9 
memory}, id = 11450
00-03  Filter(condition=[=(ITEM(ITEM($0, 'columns'), 0), 
'4041054511')]) : rowType = RecordType(ANY T18¦¦*): rowcount = 1.436392845E7, 
cumulative cost = {8.61835707E8 rows, 1.4043292101003168E10 cpu, 0.0 io, 
1.96115503104E12 network, 1.532152368E9 memory}, id = 11449
00-04Project(T18¦¦*=[$0]) : rowType = RecordType(ANY T18¦¦*): 
rowcount = 9.5759523E7, cumulative cost = {7.66076184E8 rows, 
1.3602798295203169E10 cpu, 0.0 io, 1.96115503104E12 network, 1.532152368E9 
memory}, id = 11448
00-05  SingleMergeExchange(sort0=[1 ASC]) : rowType = 
RecordType(ANY T18¦¦*, ANY EXPR$1): rowcount = 9.5759523E7, cumulative cost = 
{7.66076184E8 rows, 1.3602798295203169E10 cpu, 0.0 io, 1.96115503104E12 
network, 1.532152368E9 memory}, id = 11447
01-01SelectionVectorRemover : rowType = RecordType(ANY T18¦¦*, 
ANY EXPR$1): rowcount = 9.5759523E7, cumulative cost = {6.70316661E8 rows, 
1.2836722111203169E10 cpu, 0.0 io, 1.176693018624E12 network, 1.532152368E9 
memory}, id = 11446
01-02  Sort(sort0=[$1], dir0=[ASC]) : rowType = RecordType(ANY 
T18¦¦*, ANY EXPR$1): rowcount = 9.5759523E7, cumulative cost = {5.74557138E8 
rows, 1.2740962588203169E10 cpu, 0.0 io, 1.176693018624E12 network, 
1.532152368E9 memory}, id = 11445
01-03Project(T18¦¦*=[$0], EXPR$1=[$1]) : rowType = 
RecordType(ANY T18¦¦*, ANY EXPR$1): rowcount = 9.5759523E7, cumulative cost = 
{4.78797615E8 rows, 2.585507121E9 cpu, 0.0 io, 1.176693018624E12 network, 0.0 
memory}, id = 11444
01-04  HashToRandomExchange(dist0=[[$1]]) : rowType = 
RecordType(ANY T18¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 
9.5759523E7, cumulative cost = {4.78797615E8 rows, 2.585507121E9 cpu, 0.0 io, 
1.176693018624E12 network, 0.0 memory}, id = 11443
02-01UnorderedMuxExchange : rowType = RecordType(ANY 
T18¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 9.5759523E7, 
cumulative cost = {3.83038092E8 rows, 1.053354753E9 cpu, 0.0 io, 0.0 network, 
0.0 memory}, id = 11442
03-01  Project(T18¦¦*=[$0], EXPR$1=[$1], 
E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1)]) : rowType = RecordType(ANY 
T18¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 9.5759523E7, 
cumulative cost = {2.87278569E8 rows, 9.5759523E8 cpu, 0.0 io, 0.0 network, 0.0 
memory}, id = 11441
03-02Project(T18¦¦*=[$0], EXPR$1=[ITEM($1, 0)]) : 
rowType = RecordType(ANY T18¦¦*, ANY EXPR$1): rowcount = 9.5759523E7, 
cumulative cost = {1.91519046E8 rows, 5.74557138E8 cpu, 0.0 io, 0.0 network, 
0.0 memory}, id = 11440
03-03  Project(T18¦¦*=[$0], columns=[$1]) : rowType 
= RecordType(ANY T18¦¦*, ANY columns): rowcount = 9.5759523E7, cumulative cost 
= {9.5759523E7 rows, 1.91519046E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 
11439
03-04Scan(groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/drill/testdata/resource-manager/5kwidecolumns_500k.tbl, 
numFiles=1, columns=[`*`], 
files=[maprfs:///drill/testdata/resource-manager/5kwidecolumns_500k.tbl]]]) : 
rowType = (DrillRecordRow[*, columns]): rowcount = 9.5759523E7, cumulative cost 
= {9.5759523E7 rows, 1.91519046E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 
11438
{code}
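
For comparison, the manually rewritten form below is what the optimizer should 
be able to reach on its own (an untested sketch): the equality filter is 
applied before the sort, which shrinks both the sort input and the exchanges.
{code}
select * from (
  select * from dfs.`/drill/testdata/resource-manager/5kwidecolumns_500k.tbl`
  where columns[0] = '4041054511'
  order by columns[0]
) d;
{code}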





[jira] [Created] (DRILL-5148) Replace hash-distribution with a simple round-robin distribution for a simple order by query

2016-12-21 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5148:


 Summary: Replace hash-distribution with a simple round-robin 
distribution for a simple order by query
 Key: DRILL-5148
 URL: https://issues.apache.org/jira/browse/DRILL-5148
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators, Query Planning & 
Optimization
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=cf2b7c7

The below plan indicates that we use hash-distribution to avoid data skew. 
However, in this case a simple round-robin distribution would be sufficient: 
the single merge exchange on top restores global order regardless of how rows 
were partitioned for the sort.

{code}
explain plan for select * from 
dfs.`/drill/testdata/resource-manager/5kwidecolumns_500k.tbl` order by 
columns[0];
+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  Project(*=[$0])
00-02Project(T2¦¦*=[$0])
00-03  SingleMergeExchange(sort0=[1 ASC])
01-01SelectionVectorRemover
01-02  Sort(sort0=[$1], dir0=[ASC])
01-03Project(T2¦¦*=[$0], EXPR$1=[$1])
01-04  HashToRandomExchange(dist0=[[$1]])
02-01UnorderedMuxExchange
03-01  Project(T2¦¦*=[$0], EXPR$1=[$1], 
E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1)])
03-02Project(T2¦¦*=[$0], EXPR$1=[ITEM($1, 0)])
03-03  Project(T2¦¦*=[$0], columns=[$1])
03-04Scan(groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/drill/testdata/resource-manager/5kwidecolumns_500k.tbl, 
numFiles=1, columns=[`*`], 
files=[maprfs:///drill/testdata/resource-manager/5kwidecolumns_500k.tbl]]])
{code}





[jira] [Created] (DRILL-5146) Unnecessary spilling to disk by sort when we only have 5000 rows with one column

2016-12-21 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5146:


 Summary: Unnecessary spilling to disk by sort when we only have 
5000 rows with one column
 Key: DRILL-5146
 URL: https://issues.apache.org/jira/browse/DRILL-5146
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Reporter: Rahul Challapalli


git.commit.id.abbrev=cf2b7c7

The below query spills to disk for the sort. The dataset contains 5000 files 
and each file contains a single record. 
{code}
select * from dfs.`/drill/testdata/resource-manager/5000files/text` order by 
columns[1];
{code}

Environment :
{code}
DRILL_MAX_DIRECT_MEMORY="16G"
DRILL_MAX_HEAP="4G"
{code}

I attached the dataset, logs and the profile.
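
For what it's worth, 5000 single-column rows should sort comfortably in memory. 
One way to confirm the memory-budgeting suspicion (an untested sketch; the 
value is only illustrative) is to raise the sort's budget and check whether the 
spill disappears:
{code}
alter session set `planner.memory.max_query_memory_per_node` = 4294967296;
select * from dfs.`/drill/testdata/resource-manager/5000files/text` order by 
columns[1];
{code}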





[jira] [Created] (DRILL-5138) TopN operator on top of ~110 GB data set is very slow

2016-12-19 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5138:


 Summary: TopN operator on top of ~110 GB data set is very slow
 Key: DRILL-5138
 URL: https://issues.apache.org/jira/browse/DRILL-5138
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Reporter: Rahul Challapalli


git.commit.id.abbrev=cf2b7c7

No of cores : 23
No of disks : 5
DRILL_MAX_DIRECT_MEMORY="24G"
DRILL_MAX_HEAP="12G"

The below query ran for more than 4 hours and did not complete. The table is 
~110 GB.
{code}
select * from catalog_sales order by cs_quantity, cs_wholesale_cost limit 1;
{code}

Physical Plan :
{code}
00-00Screen : rowType = RecordType(ANY *): rowcount = 1.0, cumulative cost 
= {1.00798629141E10 rows, 4.17594320691E10 cpu, 0.0 io, 4.1287118487552E13 
network, 0.0 memory}, id = 352
00-01  Project(*=[$0]) : rowType = RecordType(ANY *): rowcount = 1.0, 
cumulative cost = {1.0079862914E10 rows, 4.1759432069E10 cpu, 0.0 io, 
4.1287118487552E13 network, 0.0 memory}, id = 351
00-02Project(T0¦¦*=[$0]) : rowType = RecordType(ANY T0¦¦*): rowcount = 
1.0, cumulative cost = {1.0079862914E10 rows, 4.1759432069E10 cpu, 0.0 io, 
4.1287118487552E13 network, 0.0 memory}, id = 350
00-03  SelectionVectorRemover : rowType = RecordType(ANY T0¦¦*, ANY 
cs_quantity, ANY cs_wholesale_cost): rowcount = 1.0, cumulative cost = 
{1.0079862914E10 rows, 4.1759432069E10 cpu, 0.0 io, 4.1287118487552E13 network, 
0.0 memory}, id = 349
00-04Limit(fetch=[1]) : rowType = RecordType(ANY T0¦¦*, ANY 
cs_quantity, ANY cs_wholesale_cost): rowcount = 1.0, cumulative cost = 
{1.0079862913E10 rows, 4.1759432068E10 cpu, 0.0 io, 4.1287118487552E13 network, 
0.0 memory}, id = 348
00-05  SingleMergeExchange(sort0=[1 ASC], sort1=[2 ASC]) : rowType 
= RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost): rowcount = 
1.439980416E9, cumulative cost = {1.0079862912E10 rows, 4.1759432064E10 cpu, 
0.0 io, 4.1287118487552E13 network, 0.0 memory}, id = 347
01-01SelectionVectorRemover : rowType = RecordType(ANY T0¦¦*, 
ANY cs_quantity, ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative 
cost = {8.639882496E9 rows, 3.0239588736E10 cpu, 0.0 io, 2.3592639135744E13 
network, 0.0 memory}, id = 346
01-02  TopN(limit=[1]) : rowType = RecordType(ANY T0¦¦*, ANY 
cs_quantity, ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative cost 
= {7.19990208E9 rows, 2.879960832E10 cpu, 0.0 io, 2.3592639135744E13 network, 
0.0 memory}, id = 345
01-03Project(T0¦¦*=[$0], cs_quantity=[$1], 
cs_wholesale_cost=[$2]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY 
cs_wholesale_cost): rowcount = 1.439980416E9, cumulative cost = {5.759921664E9 
rows, 2.879960832E10 cpu, 0.0 io, 2.3592639135744E13 network, 0.0 memory}, id = 
344
01-04  HashToRandomExchange(dist0=[[$1]], dist1=[[$2]]) : 
rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost, ANY 
E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.439980416E9, cumulative cost = 
{5.759921664E9 rows, 2.879960832E10 cpu, 0.0 io, 2.3592639135744E13 network, 
0.0 memory}, id = 343
02-01UnorderedMuxExchange : rowType = RecordType(ANY 
T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): 
rowcount = 1.439980416E9, cumulative cost = {4.319941248E9 rows, 
1.1519843328E10 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 342
03-01  Project(T0¦¦*=[$0], cs_quantity=[$1], 
cs_wholesale_cost=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($2, 
hash32AsDouble($1))]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY 
cs_wholesale_cost, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.439980416E9, 
cumulative cost = {2.879960832E9 rows, 1.0079862912E10 cpu, 0.0 io, 0.0 
network, 0.0 memory}, id = 341
03-02Project(T0¦¦*=[$0], cs_quantity=[$1], 
cs_wholesale_cost=[$2]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY 
cs_wholesale_cost): rowcount = 1.439980416E9, cumulative cost = {1.439980416E9 
rows, 4.319941248E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 340
03-03  Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath 
[path=maprfs:///drill/testdata/tpcds/parquet/sf1000/catalog_sales]], 
selectionRoot=maprfs:/drill/testdata/tpcds/parquet/sf1000/catalog_sales, 
numFiles=1, usedMetadataFile=false, columns=[`*`]]]) : rowType = 
(DrillRecordRow[*, cs_quantity, cs_wholesale_cost]): rowcount = 1.439980416E9, 
cumulative cost = {1.439980416E9 rows, 4.319941248E9 cpu, 0.0 io, 0.0 network, 
0.0 memory}, id = 339
{code}





[jira] [Created] (DRILL-5131) Parquet Writer fails with heap space not available error on TPCDS 1TB data set

2016-12-14 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5131:


 Summary: Parquet Writer fails with heap space not available error 
on TPCDS 1TB data set
 Key: DRILL-5131
 URL: https://issues.apache.org/jira/browse/DRILL-5131
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.9.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=cf2b7c7

The below query fails with an "Out of Heap Space" error and brings down the 
drillbit.

{code}
create table store_sales as select
case when (columns[0]='') then cast(null as integer) else cast(columns[0] as 
integer) end as ss_sold_date_sk,
case when (columns[1]='') then cast(null as integer) else cast(columns[1] as 
integer) end as ss_sold_time_sk,
case when (columns[2]='') then cast(null as integer) else cast(columns[2] as 
integer) end as ss_item_sk,
case when (columns[3]='') then cast(null as integer) else cast(columns[3] as 
integer) end as ss_customer_sk,
case when (columns[4]='') then cast(null as integer) else cast(columns[4] as 
integer) end as ss_cdemo_sk,
case when (columns[5]='') then cast(null as integer) else cast(columns[5] as 
integer) end as ss_hdemo_sk,
case when (columns[6]='') then cast(null as integer) else cast(columns[6] as 
integer) end as ss_addr_sk,
case when (columns[7]='') then cast(null as integer) else cast(columns[7] as 
integer) end as ss_store_sk,
case when (columns[8]='') then cast(null as integer) else cast(columns[8] as 
integer) end as ss_promo_sk,
case when (columns[9]='') then cast(null as integer) else cast(columns[9] as 
integer) end as ss_ticket_number,
case when (columns[10]='') then cast(null as integer) else cast(columns[10] as 
integer) end as ss_quantity,
case when (columns[11]='') then cast(null as decimal(7,2)) else 
cast(columns[11] as decimal(7,2)) end as ss_wholesale_cost,
case when (columns[12]='') then cast(null as decimal(7,2)) else 
cast(columns[12] as decimal(7,2)) end as ss_list_price,
case when (columns[13]='') then cast(null as decimal(7,2)) else 
cast(columns[13] as decimal(7,2)) end as ss_sales_price,
case when (columns[14]='') then cast(null as decimal(7,2)) else 
cast(columns[14] as decimal(7,2)) end as ss_ext_discount_amt,
case when (columns[15]='') then cast(null as decimal(7,2)) else 
cast(columns[15] as decimal(7,2)) end as ss_ext_sales_price,
case when (columns[16]='') then cast(null as decimal(7,2)) else 
cast(columns[16] as decimal(7,2)) end as ss_ext_wholesale_cost,
case when (columns[17]='') then cast(null as decimal(7,2)) else 
cast(columns[17] as decimal(7,2)) end as ss_ext_list_price,
case when (columns[18]='') then cast(null as decimal(7,2)) else 
cast(columns[18] as decimal(7,2)) end as ss_ext_tax,
case when (columns[19]='') then cast(null as decimal(7,2)) else 
cast(columns[19] as decimal(7,2)) end as ss_coupon_amt,
case when (columns[20]='') then cast(null as decimal(7,2)) else 
cast(columns[20] as decimal(7,2)) end as ss_net_paid,
case when (columns[21]='') then cast(null as decimal(7,2)) else 
cast(columns[21] as decimal(7,2)) end as ss_net_paid_inc_tax,
case when (columns[22]='') then cast(null as decimal(7,2)) else 
cast(columns[22] as decimal(7,2)) end as ss_net_profit
from dfs.`/drill/testdata/tpcds/text/sf1000/store_sales.dat`;
{code}
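
A possible mitigation while this is investigated (hedged: it assumes the heap 
pressure comes from the row-group buffers the Parquet writer accumulates, which 
the stack trace below suggests) is to shrink the Parquet block size before 
running the CTAS, e.g.:
{code}
alter session set `store.parquet.block-size` = 134217728;
{code}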

Exception from the logs
{code}
2016-12-14 14:23:49,303 [27ae4152-0fd4-aa0f-56db-a21e2f54d6c2:frag:1:14] ERROR 
o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, exiting. 
Information message: Unable to handle out of memory condition in 
FragmentExecutor.
java.lang.OutOfMemoryError: Java heap space
at 
org.apache.parquet.bytes.CapacityByteArrayOutputStream.writeToOutput(CapacityByteArrayOutputStream.java:223)
 ~[parquet-encoding-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at 
org.apache.parquet.bytes.CapacityByteArrayOutputStream.writeTo(CapacityByteArrayOutputStream.java:239)
 ~[parquet-encoding-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at 
org.apache.parquet.bytes.BytesInput$CapacityBAOSBytesInput.writeAllTo(BytesInput.java:355)
 ~[parquet-encoding-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at 
org.apache.parquet.bytes.BytesInput$SequenceBytesIn.writeAllTo(BytesInput.java:266)
 ~[parquet-encoding-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at org.apache.parquet.bytes.BytesInput.toByteArray(BytesInput.java:174) 
~[parquet-encoding-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at 
org.apache.parquet.bytes.BytesInput.toByteBuffer(BytesInput.java:185) 
~[parquet-encoding-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at 
org.apache.parquet.hadoop.DirectCodecFactory$SnappyCompressor.compress(DirectCodecFactory.java:291)
 ~[parq

[jira] [Created] (DRILL-5127) Revert the fix for DRILL-4831

2016-12-12 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5127:


 Summary: Revert the fix for DRILL-4831
 Key: DRILL-5127
 URL: https://issues.apache.org/jira/browse/DRILL-5127
 Project: Apache Drill
  Issue Type: Bug
  Components: Metadata
Affects Versions: 1.10
Reporter: Rahul Challapalli


Git Commit # : 3f3811818ecc3bbf6f307a408c30f0406fadc703

DRILL-4831 introduced a major regression, DRILL-5082. I tested the proposed fix 
for DRILL-5082 and it introduced a bunch of new issues. Since there is no fix 
in sight before the next release (Drill 1.11), I suggest we back out the 
original fix made for DRILL-4831.





[jira] [Created] (DRILL-5115) Metadata Cache Pruning randomly returns wrong results at higher concurrencies

2016-12-08 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5115:


 Summary: Metadata Cache Pruning randomly returns wrong results at 
higher concurrencies
 Key: DRILL-5115
 URL: https://issues.apache.org/jira/browse/DRILL-5115
 Project: Apache Drill
  Issue Type: Bug
  Components: Metadata, Query Planning & Optimization
Affects Versions: 1.8.0, 1.9.0, 1.10
Reporter: Rahul Challapalli


git.commit.id.abbrev=4312d65

When multiple queries are updating the metadata cache simultaneously, the below 
query randomly returns wrong results. 

A single run includes executing a suite of 90 tests at a concurrency of 50. I 
encountered a wrong data scenario in my 10th run.
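
For anyone trying to reproduce this, the cache rebuild that the concurrent 
queries race on can also be triggered explicitly (an untested repro sketch: 
fire several of these concurrently with the query below):
{code}
refresh table metadata dfs.`/drill/testdata/metadata_caching_pp/l_3level`;
{code}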
 
Query :
{code}
select l_orderkey from l_3level where dir0=1 and ((dir1='one' and dir2 IN 
('2015-7-12', '2015-7-13')) or (dir1='two' and dir2='2015-8-12'))
{code}

Wrong Result Plan (based on the profile) : 
{code}
00-00Screen : rowType = RecordType(ANY l_orderkey): rowcount = 310.0, 
cumulative cost = {341.0 rows, 341.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id 
= 205721
00-01  Project(l_orderkey=[$0]) : rowType = RecordType(ANY l_orderkey): 
rowcount = 310.0, cumulative cost = {310.0 rows, 310.0 cpu, 0.0 io, 0.0 
network, 0.0 memory}, id = 205720
00-02Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=maprfs:/drill/testdata/metadata_caching_pp/l_3level/1/one/2015-7-13/20.parquet],
 ReadEntryWithPath 
[path=maprfs:/drill/testdata/metadata_caching_pp/l_3level/1/two/2015-8-12/30.parquet],
 ReadEntryWithPath 
[path=maprfs:/drill/testdata/metadata_caching_pp/l_3level/1/one/2015-7-12/10.parquet]],
 selectionRoot=maprfs:/drill/testdata/metadata_caching_pp/l_3level, numFiles=3, 
usedMetadataFile=true, 
cacheFileRoot=/drill/testdata/metadata_caching_pp/l_3level, 
columns=[`l_orderkey`]]]) : rowType = RecordType(ANY l_orderkey): rowcount = 
310.0, cumulative cost = {310.0 rows, 310.0 cpu, 0.0 io, 0.0 network, 0.0 
memory}, id = 205719
{code}

Correct Result Plan (based on the profile):
{code}
00-00Screen : rowType = RecordType(ANY l_orderkey): rowcount = 2.25, 
cumulative cost = {122.475 rows, 527.475 cpu, 0.0 io, 0.0 network, 0.0 memory}, 
id = 226849
00-01  Project(l_orderkey=[$3]) : rowType = RecordType(ANY l_orderkey): 
rowcount = 2.25, cumulative cost = {122.25 rows, 527.25 cpu, 0.0 io, 0.0 
network, 0.0 memory}, id = 226848
00-02SelectionVectorRemover : rowType = RecordType(ANY dir0, ANY dir1, 
ANY dir2, ANY l_orderkey): rowcount = 2.25, cumulative cost = {122.25 rows, 
527.25 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 226847
00-03  Filter(condition=[AND(=($0, 1), OR(AND(=($1, 'one'), OR(=($2, 
'2015-7-12'), =($2, '2015-7-13'))), AND(=($1, 'two'), =($2, '2015-8-12']) : 
rowType = RecordType(ANY dir0, ANY dir1, ANY dir2, ANY l_orderkey): rowcount = 
2.25, cumulative cost = {120.0 rows, 525.0 cpu, 0.0 io, 0.0 network, 0.0 
memory}, id = 226846
00-04Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=/drill/testdata/metadata_caching_pp/l_3level/1/one/2015-7-13/20.parquet], 
ReadEntryWithPath 
[path=/drill/testdata/metadata_caching_pp/l_3level/1/two/2015-8-12/30.parquet], 
ReadEntryWithPath 
[path=/drill/testdata/metadata_caching_pp/l_3level/1/one/2015-7-12/10.parquet]],
 selectionRoot=/drill/testdata/metadata_caching_pp/l_3level, numFiles=3, 
usedMetadataFile=true, 
cacheFileRoot=/drill/testdata/metadata_caching_pp/l_3level/1, columns=[`dir0`, 
`dir1`, `dir2`, `l_orderkey`]]]) : rowType = RecordType(ANY dir0, ANY dir1, ANY 
dir2, ANY l_orderkey): rowcount = 60.0, cumulative cost = {60.0 rows, 240.0 
cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 226845
{code}

I attached the data set, log files and the query profiles. Let me know if you 
need anything





[jira] [Created] (DRILL-5082) Metadata Cache is being refreshed every single time

2016-11-28 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5082:


 Summary: Metadata Cache is being refreshed every single time
 Key: DRILL-5082
 URL: https://issues.apache.org/jira/browse/DRILL-5082
 Project: Apache Drill
  Issue Type: Bug
  Components: Metadata
Reporter: Rahul Challapalli
Priority: Critical


Git Commit  : 04fb0be191ef09409c00ca7173cb903dfbe2abb0

After the DRILL-4831 fix we are refreshing the metadata cache for every single 
query. This could be because renaming a file updates the directory's 
timestamp but not the renamed file's.






[jira] [Created] (DRILL-5046) Add documentation for directory based partition pruning

2016-11-16 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5046:


 Summary: Add documentation for directory based partition pruning
 Key: DRILL-5046
 URL: https://issues.apache.org/jira/browse/DRILL-5046
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.9.0
Reporter: Rahul Challapalli
Assignee: Bridget Bevens


Drill's documentation for partition pruning should cover the below 2 features

1. Directory based partition pruning
2. Partition pruning based on auto-partitioned parquet files

The first one seems to be missing from our documentation. At the very least we 
should cover

a. How we can leverage this feature to avoid full table scans
b. How this feature works in conjunction with metadata cache pruning
c. A few examples which involve using wildcards for one of the sub-directories





Re: [VOTE] Release Apache Drill 1.9.0 RC1

2016-11-15 Thread rahul challapalli
+1 (Non-Binding)

1. Downloaded and built drill from source
2. Ran functional tests [1], and Advanced tests [2].
3. Ran some simple queries on INFORMATION_SCHEMA and sys tables
4. Tried out a few legacy UDFs developed prior to Drill 1.0
5. Sanity tested cancellation of running queries

[1]
https://github.com/mapr/drill-test-framework/tree/master/framework/resources/Functional
[2]
https://github.com/mapr/drill-test-framework/tree/master/framework/resources/Advanced

On Tue, Nov 15, 2016 at 11:39 AM, Sudheesh Katkam 
wrote:

> Hi all,
>
> The vote ends tomorrow at 6:30 PM PT; please vote!
>
> As of now, there are only two binding votes.
>
> Thank you,
> Sudheesh
>
> > On Nov 14, 2016, at 7:51 AM, Dechang Gu  wrote:
> >
> > +1
> >
> > - build from source
> > - deployed on a cluster
> > - run TPCH and TPCDS SF100
> >
> > LGTM.
> >
> > -Dechang
> >
> > On Sun, Nov 13, 2016 at 6:13 PM, Aman Sinha 
> wrote:
> >
> >> +1 (binding)
> >>
> >> - Downloaded the binaries on my mac, verified README, git.properties and
> >> KEYS file GPG key
> >> - Ran several queries, including CTAS against TPC-H data.  Checked
> Explain
> >> plans and results for a few queries.
> >> - Checked Web UI for query profiles.
> >> - Downloaded source on my Linux VM, did a build and ran unit tests
> >> successfully.
> >> - Checked Maven artifacts on repositories.apache.org
> >>
> >> -Aman
> >>
> >>
> >> On Sat, Nov 12, 2016 at 12:27 PM, Khurram Faraaz 
> >> wrote:
> >>
> >>> Built from source without unit tests.
> >>> deployed binaries on a cluster.
> >>> executed some basic SQL queries (like aggregation, joins, range search
> >> etc)
> >>> from sqlline.
> >>>
> >>> looks good to me.
> >>>
> >>> On Sat, Nov 12, 2016 at 7:48 AM, Sudheesh Katkam 
> >>> wrote:
> >>>
>  Hi all,
> 
>  I would like to propose the second release candidate (RC1) of Apache
> >>> Drill,
>  version 1.9.0. Thanks to everyone who contributed to this release!
> 
>  + Compared to RC0, this release candidate does not contain DRILL-4373,
> >>> due
>  to a regression (DRILL-5034).
>  + The release candidate covers a total of 73 resolved JIRAs [1].
>  + The tarball artifacts are hosted at [2], and the maven artifacts are
>  hosted at [3].
>  + This release candidate is based on commit
>  db3085498c2dc481f734733535c877dfffb9afea located at [4].
>  + The artifacts are signed with the key at [5].
> 
>  The vote ends at 6:30 PM PT, November 16th, 2016.
> 
>  Here's my vote: +1
> 
>  Thank you,
>  Sudheesh
> 
>  [1]
>  https://issues.apache.org/jira/secure/ReleaseNote.jspa?
>  projectId=12313820&version=12337861
>  [2] http://people.apache.org/~sudheesh/drill/releases/1.9.0/rc1/
>  [3] https://repository.apache.org/content/repositories/
>  orgapachedrill-1038/
>  [4] https://github.com/sudheeshkatkam/drill/commits/drill-1.9.0
>  [5] https://people.apache.org/keys/committer/sudheesh.asc
> 
> >>>
> >>
>
>


[jira] [Created] (DRILL-5038) Drill's new parquet reader complains schema change when there is none

2016-11-11 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5038:


 Summary: Drill's new parquet reader complains schema change when 
there is none
 Key: DRILL-5038
 URL: https://issues.apache.org/jira/browse/DRILL-5038
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.9.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=4b1902c

The below query fails when we enable the new parquet reader

{code}
0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
`store.parquet.use_new_reader` = true;
+---++
|  ok   |summary |
+---++
| true  | store.parquet.use_new_reader updated.  |
+---++
1 row selected (0.257 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> select 
a.suffix,b.suffix,a.filepath,b.filepath, a.name from `min_max_dir/2005` a join 
`min_max_dir/2016` b on (a.suffix=b.suffix) and a.name > 'yuri king';
Error: SYSTEM ERROR: IllegalStateException: Failure while reading vector.  
Expected vector class of org.apache.drill.exec.vector.NullableVarCharVector but 
was holding vector class org.apache.drill.exec.vector.NullableIntVector, field= 
age(INT:OPTIONAL) 

Fragment 0:0

[Error Id: f695a9fc-ff50-45bc-b7fd-f8f5ba449fab on qa-node191.qa.lab:31010] 
(state=,code=0)
{code}

I attached the data and the log files.





[jira] [Created] (DRILL-5037) NPE in Parquet Decimal Converter

2016-11-11 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5037:


 Summary: NPE in Parquet Decimal Converter
 Key: DRILL-5037
 URL: https://issues.apache.org/jira/browse/DRILL-5037
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.9.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=4b1902c

In one of our regression runs, I observed that the below query failed. I 
couldn't reproduce this issue again.

Query :
{code}
select


count(*)as count_star,
sum(a.d18)  as sum_d18,
--round(avg(a.d18)) as round_avg_d18,
cast(avg(a.d18) as bigint)  as round_avg_d18,
--trunc(avg(a.d18)) as trunc_avg_d18,
cast(avg(a.d18) as bigint)  as trunc_avg_d18,
--sum(case when a.d18 = 0 then 100 else round(a.d18/12) end) as 
case_in_sum_d18,
cast(sum(case when a.d18 = 0 then 100 else round(a.d18/12) end) 
as bigint) as case_in_sum_d18,
--coalesce(sum(case when a.d18 = 0 then 100 else 
round(a.d18/12) end), 0) as case_in_sum_d18
cast(coalesce(sum(case when a.d18 = 0 then 100 else 
round(a.d18/12) end), 0) as bigint) as case_in_sum_d18
 
from
alltypes_with_nulls a
left outer join alltypes_with_nulls b on (a.c_integer = 
b.c_integer)
left outer join alltypes_with_nulls c on (b.c_integer = 
c.c_integer)
group by
a.c_varchar
,b.c_varchar
,c.c_varchar
,a.c_integer
,b.c_integer
,c.c_integer
,a.d9
,b.d9
,c.d9
,a.d18
,b.d18
,c.d18
,a.d28
,b.d28
,c.d28
,a.d38
,b.d38
,c.d38
,a.c_date
,b.c_date
,c.c_date
,a.c_date
,b.c_date
,c.c_date
,a.c_time

 order by
a.c_varchar
,b.c_varchar
,c.c_varchar
,a.c_integer
,b.c_integer
,c.c_integer
,a.d9
,b.d9
,c.d9
,a.d18
,b.d18
,c.d18
,a.d28
,b.d28
,c.d28
,a.d38
,b.d38
,c.d38
,a.c_date
,b.c_date
,c.c_date
,a.c_date
,b.c_date
,c.c_date
,a.c_time
{code}

I attached the data set and error from the log file





Re: Query JSON that has null as value for each key

2016-11-10 Thread rahul challapalli
Khurram,

Take a look at this jira [1]. It looks similar to what you have mentioned.

[1] https://issues.apache.org/jira/browse/DRILL-1256

- Rahul

On Wed, Nov 9, 2016 at 10:49 AM, Khurram Faraaz 
wrote:

> I dont think it is by design. Some one from dev please confirm.
>
> That is because having several columns in a CSV and each column has a null
> value, select * on such CSV returns null for each column. Why is it that
> JSON is treated differently ?
>
> 0: jdbc:drill:schema=dfs.tmp> select * from `r1.csv`;
> +--------------------------------------------------------------------------------+
> |                                      columns                                    |
> +--------------------------------------------------------------------------------+
> | ["null","null","null","null","null","null","null","null","null","null","null"] |
> +--------------------------------------------------------------------------------+
> 1 row selected (0.318 seconds)
>
> On Thu, Nov 10, 2016 at 12:11 AM, rahul challapalli <
> challapallira...@gmail.com> wrote:
>
> > I think this is expected as Drill does not differentiate between a missing
> > field and a field that has a null value for all records.
> >
> > On Wed, Nov 9, 2016 at 10:20 AM, Khurram Faraaz 
> > wrote:
> >
> > > Is this by design or is this a bug?
> > >
> > > On Tue, Nov 8, 2016 at 2:13 PM, Khurram Faraaz 
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > Drill 1.9.0 git commit ID : 83513daf
> > > >
> > > > Drill returns the same result with or without `store.json.all_text_mode` = true
> > > >
> > > > [root@cent01 null_eq_joins]# cat right_all_nulls.json
> > > > {
> > > >  "intKey" : null,
> > > >  "bgintKey": null,
> > > >  "strKey": null,
> > > >  "boolKey": null,
> > > >  "fltKey": null,
> > > >  "dblKey": null,
> > > >  "timKey": null,
> > > >  "dtKey": null,
> > > >  "tmstmpKey": null,
> > > >  "intrvldyKey": null,
> > > >  "intrvlyrKey": null
> > > > }
> > > > [root@cent01 null_eq_joins]#
> > > >
> > > > Querying the above JSON file returns a single null as the query result.
> > > >  -  We should see each of the keys in the JSON as a column in the query result.
> > > >  -  And in each column the value should be a null value.
> > > > Current behavior does not look right.
> > > >
> > > > {noformat}
> > > > 0: jdbc:drill:schema=dfs.tmp> select * from `right_all_nulls.json`;
> > > > +-------+
> > > > |   *   |
> > > > +-------+
> > > > | null  |
> > > > +-------+
> > > > 1 row selected (0.313 seconds)
> > > > {noformat}
> > > >
> > > > Thanks,
> > > > Khurram
> > > >
> > >
> >
>


Re: Query JSON that has null as value for each key

2016-11-09 Thread rahul challapalli
I think this is expected as Drill does not differentiate between a missing
field and a field that has a null value for all records.

On Wed, Nov 9, 2016 at 10:20 AM, Khurram Faraaz 
wrote:

> Is this by design or is this a bug?
>
> On Tue, Nov 8, 2016 at 2:13 PM, Khurram Faraaz 
> wrote:
>
> > Hi All,
> >
> > Drill 1.9.0 git commit ID : 83513daf
> >
> > Drill returns the same result with or without `store.json.all_text_mode` = true
> >
> > [root@cent01 null_eq_joins]# cat right_all_nulls.json
> > {
> >  "intKey" : null,
> >  "bgintKey": null,
> >  "strKey": null,
> >  "boolKey": null,
> >  "fltKey": null,
> >  "dblKey": null,
> >  "timKey": null,
> >  "dtKey": null,
> >  "tmstmpKey": null,
> >  "intrvldyKey": null,
> >  "intrvlyrKey": null
> > }
> > [root@cent01 null_eq_joins]#
> >
> > Querying the above JSON file returns a single null as the query result.
> >  -  We should see each of the keys in the JSON as a column in the query result.
> >  -  And in each column the value should be a null value.
> > Current behavior does not look right.
> >
> > {noformat}
> > 0: jdbc:drill:schema=dfs.tmp> select * from `right_all_nulls.json`;
> > +-------+
> > |   *   |
> > +-------+
> > | null  |
> > +-------+
> > 1 row selected (0.313 seconds)
> > {noformat}
> >
> > Thanks,
> > Khurram
> >
>


JDBC Plugin and Date type

2016-11-04 Thread rahul challapalli
Folks,

I have a couple of questions.

1. After the fix for DRILL-4203, I tried querying parquet files with the
auto-correction disabled. Below is what I got from JDBC. However, sqlline
gets rid of the first character and displays the proper result:

Query: select l_shipdate from table(cp.`tpch/lineitem.parquet` (type =>
'parquet', autoCorrectCorruptDates => false)) order by l_shipdate limit 10;


^@356-03-19
^@356-03-21
^@356-03-21
^@356-03-23
^@356-03-24
^@356-03-24
^@356-03-26
^@356-03-26
^@356-03-26
^@356-03-26
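
A quick check for (1), just a sketch: casting to varchar on the server side
should show whether the leading control character is already in the value
Drill produces or is added later by the JDBC driver:

select cast(l_shipdate as varchar(12)) from table(cp.`tpch/lineitem.parquet`
(type => 'parquet', autoCorrectCorruptDates => false)) order by l_shipdate limit 3;
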
2. From sqlline, I can't get date values greater than the year 9999. The
query below should have returned '10297-04-27'. Am I missing anything?

0: jdbc:drill:zk=local> select TO_DATE(26278490460) from (VALUES(1));
+-------------+
|   EXPR$0    |
+-------------+
| 297-04-27   |
+-------------+
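
For (2), formatting the date as text with an explicit pattern is one way to
tell whether the full year survives inside the engine (a sketch, assuming
TO_CHAR follows Joda-style patterns here):

0: jdbc:drill:zk=local> select TO_CHAR(TO_DATE(26278490460), 'yyyy-MM-dd') from (VALUES(1));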


- Rahul

