Re: [ANNOUNCE] New PMC Chair of Apache Drill

2019-08-23 Thread Robert Hou
Congratulations Charles, and thanks for your contributions to Drill!

Thank you Arina for all you have done as PMC Chair this past year.

--Robert

On Fri, Aug 23, 2019 at 4:16 PM Khurram Faraaz  wrote:

> Congratulations Charles, and thank you Arina.
>
> Regards,
> Khurram
>
> On Fri, Aug 23, 2019 at 2:54 PM Niels Basjes  wrote:
>
> > Congratulations Charles.
> >
> > Niels Basjes
> >
> > On Thu, Aug 22, 2019, 09:28 Arina Ielchiieva  wrote:
> >
> > > Hi all,
> > >
> > > It has been an honor to serve as Drill Chair during the past year, but
> > > it's high time for a new one...
> > >
> > > I am very pleased to announce that the Drill PMC has voted to elect
> > Charles
> > > Givre as the new PMC chair of Apache Drill. He has also been approved
> > > unanimously by the Apache Board at the last board meeting.
> > >
> > > Congratulations, Charles!
> > >
> > > Kind regards,
> > > Arina
> > >
> >
>


Re: [ANNOUNCE] New Committer: Jyothsna Donapati

2019-05-09 Thread Robert Hou
Congratulations!  Thanks for your contributions.

--Robert

On Thu, May 9, 2019 at 4:00 PM Sorabh Hamirwasia 
wrote:

> Congratulations !
>
>
> On Thu, May 9, 2019 at 3:45 PM Hanumanth Maduri 
> wrote:
>
> > Congratulations Jyothsna!!
> >
> > > On May 9, 2019, at 3:06 PM, Gautam Parai  wrote:
> > >
> > > Congratulations Jyothsna!!
> > >
> > > Gautam
> > >
> > >> On Thu, May 9, 2019 at 2:59 PM Timothy Farkas 
> wrote:
> > >>
> > >> Congrats!!
> > >>
> > >>
> > >>> On Thu, May 9, 2019 at 2:54 PM Bridget Bevens 
> > wrote:
> > >>>
> > >>> Congratulations, Jyothsna!!! :-)
> > >>>
> >  On Thu, May 9, 2019 at 2:46 PM Khurram Faraaz 
> > wrote:
> > 
> >  Congratulations Jyothsna!
> > 
> > > On Thu, May 9, 2019 at 2:38 PM salim achouche
> >  wrote:
> > 
> > > Congratulations Jyothsna!
> > >
> > > On Thu, May 9, 2019 at 2:28 PM Aman Sinha 
> > >>> wrote:
> > >
> > >> The Project Management Committee (PMC) for Apache Drill has
> invited
> > >> Jyothsna
> > >> Donapati to become a committer, and we are pleased to announce
> that
> > >>> she
> > > has
> > >> accepted.
> > >>
> > >> Jyothsna has been contributing to Drill for about 1 1/2 years.
> She
> > >> initially contributed the graceful shutdown capability and more
> >  recently
> > >> has made several crucial improvements in the parquet metadata
> > >> caching
> > > which
> > >> have gone into the 1.16 release.  She also co-authored the design
> > > document
> > >> for this feature.
> > >>
> > >> Welcome Jyothsna, and thank you for your contributions.  Keep up
> > >> the
> >  good
> > >> work
> > >> !
> > >>
> > >> -Aman
> > >> (on behalf of Drill PMC)
> > >>
> > >
> > >
> > > --
> > > Regards,
> > > Salim
> > >
> > 
> > >>>
> > >>
> >
>


[jira] [Created] (DRILL-7227) TPCDS queries 47, 57, 59 fail to run with Statistics enabled at sf100

2019-04-30 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7227:
-

 Summary: TPCDS queries 47, 57, 59 fail to run with Statistics 
enabled at sf100
 Key: DRILL-7227
 URL: https://issues.apache.org/jira/browse/DRILL-7227
 Project: Apache Drill
  Issue Type: Bug
  Components: Metadata
Affects Versions: 1.16.0
Reporter: Robert Hou
Assignee: Gautam Parai
 Fix For: 1.17.0
 Attachments: 23387ab0-cb1c-cd5e-449a-c9bcefc901c1.sys.drill, 
2338ae93-155b-356d-382e-0da949c6f439.sys.drill

Here is query 78:
{noformat}
WITH ws 
 AS (SELECT d_year AS ws_sold_year, 
ws_item_sk, 
ws_bill_customer_sk    ws_customer_sk, 
Sum(ws_quantity)   ws_qty, 
Sum(ws_wholesale_cost) ws_wc, 
Sum(ws_sales_price)    ws_sp 
 FROM   web_sales 
LEFT JOIN web_returns 
   ON wr_order_number = ws_order_number 
  AND ws_item_sk = wr_item_sk 
JOIN date_dim 
  ON ws_sold_date_sk = d_date_sk 
 WHERE  wr_order_number IS NULL 
 GROUP  BY d_year, 
   ws_item_sk, 
   ws_bill_customer_sk), 
 cs 
 AS (SELECT d_year AS cs_sold_year, 
cs_item_sk, 
cs_bill_customer_sk    cs_customer_sk, 
Sum(cs_quantity)   cs_qty, 
Sum(cs_wholesale_cost) cs_wc, 
Sum(cs_sales_price)    cs_sp 
 FROM   catalog_sales 
LEFT JOIN catalog_returns 
   ON cr_order_number = cs_order_number 
  AND cs_item_sk = cr_item_sk 
JOIN date_dim 
  ON cs_sold_date_sk = d_date_sk 
 WHERE  cr_order_number IS NULL 
 GROUP  BY d_year, 
   cs_item_sk, 
   cs_bill_customer_sk), 
 ss 
 AS (SELECT d_year AS ss_sold_year, 
ss_item_sk, 
ss_customer_sk, 
Sum(ss_quantity)   ss_qty, 
Sum(ss_wholesale_cost) ss_wc, 
Sum(ss_sales_price)    ss_sp 
 FROM   store_sales 
LEFT JOIN store_returns 
   ON sr_ticket_number = ss_ticket_number 
  AND ss_item_sk = sr_item_sk 
JOIN date_dim 
  ON ss_sold_date_sk = d_date_sk 
 WHERE  sr_ticket_number IS NULL 
 GROUP  BY d_year, 
   ss_item_sk, 
   ss_customer_sk) 
SELECT ss_item_sk, 
   Round(ss_qty / ( COALESCE(ws_qty + cs_qty, 1) ), 2) ratio, 
   ss_qty  store_qty, 
   ss_wc 
   store_wholesale_cost, 
   ss_sp 
   store_sales_price, 
   COALESCE(ws_qty, 0) + COALESCE(cs_qty, 0) 
   other_chan_qty, 
   COALESCE(ws_wc, 0) + COALESCE(cs_wc, 0) 
   other_chan_wholesale_cost, 
   COALESCE(ws_sp, 0) + COALESCE(cs_sp, 0) 
   other_chan_sales_price 
FROM   ss 
   LEFT JOIN ws 
  ON ( ws_sold_year = ss_sold_year 
   AND ws_item_sk = ss_item_sk 
   AND ws_customer_sk = ss_customer_sk ) 
   LEFT JOIN cs 
  ON ( cs_sold_year = ss_sold_year 
   AND cs_item_sk = cs_item_sk 
   AND cs_customer_sk = ss_customer_sk ) 
WHERE  COALESCE(ws_qty, 0) > 0 
   AND COALESCE(cs_qty, 0) > 0 
   AND ss_sold_year = 1999 
ORDER  BY ss_item_sk, 
  ss_qty DESC, 
  ss_wc DESC, 
  ss_sp DESC, 
  other_chan_qty, 
  other_chan_wholesale_cost, 
  other_chan_sales_price, 
  Round(ss_qty / ( COALESCE(ws_qty + cs_qty, 1) ), 2)
LIMIT 100; 
{noformat}

The profile for the new plan is 2338ae93-155b-356d-382e-0da949c6f439.  Hash 
partition sender operator (10-00) takes 10-15 minutes.  I am not sure why it 
takes so long.  It has 10 minor fragments sending to receiver (06-05), which 
has 62 minor fragments.  But hash partition sender (16-00) has 10 minor 
fragments sending to receiver (12-06), which has 220 minor fragments, and there 
is no performance issue.

The profile for the old plan is 23387ab0-cb1c-cd5e-449a-c9bcefc901c1.  Both 
plans use the same commit.  The old plan is created by disabling statistics.

I have not included the plans in the Jira because Jira has a 32K size limit.
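
For reference, the report does not list the exact commands used to produce the 
two plans; a minimal sketch of how they were presumably captured, using the 
Drill 1.16 statistics option:
{noformat}
-- new plan: planner uses computed statistics
ALTER SESSION SET `planner.statistics.use` = true;
EXPLAIN PLAN FOR <query above>;

-- old plan: statistics disabled
ALTER SESSION SET `planner.statistics.use` = false;
EXPLAIN PLAN FOR <query above>;
{noformat}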



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7183) TPCDS query 10, 35, 69 take longer with sf 1000 when Statistics are disabled

2019-04-17 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7183:
-

 Summary: TPCDS query 10, 35, 69 take longer with sf 1000 when 
Statistics are disabled
 Key: DRILL-7183
 URL: https://issues.apache.org/jira/browse/DRILL-7183
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.16.0
Reporter: Robert Hou
Assignee: Hanumath Rao Maduri
 Fix For: 1.16.0


Query 69 runs 150% slower when Statistics is disabled.  Here is the query:
{noformat}
SELECT
  cd_gender,
  cd_marital_status,
  cd_education_status,
  count(*) cnt1,
  cd_purchase_estimate,
  count(*) cnt2,
  cd_credit_rating,
  count(*) cnt3
FROM
  customer c, customer_address ca, customer_demographics
WHERE
  c.c_current_addr_sk = ca.ca_address_sk AND
ca_state IN ('KY', 'GA', 'NM') AND
cd_demo_sk = c.c_current_cdemo_sk AND
exists(SELECT *
   FROM store_sales, date_dim
   WHERE c.c_customer_sk = ss_customer_sk AND
 ss_sold_date_sk = d_date_sk AND
 d_year = 2001 AND
 d_moy BETWEEN 4 AND 4 + 2) AND
(NOT exists(SELECT *
FROM web_sales, date_dim
WHERE c.c_customer_sk = ws_bill_customer_sk AND
  ws_sold_date_sk = d_date_sk AND
  d_year = 2001 AND
  d_moy BETWEEN 4 AND 4 + 2) AND
  NOT exists(SELECT *
 FROM catalog_sales, date_dim
 WHERE c.c_customer_sk = cs_ship_customer_sk AND
   cs_sold_date_sk = d_date_sk AND
   d_year = 2001 AND
   d_moy BETWEEN 4 AND 4 + 2))
GROUP BY cd_gender, cd_marital_status, cd_education_status,
  cd_purchase_estimate, cd_credit_rating
ORDER BY cd_gender, cd_marital_status, cd_education_status,
  cd_purchase_estimate, cd_credit_rating
LIMIT 100;
{noformat}

This regression is caused by commit 982e98061e029a39f1c593f695c0d93ec7079f0d.  
This commit should be reverted for now.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] New PMC member: Sorabh Hamirwasia

2019-04-05 Thread Robert Hou
Congratulations Sorabh!  Thanks for your contributions.

--Robert

On Fri, Apr 5, 2019 at 4:57 PM weijie tong  wrote:

> Congratulations Sorabh!
>
> On Sat, Apr 6, 2019 at 7:17 AM Sorabh Hamirwasia 
> wrote:
>
> > Thank You everyone for your wishes!!
> >
> > Looking forward to everyone's help to vote on the release candidate next
> week
> > :)
> >
> > Thanks,
> > Sorabh
> >
> > On Fri, Apr 5, 2019 at 2:12 PM Parth Chandra  wrote:
> >
> > > Congrats Sorabh. Just in time to manage the release !
> > >
> > >
> > >
> > > On Fri, Apr 5, 2019 at 9:06 AM Arina Ielchiieva 
> > wrote:
> > >
> > > > I am pleased to announce that Drill PMC invited Sorabh Hamirwasia to
> > > > the PMC and
> > > > he has accepted the invitation.
> > > >
> > > > Congratulations Sorabh and welcome!
> > > >
> > > > - Arina
> > > > (on behalf of Drill PMC)
> > > >
> > >
> >
>


[jira] [Created] (DRILL-7155) Create a standard logging message for batch sizes generated by individual operators

2019-04-04 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7155:
-

 Summary: Create a standard logging message for batch sizes 
generated by individual operators
 Key: DRILL-7155
 URL: https://issues.apache.org/jira/browse/DRILL-7155
 Project: Apache Drill
  Issue Type: Task
  Components: Execution - Relational Operators
Affects Versions: 1.16.0
Reporter: Robert Hou
Assignee: Robert Hou


QA reads log messages in drillbit.log to verify the sizes of the data batches 
generated by individual operators.  These log messages need to be standardized 
so that every operator emits the same message format, allowing the QA test 
framework to parse and verify the information in each message.
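
The issue does not prescribe a message format.  Purely as an illustration of 
the kind of line the test framework could parse, a standardized message might 
carry the operator name, fragment id, and batch metrics in a fixed order (all 
field names below are hypothetical):
{noformat}
BATCH_STATS: operator=HASH_AGGREGATE fragment=04-00-02 batches=128
records=524288 avgBatchBytes=1048576 avgRowBytes=32
{noformat}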



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7154) TPCH query 4 and 17 take longer with sf 1000 when Statistics are disabled

2019-04-04 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7154:
-

 Summary: TPCH query 4 and 17 take longer with sf 1000 when 
Statistics are disabled
 Key: DRILL-7154
 URL: https://issues.apache.org/jira/browse/DRILL-7154
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.16.0
Reporter: Robert Hou
Assignee: Boaz Ben-Zvi
 Fix For: 1.16.0
 Attachments: 235a3ed4-e3d1-f3b7-39c5-fc947f56b6d5.sys.drill, 
235a471b-aa97-bfb5-207d-3f25b4b5fbbb.sys.drill, hashagg.nostats.log, 
hashagg.stats.disabled.log

Here is TPCH 04 with sf 1000:
{noformat}
select
  o.o_orderpriority,
  count(*) as order_count
from
  orders o

where
  o.o_orderdate >= date '1996-10-01'
  and o.o_orderdate < date '1996-10-01' + interval '3' month
  and 
  exists (
select
  *
from
  lineitem l
where
  l.l_orderkey = o.o_orderkey
  and l.l_commitdate < l.l_receiptdate
  )
group by
  o.o_orderpriority
order by
  o.o_orderpriority;
{noformat}

TPCH query 4 takes 30% longer.  The plan is the same, but the Hash Agg 
operator in the new plan takes longer.  One possible reason is that the Hash 
Agg operator in the new plan is not using as many buckets as the old plan did; 
consistent with that, the operator also uses less memory in the new plan than 
in the old one.

Here is the old plan:
{noformat}
00-00Screen : rowType = RecordType(ANY o_orderpriority, BIGINT 
order_count): rowcount = 375.0, cumulative cost = {1.9163601940441746E10 
rows, 9.07316867594483E10 cpu, 2.2499969127E10 io, 3.59423968386048E12 network, 
2.2631985057468002E10 memory}, id = 5645
00-01  Project(o_orderpriority=[$0], order_count=[$1]) : rowType = 
RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
cumulative cost = {1.9163226940441746E10 rows, 9.07313117594483E10 cpu, 
2.2499969127E10 io, 3.59423968386048E12 network, 2.2631985057468002E10 memory}, 
id = 5644
00-02SingleMergeExchange(sort0=[0]) : rowType = RecordType(ANY 
o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
{1.9159476940441746E10 rows, 9.07238117594483E10 cpu, 2.2499969127E10 io, 
3.59423968386048E12 network, 2.2631985057468002E10 memory}, id = 5643
01-01  OrderedMuxExchange(sort0=[0]) : rowType = RecordType(ANY 
o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
{1.9155726940441746E10 rows, 9.0643982838025E10 cpu, 2.2499969127E10 io, 
3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5642
02-01SelectionVectorRemover : rowType = RecordType(ANY 
o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
{1.9151976940441746E10 rows, 9.0640232838025E10 cpu, 2.2499969127E10 io, 
3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5641
02-02  Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY 
o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = 
{1.9148226940441746E10 rows, 9.0636482838025E10 cpu, 2.2499969127E10 io, 
3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5640
02-03HashAgg(group=[{0}], order_count=[$SUM0($1)]) : rowType = 
RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, 
cumulative cost = {1.9144476940441746E10 rows, 9.030890595055101E10 cpu, 
2.2499969127E10 io, 3.56351968386048E12 network, 2.2571985057468002E10 memory}, 
id = 5639
02-04  HashToRandomExchange(dist0=[[$0]]) : rowType = 
RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3.75E7, 
cumulative cost = {1.9106976940441746E10 rows, 8.955890595055101E10 cpu, 
2.2499969127E10 io, 3.56351968386048E12 network, 2.1911985057468002E10 memory}, 
id = 5638
03-01HashAgg(group=[{0}], order_count=[COUNT()]) : rowType 
= RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3.75E7, 
cumulative cost = {1.9069476940441746E10 rows, 8.895890595055101E10 cpu, 
2.2499969127E10 io, 3.25631968386048E12 network, 2.1911985057468002E10 memory}, 
id = 5637
03-02  Project(o_orderpriority=[$1]) : rowType = 
RecordType(ANY o_orderpriority): rowcount = 3.75E8, cumulative cost = 
{1.8694476940441746E10 rows, 8.145890595055101E10 cpu, 2.2499969127E10 io, 
3.25631968386048E12 network, 1.5311985057468002E10 memory}, id = 5636
03-03Project(o_orderkey=[$1], o_orderpriority=[$2], 
l_orderkey=[$0]) : rowType = RecordType(ANY o_orderkey, ANY o_orderpriority, 
ANY l_orderkey): rowcount = 3.75E8, cumulative cost = {1.8319476940441746E10 
rows, 8.108390595055101E10 cpu, 2.2499969127E10 io, 3.25631968386048E12 
network, 1.5311985057468002E10 memory}, id = 5635
03-04  HashJoin(condition=[=($1, $0)], 
joinType=[inner], semi-join: =[false]) : rowType = RecordType(ANY l_order

[jira] [Created] (DRILL-7139) Date_add produces incorrect results when adding to a timestamp

2019-03-28 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7139:
-

 Summary: Date_add produces incorrect results when adding to a 
timestamp
 Key: DRILL-7139
 URL: https://issues.apache.org/jira/browse/DRILL-7139
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.15.0
Reporter: Robert Hou
Assignee: Pritesh Maker


I am using date_add() to create a sequence of timestamps:
{noformat}
select date_add(timestamp '1970-01-01 00:00:00', 
cast(concat('PT',107374,'M') as interval minute)) 
timestamp_id from (values(1));
+--+
|   timestamp_id   |
+--+
| 1970-01-25 20:31:12.704  |
+--+
1 row selected (0.121 seconds)
{noformat}

When I add one more, I get an older timestamp:
{noformat}
0: jdbc:drill:drillbit=10.10.51.5> select date_add(timestamp '1970-01-01 
00:00:00', cast(concat('PT',107375,'M') as interval 
minute)) timestamp_id from (values(1));
+--+
|   timestamp_id   |
+--+
| 1969-12-07 03:29:25.408  |
+--+
1 row selected (0.126 seconds)
{noformat}
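
The backwards jump is consistent with the interval being converted to a signed 
32-bit millisecond count (an assumption; the report does not identify the 
faulty code path).  The arithmetic matches both results exactly:
{noformat}
107374 min = 6,442,440,000 ms; 6,442,440,000 - 2^32 = 2,147,472,704 ms
  -> 1970-01-25 20:31:12.704 (the first result)
107375 min = 6,442,500,000 ms; 6,442,500,000 - 2^32 = 2,147,532,704 ms,
  which exceeds 2^31 - 1 and wraps to -2,147,434,592 ms
  -> 1969-12-07 03:29:25.408 (the second result)
{noformat}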



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7136) Num_buckets for HashAgg in profile may be inaccurate

2019-03-27 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7136:
-

 Summary: Num_buckets for HashAgg in profile may be inaccurate
 Key: DRILL-7136
 URL: https://issues.apache.org/jira/browse/DRILL-7136
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Affects Versions: 1.16.0
Reporter: Robert Hou
Assignee: Pritesh Maker
 Fix For: 1.16.0
 Attachments: 23650ee5-6721-8a8f-7dd3-f5dd09a3a7b0.sys.drill

I ran TPCH query 17 with sf 1000.  Here is the query:
{noformat}
select
  sum(l.l_extendedprice) / 7.0 as avg_yearly
from
  lineitem l,
  part p
where
  p.p_partkey = l.l_partkey
  and p.p_brand = 'Brand#13'
  and p.p_container = 'JUMBO CAN'
  and l.l_quantity < (
select
  0.2 * avg(l2.l_quantity)
from
  lineitem l2
where
  l2.l_partkey = p.p_partkey
  );
{noformat}

One of the hash agg operators has resized 6 times.  Assuming it starts with 
64K buckets and doubles on each resize, it should have 64K x 2^6 = 4M buckets.  
But the profile shows it still has 64K buckets.



I have attached a sample profile.  In this profile, the hash agg operator is 
(04-02).
{noformat}
Operator Metrics (minor fragment 04-00-02):
  NUM_BUCKETS             65,536
  NUM_ENTRIES             748,746
  NUM_RESIZING            6
  RESIZING_TIME_MS        364
  NUM_PARTITIONS          1
  SPILLED_PARTITIONS      (none)
  SPILL_MB                582
  SPILL_CYCLE             0
  INPUT_BATCH_COUNT       813
  AVG_INPUT_BATCH_BYTES   582,653
  AVG_INPUT_ROW_BYTES     18
  INPUT_RECORD_COUNT      26,316,456
  OUTPUT_BATCH_COUNT      401
  AVG_OUTPUT_BATCH_BYTES  1,631,943
  AVG_OUTPUT_ROW_BYTES    25
  OUTPUT_RECORD_COUNT     26,176,350
{noformat}





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-7132) Metadata cache does not have correct min/max values for varchar and interval data types

2019-03-22 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou resolved DRILL-7132.
---
Resolution: Not A Problem

> Metadata cache does not have correct min/max values for varchar and interval 
> data types
> ---
>
> Key: DRILL-7132
> URL: https://issues.apache.org/jira/browse/DRILL-7132
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Priority: Major
> Fix For: 1.17.0
>
> Attachments: 0_0_10.parquet
>
>
> The parquet metadata cache does not have correct min/max values for varchar 
> and interval data types.
> I have attached a parquet file.  Here is what parquet tools shows for varchar:
> [varchar_col] BINARY 14.6% of all space [PLAIN, BIT_PACKED] min: 67 max: 67 
> average: 67 total: 67 (raw data: 65 saving -3%)
>   values: min: 1 max: 1 average: 1 total: 1
>   uncompressed: min: 65 max: 65 average: 65 total: 65
>   column values statistics: min: ioegjNJKvnkd, max: ioegjNJKvnkd, num_nulls: 0
> Here is what the metadata cache file shows:
> "name" : [ "varchar_col" ],
> "minValue" : "aW9lZ2pOSkt2bmtk",
> "maxValue" : "aW9lZ2pOSkt2bmtk",
> "nulls" : 0
> Here is what parquet tools shows for interval:
> [interval_col] BINARY 11.3% of all space [PLAIN, BIT_PACKED] min: 52 max: 52 
> average: 52 total: 52 (raw data: 50 saving -4%)
>   values: min: 1 max: 1 average: 1 total: 1
>   uncompressed: min: 50 max: 50 average: 50 total: 50
>   column values statistics: min: P18582D, max: P18582D, num_nulls: 0
> Here is what the metadata cache file shows:
> "name" : [ "interval_col" ],
> "minValue" : "UDE4NTgyRA==",
> "maxValue" : "UDE4NTgyRA==",
> "nulls" : 0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7132) Metadata cache does not have correct min/max values for varchar and interval data types

2019-03-22 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7132:
-

 Summary: Metadata cache does not have correct min/max values for 
varchar and interval data types
 Key: DRILL-7132
 URL: https://issues.apache.org/jira/browse/DRILL-7132
 Project: Apache Drill
  Issue Type: Bug
  Components: Metadata
Affects Versions: 1.14.0
Reporter: Robert Hou
 Fix For: 1.17.0
 Attachments: 0_0_10.parquet

The parquet metadata cache does not have correct min/max values for varchar and 
interval data types.

I have attached a parquet file.  Here is what parquet tools shows for varchar:

[varchar_col] BINARY 14.6% of all space [PLAIN, BIT_PACKED] min: 67 max: 67 
average: 67 total: 67 (raw data: 65 saving -3%)
  values: min: 1 max: 1 average: 1 total: 1
  uncompressed: min: 65 max: 65 average: 65 total: 65
  column values statistics: min: ioegjNJKvnkd, max: ioegjNJKvnkd, num_nulls: 0

Here is what the metadata cache file shows:

"name" : [ "varchar_col" ],
"minValue" : "aW9lZ2pOSkt2bmtk",
"maxValue" : "aW9lZ2pOSkt2bmtk",
"nulls" : 0

Here is what parquet tools shows for interval:

[interval_col] BINARY 11.3% of all space [PLAIN, BIT_PACKED] min: 52 max: 52 
average: 52 total: 52 (raw data: 50 saving -4%)
  values: min: 1 max: 1 average: 1 total: 1
  uncompressed: min: 50 max: 50 average: 50 total: 50
  column values statistics: min: P18582D, max: P18582D, num_nulls: 0

Here is what the metadata cache file shows:

"name" : [ "interval_col" ],
"minValue" : "UDE4NTgyRA==",
"maxValue" : "UDE4NTgyRA==",
"nulls" : 0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7122) TPCDS queries 29 25 17 are slower when Statistics is disabled.

2019-03-19 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7122:
-

 Summary: TPCDS queries 29 25 17 are slower when Statistics is 
disabled.
 Key: DRILL-7122
 URL: https://issues.apache.org/jira/browse/DRILL-7122
 Project: Apache Drill
  Issue Type: Bug
Reporter: Robert Hou






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7123) TPCDS query 83 runs slower when Statistics is disabled

2019-03-19 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7123:
-

 Summary: TPCDS query 83 runs slower when Statistics is disabled
 Key: DRILL-7123
 URL: https://issues.apache.org/jira/browse/DRILL-7123
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.16.0
Reporter: Robert Hou
Assignee: Gautam Parai
 Fix For: 1.16.0


Query is TPCDS 83 with sf 100:
{noformat}
WITH sr_items 
 AS (SELECT i_item_id   item_id, 
Sum(sr_return_quantity) sr_item_qty 
 FROM   store_returns, 
item, 
date_dim 
 WHERE  sr_item_sk = i_item_sk 
AND d_date IN (SELECT d_date 
   FROM   date_dim 
   WHERE  d_week_seq IN (SELECT d_week_seq 
 FROM   date_dim 
 WHERE 
  d_date IN ( '1999-06-30', 
  '1999-08-28', 
  '1999-11-18' 
))) 
AND sr_returned_date_sk = d_date_sk 
 GROUP  BY i_item_id), 
 cr_items 
 AS (SELECT i_item_id   item_id, 
Sum(cr_return_quantity) cr_item_qty 
 FROM   catalog_returns, 
item, 
date_dim 
 WHERE  cr_item_sk = i_item_sk 
AND d_date IN (SELECT d_date 
   FROM   date_dim 
   WHERE  d_week_seq IN (SELECT d_week_seq 
 FROM   date_dim 
 WHERE 
  d_date IN ( '1999-06-30', 
  '1999-08-28', 
  '1999-11-18' 
))) 
AND cr_returned_date_sk = d_date_sk 
 GROUP  BY i_item_id), 
 wr_items 
 AS (SELECT i_item_id   item_id, 
Sum(wr_return_quantity) wr_item_qty 
 FROM   web_returns, 
item, 
date_dim 
 WHERE  wr_item_sk = i_item_sk 
AND d_date IN (SELECT d_date 
   FROM   date_dim 
   WHERE  d_week_seq IN (SELECT d_week_seq 
 FROM   date_dim 
 WHERE 
  d_date IN ( '1999-06-30', 
  '1999-08-28', 
  '1999-11-18' 
))) 
AND wr_returned_date_sk = d_date_sk 
 GROUP  BY i_item_id) 
SELECT sr_items.item_id, 
   sr_item_qty, 
   sr_item_qty / ( sr_item_qty + cr_item_qty + wr_item_qty ) / 3.0 
* 
   100 sr_dev, 
   cr_item_qty, 
   cr_item_qty / ( sr_item_qty + cr_item_qty + wr_item_qty ) / 3.0 
* 
   100 cr_dev, 
   wr_item_qty, 
   wr_item_qty / ( sr_item_qty + cr_item_qty + wr_item_qty ) / 3.0 
* 
   100 wr_dev, 
   ( sr_item_qty + cr_item_qty + wr_item_qty ) / 3.0 
   average 
FROM   sr_items, 
   cr_items, 
   wr_items 
WHERE  sr_items.item_id = cr_items.item_id 
   AND sr_items.item_id = wr_items.item_id 
ORDER  BY sr_items.item_id, 
  sr_item_qty
LIMIT 100; 
{noformat}

The number of threads for major fragments 1 and 2 has changed when Statistics 
is disabled: the number of minor fragments has been reduced from 10 and 15 
down to 3.  The rowcount estimate for major fragment 2 has changed from 
1439754.0 down to 287950.8.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7121) TPCH 4 takes longer

2019-03-19 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7121:
-

 Summary: TPCH 4 takes longer
 Key: DRILL-7121
 URL: https://issues.apache.org/jira/browse/DRILL-7121
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.16.0
Reporter: Robert Hou
Assignee: Gautam Parai
 Fix For: 1.16.0


Here is TPCH 4 with sf 100:
{noformat}
select
  o.o_orderpriority,
  count(*) as order_count
from
  orders o

where
  o.o_orderdate >= date '1996-10-01'
  and o.o_orderdate < date '1996-10-01' + interval '3' month
  and 
  exists (
select
  *
from
  lineitem l
where
  l.l_orderkey = o.o_orderkey
  and l.l_commitdate < l.l_receiptdate
  )
group by
  o.o_orderpriority
order by
  o.o_orderpriority;
{noformat}

The plan has changed when Statistics is disabled: a Hash Agg and a Broadcast 
Exchange have been added.  These two operators expand the number of rows coming 
from the lineitem table from 137M to 9B, which forces the hash join to use 6 GB 
of memory instead of 30 MB.
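
To isolate the Broadcast Exchange from the rest of the plan change, broadcast 
joins can be disabled per session; a sketch using standard planner options 
(this is a diagnostic aid, not a fix suggested in the report):
{noformat}
ALTER SESSION SET `planner.enable_broadcast_join` = false;
-- or lower the maximum row count for which a broadcast is considered
ALTER SESSION SET `planner.broadcast_threshold` = 1000000;
{noformat}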



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7120) Query fails with ChannelClosedException

2019-03-19 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7120:
-

 Summary: Query fails with ChannelClosedException
 Key: DRILL-7120
 URL: https://issues.apache.org/jira/browse/DRILL-7120
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.16.0
Reporter: Robert Hou
Assignee: Gautam Parai
 Fix For: 1.16.0


TPCH query 5 fails at sf100.  Here is the query:
{noformat}
select
  n.n_name,
  sum(l.l_extendedprice * (1 - l.l_discount)) as revenue
from
  customer c,
  orders o,
  lineitem l,
  supplier s,
  nation n,
  region r
where
  c.c_custkey = o.o_custkey
  and l.l_orderkey = o.o_orderkey
  and l.l_suppkey = s.s_suppkey
  and c.c_nationkey = s.s_nationkey
  and s.s_nationkey = n.n_nationkey
  and n.n_regionkey = r.r_regionkey
  and r.r_name = 'EUROPE'
  and o.o_orderdate >= date '1997-01-01'
  and o.o_orderdate < date '1997-01-01' + interval '1' year
group by
  n.n_name
order by
  revenue desc;
{noformat}

This is the error from drillbit.log:
{noformat}
2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 
23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: State change requested RUNNING --> 
FINISHED
2019-03-04 17:46:38,684 [23822b0a-b7bd-0b79-b905-1438f5b1d039:frag:6:64] INFO  
o.a.d.e.w.f.FragmentStatusReporter - 23822b0a-b7bd-0b79-b905-1438f5b1d039:6:64: 
State to report: FINISHED
2019-03-04 18:17:51,454 [BitServer-13] WARN  
o.a.d.exec.rpc.ProtobufLengthDecoder - Failure allocating buffer on incoming 
stream due to memory limits.  Current Allocation: 262144.
2019-03-04 18:17:51,454 [BitServer-13] ERROR o.a.drill.exec.rpc.data.DataServer 
- Out of memory in RPC layer.
2019-03-04 18:17:51,463 [BitServer-13] ERROR o.a.d.exec.rpc.RpcExceptionHandler 
- Exception in RPC communication.  Connection: /10.10.120.104:31012 <--> 
/10.10.120.106:53048 (data server).  Closing connection.
io.netty.handler.codec.DecoderException: 
org.apache.drill.exec.exception.OutOfMemoryException: Failure allocating buffer.
at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:271)
 ~[netty-codec-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) 
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
 [netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) 
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) 
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
 [netty-common-4.0.48.Final.jar:4.0.48.Final]
at java.lang.Thread.run(Threa

[jira] [Created] (DRILL-7109) Statistics adds external sort, which spills to disk

2019-03-15 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7109:
-

 Summary: Statistics adds external sort, which spills to disk
 Key: DRILL-7109
 URL: https://issues.apache.org/jira/browse/DRILL-7109
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.16.0
Reporter: Robert Hou
Assignee: Gautam Parai
 Fix For: 1.16.0


TPCH query 4 with sf 100 runs many times slower.  One issue is that an extra 
external sort has been added, and both external sorts spill to disk.

Also, the hash join sees 100x more data.

Here is the query:
{noformat}
select
  o.o_orderpriority,
  count(*) as order_count
from
  orders o

where
  o.o_orderdate >= date '1996-10-01'
  and o.o_orderdate < date '1996-10-01' + interval '3' month
  and 
  exists (
select
  *
from
  lineitem l
where
  l.l_orderkey = o.o_orderkey
  and l.l_commitdate < l.l_receiptdate
  )
group by
  o.o_orderpriority
order by
  o.o_orderpriority;
{noformat}
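
The extra sort is a planning problem, but the spilling itself can sometimes be 
avoided by giving the query more memory; a sketch with a standard option (the 
value is illustrative):
{noformat}
-- default is 2 GB per node; 8 GB shown purely as an example
ALTER SESSION SET `planner.memory.max_query_memory_per_node` = 8589934592;
{noformat}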



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7108) Statistics adds two exchange operators

2019-03-15 Thread Robert Hou (JIRA)
Robert Hou created DRILL-7108:
-

 Summary: Statistics adds two exchange operators
 Key: DRILL-7108
 URL: https://issues.apache.org/jira/browse/DRILL-7108
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.16.0
Reporter: Robert Hou
Assignee: Gautam Parai
 Fix For: 1.16.0


TPCH 16 with sf 100 runs 14% slower.  Here is the query:
{noformat}
select
  p.p_brand,
  p.p_type,
  p.p_size,
  count(distinct ps.ps_suppkey) as supplier_cnt
from
  partsupp ps,
  part p
where
  p.p_partkey = ps.ps_partkey
  and p.p_brand <> 'Brand#21'
  and p.p_type not like 'MEDIUM PLATED%'
  and p.p_size in (38, 2, 8, 31, 44, 5, 14, 24)
  and ps.ps_suppkey not in (
select
  s.s_suppkey
from
  supplier s
where
  s.s_comment like '%Customer%Complaints%'
  )
group by
  p.p_brand,
  p.p_type,
  p.p_size
order by
  supplier_cnt desc,
  p.p_brand,
  p.p_type,
  p.p_size;
{noformat}
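
For context, the statistics that these Drill 1.16 issues toggle are computed 
per table and consumed by the planner; a minimal sketch (the table path is 
illustrative):
{noformat}
ANALYZE TABLE dfs.`/path/to/tpch/partsupp` COMPUTE STATISTICS;
ALTER SESSION SET `planner.statistics.use` = true;
{noformat}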



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6957) Parquet rowgroup filtering can have incorrect file count

2019-01-08 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6957:
-

 Summary: Parquet rowgroup filtering can have incorrect file count
 Key: DRILL-6957
 URL: https://issues.apache.org/jira/browse/DRILL-6957
 Project: Apache Drill
  Issue Type: Bug
Reporter: Robert Hou
Assignee: Jean-Blas IMBERT


If a query accesses all the files, the Scan operator reports that only one 
file is accessed.  The number of rowgroups is correct.

Here is an example query:
{noformat}
select count(*) from 
dfs.`/custdata/tudata/fact/vintage/snapshot_period_id=20151231/comp_id=120` 
where cur_tot_bal_amt < 100
{noformat}

Here is the plan:
{noformat}
Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = 
{9.8376721446E9 rows, 4.35668337906E10 cpu, 2.810763469E9 io, 4096.0 network, 
0.0 memory}, id = 4477
00-01  Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount 
= 1.0, cumulative cost = {9.8376721445E9 rows, 4.35668337905E10 cpu, 
2.810763469E9 io, 4096.0 network, 0.0 memory}, id = 4476
00-02StreamAgg(group=[{}], EXPR$0=[$SUM0($0)]) : rowType = 
RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {9.8376721435E9 
rows, 4.35668337895E10 cpu, 2.810763469E9 io, 4096.0 network, 0.0 memory}, id = 
4475
00-03  UnionExchange : rowType = RecordType(BIGINT EXPR$0): rowcount = 
1.0, cumulative cost = {9.8376721425E9 rows, 4.35668337775E10 cpu, 
2.810763469E9 io, 4096.0 network, 0.0 memory}, id = 4474
01-01StreamAgg(group=[{}], EXPR$0=[COUNT()]) : rowType = 
RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {9.8376721415E9 
rows, 4.35668337695E10 cpu, 2.810763469E9 io, 0.0 network, 0.0 memory}, id = 
4473
01-02  Project($f0=[0]) : rowType = RecordType(INTEGER $f0): 
rowcount = 1.4053817345E9, cumulative cost = {8.432290407E9 rows, 
2.67022529555E10 cpu, 2.810763469E9 io, 0.0 network, 0.0 memory}, id = 4472
01-03SelectionVectorRemover : rowType = RecordType(ANY 
cur_tot_bal_amt): rowcount = 1.4053817345E9, cumulative cost = {7.0269086725E9 
rows, 2.10807260175E10 cpu, 2.810763469E9 io, 0.0 network, 0.0 memory}, id = 
4471
01-04  Filter(condition=[<($0, 100)]) : rowType = 
RecordType(ANY cur_tot_bal_amt): rowcount = 1.4053817345E9, cumulative cost = 
{5.621526938E9 rows, 1.9675344283E10 cpu, 2.810763469E9 io, 0.0 network, 0.0 
memory}, id = 4470
01-05Scan(table=[[dfs, 
/custdata/tudata/fact/vintage/snapshot_period_id=20151231/comp_id=120]], 
groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=maprfs:///custdata/tudata/fact/vintage/snapshot_period_id=20151231/comp_id=120]],
 
selectionRoot=maprfs:/custdata/tudata/fact/vintage/snapshot_period_id=20151231/comp_id=120,
 numFiles=1, numRowGroups=1007, usedMetadataFile=false, 
columns=[`cur_tot_bal_amt`]]]) : rowType = RecordType(ANY cur_tot_bal_amt): 
rowcount = 2.810763469E9, cumulative cost = {2.810763469E9 rows, 2.810763469E9 
cpu, 2.810763469E9 io, 0.0 network, 0.0 memory}, id = 4469
{noformat}

numFiles is set to 1 when it should be set to 21.

All the files are in one directory.  If I add a level of directories (i.e. a 
directory with multiple directories, each with files), then I get the correct 
file count.
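
To make the two layouts concrete (file names are illustrative):
{noformat}
comp_id=120/0_0_1.parquet ... 0_0_21.parquet    flat layout: numFiles=1 (wrong)
comp_id=120/dir1/0_0_1.parquet ...              one level deeper: numFiles correct
{noformat}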



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6906) File permissions are not being honored

2018-12-15 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6906:
-

 Summary: File permissions are not being honored
 Key: DRILL-6906
 URL: https://issues.apache.org/jira/browse/DRILL-6906
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - JDBC, Client - ODBC
Affects Versions: 1.15.0
Reporter: Robert Hou
Assignee: Pritesh Maker
 Fix For: 1.15.0


I ran sqlline with user "kuser1".
{noformat}
/opt/mapr/drill/drill-1.15.0.apache/bin/sqlline -u 
"jdbc:drill:drillbit=10.10.30.206" -n kuser1 -p mapr
{noformat}

I tried to access a file that is only accessible by root:
{noformat}
[root@perfnode206 drill-test-framework_krystal]# hf -ls 
/drill/testdata/impersonation/neg_tc5/student
-rwx--   3 root root  64612 2018-06-19 10:30 
/drill/testdata/impersonation/neg_tc5/student
{noformat}

I am able to read the table, which should not be possible.  I used this commit 
for Drill 1.15.
{noformat}
git.commit.id=bf2b414ac62cfc515fdd77f2688bb110073d764d
git.commit.message.full=DRILL-6866\: Upgrade to SqlLine 1.6.0\n\n1. Changed 
SqlLine version to 1.6.0.\n2. Overridden new getVersion method in 
DrillSqlLineApplication.\n3. Set maxColumnWidth to 80 to avoid issue described 
in DRILL-6769.\n4. Changed colorScheme to obsidian.\n5. Output null value for 
varchar / char / boolean types as null instead of empty string.\n6. Changed 
access modifier from package default to public for JDBC classes that implement 
external interfaces to avoid issues when calling methods from these classes 
using reflection.\n\ncloses \#1556
{noformat}

This is from drillbit.log.  It shows that user is kuser1.
{noformat}
2018-12-15 05:00:52,516 [23eb04fb-1701-bea7-dd97-ecda58795b3b:foreman] DEBUG 
o.a.d.e.w.f.QueryStateProcessor - 23eb04fb-1701-bea7-dd97-ecda58795b3b: State 
change requested PREPARING --> PLANNING
2018-12-15 05:00:52,531 [23eb04fb-1701-bea7-dd97-ecda58795b3b:foreman] INFO  
o.a.drill.exec.work.foreman.Foreman - Query text for query with id 
23eb04fb-1701-bea7-dd97-ecda58795b3b issued by kuser1: select * from 
dfs.`/drill/testdata/impersonation/neg_tc5/student`
{noformat}

It is not clear to me if this is a Drill problem or a file system problem.  I 
tested MFS by logging in as kuser1 and trying to copy the file using "hadoop fs 
-copyToLocal /drill/testdata/impersonation/neg_tc5/student" and got an error, 
and was not able to copy the file.  So I think MFS permissions are working.

I also tried with Drill 1.14, and I get the expected error:
{noformat}
0: jdbc:drill:drillbit=10.10.30.206> select * from 
dfs.`/drill/testdata/impersonation/neg_tc5/student` limit 1;
Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 17: Object 
'/drill/testdata/impersonation/neg_tc5/student' not found within 'dfs'

[Error Id: cdf18c2a-b005-4f92-b819-d4324e8807d9 on perfnode206.perf.lab:31010] 
(state=,code=0)
{noformat}

The commit for Drill 1.14 is:
{noformat}
git.commit.message.full=[maven-release-plugin] prepare release drill-1.14.0\n
git.commit.id=0508a128853ce796ca7e99e13008e49442f83147
{noformat}

This problem exists with both Apache JDBC and Simba ODBC.
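
For reference, Drill only enforces the end user's file permissions when 
impersonation is enabled in drill-override.conf; the report does not show this 
cluster's setting, so the sketch below is simply the standard configuration:
{noformat}
drill.exec.impersonation: {
  enabled: true,
  max_chained_user_hops: 3
}
{noformat}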



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6902) Extra limit operator is not needed

2018-12-12 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6902:
-

 Summary: Extra limit operator is not needed
 Key: DRILL-6902
 URL: https://issues.apache.org/jira/browse/DRILL-6902
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.15.0
Reporter: Robert Hou
Assignee: Pritesh Maker


For TPCDS query 49, there is an extra limit operator that is not needed.

Here is the query:
{noformat}
SELECT 'web' AS channel, 
   web.item, 
   web.return_ratio, 
   web.return_rank, 
   web.currency_rank 
FROM   (SELECT item, 
   return_ratio, 
   currency_ratio, 
   Rank() 
 OVER ( 
   ORDER BY return_ratio)   AS return_rank, 
   Rank() 
 OVER ( 
   ORDER BY currency_ratio) AS currency_rank 
FROM   (SELECT ws.ws_item_sk   AS 
   item, 
   ( Cast(Sum(COALESCE(wr.wr_return_quantity, 0)) AS 
DEC(15, 
  4)) / 
 Cast( 
 Sum(COALESCE(ws.ws_quantity, 0)) AS DEC(15, 4)) ) AS 
   return_ratio, 
   ( Cast(Sum(COALESCE(wr.wr_return_amt, 0)) AS DEC(15, 4)) 
 / Cast( 
 Sum( 
 COALESCE(ws.ws_net_paid, 0)) AS DEC(15, 
 4)) ) AS 
   currency_ratio 
FROM   web_sales ws 
   LEFT OUTER JOIN web_returns wr 
ON ( ws.ws_order_number = 
wr.wr_order_number 
 AND ws.ws_item_sk = wr.wr_item_sk ), 
   date_dim 
WHERE  wr.wr_return_amt > 1 
   AND ws.ws_net_profit > 1 
   AND ws.ws_net_paid > 0 
   AND ws.ws_quantity > 0 
   AND ws_sold_date_sk = d_date_sk 
   AND d_year = 1999 
   AND d_moy = 12 
GROUP  BY ws.ws_item_sk) in_web) web 
WHERE  ( web.return_rank <= 10 
  OR web.currency_rank <= 10 ) 
UNION 
SELECT 'catalog' AS channel, 
   catalog.item, 
   catalog.return_ratio, 
   catalog.return_rank, 
   catalog.currency_rank 
FROM   (SELECT item, 
   return_ratio, 
   currency_ratio, 
   Rank() 
 OVER ( 
   ORDER BY return_ratio)   AS return_rank, 
   Rank() 
 OVER ( 
   ORDER BY currency_ratio) AS currency_rank 
FROM   (SELECT cs.cs_item_sk   AS 
   item, 
   ( Cast(Sum(COALESCE(cr.cr_return_quantity, 0)) AS 
DEC(15, 
  4)) / 
 Cast( 
 Sum(COALESCE(cs.cs_quantity, 0)) AS DEC(15, 4)) ) AS 
   return_ratio, 
   ( Cast(Sum(COALESCE(cr.cr_return_amount, 0)) AS DEC(15, 
4 
  )) / 
 Cast(Sum( 
 COALESCE(cs.cs_net_paid, 0)) AS DEC( 
 15, 4)) ) AS 
   currency_ratio 
FROM   catalog_sales cs 
   LEFT OUTER JOIN catalog_returns cr 
ON ( cs.cs_order_number = 
cr.cr_order_number 
 AND cs.cs_item_sk = cr.cr_item_sk ), 
   date_dim 
WHERE  cr.cr_return_amount > 1 
   AND cs.cs_net_profit > 1 
   AND cs.cs_net_paid > 0 
   AND cs.cs_quantity > 0 
   AND cs_sold_date_sk = d_date_sk 
   AND d_year = 1999 
   AND d_moy = 12 
GROUP  BY cs.cs_item_sk) in_cat) catalog 
WHERE  ( catalog.return_rank <= 10 
  OR catalog.currency_rank <= 10 ) 
UNION 
SELECT 'store' AS channel, 
   store.item, 
   store.return_ratio, 
   store.return_rank, 
   store.currency_rank 
FROM   (SELECT item, 
   return_ratio, 
   currency_ratio, 
   Rank() 
 OVER ( 
   ORDER BY return_ratio)   AS return_rank, 
   Rank() 
 OVER ( 
   ORDER BY currency_ratio) AS currency_rank 
FROM   (SELECT sts.ss_item_sk   AS 
   item, 
   ( Cast(S

[jira] [Created] (DRILL-6897) TPCH 13 has regressed

2018-12-11 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6897:
-

 Summary: TPCH 13 has regressed
 Key: DRILL-6897
 URL: https://issues.apache.org/jira/browse/DRILL-6897
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.15.0
Reporter: Robert Hou
Assignee: Karthikeyan Manivannan
 Attachments: 240099ed-ef2a-a23a-4559-f1b2e0809e72.sys.drill, 
2400be84-c024-cb92-8743-3211589e0247.sys.drill

I ran TPCH query 13 with both scale factor 100 and 1000, running it 3x to get 
a warm start, and ran the comparison twice to verify the regression. It 
regresses between 26 and 33%.

Here is the query:
{noformat}
select
  c_count,
  count(*) as custdist
from
  (
select
  c.c_custkey,
  count(o.o_orderkey)
from
  customer c 
  left outer join orders o 
on c.c_custkey = o.o_custkey
and o.o_comment not like '%special%requests%'
group by
  c.c_custkey
  ) as orders (c_custkey, c_count)
group by
  c_count
order by
  custdist desc,
  c_count desc;
{noformat}

I have attached two profiles. 240099ed-ef2a-a23a-4559-f1b2e0809e72 is for Drill 
1.15. 2400be84-c024-cb92-8743-3211589e0247 is for Drill 1.14. The commit for 
Drill 1.15 is 596227bbbecfb19bdb55dd8ea58159890f83bc9c. The commit for Drill 
1.14 is 0508a128853ce796ca7e99e13008e49442f83147.

The two plans are nearly the same. One difference is that Drill 1.15 uses four 
times more memory in operator 07-01, Unordered Mux Exchange. I think the 
problem may be in operator 09-01, Project: Drill 1.15 projects the comment 
field while Drill 1.14 does not.

Another issue is that Drill 1.15 takes more processing time to filter the 
orders table. Filter operator 09-03 takes an average of 19.3s. For Drill 1.14, 
filter operator 09-04 takes an average of 15.6s. They process the same number 
of rows and have the same number of minor fragments.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] New Committer: Karthikeyan Manivannan

2018-12-07 Thread Robert Hou
Congratulations, Karthik!  Thanks for all your contributions.

--Robert

On Fri, Dec 7, 2018 at 11:15 PM weijie tong  wrote:

> Congratulations Karthik !
>
> On Sat, Dec 8, 2018 at 12:10 PM Karthikeyan Manivannan <
> kmanivan...@mapr.com>
> wrote:
>
> > Thanks! In addition to all you wonderful Drillers, I would also like to
> > thank Google, StackOverflow and Larry Tesler
> > <https://www.indiatoday.in/education-today/gk-current-affairs/story/copy-paste-inventor-337401-2016-08-26>.
> >
> > On Fri, Dec 7, 2018 at 3:59 PM Padma Penumarthy <
> > penumarthy.pa...@gmail.com>
> > wrote:
> >
> > > Congrats Karthik.
> > >
> > > Thanks
> > > Padma
> > >
> > >
> > > On Fri, Dec 7, 2018 at 1:33 PM Paul Rogers 
> > > wrote:
> > >
> > > > Congrats Karthik!
> > > >
> > > > - Paul
> > > >
> > > > Sent from my iPhone
> > > >
> > > > > On Dec 7, 2018, at 11:12 AM, Abhishek Girish 
> > > wrote:
> > > > >
> > > > > Congratulations Karthik!
> > > > >
> > > > >> On Fri, Dec 7, 2018 at 11:11 AM Arina Ielchiieva <
> ar...@apache.org>
> > > > wrote:
> > > > >>
> > > > >> The Project Management Committee (PMC) for Apache Drill has
> invited
> > > > >> Karthikeyan
> > > > >> Manivannan to become a committer, and we are pleased to announce
> > that
> > > he
> > > > >> has accepted.
> > > > >>
> > > > >> Karthik started contributing to the Drill project in 2016. He has
> > > > >> implemented changes in various Drill areas, including batch
> sizing,
> > > > >> security, code-gen, C++ part. One of his latest improvements is
> ACL
> > > > >> support for Drill ZK nodes.
> > > > >>
> > > > >> Welcome Karthik, and thank you for your contributions!
> > > > >>
> > > > >> - Arina
> > > > >> (on behalf of Drill PMC)
> > > > >>
> > > >
> > >
> >
>


[jira] [Resolved] (DRILL-6828) Hit UnrecognizedPropertyException when run tpch queries

2018-11-26 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou resolved DRILL-6828.
---
Resolution: Cannot Reproduce

> Hit UnrecognizedPropertyException when run tpch queries
> ---
>
> Key: DRILL-6828
> URL: https://issues.apache.org/jira/browse/DRILL-6828
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.15.0
> Environment: RHEL 7,   Apache Drill commit id: 
> 18e09a1b1c801f2691a05ae7db543bf71874cfea
>Reporter: Dechang Gu
>Assignee: Robert Hou
>Priority: Blocker
> Fix For: 1.15.0
>
>
> Installed Apache Drill 1.15.0 commit id: 
> 18e09a1b1c801f2691a05ae7db543bf71874cfea DRILL-6763: Codegen optimization of 
> SQL functions with constant values(\#1481)
> Hit the following errors:
> {code}
> java.sql.SQLException: SYSTEM ERROR: UnrecognizedPropertyException: 
> Unrecognized field "outgoingBatchSize" (class 
> org.apache.drill.exec.physical.config.HashPartitionSender), not marked as 
> ignorable (9 known properties: "receiver-major-fragment", 
> "initialAllocation", "expr", "userName", "@id", "child", "cost", 
> "destinations", "maxAllocation"])
>  at [Source: (StringReader); line: 1000, column: 29] (through reference 
> chain: 
> org.apache.drill.exec.physical.config.HashPartitionSender["outgoingBatchSize"])
> Fragment 3:175
> Please, refer to logs for more information.
> [Error Id: cc023cdb-9a46-4edd-ad0b-6da1e9085291 on ucs-node6.perf.lab:31010]
> at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:528)
> at 
> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:600)
> at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1288)
> at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:61)
> at 
> org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:667)
> at 
> org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1109)
> at 
> org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1120)
> at 
> org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:675)
> at 
> org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:196)
> at 
> org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156)
> at 
> org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:227)
> at PipSQueak.executeQuery(PipSQueak.java:289)
> at PipSQueak.runTest(PipSQueak.java:104)
> at PipSQueak.main(PipSQueak.java:477)
> Caused by: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
> ERROR: UnrecognizedPropertyException: Unrecognized field "outgoingBatchSize" 
> (class org.apache.drill.exec.physical.config.HashPartitionSender), not marked 
> as ignorable (9 known properties: "receiver-major-fragment", 
> "initialAllocation", "expr", "userName", "@id", "child", "cost", 
> "destinations", "maxAllocation"])
>  at [Source: (StringReader); line: 1000, column: 29] (through reference 
> chain: 
> org.apache.drill.exec.physical.config.HashPartitionSender["outgoingBatchSize"])
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-6567) Jenkins Regression: TPCDS query 93 fails with INTERNAL_ERROR ERROR: java.lang.reflect.UndeclaredThrowableException.

2018-11-20 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou resolved DRILL-6567.
---
Resolution: Fixed
  Assignee: Vitalii Diravka  (was: Robert Hou)

> Jenkins Regression: TPCDS query 93 fails with INTERNAL_ERROR ERROR: 
> java.lang.reflect.UndeclaredThrowableException.
> ---
>
> Key: DRILL-6567
> URL: https://issues.apache.org/jira/browse/DRILL-6567
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Vitalii Diravka
>Priority: Critical
> Fix For: 1.15.0
>
>
> This is TPCDS Query 93.
> Query: 
> /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query93.sql
> SELECT ss_customer_sk,
> Sum(act_sales) sumsales
> FROM   (SELECT ss_item_sk,
> ss_ticket_number,
> ss_customer_sk,
> CASE
> WHEN sr_return_quantity IS NOT NULL THEN
> ( ss_quantity - sr_return_quantity ) * ss_sales_price
> ELSE ( ss_quantity * ss_sales_price )
> END act_sales
> FROM   store_sales
> LEFT OUTER JOIN store_returns
> ON ( sr_item_sk = ss_item_sk
> AND sr_ticket_number = ss_ticket_number ),
> reason
> WHERE  sr_reason_sk = r_reason_sk
> AND r_reason_desc = 'reason 38') t
> GROUP  BY ss_customer_sk
> ORDER  BY sumsales,
> ss_customer_sk
> LIMIT 100;
> Here is the stack trace:
> 2018-06-29 07:00:32 INFO  DrillTestLogger:348 - 
> Exception:
> java.sql.SQLException: INTERNAL_ERROR ERROR: 
> java.lang.reflect.UndeclaredThrowableException
> Setup failed for null
> Fragment 4:56
> [Error Id: 3c72c14d-9362-4a9b-affb-5cf937bed89e on atsqa6c82.qa.lab:31010]
>   (org.apache.drill.common.exceptions.ExecutionSetupException) 
> java.lang.reflect.UndeclaredThrowableException
> 
> org.apache.drill.common.exceptions.ExecutionSetupException.fromThrowable():30
> org.apache.drill.exec.store.hive.readers.HiveAbstractReader.setup():327
> org.apache.drill.exec.physical.impl.ScanBatch.getNextReaderIfHas():245
> org.apache.drill.exec.physical.impl.ScanBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema():118
> org.apache.drill.exec.record.AbstractRecordBatch.next():152
> org.apache.drill.exec.physical.impl.BaseRootExec.next():103
> 
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
> org.apache.drill.exec.physical.impl.BaseRootExec.next():93
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.

Re: [ANNOUNCE] New Committer: Hanumath Rao Maduri

2018-11-01 Thread Robert Hou
Congratulations, Hanu.  Thanks for contributing to Drill.

--Robert

On Thu, Nov 1, 2018 at 4:06 PM Jyothsna Reddy 
wrote:

> Congrats Hanu!! Well deserved :D
>
> Thank you,
> Jyothsna
>
> On Thu, Nov 1, 2018 at 2:15 PM Sorabh Hamirwasia 
> wrote:
>
> > Congratulations Hanu!
> >
> > Thanks,
> > Sorabh
> >
> > On Thu, Nov 1, 2018 at 1:35 PM Hanumath Rao Maduri 
> > wrote:
> >
> > > Thank you all for the wishes!
> > >
> > > Thanks,
> > > -Hanu
> > >
> > > On Thu, Nov 1, 2018 at 1:28 PM Chunhui Shi
> > > wrote:
> > >
> > > > Congratulations Hanu!
> > > > --
> > > > From:Arina Ielchiieva 
> > > > Send Time:2018 Nov 1 (Thu) 06:05
> > > > To:dev ; user 
> > > > Subject:[ANNOUNCE] New Committer: Hanumath Rao Maduri
> > > >
> > > > The Project Management Committee (PMC) for Apache Drill has invited
> > > > Hanumath
> > > > Rao Maduri to become a committer, and we are pleased to announce that
> > he
> > > > has accepted.
> > > >
> > > > Hanumath became a contributor in 2017, making changes mostly in the
> > Drill
> > > > planning side, including lateral / unnest support. He is also one of
> > the
> > > > contributors of index based planning and execution support.
> > > >
> > > > Welcome Hanumath, and thank you for your contributions!
> > > >
> > > > - Arina
> > > > (on behalf of Drill PMC)
> > > >
> > >
> >
>


[jira] [Created] (DRILL-6787) Update Spnego webpage

2018-10-09 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6787:
-

 Summary: Update Spnego webpage
 Key: DRILL-6787
 URL: https://issues.apache.org/jira/browse/DRILL-6787
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.14.0
Reporter: Robert Hou
Assignee: Bridget Bevens
 Fix For: 1.15.0


A few things should be updated on this webpage:
https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/

When configuring drillbits in drill-override.conf, the principal and keytab 
should be corrected; this applies in two places.
{noformat}
drill.exec.http: {
  auth.spnego.principal:"HTTP/hostname@realm",
  auth.spnego.keytab:"path/to/keytab",
  auth.mechanisms: ["SPNEGO"]
}
{noformat}
For the section on Chrome, we should change "hostname/domain" to "domain" (or 
"hostname@domain").  Also, the two blanks around the "=" should be removed.
{noformat}
google-chrome --auth-server-whitelist="hostname/domain"
{noformat}
Also, for the section on Chrome, the "domain" should match the URL given to 
Chrome to access the Web UI.

Also, Linux and Mac should be treated in separate paragraphs.  These should be 
the directions for Mac:
{noformat}
cd /Applications/Google Chrome.app/Contents/MacOS
./"Google Chrome" --auth-server-whitelist="example.com"
{noformat}
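
The corresponding Linux directions would presumably be the single corrected 
command:
{noformat}
google-chrome --auth-server-whitelist="example.com"
{noformat}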



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] New Committer: Chunhui Shi

2018-09-29 Thread Robert Hou
Congratulations, Chun-hui.  Thanks for contributing to Drill.

--Robert

On Sat, Sep 29, 2018 at 11:47 AM rahul challapalli <
challapallira...@gmail.com> wrote:

> Congratulations Chunhui!
>
> On Sat, Sep 29, 2018, 11:39 AM Kunal Khatua  wrote:
>
> > Congratulations, Chunhui !!
> > On 9/28/2018 7:31:44 PM, Chunhui Shi 
> > wrote:
> > Thank you Arina, PMCs, and every driller friends! I deeply appreciate the
> > opportunity to be part of this global growing community of awesome
> > developers.
> >
> > Best regards,
> > Chunhui
> >
> >
> > --
> > From:Arina Ielchiieva
> > Send Time:2018 Sep 28 (Fri) 02:17
> > To:dev ; user
> > Subject:[ANNOUNCE] New Committer: Chunhui Shi
> >
> > The Project Management Committee (PMC) for Apache Drill has invited
> Chunhui
> > Shi to become a committer, and we are pleased to announce that he has
> > accepted.
> >
> > Chunhui Shi has become a contributor since 2016, making changes in
> various
> > Drill areas. He has shown profound knowledge in Drill planning side
> during
> > his work to support lateral join. He is also one of the contributors of
> the
> > upcoming feature to support index based planning and execution.
> >
> > Welcome Chunhui, and thank you for your contributions!
> >
> > - Arina
> > (on behalf of Drill PMC)
> >
> >
>


Re: [ANNOUNCE] New Committer: Weijie Tong

2018-08-31 Thread Robert Hou
Congrats Weijie!  Thanks for working on Drill.

--Robert

On Fri, Aug 31, 2018 at 1:38 PM, Boaz Ben-Zvi  wrote:

>    Congrats Weijie - and thanks for implementing the Bloom Filters for
> Drill.
>
>  Boaz
>
>
> On 8/31/18 1:04 PM, Aman Sinha wrote:
>
>> Congratulations Weijie ! Thanks for your contributions.
>>
>> On Fri, Aug 31, 2018 at 11:58 AM salim achouche 
>> wrote:
>>
>> Congrats  Weijie!
>>>
>>> On Fri, Aug 31, 2018 at 10:28 AM Paul Rogers 
>>> wrote:
>>>
>>> Congratulations Weijie, thanks for your contributions to Drill.
 Thanks,
 - Paul



  On Friday, August 31, 2018, 8:51:30 AM PDT, Arina Ielchiieva <
 ar...@apache.org> wrote:

   The Project Management Committee (PMC) for Apache Drill has invited

>>> Weijie
>>>
 Tong to become a committer, and we are pleased to announce that he has
 accepted.

 Weijie Tong has become a very active contributor to Drill in recent

>>> months.
>>>
 He contributed the Join predicate push down feature which will be

>>> available
>>>
 in Apache Drill 1.15. The feature is non trivial and has covered changes
 to all aspects of Drill: RPC layer, Planning, and Execution.

 Welcome Weijie, and thank you for your contributions!

 - Arina
 (on behalf of Drill PMC)


>>>
>>> --
>>> Regards,
>>> Salim
>>>
>>>
>


[jira] [Created] (DRILL-6726) Drill should return a better error message when a view uses a table that has a mixed case schema

2018-08-31 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6726:
-

 Summary: Drill should return a better error message when a view 
uses a table that has a mixed case schema
 Key: DRILL-6726
 URL: https://issues.apache.org/jira/browse/DRILL-6726
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Robert Hou
Assignee: Arina Ielchiieva
 Fix For: 1.15.0


Drill 1.14 changes schemas to be case-insensitive (DRILL-6492). If a view 
references a schema which has upper case letters, the view needs to be rebuilt. 
For example:
{noformat}
create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from 
`dfs.drillTestDirP1`.student;
{noformat}
If a query references this schema, Drill will return an exception:
{noformat}
java.sql.SQLException: VALIDATION ERROR: Failure while attempting to expand 
view. Requested schema drillTestDirP1 not available in schema dfs.
{noformat}
It would be helpful to users if the error message explains that these views 
need to be re-created.
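
For illustration, a minimal sketch of the remediation such a message could point 
to is re-running the view definition above on the upgraded cluster (assuming the 
underlying table is unchanged):
{noformat}
create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from 
`dfs.drillTestDirP1`.student;
{noformat}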



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6725) Views cannot use tables with mixed case schemas

2018-08-31 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6725:
-

 Summary: Views cannot use tables with mixed case schemas
 Key: DRILL-6725
 URL: https://issues.apache.org/jira/browse/DRILL-6725
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.14.0
Reporter: Robert Hou
Assignee: Bridget Bevens
 Fix For: 1.14.0


Drill 1.14 changes schemas to be case-insensitive  (DRILL-6492).  If a view 
references a schema which has upper case letters, the view needs to be rebuilt. 
 For example:

create or replace view `dfs.drillTestDirP1`.student_parquet_v as select * from 
`dfs.drillTestDirP1`.student;

Do we have release notes?  If so, this should be documented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] New PMC member: Volodymyr Vysotskyi

2018-08-26 Thread Robert Hou
Congratulations, Volodymyr!  Thank you for all your work on Drill.

On Sat, Aug 25, 2018 at 2:38 PM, Timothy Farkas  wrote:

> Congratulations, Volodymyr!
>
> On Sat, Aug 25, 2018 at 9:00 AM, Kunal Khatua  wrote:
>
> > Congratulations, Volodymyr!
> > On 8/25/2018 6:32:07 AM, weijie tong  wrote:
> > Congratulations Volodymyr!
> >
> > On Sat, Aug 25, 2018 at 8:30 AM salim achouche wrote:
> >
> > > Congrats Volodymyr!
> > >
> > > On Fri, Aug 24, 2018 at 11:32 AM Gautam Parai wrote:
> > >
> > > > Congratulations Vova!
> > > >
> > > > Gautam
> > > >
> > > > On Fri, Aug 24, 2018 at 10:59 AM, Khurram Faraaz
> > > wrote:
> > > >
> > > > > Congratulations Volodymyr!
> > > > >
> > > > > Regards,
> > > > > Khurram
> > > > >
> > > > > On Fri, Aug 24, 2018 at 10:25 AM, Hanumath Rao Maduri
> > > > hanu@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Congratulations Volodymyr!
> > > > > >
> > > > > > Thanks,
> > > > > > -Hanu
> > > > > >
> > > > > > On Fri, Aug 24, 2018 at 10:22 AM Paul Rogers
> > >
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Congratulations Volodymyr!
> > > > > > > Thanks,
> > > > > > > - Paul
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Friday, August 24, 2018, 5:53:25 AM PDT, Arina Ielchiieva
> > > > > > > ar...@apache.org> wrote:
> > > > > > >
> > > > > > > I am pleased to announce that Drill PMC invited Volodymyr
> > > Vysotskyi
> > > > to
> > > > > > the
> > > > > > > PMC and he has accepted the invitation.
> > > > > > >
> > > > > > > Congratulations Vova and thanks for your contributions!
> > > > > > >
> > > > > > > - Arina
> > > > > > > (on behalf of Drill PMC)
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


[jira] [Created] (DRILL-6710) Drill C++ Client does not handle scale = 0 properly for decimal

2018-08-23 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6710:
-

 Summary: Drill C++ Client does not handle scale = 0 properly for 
decimal
 Key: DRILL-6710
 URL: https://issues.apache.org/jira/browse/DRILL-6710
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Robert Hou
Assignee: Sorabh Hamirwasia
 Fix For: 1.15.0


Query is:
select cast('99' as decimal(18,0)) + 
cast('9' as decimal(38,0)) from data limit 1

This is the error I get when my test program calls SQLExecDirect:

The driver reported the following diagnostics whilst running SQLExecDirect

HY000:1:40140:[MapR][Support] (40140) Scale can't be less than zero.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6709) Batch statistics logging utility needs to be extended to mid-stream operators

2018-08-23 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6709:
-

 Summary: Batch statistics logging utility needs to be extended to 
mid-stream operators
 Key: DRILL-6709
 URL: https://issues.apache.org/jira/browse/DRILL-6709
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Robert Hou
Assignee: salim achouche
 Fix For: 1.15.0


A new batch logging utility has been created to log batch sizing messages to 
drillbit.log. It is being used by the Parquet reader. It needs to be enhanced 
so it can be used by mid-stream operators. In particular, mid-stream operators 
have both incoming batches and outgoing batches, while Parquet only has 
outgoing batches. So the utility needs to support incoming batches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] New PMC member: Boaz Ben-Zvi

2018-08-17 Thread Robert Hou
Congratulations, Boaz!  Thanks for working on Drill.

--Robert

On Fri, Aug 17, 2018 at 4:45 PM, Padma Penumarthy <
penumarthy.pa...@gmail.com> wrote:

> Congratulations Boaz.
>
> Thanks
> Padma
>
>
> On Fri, Aug 17, 2018 at 2:33 PM, Robert Wu  wrote:
>
> > Congratulations, Boaz!
> >
> > Best regards,
> >
> > Rob
> >
> > -Original Message-
> > From: Abhishek Girish 
> > Sent: Friday, August 17, 2018 2:17 PM
> > To: dev 
> > Subject: Re: [ANNOUNCE] New PMC member: Boaz Ben-Zvi
> >
> > Congratulations, Boaz!
> >
> > On Fri, Aug 17, 2018 at 2:15 PM Sorabh Hamirwasia 
> > wrote:
> >
> > > Congratulations Boaz!
> > >
> > > On Fri, Aug 17, 2018 at 11:42 AM, Karthikeyan Manivannan <
> > > kmanivan...@mapr.com> wrote:
> > >
> > > > Congrats! Well deserved!
> > > >
> > > > On Fri, Aug 17, 2018, 11:31 AM Timothy Farkas 
> > wrote:
> > > >
> > > > > Congrats!
> > > > >
> > > > > On Fri, Aug 17, 2018 at 11:27 AM, Gautam Parai 
> > > wrote:
> > > > >
> > > > > > Congratulations Boaz!!
> > > > > >
> > > > > > Gautam
> > > > > >
> > > > > > On Fri, Aug 17, 2018 at 11:04 AM, Khurram Faraaz
> > > > > > 
> > > > > wrote:
> > > > > >
> > > > > > > Congratulations Boaz.
> > > > > > >
> > > > > > > On Fri, Aug 17, 2018 at 10:47 AM, shi.chunhui <
> > > > > > > shi.chun...@aliyun.com.invalid> wrote:
> > > > > > >
> > > > > > > > Congrats Boaz!
> > > > > > > >
> > > --
> > > > > > > > Sender:Arina Ielchiieva  Sent at:2018 Aug
> > > > > > > > 17 (Fri) 17:51 To:dev ; user
> > > > > > > >  Subject:[ANNOUNCE] New PMC member:
> > > > > > > > Boaz Ben-Zvi
> > > > > > > >
> > > > > > > > I am pleased to announce that Drill PMC invited Boaz Ben-Zvi
> > > > > > > > to
> > > the
> > > > > PMC
> > > > > > > and
> > > > > > > > he has accepted the invitation.
> > > > > > > >
> > > > > > > > Congratulations Boaz and thanks for your contributions!
> > > > > > > >
> > > > > > > > - Arina
> > > > > > > > (on behalf of Drill PMC)
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


[jira] [Created] (DRILL-6688) Data batches for Project operator exceed the maximum specified

2018-08-14 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6688:
-

 Summary: Data batches for Project operator exceed the maximum 
specified
 Key: DRILL-6688
 URL: https://issues.apache.org/jira/browse/DRILL-6688
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Robert Hou
Assignee: Karthikeyan Manivannan
 Fix For: 1.15.0


I ran this query:
alter session set `drill.exec.memory.operator.project.output_batch_size` = 
131072;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.width.max_per_query` = 1;
select
chr(101) CharacterValuea,
chr(102) CharacterValueb,
chr(103) CharacterValuec,
chr(104) CharacterValued,
chr(105) CharacterValuee
from dfs.`/drill/testdata/batch_memory/character5_1MB.parquet`;

The output has 1024 identical lines:
e f g h i

There is one incoming batch:
2018-08-09 15:50:14,794 [24933ad8-a5e2-73f1-90dd-947fc2938e54:frag:0:0] DEBUG 
o.a.d.e.p.i.p.ProjectMemoryManager - BATCH_STATS, incoming: Batch size:

{ Records: 60000, Total size: 0, Data size: 300000, Gross row width: 0, Net row 
width: 5, Density: 0% }
Batch schema & sizes:

{ `_DEFAULT_COL_TO_READ_`(type: OPTIONAL INT, count: 60000, Per entry: std data 
size: 4, std net size: 5, actual data size: 4, actual net size: 5 Totals: data 
size: 240000, net size: 300000) }
}

There are four outgoing batches. All are too large. The first three look like 
this:
2018-08-09 15:50:14,799 [24933ad8-a5e2-73f1-90dd-947fc2938e54:frag:0:0] DEBUG 
o.a.d.e.p.i.p.ProjectRecordBatch - BATCH_STATS, outgoing: Batch size:

{ Records: 16383, Total size: 0, Data size: 409575, Gross row width: 0, Net row 
width: 25, Density: 0% }
Batch schema & sizes:

{ CharacterValuea(type: REQUIRED VARCHAR, count: 16383, Per entry: std data 
size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: 
data size: 16383, net size: 81915) }
CharacterValueb(type: REQUIRED VARCHAR, count: 16383, Per entry: std data size: 
50, std net size: 54, actual data size: 1, actual net size: 5 Totals: data 
size: 16383, net size: 81915) }
CharacterValuec(type: REQUIRED VARCHAR, count: 16383, Per entry: std data size: 
50, std net size: 54, actual data size: 1, actual net size: 5 Totals: data 
size: 16383, net size: 81915) }
CharacterValued(type: REQUIRED VARCHAR, count: 16383, Per entry: std data size: 
50, std net size: 54, actual data size: 1, actual net size: 5 Totals: data 
size: 16383, net size: 81915) }
CharacterValuee(type: REQUIRED VARCHAR, count: 16383, Per entry: std data size: 
50, std net size: 54, actual data size: 1, actual net size: 5 Totals: data 
size: 16383, net size: 81915) }
}

The last batch is smaller because it has the remaining records.

The data size (409575) exceeds the maximum batch size (131072).
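
As a rough check: with a net row width of 25 bytes, the 131072-byte limit allows 
about 131072 / 25 = 5242 records per batch, yet each of the first three batches 
carries 16383 records (409575 / 131072 = about 3.1x the configured maximum).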

character415.q



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6682) Cast integer to binary returns incorrect result

2018-08-10 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6682:
-

 Summary: Cast integer to binary returns incorrect result
 Key: DRILL-6682
 URL: https://issues.apache.org/jira/browse/DRILL-6682
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.12.0
Reporter: Robert Hou
Assignee: Pritesh Maker


This query returns an empty binary string:
select cast(123 as binary) from (values(1));

The same problem occurs for bigint, float and double.
Casting works if the data type is date, time, timestamp, interval, varchar and 
binary.

select cast(date '2018-08-10' as binary) from (values(1));

select length(string_binary(cast(123 as binary))), 
length(string_binary(cast(date '2018-08-10' as binary))) from (values(1));
+---------+---------+
| EXPR$0  | EXPR$1  |
+---------+---------+
| 0       | 10      |
+---------+---------+
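
Since varchar-to-binary casting works per the above, one untested workaround 
sketch is to route the integer through varchar first (the varchar length here is 
an illustrative choice):

select string_binary(cast(cast(123 as varchar(10)) as binary)) from (values(1));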




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6623) Drill encounters exception IndexOutOfBoundsException: writerIndex: -8373248 (expected: readerIndex(0) <= writerIndex <= capacity(32768))

2018-07-19 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6623:
-

 Summary: Drill encounters exception IndexOutOfBoundsException: 
writerIndex: -8373248 (expected: readerIndex(0) <= writerIndex <= 
capacity(32768))
 Key: DRILL-6623
 URL: https://issues.apache.org/jira/browse/DRILL-6623
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Robert Hou
Assignee: Pritesh Maker


This is the query:
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.width.max_per_query` = 1;
select * from (
select
split_part(CharacterValuea, '8', 1) CharacterValuea,
split_part(CharacterValueb, '8', 1) CharacterValueb,
split_part(CharacterValuec, '8', 2) CharacterValuec,
split_part(CharacterValued, '8', 3) CharacterValued,
split_part(CharacterValuee, 'b', 1) CharacterValuee
from (select * from 
dfs.`/drill/testdata/batch_memory/character5_1MB_1GB.parquet` order by 
CharacterValuea) d where d.CharacterValuea = '1234567890123110');

The query works with a smaller table.

This is the stack trace:
{noformat}
2018-07-19 16:59:48,803 [24aedae9-d1f3-8e12-2e1f-0479915c61b1:frag:0:0] ERROR 
o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IndexOutOfBoundsException: 
writerIndex: -8373248 (expected: readerIndex(0) <= writerIndex <= 
capacity(32768))

Fragment 0:0

[Error Id: edc75560-41ca-4fdd-907f-060be1795786 on qa-node186.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
IndexOutOfBoundsException: writerIndex: -8373248 (expected: readerIndex(0) <= 
writerIndex <= capacity(32768))

Fragment 0:0

[Error Id: edc75560-41ca-4fdd-907f-060be1795786 on qa-node186.qa.lab:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
 ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
 [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
 [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327)
 [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[na:1.8.0_161]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_161]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
Caused by: java.lang.IndexOutOfBoundsException: writerIndex: -8373248 
(expected: readerIndex(0) <= writerIndex <= capacity(32768))
at 
io.netty.buffer.AbstractByteBuf.writerIndex(AbstractByteBuf.java:104) 
~[netty-buffer-4.0.48.Final.jar:4.0.48.Final]
at 
org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount(VarCharVector.java:810)
 ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.vector.NullableVarCharVector$Mutator.setValueCount(NullableVarCharVector.java:641)
 ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setValueCount(ProjectRecordBatch.java:329)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:242)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:117)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:142)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:142)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java

Re: [ANNOUNCE] New PMC Chair of Apache Drill

2018-07-18 Thread Robert Hou
Congratulations, Arina!

--Robert

On Wed, Jul 18, 2018 at 9:12 PM, Sorabh Hamirwasia 
wrote:

> Congratulations Arina!
>
> On Wed, Jul 18, 2018 at 6:13 PM, Charles Givre  wrote:
>
> > Congrats Arina!! Well done!
> >
> > > On Jul 18, 2018, at 20:59, Paul Rogers 
> > wrote:
> > >
> > > Congratulations Arina!
> > >
> > > - Paul
> > >
> > >
> > >
> > >On Wednesday, July 18, 2018, 2:19:44 PM PDT, Aman Sinha <
> > amansi...@apache.org> wrote:
> > >
> > > Drill developers,
> > > Time flies and it is time for a new PMC chair !  Thank you all for your
> > > support during the past year.
> > >
> > > I am very pleased to announce that the Drill PMC has voted to elect
> Arina
> > > Ielchiieva as the new PMC chair of Apache Drill.  She has also been
> > > approved unanimously by the Apache Board in today's board meeting.
> > Please
> > > join me in congratulating Arina !
> > >
> > > Thanks,
> > > Aman
> >
> >
>


[jira] [Resolved] (DRILL-6605) TPCDS-84 Query does not return any rows

2018-07-18 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou resolved DRILL-6605.
---
Resolution: Fixed

> TPCDS-84 Query does not return any rows
> ---
>
> Key: DRILL-6605
> URL: https://issues.apache.org/jira/browse/DRILL-6605
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>        Reporter: Robert Hou
>    Assignee: Robert Hou
>Priority: Major
> Attachments: drillbit.log.node80, drillbit.log.node81, 
> drillbit.log.node82, drillbit.log.node83, drillbit.log.node85, 
> drillbit.log.node86, drillbit.log.node87, drillbit.log.node88
>
>
> Query is:
> Advanced/tpcds/tpcds_sf100/hive/parquet/query84.sql
> This uses the hive parquet reader.
> {code:sql}
> SELECT c_customer_id   AS customer_id,
> c_last_name
> || ', '
> || c_first_name AS customername
> FROM   customer,
> customer_address,
> customer_demographics,
> household_demographics,
> income_band,
> store_returns
> WHERE  ca_city = 'Green Acres'
> AND c_current_addr_sk = ca_address_sk
> AND ib_lower_bound >= 54986
> AND ib_upper_bound <= 54986 + 5
> AND ib_income_band_sk = hd_income_band_sk
> AND cd_demo_sk = c_current_cdemo_sk
> AND hd_demo_sk = c_current_hdemo_sk
> AND sr_cdemo_sk = cd_demo_sk
> ORDER  BY c_customer_id
> LIMIT 100
> {code}
> This query should return 100 rows.  It does not return any rows.
> Here is the explain plan:
> {noformat}
> | 00-00    Screen
> 00-01  Project(customer_id=[$0], customername=[$1])
> 00-02SelectionVectorRemover
> 00-03  Limit(fetch=[100])
> 00-04SingleMergeExchange(sort0=[0])
> 01-01  OrderedMuxExchange(sort0=[0])
> 02-01SelectionVectorRemover
> 02-02  TopN(limit=[100])
> 02-03HashToRandomExchange(dist0=[[$0]])
> 03-01  Project(customer_id=[$0], customername=[||(||($5, 
> ', '), $4)])
> 03-02Project(c_customer_id=[$1], 
> c_current_cdemo_sk=[$2], c_current_hdemo_sk=[$3], c_current_addr_sk=[$4], 
> c_first_name=[$5], c_last_name=[$6], ca_address_sk=[$8], ca_city=[$9], 
> cd_demo_sk=[$7], hd_demo_sk=[$10], hd_income_band_sk=[$11], 
> ib_income_band_sk=[$12], ib_lower_bound=[$13], ib_upper_bound=[$14], 
> sr_cdemo_sk=[$0])
> 03-03  HashJoin(condition=[=($7, $0)], 
> joinType=[inner])
> 03-05HashToRandomExchange(dist0=[[$0]])
> 04-01  Scan(groupscan=[HiveScan 
> [table=Table(dbName:tpcds100_parquet, tableName:store_returns), 
> columns=[`sr_cdemo_sk`], numPartitions=0, partitions= null, 
> inputDirectories=[maprfs:/drill/testdata/tpcds_sf100/parquet/web_returns], 
> confProperties={}]])
> 03-04HashToRandomExchange(dist0=[[$6]])
> 05-01  HashJoin(condition=[=($2, $9)], 
> joinType=[inner])
> 05-03HashJoin(condition=[=($3, $7)], 
> joinType=[inner])
> 05-05  HashJoin(condition=[=($1, $6)], 
> joinType=[inner])
> 05-07Scan(groupscan=[HiveScan 
> [table=Table(dbName:tpcds100_parquet, tableName:customer), 
> columns=[`c_customer_id`, `c_current_cdemo_sk`, `c_current_hdemo_sk`, 
> `c_current_addr_sk`, `c_first_name`, `c_last_name`], numPartitions=0, 
> partitions= null, 
> inputDirectories=[maprfs:/drill/testdata/tpcds_sf100/parquet/customer], 
> confProperties={}]])
> 05-06BroadcastExchange
> 06-01  Scan(groupscan=[HiveScan 
> [table=Table(dbName:tpcds100_parquet, tableName:customer_demographics), 
> columns=[`cd_demo_sk`], numPartitions=0, partitions= null, 
> inputDirectories=[maprfs:/drill/testdata/tpcds_sf100/parquet/customer_demographics],
>  confProperties={}]])
> 05-04  BroadcastExchange
> 07-01SelectionVectorRemover
> 07-02  Filter(condition=[=($1, 'Green 
> Acres')])
> 07-03Scan(groupscan=[HiveScan 
> [table=Table(dbName:tpcds100_parquet, tableName:customer_address), 
> columns=[`ca_address_sk`, `ca_city`], numPartitions=0, partitions= null, 
> inputDirectories=[maprfs:/drill/testdata/tpcds_sf100/parquet/customer_address],
>  confProperties={}]])
> 05-02BroadcastExchange
> 08-01  HashJoin(co

[jira] [Created] (DRILL-6605) Query does not return any rows

2018-07-12 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6605:
-

 Summary: Query does not return any rows
 Key: DRILL-6605
 URL: https://issues.apache.org/jira/browse/DRILL-6605
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.13.0
Reporter: Robert Hou
Assignee: Pritesh Maker
 Fix For: 1.15.0


Query is:
Advanced/tpcds/tpcds_sf100/hive/parquet/query84.sql

This uses the hive parquet reader.

SELECT c_customer_id   AS customer_id,
c_last_name
|| ', '
|| c_first_name AS customername
FROM   customer,
customer_address,
customer_demographics,
household_demographics,
income_band,
store_returns
WHERE  ca_city = 'Green Acres'
AND c_current_addr_sk = ca_address_sk
AND ib_lower_bound >= 54986
AND ib_upper_bound <= 54986 + 5
AND ib_income_band_sk = hd_income_band_sk
AND cd_demo_sk = c_current_cdemo_sk
AND hd_demo_sk = c_current_hdemo_sk
AND sr_cdemo_sk = cd_demo_sk
ORDER  BY c_customer_id
LIMIT 100

This query should return 100 rows

commit id is:
1.14.0-SNAPSHOT a77fd142d86dd5648cda8866b8ff3af39c7b6b11DRILL-6516: 
EMIT support in streaming agg   11.07.2018 @ 18:40:03 PDT   Unknown 
12.07.2018 @ 01:50:37 PDT





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6603) Query does not return enough rows

2018-07-12 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6603:
-

 Summary: Query does not return enough rows
 Key: DRILL-6603
 URL: https://issues.apache.org/jira/browse/DRILL-6603
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Robert Hou
Assignee: Pritesh Maker
 Fix For: 1.15.0


Query is:
/root/drillAutomation/framework-master/framework/resources/Advanced/data-shapes/wide-columns/5000/10rows/parquet/q67.q

select * from widestrings where str_var is null and dec_var_prec5_sc2 between 
10 and 15

This query should return 5 rows.  It is missing 3 rows.

1664IaYIEviH tJHD 
6nF33QQJn1p4uuTELHOR2z0FCzMK35JkNeDRKCduYKUiPaXFgwftf4Ciidk2d7IXxyrCoX56Vsb 
ITcI9yxPpd3Gu6zkk2kktmZv9oHxMVE1ccVh2iGzU7greQuUEJ1oYFHGzGN9MEeKc5DqbHHT0F65NF1LE88CAudZW5bv6AiIj2D714q72g8ULd2WaazavWBQ6PgdKax
 
5kVvGkt9czWgZOH9CfT0ApOWUWZlQcvtVC2UumK6Q8tmE5f5yjKhTqvXOiistNIMo4K1NqG8U5t9V33b3h9Hk1ymyeGNMrb5Is1jB5nL9zlpyx3y46WoxV9GornIyrLw
 W4wxtVsbj2yFYuU65RdDzkNKezE0LsPtpXeEpJeFoFSP 
lF0wj8xSQg1wx5cfOMXBGNA1nvqTELCPCEzUvFj8hXQ3gANHJ9bOt7QFZhxWLlBhCevbqA40IgJntlf0cAJM6V562fpGd16Trt3mI4YQUOkf3luTVRcBJRpIdoP3ZzgvhnVrgfblboAFMZ8CzCaH7QrZf02fPtYJlBAdoJB6DMjqh6mbkphod1QGYOkE0jqLMCnKoZSpOG9Rk9dIFdlkIrvea0f1KDGAuAlYiTTsdgU4R6CowbVNfEyjIv0Wp1CXC6SzM1Vex6Ye7CrRptvn92SOQCsAElScXa1EuErruEAyIEvtWraXL5X42RxTBsH3TZTR6NVuUcpObKbVIx0kLTdbxIElf33x31QwXUfUVZ
 T4zHEpu6f4mLR6N9uLVG0Fza 
Glq3UxixhgxPXgZpQt9GqT3HJXHEn9F0KGaxhC9VCqSk119HrrJuMpHiYS34MCkw1iFhGFUsRKI3fTFaByicJeCIkjFwn2cr74lONdco4AAFdGGVN1cMgJmlOxUZE0Okv68DocVXUMSXCdcTBBmGL2h2gDIagThjo8sVXORponMNTrXEP068Zy7pNkVJyW10EoZwqE2IIcoKdixYsJvPc0mRWnk3gfSmB6uHWgKvgGq4yzzbGp3NT01z8IRYKbmSXTmLyk9rJjUYatoIi
 
757C2F0Yq0gceouo3LMaz9h4eyiC9psNiL3aoxquqrisayOjPs5esQzoY2iVmVZ7evrVCfxhe2AATFgTvk8Ek78y8s4nVNztlyluIrckfLbnOa25r1h9emJzooVV0Xj945xj5jAUHTZU9kCHKnmkcpEo0a7BdELbL0IvQlitXxbZBS86PlCltLGpLs
 fmYeUzJfpp0Cql3MAECSQQbW4ErwWScaZ5D 
rPfbbDZbF2m2ZtSPNn81G5zZBxfHgpuSm4UVrdd24NlLeG1mxwv 
zU1PbpjSCqbn8rUCWqn5LFafTrmSdtrCuFaknTpqmk1wR9cLnPF3cD xvh0EqSwvCmCTK9xCpZkJF 
4WnBX6w5vg7gQkjvF1GOqP3LeV3qbJc 
SO68S2UrCBNYQKdWyq4HeGG3TTuFF4x74nWkPPi0txEGiGDoYRxPvEQzWyhZ8SHpHZ3 
0UpHpuLWEXIO6VZlPJd4uC IaDEIaB 
rkCJ8TaIVvaBIf0t8FGY8MgXTWzKdUBkOcQawbODXRLEtdGABTnOqftRSfUSpdojmlwRIs8xJIKaxK9wSL67DKahL6E7CvDBaQx20G0o7u
 
rMaponV4OZmHE45vaeAqfLSyWlNL4UvOstiDPaDd8nI08g9MSKFtYYxt3RxvydGxCtaYfgsl3KxjN5VHnAxkvChVlvdS2Yd8IBA
 0dZwblnKUBibdQSgxcypDbRCPeAaOr169L9mrMv82w0V1Ndyt3qK 
wcpv5nKeO8P9kbVlWY9bGi9nxCVs804WBZMA9vc7AT4h7Jp0OsaHbJx0qyFyAnXP lu 
MMsOa28VxSW8thiTfIcx2qkdFN1KXrXpU4uo lxUOcJhH0HlyX6kLKhCnVqpG 
tFP93c5jJ7FdeSujFvxPgo1rQSN9DHXk4DR6nytgBrn2oGcM58zadRNaqoIL2wmWygQsnk7Euzypbg4KhlTICBl1mpb0JwbI7uaCudGcDNWIBMerY
 WgjahuC3QjIFd48o78CQSgqgQjzpHzdELrqMCKaKfdW4ihpHCA0sqNBYGQxxd 
T8iTWorOODkg5Kc7m4gPut8tuzEMOQus1xdajv9PqS8F7xwzAWyhymyYBJ8505HxZDuSFqBXSkpxGDh21fiBHkeKBC9RZp7r
 yD7i6xvRh47Vln0IxvnwcpahLltLr12yL0sDu9LXxHNAHU4gyvHud5J5xXJPD7r5xHXvtNOSiXVl 
hkBBib1k4IO9YjCgModazXNudTx2Mr8ccq6 
kNLKwnrwGdssm3JYyjBsUcXyLMHpS7vncUeKSw2rov4Hg4gTZU8sJMJMAJvu8d6IDJYMHULwrawKOhK8rDTP6sk9Hv27mCG8Gf9inG38Pik7AfnEtUIiZZozEsiSkWvAA7YiHlNDUuL3OX2FRgt2qu9T7zXtQkhon8uSv5FncUq17XB9idflAO0rWIK57HoilaXgIDrzG61kfSKZXpdKuwBVsRNmgJVDSedRsSihlcVDdZ7bmqsgzbvKhFri8lSh8ez6ttlXgF8h4wJ2985bVw5PUmLdeGjlbfrLF0f22vqGi11qz2GUltrjBmmBSrbCLpFUkwqqpATRoQEwo27qi5XwHYWWBqPN9rxF
 
orktFM5SRwG2IJmx8li8sRRchYnNYQgH7iuwKqd69jJJTwwdYla2296Lhw88YHzL60aq2XomN0BNNSoY8cALvy0QIHZpCFd3EmBojr46d6c8nBYMXJLlgKNzklk8vMTKrjAgBQevUH4U7gbQpOIWVf7Tx2BIXkdRGwQYHAuJzU5gtDuDqhuddXkGdACMmp0tgJVP2tpMW05Z3OGs6jYKb5xtqHotIJd7tUM33J85fRYOEIoGOaRblZr7RF82nSOSpPQnDgnVUhJ1j
 mCY1ofeqG7QqeV6LTdRyRPgiiPwHF1Xgpb3feAJ804NmX7xOkDPvw0WeqxrSVMCto r8E64UsRFypZ 
wtzVAlTJKgTMpzA4xeuVXuk85mpEJTIQpNxPjU3vgAacENiejcRs68Y85Ncb5ymC3fD0WAyh23VIsy 
GqaCV9hIFrAs tMM2zlkqpoBsSwgODBEsizaJkb4ZOWJj3Z2Wttr08YPpXSO6 
IhQKD5SHqNXEDNar2UVZwFZbg1YJccvsjWEtfm0AUZ 
3KHMUb3X1F3tWqIYrZucrsjUp2xfaGtqnsij4q7CRWhRucucjyKcKmiaGE7XllzVGPeHWmbtAFku355JLB2OlBXdsgWMVZFcaCOHff6OlSECOgdLGBSL297kgCVKLzDEvxS
 
T4rb5neHQffvmAHOzdIuDGw1559XGVHwzz5lLoc3iSicYlwZTKN2VUOQPHRSqTI1hMJmgTcUaO3LEHyxL2so3EedaU9BSaTaA3kPefKSdu
 ibaW3h1 
WKkznSnlmVjhLzq5e5ywYzwA26EusRtJmAAiiSrYG20uO7ejp1AlorSgOAfM9B5qxQAqaDqQMUlvhlu7SjK46egz5kK3xtcoUfyxyUwAonh3iv
 
VJPXdvxm8ZuZbnm82xLkh4MeWbClb0jH5E42m9aFp8GrSQzAwhzciocZJABwerP1sfITnG6EMyPKdl7FBIjJKjNcFOVabzQX966h6WYnAOKuaYdJWNGgKOISIcR6OwHIaUWjqV9w84VYxXutZJ1rRlbeUPT8ygTZmFk2FK2Ix02rBzt0nFkiTNmoZSilSzSOxSF
 iwtXmtDRtjrQPQCVKlZM3KrYjiJfOem8PIOA8wadL0lHN87gpEqUsrvpohZ8FRW 
ILoeDeWeBYO94JOrYv7JdirgNH7MBdmrMQOrBPpY6bdX3is62JWMm9c0Xv7jyEVdq3hkSsJLWEr4Gu8TZBfjrd9rVX0gqjlQZsk30UwEDjvtfufkYcJj2sGbJ3HzJdIh1MCHIoPb1YyacfzEvnQsnlQagfRu51vSF8qehDJ2AtCezy6hOdwberI4qgP8HMuBKRjoyN91ipykonft9himO44rJtkiREFA9opJA9jKWM8kYzICDmE2
 D3pZcmMGyUEyCY K7IEITWxzmISenhl1Ext2wzZxJoQcfLNU 8rmXNFLwxnJCEYq4bNrEn9IQw 
6xhgjw8roQVEgL8NZTxtlcve8RAyLILFdfNsvvg7qa700PCc

[jira] [Created] (DRILL-6594) Data batches for Project operator are not being split properly and exceed the maximum specified

2018-07-11 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6594:
-

 Summary: Data batches for Project operator are not being split 
properly and exceed the maximum specified
 Key: DRILL-6594
 URL: https://issues.apache.org/jira/browse/DRILL-6594
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Robert Hou
Assignee: Karthikeyan Manivannan
 Fix For: 1.14.0


I ran this query:
alter session set `drill.exec.memory.operator.project.output_batch_size` = 
131072;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.width.max_per_query` = 1;
select * from (
select
case when false then c.CharacterValuea else i.IntegerValuea end IntegerValuea,
case when false then c.CharacterValueb else i.IntegerValueb end IntegerValueb,
case when false then c.CharacterValuec else i.IntegerValuec end IntegerValuec,
case when false then c.CharacterValued else i.IntegerValued end IntegerValued,
case when false then c.CharacterValuee else i.IntegerValuee end IntegerValuee
from (select * from dfs.`/drill/testdata/batch_memory/character5_1MB.parquet` 
order by CharacterValuea) c,
dfs.`/drill/testdata/batch_memory/integer5_1MB.parquet` i
where i.Index = c.Index and
c.CharacterValuea = '1234567890123100') limit 10;

An incoming batch looks like this:
2018-06-14 19:28:10,905 [24dcdbc7-2f42-16a9-56f1-9cf58bc549bc:frag:5:0] DEBUG 
o.a.d.e.p.i.p.ProjectMemoryManager - BATCH_STATS, incoming: Batch size:

{ Records: 32768, Total size: 20512768, Data size: 9175040, Gross row width: 
626, Net row width: 280, Density: 45% }
An outgoing batch looks like this:
2018-06-14 19:28:10,911 [24dcdbc7-2f42-16a9-56f1-9cf58bc549bc:frag:5:0] DEBUG 
o.a.d.e.p.i.p.ProjectRecordBatch - BATCH_STATS, outgoing: Batch size: { 
Records: 1023, Total size: 11018240, Data size: 138105, Gross row width: 10771, 
Net row width: 135, Density: 2% }

The data size (138105) exceeds the maximum batch size (131072).
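
As a rough check: with a net row width of 135 bytes, the 131072-byte limit 
allows about 131072 / 135 = 970 records per batch; the outgoing batch carries 
1023 records (1023 * 135 = 138105 bytes, about 5% over the configured maximum).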



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6569) Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not read value at 2 in block 0 in file maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_

2018-06-29 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6569:
-

 Summary: Jenkins Regression: TPCDS query 19 fails with 
INTERNAL_ERROR ERROR: Can not read value at 2 in block 0 in file 
maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
 Key: DRILL-6569
 URL: https://issues.apache.org/jira/browse/DRILL-6569
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Robert Hou
Assignee: Pritesh Maker
 Fix For: 1.14.0


This is TPCDS Query 19.

Query: 
/root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query19.sql

SELECT i_brand_id  brand_id,
i_brand brand,
i_manufact_id,
i_manufact,
Sum(ss_ext_sales_price) ext_price
FROM   date_dim,
store_sales,
item,
customer,
customer_address,
store
WHERE  d_date_sk = ss_sold_date_sk
AND ss_item_sk = i_item_sk
AND i_manager_id = 38
AND d_moy = 12
AND d_year = 1998
AND ss_customer_sk = c_customer_sk
AND c_current_addr_sk = ca_address_sk
AND Substr(ca_zip, 1, 5) <> Substr(s_zip, 1, 5)
AND ss_store_sk = s_store_sk
GROUP  BY i_brand,
i_brand_id,
i_manufact_id,
i_manufact
ORDER  BY ext_price DESC,
i_brand,
i_brand_id,
i_manufact_id,
i_manufact
LIMIT 100;

Here is the stack trace:
2018-06-29 07:00:32 INFO  DrillTestLogger:348 - 
Exception:

java.sql.SQLException: INTERNAL_ERROR ERROR: Can not read value at 2 in block 0 
in file maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet

Fragment 4:26

[Error Id: 6401a71e-7a5d-4a10-a17c-16873fc3239b on atsqa6c88.qa.lab:31010]

  (hive.org.apache.parquet.io.ParquetDecodingException) Can not read value at 2 
in block 0 in file 
maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet

hive.org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue():243
hive.org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue():227

org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():199

org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():57

org.apache.drill.exec.store.hive.readers.HiveAbstractReader.hasNextValue():417
org.apache.drill.exec.store.hive.readers.HiveParquetReader.next():54
org.apache.drill.exec.physical.impl.ScanBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119

org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276

org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
org.apache.drill.exec.record.AbstractRecordBatch.next():152
org.apache.drill.exec.record.AbstractRecordBatch.next():119

org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276

org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
org.apache.drill.exec.record.AbstractRecordBatch.next():152
org.apache.drill.exec.record.AbstractRecordBatch.next():119

org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276

org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
org.apache.drill.exec.record.AbstractRecordBatch.next():152
org.apache.drill.exec.record.AbstractRecordBatch.next():119

org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276

org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
org.apache.drill.exec.record.AbstractRecordBatch.next():152
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next

[jira] [Created] (DRILL-6568) Jenkins Regression: TPCDS query 68 fails with IllegalStateException: Unexpected EMIT outcome received in buildSchema phase

2018-06-29 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6568:
-

 Summary: Jenkins Regression: TPCDS query 68 fails with 
IllegalStateException: Unexpected EMIT outcome received in buildSchema phase
 Key: DRILL-6568
 URL: https://issues.apache.org/jira/browse/DRILL-6568
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Robert Hou
Assignee: Khurram Faraaz
 Fix For: 1.14.0


This is TPCDS Query 68.

Query: 
/root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf1/original/maprdb/json/query68.sql

SELECT c_last_name,
c_first_name,
ca_city,
bought_city,
ss_ticket_number,
extended_price,
extended_tax,
list_price
FROM   (SELECT ss_ticket_number,
ss_customer_sk,
ca_city bought_city,
Sum(ss_ext_sales_price) extended_price,
Sum(ss_ext_list_price)  list_price,
Sum(ss_ext_tax) extended_tax
FROM   store_sales,
date_dim,
store,
household_demographics,
customer_address
WHERE  store_sales.ss_sold_date_sk = date_dim.d_date_sk
AND store_sales.ss_store_sk = store.s_store_sk
AND store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
AND store_sales.ss_addr_sk = customer_address.ca_address_sk
AND date_dim.d_dom BETWEEN 1 AND 2
AND ( household_demographics.hd_dep_count = 8
OR household_demographics.hd_vehicle_count = 3 )
AND date_dim.d_year IN ( 1998, 1998 + 1, 1998 + 2 )
AND store.s_city IN ( 'Fairview', 'Midway' )
GROUP  BY ss_ticket_number,
ss_customer_sk,
ss_addr_sk,
ca_city) dn,
customer,
customer_address current_addr
WHERE  ss_customer_sk = c_customer_sk
AND customer.c_current_addr_sk = current_addr.ca_address_sk
AND current_addr.ca_city <> bought_city
ORDER  BY c_last_name,
ss_ticket_number
LIMIT 100;

Here is the stack trace:
2018-06-29 07:00:32 INFO  DrillTestLogger:348 - 
Exception:

java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Unexpected EMIT 
outcome received in buildSchema phase

Fragment 0:0

[Error Id: edbe3477-805e-4f1f-8405-d5c194dc28c2 on atsqa6c87.qa.lab:31010]

  (java.lang.IllegalStateException) Unexpected EMIT outcome received in 
buildSchema phase
org.apache.drill.exec.physical.impl.TopN.TopNBatch.buildSchema():178
org.apache.drill.exec.record.AbstractRecordBatch.next():152
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext():87
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.physical.impl.BaseRootExec.next():103
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
org.apache.drill.exec.physical.impl.BaseRootExec.next():93
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1595
org.apache.drill.exec.work.fragment.FragmentExecutor.run():281
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748

at 
org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:528)
at 
org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:600)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1904)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:64)
at 
oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:630)
at 
org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.jav

[jira] [Created] (DRILL-6567) Jenkins Regression: TPCDS query 93 fails with INTERNAL_ERROR ERROR: java.lang.reflect.UndeclaredThrowableException.

2018-06-29 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6567:
-

 Summary: Jenkins Regression: TPCDS query 93 fails with 
INTERNAL_ERROR ERROR: java.lang.reflect.UndeclaredThrowableException.
 Key: DRILL-6567
 URL: https://issues.apache.org/jira/browse/DRILL-6567
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Robert Hou
Assignee: Pritesh Maker
 Fix For: 1.14.0


This is TPCDS Query 93.

Query: 
/root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query93.sql

SELECT ss_customer_sk,
Sum(act_sales) sumsales
FROM   (SELECT ss_item_sk,
ss_ticket_number,
ss_customer_sk,
CASE
WHEN sr_return_quantity IS NOT NULL THEN
( ss_quantity - sr_return_quantity ) * ss_sales_price
ELSE ( ss_quantity * ss_sales_price )
END act_sales
FROM   store_sales
LEFT OUTER JOIN store_returns
ON ( sr_item_sk = ss_item_sk
AND sr_ticket_number = ss_ticket_number ),
reason
WHERE  sr_reason_sk = r_reason_sk
AND r_reason_desc = 'reason 38') t
GROUP  BY ss_customer_sk
ORDER  BY sumsales,
ss_customer_sk
LIMIT 100;

Here is the stack trace:
2018-06-29 07:00:32 INFO  DrillTestLogger:348 - 
Exception:

java.sql.SQLException: INTERNAL_ERROR ERROR: 
java.lang.reflect.UndeclaredThrowableException

Setup failed for null
Fragment 4:56

[Error Id: 3c72c14d-9362-4a9b-affb-5cf937bed89e on atsqa6c82.qa.lab:31010]

  (org.apache.drill.common.exceptions.ExecutionSetupException) 
java.lang.reflect.UndeclaredThrowableException

org.apache.drill.common.exceptions.ExecutionSetupException.fromThrowable():30
org.apache.drill.exec.store.hive.readers.HiveAbstractReader.setup():327
org.apache.drill.exec.physical.impl.ScanBatch.getNextReaderIfHas():245
org.apache.drill.exec.physical.impl.ScanBatch.next():164
org.apache.drill.exec.record.AbstractRecordBatch.next():119

org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276

org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
org.apache.drill.exec.record.AbstractRecordBatch.next():152
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119

org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276

org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
org.apache.drill.exec.record.AbstractRecordBatch.next():152
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema():118
org.apache.drill.exec.record.AbstractRecordBatch.next():152
org.apache.drill.exec.physical.impl.BaseRootExec.next():103

org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
org.apache.drill.exec.physical.impl.BaseRootExec.next():93
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1595
org.apache.drill.exec.work.fragment.FragmentExecutor.run():281
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748
  Caused By (java.util.concurrent.ExecutionException) 
java.lang.reflect.UndeclaredThrowableException
java.util.concurrent.FutureTask.report():122
java.util.concurrent.FutureTask.get():192
org.apache.drill.exec.store.hive.readers.HiveAbstractReader.setup():320
org.apache.drill.exec.physical.impl.ScanBatch.getNextReaderIfHas():245
org.apache.drill.exec.physical.impl.ScanBatch.next():164
org.apache.drill.exec.record.AbstractRecordBatch.next():119

[jira] [Created] (DRILL-6566) Jenkins Regression: TPCDS query 66 fails with RESOURCE ERROR: One or more nodes ran out of memory while executing the query. AGGR OOM at First Phase.

2018-06-29 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6566:
-

 Summary: Jenkins Regression: TPCDS query 66 fails with RESOURCE 
ERROR: One or more nodes ran out of memory while executing the query.  AGGR OOM 
at First Phase.
 Key: DRILL-6566
 URL: https://issues.apache.org/jira/browse/DRILL-6566
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Robert Hou
Assignee: Pritesh Maker
 Fix For: 1.14.0


This is TPCDS Query 66.

Query: 
/root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf1/hive-generated-parquet/hive1_native/query66.sql

SELECT w_warehouse_name,
w_warehouse_sq_ft,
w_city,
w_county,
w_state,
w_country,
ship_carriers,
year1,
Sum(jan_sales) AS jan_sales,
Sum(feb_sales) AS feb_sales,
Sum(mar_sales) AS mar_sales,
Sum(apr_sales) AS apr_sales,
Sum(may_sales) AS may_sales,
Sum(jun_sales) AS jun_sales,
Sum(jul_sales) AS jul_sales,
Sum(aug_sales) AS aug_sales,
Sum(sep_sales) AS sep_sales,
Sum(oct_sales) AS oct_sales,
Sum(nov_sales) AS nov_sales,
Sum(dec_sales) AS dec_sales,
Sum(jan_sales / w_warehouse_sq_ft) AS jan_sales_per_sq_foot,
Sum(feb_sales / w_warehouse_sq_ft) AS feb_sales_per_sq_foot,
Sum(mar_sales / w_warehouse_sq_ft) AS mar_sales_per_sq_foot,
Sum(apr_sales / w_warehouse_sq_ft) AS apr_sales_per_sq_foot,
Sum(may_sales / w_warehouse_sq_ft) AS may_sales_per_sq_foot,
Sum(jun_sales / w_warehouse_sq_ft) AS jun_sales_per_sq_foot,
Sum(jul_sales / w_warehouse_sq_ft) AS jul_sales_per_sq_foot,
Sum(aug_sales / w_warehouse_sq_ft) AS aug_sales_per_sq_foot,
Sum(sep_sales / w_warehouse_sq_ft) AS sep_sales_per_sq_foot,
Sum(oct_sales / w_warehouse_sq_ft) AS oct_sales_per_sq_foot,
Sum(nov_sales / w_warehouse_sq_ft) AS nov_sales_per_sq_foot,
Sum(dec_sales / w_warehouse_sq_ft) AS dec_sales_per_sq_foot,
Sum(jan_net)   AS jan_net,
Sum(feb_net)   AS feb_net,
Sum(mar_net)   AS mar_net,
Sum(apr_net)   AS apr_net,
Sum(may_net)   AS may_net,
Sum(jun_net)   AS jun_net,
Sum(jul_net)   AS jul_net,
Sum(aug_net)   AS aug_net,
Sum(sep_net)   AS sep_net,
Sum(oct_net)   AS oct_net,
Sum(nov_net)   AS nov_net,
Sum(dec_net)   AS dec_net
FROM   (SELECT w_warehouse_name,
w_warehouse_sq_ft,
w_city,
w_county,
w_state,
w_country,
'ZOUROS'
|| ','
|| 'ZHOU' AS ship_carriers,
d_yearAS year1,
Sum(CASE
WHEN d_moy = 1 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS jan_sales,
Sum(CASE
WHEN d_moy = 2 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS feb_sales,
Sum(CASE
WHEN d_moy = 3 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS mar_sales,
Sum(CASE
WHEN d_moy = 4 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS apr_sales,
Sum(CASE
WHEN d_moy = 5 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS may_sales,
Sum(CASE
WHEN d_moy = 6 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS jun_sales,
Sum(CASE
WHEN d_moy = 7 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS jul_sales,
Sum(CASE
WHEN d_moy = 8 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS aug_sales,
Sum(CASE
WHEN d_moy = 9 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS sep_sales,
Sum(CASE
WHEN d_moy = 10 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS oct_sales,
Sum(CASE
WHEN d_moy = 11 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS nov_sales,
Sum(CASE
WHEN d_moy = 12 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS dec_sales,
Sum(CASE
WHEN d_moy = 1 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS jan_net,
Sum(CASE
WHEN d_moy = 2 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS feb_net,
Sum(CASE
WHEN d_moy = 3 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS mar_net,
Sum(CASE
WHEN d_moy = 4 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS apr_net,
Sum(CASE
WHEN d_moy = 5 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS may_net,
Sum(CASE
WHEN d_moy = 6 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS jun_net,
Sum(CASE
WHEN d_moy = 7 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS jul_net,
Sum(CASE
WHEN d_moy = 8 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS aug_net,
Sum(CASE
WHEN d_moy = 9 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS sep_net,
Sum(CASE
WHEN d_moy = 10 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS oct_net,
Sum(CASE
WHEN d_moy = 11 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS nov_net,
Sum(CASE
WHEN d_moy = 12

[jira] [Created] (DRILL-6565) cume_dist does not return enough rows

2018-06-29 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6565:
-

 Summary: cume_dist does not return enough rows
 Key: DRILL-6565
 URL: https://issues.apache.org/jira/browse/DRILL-6565
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Robert Hou
Assignee: Pritesh Maker
 Attachments: drillbit.log.7802

This query should return 64 rows but only returns 38 rows:
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.width.max_per_query` = 1;
select * from (
select cume_dist() over (order by Index) IntervalSecondValuea, Index from 
(select * from 
dfs.`/drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB_1GB.parquet` order 
by BigIntvalue)) d where d.Index = 1;

I tried to reproduce the problem with a smaller table, and also without the 
outer select statement; neither attempt reproduces it.
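
For reference on the expected semantics, cume_dist() assigns every peer row the 
same value, so all rows tied on Index = 1 should survive the outer filter.  A 
tiny hedged sketch (assuming Drill accepts the standard Calcite syntax of VALUES 
with a column alias):
{noformat}
select x, cume_dist() over (order by x) cd
from (values (1), (1), (2)) t(x);
-- both x = 1 rows get cd = 2/3, and both pass a filter on x = 1
{noformat}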

Here is the explain plan:
{noformat}
| 00-00    Screen : rowType = RecordType(DOUBLE IntervalSecondValuea, ANY Index): rowcount = 12000.0, cumulative cost = {757200.0 rows, 1.1573335922911648E7 cpu, 0.0 io, 0.0 network, 1920000.0 memory}, id = 4034
00-01      ProjectAllowDup(IntervalSecondValuea=[$0], Index=[$1]) : rowType = RecordType(DOUBLE IntervalSecondValuea, ANY Index): rowcount = 12000.0, cumulative cost = {756000.0 rows, 1.1572135922911648E7 cpu, 0.0 io, 0.0 network, 1920000.0 memory}, id = 4033
00-02        Project(w0$o0=[$1], $0=[$0]) : rowType = RecordType(DOUBLE w0$o0, ANY $0): rowcount = 12000.0, cumulative cost = {744000.0 rows, 1.1548135922911648E7 cpu, 0.0 io, 0.0 network, 1920000.0 memory}, id = 4032
00-03          SelectionVectorRemover : rowType = RecordType(ANY $0, DOUBLE w0$o0): rowcount = 12000.0, cumulative cost = {732000.0 rows, 1.1524135922911648E7 cpu, 0.0 io, 0.0 network, 1920000.0 memory}, id = 4031
00-04            Filter(condition=[=($0, 1)]) : rowType = RecordType(ANY $0, DOUBLE w0$o0): rowcount = 12000.0, cumulative cost = {720000.0 rows, 1.1512135922911648E7 cpu, 0.0 io, 0.0 network, 1920000.0 memory}, id = 4030
00-05              Window(window#0=[window(partition {} order by [0] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [CUME_DIST()])]) : rowType = RecordType(ANY $0, DOUBLE w0$o0): rowcount = 80000.0, cumulative cost = {640000.0 rows, 1.1144135922911648E7 cpu, 0.0 io, 0.0 network, 1920000.0 memory}, id = 4029
00-06                SelectionVectorRemover : rowType = RecordType(ANY $0): rowcount = 80000.0, cumulative cost = {560000.0 rows, 1.0984135922911648E7 cpu, 0.0 io, 0.0 network, 1920000.0 memory}, id = 4028
00-07                  Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY $0): rowcount = 80000.0, cumulative cost = {480000.0 rows, 1.0904135922911648E7 cpu, 0.0 io, 0.0 network, 1920000.0 memory}, id = 4027
00-08                    Project($0=[ITEM($0, 'Index')]) : rowType = RecordType(ANY $0): rowcount = 80000.0, cumulative cost = {400000.0 rows, 5692067.961455824 cpu, 0.0 io, 0.0 network, 1280000.0 memory}, id = 4026
00-09                      SelectionVectorRemover : rowType = RecordType(DYNAMIC_STAR T2¦¦**, ANY BigIntvalue): rowcount = 80000.0, cumulative cost = {320000.0 rows, 5612067.961455824 cpu, 0.0 io, 0.0 network, 1280000.0 memory}, id = 4025
00-10                        Sort(sort0=[$1], dir0=[ASC]) : rowType = RecordType(DYNAMIC_STAR T2¦¦**, ANY BigIntvalue): rowcount = 80000.0, cumulative cost = {240000.0 rows, 5532067.961455824 cpu, 0.0 io, 0.0 network, 1280000.0 memory}, id = 4024
00-11                          Project(T2¦¦**=[$0], BigIntvalue=[$1]) : rowType = RecordType(DYNAMIC_STAR T2¦¦**, ANY BigIntvalue): rowcount = 80000.0, cumulative cost = {160000.0 rows, 320000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 4023
00-12                            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB_1GB.parquet]], selectionRoot=maprfs:/drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB_1GB.parquet, numFiles=1, numRowGroups=6, usedMetadataFile=false, columns=[`**`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY BigIntvalue): rowcount = 80000.0, cumulative cost = {80000.0 rows, 160000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 4022
{noformat}

I have attached the drillbit.log.

The commit id is:
| 1.14.0-SNAPSHOT  | aa127b70b1e46f7f4aa19881f25eda583627830a  | DRILL-6523: 
Fix NPE for describe of partial schema  | 22.06.2018 @ 11:28:23 PDT  | 
r...@mapr.com  | 23.06.2018 @ 02:05:10 PDT  |

fourvarchar_asc_nulls95.q



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6547) IllegalStateException: Tried to remove unmanaged buffer.

2018-06-27 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6547:
-

 Summary: IllegalStateException: Tried to remove unmanaged buffer.
 Key: DRILL-6547
 URL: https://issues.apache.org/jira/browse/DRILL-6547
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Robert Hou
Assignee: Pritesh Maker


This is the query:
select * from (
select Index, concat(BinaryValue, 'aaa') NewVarcharValue from (select * from 
dfs.`/drill/testdata/batch_memory/alltypes_large_1MB.parquet`)) d where d.Index 
= 1;

This is the plan:
{noformat}
| 00-00    Screen
00-01  Project(Index=[$0], NewVarcharValue=[$1])
00-02SelectionVectorRemover
00-03  Filter(condition=[=($0, 1)])
00-04Project(Index=[$0], NewVarcharValue=[CONCAT($1, 'aaa')])
00-05  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=maprfs:///drill/testdata/batch_memory/alltypes_large_1MB.parquet]], 
selectionRoot=maprfs:/drill/testdata/batch_memory/alltypes_large_1MB.parquet, 
numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`Index`, 
`BinaryValue`]]])
{noformat}

Here is the stack trace from drillbit.log:
{noformat}
2018-06-27 13:55:03,291 [24cc0659-30b7-b290-7fae-ecb1c1f15c05:frag:0:0] ERROR 
o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: 
Tried to remove unmanaged buffer.

Fragment 0:0

[Error Id: bc1f2f72-c31b-4b9a-964f-96dec9e0f388 on qa-node186.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
IllegalStateException: Tried to remove unmanaged buffer.

Fragment 0:0

[Error Id: bc1f2f72-c31b-4b9a-964f-96dec9e0f388 on qa-node186.qa.lab:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
 ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
 [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
 [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327)
 [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[na:1.8.0_161]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_161]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
Caused by: java.lang.IllegalStateException: Tried to remove unmanaged buffer.
at 
org.apache.drill.exec.ops.BufferManagerImpl.replace(BufferManagerImpl.java:50) 
~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at io.netty.buffer.DrillBuf.reallocIfNeeded(DrillBuf.java:97) 
~[drill-memory-base-1.14.0-SNAPSHOT.jar:4.0.48.Final]
at 
org.apache.drill.exec.test.generated.ProjectorGen4046.doEval(ProjectorTemplate.java:77)
 ~[na:na]
at 
org.apache.drill.exec.test.generated.ProjectorGen4046.projectRecords(ProjectorTemplate.java:67)
 ~[na:na]
at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:236)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:117)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:147)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0
{noformat}
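
For context, the failure is a bookkeeping invariant rather than an
out-of-memory condition: reallocIfNeeded() asks the fragment's buffer manager
to replace a buffer, and the manager refuses when that buffer was never
registered with it.  A minimal standalone sketch of that invariant (the class
and method names mirror the stack trace, but the bodies are illustrative
assumptions, not Drill's actual code):
{noformat}
import java.util.IdentityHashMap;
import java.util.Map;

public class BufferManagerModel {
  // Buffers this manager allocated and still tracks.
  private final Map<byte[], Boolean> managed = new IdentityHashMap<>();

  public byte[] allocate(int size) {
    byte[] buf = new byte[size];
    managed.put(buf, Boolean.TRUE);   // track every buffer we hand out
    return buf;
  }

  // Mirrors BufferManagerImpl.replace(): the old buffer must be managed.
  public byte[] replace(byte[] old, int newSize) {
    if (managed.remove(old) == null) {
      throw new IllegalStateException("Tried to remove unmanaged buffer.");
    }
    return allocate(newSize);
  }

  public static void main(String[] args) {
    BufferManagerModel mgr = new BufferManagerModel();
    byte[] tracked = mgr.allocate(16);
    mgr.replace(tracked, 32);         // fine: this buffer is tracked
    byte[] foreign = new byte[16];    // e.g. a buffer owned by another operator
    mgr.replace(foreign, 32);         // throws the error above
  }
}
{noformat}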

Re: [ANNOUNCE] New PMC member: Vitalii Diravka

2018-06-26 Thread Robert Hou
Congrats, Vitalii!

--Robert

On Tue, Jun 26, 2018 at 6:17 PM, Padma Penumarthy 
wrote:

> Congrats Vitalii.
>
> Thanks
> Padma
>
>
> > On Jun 26, 2018, at 6:14 PM, Vlad Rozov  wrote:
> >
> > Congratulations Vitalii!
> >
> > Thank you,
> >
> > Vlad
> >
> > On 6/26/18 17:11, Paul Rogers wrote:
> >> Congratulations Vitalii!
> >> - Paul
> >>
> >>
> >> On Tuesday, June 26, 2018, 11:12:16 AM PDT, Aman Sinha <
> amansi...@apache.org> wrote:
> >>I am pleased to announce that Drill PMC invited Vitalii Diravka to
> the PMC
> >> and he has accepted the invitation.
> >>
> >> Congratulations Vitalii and thanks for your contributions !
> >>
> >> -Aman
> >> (on behalf of Drill PMC)
> >>
> >
>
>


Re: [ANNOUNCE] New Committer: Padma Penumarthy

2018-06-18 Thread Robert Hou
Congratulations, Padma!


--Robert


From: rahul challapalli 
Sent: Monday, June 18, 2018 1:36 PM
To: dev
Subject: Re: [ANNOUNCE] New Committer: Padma Penumarthy

Congratulations Padma!

On Mon, Jun 18, 2018 at 1:35 PM Khurram Faraaz  wrote:

> Congratulations Padma! Well deserved.
>
>
> Thanks,
>
> Khurram
>
> 
> From: Paul Rogers 
> Sent: Friday, June 15, 2018 7:50:05 PM
> To: dev@drill.apache.org
> Subject: Re: [ANNOUNCE] New Committer: Padma Penumarthy
>
> Congratulations! Well deserved, if just from the number of times you've
> reviewed my code.
>
> Thanks,
> - Paul
>
>
>
> On Friday, June 15, 2018, 9:36:44 AM PDT, Aman Sinha <
> amansi...@apache.org> wrote:
>
>  The Project Management Committee (PMC) for Apache Drill has invited Padma
> Penumarthy to become a committer, and we are pleased to announce that she
> has
> accepted.
>
> Padma has been contributing to Drill for about 1 1/2 years.  She has made
> improvements for work-unit assignment in the parallelizer, performance of
> filter operator for pattern matching and (more recently) on the batch
> sizing for several operators: Flatten, MergeJoin, HashJoin, UnionAll.
>
> Welcome Padma, and thank you for your contributions.  Keep up the good work
> !
>
> -Aman
> (on behalf of Drill PMC)
>
>


Re: [Vote] Cleaning Up Old PRs

2018-06-07 Thread Robert Hou
The Exchange PR was under active development, but there were some issues that 
could not be resolved at the time.  So it was shelved until someone could get 
some time to resolve those issues.


Thanks.


--Robert


From: Robert Hou 
Sent: Thursday, June 7, 2018 11:46 AM
To: dev@drill.apache.org
Subject: Re: [Vote] Cleaning Up Old PRs

On a related note, someone created a PR to resolve some Exchange issues a year 
ago.  It has been dormant since then, and the original author is probably not 
going to push it forward.  However, a second person has picked it up now 
because we need to resolve the issue.  There is a lot of good work in that PR, 
and it has provided a great starting point.


I'm not against cleaning up old PRs.  But I am not sure it is easy to automate 
without losing some good work.


Thanks.


--Robert


From: Dave Oshinsky 
Sent: Thursday, June 7, 2018 11:34 AM
To: dev@drill.apache.org
Subject: Re: [Vote] Cleaning Up Old PRs

Hi Tim,
Everyone's time is constrained, so I doubt that it will always be possible to 
give "timely" reviews to PR's, especially complex ones, or ones regarding 
problems that are not regarded as high priority.  I suggest these changes to 
your scheme:

1) Once a PR reaches the 3 months point, send an email to the list and directly 
to the PR creator that the PR will automatically be closed in 1 more month if 
specific actions are not taken.  The PR creator is less likely to miss an email 
that is sent directly to him/her.
2) Automatic removals should not be executed until an administrator has 
approved it.  In other words, it should not be completely automatic, without a 
human in the loop.
3) PR's that are closed (either automatically or not) should remain in the 
system for some time (with "reopen" possible), in case a mistake occurs.  It 
seems that github already supports this behavior.

As of this writing, I see 105 open PR's, 1201 closed PR's for Apache Drill.  
Perhaps I'm missing something, but why the effort to make this automatic?  Are 
there way more PR's than I'm seeing?

Thanks,
Dave O


From: Timothy Farkas 
Sent: Thursday, June 7, 2018 1:38 PM
To: dev@drill.apache.org
Subject: Re: [Vote] Cleaning Up Old PRs

Hi Dave,

I'm sorry you had a bad experience. We should do better giving timely reviews 
moving forward. I think there are some ways we can protect PRs from 
unresponsive committers while still closing PRs from unresponsive contributors. 
Here are some ideas.

 1. Have an auto responder comment on each new PR after it is opened with all 
the information a contributor needs to be successful along with all the 
information about how PRs are autoclosed and what to do to keep the PR alive. 
In this message, also encourage the contributor to spam us until we do a review.

 2. Auto labeling fresh PRs with a "needs-first-review" label (or something 
like that). PRs with this label are exempt from the auto closing process and 
the label will only be removed after a committer has looked at the PR and done 
a first round of review. This can protect a PR that had never been reviewed 
from being closed.
 3. Allow the contributor to request a "pending" label to be placed on their 
PR. This label would make their PR permanently immune to auto closing even 
after a first round of review has been completed and the "needs-first-review" 
label has been removed.

How do you feel about these protections? Do you think they would be sufficient? 
If not, do you have any alternative ideas to help improve the process?

As a note, I think our motivations are the same. We both want quality PRs to 
make it into Drill. I want to do it by removing PRs where the contributor is 
unresponsive so committers can better focus on the PRs that need attention. And 
I think you are rightfully concerned about false positives when automating this 
process. Hopefully we can find a good middle ground that everyone can be happy 
with.

Thanks,
Tim


From: Dave Oshinsky 
Sent: Wednesday, June 6, 2018 6:28:39 PM
To: dev@drill.apache.org
Subject: Re: [Vote] Cleaning Up Old PRs

Tim,
It's too restrictive, unless something can be done to educate (outsider) PR 
authors like myself to "go against the grain" and keep asking.  And asking.  
And asking.  And asking.  You get the picture?  I did all that.  And it was 
ignored.  I assumed that people outside MapR aren't welcome to contribute, 
and/or there was little interest in making decimal work properly, and/or there 
was simply nobody available to review it (what I was most comfortable 
believing), and/or my emails smelled really bad (kidding on the last one 8-).  
I asked a few times, and asked again a few times a few months later, and 
nothing.  What can you do to educate outsiders as to what they need to do to 
make sure a useful PR doesn't get flushed down the toilet?

Re: [Vote] Cleaning Up Old PRs

2018-06-07 Thread Robert Hou
On a related note, someone created a PR to resolve some Exchange issues a year 
ago.  It has been dormant since then, and the original author is probably not 
going to push it forward.  However, a second person has picked it up now 
because we need to resolve the issue.  There is a lot of good work in that PR, 
and it has provided a great starting point.


I'm not against cleaning up old PRs.  But I am not sure it is easy to automate 
without losing some good work.


Thanks.


--Robert


From: Dave Oshinsky 
Sent: Thursday, June 7, 2018 11:34 AM
To: dev@drill.apache.org
Subject: Re: [Vote] Cleaning Up Old PRs

Hi Tim,
Everyone's time is constrained, so I doubt that it will always be possible to 
give "timely" reviews to PR's, especially complex ones, or ones regarding 
problems that are not regarded as high priority.  I suggest these changes to 
your scheme:

1) Once a PR reaches the 3 months point, send an email to the list and directly 
to the PR creator that the PR will automatically be closed in 1 more month if 
specific actions are not taken.  The PR creator is less likely to miss an email 
that is sent directly to him/her.
2) Automatic removals should not be executed until an administrator has 
approved it.  In other words, it should not be completely automatic, without a 
human in the loop.
3) PR's that are closed (either automatically or not) should remain in the 
system for some time (with "reopen" possible), in case a mistake occurs.  It 
seems that github already supports this behavior.

As of this writing, I see 105 open PR's, 1201 closed PR's for Apache Drill.  
Perhaps I'm missing something, but why the effort to make this automatic?  Are 
there way more PR's than I'm seeing?

Thanks,
Dave O


From: Timothy Farkas 
Sent: Thursday, June 7, 2018 1:38 PM
To: dev@drill.apache.org
Subject: Re: [Vote] Cleaning Up Old PRs

Hi Dave,

I'm sorry you had a bad experience. We should do better giving timely reviews 
moving forward. I think there are some ways we can protect PRs from 
unresponsive committers while still closing PRs from unresponsive contributors. 
Here are some ideas.

 1. Have an auto responder comment on each new PR after it is opened with all 
the information a contributor needs to be successful along with all the 
information about how PRs are autoclosed and what to do to keep the PR alive. 
In this message, also encourage the contributor to spam us until we do a review.

 2. Auto labeling fresh PRs with a "needs-first-review" label (or something 
like that). PRs with this label are exempt from the auto closing process and 
the label will only be removed after a committer has looked at the PR and done 
a first round of review. This can protect a PR that had never been reviewed 
from being closed.
 3. Allow the contributor to request a "pending" label to be placed on their 
PR. This label would make their PR permanently immune to auto closing even 
after a first round of review has been completed and the "needs-first-review" 
label has been removed.

How do you feel about these protections? Do you think they would be sufficient? 
If not, do you have any alternative ideas to help improve the process?

As a note, I think our motivations are the same. We both want quality PRs to 
make it into Drill. I want to do it by removing PRs where the contributor is 
unresponsive so committers can better focus on the PRs that need attention. And 
I think you are rightfully concerned about false positives when automating this 
process. Hopefully we can find a good middle ground that everyone can be happy 
with.

Thanks,
Tim


From: Dave Oshinsky 
Sent: Wednesday, June 6, 2018 6:28:39 PM
To: dev@drill.apache.org
Subject: Re: [Vote] Cleaning Up Old PRs

Tim,
It's too restrictive, unless something can be done to educate (outsider) PR 
authors like myself to "go against the grain" and keep asking.  And asking.  
And asking.  And asking.  You get the picture?  I did all that.  And it was 
ignored.  I assumed that people outside MapR aren't welcome to contribute, 
and/or there was little interest in making decimal work properly, and/or there 
was simply nobody available to review it (what I was most comfortable 
believing), and/or my emails smelled really bad (kidding on the last one 8-).  
I asked a few times, and asked again a few times a few months later, and 
nothing.  What can you do to educate outsiders as to what they need to do to 
make sure a useful PR doesn't get flushed down the toilet?  I spent days 
learning some amount of Drill internals and implementing VARDECIMAL (over 70 
source files changed), and did it again months later to merge to then current 
master tip.  All ignored for quite some time.

Thanks to Volodymyr Vysotskyi for ultimately grabbing the ball and running with 
it.  That complex a change required an "insider" to bring it fully to fruition. 
 But if the PR had been 

Re: help for native drive for .NET

2018-05-22 Thread Robert Hou
I don't think there is a Drill driver for .NET.  For Windows, we have ODBC and 
JDBC.


Can you provide more information on your performance issue?


Thanks.


--Robert


From: ariolov...@gmail.com 
Sent: Monday, May 21, 2018 3:06 PM
To: dev@drill.apache.org
Subject: help for native drive for .NET

Hi,

So, today I use the ODBC driver, but it is too slow.



Do you know of a native driver for .NET?



Thanks!
Ario







[jira] [Created] (DRILL-6393) Radians should take an argument (x)

2018-05-08 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6393:
-

 Summary: Radians should take an argument (x)
 Key: DRILL-6393
 URL: https://issues.apache.org/jira/browse/DRILL-6393
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.13.0
Reporter: Robert Hou
Assignee: Bridget Bevens
 Fix For: 1.14.0


The radians function is missing an argument on this webpage:

   https://drill.apache.org/docs/math-and-trig/

The table has this information:
{noformat}
RADIANS FLOAT8  Converts x degress to radians.
{noformat}
It should be:
{noformat}
RADIANS(x)  FLOAT8  Converts x degrees to radians.
{noformat}

Also, degress is mis-spelled.  It should be degrees.
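
For reference, the conversion being documented is radians = degrees * PI / 180,
so RADIANS(180) returns 3.141592653589793.  A plain Java check of that
arithmetic (not Drill code):
{noformat}
public class RadiansCheck {
  public static void main(String[] args) {
    double degrees = 180.0;
    double radians = degrees * Math.PI / 180.0;             // 3.141592653589793
    System.out.println(radians);
    System.out.println(radians == Math.toRadians(degrees)); // true
  }
}
{noformat}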



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-5900) Regression: TPCH query encounters random IllegalStateException: Memory was leaked by query

2018-04-18 Thread Robert Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou resolved DRILL-5900.
---
Resolution: Fixed

This test is now passing.

> Regression: TPCH query encounters random IllegalStateException: Memory was 
> leaked by query
> --
>
> Key: DRILL-5900
> URL: https://issues.apache.org/jira/browse/DRILL-5900
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Timothy Farkas
>Priority: Blocker
> Attachments: 2611d7c0-b0c9-a93e-c64d-a4ef8f4baf8f.sys.drill, 
> drillbit.log.node81, drillbit.log.node88
>
>
> This is a random failure in the TPCH-SF100-baseline run.  The test is 
> /root/drillAutomation/framework-master/framework/resources/Advanced/tpch/tpch_sf1/original/parquet/query17.sql.
>   This test has passed before.
> TPCH query 17:
> {noformat}
> SELECT
>   SUM(L.L_EXTENDEDPRICE) / 7.0 AS AVG_YEARLY
> FROM
>   lineitem L,
>   part P
> WHERE
>   P.P_PARTKEY = L.L_PARTKEY
>   AND P.P_BRAND = 'BRAND#13'
>   AND P.P_CONTAINER = 'JUMBO CAN'
>   AND L.L_QUANTITY < (
> SELECT
>   0.2 * AVG(L2.L_QUANTITY)
> FROM
>   lineitem L2
> WHERE
>   L2.L_PARTKEY = P.P_PARTKEY
>   )
> {noformat}
> Error is:
> {noformat}
> 2017-10-23 10:34:55,989 [2611d7c0-b0c9-a93e-c64d-a4ef8f4baf8f:frag:8:2] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: 
> Memory was leaked by query. Memory leaked: (2097152)
> Allocator(op:8:2:6:ParquetRowGroupScan) 100/0/7675904/100 
> (res/actual/peak/limit)
> Fragment 8:2
> [Error Id: f21a2560-7259-4e13-88c2-9bac29e2930a on atsqa6c88.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Memory was leaked by query. Memory leaked: (2097152)
> Allocator(op:8:2:6:ParquetRowGroupScan) 100/0/7675904/100 
> (res/actual/peak/limit)
> Fragment 8:2
> [Error Id: f21a2560-7259-4e13-88c2-9bac29e2930a on atsqa6c88.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586)
>  ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:298)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:267)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_51]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_51]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
> Caused by: java.lang.IllegalStateException: Memory was leaked by query. 
> Memory leaked: (2097152)
> Allocator(op:8:2:6:ParquetRowGroupScan) 100/0/7675904/100 
> (res/actual/peak/limit)
> at 
> org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:519) 
> ~[drill-memory-base-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.exec.ops.AbstractOperatorExecContext.close(AbstractOperatorExecContext.java:86)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.exec.ops.OperatorContextImpl.close(OperatorContextImpl.java:108)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.exec.ops.FragmentContext.suppressingClose(FragmentContext.java:435)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.exec.ops.FragmentContext.close(FragmentContext.java:424) 
> ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:324)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:155)
>  [drill-java-exec-1.12.
> {noformat}
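
The error pattern above comes from close-time accounting: an allocator tracks
the outstanding bytes for its operator, and close() fails when any remain.  A
standalone model of that check (an illustrative assumption, not
BaseAllocator's actual code):
{noformat}
public class AllocatorCloseModel implements AutoCloseable {
  private long allocated;                    // outstanding bytes

  byte[] buffer(int size) { allocated += size; return new byte[size]; }
  void release(int size)  { allocated -= size; }

  @Override public void close() {
    if (allocated != 0) {                    // anything unreleased is a leak
      throw new IllegalStateException(
          "Memory was leaked by query. Memory leaked: (" + allocated + ")");
    }
  }

  public static void main(String[] args) {
    try (AllocatorCloseModel a = new AllocatorCloseModel()) {
      a.buffer(2097152);                     // allocated, never released
    }                                        // close() throws, as in the report
  }
}
{noformat}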

[jira] [Created] (DRILL-6276) Drill CTAS creates parquet file having page greater than 200 MB.

2018-03-19 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6276:
-

 Summary: Drill CTAS creates parquet file having page greater than 
200 MB.
 Key: DRILL-6276
 URL: https://issues.apache.org/jira/browse/DRILL-6276
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.13.0
Reporter: Robert Hou
 Attachments: alltypes_asc_16MB.json

I used this CTAS to create a parquet file from a json file:
{noformat}
create table `alltypes.parquet` as select cast(BigIntValue as BigInt) 
BigIntValue, cast(BooleanValue as Boolean) BooleanValue, cast (DateValue as 
Date) DateValue, cast (FloatValue as Float) FloatValue, cast (DoubleValue as 
Double) DoubleValue, cast (IntegerValue as Integer) IntegerValue, cast 
(TimeValue as Time) TimeValue, cast (TimestampValue as Timestamp) 
TimestampValue, cast (IntervalYearValue as INTERVAL YEAR) IntervalYearValue, 
cast (IntervalDayValue as INTERVAL DAY) IntervalDayValue, cast 
(IntervalSecondValue as INTERVAL SECOND) IntervalSecondValue, cast (BinaryValue 
as binary) Binaryvalue, cast (VarcharValue as varchar) VarcharValue from 
`alltypes.json`;
{noformat}

I ran parquet-tools/parquet-dump:
{noformat}
VarcharValue TV=6885 RL=0 DL=1
page 0:  DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:17240317 VC:6885
{noformat}

The page size is 16 MB.  This is with a 16 MB data set.  When I try a similar
1 GB data set, the page size starts at over 200 MB, decreasing to 1 MB.

{noformat}
VarcharValue TV=208513 RL=0 DL=1
page 0:   DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:215243750 VC:87433
page 1:   DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:112350266 VC:43717
page 2:   DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:52501154 VC:21859
page 3:   DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:27725498 VC:10930
page 4:   DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:12181241 VC:5466
page 5:   DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:11005971 VC:2734
page 6:   DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:1133237 VC:1797
page 7:   DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:1462803 VC:899
page 8:   DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:1050967 VC:490
page 9:   DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:1051603 VC:424
page 10:  DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:1050919 VC:378
page 11:  DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:1050487 VC:345
page 12:  DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:1050783 VC:319
page 13:  DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:1052303 VC:299
page 14:  DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:1053235 VC:282
page 15:  DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:1055979 VC:268
{noformat}

The column has a varchar, and the size varies from 2 bytes to 5000 bytes.
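
The value counts above halve from page to page (87433, 43717, 21859, ...),
which is what you get if the writer checks the accumulated page size only
every N records and halves N after an overshoot.  A standalone model of such
a heuristic (an illustrative assumption about the writer's behavior, not
Parquet's actual code), which reproduces the leading geometric pattern:
{noformat}
public class PageSizeCheckModel {
  public static void main(String[] args) {
    final long targetPageBytes = 1L << 20;   // ~1 MB page target
    final int avgValueBytes = 2462;          // ~215243750 / 87433 from page 0
    final long totalRows = 208513;           // TV=208513 above
    long nextCheck = 87433;                  // initial estimate of rows per page
    long written = 0;
    int page = 0;
    while (written < totalRows) {
      long rows = Math.min(nextCheck, totalRows - written);
      long pageBytes = rows * avgValueBytes;
      System.out.printf("page %d: SZ:%d VC:%d%n", page++, pageBytes, rows);
      written += rows;
      if (pageBytes > targetPageBytes) {
        nextCheck = Math.max(1, nextCheck / 2);  // overshot: halve the interval
      }
    }
  }
}
{noformat}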



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-6176) Drill skips a row when querying a text file but does not report it.

2018-03-08 Thread Robert Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou resolved DRILL-6176.
---
Resolution: Not A Problem

> Drill skips a row when querying a text file but does not report it.
> ---
>
> Key: DRILL-6176
> URL: https://issues.apache.org/jira/browse/DRILL-6176
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.12.0
>    Reporter: Robert Hou
>Assignee: Pritesh Maker
>Priority: Critical
> Attachments: 10.tbl
>
>
> I tried to query 10 rows from a tbl file.  It skipped the 6th row, which only 
> has special symbols in it.  So it shows 9 rows.  And there was no warning 
> that a row is skipped.
> I checked the special symbols.  The same symbols appear in other rows.
> This also occurs if the file is a csv file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] New Committer: Kunal Khatua

2018-02-28 Thread Robert Hou
Congrats Kunal!


--Robert


From: Robert Wu 
Sent: Wednesday, February 28, 2018 10:50 AM
To: dev@drill.apache.org
Subject: RE: [ANNOUNCE] New Committer: Kunal Khatua

Congratulations, Kunal!

Best regards,

Rob

-Original Message-
From: Vitalii Diravka [mailto:vitalii.dira...@gmail.com]
Sent: Wednesday, February 28, 2018 10:48 AM
To: dev@drill.apache.org
Subject: Re: [ANNOUNCE] New Committer: Kunal Khatua

Congrats, Kunal!

Kind regards
Vitalii

On Wed, Feb 28, 2018 at 6:39 PM, Timothy Farkas  wrote:

> Congrats!
>
> 
> From: Paul Rogers 
> Sent: Wednesday, February 28, 2018 9:58:32 AM
> To: dev@drill.apache.org
> Subject: Re: [ANNOUNCE] New Committer: Kunal Khatua
>
> Congrats, Kunal! Well deserved.
>
> - Paul
>
>
> > On Feb 27, 2018, at 10:42 AM, Prasad Nagaraj Subramanya <
> prasadn...@gmail.com> wrote:
> >
> > Congratulations Kunal!
> >
> >
> > On Tue, Feb 27, 2018 at 10:41 AM, Padma Penumarthy
> >  >
> > wrote:
> >
> >> Congratulations Kunal !
> >>
> >> Thanks
> >> Padma
> >>
> >>
> >>> On Feb 27, 2018, at 8:42 AM, Aman Sinha  wrote:
> >>>
> >>> The Project Management Committee (PMC) for Apache Drill has
> >>> invited
> Kunal
> >>> Khatua  to become a committer, and we are pleased to announce that
> >>> he has accepted.
> >>>
> >>> Over the last couple of years, Kunal has made substantial
> >>> contributions
> >> to
> >>> the process of creating and interpreting of query profiles, among
> >>> other code contributions. He has led the efforts for Drill
> >>> performance
> >> evaluation
> >>> and benchmarking.  He is a prolific writer on the user mailing
> >>> list, providing detailed responses.
> >>>
> >>> Welcome Kunal, and thank you for your contributions.  Keep up the
> >>> good work !
> >>>
> >>> - Aman
> >>> (on behalf of the Apache Drill PMC)
> >>
> >>
>
>


[jira] [Created] (DRILL-6178) Drill does not project extra columns in some cases

2018-02-21 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6178:
-

 Summary: Drill does not project extra columns in some cases
 Key: DRILL-6178
 URL: https://issues.apache.org/jira/browse/DRILL-6178
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.12.0
Reporter: Robert Hou
Assignee: Pritesh Maker
 Attachments: 10.tbl

Drill is supposed to project extra columns as null columns.  This table has 10 
columns.  The extra columns are shown as null:
{noformat}
0: jdbc:drill:zk=10.10.104.85:5181> select columns[0], columns[3], columns[4], 
columns[5], columns[6], columns[7], columns[8], columns[9], columns[10], 
columns[11], columns[12], columns[13], columns[14], columns[15] from 
`resource-manager/1.tbl`;
+-+-+-+-+-+-+-+-+-+-+--+--+--+--+
| EXPR$0 | EXPR$1 | EXPR$2 | EXPR$3 | EXPR$4 | EXPR$5 | EXPR$6 | EXPR$7 | 
EXPR$8 | EXPR$9 | EXPR$10 | EXPR$11 | EXPR$12 | EXPR$13 |
+-+-+-+-+-+-+-+-+-+-+--+--+--+--+
| 1 | | null | null | null | null | -61 | -255.0 | null | null | null | null | 
null | null |
+-+-+-+-+-+-+-+-+-+-+--+--+--+--+{noformat}
 

If I run the same query against a table with 10 rows and 10 columns (attached 
to the Jira), only the 10 columns are shown.

 
{noformat}
select columns[0], columns[1], columns[2], columns[3], columns[4], columns[5], 
columns[6], columns[7], columns[8], columns[9], columns[10], columns[11], 
columns[12], columns[13], columns[14], columns[15] from `10.tbl`{noformat}
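
For reference, the projection rule the second case violates: an index past the
end of the columns array should project as null rather than the column being
dropped.  A minimal model of that rule (illustrative only, not Drill's text
reader code):
{noformat}
import java.util.Arrays;

public class ProjectMissingColumns {
  static String[] project(String[] row, int[] indexes) {
    String[] out = new String[indexes.length];
    for (int i = 0; i < indexes.length; i++) {
      out[i] = indexes[i] < row.length ? row[indexes[i]] : null;  // pad with null
    }
    return out;
  }

  public static void main(String[] args) {
    String[] row = {"1", "a", "b"};          // 3 columns in the file
    System.out.println(Arrays.toString(project(row, new int[]{0, 3, 4})));
    // [1, null, null] -- extra indexes appear as null columns
  }
}
{noformat}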


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6176) Drill skips a row when querying a text file but does not report it.

2018-02-21 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6176:
-

 Summary: Drill skips a row when querying a text file but does not 
report it.
 Key: DRILL-6176
 URL: https://issues.apache.org/jira/browse/DRILL-6176
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 1.12.0
Reporter: Robert Hou
Assignee: Pritesh Maker


I tried to query 10 rows from a tbl file.  It skipped the 6th row, which only 
has special symbols in it.  So it shows 9 rows.  And there was no warning that 
a row is skipped.

I checked the special symbols.  The same symbols appear in other rows.

This also occurs if the file is a csv file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6165) Drill should support versioning between Drill clients (JDBC/ODBC) and Drill server

2018-02-16 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6165:
-

 Summary: Drill should support versioning between Drill clients 
(JDBC/ODBC) and Drill server
 Key: DRILL-6165
 URL: https://issues.apache.org/jira/browse/DRILL-6165
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - JDBC, Client - ODBC
Affects Versions: 1.12.0
Reporter: Robert Hou
Assignee: Pritesh Maker


We need to determine which versions of JDBC/ODBC drivers can be used with which 
versions of Drill server.  Due to recent improvements in security, a newer 
client had problems working with an older server.  The current solution is to 
require drill clients and drill servers to be the same version.  In some cases, 
different versions of drill clients can work with different versions of drill 
servers, but this compatibility is being determined on a version-by-version, 
feature-by-feature basis.

We need an architecture that enables this to work automatically.  In 
particular, if a new drill client requests a feature that the older drill 
server does not support, this should be handled gracefully without returning an 
error.

This also has an impact on QA resources.  We recently had a customer issue that 
needed to be fixed on three different Drill server releases, so three new 
drivers had to be created and tested.

Note that drill clients and drill servers can be on different versions for 
various reasons:

1) A user may need to access different drill servers.  They can only have one 
version of the drill client installed on their machine.

2) Many users may need to access the same drill server.  Some users may have 
one version of the drill client installed while other users may have a 
different version of the drill client installed.  In a large customer 
installation, it is difficult to get all users to upgrade their drill client at 
the same time.
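
One way to make this automatic is a capability handshake: at connect time the
client and server exchange feature sets and both sides restrict themselves to
the intersection, so a new client never sends a request an old server cannot
parse.  A hedged sketch of that idea (the Feature names and types here are
hypothetical, not Drill's actual RPC protocol):
{noformat}
import java.util.EnumSet;
import java.util.Set;

public class VersionNegotiationSketch {
  // Hypothetical feature flags; Drill's real RPC types are not modeled here.
  enum Feature { SASL_AUTH, SSL, PREPARED_STATEMENTS, CURSOR_METADATA }

  static Set<Feature> negotiate(Set<Feature> client, Set<Feature> server) {
    EnumSet<Feature> agreed = EnumSet.copyOf(client);
    agreed.retainAll(server);   // only use what both sides support
    return agreed;
  }

  public static void main(String[] args) {
    Set<Feature> newClient = EnumSet.allOf(Feature.class);
    Set<Feature> oldServer = EnumSet.of(Feature.SSL, Feature.PREPARED_STATEMENTS);
    System.out.println("Usable features: " + negotiate(newClient, oldServer));
  }
}
{noformat}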



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6134) Many Drill queries fail when using JDBC Driver from Simba

2018-02-05 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6134:
-

 Summary: Many Drill queries fail when using JDBC Driver from Simba
 Key: DRILL-6134
 URL: https://issues.apache.org/jira/browse/DRILL-6134
 Project: Apache Drill
  Issue Type: Bug
Reporter: Robert Hou
Assignee: Pritesh Maker


Here is an example:

Query: 
/root/drillAutomation/framework-master/framework/resources/Functional/limit0/union/data/union_51.q
{noformat}
(SELECT c2 FROM `union_01_v` ORDER BY c5 DESC nulls first) UNION (SELECT c2 
FROM `union_02_v` ORDER BY c5 ASC nulls first){noformat}
This is the error:
{noformat}
Exception:

java.sql.SQLException: [JDBC Driver]The field c2(BIGINT:OPTIONAL) 
[$bits$(UINT1:REQUIRED), $values$(BIGINT:OPTIONAL)] doesn't match the provided 
metadata major_type {
  minor_type: BIGINT
  mode: OPTIONAL
}
name_part {
  name: "$values$"
}
value_count: 18
buffer_length: 144
.
at 
com.google.common.base.Preconditions.checkArgument(Preconditions.java:145)
at org.apache.drill.exec.vector.BigIntVector.load(BigIntVector.java:287)
at 
org.apache.drill.exec.vector.NullableBigIntVector.load(NullableBigIntVector.java:274)
at 
org.apache.drill.exec.record.RecordBatchLoader.load(RecordBatchLoader.java:131)
at 
com.mapr.drill.drill.dataengine.DRJDBCResultSet.doLoadRecordBatchData(Unknown 
Source)
at com.mapr.drill.drill.dataengine.DRJDBCResultSet.hasMoreRows(Unknown 
Source)
at 
com.mapr.drill.drill.dataengine.DRJDBCResultSet.doMoveToNextRow(Unknown Source)
at com.mapr.drill.jdbc.common.CommonResultSet.moveToNextRow(Unknown 
Source)
at com.mapr.drill.jdbc.common.SForwardResultSet.next(Unknown Source)
at 
org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:255)
at 
org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:115)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:473)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: The field c2(BIGINT:OPTIONAL) 
[$bits$(UINT1:REQUIRED), $values$(BIGINT:OPTIONAL)] doesn't match the provided 
metadata major_type {
  minor_type: BIGINT
  mode: OPTIONAL
}
name_part {
  name: "$values$"
}
value_count: 18
buffer_length: 144
.
... 16 more{noformat}
 

The commit that causes these errors to occur is:
{noformat}
https://issues.apache.org/jira/browse/DRILL-6049
Rollup of hygiene changes from "batch size" project
commit ID e791ed62b1c91c39676c4adef438c689fd84fd4b{noformat}
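
The failing Preconditions.checkArgument compares the field the client-side
loader already holds (c2) against the metadata shipped with the batch, which
here names the inner $values$ vector instead of the outer field.  A minimal
model of that check (an illustrative assumption, not BigIntVector.load()'s
actual code):
{noformat}
public class FieldMetadataCheck {
  // Model of the precondition: the incoming batch metadata must describe the
  // same field the loader already holds a vector for.
  static void load(String existingField, String providedField) {
    if (!existingField.equals(providedField)) {
      throw new IllegalArgumentException("The field " + existingField
          + " doesn't match the provided metadata " + providedField + ".");
    }
  }

  public static void main(String[] args) {
    load("c2(BIGINT:OPTIONAL)", "c2(BIGINT:OPTIONAL)");        // ok
    load("c2(BIGINT:OPTIONAL)", "$values$(BIGINT:OPTIONAL)");  // throws, as above
  }
}
{noformat}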



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] New PMC member: Paul Rogers

2018-01-31 Thread Robert Hou
Congratulations, Paul!


--Robert


From: Abhishek Girish 
Sent: Tuesday, January 30, 2018 9:31 PM
To: dev@drill.apache.org
Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers

Congratulations, Paul!

On Tue, Jan 30, 2018 at 2:48 PM, Sorabh Hamirwasia 
wrote:

> Congratulations Paul!
>
>
> Thanks,
> Sorabh
>
> 
> From: AnilKumar B 
> Sent: Tuesday, January 30, 2018 2:43:07 PM
> To: dev@drill.apache.org
> Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers
>
> Congratulations, Paul.
>
> Thanks & Regards,
> B Anil Kumar.
>
> On Tue, Jan 30, 2018 at 2:34 PM, Chunhui Shi  wrote:
>
> > Congrats Paul! Well deserved!
> >
> > 
> > From: Kunal Khatua 
> > Sent: Tuesday, January 30, 2018 2:05:56 PM
> > To: dev@drill.apache.org
> > Subject: RE: [ANNOUNCE] New PMC member: Paul Rogers
> >
> > Congratulations, Paul !
> >
> > -Original Message-
> > From: salim achouche [mailto:sachouc...@gmail.com]
> > Sent: Tuesday, January 30, 2018 2:00 PM
> > To: dev@drill.apache.org; Padma Penumarthy 
> > Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers
> >
> > Congrats Paul!
> >
> > Regards,
> > Salim
> >
> > > On Jan 30, 2018, at 1:58 PM, Padma Penumarthy 
> > wrote:
> > >
> > > Congratulations Paul.
> > >
> > > Thanks
> > > Padma
> > >
> > >
> > >> On Jan 30, 2018, at 1:55 PM, Gautam Parai  wrote:
> > >>
> > >> Congratulations Paul!
> > >>
> > >> 
> > >> From: Timothy Farkas 
> > >> Sent: Tuesday, January 30, 2018 1:54:43 PM
> > >> To: dev@drill.apache.org
> > >> Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers
> > >>
> > >> Congrats!
> > >>
> > >> 
> > >> From: Aman Sinha 
> > >> Sent: Tuesday, January 30, 2018 1:50:07 PM
> > >> To: dev@drill.apache.org
> > >> Subject: [ANNOUNCE] New PMC member: Paul Rogers
> > >>
> > >> I am pleased to announce that Drill PMC invited Paul Rogers to the
> > >> PMC and he has accepted the invitation.
> > >>
> > >> Congratulations Paul and thanks for your contributions !
> > >>
> > >> -Aman
> > >> (on behalf of Drill PMC)
> > >
> >
> >
>


[jira] [Created] (DRILL-6078) Query with INTERVAL in predicate does not return any rows

2018-01-08 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6078:
-

 Summary: Query with INTERVAL in predicate does not return any rows
 Key: DRILL-6078
 URL: https://issues.apache.org/jira/browse/DRILL-6078
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.12.0
Reporter: Robert Hou
Assignee: Chunhui Shi


This query does not return any rows when accessing MapR DB tables.

SELECT
  C.C_CUSTKEY,
  C.C_NAME,
  SUM(L.L_EXTENDEDPRICE * (1 - L.L_DISCOUNT)) AS REVENUE,
  C.C_ACCTBAL,
  N.N_NAME,
  C.C_ADDRESS,
  C.C_PHONE,
  C.C_COMMENT
FROM
  customer C,
  orders O,
  lineitem L,
  nation N
WHERE
  C.C_CUSTKEY = O.O_CUSTKEY
  AND L.L_ORDERKEY = O.O_ORDERKEY
  AND O.O_ORDERDate >= DATE '1994-03-01'
  AND O.O_ORDERDate < DATE '1994-03-01' + INTERVAL '3' MONTH
  AND L.L_RETURNFLAG = 'R'
  AND C.C_NATIONKEY = N.N_NATIONKEY
GROUP BY
  C.C_CUSTKEY,
  C.C_NAME,
  C.C_ACCTBAL,
  C.C_PHONE,
  N.N_NAME,
  C.C_ADDRESS,
  C.C_COMMENT
ORDER BY
  REVENUE DESC
LIMIT 20

This query works against JSON tables.  It should return 20 rows.
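
For reference, the interval arithmetic the planner should fold: DATE
'1994-03-01' + INTERVAL '3' MONTH is DATE '1994-06-01', so the two date
predicates define a plain three-month range.  A quick java.time check of that
bound (not Drill code):
{noformat}
import java.time.LocalDate;
import java.time.Period;

public class IntervalBoundCheck {
  public static void main(String[] args) {
    LocalDate lower = LocalDate.parse("1994-03-01");
    LocalDate upper = lower.plus(Period.ofMonths(3));   // 1994-06-01
    // The two O_ORDERDate predicates should reduce to this constant range:
    System.out.println("O_ORDERDate >= " + lower + " AND O_ORDERDate < " + upper);
  }
}
{noformat}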



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [ANNOUNCE] New Committer: Boaz Ben-Zvi

2017-12-13 Thread Robert Hou
Congratulations, Boaz!


--Robert


From: Paul Rogers 
Sent: Wednesday, December 13, 2017 11:02 AM
To: dev@drill.apache.org
Subject: Re: [ANNOUNCE] New Committer: Boaz Ben-Zvi

Congrats! Well deserved.

- Paul

> On Dec 13, 2017, at 11:00 AM, Timothy Farkas  wrote:
>
> Congrats!
>
> 
> From: Kunal Khatua 
> Sent: Wednesday, December 13, 2017 10:47:14 AM
> To: dev@drill.apache.org
> Subject: RE: [ANNOUNCE] New Committer: Boaz Ben-Zvi
>
> Congratulations, Boaz!!
>
> -Original Message-
> From: Abhishek Girish [mailto:agir...@apache.org]
> Sent: Wednesday, December 13, 2017 10:25 AM
> To: dev@drill.apache.org
> Subject: Re: [ANNOUNCE] New Committer: Boaz Ben-Zvi
>
> Congratulations Boaz!
> On Wed, Dec 13, 2017 at 10:23 AM Aman Sinha  wrote:
>
>> The Project Management Committee (PMC) for Apache Drill has invited
>> Boaz Ben-Zvi  to become a committer, and we are pleased to announce
>> that he has accepted.
>>
>> Boaz has been an active contributor to Drill for more than a year.
>> He designed and implemented the Hash Aggregate spilling and is leading
>> the efforts for Hash Join spilling.
>>
>> Welcome Boaz, and thank you for your contributions.  Keep up the good
>> work !
>>
>> - Aman
>> (on behalf of the Apache Drill PMC)
>>



Re: [ANNOUNCE] New Committer: Vitalii Diravka

2017-12-10 Thread Robert Hou
Congratulations!


--Robert


From: Paul Rogers 
Sent: Sunday, December 10, 2017 4:29 PM
To: dev@drill.apache.org
Subject: Re: [ANNOUNCE] New Committer: Vitalii Diravka

Congrats! Well deserved.

- Paul

> On Dec 10, 2017, at 3:16 PM, AnilKumar B  wrote:
>
> Congratulations Vitalii
>
> Thanks & Regards,
> B Anil Kumar.
>
> On Sun, Dec 10, 2017 at 3:12 PM, rahul challapalli <
> challapallira...@gmail.com> wrote:
>
>> Congratulations Vitalii!
>>
>> On Sun, Dec 10, 2017 at 3:05 PM, Kunal Khatua  wrote:
>>
>>> Congratulations!!
>>>
>>> -Original Message-
>>> From: Aman Sinha [mailto:amansi...@apache.org]
>>> Sent: Sunday, December 10, 2017 11:06 AM
>>> To: dev@drill.apache.org
>>> Subject: [ANNOUNCE] New Committer: Vitalii Diravka
>>>
>>> The Project Management Committee (PMC) for Apache Drill has invited
>>> Vitalii Diravka  to become a committer, and we are pleased to announce
>> that
>>> he has accepted.
>>>
>>> Vitalii has been an active contributor to Drill over the last 1 1/2
>> years.
>>> His contributions have spanned areas such as: CASTing issues with
>>> Date/Timestamp, Parquet metadata and SQL enhancements, among others.
>>>
>>> Welcome Vitalii, and thank you for your contributions.  Keep up the good
>>> work !
>>>
>>> - Aman
>>> (on behalf of the Apache Drill PMC)
>>>
>>



[jira] [Resolved] (DRILL-5898) Query returns columns in the wrong order

2017-10-25 Thread Robert Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou resolved DRILL-5898.
---
Resolution: Fixed

Updated expected results file.

> Query returns columns in the wrong order
> 
>
> Key: DRILL-5898
> URL: https://issues.apache.org/jira/browse/DRILL-5898
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>    Reporter: Robert Hou
>Assignee: Robert Hou
>Priority: Blocker
> Fix For: 1.12.0
>
>
> This is a regression.  It worked with this commit:
> {noformat}
> f1d1945b3772bb782039fd6811e34a7de66441c8  DRILL-5582: C++ Client: [Threat 
> Modeling] Drillbit may be spoofed by an attacker and this may lead to data 
> being written to the attacker's target instead of Drillbit
> {noformat}
> It fails with this commit, although there are six commits total between the 
> last good one and this one:
> {noformat}
> b0c4e0486d6d4620b04a1bb8198e959d433b4840  DRILL-5876: Use openssl profile 
> to include netty-tcnative dependency with the platform specific classifier
> {noformat}
> Query is:
> {noformat}
> select * from 
> dfs.`/drill/testdata/tpch100_dir_partitioned_5files/lineitem` where 
> dir0=2006 and dir1=12 and dir2=15 and l_discount=0.07 order by l_orderkey, 
> l_extendedprice limit 10
> {noformat}
> Columns are returned in a different order.  Here are the expected results:
> {noformat}
> foxes. furiously final ideas cajol1994-05-27  0.071731.42 4   
> F   653442  4965666.0   1.0 1994-06-23  A   1994-06-22
>   NONESHIP215671  0.07200612  15 (1 time(s))
> lly final account 1994-11-09  0.0745881.783   F   
> 653412  1.320809E7  46.01994-11-24  R   1994-11-08  TAKE 
> BACK RETURNREG AIR 458104  0.08200612  15 (1 time(s))
>  the asymptotes   1997-12-29  0.0760882.8 6   O   653413  
> 1.4271413E7 44.01998-02-04  N   1998-01-20  DELIVER IN 
> PERSON   MAIL21456   0.05200612  15 (1 time(s))
> carefully a   1996-09-23  0.075381.88 2   O   653378  
> 1.6702792E7 3.0 1996-11-14  N   1996-10-15  NONEREG 
> AIR 952809  0.05200612  15 (1 time(s))
> ly final requests. boldly ironic theo 1995-09-04  0.072019.94 2   
> O   653380  2416094.0   2.0 1995-11-14  N   1995-10-18
>   COLLECT COD FOB 166101  0.02200612  15 (1 time(s))
> alongside of the even, e  1996-02-14  0.0786140.322   
> O   653409  5622872.0   48.01996-05-02  N   1996-04-22
>   NONESHIP372888  0.04200612  15 (1 time(s))
> es. regular instruct  1996-10-18  0.0725194.0 1   O   653382  
> 6048060.0   25.01996-08-29  N   1996-08-20  DELIVER IN 
> PERSON   AIR 798079  0.0 200612  15 (1 time(s))
> en package1993-09-19  0.0718718.322   F   653440  
> 1.372054E7  12.01993-09-12  A   1993-09-09  DELIVER IN 
> PERSON   TRUCK   970554  0.0 200612  15 (1 time(s))
> ly regular deposits snooze. unusual, even 1998-01-18  0.07
> 12427.921   O   653413  2822631.0   8.0 1998-02-09
>   N   1998-02-05  TAKE BACK RETURNREG AIR 322636  0.01
> 200612  15 (1 time(s))
>  ironic ideas. bra1996-10-13  0.0764711.533   O   
> 653383  6806672.0   41.01996-12-06  N   1996-11-10  TAKE 
> BACK RETURNAIR 556691  0.01200612  15 (1 time(s))
> {noformat}
> Here are the actual results:
> {noformat}
> 2006  12  15  653383  6806672 556691  3   41.064711.53
> 0.070.01N   O   1996-11-10  1996-10-13  1996-12-06
>   TAKE BACK RETURNAIR  ironic ideas. bra
> 2006  12  15  653378  16702792952809  2   3.0 5381.88 
> 0.070.05N   O   1996-10-15  1996-09-23  1996-11-14
>   NONEREG AIR carefully a
> 2006  12  15  653380  2416094 166101  2   2.0 2019.94 0.07
> 0.02N   O   1995-10-18  1995-09-04  1995-11-14  
> COLLECT COD FOB ly final requests. boldly ironic theo
> 2006  12  15  653413  2822631 322636  1   8.0 12427.92
> 0.070.01   

[jira] [Created] (DRILL-5908) Regression: Query intermittently may fail with error "Waited for 15000ms, but tasks for 'Get block maps' are not complete."

2017-10-25 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5908:
-

 Summary: Regression: Query intermittently may fail with error 
"Waited for 15000ms, but tasks for 'Get block maps' are not complete."
 Key: DRILL-5908
 URL: https://issues.apache.org/jira/browse/DRILL-5908
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.11.0
Reporter: Robert Hou
Assignee: Pritesh Maker


This is from the Functional-Baseline-88.193 Jenkins run.  The test is in the 
Functional test suite, 
partition_pruning/dfs/csv/plan/csvselectpartormultiplewithdir_MD-185.q

Query is:
{noformat}
explain plan for select 
columns[0],columns[1],columns[4],columns[10],columns[13],dir0 from 
`/drill/testdata/partition_pruning/dfs/lineitempart` where (dir0=1993 and 
columns[0]>29600) or (dir0=1994 and columns[0]>29700)
{noformat}

The error is:
{noformat}
Failed with exception
java.sql.SQLException: RESOURCE ERROR: Waited for 15000ms, but tasks for 'Get 
block maps' are not complete. Total runnable size 2, parallelism 2.


[Error Id: ab911277-36cb-465c-a9aa-8e3d21bcc09c on atsqa4-195.qa.lab:31010]


at 
org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
at 
org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:561)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1895)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:61)
at 
oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:473)
at 
org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1100)
at 
oadd.org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477)
at 
org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:181)
at 
oadd.org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:110)
at 
oadd.org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:130)
at 
org.apache.drill.jdbc.impl.DrillStatementImpl.executeQuery(DrillStatementImpl.java:112)
at 
org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:224)
at 
org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:136)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:473)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:748)
Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: 
RESOURCE ERROR: Waited for 15000ms, but tasks for 'Get block maps' are not 
complete. Total runnable size 2, parallelism 2.


[Error Id: ab911277-36cb-465c-a9aa-8e3d21bcc09c on atsqa4-195.qa.lab:31010]


at 
oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
at 
oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:465)
at 
oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:102)
at 
oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274)
at 
oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:244)
at 
oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
at 
oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
at 
oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandler

[jira] [Resolved] (DRILL-5901) Drill test framework can have successful run even if a random failure occurs

2017-10-24 Thread Robert Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou resolved DRILL-5901.
---
Resolution: Not A Bug

This is a bug in the Drill Test Framework, not in Drill itself.

> Drill test framework can have successful run even if a random failure occurs
> 
>
> Key: DRILL-5901
> URL: https://issues.apache.org/jira/browse/DRILL-5901
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>
> From Jenkins:
> http://10.10.104.91:8080/view/Nightly/job/TPCH-SF100-baseline/574/console
> Random Failures:
> /root/drillAutomation/framework-master/framework/resources/Advanced/tpch/tpch_sf1/original/parquet/query17.sql
> Query: 
> SELECT
>   SUM(L.L_EXTENDEDPRICE) / 7.0 AS AVG_YEARLY
> FROM
>   lineitem L,
>   part P
> WHERE
>   P.P_PARTKEY = L.L_PARTKEY
>   AND P.P_BRAND = 'BRAND#13'
>   AND P.P_CONTAINER = 'JUMBO CAN'
>   AND L.L_QUANTITY < (
> SELECT
>   0.2 * AVG(L2.L_QUANTITY)
> FROM
>   lineitem L2
> WHERE
>   L2.L_PARTKEY = P.P_PARTKEY
>   )
> Failed with exception
> java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Memory was leaked 
> by query. Memory leaked: (2097152)
> Allocator(op:8:2:6:ParquetRowGroupScan) 100/0/7675904/100 
> (res/actual/peak/limit)
> Fragment 8:2
> [Error Id: f21a2560-7259-4e13-88c2-9bac29e2930a on atsqa6c88.qa.lab:31010]
>   (java.lang.IllegalStateException) Memory was leaked by query. Memory 
> leaked: (2097152)
> Allocator(op:8:2:6:ParquetRowGroupScan) 100/0/7675904/100 
> (res/actual/peak/limit)
> org.apache.drill.exec.memory.BaseAllocator.close():519
> org.apache.drill.exec.ops.AbstractOperatorExecContext.close():86
> org.apache.drill.exec.ops.OperatorContextImpl.close():108
> org.apache.drill.exec.ops.FragmentContext.suppressingClose():435
> org.apache.drill.exec.ops.FragmentContext.close():424
> 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():324
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():155
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():267
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():744
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:561)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1895)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:61)
>   at 
> oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:473)
>   at 
> org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1100)
>   at 
> oadd.org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477)
>   at 
> org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:181)
>   at 
> oadd.org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:110)
>   at 
> oadd.org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:130)
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.executeQuery(DrillStatementImpl.java:112)
>   at 
> org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:206)
>   at 
> org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:115)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: 
> SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory 
> leaked: (2097152)
> Allocator(op:8:2:6:ParquetRowGroupScan) 100/0/7675904/100 
> (res/actual/peak/limit)
> Fragment 8:2
> [Error Id: f21a2560-7259-4e13-88c2-9bac29e2930a on atsqa6c88.qa.lab:31010]
>   (java.lang.Illeg

[jira] [Created] (DRILL-5903) Query encounters "Waited for 15000ms, but tasks for 'Fetch parquet metadata' are not complete."

2017-10-24 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5903:
-

 Summary: Query encounters "Waited for 15000ms, but tasks for 
'Fetch parquet metadata' are not complete."
 Key: DRILL-5903
 URL: https://issues.apache.org/jira/browse/DRILL-5903
 Project: Apache Drill
  Issue Type: Bug
  Components: Metadata, Storage - Parquet
Affects Versions: 1.11.0
Reporter: Robert Hou
Priority: Critical


Query is:
{noformat}
select a.int_col, b.date_col from 
dfs.`/drill/testdata/parquet_date/metadata_cache/mixed/fewtypes_null_large` a 
inner join ( select date_col, int_col from 
dfs.`/drill/testdata/parquet_date/metadata_cache/mixed/fewtypes_null_large` 
where dir0 = '1.2' and date_col > '1996-03-07' ) b on cast(a.date_col as date)= 
date_add(b.date_col, 5) where a.int_col = 7 and a.dir0='1.9' group by 
a.int_col, b.date_col
{noformat}

From drillbit.log:
{noformat}
fc65-d430-ac1103638113: SELECT SUM(col_int) OVER() sum_int FROM vwOnParq_wCst_35
2017-10-23 11:20:50,122 [26122f83-6956-5aa8-d8de-d4808f572160:foreman] ERROR 
o.a.d.exec.store.parquet.Metadata - Waited for 15000ms, but tasks for 'Fetch 
parquet metadata' are not complete. Total runnable size 3, parallelism 3.
2017-10-23 11:20:50,127 [26122f83-6956-5aa8-d8de-d4808f572160:foreman] INFO  
o.a.d.exec.store.parquet.Metadata - User Error Occurred: Waited for 15000ms, 
but tasks for 'Fetch parquet metadata' are not complete. Total runnable size 3, 
parallelism 3.
org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: Waited for 
15000ms, but tasks for 'Fetch parquet metadata' are not complete. Total 
runnable size 3, parallelism 3.


[Error Id: 7484e127-ea41-4797-83c0-6619ea9b2bcd ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586)
 ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:151) 
[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.Metadata.getParquetFileMetadata_v3(Metadata.java:341)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:318)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:142)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.ParquetGroupScan.init(ParquetGroupScan.java:934)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.ParquetGroupScan.(ParquetGroupScan.java:227)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.ParquetGroupScan.(ParquetGroupScan.java:190)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:170)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:66)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan(FileSystemPlugin.java:144)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.store.AbstractStoragePlugin.getPhysicalScan(AbstractStoragePlugin.java:100)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.planner.logical.DrillTable.getGroupScan(DrillTable.java:85)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.planner.logical.DrillPushProjIntoScan.onMatch(DrillPushProjIntoScan.java:62)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
 [calcite-core-1.4.0-drill-r22.jar:1.4.0-drill-r22]
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:811)
 [calcite-core-1.4.0-drill-r22.jar:1.4.0-drill-r22]
at 
org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:310) 
[calcite-core-1.4.0-drill-r22.jar:1.4.0-drill-r22]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:400)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:342)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRawDrel(DefaultSqlHandler.java:241)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.j

[jira] [Created] (DRILL-5902) Regression: Queries encounter random failure due to RPC connection timeout

2017-10-23 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5902:
-

 Summary: Regression: Queries encounter random failure due to RPC 
connection timeout
 Key: DRILL-5902
 URL: https://issues.apache.org/jira/browse/DRILL-5902
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - RPC
Affects Versions: 1.11.0
Reporter: Robert Hou
Priority: Critical


Multiple random failures (25) occurred with the latest 
Functional-Baseline-88.193 run.  Here is a sample query:

{noformat}
-- Kitchen sink
-- Use all supported functions
select
rank()  over W,
dense_rank()over W,
percent_rank()  over W,
cume_dist() over W,
avg(c_integer + c_integer)  over W,
sum(c_integer/100)  over W,
count(*)over W,
min(c_integer)  over W,
max(c_integer)  over W,
row_number()over W
from
j7
where
c_boolean is not null
window  W as (partition by c_bigint, c_date, c_time, c_boolean order by 
c_integer)
{noformat}

From the logs:
{noformat}
2017-10-23 04:14:36,536 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler - 
Dropping request for early fragment termination for path 
261230e8-d03e-9ca9-91bf-c1039deecde2:1:1 -> 
261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler - 
Dropping request for early fragment termination for path 
261230e8-d03e-9ca9-91bf-c1039deecde2:1:5 -> 
261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler - 
Dropping request for early fragment termination for path 
261230e8-d03e-9ca9-91bf-c1039deecde2:1:9 -> 
261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler - 
Dropping request for early fragment termination for path 
261230e8-d03e-9ca9-91bf-c1039deecde2:1:13 -> 
261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
2017-10-23 04:14:36,537 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler - 
Dropping request for early fragment termination for path 
261230e8-d03e-9ca9-91bf-c1039deecde2:1:17 -> 
261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler - 
Dropping request for early fragment termination for path 
261230e8-d03e-9ca9-91bf-c1039deecde2:1:21 -> 
261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
2017-10-23 04:14:36,538 [BitServer-7] WARN  o.a.d.e.w.b.ControlMessageHandler - 
Dropping request for early fragment termination for path 
261230e8-d03e-9ca9-91bf-c1039deecde2:1:25 -> 
261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
{noformat}

{noformat}
2017-10-23 04:14:53,941 [UserServer-1] INFO  o.a.drill.exec.rpc.user.UserServer 
- RPC connection /10.10.88.196:31010 <--> /10.10.88.193:38281 (user server) 
timed out.  Timeout was set to 30 seconds. Closing connection.
2017-10-23 04:14:53,952 [UserServer-1] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 261230f8-2698-15b2-952f-d4ade8d6b180:0:0: 
State change requested RUNNING --> FAILED
2017-10-23 04:14:53,952 [261230f8-2698-15b2-952f-d4ade8d6b180:frag:0:0] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 261230f8-2698-15b2-952f-d4ade8d6b180:0:0: 
State change requested FAILED --> FINISHED
2017-10-23 04:14:53,956 [UserServer-1] WARN  
o.apache.drill.exec.rpc.RequestIdMap - Failure while attempting to fail rpc 
response.
java.lang.IllegalArgumentException: Self-suppression not permitted
at java.lang.Throwable.addSuppressed(Throwable.java:1043) ~[na:1.7.0_45]
at 
org.apache.drill.common.DeferredException.addException(DeferredException.java:88)
 ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.common.DeferredException.addThrowable(DeferredException.java:97)
 ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.fail(FragmentExecutor.java:413)
 ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.access$700(FragmentExecutor.java:55)
 ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor$ExecutorStateImpl.fail(FragmentExecutor.java:427)
 ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.ops.FragmentContext.fail(FragmentContext.java:213) 
~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.dri

[jira] [Created] (DRILL-5901) Drill test framework can have a successful run even if a random failure occurs

2017-10-23 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5901:
-

 Summary: Drill test framework can have a successful run even if a 
random failure occurs
 Key: DRILL-5901
 URL: https://issues.apache.org/jira/browse/DRILL-5901
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Affects Versions: 1.11.0
Reporter: Robert Hou


Random Failures:
/root/drillAutomation/framework-master/framework/resources/Advanced/tpch/tpch_sf1/original/parquet/query17.sql
Query: 
SELECT
  SUM(L.L_EXTENDEDPRICE) / 7.0 AS AVG_YEARLY
FROM
  lineitem L,
  part P
WHERE
  P.P_PARTKEY = L.L_PARTKEY
  AND P.P_BRAND = 'BRAND#13'
  AND P.P_CONTAINER = 'JUMBO CAN'
  AND L.L_QUANTITY < (
SELECT
  0.2 * AVG(L2.L_QUANTITY)
FROM
  lineitem L2
WHERE
  L2.L_PARTKEY = P.P_PARTKEY
  )
Failed with exception
java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Memory was leaked 
by query. Memory leaked: (2097152)
Allocator(op:8:2:6:ParquetRowGroupScan) 100/0/7675904/100 
(res/actual/peak/limit)


Fragment 8:2

[Error Id: f21a2560-7259-4e13-88c2-9bac29e2930a on atsqa6c88.qa.lab:31010]

  (java.lang.IllegalStateException) Memory was leaked by query. Memory leaked: 
(2097152)
Allocator(op:8:2:6:ParquetRowGroupScan) 100/0/7675904/100 
(res/actual/peak/limit)

org.apache.drill.exec.memory.BaseAllocator.close():519
org.apache.drill.exec.ops.AbstractOperatorExecContext.close():86
org.apache.drill.exec.ops.OperatorContextImpl.close():108
org.apache.drill.exec.ops.FragmentContext.suppressingClose():435
org.apache.drill.exec.ops.FragmentContext.close():424
org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():324
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():155
org.apache.drill.exec.work.fragment.FragmentExecutor.run():267
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1145
java.util.concurrent.ThreadPoolExecutor$Worker.run():615
java.lang.Thread.run():744

at 
org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
at 
org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:561)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1895)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:61)
at 
oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:473)
at 
org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1100)
at 
oadd.org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477)
at 
org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:181)
at 
oadd.org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:110)
at 
oadd.org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:130)
at 
org.apache.drill.jdbc.impl.DrillStatementImpl.executeQuery(DrillStatementImpl.java:112)
at 
org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:206)
at 
org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:115)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: 
(2097152)
Allocator(op:8:2:6:ParquetRowGroupScan) 100/0/7675904/100 
(res/actual/peak/limit)


Fragment 8:2

[Error Id: f21a2560-7259-4e13-88c2-9bac29e2930a on atsqa6c88.qa.lab:31010]

  (java.lang.IllegalStateException) Memory was leaked by query. Memory leaked: 
(2097152)
Allocator(op:8:2:6:ParquetRowGroupScan) 100/0/7675904/100 
(res/actual/peak/limit)

org.apache.drill.exec.memory.BaseAllocator.close():519
org.apache.drill.exec.ops.AbstractOperatorExecContext.close():86
org.apache.drill.exec.ops.OperatorContextImpl.close():108
org.apache.drill.exec.ops.FragmentContext.suppressingClose():435
org.apache.drill.exec.ops.FragmentContext.close():424
org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():324
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():155
org.apache.drill.exec.work.fragment.FragmentExecutor.run():267
org.apache.drill.common.SelfCleaningRunnable.r

[jira] [Created] (DRILL-5900) Regression: TPCH query encounters random IllegalStateException: Memory was leaked by query

2017-10-23 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5900:
-

 Summary: Regression: TPCH query encounters random 
IllegalStateException: Memory was leaked by query
 Key: DRILL-5900
 URL: https://issues.apache.org/jira/browse/DRILL-5900
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.11.0
Reporter: Robert Hou
Assignee: Pritesh Maker
Priority: Blocker


This is a random failure.  This test has passed before.

TPCH query 6:
{noformat}
SELECT
  SUM(L.L_EXTENDEDPRICE) / 7.0 AS AVG_YEARLY
FROM
  lineitem L,
  part P
WHERE
  P.P_PARTKEY = L.L_PARTKEY
  AND P.P_BRAND = 'BRAND#13'
  AND P.P_CONTAINER = 'JUMBO CAN'
  AND L.L_QUANTITY < (
SELECT
  0.2 * AVG(L2.L_QUANTITY)
FROM
  lineitem L2
WHERE
  L2.L_PARTKEY = P.P_PARTKEY
  )
{noformat}

Error is:
{noformat}
2017-10-23 10:34:55,989 [2611d7c0-b0c9-a93e-c64d-a4ef8f4baf8f:frag:8:2] ERROR 
o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: 
Memory was leaked by query. Memory leaked: (2097152)
Allocator(op:8:2:6:ParquetRowGroupScan) 100/0/7675904/100 
(res/actual/peak/limit)


Fragment 8:2

[Error Id: f21a2560-7259-4e13-88c2-9bac29e2930a on atsqa6c88.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
IllegalStateException: Memory was leaked by query. Memory leaked: (2097152)
Allocator(op:8:2:6:ParquetRowGroupScan) 100/0/7675904/100 
(res/actual/peak/limit)


Fragment 8:2

[Error Id: f21a2560-7259-4e13-88c2-9bac29e2930a on atsqa6c88.qa.lab:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586)
 ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:298)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:267)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_51]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_51]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
Caused by: java.lang.IllegalStateException: Memory was leaked by query. Memory 
leaked: (2097152)
Allocator(op:8:2:6:ParquetRowGroupScan) 100/0/7675904/100 
(res/actual/peak/limit)

at 
org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:519) 
~[drill-memory-base-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.ops.AbstractOperatorExecContext.close(AbstractOperatorExecContext.java:86)
 ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.ops.OperatorContextImpl.close(OperatorContextImpl.java:108)
 ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.ops.FragmentContext.suppressingClose(FragmentContext.java:435)
 ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.ops.FragmentContext.close(FragmentContext.java:424) 
~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:324)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:155)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
... 5 common frames omitted
2017-10-23 10:34:55,989 [2611d7c0-b0c9-a93e-c64d-a4ef8f4baf8f:frag:6:0] INFO  
o.a.d.e.w.f.FragmentStatusReporter - 2611d7c0-b0c9-a93e-c64d-a4ef8f4baf8f:6:0: 
State to report: FINISHED
{noformat}

sys.version is:
{noformat}
1.12.0-SNAPSHOT b0c4e0486d6d4620b04a1bb8198e959d433b4840  DRILL-5876: Use 
openssl profile to include netty-tcnative dependency with the platform specific 
classifier  20.10.2017 @ 16:52:35 PDT
{noformat}

The previous version that ran clean is this commit:
{noformat}
1.12.0-SNAPSHOT f1d1945b3772bb782039fd6811e34a7de66441c8  DRILL-5582: C++ 
Client: [Threat Modeling] Drillbit may be spoofed by an attacker and this may 
lead to data being written to the attacker's target instead of Drillbit   
19.10.2017 @ 17:13:05 PDT
{noformat}

But since the failure is random, the problem could have been introduced earlier.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5898) Query returns columns in the wrong order

2017-10-23 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5898:
-

 Summary: Query returns columns in the wrong order
 Key: DRILL-5898
 URL: https://issues.apache.org/jira/browse/DRILL-5898
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.11.0
Reporter: Robert Hou
Assignee: Vitalii Diravka
Priority: Blocker
 Fix For: 1.12.0


This is a regression.  It worked with this commit:
{noformat}
f1d1945b3772bb782039fd6811e34a7de66441c8  DRILL-5582: C++ Client: [Threat 
Modeling] Drillbit may be spoofed by an attacker and this may lead to data 
being written to the attacker's target instead of Drillbit
{noformat}
It fails with this commit, although there are six commits total between the 
last good one and this one:
{noformat}
b0c4e0486d6d4620b04a1bb8198e959d433b4840  DRILL-5876: Use openssl profile 
to include netty-tcnative dependency with the platform specific classifier
{noformat}


Query is:
{noformat}
select * from dfs.`/drill/testdata/tpch100_dir_partitioned_5files/lineitem` 
where dir0=2006 and dir1=12 and dir2=15 and l_discount=0.07 order by 
l_orderkey, l_extendedprice limit 10
{noformat}
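
As a hedged workaround sketch (not part of the original report), projecting the 
columns explicitly pins the output order in the query itself rather than relying 
on the scan; using the standard TPC-H lineitem column names:
{noformat}
select l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, 
       l_extendedprice, l_discount, dir0, dir1, dir2 
from dfs.`/drill/testdata/tpch100_dir_partitioned_5files/lineitem` 
where dir0=2006 and dir1=12 and dir2=15 and l_discount=0.07 
order by l_orderkey, l_extendedprice limit 10;
{noformat}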

Columns are returned in a different order.  Here are the expected results:
{noformat}
foxes. furiously final ideas cajol  1994-05-27  0.071731.42 4   
F   653442  4965666.0   1.0 1994-06-23  A   1994-06-22  
NONESHIP215671  0.07200612  15 (1 time(s))
lly final account   1994-11-09  0.0745881.783   F   
653412  1.320809E7  46.01994-11-24  R   1994-11-08  TAKE 
BACK RETURNREG AIR 458104  0.08200612  15 (1 time(s))
 the asymptotes 1997-12-29  0.0760882.8 6   O   653413  
1.4271413E7 44.01998-02-04  N   1998-01-20  DELIVER IN 
PERSON   MAIL21456   0.05200612  15 (1 time(s))
carefully a 1996-09-23  0.075381.88 2   O   653378  
1.6702792E7 3.0 1996-11-14  N   1996-10-15  NONEREG AIR 
952809  0.05200612  15 (1 time(s))
ly final requests. boldly ironic theo   1995-09-04  0.072019.94 2   
O   653380  2416094.0   2.0 1995-11-14  N   1995-10-18  
COLLECT COD FOB 166101  0.02200612  15 (1 time(s))
alongside of the even, e1996-02-14  0.0786140.322   
O   653409  5622872.0   48.01996-05-02  N   1996-04-22  
NONESHIP372888  0.04200612  15 (1 time(s))
es. regular instruct1996-10-18  0.0725194.0 1   O   653382  
6048060.0   25.01996-08-29  N   1996-08-20  DELIVER IN 
PERSON   AIR 798079  0.0 200612  15 (1 time(s))
en package  1993-09-19  0.0718718.322   F   653440  
1.372054E7  12.01993-09-12  A   1993-09-09  DELIVER IN 
PERSON   TRUCK   970554  0.0 200612  15 (1 time(s))
ly regular deposits snooze. unusual, even   1998-01-18  0.07
12427.921   O   653413  2822631.0   8.0 1998-02-09  
N   1998-02-05  TAKE BACK RETURNREG AIR 322636  0.012006
12  15 (1 time(s))
 ironic ideas. bra  1996-10-13  0.0764711.533   O   
653383  6806672.0   41.01996-12-06  N   1996-11-10  TAKE 
BACK RETURNAIR 556691  0.01200612  15 (1 time(s))
{noformat}

Here are the actual results:
{noformat}
200612  15  653383  6806672 556691  3   41.064711.53
0.070.01N   O   1996-11-10  1996-10-13  1996-12-06  
TAKE BACK RETURNAIR  ironic ideas. bra
200612  15  653378  16702792952809  2   3.0 5381.88 
0.070.05N   O   1996-10-15  1996-09-23  1996-11-14  
NONEREG AIR carefully a
200612  15  653380  2416094 166101  2   2.0 2019.94 0.07
0.02N   O   1995-10-18  1995-09-04  1995-11-14  COLLECT 
COD FOB ly final requests. boldly ironic theo
200612  15  653413  2822631 322636  1   8.0 12427.92
0.070.01N   O   1998-02-05  1998-01-18  1998-02-09  
TAKE BACK RETURNREG AIR ly regular deposits snooze. unusual, even 
200612  15  653382  6048060 798079  1   25.025194.0 0.07
0.0 N   O   1996-08-20  1996-10-18  1996-08-29  DELIVER 
IN PERSON   AIR es. regular instruct
200612  15  653442  4965666 215671  4   1.0 1731.42 0.07
0.07A   F   1994-06-22  1994-05-27  1994-06-23  NONE
SHIPfoxes. furiously final ideas cajol
200612

[jira] [Created] (DRILL-5891) When Drill runs out of memory for a HashAgg, it should tell the user how much memory to allocate

2017-10-18 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5891:
-

 Summary: When Drill runs out of memory for a HashAgg, it should 
tell the user how much memory to allocate
 Key: DRILL-5891
 URL: https://issues.apache.org/jira/browse/DRILL-5891
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.11.0
Reporter: Robert Hou
Assignee: Pritesh Maker


Query is:
{noformat}
select count(*), max(`filename`) from dfs.`/drill/testdata/hash-agg/data1` 
group by no_nulls_col, nulls_col;
{noformat}

Error is:
{noformat}
Error: RESOURCE ERROR: Not enough memory for internal partitioning and fallback 
mechanism for HashAgg to use unbounded memory is disabled. Either enable 
fallback config drill.exec.hashagg.fallback.enabled using Alter session/system 
command or increase memory limit for Drillbit
{noformat}

From drillbit.log:
{noformat}
2017-10-18 13:30:17,135 [26184629-3f4c-856a-e99e-97cdf0d29321:frag:1:8] TRACE 
o.a.d.e.p.i.aggregate.HashAggregator - Incoming sizer: Actual batch schema & 
sizes {
  no_nulls_col(type: OPTIONAL VARCHAR, count: 1023, std size: 54, actual size: 
130, data size: 132892)
  nulls_col(type: OPTIONAL VARCHAR, count: 1023, std size: 54, actual size: 
112, data size: 113673)
  EXPR$0(type: REQUIRED BIGINT, count: 1023, std size: 8, actual size: 8, data 
size: 8184)
  EXPR$1(type: OPTIONAL VARCHAR, count: 1023, std size: 54, actual size: 18, 
data size: 18414)
  Records: 1023, Total size: 524288, Data size: 273163, Gross row width: 513, 
Net row width: 268, Density: 53%}
2017-10-18 13:30:17,135 [26184629-3f4c-856a-e99e-97cdf0d29321:frag:1:8] TRACE 
o.a.d.e.p.i.aggregate.HashAggregator - 2nd phase. Estimated internal row width: 
166 Values row width: 66 batch size: 12779520  memory limit: 63161283  max 
column width: 50
2017-10-18 13:30:17,139 [26184629-3f4c-856a-e99e-97cdf0d29321:frag:3:2] TRACE 
o.a.d.e.p.impl.common.HashTable - HT allocated 4784128 for varchar of max width 
50
2017-10-18 13:30:17,139 [26184629-3f4c-856a-e99e-97cdf0d29321:frag:1:15] INFO  
o.a.d.e.p.i.aggregate.HashAggregator - User Error Occurred: Not enough memory 
for internal partitioning and fallback mechanism for HashAgg to use unbounded 
memory is disabled. Either enable fallback config 
drill.exec.hashagg.fallback.enabled using Alter session/system command or 
increase memory limit for Drillbit
org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: Not enough 
memory for internal partitioning and fallback mechanism for HashAgg to use 
unbounded memory is disabled. Either enable fallback config 
drill.exec.hashagg.fallback.enabled using Alter session/system command or 
increase memory limit for Drillbit
{noformat}
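
For context, rough arithmetic behind the failure (assuming the default of 32 hash 
partitions from exec.hashagg.num_partitions, which is an assumption, not read 
from this log): at one ~12,779,520-byte batch per partition, 32 partitions alone 
want about 409 MB against the 63,161,283-byte limit reported above:
{noformat}
32 partitions x 12,779,520 bytes/batch ~= 409 MB  >>  63,161,283 bytes available
{noformat}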

I would recommend that the error message include the exact "alter" command for 
raising the memory limit, along with a suggested value.  Otherwise, the user may 
not know what to do.

I would also suggest enabling "drill.exec.hashagg.fallback.enabled" only as 
a last resort.
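
A sketch of the two commands such a message could print (the memory value is 
illustrative only; the right number depends on the query):
{noformat}
-- preferred: raise the per-query memory limit
alter session set `planner.memory.max_query_memory_per_node` = 4294967296;
-- last resort: allow HashAgg to fall back to unbounded memory
alter session set `drill.exec.hashagg.fallback.enabled` = true;
{noformat}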



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5889) sqlline loses RPC connection while executing a query with HashAgg

2017-10-18 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5889:
-

 Summary: sqlline loses RPC connection while executing a query with 
HashAgg
 Key: DRILL-5889
 URL: https://issues.apache.org/jira/browse/DRILL-5889
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.11.0
Reporter: Robert Hou


Query is:
{noformat}
alter session set `planner.memory.max_query_memory_per_node` = 10737418240;
select count(*), max(`filename`) from dfs.`/drill/testdata/hash-agg/data1` 
group by no_nulls_col, nulls_col;
{noformat}

Error is:
{noformat}
0: jdbc:drill:drillbit=10.10.100.190> select count(*), max(`filename`) from 
dfs.`/drill/testdata/hash-agg/data1` group by no_nulls_col, nulls_col;
Error: CONNECTION ERROR: Connection /10.10.100.190:45776 <--> 
/10.10.100.190:31010 (user client) closed unexpectedly. Drillbit down?


[Error Id: db4aea70-11e6-4e63-b0cc-13cdba0ee87a ] (state=,code=0)
{noformat}

From drillbit.log:
{noformat}
2017-10-18 14:04:23,044 [UserServer-1] INFO  o.a.drill.exec.rpc.user.UserServer 
- RPC connection /10.10.100.190:31010 <--> /10.10.100.190:45776 (user server) 
timed out.  Timeout was set to 30 seconds. Closing connection.
{noformat}

Plan is:
{noformat}
| 00-00Screen
00-01  Project(EXPR$0=[$0], EXPR$1=[$1])
00-02UnionExchange
01-01  Project(EXPR$0=[$2], EXPR$1=[$3])
01-02HashAgg(group=[{0, 1}], EXPR$0=[$SUM0($2)], EXPR$1=[MAX($3)])
01-03  Project(no_nulls_col=[$0], nulls_col=[$1], EXPR$0=[$2], 
EXPR$1=[$3])
01-04HashToRandomExchange(dist0=[[$0]], dist1=[[$1]])
02-01  UnorderedMuxExchange
03-01Project(no_nulls_col=[$0], nulls_col=[$1], 
EXPR$0=[$2], EXPR$1=[$3], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1, 
hash32AsDouble($0, 1301011))])
03-02  HashAgg(group=[{0, 1}], EXPR$0=[COUNT()], 
EXPR$1=[MAX($2)])
03-03Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/hash-agg/data1]], 
selectionRoot=maprfs:/drill/testdata/hash-agg/data1, numFiles=1, 
usedMetadataFile=false, columns=[`no_nulls_col`, `nulls_col`, `filename`]]])
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (DRILL-5804) External Sort times out, may be infinite loop

2017-10-17 Thread Robert Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou resolved DRILL-5804.
---
Resolution: Fixed

> External Sort times out, may be infinite loop
> -
>
> Key: DRILL-5804
> URL: https://issues.apache.org/jira/browse/DRILL-5804
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>    Reporter: Robert Hou
>Assignee: Paul Rogers
> Fix For: 1.12.0
>
> Attachments: drillbit.log
>
>
> Query is:
> {noformat}
> ALTER SESSION SET `exec.sort.disable_managed` = false;
> select count(*) from (
>   select * from (
> select s1.type type, flatten(s1.rms.rptd) rptds, s1.rms, s1.uid 
> from (
>   select d.type type, d.uid uid, flatten(d.map.rm) rms from 
> dfs.`/drill/testdata/resource-manager/nested_large` d order by d.uid
> ) s1
>   ) s2
>   order by s2.rms.mapid, s2.rptds.a, s2.rptds.do_not_exist
> );
> {noformat}
> Plan is:
> {noformat}
> | 00-00Screen
> 00-01  Project(EXPR$0=[$0])
> 00-02StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
> 00-03  UnionExchange
> 01-01StreamAgg(group=[{}], EXPR$0=[COUNT()])
> 01-02  Project($f0=[0])
> 01-03SingleMergeExchange(sort0=[4 ASC], sort1=[5 ASC], 
> sort2=[6 ASC])
> 02-01  SelectionVectorRemover
> 02-02Sort(sort0=[$4], sort1=[$5], sort2=[$6], dir0=[ASC], 
> dir1=[ASC], dir2=[ASC])
> 02-03  Project(type=[$0], rptds=[$1], rms=[$2], uid=[$3], 
> EXPR$4=[$4], EXPR$5=[$5], EXPR$6=[$6])
> 02-04HashToRandomExchange(dist0=[[$4]], dist1=[[$5]], 
> dist2=[[$6]])
> 03-01  UnorderedMuxExchange
> 04-01Project(type=[$0], rptds=[$1], rms=[$2], 
> uid=[$3], EXPR$4=[$4], EXPR$5=[$5], EXPR$6=[$6], 
> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($6, hash32AsDouble($5, 
> hash32AsDouble($4, 1301011)))])
> 04-02  Project(type=[$0], rptds=[$1], rms=[$2], 
> uid=[$3], EXPR$4=[ITEM($2, 'mapid')], EXPR$5=[ITEM($1, 'a')], 
> EXPR$6=[ITEM($1, 'do_not_exist')])
> 04-03Flatten(flattenField=[$1])
> 04-04  Project(type=[$0], rptds=[ITEM($2, 
> 'rptd')], rms=[$2], uid=[$1])
> 04-05SingleMergeExchange(sort0=[1 ASC])
> 05-01  SelectionVectorRemover
> 05-02Sort(sort0=[$1], dir0=[ASC])
> 05-03  Project(type=[$0], uid=[$1], 
> rms=[$2])
> 05-04
> HashToRandomExchange(dist0=[[$1]])
> 06-01  UnorderedMuxExchange
> 07-01Project(type=[$0], 
> uid=[$1], rms=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1, 1301011)])
> 07-02  
> Flatten(flattenField=[$2])
> 07-03Project(type=[$0], 
> uid=[$1], rms=[ITEM($2, 'rm')])
> 07-04  
> Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=maprfs:///drill/testdata/resource-manager/nested_large]], 
> selectionRoot=maprfs:/drill/testdata/resource-manager/nested_large, 
> numFiles=1, usedMetadataFile=false, columns=[`type`, `uid`, `map`.`rm`]]])
> {noformat}
> Here is a segment of the drillbit.log, starting at line 55890:
> {noformat}
> 2017-09-19 04:22:56,258 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:2:2] DEBUG 
> o.a.d.e.t.g.SingleBatchSorterGen44 - Took 142 us to sort 1023 records
> 2017-09-19 04:22:56,265 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:2:4] DEBUG 
> o.a.d.e.t.g.SingleBatchSorterGen44 - Took 105 us to sort 1023 records
> 2017-09-19 04:22:56,268 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:3:0] DEBUG 
> o.a.d.e.p.i.p.PartitionSenderRootExec - Partitioner.next(): got next record 
> batch with status OK
> 2017-09-19 04:22:56,275 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:2:7] DEBUG 
> o.a.d.e.t.g.SingleBatchSorterGen44 - Took 145 us to sort 1023 records
> 2017-09-19 04:22:56,354 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:3:0] DEBUG 
> o.a.d.e.p.i.p.PartitionSenderRootExec - Partitioner.next(): got next record 
> batch with status OK
> 2017-09-19 04:22:56,357 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:2:2] DEBUG 
> o.a.d.e.t.g.Singl

[jira] [Created] (DRILL-5886) Operators should create batch sizes that the next operator can consume to avoid OOM

2017-10-17 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5886:
-

 Summary: Operators should create batch sizes that the next 
operator can consume to avoid OOM
 Key: DRILL-5886
 URL: https://issues.apache.org/jira/browse/DRILL-5886
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.11.0
Reporter: Robert Hou
 Attachments: 26478262-f0a7-8fc1-1887-4f27071b9c0f.sys.drill, 
drillbit.log.exchange

Query is:
{noformat}
ALTER SESSION SET `exec.sort.disable_managed` = false
alter session set `planner.memory.max_query_memory_per_node` = 482344960
alter session set `planner.width.max_per_node` = 1
alter session set `planner.width.max_per_query` = 1
alter session set `planner.disable_exchanges` = true
select count(*) from (select * from 
dfs.`/drill/testdata/resource-manager/3500cols.tbl` order by 
columns[450],columns[330],columns[230],columns[220],columns[110],columns[90],columns[80],columns[70],columns[40],columns[10],columns[20],columns[30],columns[40],columns[50],
 
columns[454],columns[413],columns[940],columns[834],columns[73],columns[140],columns[104],columns[],columns[30],columns[2420],columns[1520],
 columns[1410], 
columns[1110],columns[1290],columns[2380],columns[705],columns[45],columns[1054],columns[2430],columns[420],columns[404],columns[3350],
 
columns[],columns[153],columns[356],columns[84],columns[745],columns[1450],columns[103],columns[2065],columns[343],columns[3420],columns[530],
 columns[3210] ) d where d.col433 = 'sjka skjf';
{noformat}

This is the error from drillbit.log:
{noformat}
2017-09-12 17:36:53,155 [26478262-f0a7-8fc1-1887-4f27071b9c0f:frag:0:0] ERROR 
o.a.d.e.p.i.x.m.ExternalSortBatch - Insufficient memory to merge two batches. 
Incoming batch size: 409305088, available memory: 482344960
{noformat}
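
The numbers in the message show the squeeze directly: merging requires two 
batch-sized allocations to fit in memory at once, and two such batches do not:
{noformat}
2 x 409,305,088 bytes ~= 818 MB needed  >  482,344,960 bytes available
{noformat}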

Here is the plan:
{noformat}
| 00-00Screen
00-01  Project(EXPR$0=[$0])
00-02StreamAgg(group=[{}], EXPR$0=[COUNT()])
00-03  Project($f0=[0])
00-04SelectionVectorRemover
00-05  Filter(condition=[=(ITEM($0, 'col433'), 'sjka skjf')])
00-06Project(T8¦¦*=[$0])
00-07  SelectionVectorRemover
00-08Sort(sort0=[$1], sort1=[$2], sort2=[$3], sort3=[$4], 
sort4=[$5], sort5=[$6], sort6=[$7], sort7=[$8], sort8=[$9], sort9=[$10], 
sort10=[$11], sort11=[$12], sort12=[$9], sort13=[$13], sort14=[$14], 
sort15=[$15], sort16=[$16], sort17=[$17], sort18=[$18], sort19=[$19], 
sort20=[$20], sort21=[$21], sort22=[$12], sort23=[$22], sort24=[$23], 
sort25=[$24], sort26=[$25], sort27=[$26], sort28=[$27], sort29=[$28], 
sort30=[$29], sort31=[$30], sort32=[$31], sort33=[$32], sort34=[$33], 
sort35=[$34], sort36=[$35], sort37=[$36], sort38=[$37], sort39=[$38], 
sort40=[$39], sort41=[$40], sort42=[$41], sort43=[$42], sort44=[$43], 
sort45=[$44], sort46=[$45], sort47=[$46], dir0=[ASC], dir1=[ASC], dir2=[ASC], 
dir3=[ASC], dir4=[ASC], dir5=[ASC], dir6=[ASC], dir7=[ASC], dir8=[ASC], 
dir9=[ASC], dir10=[ASC], dir11=[ASC], dir12=[ASC], dir13=[ASC], dir14=[ASC], 
dir15=[ASC], dir16=[ASC], dir17=[ASC], dir18=[ASC], dir19=[ASC], dir20=[ASC], 
dir21=[ASC], dir22=[ASC], dir23=[ASC], dir24=[ASC], dir25=[ASC], dir26=[ASC], 
dir27=[ASC], dir28=[ASC], dir29=[ASC], dir30=[ASC], dir31=[ASC], dir32=[ASC], 
dir33=[ASC], dir34=[ASC], dir35=[ASC], dir36=[ASC], dir37=[ASC], dir38=[ASC], 
dir39=[ASC], dir40=[ASC], dir41=[ASC], dir42=[ASC], dir43=[ASC], dir44=[ASC], 
dir45=[ASC], dir46=[ASC], dir47=[ASC])
00-09  Project(T8¦¦*=[$0], EXPR$1=[ITEM($1, 450)], 
EXPR$2=[ITEM($1, 330)], EXPR$3=[ITEM($1, 230)], EXPR$4=[ITEM($1, 220)], 
EXPR$5=[ITEM($1, 110)], EXPR$6=[ITEM($1, 90)], EXPR$7=[ITEM($1, 80)], 
EXPR$8=[ITEM($1, 70)], EXPR$9=[ITEM($1, 40)], EXPR$10=[ITEM($1, 10)], 
EXPR$11=[ITEM($1, 20)], EXPR$12=[ITEM($1, 30)], EXPR$13=[ITEM($1, 50)], 
EXPR$14=[ITEM($1, 454)], EXPR$15=[ITEM($1, 413)], EXPR$16=[ITEM($1, 940)], 
EXPR$17=[ITEM($1, 834)], EXPR$18=[ITEM($1, 73)], EXPR$19=[ITEM($1, 140)], 
EXPR$20=[ITEM($1, 104)], EXPR$21=[ITEM($1, )], EXPR$22=[ITEM($1, 2420)], 
EXPR$23=[ITEM($1, 1520)], EXPR$24=[ITEM($1, 1410)], EXPR$25=[ITEM($1, 1110)], 
EXPR$26=[ITEM($1, 1290)], EXPR$27=[ITEM($1, 2380)], EXPR$28=[ITEM($1, 705)], 
EXPR$29=[ITEM($1, 45)], EXPR$30=[ITEM($1, 1054)], EXPR$31=[ITEM($1, 2430)], 
EXPR$32=[ITEM($1, 420)], EXPR$33=[ITEM($1, 404)], EXPR$34=[ITEM($1, 3350)], 
EXPR$35=[ITEM($1, )], EXPR$36=[ITEM($1, 153)], EXPR$37=[ITEM($1, 356)], 
EXPR$38=[ITEM($1, 84)], EXPR$39=[ITEM($1, 745)], EXPR$40=[ITEM($1, 1450)], 
EXPR$41=[ITEM($1, 103)], EXPR$42=[ITEM($1, 2065)], EXPR$43=[ITEM($1, 343)], 
EXPR$44=[ITEM($1, 3420)], EXPR$45=[ITEM($1, 530)], EXPR$46=[ITEM($1, 3210)])
00-10Project(T8¦¦*=[$0], columns=[$1])
00-11  Scan(groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/drill/testdata/resource-manager/3500cols.tbl, 
numFiles=1, columns

[jira] [Created] (DRILL-5885) Drill consumes 2x memory when sorting and reading a spilled batch from disk.

2017-10-17 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5885:
-

 Summary: Drill consumes 2x memory when sorting and reading a 
spilled batch from disk.
 Key: DRILL-5885
 URL: https://issues.apache.org/jira/browse/DRILL-5885
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.11.0
Reporter: Robert Hou


The query is:
{noformat}
select count(*) from (select * from 
dfs.`/drill/testdata/resource-manager/3500cols.tbl` order by 
columns[450],columns[330],columns[230],columns[220],columns[110],columns[90],columns[80],columns[70],columns[40],columns[10],columns[20],columns[30],columns[40],columns[50],
 
columns[454],columns[413],columns[940],columns[834],columns[73],columns[140],columns[104],columns[],columns[30],columns[2420],columns[1520],
 columns[1410], 
columns[1110],columns[1290],columns[2380],columns[705],columns[45],columns[1054],columns[2430],columns[420],columns[404],columns[3350],
 
columns[],columns[153],columns[356],columns[84],columns[745],columns[1450],columns[103],columns[2065],columns[343],columns[3420],columns[530],
 columns[3210] ) d where d.col433 = 'sjka skjf';
{noformat}
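
Given the 2x consumption named in the summary, a hedged mitigation sketch is to 
budget the sort at roughly twice the largest expected batch size; the value below 
is illustrative only:
{noformat}
alter session set `planner.memory.max_query_memory_per_node` = 1073741824;
{noformat}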





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (DRILL-5840) A query that includes sort completes, and then loses Drill connection. Drill becomes unresponsive, and cannot restart because it cannot communicate with Zookeeper

2017-10-09 Thread Robert Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou resolved DRILL-5840.
---
Resolution: Not A Problem

> A query that includes sort completes, and then loses Drill connection. Drill 
> becomes unresponsive, and cannot restart because it cannot communicate with 
> Zookeeper
> --
>
> Key: DRILL-5840
> URL: https://issues.apache.org/jira/browse/DRILL-5840
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Paul Rogers
> Fix For: 1.12.0
>
>
> Query is:
> {noformat}
> ALTER SESSION SET `exec.sort.disable_managed` = false;
> select count(*) from (select * from 
> dfs.`/drill/testdata/resource-manager/250wide.tbl` order by columns[0])d 
> where d.columns[0] = 'ljdfhwuehnoiueyf';
> {noformat}
> Query tries to complete, but cannot.  It takes 20 hours from the time the 
> query tries to complete, to the time Drill finally loses its connection.
> From the drillbit.log:
> {noformat}
> 2017-10-03 16:28:14,892 [262bec7f-3539-0dd7-6fea-f2959f9df3b6:frag:0:0] DEBUG 
> o.a.drill.exec.work.foreman.Foreman - 262bec7f-3539-0dd7-6fea-f2959f9df3b6: 
> State change requested RUNNING --> COMPLETED
> 2017-10-04 01:47:27,698 [UserServer-1] DEBUG 
> o.a.d.e.r.u.UserServerRequestHandler - Received query to run.  Returning 
> query handle.
> 2017-10-04 03:30:02,916 [262bec7f-3539-0dd7-6fea-f2959f9df3b6:frag:0:0] WARN  
> o.a.d.exec.work.foreman.QueryManager - Failure while trying to delete the 
> estore profile for this query.
> org.apache.drill.common.exceptions.DrillRuntimeException: unable to delete 
> node at /running/262bec7f-3539-0dd7-6fea-f2959f9df3b6
>   at 
> org.apache.drill.exec.coord.zk.ZookeeperClient.delete(ZookeeperClient.java:343)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.coord.zk.ZkEphemeralStore.remove(ZkEphemeralStore.java:108)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.QueryManager.updateEphemeralState(QueryManager.java:293)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.Foreman.recordNewState(Foreman.java:1043) 
> [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:964) 
> [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.Foreman.access$2600(Foreman.java:113) 
> [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:1025)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:1018)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.common.EventProcessor.processEvents(EventProcessor.java:107) 
> [drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:65) 
> [drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.addEvent(Foreman.java:1020)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.Foreman.addToEventQueue(Foreman.java:1038) 
> [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.QueryManager.nodeComplete(QueryManager.java:498)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.QueryManager.access$100(QueryManager.java:66)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.QueryManager$NodeTracker.fragmentComplete(QueryManager.java:462)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.QueryManager.fragmentDone(QueryManager.java:147)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.QueryManager.access$400(QueryManager.java:66)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.e

[jira] [Created] (DRILL-5813) A query that includes sort encounters "Exception occurred with closed channel"

2017-09-22 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5813:
-

 Summary: A query that includes sort encounters "Exception occurred 
with closed channel"
 Key: DRILL-5813
 URL: https://issues.apache.org/jira/browse/DRILL-5813
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.11.0
Reporter: Robert Hou
Assignee: Paul Rogers
 Fix For: 1.12.0


Query is:
{noformat}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.enable_decimal_data_type` = true;
select count(*) from (select * from 
dfs.`/drill/testdata/resource-manager/all_types_large` order by missing11) d 
where d.missing3 is false;
{noformat}

This query has passed before when the number of threads and the amount of memory 
are restricted.  With more threads and memory, the query does not complete 
execution.

Here is the stack trace:
{noformat}
Exception occurred with closed channel.  Connection: /10.10.100.190:59281 <--> 
/10.10.100.190:31010 (user client)
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:384)
at 
oadd.io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)
at oadd.io.netty.buffer.WrappedByteBuf.setBytes(WrappedByteBuf.java:407)
at 
oadd.io.netty.buffer.UnsafeDirectLittleEndian.setBytes(UnsafeDirectLittleEndian.java:32)
at oadd.io.netty.buffer.DrillBuf.setBytes(DrillBuf.java:792)
at 
oadd.io.netty.buffer.MutableWrappedByteBuf.setBytes(MutableWrappedByteBuf.java:280)
at 
oadd.io.netty.buffer.ExpandableByteBuf.setBytes(ExpandableByteBuf.java:26)
at 
oadd.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at 
oadd.io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:241)
at 
oadd.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at 
oadd.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
oadd.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
oadd.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at oadd.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
oadd.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
User Error Occurred: Connection /10.10.100.190:59281 <--> /10.10.100.190:31010 
(user client) closed unexpectedly. Drillbit down?
oadd.org.apache.drill.common.exceptions.UserException: CONNECTION ERROR: 
Connection /10.10.100.190:59281 <--> /10.10.100.190:31010 (user client) closed 
unexpectedly. Drillbit down?


[Error Id: b97704a4-b8f0-4cd0-b428-2cf1bcf39a1d ]
at 
oadd.org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:550)
at 
oadd.org.apache.drill.exec.rpc.user.QueryResultHandler$ChannelClosedHandler$1.operationComplete(QueryResultHandler.java:373)
at 
oadd.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
at 
oadd.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
at 
oadd.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
at 
oadd.io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406)
at 
oadd.io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
at 
oadd.io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:943)
at 
oadd.io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:592)
at 
oadd.io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:584)
at 
oadd.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.closeOnRead(AbstractNioByteChannel.java:71)
at 
oadd.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:89)
at 
oadd.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:162)
at 
oadd.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
oadd.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
oadd.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at oadd.io.netty.channel.nio.NioEventLoop.run(Nio

[jira] [Created] (DRILL-5805) External Sort runs out of memory

2017-09-19 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5805:
-

 Summary: External Sort runs out of memory
 Key: DRILL-5805
 URL: https://issues.apache.org/jira/browse/DRILL-5805
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.11.0
Reporter: Robert Hou
Assignee: Paul Rogers
 Fix For: 1.12.0


Query is:
{noformat}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 5;
alter session set `planner.disable_exchanges` = true;
alter session set `planner.width.max_per_query` = 100;
select count(*) from (select * from (select id, flatten(str_list) str from 
dfs.`/drill/testdata/resource-manager/flatten-large-small.json`) d order by 
d.str) d1 where d1.id=0;
{noformat}

Plan is:
{noformat}
| 00-00Screen
00-01  Project(EXPR$0=[$0])
00-02StreamAgg(group=[{}], EXPR$0=[COUNT()])
00-03  Project($f0=[0])
00-04SelectionVectorRemover
00-05  Filter(condition=[=($0, 0)])
00-06SelectionVectorRemover
00-07  Sort(sort0=[$1], dir0=[ASC])
00-08Flatten(flattenField=[$1])
00-09  Project(id=[$0], str=[$1])
00-10Scan(groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/drill/testdata/resource-manager/flatten-large-small.json,
 numFiles=1, columns=[`id`, `str_list`], 
files=[maprfs:///drill/testdata/resource-manager/flatten-large-small.json]]])
{noformat}

sys.version is:
{noformat}
| 1.12.0-SNAPSHOT  | c4211d3b545b0d1996b096a8e1ace35376a63977  | Fix for 
DRILL-5670  | 09.09.2017 @ 14:38:25 PDT  | r...@qa-node190.qa.lab  | 11.09.2017 
@ 14:27:16 PDT  |
{noformat}




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5804) Query times out, may be infinite loop

2017-09-19 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5804:
-

 Summary: Query times out, may be infinite loop
 Key: DRILL-5804
 URL: https://issues.apache.org/jira/browse/DRILL-5804
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.11.0
Reporter: Robert Hou
Assignee: Paul Rogers
 Fix For: 1.12.0


Query is:
{noformat}
ALTER SESSION SET `exec.sort.disable_managed` = false;
select count(*) from (
  select * from (
select s1.type type, flatten(s1.rms.rptd) rptds, s1.rms, s1.uid 
from (
  select d.type type, d.uid uid, flatten(d.map.rm) rms from 
dfs.`/drill/testdata/resource-manager/nested_large` d order by d.uid
) s1
  ) s2
  order by s2.rms.mapid, s2.rptds.a, s2.rptds.do_not_exist
);
{noformat}

Plan is:
{noformat}
| 00-00Screen
00-01  Project(EXPR$0=[$0])
00-02StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
00-03  UnionExchange
01-01StreamAgg(group=[{}], EXPR$0=[COUNT()])
01-02  Project($f0=[0])
01-03SingleMergeExchange(sort0=[4 ASC], sort1=[5 ASC], sort2=[6 
ASC])
02-01  SelectionVectorRemover
02-02Sort(sort0=[$4], sort1=[$5], sort2=[$6], dir0=[ASC], 
dir1=[ASC], dir2=[ASC])
02-03  Project(type=[$0], rptds=[$1], rms=[$2], uid=[$3], 
EXPR$4=[$4], EXPR$5=[$5], EXPR$6=[$6])
02-04HashToRandomExchange(dist0=[[$4]], dist1=[[$5]], 
dist2=[[$6]])
03-01  UnorderedMuxExchange
04-01Project(type=[$0], rptds=[$1], rms=[$2], 
uid=[$3], EXPR$4=[$4], EXPR$5=[$5], EXPR$6=[$6], 
E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($6, hash32AsDouble($5, 
hash32AsDouble($4, 1301011)))])
04-02  Project(type=[$0], rptds=[$1], rms=[$2], 
uid=[$3], EXPR$4=[ITEM($2, 'mapid')], EXPR$5=[ITEM($1, 'a')], EXPR$6=[ITEM($1, 
'do_not_exist')])
04-03Flatten(flattenField=[$1])
04-04  Project(type=[$0], rptds=[ITEM($2, 
'rptd')], rms=[$2], uid=[$1])
04-05SingleMergeExchange(sort0=[1 ASC])
05-01  SelectionVectorRemover
05-02Sort(sort0=[$1], dir0=[ASC])
05-03  Project(type=[$0], uid=[$1], 
rms=[$2])
05-04
HashToRandomExchange(dist0=[[$1]])
06-01  UnorderedMuxExchange
07-01Project(type=[$0], 
uid=[$1], rms=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1, 1301011)])
07-02  
Flatten(flattenField=[$2])
07-03Project(type=[$0], 
uid=[$1], rms=[ITEM($2, 'rm')])
07-04  
Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=maprfs:///drill/testdata/resource-manager/nested_large]], 
selectionRoot=maprfs:/drill/testdata/resource-manager/nested_large, numFiles=1, 
usedMetadataFile=false, columns=[`type`, `uid`, `map`.`rm`]]])
{noformat}

Here is a segment of the drillbit.log, starting at line 55890:
{noformat}
2017-09-19 04:22:56,258 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:2:2] DEBUG 
o.a.d.e.t.g.SingleBatchSorterGen44 - Took 142 us to sort 1023 records
2017-09-19 04:22:56,265 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:2:4] DEBUG 
o.a.d.e.t.g.SingleBatchSorterGen44 - Took 105 us to sort 1023 records
2017-09-19 04:22:56,268 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:3:0] DEBUG 
o.a.d.e.p.i.p.PartitionSenderRootExec - Partitioner.next(): got next record 
batch with status OK
2017-09-19 04:22:56,275 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:2:7] DEBUG 
o.a.d.e.t.g.SingleBatchSorterGen44 - Took 145 us to sort 1023 records
2017-09-19 04:22:56,354 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:3:0] DEBUG 
o.a.d.e.p.i.p.PartitionSenderRootExec - Partitioner.next(): got next record 
batch with status OK
2017-09-19 04:22:56,357 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:2:2] DEBUG 
o.a.d.e.t.g.SingleBatchSorterGen44 - Took 143 us to sort 1023 records
2017-09-19 04:22:56,361 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:2:0] DEBUG 
o.a.d.exec.compile.ClassTransformer - Compiled and merged 
PriorityQueueCopierGen50: bytecode size = 11.0 KiB, time = 124 ms.
2017-09-19 04:22:56,365 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:2:4] DEBUG 
o.a.d.e.t.g.SingleBatchSorterGen44 - Took 108 us to sort 1023 records
2017-09-19 04:22:56,367 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:2:0] DEBUG 
o.a.d.e.p.i.x.m.PriorityQueueCopierWrapper - Copier setup complete
2017-09-19 04:22:56,375 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:2:7] DEBUG 
o.a.d.e.t.g.SingleBatchSorterGen44 - Took

[jira] [Created] (DRILL-5786) Query encounters "Exception in RPC communication"

2017-09-13 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5786:
-

 Summary: Query encounters "Exception in RPC communication"
 Key: DRILL-5786
 URL: https://issues.apache.org/jira/browse/DRILL-5786
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.11.0
Reporter: Robert Hou
Assignee: Paul Rogers
 Fix For: 1.12.0


Query is:
{noformat}
select count(*) from (select * from 
dfs.`/drill/testdata/resource-manager/3500cols.tbl` order by 
columns[450],columns[330],columns[230],columns[220],columns[110],columns[90],columns[80],columns[70],columns[40],columns[10],columns[20],columns[30],columns[40],columns[50],
 
columns[454],columns[413],columns[940],columns[834],columns[73],columns[140],columns[104],columns[],columns[30],columns[2420],columns[1520],
 columns[1410], 
columns[1110],columns[1290],columns[2380],columns[705],columns[45],columns[1054],columns[2430],columns[420],columns[404],columns[3350],
 
columns[],columns[153],columns[356],columns[84],columns[745],columns[1450],columns[103],columns[2065],columns[343],columns[3420],columns[530],
 columns[3210] ) d where d.col433 = 'sjka skjf'
{noformat}

This is the same query as DRILL-5670 but no session variables are set.
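
For reference, the session variables in question are presumably those shown in 
the DRILL-5886 report above (an assumption based on that report, since DRILL-5670 
itself is not quoted here):
{noformat}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.memory.max_query_memory_per_node` = 482344960;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.width.max_per_query` = 1;
alter session set `planner.disable_exchanges` = true;
{noformat}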

Here is the stack trace:
{noformat}
2017-09-12 13:14:57,584 [BitServer-5] ERROR o.a.d.exec.rpc.RpcExceptionHandler 
- Exception in RPC communication.  Connection: /10.10.100.190:31012 <--> 
/10.10.100.190:46230 (data server).  Closing connection.
io.netty.handler.codec.DecoderException: 
org.apache.drill.exec.exception.OutOfMemoryException: Failure allocating buffer.
at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233)
 ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) 
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) 
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) 
[netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
 [netty-common-4.0.27.Final.jar:4.0.27.Final]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Failure 
allocating buffer.
at 
io.netty.buffer.PooledByteBufAllocatorL.allocate(PooledByteBufAllocatorL.java:64)
 ~[drill-memory-base-1.12.0-SNAPSHOT.jar:4.0.27.Final]
at 
org.apache.drill.exec.memory.AllocationManager.(AllocationManager.java:81)
 ~[drill-memory-base-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation(BaseAllocator.java:260)
 ~[drill-memory-base-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:243) 
~[drill-memory-base-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:213) 
~[drill-memory-base-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
io.netty.buffer.ExpandableByteBuf.capacity(ExpandableByteBuf.java:43) 
~[drill-memory-base-1.12.0-SNAPSHOT.jar:4.0.27.Final]
at 
io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:251) 
~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:849) 
~[netty-buffer-4.0.27.Final.jar:4.0.27

[jira] [Resolved] (DRILL-5522) OOM during the merge and spill process of the managed external sort

2017-09-11 Thread Robert Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou resolved DRILL-5522.
---
Resolution: Fixed

This has been resolved.

> OOM during the merge and spill process of the managed external sort
> ---
>
> Key: DRILL-5522
> URL: https://issues.apache.org/jira/browse/DRILL-5522
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>Assignee: Paul Rogers
> Attachments: 26e334aa-1afa-753f-3afe-862f76b80c18.sys.drill, 
> drillbit.log, drillbit.out, drill-env.sh
>
>
> git.commit.id.abbrev=1e0a14c
> The below query fails with an OOM
> {code}
> ALTER SESSION SET `exec.sort.disable_managed` = false;
> alter session set `planner.memory.max_query_memory_per_node` = 1552428800;
> create table dfs.drillTestDir.xsort_ctas3_multiple partition by (type, aCol) 
> as select type, rptds, rms, s3.rms.a aCol, uid from (
>   select * from (
> select s1.type type, flatten(s1.rms.rptd) rptds, s1.rms, s1.uid
> from (
>   select d.type type, d.uid uid, flatten(d.map.rm) rms from 
> dfs.`/drill/testdata/resource-manager/nested-large.json` d order by d.uid
> ) s1
>   ) s2
>   order by s2.rms.mapid, s2.rptds.a
> ) s3;
> {code}
> Stack trace
> {code}
> 2017-05-17 15:15:35,027 [26e334aa-1afa-753f-3afe-862f76b80c18:frag:4:2] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred: One or more nodes 
> ran out of memory while executing the query. (Unable to allocate buffer of 
> size 2097152 due to memory limit. Current allocation: 29229064)
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more 
> nodes ran out of memory while executing the query.
> Unable to allocate buffer of size 2097152 due to memory limit. Current 
> allocation: 29229064
> [Error Id: 619e2e34-704c-4964-a354-1348fb33ce8a ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
>  ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:244)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_111]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_111]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
> allocate buffer of size 2097152 due to memory limit. Current allocation: 
> 29229064
> at 
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:220) 
> ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:195) 
> ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.BigIntVector.reAlloc(BigIntVector.java:212) 
> ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.BigIntVector.copyFromSafe(BigIntVector.java:324) 
> ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.NullableBigIntVector.copyFromSafe(NullableBigIntVector.java:367)
>  ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.NullableBigIntVector$TransferImpl.copyValueSafe(NullableBigIntVector.java:328)
>  ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.copyValueSafe(RepeatedMapVector.java:360)
>  ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.complex.MapVector$MapTransferPair.copyValueSafe(MapVector.java:220)
>  ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.complex.MapVector.copyFromSafe(MapVector.java:82)
>  ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.test.generated.PriorityQueueCopierGen49.doCopy(PriorityQueueCopierTemplate.java:34)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.PriorityQueueCopierGen49.next(PriorityQueueCopierTemplate.java:76)
>  ~
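
A note on the session options that recur throughout these reports:
exec.sort.disable_managed = false enables the managed external sort, and
planner.memory.max_query_memory_per_node caps the per-node memory for the
query. A quick way to confirm what a session is actually running with is to
query sys.options (a minimal sketch; SELECT * is used because the column
layout of sys.options varies across Drill versions):

{code}
SELECT *
FROM   sys.options
WHERE  name IN ('exec.sort.disable_managed',
                'planner.memory.max_query_memory_per_node');
{code}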

[jira] [Resolved] (DRILL-5443) Managed External Sort fails with OOM while spilling to disk

2017-09-11 Thread Robert Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou resolved DRILL-5443.
---
Resolution: Fixed

This has been resolved.

> Managed External Sort fails with OOM while spilling to disk
> ---
>
> Key: DRILL-5443
> URL: https://issues.apache.org/jira/browse/DRILL-5443
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Rahul Challapalli
>Assignee: Paul Rogers
> Fix For: 1.12.0
>
> Attachments: 265a014b-8cae-30b5-adab-ff030b6c7086.sys.drill, 
> 27016969-ef53-40dc-b582-eea25371fa1c.sys.drill, drill5443.drillbit.log, 
> drillbit.log
>
>
> git.commit.id.abbrev=3e8b01d
> The below query fails with an OOM
> {code}
> ALTER SESSION SET `exec.sort.disable_managed` = false;
> alter session set `planner.width.max_per_node` = 1;
> alter session set `planner.disable_exchanges` = true;
> alter session set `planner.width.max_per_query` = 1;
> alter session set `planner.memory.max_query_memory_per_node` = 52428800;
> select s1.type type, flatten(s1.rms.rptd) rptds from (select d.type type, 
> d.uid uid, flatten(d.map.rm) rms from 
> dfs.`/drill/testdata/resource-manager/nested-large.json` d order by d.uid) s1 
> order by s1.rms.mapid;
> {code}
> Exception from the logs
> {code}
> 2017-04-24 17:22:59,439 [27016969-ef53-40dc-b582-eea25371fa1c:frag:0:0] INFO  
> o.a.d.e.p.i.x.m.ExternalSortBatch - User Error Occurred: External Sort 
> encountered an error while spilling to disk (Unable to allocate buffer of 
> size 524288 (rounded from 307197) due to memory limit. Current allocation: 
> 25886728)
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: External 
> Sort encountered an error while spilling to disk
> [Error Id: a64e3790-3a34-42c8-b4ea-4cb1df780e63 ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
>  ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.doMergeAndSpill(ExternalSortBatch.java:1445)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:1376)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeRuns(ExternalSortBatch.java:1372)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.consolidateBatches(ExternalSortBatch.java:1299)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeSpilledRuns(ExternalSortBatch.java:1195)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:689)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:559)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.validate.IteratorV
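
The "size 524288 (rounded from 307197)" wording in this report reflects that
buffer requests are rounded up to the next power of two before being checked
against the memory limit. A minimal sketch of that arithmetic (inferred from
the log line, not Drill's actual allocator code; the class name is
hypothetical):

{code}
public class RoundingSketch {
  // Round a requested buffer size up to the next power of two.
  static int roundUpToPowerOfTwo(int requested) {
    return Integer.highestOneBit(requested - 1) << 1;
  }

  public static void main(String[] args) {
    System.out.println(roundUpToPowerOfTwo(307197));  // 524288, as in the error
  }
}
{code}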

[jira] [Resolved] (DRILL-5753) Managed External Sort: One or more nodes ran out of memory while executing the query.

2017-09-11 Thread Robert Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou resolved DRILL-5753.
---
Resolution: Fixed

> Managed External Sort: One or more nodes ran out of memory while executing 
> the query.
> -
>
> Key: DRILL-5753
> URL: https://issues.apache.org/jira/browse/DRILL-5753
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Paul Rogers
> Fix For: 1.12.0
>
> Attachments: 26596b4e-9883-7dc2-6275-37134f7d63be.sys.drill, 
> drillbit.log
>
>
> The query is:
> {noformat}
> ALTER SESSION SET `exec.sort.disable_managed` = false;
> alter session set `planner.memory.max_query_memory_per_node` = 1252428800;
> select count(*) from (
>   select * from (
> select s1.type type, flatten(s1.rms.rptd) rptds, s1.rms, s1.uid 
> from (
>   select d.type type, d.uid uid, flatten(d.map.rm) rms from 
> dfs.`/drill/testdata/resource-manager/nested-large.json` d order by d.uid
> ) s1
>   ) s2
>   order by s2.rms.mapid, s2.rptds.a, s2.rptds.do_not_exist
> );
> ALTER SESSION SET `exec.sort.disable_managed` = true;
> alter session set `planner.memory.max_query_memory_per_node` = 2147483648;
> {noformat}
> The stack trace is:
> {noformat}
> 2017-08-30 03:35:10,479 [BitServer-5] DEBUG 
> o.a.drill.exec.work.foreman.Foreman - 26596b4e-9883-7dc2-6275-37134f7d63be: 
> State change requested RUNNING --> FAILED
> org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: One 
> or more nodes ran out of memory while executing the query.
> Unable to allocate buffer of size 4194304 due to memory limit. Current 
> allocation: 43960640
> Fragment 2:9
> [Error Id: f58210a2-7569-42d0-8961-8c7e42c7fea3 on atsqa6c80.qa.lab:31010]
>   (org.apache.drill.exec.exception.OutOfMemoryException) Unable to allocate 
> buffer of size 4194304 due to memory limit. Current allocation: 43960640
> org.apache.drill.exec.memory.BaseAllocator.buffer():238
> org.apache.drill.exec.memory.BaseAllocator.buffer():213
> org.apache.drill.exec.vector.BigIntVector.reAlloc():252
> org.apache.drill.exec.vector.BigIntVector$Mutator.setSafe():452
> org.apache.drill.exec.vector.RepeatedBigIntVector$Mutator.addSafe():355
> org.apache.drill.exec.vector.RepeatedBigIntVector.copyFromSafe():220
> 
> org.apache.drill.exec.vector.RepeatedBigIntVector$TransferImpl.copyValueSafe():202
> 
> org.apache.drill.exec.vector.complex.MapVector$MapTransferPair.copyValueSafe():225
> 
> org.apache.drill.exec.vector.complex.MapVector$MapTransferPair.copyValueSafe():225
> org.apache.drill.exec.vector.complex.MapVector.copyFromSafe():82
> 
> org.apache.drill.exec.test.generated.PriorityQueueCopierGen1466.doCopy():47
> org.apache.drill.exec.test.generated.PriorityQueueCopierGen1466.next():77
> 
> org.apache.drill.exec.physical.impl.xsort.managed.PriorityQueueCopierWrapper$BatchMerger.next():267
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load():374
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext():303
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.physical.impl.BaseRootExec.next():105
> 
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
> org.apache.drill.exec.physical.impl.BaseRootExec.next():95
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():415
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():744
> at 
> org.apache.dri

[jira] [Resolved] (DRILL-5744) External sort fails with OOM error

2017-09-11 Thread Robert Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou resolved DRILL-5744.
---
Resolution: Fixed

This has been verified.

> External sort fails with OOM error
> --
>
> Key: DRILL-5744
> URL: https://issues.apache.org/jira/browse/DRILL-5744
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.10.0
>    Reporter: Robert Hou
>Assignee: Paul Rogers
> Fix For: 1.12.0
>
> Attachments: 265b163b-cf44-d2ff-2e70-4cd746b56611.sys.drill, 
> q34.drillbit.log
>
>
> Query is:
> {noformat}
> ALTER SESSION SET `exec.sort.disable_managed` = false;
> alter session set `planner.width.max_per_node` = 1;
> alter session set `planner.disable_exchanges` = true;
> alter session set `planner.width.max_per_query` = 1;
> alter session set `planner.memory.max_query_memory_per_node` = 152428800;
> select count(*) from (
>   select * from (
> select s1.type type, flatten(s1.rms.rptd) rptds, s1.rms, s1.uid 
> from (
>   select d.type type, d.uid uid, flatten(d.map.rm) rms from 
> dfs.`/drill/testdata/resource-manager/nested-large.json` d order by d.uid
> ) s1
>   ) s2
>   order by s2.rms.mapid
> );
> ALTER SESSION SET `exec.sort.disable_managed` = true;
> alter session set `planner.width.max_per_node` = 17;
> alter session set `planner.disable_exchanges` = false;
> alter session set `planner.width.max_per_query` = 1000;
> alter session set `planner.memory.max_query_memory_per_node` = 2147483648;
> {noformat}
> Stack trace is:
> {noformat}
> 2017-08-23 06:59:42,763 [266275e5-ebdb-14ae-d52d-00fa3a154f6d:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred: One or more nodes
>  ran out of memory while executing the query. (Unable to allocate buffer of 
> size 4194304 (rounded from 3276750) due to memory limit. Current allocation: 79986944)
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more 
> nodes ran out of memory while executing the query.
> Unable to allocate buffer of size 4194304 (rounded from 3276750) due to 
> memory limit. Current allocation: 79986944
> [Error Id: 4f4959df-0921-4a50-b75e-56488469ab10 ]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:550)
>  ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:244)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_51]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_51]
>   at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
> allocate buffer of size 4194304 (rounded from 3276750) due to memory limit. 
> Current allocation: 79986944
>   at 
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:238) 
> ~[drill-memory-base-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:213) 
> ~[drill-memory-base-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:402)
>  ~[vector-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:236)
>  ~[vector-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount(AllocationHelper.java:33)
>  ~[vector-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.vector.AllocationHelper.allocate(AllocationHelper.java:46)
>  ~[vector-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.VectorInitializer.allocateVector(VectorInitializer.java:113)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.VectorInitializer.allocateVector(VectorInitializer.java:95)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.VectorInitializer.allocateMap(VectorInitializer.java:130)
>  ~[drill-java-exec-1.12.0-
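
One detail worth noting in this report: the session caps
planner.memory.max_query_memory_per_node at 152428800 bytes, and the plan
contains two sorts (the inner order by d.uid and the outer order by
s2.rms.mapid). Assuming the per-node budget is split evenly across the
query's buffered operators (an assumption, not confirmed by the log), the
numbers line up with the failure (hypothetical class name):

{code}
public class SortBudgetSketch {
  public static void main(String[] args) {
    long perNode   = 152428800L;   // planner.memory.max_query_memory_per_node
    long perSort   = perNode / 2;  // two sorts in the plan -> 76214400 each
    long allocated = 79986944L;    // "Current allocation" in the error
    long request   = 4194304L;     // the buffer that failed to allocate
    System.out.println(allocated + request > perSort);  // true -> OOM
  }
}
{code}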

[jira] [Created] (DRILL-5778) Drill seems to run out of memory but completes execution

2017-09-08 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5778:
-

 Summary: Drill seems to run out of memory but completes execution
 Key: DRILL-5778
 URL: https://issues.apache.org/jira/browse/DRILL-5778
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.11.0
Reporter: Robert Hou
Assignee: Paul Rogers
 Fix For: 1.12.0


Query is:
{noformat}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.disable_exchanges` = true;
alter session set `planner.width.max_per_query` = 1;
alter session set `planner.memory.max_query_memory_per_node` = 2147483648;
select count(*) from (select * from (select id, flatten(str_list) str from 
dfs.`/drill/testdata/resource-manager/flatten-large-small.json`) d order by 
d.str) d1 where d1.id=0;
{noformat}

Plan is:
{noformat}
| 00-00Screen
00-01  Project(EXPR$0=[$0])
00-02StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
00-03  UnionExchange
01-01StreamAgg(group=[{}], EXPR$0=[COUNT()])
01-02  Project($f0=[0])
01-03SelectionVectorRemover
01-04  Filter(condition=[=($0, 0)])
01-05SingleMergeExchange(sort0=[1 ASC])
02-01  SelectionVectorRemover
02-02Sort(sort0=[$1], dir0=[ASC])
02-03  Project(id=[$0], str=[$1])
02-04HashToRandomExchange(dist0=[[$1]])
03-01  UnorderedMuxExchange
04-01Project(id=[$0], str=[$1], 
E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1, 1301011)])
04-02  Flatten(flattenField=[$1])
04-03Project(id=[$0], str=[$1])
04-04  Scan(groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/drill/testdata/resource-manager/flatten-large-small.json,
 numFiles=1, columns=[`id`, `str_list`], 
files=[maprfs:///drill/testdata/resource-manager/flatten-large-small.json]]])
{noformat}

From drillbit.log:
{noformat}
2017-09-08 05:07:21,515 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.p.i.x.m.ExternalSortBatch - Actual batch schema & sizes {
  str(type: REQUIRED VARCHAR, count: 4096, std size: 54, actual size: 134, data 
size: 548360)
  id(type: OPTIONAL BIGINT, count: 4096, std size: 8, actual size: 9, data 
size: 36864)
  Records: 4096, Total size: 1073819648, Data size: 585224, Gross row width: 
262163, Net row width: 143, Density: 1}
2017-09-08 05:07:21,515 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] ERROR 
o.a.d.e.p.i.x.m.ExternalSortBatch - Insufficient memory to merge two batches. 
Incoming batch size: 1073819648, available memory: 2147483648
2017-09-08 05:07:21,517 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] INFO  
o.a.d.e.c.ClassCompilerSelector - Java compiler policy: DEFAULT, Debug option: 
true
2017-09-08 05:07:21,517 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.compile.JaninoClassCompiler - Compiling (source size=3.3 KiB):

...

2017-09-08 05:07:21,536 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.exec.compile.ClassTransformer - Compiled and merged 
SingleBatchSorterGen2677: bytecode size = 3.6 KiB, time = 19 ms.
2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.t.g.SingleBatchSorterGen2677 - Took 5608 us to sort 4096 records
2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.p.i.x.m.ExternalSortBatch - Input Batch Estimates: record size = 143 
bytes; net = 1073819648 bytes, gross = 1610729472, records = 4096
2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.p.i.x.m.ExternalSortBatch - Spill batch size: net = 1048476 bytes, 
gross = 1572714 bytes, records = 7332; spill file = 268435456 bytes
2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.p.i.x.m.ExternalSortBatch - Output batch size: net = 9371505 bytes, 
gross = 14057257 bytes, records = 65535
2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.p.i.x.m.ExternalSortBatch - Available memory: 2147483648, buffer memory 
= 2143289744, merge memory = 2128740638
2017-09-08 05:07:21,571 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.t.g.SingleBatchSorterGen2677 - Took 4303 us to sort 4096 records
2017-09-08 05:07:21,571 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.p.i.x.m.ExternalSortBatch - Input Batch Estimates: record size = 266 
bytes; net = 1073819648 bytes, gross = 1610729472, records = 4096
2017-09-08 05:07:21,571 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.p.i.x.m.ExternalSortBatch - Spill batch size: net = 1048572 bytes, 
gross = 157285
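
The key line in this log is "Insufficient memory to merge two batches": with
each incoming batch estimated at 1073819648 bytes, even the raised 2 GB
budget cannot hold two of them at once. A minimal sketch of that check
(assuming the merge phase needs at least two incoming batches resident;
hypothetical class name):

{code}
public class MergeCheckSketch {
  public static void main(String[] args) {
    long incomingBatch = 1073819648L;  // "Incoming batch size" in the log
    long availableMem  = 2147483648L;  // "available memory" in the log
    System.out.println(2 * incomingBatch > availableMem);  // true -> cannot merge
  }
}
{code}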

[jira] [Created] (DRILL-5774) Excessive memory allocation

2017-09-07 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5774:
-

 Summary: Excessive memory allocation
 Key: DRILL-5774
 URL: https://issues.apache.org/jira/browse/DRILL-5774
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.11.0
Reporter: Robert Hou
Assignee: Paul Rogers
 Fix For: 1.12.0


This query exhibits excessive memory allocation:
{noformat}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.disable_exchanges` = true;
alter session set `planner.width.max_per_query` = 1;
select count(*) from (select * from (select id, flatten(str_list) str from 
dfs.`/drill/testdata/resource-manager/flatten-large-small.json`) d order by 
d.str) d1 where d1.id=0;
{noformat}

This query does a flatten on a large table, producing 160M records. Half of 
the records have a one-byte string and half have a 253-byte string; in 
addition, there are 40K records with 223-byte strings.

{noformat}
select length(str), count(*) from (select id, flatten(str_list) str from 
dfs.`/drill/testdata/resource-manager/flatten-large-small.json`) group by 
length(str);
+---------+-----------+
| EXPR$0  |  EXPR$1   |
+---------+-----------+
| 223     | 40000     |
| 1       | 80042001  |
| 253     | 80000000  |
+---------+-----------+
{noformat}

From the drillbit.log:
{noformat}
2017-09-02 11:43:44,598 [26550427-6adf-a52e-2ea8-dc52d8d8433f:frag:0:0] DEBUG 
o.a.d.e.p.i.x.m.ExternalSortBatch - Actual batch schema & sizes {
  str(type: REQUIRED VARCHAR, count: 4096, std size: 54, actual size: 134, data 
size: 548360)
  id(type: OPTIONAL BIGINT, count: 4096, std size: 8, actual size: 9, data 
size: 36864)
  Records: 4096, Total size: 1073819648, Data size: 585224, Gross row width: 
262163, Net row width: 143, Density: 1}
{noformat}

The data size is 585K, but the batch size is 1 GB.  The density is 1%.
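
A minimal sketch of the density arithmetic, assuming the logged Density
figure is the data-to-allocated percentage rounded up (the rounding rule and
the class name are assumptions):

{code}
public class DensitySketch {
  public static void main(String[] args) {
    long dataSize  = 585224L;      // "Data size" in the log above
    long totalSize = 1073819648L;  // "Total size" in the log above
    double pct = 100.0 * dataSize / totalSize;  // ~0.05% of the buffer holds data
    System.out.println((int) Math.ceil(pct));   // 1 -> the "Density: 1" in the log
  }
}
{code}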





[jira] [Created] (DRILL-5753) Managed External Sort: One or more nodes ran out of memory while executing the query.

2017-08-30 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5753:
-

 Summary: Managed External Sort: One or more nodes ran out of 
memory while executing the query.
 Key: DRILL-5753
 URL: https://issues.apache.org/jira/browse/DRILL-5753
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.11.0
Reporter: Robert Hou
Assignee: Paul Rogers
 Fix For: 1.12.0


The query is:
{noformat}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.memory.max_query_memory_per_node` = 1252428800;
select count(*) from (
  select * from (
select s1.type type, flatten(s1.rms.rptd) rptds, s1.rms, s1.uid 
from (
  select d.type type, d.uid uid, flatten(d.map.rm) rms from 
dfs.`/drill/testdata/resource-manager/nested-large.json` d order by d.uid
) s1
  ) s2
  order by s2.rms.mapid, s2.rptds.a, s2.rptds.do_not_exist
);
ALTER SESSION SET `exec.sort.disable_managed` = true;
alter session set `planner.memory.max_query_memory_per_node` = 2147483648;
{noformat}

The stack trace is:
{noformat}
2017-08-30 03:35:10,479 [BitServer-5] DEBUG o.a.drill.exec.work.foreman.Foreman 
- 26596b4e-9883-7dc2-6275-37134f7d63be: State change requested RUNNING --> 
FAILED
org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: One or 
more nodes ran out of memory while executing the query.

Unable to allocate buffer of size 4194304 due to memory limit. Current 
allocation: 43960640
Fragment 2:9

[Error Id: f58210a2-7569-42d0-8961-8c7e42c7fea3 on atsqa6c80.qa.lab:31010]

  (org.apache.drill.exec.exception.OutOfMemoryException) Unable to allocate 
buffer of size 4194304 due to memory limit. Current allocation: 43960640
org.apache.drill.exec.memory.BaseAllocator.buffer():238
org.apache.drill.exec.memory.BaseAllocator.buffer():213
org.apache.drill.exec.vector.BigIntVector.reAlloc():252
org.apache.drill.exec.vector.BigIntVector$Mutator.setSafe():452
org.apache.drill.exec.vector.RepeatedBigIntVector$Mutator.addSafe():355
org.apache.drill.exec.vector.RepeatedBigIntVector.copyFromSafe():220

org.apache.drill.exec.vector.RepeatedBigIntVector$TransferImpl.copyValueSafe():202

org.apache.drill.exec.vector.complex.MapVector$MapTransferPair.copyValueSafe():225

org.apache.drill.exec.vector.complex.MapVector$MapTransferPair.copyValueSafe():225
org.apache.drill.exec.vector.complex.MapVector.copyFromSafe():82
org.apache.drill.exec.test.generated.PriorityQueueCopierGen1466.doCopy():47
org.apache.drill.exec.test.generated.PriorityQueueCopierGen1466.next():77

org.apache.drill.exec.physical.impl.xsort.managed.PriorityQueueCopierWrapper$BatchMerger.next():267

org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load():374

org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext():303
org.apache.drill.exec.record.AbstractRecordBatch.next():164
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51

org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
org.apache.drill.exec.record.AbstractRecordBatch.next():164
org.apache.drill.exec.physical.impl.BaseRootExec.next():105

org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
org.apache.drill.exec.physical.impl.BaseRootExec.next():95
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():415
org.apache.hadoop.security.UserGroupInformation.doAs():1595
org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1145
java.util.concurrent.ThreadPoolExecutor$Worker.run():615
java.lang.Thread.run():744

at 
org.apache.drill.exec.work.foreman.QueryManager$1.statusUpdate(QueryManager.java:521)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.rpc.control.WorkEventBus.statusUpdate(WorkEventBus.java:71)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.work.batch.ControlMessageHandler.handle(ControlMessageHandler.java:94)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.work.batch.ControlMessageHandler.handle(ControlMessageHandler.java:55)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at org.apache.drill.exec.rpc.BasicServer.handle(BasicServer.java:157) 
[drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPS

[jira] [Resolved] (DRILL-5732) Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill.

2017-08-29 Thread Robert Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou resolved DRILL-5732.
---
Resolution: Not A Problem

> Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill.
> -
>
> Key: DRILL-5732
> URL: https://issues.apache.org/jira/browse/DRILL-5732
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>        Reporter: Robert Hou
>Assignee: Paul Rogers
> Attachments: 26621eb2-daec-cef9-efed-5986e72a750a.sys.drill, 
> drillbit.log.83
>
>
> git commit id:
> {noformat}
> | 1.12.0-SNAPSHOT  | e9065b55ea560e7f737d6fcb4948f9e945b9b14f  | DRILL-5660: 
> Parquet metadata caching improvements  | 15.08.2017 @ 09:31:00 PDT  | 
> r...@qa-node190.qa.lab  | 15.08.2017 @ 13:29:26 PDT  |
> {noformat}
> Query is:
> {noformat}
> ALTER SESSION SET `exec.sort.disable_managed` = false;
> alter session set `planner.disable_exchanges` = true;
> alter session set `planner.memory.max_query_memory_per_node` = 104857600;
> alter session set `planner.width.max_per_node` = 1;
> alter session set `planner.width.max_per_query` = 1;
> select max(col1), max(cs_sold_date_sk), max(cs_sold_time_sk), 
> max(cs_ship_date_sk), max(cs_bill_customer_sk), max(cs_bill_cdemo_sk), 
> max(cs_bill_hdemo_sk), max(cs_bill_addr_sk), max(cs_ship_customer_sk), 
> max(cs_ship_cdemo_sk), max(cs_ship_hdemo_sk), max(cs_ship_addr_sk), 
> max(cs_call_center_sk), max(cs_catalog_page_sk), max(cs_ship_mode_sk), 
> min(cs_warehouse_sk), max(cs_item_sk), max(cs_promo_sk), 
> max(cs_order_number), max(cs_quantity), max(cs_wholesale_cost), 
> max(cs_list_price), max(cs_sales_price), max(cs_ext_discount_amt), 
> min(cs_ext_sales_price), max(cs_ext_wholesale_cost), min(cs_ext_list_price), 
> min(cs_ext_tax), min(cs_coupon_amt), max(cs_ext_ship_cost), max(cs_net_paid), 
> max(cs_net_paid_inc_tax), min(cs_net_paid_inc_ship), 
> min(cs_net_paid_inc_ship_tax), min(cs_net_profit), min(c_customer_sk), 
> min(length(c_customer_id)), max(c_current_cdemo_sk), max(c_current_hdemo_sk), 
> min(c_current_addr_sk), min(c_first_shipto_date_sk), 
> min(c_first_sales_date_sk), min(length(c_salutation)), 
> min(length(c_first_name)), min(length(c_last_name)), 
> min(length(c_preferred_cust_flag)), max(c_birth_day), min(c_birth_month), 
> min(c_birth_year), max(c_last_review_date), c_email_address  from (select 
> cs_sold_date_sk+cs_sold_time_sk col1, * from 
> dfs.`/drill/testdata/resource-manager/md1362` order by c_email_address nulls 
> first) d where d.col1 > 2536816 and c_email_address is not null group by 
> c_email_address;
> ALTER SESSION SET `exec.sort.disable_managed` = true;
> alter session set `planner.disable_exchanges` = false;
> alter session set `planner.memory.max_query_memory_per_node` = 2147483648;
> alter session set `planner.width.max_per_node` = 17;
> alter session set `planner.width.max_per_query` = 1000;
> {noformat}
> Here is the stack trace:
> {noformat}
> 2017-08-18 13:15:27,052 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG 
> o.a.d.e.t.g.SingleBatchSorterGen27 - Took 6445 us to sort 9039 records
> 2017-08-18 13:15:27,420 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG 
> o.a.d.e.p.i.xsort.ExternalSortBatch - Copier allocator current allocation 0
> 2017-08-18 13:15:27,420 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG 
> o.a.d.e.p.i.xsort.ExternalSortBatch - mergeAndSpill: starting total size in 
> memory = 71964288
> 2017-08-18 13:15:27,421 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - User Error Occurred: One or more nodes 
> ran out of memory while executing the query.
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more 
> nodes ran out of memory while executing the query.
> Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill.
> batchGroups.size 1
> spilledBatchGroups.size 0
> allocated memory 71964288
> allocator limit 52428800
> [Error Id: 7b248f12-2b31-4013-86b6-92e6c842db48 ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:550)
>  ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.newSV2(ExternalSortBatch.java:637)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:379)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT
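
For context on the error text: an sv2 (SelectionVector2) stores one 2-byte
index per record, so the vector for 9039 records is only about 18 KB. The
allocation fails because the sort is already over its allocator limit, as
the reported numbers show (hypothetical class name):

{code}
public class Sv2Sketch {
  public static void main(String[] args) {
    long sv2Bytes  = 9039L * 2;   // ~18 KB for the selection vector itself
    long allocated = 71964288L;   // "allocated memory" in the error
    long limit     = 52428800L;   // "allocator limit" in the error
    System.out.println(allocated + sv2Bytes > limit);  // true -> cannot allocate
  }
}
{code}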

[jira] [Created] (DRILL-5744) External sort fails with OOM error

2017-08-28 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5744:
-

 Summary: External sort fails with OOM error
 Key: DRILL-5744
 URL: https://issues.apache.org/jira/browse/DRILL-5744
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.10.0
Reporter: Robert Hou
Assignee: Paul Rogers
 Fix For: 1.12.0


Query is:
{noformat}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.disable_exchanges` = true;
alter session set `planner.width.max_per_query` = 1;
alter session set `planner.memory.max_query_memory_per_node` = 152428800;
select count(*) from (
  select * from (
select s1.type type, flatten(s1.rms.rptd) rptds, s1.rms, s1.uid 
from (
  select d.type type, d.uid uid, flatten(d.map.rm) rms from 
dfs.`/drill/testdata/resource-manager/nested-large.json` d order by d.uid
) s1
  ) s2
  order by s2.rms.mapid
);
ALTER SESSION SET `exec.sort.disable_managed` = true;
alter session set `planner.width.max_per_node` = 17;
alter session set `planner.disable_exchanges` = false;
alter session set `planner.width.max_per_query` = 1000;
alter session set `planner.memory.max_query_memory_per_node` = 2147483648;
{noformat}

Stack trace is:
{noformat}
2017-08-23 06:59:42,763 [266275e5-ebdb-14ae-d52d-00fa3a154f6d:frag:0:0] INFO  
o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred: One or more nodes
 ran out of memory while executing the query. (Unable to allocate buffer of 
size 4194304 (rounded from 3276750) due to memory limit. Current allocation: 79986944)
org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more 
nodes ran out of memory while executing the query.

Unable to allocate buffer of size 4194304 (rounded from 3276750) due to memory 
limit. Current allocation: 79986944

[Error Id: 4f4959df-0921-4a50-b75e-56488469ab10 ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:550)
 ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:244)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_51]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_51]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
allocate buffer of size 4194304 (rounded from 3276750) due to memory limit. Current allocation: 79986944
at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:238) 
~[drill-memory-base-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:213) 
~[drill-memory-base-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:402) 
~[vector-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:236)
 ~[vector-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.vector.AllocationHelper.allocatePrecomputedChildCount(AllocationHelper.java:33)
 ~[vector-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.vector.AllocationHelper.allocate(AllocationHelper.java:46)
 ~[vector-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.record.VectorInitializer.allocateVector(VectorInitializer.java:113)
 ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.record.VectorInitializer.allocateVector(VectorInitializer.java:95)
 ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.record.VectorInitializer.allocateMap(VectorInitializer.java:130)
 ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.record.VectorInitializer.allocateVector(VectorInitializer.java:93)
 ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.record.VectorInitializer.allocateBatch(VectorInitializer.java:85)
 ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.PriorityQueueCopierWrapper$BatchMerger.next(PriorityQueueCopierWrapper.java:262)
 ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:374)
 ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]

[jira] [Created] (DRILL-5732) Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill.

2017-08-18 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5732:
-

 Summary: Unable to allocate sv2 for 9039 records, and not enough 
batchGroups to spill.
 Key: DRILL-5732
 URL: https://issues.apache.org/jira/browse/DRILL-5732
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Robert Hou
Assignee: Paul Rogers


git commit id:
{noformat}
| 1.12.0-SNAPSHOT  | e9065b55ea560e7f737d6fcb4948f9e945b9b14f  | DRILL-5660: 
Parquet metadata caching improvements  | 15.08.2017 @ 09:31:00 PDT  | 
r...@qa-node190.qa.lab  | 15.08.2017 @ 13:29:26 PDT  |
{noformat}

Query is:
{noformat}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.disable_exchanges` = true;
alter session set `planner.memory.max_query_memory_per_node` = 104857600;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.width.max_per_query` = 1;
select max(col1), max(cs_sold_date_sk), max(cs_sold_time_sk), 
max(cs_ship_date_sk), max(cs_bill_customer_sk), max(cs_bill_cdemo_sk), 
max(cs_bill_hdemo_sk), max(cs_bill_addr_sk), max(cs_ship_customer_sk), 
max(cs_ship_cdemo_sk), max(cs_ship_hdemo_sk), max(cs_ship_addr_sk), 
max(cs_call_center_sk), max(cs_catalog_page_sk), max(cs_ship_mode_sk), 
min(cs_warehouse_sk), max(cs_item_sk), max(cs_promo_sk), max(cs_order_number), 
max(cs_quantity), max(cs_wholesale_cost), max(cs_list_price), 
max(cs_sales_price), max(cs_ext_discount_amt), min(cs_ext_sales_price), 
max(cs_ext_wholesale_cost), min(cs_ext_list_price), min(cs_ext_tax), 
min(cs_coupon_amt), max(cs_ext_ship_cost), max(cs_net_paid), 
max(cs_net_paid_inc_tax), min(cs_net_paid_inc_ship), 
min(cs_net_paid_inc_ship_tax), min(cs_net_profit), min(c_customer_sk), 
min(length(c_customer_id)), max(c_current_cdemo_sk), max(c_current_hdemo_sk), 
min(c_current_addr_sk), min(c_first_shipto_date_sk), 
min(c_first_sales_date_sk), min(length(c_salutation)), 
min(length(c_first_name)), min(length(c_last_name)), 
min(length(c_preferred_cust_flag)), max(c_birth_day), min(c_birth_month), 
min(c_birth_year), max(c_last_review_date), c_email_address  from (select 
cs_sold_date_sk+cs_sold_time_sk col1, * from 
dfs.`/drill/testdata/resource-manager/md1362` order by c_email_address nulls 
first) d where d.col1 > 2536816 and c_email_address is not null group by 
c_email_address;
ALTER SESSION SET `exec.sort.disable_managed` = true;
alter session set `planner.disable_exchanges` = false;
alter session set `planner.memory.max_query_memory_per_node` = 2147483648;
alter session set `planner.width.max_per_node` = 17;
alter session set `planner.width.max_per_query` = 1000;
{noformat}

Here is the stack trace:
{noformat}
2017-08-18 13:15:27,052 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG 
o.a.d.e.t.g.SingleBatchSorterGen27 - Took 6445 us to sort 9039 records
2017-08-18 13:15:27,420 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG 
o.a.d.e.p.i.xsort.ExternalSortBatch - Copier allocator current allocation 0
2017-08-18 13:15:27,420 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] DEBUG 
o.a.d.e.p.i.xsort.ExternalSortBatch - mergeAndSpill: starting total size in 
memory = 71964288
2017-08-18 13:15:27,421 [2668b522-5833-8fd2-0b6d-e685197f0ae3:frag:0:0] INFO  
o.a.d.e.p.i.xsort.ExternalSortBatch - User Error Occurred: One or more nodes 
ran out of memory while executing the query.
org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more 
nodes ran out of memory while executing the query.

Unable to allocate sv2 for 9039 records, and not enough batchGroups to spill.
batchGroups.size 1
spilledBatchGroups.size 0
allocated memory 71964288
allocator limit 52428800

[Error Id: 7b248f12-2b31-4013-86b6-92e6c842db48 ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:550)
 ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.newSV2(ExternalSortBatch.java:637)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:379)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:225)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerN

Re: [ANNOUNCE] New PMC member: Arina Ielchiieva

2017-08-03 Thread Robert Hou
Congratulations!  Thanks for your contributions.


--Robert


From: Vitalii Diravka 
Sent: Thursday, August 3, 2017 3:32 AM
To: dev@drill.apache.org
Subject: Re: [ANNOUNCE] New PMC member: Arina Ielchiieva

Congratulations! Well deserved.

Kind regards
Vitalii

On Thu, Aug 3, 2017 at 2:53 AM, Arina Yelchiyeva  wrote:

> Thank all you!
>
> Kind regards
> Arina
>
> On Thu, Aug 3, 2017 at 5:58 AM, Sudheesh Katkam 
> wrote:
>
> > Congratulations and thank you, Arina.
> >
> > On Wed, Aug 2, 2017 at 1:38 PM, Paul Rogers  wrote:
> >
> > > The success of the Drill 1.11 release proves this is a well-deserved
> > move.
> > > Congratulations!
> > >
> > > - Paul
> > >
> > > > On Aug 2, 2017, at 11:23 AM, Aman Sinha 
> wrote:
> > > >
> > > > I am pleased to announce that Drill PMC invited Arina Ielchiieva to
> the
> > > PMC
> > > > and she has accepted the invitation.
> > > >
> > > > Congratulations Arina and thanks for your contributions !
> > > >
> > > > -Aman
> > > > (on behalf of Drill PMC)
> > >
> > >
> >
>


Re: [ANNOUNCE] New Committer: Laurent Goujon

2017-06-09 Thread Robert Hou
Congrats, Laurent!  Thanks for all your work on the client side.


--Robert


From: Jinfeng Ni 
Sent: Friday, June 9, 2017 1:11 PM
To: dev
Subject: Re: [ANNOUNCE] New Committer: Laurent Goujon

Congratulations, Laurent!



On Fri, Jun 9, 2017 at 10:02 AM, Julien Le Dem  wrote:

> Congrats Laurent!
>
> On Fri, Jun 9, 2017 at 9:57 AM, rahul challapalli <
> challapallira...@gmail.com> wrote:
>
> > Congratulations Laurent!
> >
> > On Fri, Jun 9, 2017 at 9:49 AM, Paul Rogers  wrote:
> >
> > > Congratulations and welcome!
> > >
> > > - Paul
> > >
> > > > On Jun 9, 2017, at 3:33 AM, Khurram Faraaz  wrote:
> > > >
> > > > Congratulations Laurent.
> > > >
> > > > 
> > > > From: Parth Chandra 
> > > > Sent: Friday, June 9, 2017 3:14:00 AM
> > > > To: dev@drill.apache.org
> > > > Subject: [ANNOUNCE] New Committer: Laurent Goujon
> > > >
> > > > The Project Management Committee (PMC) for Apache Drill has invited
> > > Laurent
> > > > Goujon to become a committer, and we are pleased to announce that he
> > has
> > > > accepted.
> > > >
> > > > Laurent has a long list of contributions many in the client side
> > > interfaces
> > > > and metadata queries.
> > > >
> > > > Welcome Laurent, and thank you for your contributions.  Keep up the
> > good
> > > > work !
> > > >
> > > > - Parth
> > > > (on behalf of the Apache Drill PMC)
> > >
> > >
> >
>
>
>
> --
> Julien
>


Re: [ANNOUNCE] New Committer: Paul Rogers

2017-05-19 Thread Robert Hou
Congrats, Paul!



From: Chunhui Shi 
Sent: Friday, May 19, 2017 9:44 AM
To: dev
Subject: Re: [ANNOUNCE] New Committer: Paul Rogers

Congrats Paul! Thank you for your contributions!


From: rahul challapalli 
Sent: Friday, May 19, 2017 9:20:52 AM
To: dev
Subject: Re: [ANNOUNCE] New Committer: Paul Rogers

Congratulations Paul. Well Deserved.

On Fri, May 19, 2017 at 8:46 AM, Gautam Parai  wrote:

> Congratulations Paul and thank you for your contributions!
>
>
> Gautam
>
> 
> From: Abhishek Girish 
> Sent: Friday, May 19, 2017 8:27:05 AM
> To: dev@drill.apache.org
> Subject: Re: [ANNOUNCE] New Committer: Paul Rogers
>
> Congrats Paul!
>
> On Fri, May 19, 2017 at 8:23 AM, Charles Givre  wrote:
>
> > Congrats Paul!!
> >
> > On Fri, May 19, 2017 at 11:22 AM, Aman Sinha 
> wrote:
> >
> > > The Project Management Committee (PMC) for Apache Drill has invited
> Paul
> > > Rogers to become a committer, and we are pleased to announce that he
> has
> > > accepted.
> > >
> > > Paul has a long list of contributions that have touched many aspects of
> > the
> > > product.
> > >
> > > Welcome Paul, and thank you for your contributions.  Keep up the good
> > work
> > > !
> > >
> > > - Aman
> > >
> > > (on behalf of the Apache Drill PMC)
> > >
> >
>


[jira] [Created] (DRILL-5374) Parquet filter pushdown does not prune partition with nulls when predicate uses float column

2017-03-21 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5374:
-

 Summary: Parquet filter pushdown does not prune partition with 
nulls when predicate uses float column
 Key: DRILL-5374
 URL: https://issues.apache.org/jira/browse/DRILL-5374
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.9.0
Reporter: Robert Hou
Assignee: Jinfeng Ni


Drill does not prune enough partitions for this query when filter pushdown is 
used with metadata caching. The float column is being compared with a double 
value.

{code}
0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from 
orders_parts_metadata where float_id < 1100.0;
{code}

To reproduce the problem, put the attached files into a directory. Then create 
the metadata:

{code}
refresh table metadata dfs.`path_to_directory`;
{code}

For example, if you put the files in 
/drill/testdata/filter/orders_parts_metadata, then run this sql command

{code}
refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`;
{code}
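
A possible workaround to experiment with while the bug stands (a sketch
only, not a verified fix): cast the literal so the predicate compares float
to float instead of float to double, e.g.

{code}
select count(*) from orders_parts_metadata where float_id < cast(1100.0 as float);
{code}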





Re: [ANNOUNCE] New Committer: Arina Ielchiieva

2017-02-24 Thread Robert Hou
Congratulations, Arina!



From: rahul challapalli 
Sent: Friday, February 24, 2017 9:48 AM
To: dev
Subject: Re: [ANNOUNCE] New Committer: Arina Ielchiieva

Congrats Arina!

On Fri, Feb 24, 2017 at 9:42 AM, Julian Hyde  wrote:

> Congratulations, and welcome!
>
> On Fri, Feb 24, 2017 at 9:17 AM, Abhishek Girish 
> wrote:
> > Congratulations Arina!
> >
> > On Fri, Feb 24, 2017 at 9:06 AM, Sudheesh Katkam 
> > wrote:
> >
> >> The Project Management Committee (PMC) for Apache Drill has invited
> Arina
> >> Ielchiieva to become a committer, and we are pleased to announce that
> she
> >> has accepted.
> >>
> >> Arina has a long list of contributions [1] that have touched many
> aspects
> >> of the product. Her work includes features such as dynamic UDF support
> and
> >> temporary tables support.
> >>
> >> Welcome Arina, and thank you for your contributions.
> >>
> >> - Sudheesh, on behalf of the Apache Drill PMC
> >>
> >> [1] https://github.com/apache/drill/commits/master?author=
> arina-ielchiieva
> >>
>


[jira] [Created] (DRILL-5136) Some SQL statements fail when using Simba ODBC driver 1.3

2016-12-19 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5136:
-

 Summary: Some SQL statements fail when using Simba ODBC driver 1.3
 Key: DRILL-5136
 URL: https://issues.apache.org/jira/browse/DRILL-5136
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - ODBC
Affects Versions: 1.9.0
Reporter: Robert Hou


"show schemas" does not work with Simba ODBC driver 

SQL>show schemas
1: SQLPrepare = [MapR][Drill] (1040) Drill failed to execute the query: show 
schemas
[30029]Query execution error. Details:[
PARSE ERROR: Encountered "( show" at line 1, column 15.
Was expecting one of:
    <IDENTIFIER> ...
    <QUOTED_IDENTIFIER> ...
    <BACK_QUOTED_IDENTIFIER> ...
    <BRACKET_QUOTED_IDENTIFIER> ...
    <UNICODE_QUOTED_IDENTIFIER> ...
"LATERAL" ...
"(" "WITH" ...
"(" "+" ...
"(" "-" ...
"("  ...
"("  ...
"(" 
