[jira] [Created] (DRILL-4787) column value is always null in inner join query

2016-07-18 Thread Zhenhua Dong (JIRA)
Zhenhua Dong created DRILL-4787:
---

 Summary: column value is always null in inner join query
 Key: DRILL-4787
 URL: https://issues.apache.org/jira/browse/DRILL-4787
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.6.0
 Environment: OS: SUSE Linux Enterprise Server 11 SP3  (x86_64)
Cluster: 2 control node + 2 payload node

Reporter: Zhenhua Dong


1. The query result is not correct
select USER_A.NAMEID, 
   USER_A.CSLOC, 
   USER_A.PSLOC 
   FROM USER_B
   inner join USER_A
   on USER_B.NAMEID=USER_A.NAMEID
   where USER_B.NAMEID=490
+-------+--------+--------+
| NAME  | CSLOC  | PSLOC  |
+-------+--------+--------+
| null  | 2      | 2      |
+-------+--------+--------+

2. Execution plan
>explain plan for select USER_A.NAMEID,
USER_A.CSLOC,
USER_A.PSLOC
FROM USER_B
inner join USER_A
on USER_B.NAMEID=USER_A.NAMEID
where USER_B.NAMEID=490;

00-00    Screen
00-01      Project(NAMEID=[$0], CSLOC=[$1], PSLOC=[$2])
00-02        Project(NAMEID=[$20], CSLOC=[$25], PSLOC=[$26])
00-03          Jdbc(sql=[SELECT *
FROM (SELECT *
FROM `mysqldb`.`USER_B`
WHERE `NAMEID` = 490) AS `t`
INNER JOIN `mysqldb`.`USER_A` ON `t`.`NAMEID` = `USER_A`.`NAMEID`])

3. Result of running the SQL from the execution plan directly
>SELECT *
 FROM (SELECT *
 FROM `USER_B`
 WHERE `NAMEID` = 490) AS `t`
 INNER JOIN `USER_A` ON `t`.`NAMEID` = `USER_A`.`NAMEID`;
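+--------+------+------+------+------+------+------+------+------+------+------+------+------+------+----------+------+----------+------+----------+------+---------+-------+--------+--------+---------+-------+-------+---------+----------+------+----------------------------------+---------+---------+---------+-------+----------+------+
| NAMEID | IMSI | TS11 | TS21 | TS22 | TS62 | BS22 | BS24 | BS25 | BS26 | BS2G | BS3G | BS2F | BS3F | ANAMEID1 | BC1  | ANAMEID2 | BC2  | ANAMEID3 | BC3  | NAMEID0 | IMSI0 | IMEISV | VLRADD | SGSNNUM | CSLOC | PSLOC | NPREFIX | SUBSTYPE | KIND | EKI                              | AKATYPE | A3A8IND | FSETIND | A4IND | AUTHINFO | RID  |
+--------+------+------+------+------+------+------+------+------+------+------+------+------+------+----------+------+----------+------+----------+------+---------+-------+--------+--------+---------+-------+-------+---------+----------+------+----------------------------------+---------+---------+---------+-------+----------+------+
| 490    | 260  | 1    | 1    | 1    | null | null | null | null | null | null | null | null | null | null     | null | null     | null | null     | null | null    | null  | null   | null   | null    | 2     | 2     | null    | null     | 325  | 12345678901234567890123456789012 | 0       | 4       | 15      | 2     | null     | null |
+--------+------+------+------+------+------+------+------+------+------+------+------+------+------+----------+------+----------+------+----------+------+---------+-------+--------+--------+---------+-------+-------+---------+----------+------+----------------------------------+---------+---------+---------+-------+----------+------+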
1 row selected (0.979 seconds)

4. Drill view descriptions
> describe USER_CSPS_CSdata;
+--------------+--------------------+--------------+
| COLUMN_NAME  | DATA_TYPE          | IS_NULLABLE  |
+--------------+--------------------+--------------+
| NAMEID       | CHARACTER VARYING  | YES          |
| NAME         | CHARACTER VARYING  | YES          |
| TS11         | TINYINT            | YES          |
| TS21         | TINYINT            | YES          |
| TS22         | TINYINT            | YES          |
| TS62         | TINYINT            | YES          |
| BS22         | TINYINT            | YES          |
| BS24         | TINYINT            | YES          |
| BS25         | TINYINT            | YES          |
| BS26         | TINYINT            | YES          |
| BS2G         | TINYINT            | YES          |
| BS3G         | TINYINT            | YES          |
| BS2F         | TINYINT            | YES          |
| BS3F         | TINYINT            | YES          |
| AMSISDN1     | CHARACTER VARYING  | YES          |
| BC1          | INTEGER            | YES          |
| AMSISDN2     | CHARACTER VARYING  | YES          |
| BC2          | INTEGER            | YES          |
| AMSISDN3     | CHARACTER VARYING  | YES          |
| BC3          | INTEGER            | YES          |
+--------------+--------------------+--------------+


> describe USER_CSPS_Subscription;
+--++--+
| COLUMN_NAME  | 

[GitHub] drill pull request #548: DRILL-4785: Avoid checking for hard affinity scans ...

2016-07-18 Thread vkorukanti
GitHub user vkorukanti opened a pull request:

https://github.com/apache/drill/pull/548

DRILL-4785: Avoid checking for hard affinity scans in limit 0 shortcut cases



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vkorukanti/drill DRILL-4785

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/548.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #548


commit c94501e3034d703fff1a56984183a1c12650340c
Author: vkorukanti 
Date:   2016-07-18T23:56:25Z

DRILL-4785: Avoid checking for hard affinity scans in limit 0 shortcut cases




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #519: DRILL-4530: Optimize partition pruning with metadat...

2016-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/519




[GitHub] drill issue #546: DRILL-4783: flatten operator should not throw exception if...

2016-07-18 Thread chunhui-shi
Github user chunhui-shi commented on the issue:

https://github.com/apache/drill/pull/546
  
@jaltekruse Hi Jason, could you help to review this change? Thanks. 




[GitHub] drill issue #509: DRILL-4618: Fix hive function loader not correctly take ra...

2016-07-18 Thread chunhui-shi
Github user chunhui-shi commented on the issue:

https://github.com/apache/drill/pull/509
  
@StevenMPhillips Hi Steven, is this fix addressing your concern now? Thanks.




[GitHub] drill issue #544: DRILL 4581 Revised

2016-07-18 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/544
  
Created new version, closing this one.




Re: Dynamic UDFs support

2016-07-18 Thread Parth Chandra
+1 on simplifying the design and postponing the items Paul has suggested.

Arina, Paul, I think we need to work out some of the design related to
registering the UDF. Are you guys open for a quick hangout at 10 a.m. PDT
tomorrow?



On Thu, Jul 14, 2016 at 1:46 PM, Paul Rogers  wrote:

> Hi All,
>
> We’ve had quite a lively debate in the “comments” section of Arina’s
> wonderful design doc. Zelaine made a great suggestion: summarize the user
> experience as a way of making sense of the wealth of detailed comments.
>
> IMHO, the most important user experience goals are:
>
> 1. When a user submits a CREATE FUNCTION command, the command returns
> quickly (within a few seconds at most.)
> 2. If the above user then issues a query using that function (to the same
> Foreman), that query is guaranteed to successfully use the new function on
> all nodes.
> 3. Other users, connecting to any Foreman will see a very clean behavior
> when submitting a query with the new function. Before some point in time
> (can be different for each Foreman), a query with the function fails in
> planning. After that point, queries are guaranteed to successfully use the
> new function on all nodes.
>
> Basically, this says that CREATE FUNCTION can’t (potentially) take a long
> time. Use of functions can’t result in random failures during the time that
> the function is propagated across Drillbits.
>
> The goals we can perhaps postpone are:
>
> 1. Class name space isolation. (Allows two data scientists to define the
> same class without collisions.)
> 2. Function name spaces. (Allows me to define “paul.foo” and you to define
> “bob.foo” without collisions.) (Needed if many people develop functions
> independently. Else, we need a global name space.)
> 3. Dynamic DROP FUNCTION operation. (The issues here are messy, and it
> requires unloading classes and name space cleanup.) (Just let the cleanup
> happen offline.)
> 4. Dependency jars (e.g. third party libraries, etc.) (We require those to
> be statically added to the class path before Drill starts.)
>
> We are not creating per-user name spaces, or allowing people to use
> production clusters to try/revise functions. We’re just sampling deployment
> of simple functions.
>
> That’s my suggestion, what do others suggest?
>
> Thanks,
>
> - Paul
>
> > On Jul 7, 2016, at 12:32 PM, Arina Yelchiyeva <
> arina.yelchiy...@gmail.com> wrote:
> >
> > I also agree on using Zookeeper. I have re-worked dynamic UDF support
> > document taking into account Zookeeper usage.
> >
> > Link to the document -
> >
> https://docs.google.com/document/d/1MluM17EKajvNP_x8U4aymcOihhUm8BMm8t_hM0jEFWk/edit
> >
> > Kind regards
> > Arina
> >
> > On Tue, Jun 28, 2016 at 12:55 AM Paul Rogers 
> wrote:
> >
> >> Great idea! We already use ZK to track storage plugins. ZK is perhaps
> >> better suited to register each jar and/or function than using files in
> DFS.
> >> Still need to work out the proper sequencing. But you are right, this is
> >> the kind of thing that ZK is supposed to solve.
> >>
> >> - Paul
> >>
> >>
> >>> On Jun 27, 2016, at 2:01 PM, Parth Chandra  wrote:
> >>>
> >>> Reading thru some of Paul's comments on maintaining a consistent state
> >> for
> >>> the registration of the UDF, it looks like we need a consensus protocol
> >> for
> >>> determining that all the Drillbits have the UDF deployed.
> >>> I believe Zookeeper can provide a stronger guarantee than a 2 phase
> >>> approach. Should we look into that?
> >>>
> >>> On Fri, Jun 24, 2016 at 10:00 AM, Arina Yelchiyeva <
> >>> arina.yelchiy...@gmail.com> wrote:
> >>>
>  Hi all!
> 
>  I have updated design document.
>  Main changes:
>  1. Add to Drill’s config the staging and registration DFS locations.
>  2. The user is no longer responsible for copying jars into drillbit nodes.
>  Now the user needs to copy jars into the staging DFS location, from where
>  drillbits will copy them to the local fs.
>  3. During UDF registration, jars will be moved to the DFS registration area.
>  4. During start up, a drillbit will copy all jars from the registration area,
>  so a newly added drillbit will have all UDFs as others.
>  5. Security issues - probably they will be added later as an enhancement.
> 
>  More details in the document:
> 
> 
> >>
> https://docs.google.com/document/d/1MluM17EKajvNP_x8U4aymcOihhUm8BMm8t_hM0jEFWk/edit
> 
>  Kind regards
>  Arina
> 
>  On Fri, Jun 17, 2016 at 1:25 AM Paul Rogers 
> >> wrote:
> 
> > Hi All,
> >
> > To answer Arina on item 3: there is actually no good location on any
>  local
> > node to put the UDFs. Reason: DoY allows the admin to start a
> Drillbit
> >> on
> > any available node. When it starts, a new, fresh copy of Drill will
> be
> > downloaded, and this can happen after the user issued the CREATE
> >> command.
> >
> 

[jira] [Resolved] (DRILL-4175) IOBE may occur in Calcite RexProgramBuilder when queries are submitted concurrently

2016-07-18 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni resolved DRILL-4175.
---
   Resolution: Fixed
Fix Version/s: 1.8.0

Fixed in 70aba772a9434e0703078bddb47008f35cffb8bf

> IOBE may occur in Calcite RexProgramBuilder when queries are submitted 
> concurrently
> ---
>
> Key: DRILL-4175
> URL: https://issues.apache.org/jira/browse/DRILL-4175
> Project: Apache Drill
>  Issue Type: Bug
> Environment: distribution
>Reporter: huntersjm
> Fix For: 1.8.0
>
>
> I ran a query like `select v from table limit 1` and got this error:
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IndexOutOfBoundsException: Index: 68, Size: 67
> After debugging, I found a bug in Calcite's parsing code.
> First, look at line 72 in org.apache.calcite.rex.RexProgramBuilder:
> {noformat}
>registerInternal(RexInputRef.of(i, fields), false);
> {noformat}
> Here we get a RexInputRef from RexInputRef.of, which has a method named 
> createName(int index); NAMES there is a SelfPopulatingList. 
> SelfPopulatingList is described as a thread-safe list, but in fact it is 
> not thread-safe. When NAMES.get(index) is called concurrently, it fails. 
> We expect the list to contain {$0 $1 $2 ... $n}, but under concurrent calls 
> it may become {$0,$1...$29,$30...$59,$30,$31...$59...}.
> Now look at the method registerInternal:
> {noformat}
> private RexLocalRef registerInternal(RexNode expr, boolean force) {
> expr = simplify(expr);
> RexLocalRef ref;
> final Pair key;
> if (expr instanceof RexLocalRef) {
>   key = null;
>   ref = (RexLocalRef) expr;
> } else {
>   key = RexUtil.makeKey(expr);
>   ref = exprMap.get(key);
> }
> if (ref == null) {
>   if (validating) {
> validate(
> expr,
> exprList.size());
>   }
> {noformat}
> Here makeKey(expr) is expected to return a distinct key for each expression, 
> but it returns the same key, so addExpr(expr) is called fewer times than 
> expected. In that method:
> {noformat}
> RexLocalRef ref;
> final int index = exprList.size();
> exprList.add(expr);
> ref =
> new RexLocalRef(
> index,
> expr.getType());
> localRefList.add(ref);
> return ref;
> {noformat}
> localRefList ends up with the wrong size, so at line 939,
> {noformat}
> final RexLocalRef ref = localRefList.get(index);
> {noformat}
> an IndexOutOfBoundsException is thrown.
> Bug fix:
> We can't change the Calcite source code before they fix this bug, so we can 
> initialize NAMES in RexLocalRef at startup. Just add 
> {noformat}
> RexInputRef.createName(2048);
> {noformat}
> on Bootstrap.
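
For readers following the report above: the failure mode it describes is an unsynchronized "check the size, then append" sequence running in several threads at once. The sketch below is not Calcite code and not the committed fix; SafeNameList and its methods are hypothetical names that only illustrate how putting the check and the append under one lock keeps the entries in order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a lazily growing name list; this is not Calcite code.
class SafeNameList {
  private final String prefix;
  private final List<String> names = new ArrayList<>();

  SafeNameList(String prefix, int initialSize) {
    this.prefix = prefix;
    for (int i = 0; i < initialSize; i++) {
      names.add(prefix + i);
    }
  }

  // The size check and the append run under one lock, so two threads can
  // never both decide to append the same range of names.
  synchronized String get(int index) {
    while (names.size() <= index) {
      names.add(prefix + names.size());
    }
    return names.get(index);
  }

  synchronized int size() {
    return names.size();
  }
}

class SafeNameListDemo {
  public static void main(String[] args) throws Exception {
    SafeNameList names = new SafeNameList("$", 30);
    ExecutorService pool = Executors.newFixedThreadPool(8);
    List<Future<?>> results = new ArrayList<>();
    for (int t = 0; t < 8; t++) {
      results.add(pool.submit(() -> {
        for (int i = 0; i < 2048; i++) {
          if (!names.get(i).equals("$" + i)) {
            throw new IllegalStateException("corrupted entry at " + i);
          }
        }
      }));
    }
    for (Future<?> result : results) {
      result.get(); // rethrows any failure from a worker thread
    }
    pool.shutdown();
    System.out.println("size = " + names.size());
  }
}
```

The workaround proposed in the report (pre-populating the list once at startup) avoids the race in a different way: if the list is already large enough, concurrent readers never reach the append path.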



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[Drill-1328] Design Review - Compute and use statistics in Drill

2016-07-18 Thread Gautam Parai
Hi all,

I have uploaded a design specification for Drill-1328 Compute and use
statistics in Drill. The goal is to provide statistics support in Drill
which will help build better plans in Drill.

This design borrows ideas from the earlier design. However, we try to
address some concerns in the earlier design.

Please review the specification and provide feedback. The link to the
specification is present in the JIRA Issue Links section.

Thanks,
Gautam


[GitHub] drill issue #540: Fix for DRILL-4759: Drill throwing array index out of boun...

2016-07-18 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/540
  
+1 for updated patch.




[GitHub] drill pull request #534: [DRILL-4743] HashJoin's not fully parallelized in q...

2016-07-18 Thread sudheeshkatkam
Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r71207876
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/options/TypeValidators.java
 ---
@@ -90,6 +91,62 @@ public void validate(OptionValue v) {
 }
   }
 
+  public static class MinRangeDoubleValidator extends RangeDoubleValidator 
{
+private final double min;
+private final double max;
+private final String maxValidatorName;
+
+public MinRangeDoubleValidator(String name, double min, double max, 
double def, String maxValidatorName) {
+  super(name, min, max, def);
+  this.min = min;
+  this.max = max;
+  this.maxValidatorName = maxValidatorName;
+}
+
+@Override
+public void validate(OptionValue v, final OptionManager manager) {
+  super.validate(v, manager);
+  if (manager != null) {
--- End diff --

Is this null check necessary?




[GitHub] drill pull request #534: [DRILL-4743] HashJoin's not fully parallelized in q...

2016-07-18 Thread sudheeshkatkam
Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/534#discussion_r71207858
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java
 ---
@@ -212,6 +219,14 @@ public static long getInitialPlanningMemorySize() {
 return INITIAL_OFF_HEAP_ALLOCATION_IN_BYTES;
   }
 
+  public double getFilterMinSelectivityEstimateFactor() {
+return 
options.getOption(FILTER_MIN_SELECTIVITY_ESTIMATE_FACTOR.getOptionName()).float_val;
--- End diff --

Make the option validators above typed:
`public static final FloatValidator FILTER_MIN_SELECTIVITY_ESTIMATE_FACTOR 
...`

and change this line to:
`return options.getOption(FILTER_MIN_SELECTIVITY_ESTIMATE_FACTOR);`

Same for the other option.
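
A rough illustration of the idea behind this suggestion, for readers who do not have the Drill sources open. Everything below is hypothetical (TypedOption, OptionStore, and the option name are invented for the example and are not Drill's classes); it only shows how a typed descriptor lets the lookup return the right type without reaching into a field such as float_val.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical typed option descriptor; the type parameter records the value type.
class TypedOption<T> {
  final String name;
  final Class<T> type;
  final T defaultValue;

  TypedOption(String name, Class<T> type, T defaultValue) {
    this.name = name;
    this.type = type;
    this.defaultValue = defaultValue;
  }
}

// Hypothetical option store: lookups by descriptor return the declared type.
class OptionStore {
  private final Map<String, Object> values = new HashMap<>();

  <T> void set(TypedOption<T> option, T value) {
    values.put(option.name, value);
  }

  <T> T get(TypedOption<T> option) {
    Object stored = values.get(option.name);
    return stored == null ? option.defaultValue : option.type.cast(stored);
  }
}

class TypedOptionDemo {
  // Illustrative option name and default, not taken from the patch.
  static final TypedOption<Double> MIN_SELECTIVITY =
      new TypedOption<>("example.filter.min_selectivity", Double.class, 0.0);

  public static void main(String[] args) {
    OptionStore options = new OptionStore();
    options.set(MIN_SELECTIVITY, 0.25);
    double factor = options.get(MIN_SELECTIVITY); // typed lookup, no field access
    System.out.println(factor);
  }
}
```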




ApacheCon: Getting the word out internally

2016-07-18 Thread Melissa Warnkin
Dear Apache Enthusiast,

As you are no doubt already aware, we will be holding ApacheCon in
Seville, Spain, the week of November 14th, 2016. The call for papers
(CFP) for this event is now open, and will remain open until
September 9th.

The event is divided into two parts, each with its own CFP. The first
part of the event, called Apache Big Data, focuses on Big Data
projects and related technologies.

Website: http://events.linuxfoundation.org/events/apache-big-data-europe
CFP:
http://events.linuxfoundation.org/events/apache-big-data-europe/program/cfp

The second part, called ApacheCon Europe, focuses on the Apache
Software Foundation as a whole, covering all projects, community
issues, governance, and so on.

Website: http://events.linuxfoundation.org/events/apachecon-europe
CFP: http://events.linuxfoundation.org/events/apachecon-europe/program/cfp

ApacheCon is the official conference of the Apache Software
Foundation, and is the best place to meet members of your project and
other ASF projects, and strengthen your project's community.

If your organization is interested in sponsoring ApacheCon, contact Rich Bowen
at e...@apache.org. ApacheCon is a great place to find the brightest
developers in the world, and experts on a huge range of technologies.

I hope to see you in Seville!
==

Melissa, on behalf of the ApacheCon Team


[GitHub] drill issue #519: DRILL-4530: Optimize partition pruning with metadata cachi...

2016-07-18 Thread jinfengni
Github user jinfengni commented on the issue:

https://github.com/apache/drill/pull/519
  
+1

LGTM.





[GitHub] drill issue #540: Fix for DRILL-4759: Drill throwing array index out of boun...

2016-07-18 Thread ppadma
Github user ppadma commented on the issue:

https://github.com/apache/drill/pull/540
  
Updated the fix. Please review. In DictionaryBigIntReader, we need to call the 
parent readField if dictionary encoding is not enabled for the page.




[jira] [Created] (DRILL-4786) Improve metadata cache performance for queries with multiple partitions

2016-07-18 Thread Aman Sinha (JIRA)
Aman Sinha created DRILL-4786:
-

 Summary: Improve metadata cache performance for queries with 
multiple partitions
 Key: DRILL-4786
 URL: https://issues.apache.org/jira/browse/DRILL-4786
 Project: Apache Drill
  Issue Type: Improvement
  Components: Metadata, Query Planning & Optimization
Affects Versions: 1.7.0
Reporter: Aman Sinha
Assignee: Aman Sinha


Consider  queries of the following type run against Parquet data with metadata 
caching:   

{noformat}
SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 IN ('1', '2', '3')
{noformat}

For such queries, Drill will read the metadata cache file from the top level 
directory 'A', which is not very efficient since we are only interested in the 
files  from some subdirectories of 'B'.   DRILL-4530 improves the performance 
of such queries when the leaf level directory is a single partition.  Here, 
there are 3 subpartitions due to the IN list.   We can build upon the 
DRILL-4530 enhancement by at least reading the cache file from the immediate 
parent level  `/A/B`  instead of the top level.  

The goal of this JIRA is to improve performance for such types of queries.  
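
As a rough sketch of the directory selection described above (not Drill's implementation; CacheRootSketch and commonParent are hypothetical names), the cache file to read could be taken from the deepest directory that contains every partition selected by the filter. With a single selected partition this returns that partition's own directory; with the IN list above it returns `/A/B`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Drill: picks the directory whose metadata
// cache file would cover all partitions selected by the filter.
class CacheRootSketch {
  // Returns the deepest directory that contains every selected partition path.
  static String commonParent(List<String> dirs) {
    if (dirs.isEmpty()) {
      throw new IllegalArgumentException("no directories selected");
    }
    String[] common = dirs.get(0).split("/");
    int len = common.length;
    for (String dir : dirs.subList(1, dirs.size())) {
      String[] parts = dir.split("/");
      len = Math.min(len, parts.length);
      int i = 0;
      while (i < len && common[i].equals(parts[i])) {
        i++;
      }
      len = i; // keep only the leading components shared so far
    }
    String joined = String.join("/", Arrays.copyOf(common, len));
    return joined.isEmpty() ? "/" : joined;
  }

  public static void main(String[] args) {
    // dir0 = 'B' AND dir1 IN ('1', '2', '3') under table directory /A
    List<String> selected = Arrays.asList("/A/B/1", "/A/B/2", "/A/B/3");
    System.out.println(commonParent(selected)); // prints /A/B
  }
}
```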





[GitHub] drill pull request #546: DRILL-4783: flatten operator should not throw excep...

2016-07-18 Thread chunhui-shi
GitHub user chunhui-shi opened a pull request:

https://github.com/apache/drill/pull/546

DRILL-4783: flatten operator should not throw exception if there is empty 
resultset

returned for underlying operator which is convert_from or other complex 
functions.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chunhui-shi/drill DRILL-4783-flatten

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/546.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #546


commit 88dc26dfeb7f47ded0bfdc4c7f89e6bc55520344
Author: chunhui-shi 
Date:   2016-07-16T07:18:33Z

DRILL-4783: flatten operator handle empty resultset(and schema unknown due 
to this)






[jira] [Created] (DRILL-4785) Limit 0 queries regressed in Drill 1.7.0

2016-07-18 Thread Dechang Gu (JIRA)
Dechang Gu created DRILL-4785:
-

 Summary: Limit 0 queries regressed in Drill 1.7.0 
 Key: DRILL-4785
 URL: https://issues.apache.org/jira/browse/DRILL-4785
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.7.0
 Environment: Redhat EL6
Reporter: Dechang Gu
Assignee: Venki Korukanti
 Fix For: 1.8.0


We noticed that a number of limit 0 queries regressed quite a bit (+2500 ms); the 
same queries took ~400 ms in Apache Drill 1.6.0, so this is a 5-6X regression. 
Further investigation indicates that the most likely root cause of the regression 
is the commit: 
vkorukanti committed with vkorukanti DRILL-4446: Support mandatory work 
assignment to endpoint requirement…
commit id:  10afc708600ea9f4cb0e7c2cd981b5b1001fea0d

With a Drill build at this commit, the query takes 3095 ms,
and drillbit.log shows:
2016-07-15 17:27:55,048 ucs-node2.perf.lab 
[28768074-4ed6-a70a-2e6a-add3201ab801:foreman] INFO  
o.a.drill.exec.work.foreman.Foreman - Query text for query id 
28768074-4ed6-a70a-2e6a-add3201ab801: SELECT * FROM (SELECT CAST(EXTRACT(MONTH 
FROM CAST(`rfm_sales`.`business_date` AS DATE)) AS INTEGER) AS 
`mn_business_date_ok`,AVG((CASE WHEN ((CAST(EXTRACT(YEAR FROM 
CAST(`rfm_sales`.`business_date` AS DATE)) AS INTEGER) = 2014) AND 
(CAST((EXTRACT(MONTH FROM CAST(`rfm_sales`.`business_date` AS DATE)) - 1) / 3 + 
1 AS INTEGER) <= 4)) THEN `rfm_sales`.`pos_netsales` ELSE NULL END)) AS 
`avg_Calculation_CIDBACJBCCCBHDGB_ok`,SUM((CASE WHEN ((CAST(EXTRACT(YEAR FROM 
CAST(`rfm_sales`.`business_date` AS DATE)) AS INTEGER) = 2014) AND 
(CAST((EXTRACT(MONTH FROM CAST(`rfm_sales`.`business_date` AS DATE)) - 1) / 3 + 
1 AS INTEGER) <= 4)) THEN `rfm_sales`.`pos_netsales` ELSE NULL END)) AS 
`sum_Calculation_CIDBACJBCCCBHDGB_ok`,SUM((CASE WHEN ((CAST(EXTRACT(YEAR FROM 
CAST(`rfm_sales`.`business_date` AS DATE)) AS INTEGER) = 2014) AND 
(CAST((EXTRACT(MONTH FROM CAST(`rfm_sales`.`business_date` AS DATE)) - 1) / 3 + 
1 AS INTEGER) <= 4)) THEN 1 ELSE NULL END)) AS 
`sum_Calculation_CJEBBAEBBFADBDFJ_ok`,SUM((CASE WHEN ((CAST(EXTRACT(YEAR FROM 
CAST(`rfm_sales`.`business_date` AS DATE)) AS INTEGER) = 2014) AND 
(CAST((EXTRACT(MONTH FROM CAST(`rfm_sales`.`business_date` AS DATE)) - 1) / 3 + 
1 AS INTEGER) <= 4)) THEN (`rfm_sales`.`pos_comps` + `rfm_sales`.`pos_promos`) 
ELSE NULL END)) AS `sum_Net_Sales__YTD___copy__ok` FROM 
`dfs.xxx`.`views/rfm_sales` `rfm_sales` GROUP BY CAST(EXTRACT(MONTH FROM 
CAST(`rfm_sales`.`business_date` AS DATE)) AS INTEGER)) T LIMIT 0
2016-07-15 17:27:55,664 ucs-node2.perf.lab 
[28768074-4ed6-a70a-2e6a-add3201ab801:foreman] INFO  
o.a.d.exec.store.parquet.Metadata - Took 208 ms to read metadata from cache file
2016-07-15 17:27:56,783 ucs-node2.perf.lab 
[28768074-4ed6-a70a-2e6a-add3201ab801:foreman] INFO  
o.a.d.exec.store.parquet.Metadata - Took 129 ms to read metadata from cache file
2016-07-15 17:27:57,960 ucs-node2.perf.lab 
[28768074-4ed6-a70a-2e6a-add3201ab801:frag:0:0] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 28768074-4ed6-a70a-2e6a-add3201ab801:0:0: 
State change requested AWAITING_ALLOCATION --> RUNNING
2016-07-15 17:27:57,961 ucs-node2.perf.lab 
[28768074-4ed6-a70a-2e6a-add3201ab801:frag:0:0] INFO  
o.a.d.e.w.f.FragmentStatusReporter - 28768074-4ed6-a70a-2e6a-add3201ab801:0:0: 
State to report: RUNNING
2016-07-15 17:27:57,989 ucs-node2.perf.lab 
[28768074-4ed6-a70a-2e6a-add3201ab801:frag:0:0] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 28768074-4ed6-a70a-2e6a-add3201ab801:0:0: 
State change requested RUNNING --> FINISHED
2016-07-15 17:27:57,989 ucs-node2.perf.lab 
[28768074-4ed6-a70a-2e6a-add3201ab801:frag:0:0] INFO  
o.a.d.e.w.f.FragmentStatusReporter - 28768074-4ed6-a70a-2e6a-add3201ab801:0:0: 
State to report: FINISHED


When the same query is run on the parent commit (commit id 
9f4fff800d128878094ae70b454201f79976135d), it takes only 492 ms,
and drillbit.log shows:
2016-07-15 17:19:27,309 ucs-node7.perf.lab 
[2876826f-ee19-9466-0c0c-869f47c409f8:foreman] INFO  
o.a.drill.exec.work.foreman.Foreman - Query text for query id 
2876826f-ee19-9466-0c0c-869f47c409f8: SELECT * FROM (SELECT CAST(EXTRACT(MONTH 
FROM CAST(`rfm_sales`.`business_date` AS DATE)) AS INTEGER) AS 
`mn_business_date_ok`,AVG((CASE WHEN ((CAST(EXTRACT(YEAR FROM 
CAST(`rfm_sales`.`business_date` AS DATE)) AS INTEGER) = 2014) AND 
(CAST((EXTRACT(MONTH FROM CAST(`rfm_sales`.`business_date` AS DATE)) - 1) / 3 + 
1 AS INTEGER) <= 4)) THEN `rfm_sales`.`pos_netsales` ELSE NULL END)) AS 
`avg_Calculation_CIDBACJBCCCBHDGB_ok`,SUM((CASE WHEN ((CAST(EXTRACT(YEAR FROM 
CAST(`rfm_sales`.`business_date` AS DATE)) AS INTEGER) = 2014) AND 
(CAST((EXTRACT(MONTH FROM CAST(`rfm_sales`.`business_date` AS DATE)) - 1) / 3 + 
1 AS INTEGER) <= 4)) THEN `rfm_sales`.`pos_netsales` ELSE NULL END)) AS 
`sum_Calculation_CIDBACJBCCCBHDGB_ok`,SUM((CASE WHEN 

[GitHub] drill pull request #545: DRILL-4746: Verification Failures (Decimal values) ...

2016-07-18 Thread arina-ielchiieva
GitHub user arina-ielchiieva opened a pull request:

https://github.com/apache/drill/pull/545

DRILL-4746: Verification Failures (Decimal values) in drill's regression tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arina-ielchiieva/drill DRILL-4746

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/545.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #545


commit e5443cb102e32bbe7dcd2fb5cc592038e7875974
Author: Arina Ielchiieva 
Date:   2016-07-13T10:44:27Z

DRILL-4746: Verification Failures (Decimal values) in drill's regression 
tests



