Re: [ANNOUNCE] New PMC Chair of Apache Drill

2019-08-27 Thread Chunhui Shi
Congrats Charles! And thanks Arina for your contributions!

Chunhui

On Mon, Aug 26, 2019 at 10:43 AM weijie tong 
wrote:

> Congratulations Charles.
>
> On Sat, Aug 24, 2019 at 11:33 AM Robert Hou  wrote:
>
> > Congratulations Charles, and thanks for your contributions to Drill!
> >
> > Thank you Arina for all you have done as PMC Chair this past year.
> >
> > --Robert
> >
> > On Fri, Aug 23, 2019 at 4:16 PM Khurram Faraaz 
> > wrote:
> >
> > > Congratulations Charles, and thank you Arina.
> > >
> > > Regards,
> > > Khurram
> > >
> > > On Fri, Aug 23, 2019 at 2:54 PM Niels Basjes  wrote:
> > >
> > > > Congratulations Charles.
> > > >
> > > > Niels Basjes
> > > >
> > > > On Thu, Aug 22, 2019, 09:28 Arina Ielchiieva 
> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > It has been an honor to serve as Drill Chair during the past year,
> > > > > but it's high time for the new one...
> > > > >
> > > > > I am very pleased to announce that the Drill PMC has voted to elect
> > > > > Charles Givre as the new PMC chair of Apache Drill. He has also been
> > > > > approved unanimously by the Apache Board in the last board meeting.
> > > > >
> > > > > Congratulations, Charles!
> > > > >
> > > > > Kind regards,
> > > > > Arina
> > > > >
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Committer: Karthikeyan Manivannan

2018-12-09 Thread Chunhui Shi
Congrats Karthik! Why Larry Tesler?
--
From:Robert Hou 
Send Time:2018 Dec 7 (Fri) 22:16
To:dev@drill.apache.org 
Subject:Re: [ANNOUNCE] New Committer: Karthikeyan Manivannan

Congratulations, Karthik!  Thanks for all your contributions.

--Robert

On Fri, Dec 7, 2018 at 11:15 PM weijie tong  wrote:

> Congratulations Karthik !
>
> On Sat, Dec 8, 2018 at 12:10 PM Karthikeyan Manivannan <
> kmanivan...@mapr.com>
> wrote:
>
> > Thanks! In addition to all you wonderful Drillers, I would also like to
> > thank Google, StackOverflow and Larry Tesler
> > <https://www.indiatoday.in/education-today/gk-current-affairs/story/copy-paste-inventor-337401-2016-08-26>.
> >
> > On Fri, Dec 7, 2018 at 3:59 PM Padma Penumarthy <
> > penumarthy.pa...@gmail.com>
> > wrote:
> >
> > > Congrats Karthik.
> > >
> > > Thanks
> > > Padma
> > >
> > >
> > > On Fri, Dec 7, 2018 at 1:33 PM Paul Rogers 
> > > wrote:
> > >
> > > > Congrats Karthik!
> > > >
> > > > - Paul
> > > >
> > > > Sent from my iPhone
> > > >
> > > > > On Dec 7, 2018, at 11:12 AM, Abhishek Girish 
> > > wrote:
> > > > >
> > > > > Congratulations Karthik!
> > > > >
> > > > >> On Fri, Dec 7, 2018 at 11:11 AM Arina Ielchiieva <
> ar...@apache.org>
> > > > wrote:
> > > > >>
> > > > >> The Project Management Committee (PMC) for Apache Drill has invited
> > > > >> Karthikeyan Manivannan to become a committer, and we are pleased to
> > > > >> announce that he has accepted.
> > > > >>
> > > > >> Karthik started contributing to the Drill project in 2016. He has
> > > > >> implemented changes in various Drill areas, including batch sizing,
> > > > >> security, code generation, and the C++ client. One of his latest
> > > > >> improvements is ACL support for Drill ZK nodes.
> > > > >>
> > > > >> Welcome Karthik, and thank you for your contributions!
> > > > >>
> > > > >> - Arina
> > > > >> (on behalf of Drill PMC)
> > > > >>
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Committer: Hanumath Rao Maduri

2018-11-01 Thread Chunhui Shi
Congratulations Hanu!
--
From:Arina Ielchiieva 
Send Time:2018 Nov 1 (Thu) 06:05
To:dev ; user 
Subject:[ANNOUNCE] New Committer: Hanumath Rao Maduri

The Project Management Committee (PMC) for Apache Drill has invited Hanumath
Rao Maduri to become a committer, and we are pleased to announce that he
has accepted.

Hanumath became a contributor in 2017, making changes mostly on the Drill
planning side, including lateral / unnest support. He is also one of the
contributors of index based planning and execution support.

Welcome Hanumath, and thank you for your contributions!

- Arina
(on behalf of Drill PMC)


Re: [ANNOUNCE] New Committer: Gautam Parai

2018-10-23 Thread Chunhui Shi
Congrats Gautam!
--
From:Gautam Parai 
Send Time:2018 Oct 22 (Mon) 18:12
To:dev 
Subject:Re: [ANNOUNCE] New Committer: Gautam Parai

Thank you so much all! I hope to continue contributing to the Drill
community even more :)

Gautam

On Mon, Oct 22, 2018 at 5:27 PM weijie tong  wrote:

> Congratulations Gautam !
>
> On Tue, Oct 23, 2018 at 6:28 AM Aman Sinha  wrote:
>
> > Congratulations Gautam !
> >
> > On Mon, Oct 22, 2018 at 3:00 PM Jyothsna Reddy 
> > wrote:
> >
> > > Congrats Gautam!!
> > >
> > >
> > >
> > > On Mon, Oct 22, 2018 at 2:01 PM Vitalii Diravka 
> > > wrote:
> > >
> > > > Congratulations!
> > > >
> > > > On Mon, Oct 22, 2018 at 10:54 PM Khurram Faraaz 
> > > wrote:
> > > >
> > > > > Congrats Gautam!
> > > > >
> > > > > On Mon, Oct 22, 2018 at 10:29 AM Abhishek Girish <
> agir...@apache.org
> > >
> > > > > wrote:
> > > > >
> > > > > > Congrats Gautam!
> > > > > >
> > > > > > On Mon, Oct 22, 2018 at 10:19 AM Karthikeyan Manivannan <
> > > > > > kmanivan...@mapr.com> wrote:
> > > > > >
> > > > > > > Congrats !
> > > > > > >
> > > > > > > On Mon, Oct 22, 2018 at 10:07 AM Kunal Khatua <
> ku...@apache.org>
> > > > > wrote:
> > > > > > >
> > > > > > > > Congratulations, Gautam!
> > > > > > > > On 10/22/2018 10:02:46 AM, Paul Rogers
> >  > > >
> > > > > > > wrote:
> > > > > > > > Congrats Guatam!
> > > > > > > >
> > > > > > > > - Paul
> > > > > > > >
> > > > > > > > Sent from my iPhone
> > > > > > > >
> > > > > > > > > On Oct 22, 2018, at 8:46 AM, salim achouche wrote:
> > > > > > > > >
> > > > > > > > > Congrats Gautam!
> > > > > > > > >
> > > > > > > > >> On Mon, Oct 22, 2018 at 7:25 AM Arina Ielchiieva wrote:
> > > > > > > > >>
> > > > > > > > >> The Project Management Committee (PMC) for Apache Drill has
> > > > > > > > >> invited Gautam Parai to become a committer, and we are
> > > > > > > > >> pleased to announce that he has accepted.
> > > > > > > > >>
> > > > > > > > >> Gautam became a contributor in 2016, making changes in
> > > > > > > > >> various Drill areas, including the planning side. He is also
> > > > > > > > >> one of the contributors of the upcoming feature to support
> > > > > > > > >> index based planning and execution.
> > > > > > > > >>
> > > > > > > > >> Welcome Gautam, and thank you for your contributions!
> > > > > > > > >>
> > > > > > > > >> - Arina
> > > > > > > > >> (on behalf of Drill PMC)
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Regards,
> > > > > > > > > Salim
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [HANGOUT] [new link] Topics for October 02 2018

2018-10-13 Thread Chunhui Shi
Hi Aman, are you going to send out the slides in another email?

Regards,
Chunhui
--
From:Aman Sinha 
Send Time:2018 Oct 12 (Fri) 10:59
To:user ; dev 
Subject:Re: [HANGOUT] [new link] Topics for October 02 2018

Attached is a PDF version of the slides.  Unfortunately, I don't have a 
recording. 

thanks,
Aman


On Thu, Oct 11, 2018 at 9:39 AM Pritesh Maker  wrote:
Divya -  anyone is welcome to join the hangout! Aman will be sharing the
 slides shortly. We use Google Hangouts which doesn't have the option to
 record the session.

 On Thu, Oct 11, 2018 at 1:06 AM Divya Gehlot 
 wrote:

 > Can we have a recording of the talk for the benefit of the other Drill
 > users in the community, or is it a closed affair?
 >
 >
 > Thanks,
 > Divya
 >
 > On Sat, 29 Sep 2018 at 05:13, Karthikeyan Manivannan  >
 > wrote:
 >
 > > Hi,
 > >
 > > We will have a Drill Hangout on October 2 2018 at 10 AM Pacific Time.
 > > Please suggest topics by replying to this thread.
 > >
 > > We now have a ==new Hangout link== that supports 25 participants
 > >
 > http://meet.google.com/yki-iqdf-tai
 > >
 > > Please note that in this Hangout, non-MapR participants will have to wait
 > > to be let into the call by a MapR participant. Sorry for the
 > inconvenience.
 > >
 > > Thanks.
 > >
 > > Karthik
 > >
 >


Re: [ANNOUNCE] New Committer: Chunhui Shi

2018-09-28 Thread Chunhui Shi
Thank you Arina, the PMC, and all my driller friends! I deeply appreciate the
opportunity to be part of this growing global community of awesome developers.

Best regards,
Chunhui 


--
From:Arina Ielchiieva 
Send Time:2018 Sep 28 (Fri) 02:17
To:dev ; user 
Subject:[ANNOUNCE] New Committer: Chunhui Shi

The Project Management Committee (PMC) for Apache Drill has invited Chunhui
Shi to become a committer, and we are pleased to announce that he has
accepted.

Chunhui Shi became a contributor in 2016, making changes in various
Drill areas. He has shown profound knowledge of the Drill planning side during
his work to support lateral join. He is also one of the contributors of the
upcoming feature to support index based planning and execution.

Welcome Chunhui, and thank you for your contributions!

- Arina
(on behalf of Drill PMC)



Re: Publish Drill Calcite project artifacts to Apache maven repository

2018-09-12 Thread Chunhui Shi
For CALCITE-1178 and the other loose type-checking issues: if the concern is
that they are not compliant with the SQL standard, could this be a SQL flavor
defined in one of the conformance levels? Then Calcite users (e.g. Drill) could
choose a customized conformance to enable the implicit type conversion.
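
For illustration, a minimal sketch (assuming a Calcite-1.15-era API; this is
not Drill's actual code) of how a Calcite user could plug a chosen conformance
into the parser configuration:

import org.apache.calcite.sql.parser.SqlParser;
import org.apache.calcite.sql.validate.SqlConformanceEnum;

public class ConformanceExample {
  public static void main(String[] args) throws Exception {
    // Pick the conformance the parser should enforce; a custom
    // SqlConformance implementation could be passed instead of the enum.
    SqlParser.Config config = SqlParser.configBuilder()
        .setConformance(SqlConformanceEnum.LENIENT)
        .build();
    SqlParser parser = SqlParser.create("SELECT 1", config);
    System.out.println(parser.parseQuery());
  }
}

The point is that conformance is already a per-user knob, so a new flavor
could carry the looser type-checking behavior without changing the default.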
--
Sender:Julian Hyde 
Sent at:2018 Sep 12 (Wed) 17:04
To:dev 
Cc:dev 
Subject:Re: Publish Drill Calcite project artifacts to Apache maven repository

Probably down to me. Although, in my defense, it is hard to be the gatekeeper
for big messy changes that are of obvious benefit to the contributor but not of
such obvious benefit to the rest of the project.

Is there a PR for CALCITE-1178?

Julian


> On Sep 12, 2018, at 10:32 AM, Vova Vysotskyi  wrote:
> 
> Thanks for your responses and clarifications!
> 
> Regarding the reasons for using the fork:
> We would love to move to the Apache Calcite instead of using the fork!
> 
> And we tried very hard to do it, especially during the rebase from 1.4 to
> 1.15 (DRILL-3993 ).
> But unfortunately, there left three Jiras, which weren't accepted by the
> Calcite community yet:
> CALCITE-2087 ,
> CALCITE-2018  and
> CALCITE-1178 .
> 
> Kind regards,
> Volodymyr Vysotskyi
> 
> 
> On Wed, Sep 12, 2018 at 7:39 PM Julian Hyde  wrote:
> 
>> I can confirm what Josh says about OSSRH. You need to fill out a form with
>> Sonatype that convinces them that you own the groupId (basically a domain
>> name). Then they give you authorization to publish artifacts under that
>> groupId. For example, I publish artifacts under the sqlline and
>> net.hydromatic groupIds.
>> 
>>> On Sep 12, 2018, at 9:28 AM, Josh Elser  wrote:
>>> 
>>> Maven central is made up of a number of "Trusted" Maven repositories.
>> This includes the ASF and OSSRH Maven repositories. Many other
>> organizations run "mirrors" of central.
>>> 
>>> The ASF Maven repo is published to by ASF projects who have gone through
>> the ASF release process. OSSRH allows any release which meets the criteria
>> described here[1]. As an individual, you are within your rights to publish
>> your fork of Calcite to OSSRH as long as there are no legal or trademark
>> concerns. It would be imperative to not cause confusion with official
>> Apache Calcite releases -- clear branding and separate Maven
>> groupId/artifactId "coordinates" should be sufficient.
>>> 
>>> However, since you are (presumably) acting as a member of Apache Drill,
>> it would be very odd (and potentially against ASF policy) to make a release
>> of software that *isn't* using the ASF Maven resources. This gives me some
>> pause -- do you have an ASF member on your PMC you can run this by?
>>> 
>>> Finally, as a Calcite PMC member, I feel obligated to ask why Drill
>> needs to maintain this fork, and see if there is something that can be done
>> from the Calcite side to get you "back on upstream"? Why the need to make
>> long-term plans to isolate Apache Drill from Apache Calcite?
>>> 
>>> [1] https://central.sonatype.org/pages/ossrh-guide.html
>>> 
>>> On 9/12/18 11:33 AM, Vova Vysotskyi wrote:
 Hi all,
 As you know, Drill uses its fork of Apache Calcite.
 In DRILL-6711 it was proposed to deploy Drill Calcite project artifacts
 to the Apache Maven repository, or at least to the central Maven repository.
 I have looked for similar cases of fork versions and didn't find
 anything similar in the central repo.
 Also, I have looked at the Sonatype OSSRH Jiras for similar cases
 of deploying fork versions, but those projects used custom groupIds.
 Could someone please advise on the acceptable way
 of publishing the custom Drill Calcite artifacts to the central repo, and
 whether it is possible to publish them without changing the groupId?
 Kind regards,
 Volodymyr Vysotskyi
>> 
>> 

Re: [IDEAS] Drill start up quotes

2018-09-12 Thread Chunhui Shi
Some more quotes:

We drill to know we're not alone
Good friends, good books, and a drill cluster: this is the ideal life
Outside of a dog, Drill is man's best friend


--
Sender:Arina Yelchiyeva 
Sent at:2018 Sep 11 (Tue) 10:27
To:user 
Cc:dev 
Subject:Re: [IDEAS] Drill start up quotes

Some quotes ideas:

drill never goes out of style
everything is easier with drill

Kunal,
regarding config, sounds reasonable, I'll do that.

Kind regards,
Arina


On Tue, Sep 11, 2018 at 12:17 AM Benedikt Koehler 
wrote:

> You told me to drill sergeant! (Forrest Gump)
>
> Benedikt
> @furukama
>
>
> Kunal Khatua  schrieb am Mo. 10. Sep. 2018 um 21:01:
>
> > +1 on the suggestion.
> >
> > I would also suggest that we change the backend implementation of the
> > quotes to refer to a properties file (within the classpath) rather than
> > have it hard coded within the SqlLine package.  This will ensure that new
> > quotes can be added with every release without the need to touch the
> > SqlLine fork for Drill.
> >
> > ~ Kunal
> > On 9/10/2018 7:06:59 AM, Arina Ielchiieva  wrote:
> > Hi all,
> >
> > we are close to the SqlLine 1.5.0 upgrade, which now has a mechanism to
> > preserve Drill customizations. This one does not include multiline support,
> > but the next release might.
> > You all know that one of the Drill customizations is the quotes at startup.
> > I was thinking we might want to freshen up the list a little bit.
> >
> > Here is the current list:
> >
> > start your sql engine
> > this isn't your grandfather's sql
> > a little sql for your nosql
> > json ain't no thang
> > drill baby drill
> > just drill it
> > say hello to my little drill
> > whatever the mind of man can conceive and believe, drill can query
> > the only truly happy people are children, the creative minority and drill
> > users
> > a drill is a terrible thing to waste
> > got drill?
> > a drill in the hand is better than two in the bush
> >
> > If anybody has new serious / funny / philosophical / creative quote
> > ideas, please share and we can consider adding them to the existing list.
> >
> > Kind regards,
> > Arina
> >
> --
>
> --
> Dr. Benedikt Köhler
> Kreuzweg 4 • 82131 Stockdorf
> Mobil: +49 170 333 0161 • Telefon: +49 89 857 45 84
> Mail: bened...@eigenarbeit.org
>


[GitHub] drill pull request #1224: DRILL-6321: Customize Drill's conformance. Allow s...

2018-04-30 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1224#discussion_r185159367
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillConformance.java
 ---
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import org.apache.calcite.sql.validate.SqlConformanceEnum;
+import org.apache.calcite.sql.validate.SqlDelegatingConformance;
+
+/**
+ * Drill's SQL conformance is SqlConformanceEnum.DEFAULT except for the method
+ * isApplyAllowed(), since Drill is going to allow OUTER APPLY and CROSS APPLY
+ * so that each row from the left child of a Join can join with the output of
+ * the right side (a sub-query or table function invoked for each row).
+ * Refer to DRILL-5999 for more information.
+ */
+public class DrillConformance extends SqlDelegatingConformance {
--- End diff --

I think DrillConformance should be an independent class, since we rely on
this class to define the conformance used in Drill, and we might add more in
the future to enable some other syntax. Let me know if this answers your
suggestion.
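
For reference, a minimal sketch of what the independent class could look like,
assuming the delegate and the single override suggested by the javadoc in the
diff above:

package org.apache.drill.exec.planner.sql;

import org.apache.calcite.sql.validate.SqlConformanceEnum;
import org.apache.calcite.sql.validate.SqlDelegatingConformance;

public class DrillConformance extends SqlDelegatingConformance {

  public DrillConformance() {
    // Delegate everything to SqlConformanceEnum.DEFAULT ...
    super(SqlConformanceEnum.DEFAULT);
  }

  // ... except APPLY, which Drill enables for lateral-join support (DRILL-5999).
  @Override
  public boolean isApplyAllowed() {
    return true;
  }
}

Keeping it a named class (rather than an anonymous SqlDelegatingConformance
instance) leaves room to override more methods later, which is the point above.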


---


[GitHub] drill pull request #1224: DRILL-6321: Customize Drill's conformance. Allow s...

2018-04-19 Thread chunhui-shi
GitHub user chunhui-shi opened a pull request:

https://github.com/apache/drill/pull/1224

DRILL-6321: Customize Drill's conformance. Allow support to APPLY key…

…words

Conflicts:

exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SqlConverter.java

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chunhui-shi/drill work1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1224.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1224


commit bc22d2a1ff6d687c228aaafdf2010b8c379f2577
Author: chunhui-shi <cshi@...>
Date:   2018-02-02T18:03:38Z

DRILL-6321: Customize Drill's conformance. Allow support to APPLY keywords

Conflicts:

exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SqlConverter.java




---


bi-weekly Hangout at April 17th 10:00am PST

2018-04-16 Thread Chunhui Shi
We will have our routine hangout tomorrow.


Please raise any topic you want to discuss before the meeting or at the 
beginning of the meeting.


https://hangouts.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc


Best,

Chunhui


[GitHub] drill issue #1198: DRILL-6294: Changes to support Calcite 1.16.0

2018-04-12 Thread chunhui-shi
Github user chunhui-shi commented on the issue:

https://github.com/apache/drill/pull/1198
  
+1. Thank you for making the fix.


---


[GitHub] drill pull request #1198: DRILL-6294: Changes to support Calcite 1.16.0

2018-03-30 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1198#discussion_r178395314
  
--- Diff: 
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
@@ -1142,16 +1142,8 @@ public void moveToCurrentRow() throws SQLException {
   }
 
   @Override
-  public AvaticaStatement getStatement() {
-try {
-  throwIfClosed();
-} catch (AlreadyClosedSqlException e) {
-  // Can't throw any SQLException because AvaticaConnection's
-  // getStatement() is missing "throws SQLException".
-  throw new RuntimeException(e.getMessage(), e);
-} catch (SQLException e) {
-  throw new RuntimeException(e.getMessage(), e);
-}
+  public AvaticaStatement getStatement() throws SQLException {
+throwIfClosed();
--- End diff --

Since you are touching this file, you might want to remove the unneeded
exceptions for the throwIfClosed() method that are derivatives of SQLException.


---


[GitHub] drill pull request #1198: DRILL-6294: Changes to support Calcite 1.16.0

2018-03-30 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1198#discussion_r178364263
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillReduceAggregatesRule.java
 ---
@@ -218,7 +218,8 @@ private void reduceAggs(
 RelOptUtil.createProject(
 newAggRel,
 projList,
-oldAggRel.getRowType().getFieldNames());
+oldAggRel.getRowType().getFieldNames(),
+DrillRelFactories.LOGICAL_BUILDER);
--- End diff --

Could you explain why we are using DrillRelFactories.LOGICAL_BUILDER but
not the relBuilderFactory that was used on line 211? And could you point me to
this 4-param createProject method with a factory as the last param?


---


[GitHub] drill issue #1152: DRILL-6199: Add support for filter push down and partitio...

2018-03-19 Thread chunhui-shi
Github user chunhui-shi commented on the issue:

https://github.com/apache/drill/pull/1152
  
+1, good to me.


---


[jira] [Created] (DRILL-6193) Latest Calcite optimized out join condition and cause "This query cannot be planned possibly due to either a cartesian join or an inequality join"

2018-02-27 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-6193:
--

 Summary: Latest Calcite optimized out join condition and cause 
"This query cannot be planned possibly due to either a cartesian join or an 
inequality join"
 Key: DRILL-6193
 URL: https://issues.apache.org/jira/browse/DRILL-6193
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning  Optimization
Affects Versions: 1.13.0
Reporter: Chunhui Shi
Assignee: Hanumath Rao Maduri
 Fix For: 1.13.0


I got the same error on the Apache master's MapR profile, both at the tip
(before the Hive upgrade) and on changeset 9e944c97ee6f6c0d1705f09d531af35deed2e310,
the last commit of the Calcite upgrade. The failing query was reported in a
functional test, but here it runs on a Parquet file:
 
{quote}SELECT L.L_QUANTITY, L.L_DISCOUNT, L.L_EXTENDEDPRICE, L.L_TAX
 
FROM cp.`tpch/lineitem.parquet` L, cp.`tpch/orders.parquet` O
WHERE cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int) AND 
cast(L.L_LINENUMBER as int) = 7 AND cast(L.L_ORDERKEY as int) = 10208 AND 
cast(O.O_ORDERKEY as int) = 10208;
 {quote}
However, with Drill built on commit ef0fafea214e866556fa39c902685d48a56001e1,
the commit right before the Calcite upgrade commits, the same query worked.

This was caused by the latest Calcite simplifying the predicates: during this
process, "cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int)" was
considered redundant and was removed, so the logical plan of this query gets
an always-true condition for the Join:
{quote}DrillJoinRel(condition=[true], joinType=[inner])
{quote}
while in the previous version we had
{quote}DrillJoinRel(condition=[=($5, $0)], joinType=[inner])
{quote}
 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: How to get org.apache.calcite of calcite for drill

2018-02-26 Thread Chunhui Shi
The forked calcite is under MapR's repository:

https://github.com/mapr/incubator-calcite


Calcite for Drill 1.12.0 and earlier is in the branch DrillCalcite1.4.0.
The latest Drill master branch has been upgraded to use the forked branch of
Calcite 1.15: DrillCalcite1.15.0.


You could file a Drill JIRA to work on the fix you want to make.







From: Julian Hyde 
Sent: Monday, February 26, 2018 11:47:51 AM
To: dev@drill.apache.org
Cc: "郑伟杰(乙一)"
Subject: Re: How to get org.apache.calcite of calcite for drill

Does the Drill web site describe where to find the Calcite fork? I strongly 
believe that it should.

Julian


> On Feb 24, 2018, at 10:18 AM, Gautam Parai  wrote:
>
> Drill uses its own fork of Calcite. You could open a JIRA in Apache Calcite
> and commit the changes, then port those back to Drill's forked version - a
> committer can help you with porting/deploying the new JAR with your changes
> (which will also bump the version, e.g. from r23 to r24).
>
>
> With the upgrade to the latest Calcite, which will be available in the next
> release, I think we will no longer need Drill's forked version. But in
> either case you would first have to get your changes into Apache Calcite.
>
> 
> From: 郑伟杰(乙一) 
> Sent: Saturday, February 24, 2018 2:02:19 AM
> To: dev
> Subject: How to get org.apache.calcite of calcite for drill
>
> Hi everyone: In my scenario, I want to make Calcite support implicit casts
> in join filters, so I want to hack some Calcite code. But Drill uses a
> specific version (1.4.0-drill-r23). How can I get the source code for that
> specific version?



[GitHub] drill pull request #1104: DRILL-6118: Handle item star columns during projec...

2018-02-05 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1104#discussion_r166066830
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java
 ---
@@ -596,10 +596,10 @@ private void classifyExpr(final NamedExpression ex, 
final RecordBatch incoming,
 final NameSegment ref = ex.getRef().getRootSegment();
 final boolean exprHasPrefix = 
expr.getPath().contains(StarColumnHelper.PREFIX_DELIMITER);
 final boolean refHasPrefix = 
ref.getPath().contains(StarColumnHelper.PREFIX_DELIMITER);
-final boolean exprIsStar = expr.getPath().equals(SchemaPath.WILDCARD);
-final boolean refContainsStar = 
ref.getPath().contains(SchemaPath.WILDCARD);
-final boolean exprContainsStar = 
expr.getPath().contains(SchemaPath.WILDCARD);
-final boolean refEndsWithStar = 
ref.getPath().endsWith(SchemaPath.WILDCARD);
+final boolean exprIsStar = 
expr.getPath().equals(SchemaPath.DYNAMIC_STAR);
--- End diff --

Why don't we need to handle the WILDCARD case anymore?


---


[GitHub] drill pull request #1104: DRILL-6118: Handle item star columns during projec...

2018-02-05 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1104#discussion_r166094020
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterItemStarReWriterRule.java
 ---
@@ -0,0 +1,232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableSet;
+import org.apache.calcite.adapter.enumerable.EnumerableTableScan;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.plan.RelOptTable;
+import org.apache.calcite.prepare.RelOptTableImpl;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.CorrelationId;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.core.TableScan;
+import org.apache.calcite.rel.logical.LogicalFilter;
+import org.apache.calcite.rel.logical.LogicalProject;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeFactory;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexVisitorImpl;
+import org.apache.calcite.schema.Table;
+import org.apache.drill.exec.planner.types.RelDataTypeDrillImpl;
+import org.apache.drill.exec.planner.types.RelDataTypeHolder;
+import org.apache.drill.exec.util.Utilities;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import static 
org.apache.drill.exec.planner.logical.FieldsReWriterUtil.DesiredField;
+import static 
org.apache.drill.exec.planner.logical.FieldsReWriterUtil.FieldsReWriter;
+
+/**
+ * Rule will transform filter -> project -> scan call with item star fields
+ * in filter into project -> filter -> project -> scan where item star fields
+ * are pushed into scan and replaced with actual field references.
+ *
+ * This will help partition pruning and push down rules to detect fields that
+ * can be pruned or pushed down.
+ * Item star operator appears when a sub-select or CTE with star is used as
+ * source.
+ */
+public class DrillFilterItemStarReWriterRule extends RelOptRule {
+
+  public static final DrillFilterItemStarReWriterRule INSTANCE = new 
DrillFilterItemStarReWriterRule(
+  RelOptHelper.some(Filter.class, RelOptHelper.some(Project.class, 
RelOptHelper.any( TableScan.class))),
+  "DrillFilterItemStarReWriterRule");
+
+  private DrillFilterItemStarReWriterRule(RelOptRuleOperand operand, 
String id) {
+super(operand, id);
+  }
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+Filter filterRel = call.rel(0);
+Project projectRel = call.rel(1);
+TableScan scanRel = call.rel(2);
+
+ItemStarFieldsVisitor itemStarFieldsVisitor = new 
ItemStarFieldsVisitor(filterRel.getRowType().getFieldNames());
--- End diff --

Other test cases that should be covered are:
nested field names;
references to two different fields under the same parent, e.g. a.b and a.c;
and array types referenced in filters and projects.


---


Re: [ANNOUNCE] New PMC member: Paul Rogers

2018-01-30 Thread Chunhui Shi
Congrats Paul! Well deserved!


From: Kunal Khatua 
Sent: Tuesday, January 30, 2018 2:05:56 PM
To: dev@drill.apache.org
Subject: RE: [ANNOUNCE] New PMC member: Paul Rogers

Congratulations, Paul !

-Original Message-
From: salim achouche [mailto:sachouc...@gmail.com]
Sent: Tuesday, January 30, 2018 2:00 PM
To: dev@drill.apache.org; Padma Penumarthy 
Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers

Congrats Paul!

Regards,
Salim

> On Jan 30, 2018, at 1:58 PM, Padma Penumarthy  wrote:
>
> Congratulations Paul.
>
> Thanks
> Padma
>
>
>> On Jan 30, 2018, at 1:55 PM, Gautam Parai  wrote:
>>
>> Congratulations Paul!
>>
>> 
>> From: Timothy Farkas 
>> Sent: Tuesday, January 30, 2018 1:54:43 PM
>> To: dev@drill.apache.org
>> Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers
>>
>> Congrats!
>>
>> 
>> From: Aman Sinha 
>> Sent: Tuesday, January 30, 2018 1:50:07 PM
>> To: dev@drill.apache.org
>> Subject: [ANNOUNCE] New PMC member: Paul Rogers
>>
>> I am pleased to announce that Drill PMC invited Paul Rogers to the
>> PMC and he has accepted the invitation.
>>
>> Congratulations Paul and thanks for your contributions !
>>
>> -Aman
>> (on behalf of Drill PMC)
>



Re: LATERAL and UNNEST support for Drill

2018-01-30 Thread Chunhui Shi
Hi Julian, I think CROSS APPLY and OUTER APPLY are what we want, and we have
discussed them internally. The only problem is that they are not standard SQL,
although they are supported in SQL Server and Oracle 12. Since SQL syntax does
not have a way to "invoke a table function for each row", we have to choose
between using APPLY or overloading the meaning of LATERAL as in the current
document attached to the JIRA. Which way do you think is better?


Thanks,

Chunhui


From: Julian Hyde <jh...@apache.org>
Sent: Tuesday, January 30, 2018 12:01:47 PM
To: dev@drill.apache.org
Subject: Re: LATERAL and UNNEST support for Drill

LATERAL is a prefix operator not a binary operator, so I believe you are 
missing a comma:

> FROM t1 LATERAL UNNEST (t1.array1), UNNEST (t1.array2)

should be

> FROM t1, LATERAL UNNEST (t1.array1), LATERAL UNNEST (t1.array2)

I agree with your remarks about the extra power of putting UNNEST in the FROM 
clause (per the standard) versus the SELECT clause (per PostgreSQL).

Note that Calcite supports CROSS APPLY and OUTER APPLY[1]. This is useful when 
you want to apply a table function for each row of a table. It is just 
syntactic sugar for LATERAL TABLE so you may get it virtually for free.

Julian


[1] https://issues.apache.org/jira/browse/CALCITE-1472



> On Jan 29, 2018, at 8:58 AM, Sorabh Hamirwasia <shamirwa...@mapr.com> wrote:
>
> Hi Ted,
> Thanks for you question. Array type aggregator is not planned along with this 
> project. But probably after this is done we can look into it.
>
> Thanks,
> Sorabh
>
> Get Outlook for 
> iOS<https://urldefense.proofpoint.com/v2/url?u=https-3A__aka.ms_o0ukef=DwIFAg=cskdkSMqhcnjZxdQVpwTXg=FCGQb-L4gJ1XbsL1WU2sugDtPvzIxWFzAi5u4TTtxaI=9Y08i3YgrresMOxi7InbjxT0WSHQkcPjJufQWLI9PGk=InfpmexAnhHoPUeNA7M-E8qIORMLXwvsqDfFAA69glg=>
> 
> From: Ted Dunning <ted.dunn...@gmail.com>
> Sent: Sunday, January 28, 2018 10:30:30 PM
> To: dev@drill.apache.org
> Cc: Chunhui Shi; Parth Chandra; Aman Sinha; Sorabh Hamirwasia
> Subject: Re: LATERAL and UNNEST support for Drill
>
>
> I haven't looked at the design doc, but this is a great thing to have.
>
> Would you be building something to do the inverse as well?
>
> Something like an aggregator such as array_collect, perhaps?
>
>
>
> On Thu, Jan 25, 2018 at 2:56 PM, Sorabh Hamirwasia
> <sohami.apa...@gmail.com> wrote:
> Hi All,
>
> We (people in cc list) have been looking into design for support of LATERAL
> and UNNEST within Drill. With upgrade of Calcite to 1.15, these keywords
> are supported in Calcite too. As a first cut we have created a design
> document which proposes the changes and limitation's for this project.
> There are still few items which are in progress. I am sharing the JIRA
> details along with link to design document below. Please feel free to take
> a look and provide any feedback.
>
>
> DRILL-5999 <https://issues.apache.org/jira/browse/DRILL-5999>
>
> Design Document
> <https://docs.google.com/document/d/1-RCIJ0F7VwAqOxkVB305zADwtX-OS43Qj2kUmIILUaQ/edit?usp=sharing>
>
>
> Thanks,
> Sorabh
>



[GitHub] drill pull request #1096: DRILL-6099 : Push limit past flatten(project) with...

2018-01-29 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1096#discussion_r164585670
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java
 ---
@@ -121,4 +132,50 @@ protected void doOnMatch(RelOptRuleCall call, 
DrillLimitRel limitRel, DrillScanR
 }
 
   }
+
+  private static boolean isProjectFlatten(RelNode project) {
--- End diff --

I think it might be more general to name the functions schemaUnknown (for
convert_fromJson) and rowCountUnknown (for flatten), so if in the future we
have other functions that fall into these two categories, we could easily add
them. What do you think? A sketch is below.
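
A rough sketch of the suggested renaming; projectHasNamedFunction is a
hypothetical helper standing in for the existing checks:

// Hypothetical renaming: group the checks by what is unknown, not by function.
private static boolean schemaUnknown(RelNode project) {
  // convert_fromJson produces a schema we cannot know at plan time.
  return projectHasNamedFunction(project, "convert_fromjson");
}

private static boolean rowCountUnknown(RelNode project) {
  // flatten can expand one input row into many output rows.
  return projectHasNamedFunction(project, "flatten");
}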


---


[GitHub] drill issue #1096: DRILL-6099 : Push limit past flatten(project) without pus...

2018-01-22 Thread chunhui-shi
Github user chunhui-shi commented on the issue:

https://github.com/apache/drill/pull/1096
  
Once all tests are done, I think it is fine to add the 'ready-to-commit' label
to the JIRA.


---


[GitHub] drill issue #1096: DRILL-6099 : Push limit past flatten(project) without pus...

2018-01-22 Thread chunhui-shi
Github user chunhui-shi commented on the issue:

https://github.com/apache/drill/pull/1096
  
+1


---


[jira] [Created] (DRILL-6103) lsb_release: command not found

2018-01-22 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-6103:
--

 Summary: lsb_release: command not found
 Key: DRILL-6103
 URL: https://issues.apache.org/jira/browse/DRILL-6103
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi


Got this error when running drillbit.sh:

 

$ bin/drillbit.sh restart
bin/drill-config.sh: line 317: lsb_release: command not found



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] drill pull request #1096: DRILL-6099 : Push limit past flatten(project) with...

2018-01-22 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1096#discussion_r163048747
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/PlannerPhase.java ---
@@ -341,7 +346,7 @@ static RuleSet getPruneScanRules(OptimizerRulesContext 
optimizerRulesContext) {
 
ParquetPruneScanRule.getFilterOnProjectParquet(optimizerRulesContext),
 
ParquetPruneScanRule.getFilterOnScanParquet(optimizerRulesContext),
 DrillPushLimitToScanRule.LIMIT_ON_SCAN,
-DrillPushLimitToScanRule.LIMIT_ON_PROJECT
+DrillPushLimitToScanRule.LIMIT_ON_PROJECT_SCAN
--- End diff --

Not sure if we still need "limit_on_project_scan". In theory, 
limit_on_project and limit_on_scan should already cover all the cases. Have you 
tested with "limit_on_project_scan" disabled?


---


[GitHub] drill pull request #1089: DRILL-6078: support timestamp type to be pushed in...

2018-01-19 Thread chunhui-shi
Github user chunhui-shi closed the pull request at:

https://github.com/apache/drill/pull/1089


---


[jira] [Created] (DRILL-6092) Support latest MapR release in format-maprdb storage plugin

2018-01-16 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-6092:
--

 Summary: Support latest MapR release in format-maprdb storage 
plugin
 Key: DRILL-6092
 URL: https://issues.apache.org/jira/browse/DRILL-6092
 Project: Apache Drill
  Issue Type: Bug
 Environment: The latest MapR-DB release is 6.0. Apache Drill still builds the
format-maprdb plugin against the 5.2 MapR-DB libraries. We should update to the
latest MapR; simply bumping up the version in pom.xml does not work.

Ideally we should allow users of Apache Drill to decide which version of the
MapR platform to pick, and Drill should work with both the latest major release
(6.0 or 6.x) AND the last major release (5.2.1 or 5.2 or 5.x).

The same applies to other storage plugins: we should provide an easy way to
configure which version of the underlying storage to connect to when building
Drill.
Reporter: Chunhui Shi






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] drill issue #1066: DRILL-3993: Changes to support Calcite 1.15

2018-01-16 Thread chunhui-shi
Github user chunhui-shi commented on the issue:

https://github.com/apache/drill/pull/1066
  
+1. Thank you for addressing the comments.


---


[GitHub] drill pull request #1066: DRILL-3993: Changes to support Calcite 1.15

2018-01-15 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1066#discussion_r161607926
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
 ---
@@ -1303,6 +1305,8 @@ private void checkGroupAndAggrValues(int 
incomingRowIdx) {
   long memDiff = allocator.getAllocatedMemory() - allocatedBeforeHTput;
   if ( memDiff > 0 ) { logger.warn("Leak: HashTable put() OOM left 
behind {} bytes allocated",memDiff); }
 
+  checkForSpillPossibility(currentPartition);
--- End diff --

Not sure this 'chooseAPartitionToFlush' check is needed. If an exception
is desired, I would think modifying doSpill() is a better way, e.g. modifying
this line: "  if ( victimPartition < 0 ) { return; } ". Otherwise,
chooseAPartitionToFlush will be called twice in this process.

  int victimPartition = chooseAPartitionToFlush(currentPartition, 
forceSpill);

  // In case no partition has more than one batch -- try and "push the 
limits"; maybe next
  // time the spill could work.
  if ( victimPartition < 0 ) { return; } 


---


[GitHub] drill pull request #1089: DRILL-6078: support timestamp type to be pushed in...

2018-01-12 Thread chunhui-shi
GitHub user chunhui-shi opened a pull request:

https://github.com/apache/drill/pull/1089

DRILL-6078: support timestamp type to be pushed into MapRDB



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chunhui-shi/drill work3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1089.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1089


commit 1c2a7c94fcdb56e7e430879b26f1b3e5a5144d11
Author: chunhui-shi <cshi@...>
Date:   2018-01-12T23:14:44Z

DRILL-6078: support timestamp type to be pushed into MapRDB




---


[GitHub] drill pull request #1066: DRILL-3993: Changes to support Calcite 1.15

2018-01-11 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1066#discussion_r161077516
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/AggPruleBase.java
 ---
@@ -82,4 +83,19 @@ protected boolean create2PhasePlan(RelOptRuleCall call, 
DrillAggregateRel aggreg
 }
 return true;
   }
+
+  /**
+   * Returns group-by keys with the remapped arguments for specified 
aggregate.
+   *
+   * @param groupSet ImmutableBitSet of aggregate rel node, whose group-by 
keys should be remapped.
+   * @return {@link ImmutableBitSet} instance with remapped keys.
+   */
+  public static ImmutableBitSet remapGroupSet(ImmutableBitSet groupSet) {
--- End diff --

What is the reason we are doing this remap with the new Calcite?
And if the result depends only on the size of groupSet, we don't really
need to iterate through the groupSet (see the sketch below).
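
To illustrate the second point, a one-line sketch using
org.apache.calcite.util.ImmutableBitSet, assuming only the key count matters:

// Remap group-by keys {2, 5, 7} -- or any set of the same size -- to {0, 1, 2}.
ImmutableBitSet remapped = ImmutableBitSet.range(0, groupSet.cardinality());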


---


[GitHub] drill pull request #1078: DRILL-6054: don't try to split the filter when it ...

2018-01-10 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1078#discussion_r160741806
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/FindPartitionConditions.java
 ---
@@ -195,8 +195,16 @@ private void popOpStackAndBuildFilter() {
  * For all other operators we clear the children if one of the
  * children is a no push.
  */
-assert currentOp.getOp().getKind() == SqlKind.AND;
-newFilter = currentOp.getChildren().get(0);
+if (currentOp.getOp().getKind() == SqlKind.AND) {
+  newFilter = currentOp.getChildren().get(0);
+  for(OpState opState : opStack) {
--- End diff --

done.


---


[GitHub] drill pull request #1078: DRILL-6054: don't try to split the filter when it ...

2018-01-10 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1078#discussion_r160741771
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/FindPartitionConditions.java
 ---
@@ -228,13 +236,16 @@ private boolean isHolisticExpression(RexCall call) {
 return false;
   }
 
+  protected boolean inputRefToPush(RexInputRef inputRef) {
--- End diff --

This is intentionally made 'protected' for future extension.
Right now, FindPartitionConditions uses position-based input refs (via the
BitSet dirs) to mark which inputRef should be pushed. But in the future, we may
use a name-based policy to decide which one to push; see the sketch below.
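
For example, a hypothetical name-based extension could look like this; the
field-name list is an assumption, since the current class does not carry one:

import java.util.BitSet;
import java.util.List;
import java.util.Set;

import org.apache.calcite.rex.RexInputRef;

// Hypothetical subclass: push an input ref only when its column name is a
// known partition column, instead of relying on the position-based BitSet.
public class NameBasedFindPartitionConditions extends FindPartitionConditions {
  private final List<String> fieldNames;       // row-type field names, by index
  private final Set<String> partitionColumns;  // names eligible for push-down

  public NameBasedFindPartitionConditions(BitSet dirs, List<String> fieldNames,
      Set<String> partitionColumns) {
    super(dirs);  // assumes the existing FindPartitionConditions(BitSet) constructor
    this.fieldNames = fieldNames;
    this.partitionColumns = partitionColumns;
  }

  @Override
  protected boolean inputRefToPush(RexInputRef inputRef) {
    return partitionColumns.contains(fieldNames.get(inputRef.getIndex()));
  }
}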


---


[jira] [Created] (DRILL-6077) To take advantage of pre-aggregate results when generating plans for aggregation

2018-01-08 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-6077:
--

 Summary: To take advantage of pre-aggregate results when 
generating plans for aggregation
 Key: DRILL-6077
 URL: https://issues.apache.org/jira/browse/DRILL-6077
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi


Users could generate aggregation results (count, avg, min, max) for segments of
data stored either in summary tables or in metadata stores. The planner should
be able to leverage these results, either by directly querying the
pre-aggregated results from the summary tables or by combining pre-aggregated
results for old data with results from new data.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] drill pull request #1078: DRILL-6054: don't try to split the filter when it ...

2017-12-29 Thread chunhui-shi
GitHub user chunhui-shi opened a pull request:

https://github.com/apache/drill/pull/1078

DRILL-6054: don't try to split the filter when it is not AND



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chunhui-shi/drill work1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1078.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1078


commit e44ed46471317f43c494a497551e6546016f3a10
Author: chunhui-shi <cshi@...>
Date:   2017-12-22T23:42:27Z

DRILL-6054: don't try to split the filter when it is not AND




---


Re: Implementing inner joins for mongo

2017-12-29 Thread Chunhui Shi
Hi Dennis,


I would suggest you look at the logical plan that will be generated for this
query, by running the SQL statement 'explain plan without implementation for
...'. Also take a look at
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/JoinPushThroughJoinRule.java
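
To make the shape concrete, here is a hypothetical skeleton of such a rule;
the class name comes from your email, the planner classes (RelOptHelper,
JoinPrel, ScanPrel) mirror what MongoPushDownFilterForScan uses, and the
$lookup rewrite in onMatch() is left as a stub:

import org.apache.calcite.plan.RelOptRuleCall;
import org.apache.calcite.rel.core.JoinRelType;
import org.apache.drill.exec.planner.logical.RelOptHelper;
import org.apache.drill.exec.planner.physical.JoinPrel;
import org.apache.drill.exec.planner.physical.ScanPrel;
import org.apache.drill.exec.store.StoragePluginOptimizerRule;

public class MongoPushDownInnerJoinScan extends StoragePluginOptimizerRule {

  public static final StoragePluginOptimizerRule INSTANCE =
      new MongoPushDownInnerJoinScan();

  private MongoPushDownInnerJoinScan() {
    // Match join(scan, scan), mirroring how MongoPushDownFilterForScan
    // matches filter(scan).
    super(RelOptHelper.some(JoinPrel.class,
            RelOptHelper.any(ScanPrel.class),
            RelOptHelper.any(ScanPrel.class)),
        "MongoPushDownInnerJoinScan");
  }

  @Override
  public boolean matches(RelOptRuleCall call) {
    JoinPrel join = call.rel(0);
    return join.getJoinType() == JoinRelType.INNER;  // only rewrite inner joins
  }

  @Override
  public void onMatch(RelOptRuleCall call) {
    JoinPrel join = call.rel(0);
    ScanPrel leftScan = call.rel(1);
    ScanPrel rightScan = call.rel(2);
    // Derive a $lookup stage from join.getCondition(), build a new
    // MongoGroupScan carrying the combined aggregation pipeline, and then:
    // call.transformTo(new ScanPrel(... combined group scan ...));
  }
}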


Hope this helps,


Chunhui


From: Dennis Knochenwefel 
Sent: Thursday, December 28, 2017 4:48:29 AM
To: dev@drill.apache.org
Subject: Implementing inner joins for mongo

Hello Drill Dev Pros,

I have found the drill mongo store and would like to extend it to push
down INNER JOINs. Therefore I would like to rewrite INNER JOINs into the
mongo aggregation pipeline. Here is a SQL example:

SELECT *
FROM `mymongo.db`.`facts` `facts`
   INNER JOIN `mymongo.db`.`set` `set` ON (`facts`.`group` = `set`.`group`)
WHERE ((`set`.`date` = '09.05.2017') AND (`set`.`id` = '1'))

Could you give me a hint on how to do that? I am familiar with the
aggregation pipeline of mongo, but am not sure how to implement the
rewrite. I have found the push-down of WHERE clauses for mongo [1],
but I am still struggling to do the same for inner joins. If I implement
"public class MongoPushDownInnerJoinScan extends
StoragePluginOptimizerRule", what would the constructor look like?
Which equivalent of MongoGroupScan (AbstractGroupScan) [2] would I have
to implement? Any help would be very much appreciated.

Thank you and kind regards,

Dennis


[1]
https://github.com/apache/drill/blob/master/contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoPushDownFilterForScan.java

[2]
https://github.com/apache/drill/blob/master/contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoGroupScan.java

--
Dennis Knochenwefel
Founder
Reportix
Germany



[GitHub] drill issue #1066: DRILL-3993: Changes to support Calcite 1.15

2017-12-29 Thread chunhui-shi
Github user chunhui-shi commented on the issue:

https://github.com/apache/drill/pull/1066
  
Can you file a pull request to mapr/incubator-calcite with your changes from
https://github.com/KulykRoman/incubator-calcite/commits/DrillCalcite1.15.0_rc0,
so we can review and get these changes into the official branch for publishing?


---


[jira] [Created] (DRILL-6056) Mock datasize could overflow to negative

2017-12-22 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-6056:
--

 Summary: Mock datasize could overflow to negative
 Key: DRILL-6056
 URL: https://issues.apache.org/jira/browse/DRILL-6056
 Project: Apache Drill
  Issue Type: Task
Reporter: Chunhui Shi


In some cases, the mock data size (rowCount * rowWidth) could be too large,
especially when we test spilling or memory OOB exceptions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-6055) Session Multiplexing

2017-12-22 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-6055:
--

 Summary: Session Multiplexing
 Key: DRILL-6055
 URL: https://issues.apache.org/jira/browse/DRILL-6055
 Project: Apache Drill
  Issue Type: Task
Reporter: Chunhui Shi


We could allow one connection to carry multiple user sessions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-6054) Issues in FindPartitionConditions

2017-12-22 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-6054:
--

 Summary: Issues in FindPartitionConditions
 Key: DRILL-6054
 URL: https://issues.apache.org/jira/browse/DRILL-6054
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Chunhui Shi


When the condition is these cases, partition is not done correctly: 
b = 3 OR (dir0 = 1 and a = 2)
not (dir0 = 1 AND b = 2)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] drill pull request #1066: DRILL-3993: Changes to support Calcite 1.15

2017-12-21 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1066#discussion_r158335378
  
--- Diff: exec/java-exec/src/main/codegen/includes/parserImpls.ftl ---
@@ -351,4 +351,23 @@ SqlNode SqlDropFunction() :
{
return new SqlDropFunction(pos, jar);
}
-}
\ No newline at end of file
+}
+
+<#if !parser.includeCompoundIdentifier >
--- End diff --

do we need a test case for this newly added 
ParenthesizedCompoundIdentifierList?


---


[GitHub] drill pull request #1066: DRILL-3993: Changes to support Calcite 1.15

2017-12-20 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1066#discussion_r158173488
  
--- Diff: exec/java-exec/src/test/resources/record/test_recorditerator.json 
---
@@ -60,7 +60,7 @@
 @id:2,
 child:1,
 pop:"project",
-exprs:[ { ref : "`*`", expr : "`*`"} ]
+exprs:[ { ref : "`**`", expr : "`**`"} ]
--- End diff --

Not sure I understand this '**' thing, can you explain more about this 
change?


---


[GitHub] drill pull request #1066: DRILL-3993: Changes to support Calcite 1.15

2017-12-20 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1066#discussion_r158173904
  
--- Diff: exec/jdbc-all/pom.xml ---
@@ -572,7 +572,7 @@
   This is likely due to you adding new 
dependencies to a java-exec and not updating the excludes in this module. This 
is important as it minimizes the size of the dependency of Drill application 
users.
 
 
-2900
+3100
--- End diff --

I played with this branch and I had to change the size from 3100 to
3200. It might be due to my build environment, but we may want to increase it
to 3200.


---


[GitHub] drill pull request #1066: DRILL-3993: Changes to support Calcite 1.15

2017-12-20 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1066#discussion_r158117382
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SqlConverter.java
 ---
@@ -470,34 +576,32 @@ public void disallowTemporaryTables() {
  * @throws UserException if temporary tables usage is disallowed
  */
 @Override
-public RelOptTableImpl getTable(final List names) {
-  RelOptTableImpl temporaryTable = null;
-
-  if (mightBeTemporaryTable(names, session.getDefaultSchemaPath(), 
drillConfig)) {
-String temporaryTableName = 
session.resolveTemporaryTableName(names.get(names.size() - 1));
-if (temporaryTableName != null) {
-  List temporaryNames = 
Lists.newArrayList(temporarySchema, temporaryTableName);
-  temporaryTable = super.getTable(temporaryNames);
+public Prepare.PreparingTable getTable(final List names) {
+  String originalTableName = 
session.getOriginalTableNameFromTemporaryTable(names.get(names.size() - 1));
+  if (originalTableName != null) {
+if (!allowTemporaryTables) {
+  throw UserException
+  .validationError()
+  .message("Temporary tables usage is disallowed. Used 
temporary table name: [%s].", originalTableName)
+  .build(logger);
 }
   }
-  if (temporaryTable != null) {
-if (allowTemporaryTables) {
-  return temporaryTable;
+  // Fix for select from hbase table with schema name in query 
(example: "SELECT col FROM hbase.t)
+  // from hbase schema (did "USE hbase" before).
--- End diff --

Could you explain why this is needed now? I think this used to work -- if a
schema is not found under the default schema, Drill falls back to the root to
do the search.

What changed such that you have to introduce this fix?

What about this test case?
"use hbase; select t.col, t2.col2 from hbase2.t2 as t2, hbase.t as t where
t.id = t2.id"



---


[GitHub] drill pull request #1066: DRILL-3993: Changes to support Calcite 1.15

2017-12-20 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1066#discussion_r158105572
  
--- Diff: 
contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcPrel.java
 ---
@@ -62,7 +62,7 @@ public JdbcPrel(RelOptCluster cluster, RelTraitSet 
traitSet, JdbcIntermediatePre
 (JavaTypeFactory) getCluster().getTypeFactory());
 final JdbcImplementor.Result result =
 jdbcImplementor.visitChild(0, input.accept(new SubsetRemover()));
-sql = result.asQuery().toSqlString(dialect).getSql();
+sql = result.asSelect().toSqlString(dialect).getSql();
--- End diff --

Is the 'result' here guaranteed to be a SqlSelect?


---


[GitHub] drill pull request #1066: DRILL-3993: Changes to support Calcite 1.15

2017-12-20 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1066#discussion_r158120135
  
--- Diff: 
contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseSchemaFactory.java
 ---
@@ -72,7 +72,16 @@ public AbstractSchema getSubSchema(String name) {
 @Override
 public Table getTable(String name) {
   HBaseScanSpec scanSpec = new HBaseScanSpec(name);
-  return new DrillHBaseTable(schemaName, plugin, scanSpec);
+  try {
+return new DrillHBaseTable(schemaName, plugin, scanSpec);
+  } catch (Exception e) {
+// Calcite firstly is looking for a table in the default schema, 
if a table was not found,
--- End diff --

'is looking for' and 'is looking in' seem to say that Calcite IS working
in this line of code, but I think you meant that the new version of Calcite
'looks for' something... so I would like some rephrasing here.


---


[GitHub] drill pull request #1066: DRILL-3993: Changes to support Calcite 1.15

2017-12-20 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1066#discussion_r158103733
  
--- Diff: 
contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseSchemaFactory.java
 ---
@@ -72,7 +72,16 @@ public AbstractSchema getSubSchema(String name) {
 @Override
 public Table getTable(String name) {
   HBaseScanSpec scanSpec = new HBaseScanSpec(name);
-  return new DrillHBaseTable(schemaName, plugin, scanSpec);
+  try {
+return new DrillHBaseTable(schemaName, plugin, scanSpec);
+  } catch (Exception e) {
+// Calcite firstly is looking for a table in the default schema, 
if a table was not found,
+// it is looking in root schema.
+// If a table does not exist, a query will fail at validation 
stage,
+// so the error should not be thrown there.
--- End diff --

Do you mean 'should not be thrown HERE'? The same applies to other places.


---


[GitHub] drill pull request #795: DRILL-5089: Get only partial schemas of relevant st...

2017-11-21 Thread chunhui-shi
Github user chunhui-shi closed the pull request at:

https://github.com/apache/drill/pull/795


---


[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

2017-11-20 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r152073873
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java
 ---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import com.google.common.collect.ImmutableSortedSet;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.calcite.DataContext;
+import org.apache.calcite.jdbc.CalciteRootSchema;
+import org.apache.calcite.jdbc.CalciteSchema;
+
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.calcite.schema.impl.AbstractSchema;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.calcite.util.Compatible;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.StoragePlugin;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.SubSchemaWrapper;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import java.util.NavigableSet;
+import java.util.Set;
+
+/**
+ * This class allows us to load schemas from storage plugins lazily, when 
{@link #getSubSchema(String, boolean)}
+ * is called.
+ */
+public class DynamicRootSchema extends DynamicSchema
+implements CalciteRootSchema {
+  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(DynamicRootSchema.class);
+  /** Creates a root schema. */
+  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig 
schemaConfig) {
+super(null, new RootSchema(), "");
+this.schemaConfig = schemaConfig;
+this.storages = storages;
+  }
+
+  @Override
+  public CalciteSchema getSubSchema(String schemaName, boolean 
caseSensitive) {
+CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
+if (retSchema != null) {
+  return retSchema;
+}
+
+loadSchemaFactory(schemaName, caseSensitive);
+retSchema = getSubSchemaMap().get(schemaName);
+return retSchema;
+  }
+
+  @Override
+  public NavigableSet<String> getTableNames() {
+    Set<String> pluginNames = Sets.newHashSet();
+    for (Map.Entry<String, StoragePlugin> storageEntry : getSchemaFactories()) {
+      pluginNames.add(storageEntry.getKey());
+    }
+    return Compatible.INSTANCE.navigableSet(
+        ImmutableSortedSet.copyOf(
+            Sets.union(pluginNames, getSubSchemaMap().keySet())));
+  }
+
+  /**
+   * Load the schema factory (storage plugin) for the given schema name.
+   * @param schemaName name of the schema (storage plugin) to load
+   * @param caseSensitive whether the lookup is case sensitive
+   */
+  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
+try {
+  SchemaPlus thisPlus = this.plus();
+  StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
+  if (plugin != null) {
+plugin.registerSchemas(schemaConfig, thisPlus);
+return;
+  }
+
+  // We could not find the plugin; the schemaName could be `dfs.tmp`, 
a second-level schema under 'dfs'.
+  String[] paths = schemaName.split("\\.");
+  if (paths.length == 2) {
+plugin = getSchemaFactories().getPlugin(paths[0]);
+if (plugin == null) {
+  return;
+}
+
+// We found the storage plugin for the first part (e.g. 'dfs') of the 
schemaName (e.g. 'dfs.tmp');
+// register the schema for this storage plugin on 'this'.
+plugin.registerSchemas(schemaConfig, thisPlus);
--- End diff --

we get to this place only when that split got an array of length 2 and 
af

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

2017-11-17 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151798147
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java
 ---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import com.google.common.collect.ImmutableSortedSet;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.calcite.DataContext;
+import org.apache.calcite.jdbc.CalciteRootSchema;
+import org.apache.calcite.jdbc.CalciteSchema;
+
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.calcite.schema.impl.AbstractSchema;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.calcite.util.Compatible;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.StoragePlugin;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.SubSchemaWrapper;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import java.util.NavigableSet;
+import java.util.Set;
+
+/**
+ * This class allows us to load schemas from storage plugins lazily, when 
{@link #getSubSchema(String, boolean)}
+ * is called.
+ */
+public class DynamicRootSchema extends DynamicSchema
+implements CalciteRootSchema {
+  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(DynamicRootSchema.class);
+  /** Creates a root schema. */
+  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig 
schemaConfig) {
+super(null, new RootSchema(), "");
+this.schemaConfig = schemaConfig;
+this.storages = storages;
+  }
+
+  @Override
+  public CalciteSchema getSubSchema(String schemaName, boolean 
caseSensitive) {
+CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
+if (retSchema != null) {
+  return retSchema;
+}
+
+loadSchemaFactory(schemaName, caseSensitive);
+retSchema = getSubSchemaMap().get(schemaName);
+return retSchema;
+  }
+
+  @Override
+  public NavigableSet<String> getTableNames() {
+    Set<String> pluginNames = Sets.newHashSet();
+    for (Map.Entry<String, StoragePlugin> storageEntry : getSchemaFactories()) {
+      pluginNames.add(storageEntry.getKey());
+    }
+    return Compatible.INSTANCE.navigableSet(
+        ImmutableSortedSet.copyOf(
+            Sets.union(pluginNames, getSubSchemaMap().keySet())));
+  }
+
+  /**
+   * Load the schema factory (storage plugin) for the given schema name.
+   * @param schemaName name of the schema (storage plugin) to load
+   * @param caseSensitive whether the lookup is case sensitive
+   */
+  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
+try {
+  SchemaPlus thisPlus = this.plus();
+  StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
+  if (plugin != null) {
+plugin.registerSchemas(schemaConfig, thisPlus);
+return;
+  }
+
+  // We could not find the plugin; the schemaName could be `dfs.tmp`, 
a second-level schema under 'dfs'.
+  String[] paths = schemaName.split("\\.");
+  if (paths.length == 2) {
+plugin = getSchemaFactories().getPlugin(paths[0]);
+if (plugin == null) {
+  return;
+}
+
+// We found the storage plugin for the first part (e.g. 'dfs') of the 
schemaName (e.g. 'dfs.tmp');
+// register the schema for this storage plugin on 'this'.
+plugin.registerSchemas(schemaConfig, thisPlus);
+
+// we load second level schemas for this storage plugin
+final SchemaPlus fir

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

2017-11-17 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151793647
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -373,12 +402,12 @@ public String toString() {
   public class WorkspaceSchema extends AbstractSchema implements 
ExpandingConcurrentMap.MapValueFactory<TableInstance, DrillTable> {
 private final ExpandingConcurrentMap<TableInstance, DrillTable> tables 
= new ExpandingConcurrentMap<>(this);
 private final SchemaConfig schemaConfig;
-private final DrillFileSystem fs;
+private DrillFileSystem fs;
 
-public WorkspaceSchema(List<String> parentSchemaPath, String wsName, 
SchemaConfig schemaConfig) throws IOException {
+public WorkspaceSchema(List<String> parentSchemaPath, String wsName, 
SchemaConfig schemaConfig, DrillFileSystem fs) throws IOException {
   super(parentSchemaPath, wsName);
   this.schemaConfig = schemaConfig;
-  this.fs = 
ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), fsConf);
+  this.fs = fs;
--- End diff --

Now we pass in fs instead of creating it inside WorkspaceSchema. 


---


[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

2017-11-16 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151493351
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -175,6 +193,21 @@ public WorkspaceSchema createSchema(List 
parentSchemaPath, SchemaConfig
 return new WorkspaceSchema(parentSchemaPath, schemaName, schemaConfig);
   }
 
+  public WorkspaceSchema createSchema(List<String> parentSchemaPath, 
SchemaConfig schemaConfig, DrillFileSystem fs) throws IOException {
+if (!accessible(fs)) {
--- End diff --

Returning null means the user cannot even list this workspace, so they don't 
know of the workspace's existence at all. I think that is good access 
control practice. 

If users expect to see a workspace but cannot see it, then they need to 
figure out why by themselves.


---


[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

2017-11-15 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151302799
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java
 ---
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import com.google.common.collect.ImmutableSortedSet;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.calcite.DataContext;
+import org.apache.calcite.jdbc.CalciteRootSchema;
+import org.apache.calcite.jdbc.CalciteSchema;
+
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.calcite.schema.impl.AbstractSchema;
+import org.apache.calcite.util.BuiltInMethod;
+import org.apache.calcite.util.Compatible;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.store.SchemaConfig;
+import org.apache.drill.exec.store.StoragePlugin;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.SubSchemaWrapper;
+
+import java.io.IOException;
+import java.util.Map;
+import java.util.NavigableSet;
+import java.util.Set;
+
+public class DynamicRootSchema extends DynamicSchema
+implements CalciteRootSchema {
+
+  /** Creates a root schema. */
+  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig 
schemaConfig) {
+super(null, new RootSchema(), "");
+this.schemaConfig = schemaConfig;
+this.storages = storages;
+  }
+
+  @Override
+  public CalciteSchema getSubSchema(String schemaName, boolean 
caseSensitive) {
+CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
+
+if (retSchema == null) {
+  loadSchemaFactory(schemaName, caseSensitive);
+}
+
+retSchema = getSubSchemaMap().get(schemaName);
+return retSchema;
+  }
+
+  @Override
+  public NavigableSet<String> getTableNames() {
+    Set<String> pluginNames = Sets.newHashSet();
--- End diff --

Plugin names in Drill are case sensitive.


---


[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

2017-11-15 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151301607
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
 ---
@@ -150,14 +152,30 @@ public WorkspaceSchemaFactory(
* @return True if the user has access. False otherwise.
*/
   public boolean accessible(final String userName) throws IOException {
-final FileSystem fs = ImpersonationUtil.createFileSystem(userName, 
fsConf);
+final DrillFileSystem fs = 
ImpersonationUtil.createFileSystem(userName, fsConf);
+return accessible(fs);
+  }
+
+  /**
+   * Checks whether a FileSystem object has the permission to list/read 
workspace directory
+   * @param fs a DrillFileSystem object that was created with certain user 
privilege
+   * @return True if the user has access. False otherwise.
+   * @throws IOException
+   */
+  public boolean accessible(DrillFileSystem fs) throws IOException {
 try {
-  // We have to rely on the listStatus as a FileSystem can have 
complicated controls such as regular unix style
-  // permissions, Access Control Lists (ACLs) or Access Control 
Expressions (ACE). Hadoop 2.7 version of FileSystem
-  // has a limited private API (FileSystem.access) to check the 
permissions directly
-  // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill 
currently relies on Hadoop 2.5.0 version of
-  // FileClient. TODO: Update this when DRILL-3749 is fixed.
-  fs.listStatus(wsPath);
+  /**
+   * For Windows local file system, fs.access ends up using 
DeprecatedRawLocalFileStatus which has
+   * TrustedInstaller as owner, and a member of Administrators group 
could not satisfy the permission.
+   * In this case, we will still use method listStatus.
+   * In other cases, we use access method since it is cheaper.
+   */
+  if (SystemUtils.IS_OS_WINDOWS && 
fs.getUri().getScheme().equalsIgnoreCase("file")) {
--- End diff --

FileSystem in HDFS has a constant DEFAULT_FS ("file:///"); for now I will 
define our own.


---


[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

2017-11-15 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151299650
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java
 ---
@@ -105,12 +106,36 @@ public SchemaPlus createRootSchema(final String 
userName, final SchemaConfigInfo
* @return
*/
   public SchemaPlus createRootSchema(SchemaConfig schemaConfig) {
+  final SchemaPlus rootSchema = 
DynamicSchema.createRootSchema(dContext.getStorage(), schemaConfig);
+  schemaTreesToClose.add(rootSchema);
+  return rootSchema;
+  }
+
+  /**
+   * Return full root schema with schema owner as the given user.
+   *
+   * @param userName Name of the user who is accessing the storage sources.
+   * @param provider {@link SchemaConfigInfoProvider} instance
+   * @return Root of the schema tree.
+   */
+  public SchemaPlus createFullRootSchema(final String userName, final 
SchemaConfigInfoProvider provider) {
+final String schemaUser = isImpersonationEnabled ? userName : 
ImpersonationUtil.getProcessUserName();
--- End diff --

Not that many places, and we would need to pass in isImpersonationEnabled and 
userName if this line became a standalone method. Will keep it as is for now.


---


[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

2017-11-15 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1032#discussion_r151298428
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java
 ---
@@ -73,9 +87,10 @@ public void registerSchemas(SchemaConfig schemaConfig, 
SchemaPlus parent) throws
 
 public FileSystemSchema(String name, SchemaConfig schemaConfig) throws 
IOException {
   super(ImmutableList.of(), name);
+  final DrillFileSystem fs = 
ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), 
plugin.getFsConf());
   for(WorkspaceSchemaFactory f :  factories){
-if (f.accessible(schemaConfig.getUserName())) {
-  WorkspaceSchema s = f.createSchema(getSchemaPath(), 
schemaConfig);
+WorkspaceSchema s = f.createSchema(getSchemaPath(), schemaConfig, 
fs);
+if ( s != null) {
--- End diff --

'factories' comes from the storage plugin, so it will be identical as long as 
we don't update the storage plugin. This FileSystemSchema constructor is 
called only once per query, and only if this FileSystem storage plugin is 
needed by the query.



---


[jira] [Created] (DRILL-5969) unit test should continue to tests of storage plugins even there are some failures in exec

2017-11-15 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-5969:
--

 Summary: unit test should continue to tests of storage plugins 
even there are some failures in exec
 Key: DRILL-5969
 URL: https://issues.apache.org/jira/browse/DRILL-5969
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi


We are seeing some random issues in unit tests, such as 
https://issues.apache.org/jira/browse/DRILL-5925. While we should fix these 
issues, we may want different ways to handle such situations:

1. We may want to continue running the unit tests, especially those in storage 
plugins, regardless of non-fundamental random failures in the exec module.
2. We may want to re-run the failed tests individually. If a test failed the 
first time but passed when re-run individually, we could mark it as a 'random 
failure' and decide to continue the whole set of unit tests.
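
As a sketch of option 2 (an assumption to verify: Maven Surefire 2.18+ with 
the JUnit 4 provider supports automatic re-runs), failing tests can be re-run 
and reported as flakes rather than failures:

mvn test -Dsurefire.rerunFailingTestsCount=2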





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [DISCUSS] Drill 1.12.0 release

2017-11-09 Thread Chunhui Shi
Hi Arina,


Could we consider including DRILL-5089 in 1.12.0? It is about lazily loading 
schemas for storage plugins. Could you or Paul take a look at the pull request 
for this JIRA, https://github.com/apache/drill/pull/1032? I think both of you 
are familiar with this part.


Thanks,


Chunhui


From: Arina Yelchiyeva 
Sent: Thursday, November 9, 2017 8:11:35 AM
To: dev@drill.apache.org
Subject: Re: [DISCUSS] Drill 1.12.0 release

Yes, they are already in master.

On Thu, Nov 9, 2017 at 6:05 PM, Charles Givre  wrote:

> We’re including the Networking functions in this release right?
>
> > On Nov 9, 2017, at 11:04, Arina Yelchiyeva 
> wrote:
> >
> > If changes will be done before cut off date, targeting mid November that
> it
> > will be possible to include this Jira.
> >
> > On Thu, Nov 9, 2017 at 6:03 PM, Charles Givre  wrote:
> >
> >> Hi Arina,
> >> Can we include DRILL-4091 Support for additional GIS operations in
> version
> >> 1.12?  In general the code looked pretty good.  There was a unit test
> >> missing which the developer submitted and some minor formatting issues
> >> which I’m still waiting on.
> >> Thanks,
> >> —C
> >>
> >>
> >>
> >>> On Nov 9, 2017, at 10:58, Arina Yelchiyeva  >
> >> wrote:
> >>>
> >>> Current status:
> >>>
> >>> Blocker:
> >>> DRILL-5917: Ban org.json:json library in Drill (developer - Vlad R.,
> code
> >>> reviewer - ?) - in progress.
> >>>
> >>> Targeted for 1.12 release:
> >>> DRILL-5337: OpenTSDB plugin (developer - Dmitriy & Vlad S., code
> >> reviewer -
> >>> Arina) - code review in final stage.
> >>> DRILL-4779: Kafka storage plugin support (developer - Anil & Kamesh,
> code
> >>> reviewer - Paul) - in review.
> >>> DRILL-5943: Avoid the strong check introduced by DRILL-5582 for PLAIN
> >>> mechanism (developer - Sorabh, code reviewer - Parth & Laurent) -
> waiting
> >>> for the code review.
> >>> DRILL-5771: Fix serDe errors for format plugins (developer - Arina,
> code
> >>> reviewer - Tim) - waiting for the code review.
> >>>
> >>> Kind regards
> >>> Arina
> >>>
> >>> On Fri, Oct 20, 2017 at 1:49 PM, Arina Yelchiyeva <
> >>> arina.yelchiy...@gmail.com> wrote:
> >>>
>  Current status:
> 
>  Targeted for 1.12 release:
>  DRILL-5832: Migrate OperatorFixture to use SystemOptionManager rather
> >> than
>  mock (developer - Paul, code reviewer - ?) - waiting for the code
> review
>  DRILL-5842: Refactor and simplify the fragment, operator contexts for
>  testing (developer - Paul, code reviewer - ?) - waiting for the code
>  review
>  DRILL-5834: Adding network functions (developer - Charles, code
> reviewer
>  - Arina) - waiting changes after code review
>  DRILL-5337: OpenTSDB plugin (developer - Dmitriy, code reviewer -
> >> Arina) - waiting
>  for the code review
>  DRILL-5772: Enable UTF-8 support in query string by default
> (developer -
>  Arina, code reviewer - Paul) - finalizing approach
>  DRILL-4779: Kafka storage plugin support (developer - Anil, code
> >> reviewer
>  - ?) - finishing implementation
> 
>  Under question:
>  DRILL-4286: Graceful shutdown of drillbit (developer - Jyothsna, code
>  reviewer - ?) - waiting for the status update from the developer
> 
>  Please feel free to suggest other items that are targeted for the 1.12 release.
>  There are many Jiras that have fix version marked as 1.12, it would be
> >> good
>  if developers revisit them and update fix version to the actual one.
>  Link to the dashboard - https://issues.apache.org/
>  jira/secure/RapidBoard.jspa?rapidView=185=
> DRILL=detail
> 
>  Kind regards
>  Arina
> 
> 
>  On Wed, Oct 11, 2017 at 2:42 AM, Parth Chandra 
> >> wrote:
> 
> > I'm waiting to merge the SSL  changes in. Waiting a couple of days
> >> more to
> > see if there are any more comments before I merge the changes in.
> >
> > On Mon, Oct 9, 2017 at 10:28 AM, Paul Rogers 
> wrote:
> >
> >> Hi Arina,
> >>
> >> In addition to my own PRs, there are several in the “active” queue
> >> that
> > we
> >> could get in if we can just push them over the line and clear the
> >> queue.
> >> The owners of the PRs should check if we are waiting on them to take
> > action.
> >>
> >> 977 DRILL-5849: Add freemarker lib to dependencyManagement to
> >> ensure
> >> prop…
> >> 976 DRILL-5797: Choose parquet reader from read columns
> >> 975 DRILL-5743: Handling column family and column scan for hbase
> >> 973 DRILL-5775: Select * query on a maprdb binary table fails
> >> 972 DRILL-5838: Fix MaprDB filter pushdown for the case of
> nested
> >> field (reg. of DRILL-4264)
> >> 950 Drill 5431: SSL Support
> >> 949 

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

2017-11-09 Thread chunhui-shi
GitHub user chunhui-shi opened a pull request:

https://github.com/apache/drill/pull/1032

DRILL-5089: Dynamically load schema of storage plugin only when neede…

…d for every query

For each query, loading all storage plugins and loading all workspaces 
under file system plugins is not needed.

This patch uses DynamicRootSchema as the root schema for Drill, which loads 
the corresponding storage plugin only when needed.

infoschema is changed to read full schema information and load second-level 
schemas accordingly.

For workspaces under the same FileSystem, there is no need to create a 
FileSystem object for each workspace.

Use the fs.access API to check permissions; it is available as of HDFS 2.6, 
except for the Windows + local file system case.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chunhui-shi/drill DRILL-5089-pull

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1032.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1032


commit a381677c59a7371733bae12ad4896b7cc927da5e
Author: chunhui-shi <c...@maprtech.com>
Date:   2017-11-03T00:06:25Z

DRILL-5089: Dynamically load schema of storage plugin only when needed for 
every query

For each query, loading all storage plugins and loading all workspaces 
under file system plugins is not needed.

This patch uses DynamicRootSchema as the root schema for Drill, which loads 
the corresponding storage plugin only when needed.

infoschema is changed to read full schema information and load second-level 
schemas accordingly.

For workspaces under the same FileSystem, there is no need to create a 
FileSystem object for each workspace.

Use the fs.access API to check permissions; it is available as of HDFS 2.6, 
except for the Windows + local file system case.




---


[jira] [Created] (DRILL-5925) Unit test TestValueVector.testFixedVectorReallocation TestValueVector.testVariableVectorReallocation always fail

2017-11-03 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-5925:
--

 Summary: Unit test TestValueVector.testFixedVectorReallocation 
TestValueVector.testVariableVectorReallocation always fail
 Key: DRILL-5925
 URL: https://issues.apache.org/jira/browse/DRILL-5925
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi


Tests in error: 
  TestValueVector.testFixedVectorReallocation »  Unexpected exception, 
expected<...
  TestValueVector.testVariableVectorReallocation »  Unexpected exception, 
expect...

Tests run: 2401, Failures: 0, Errors: 2, Skipped: 142

We are seeing these failures quite often. We should disable these two tests or 
modify the expected exception to be OutOfMemory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] drill issue #981: DRILL-5854: IllegalStateException when empty batch with va...

2017-10-10 Thread chunhui-shi
Github user chunhui-shi commented on the issue:

https://github.com/apache/drill/pull/981
  
+1


---


Hangout minutes for Oct/3 2017

2017-10-05 Thread Chunhui Shi
Attendees: Sorabh, Sindhu, Padma, Arina, Vitalii, Volodymyr, Vova, Pritesh, 
Aman, Vlad, Boaz


We discussed the 1.12.0 release timeline and might want to set the release 
date to early November. Arina offered to work as release manager for this 
release and will come up with the timeline proposal. Thanks Arina!


We also talked about some possible features to be included in 1.12.0, e.g. the 
Kafka storage plugin, and the progress and obstacles in that work.


No other topic was raised.


Thank you everyone,


Chunhui


Drill hangout is going to be on regular time at 10:00am Pacific Time today

2017-10-03 Thread Chunhui Shi
Please use this link to join the hangout:


 https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc


Thanks,


Chunhui


Re: Hex in Drill

2017-09-06 Thread Chunhui Shi
Have you tried VarBinary? For data type conversion, you may want to refer to 
this page: https://drill.apache.org/docs/data-type-conversion/. Try these two 
pairs of functions:

  * CONVERT_TO and CONVERT_FROM
  * STRING_BINARY and BINARY_STRING

Also you may want to try 'hex' or 'unhex', which come from Hive (written in 
the Hive libraries); Drill loads functions from Hive as well but may not test 
them thoroughly, so explore them to see if you can use them.
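
As a rough, untested sketch (treat the exact spellings and argument forms as 
assumptions to verify against the docs page above):

SELECT STRING_BINARY(BINARY_STRING('\xca\xfe')) FROM (VALUES(1));
SELECT CONVERT_FROM(CONVERT_TO(12345, 'INT'), 'INT') FROM (VALUES(1));

The first round-trips a hex-escaped string through binary and back to a 
printable form; the second round-trips an integer through its binary encoding.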

From: Charles Givre 
Sent: Wednesday, September 6, 2017 7:45:45 AM
To: dev
Subject: Hex in Drill

All,
I'm working on a format plugin in which the data contains a lot of
hexadecimal numbers.  I didn't see it in the docs, but does Drill have any
hex/dec/octal etc conversion functions?

Also, I realize this depends on the length of the integer (most are
unsigned 4 or 8 bit ints), but what would be the best way to store these
fields in Drill?  I'm currently using the BigIntHolder() for that.  Is that
the best way?

I hope this makes sense.
-- C


Re: First Drill Ticket

2017-08-08 Thread Chunhui Shi
What is your github account?

From: Timothy Farkas 
Sent: Tuesday, August 8, 2017 3:31:13 PM
To: dev@drill.apache.org
Subject: First Drill Ticket

Hello All,

I'm getting started on my first newbie drill ticket 
https://issues.apache.org/jira/browse/DRILL-4211 . If I should be looking at a 
different ticket to do instead please let me know! Also how can I assign the 
ticket to myself? I seem to be blocked from doing so even though I already have 
an account on issues.apache.org.

Thanks,
Tim


[GitHub] drill pull request #776: DRILL-5165: limitRel to return correct rows for lim...

2017-08-08 Thread chunhui-shi
Github user chunhui-shi closed the pull request at:

https://github.com/apache/drill/pull/776


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Which code compiler is better

2017-08-01 Thread Chunhui Shi

To correct my previous response:

In DRILL-4778, JDK was faster in compilation but generated slower code; Janino 
was slower in compilation but generated faster code. Your JIRA did not mention 
how the performance was when running the generated code. You may want to test 
this aspect as well.


From: weijie tong 
Sent: Sunday, July 30, 2017 6:10:12 AM
To: dev@drill.apache.org
Subject: Which code compiler is better

The compile process is long when we have 20 sum or avg expressions and the
compiler is Janino. But if we change the compiler to JDK, we get a lower
compile time. It seems the JDK compiler is better. If that's true, why
not make JDK the default one?


Re: Which code compiler is better

2017-07-31 Thread Chunhui Shi
A while ago my experiments (https://issues.apache.org/jira/browse/DRILL-4778) 
showed that JDK is more favorable in terms of the efficiency of the generated 
code; as for compilation time, at that time it was Janino that showed better 
performance. If in JDK 8 this is no longer the case, I don't see any reason we 
need Janino at all.
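
For anyone who wants to measure both on their own queries, the compiler choice 
is exposed as a session option (assuming the option name is unchanged in your 
build):

ALTER SESSION SET `exec.java_compiler` = 'JDK';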


From: weijie tong 
Sent: Monday, July 31, 2017 8:31:41 AM
To: dev@drill.apache.org
Subject: Re: Which code compiler is better

here is JIRA link :  https://issues.apache.org/jira/browse/DRILL-5696

Our production environment uses JDK 8 with transformed code generation (not
plain Java). Paul's experiments confirmed our production case.

The sql is like : "select
 (d.trade_cnt - d2.trade_cnt)/CAST(d2.trade_cnt AS DECIMAL(28,4)) as
trade_cnt_wr

,(d.trade_amt - d2.trade_amt)/CAST(d2.trade_amt AS DECIMAL(28,4)) as
trade_amt_wr
,(d.trade_shop_cnt - d2.trade_shop_cnt)/CAST(d2.trade_shop_cnt AS
DECIMAL(28,4)) as trade_shop_cnt_wr
,(d.online_shop_cnt - d2.online_shop_cnt)/CAST(d2.online_shop_cnt AS
DECIMAL(28,4)) as online_shop_cnt_wr
,CAST((d.trade_shop_rate - d2.trade_shop_rate) AS
DECIMAL(28,4))/CAST(d2.trade_shop_rate AS DECIMAL(28,4)) as
trade_shop_rate_wr
,(d.offline_item_cnt - d2.offline_item_cnt)/CAST(d2.offline_item_cnt
AS DECIMAL(28,4)) as offline_item_cnt_wr
,(d.business_amt_per_cnt -
d2.business_amt_per_cnt)/CAST(d2.business_amt_per_cnt AS
DECIMAL(28,4)) as business_amt_per_cnt_wr
,(d.order_amt_per_cnt -
d2.order_amt_per_cnt)/CAST(d2.order_amt_per_cnt AS DECIMAL(28,4)) as
order_amt_per_cnt_wr
,(d.new_shop_cnt - d2.new_shop_cnt)/CAST(d2.new_shop_cnt AS
DECIMAL(28,4)) as new_shop_cnt_wr
,(d.offline_shop_cnt - d2.offline_shop_cnt)/CAST(d2.offline_shop_cnt
AS DECIMAL(28,4)) as offline_shop_cnt_wr
,(d.item_use_cnt - d2.item_use_cnt)/CAST(d2.item_use_cnt AS
DECIMAL(28,4)) as item_use_cnt_wr
,(d.item_shop_rate - d2.item_shop_rate)/CAST(d2.item_shop_rate AS
DECIMAL(28,4)) as item_shop_rate_wr
,(d.discount_trd_cnt - d2.discount_trd_cnt)/CAST(d2.discount_trd_cnt
AS DECIMAL(28,4)) as discount_trd_cnt_wr
,(d.discount_shop_cnt -
d2.discount_shop_cnt)/CAST(d2.discount_shop_cnt AS DECIMAL(28,4)) as
discount_shop_cnt_wr
,(d.crm_shop_cnt - d2.crm_shop_cnt)/CAST(d2.crm_shop_cnt AS
DECIMAL(28,4)) as crm_shop_cnt_wr
,(d.crm_shop_rate - d2.crm_shop_rate)/CAST(d2.crm_shop_rate AS
DECIMAL(28,4)) as crm_shop_rate_wr
,(d.trade_cnt_voucher -
d2.trade_cnt_voucher)/CAST(d2.trade_cnt_voucher AS DECIMAL(28,4)) as
trade_cnt_voucher_wr
,(d.trade_amt_voucher -
d2.trade_amt_voucher)/CAST(d2.trade_amt_voucher AS DECIMAL(28,4)) as
trade_amt_voucher_wr
,(d.trade_cnt_per_shop -
d2.trade_cnt_per_shop)/CAST(d2.trade_cnt_per_shop AS DECIMAL(28,4)) as
trade_cnt_per_shop_wr

  from
x   "

The Project operator's setup time is high when using the Janino compiler.






On Mon, Jul 31, 2017 at 10:54 AM, Paul Rogers  wrote:

> A while back I did some experiments with JDK 8. The Java 8 compiler
> appears to be faster in general than Janino, if I remember correctly. (Not
> surprising: many people focus on optimizing the Java compiler, a smaller
> team maintains Janino...)
>
>
> Another experiment was to do "plain Java" code generation and compile
> rather than the compile & byte-code merge we do now. The compilation was
> faster as was code execution. The main reason for the speed-up is that
> "plan Java" does fewer steps: it just compiles and loads. However,
> "traditional" Drill code generation compiles, does a byte code copy and
> merge and then loads. Some "templates" are rather large. By using "plain
> Java" subclassing, we need not copy the base class code as we do when doing
> the byte-code merge.
>
>
> Also, because each generated class (with plain Java) uses the same base
> class code, the JVM can reuse its JIT optimizations; it does not have to
> rediscover them for each new generated class.
>
>
> We've not had time to do full testing, so we conservatively stick with
> what we know works. Still,  preliminary testing did show that "plain Java"
> is both faster and more convenient. You can experiment with this option.
> Find the commented out line like the following in each operator (record
> batch):
>
>
>   // Uncomment out this line to debug the generated code.
>
> //cg.saveCodeForDebugging(true);
>
> Uncomment the line. You'll get debuggable plain Java code and compilation.
> The generated source code goes into /tmp/drill/codegen by default.
>
> Or, if you want to try for performance, and avoid the step of writing code
> to disk, use the following instead:
>
> cg.preferPlainJava(true);
>
> More details appears in [1].
>
> Thanks,
>
> - Paul
>
> [1] https://github.com/paul-rogers/drill/wiki/Code-
> Generation-and-%22Short%2C-Fat%22-Queries
> 
> From: Aman Sinha 
> Sent: Sunday, July 30, 2017 9:16:09 AM
> To: dev@drill.apache.org
> Subject: Re: Which code 

[GitHub] drill pull request #875: DRILL-5671 Set secure ACLs (Access Control List) fo...

2017-07-21 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/875#discussion_r128875470
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/coord/zk/ZKACLProviderFactory.java
 ---
@@ -0,0 +1,44 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.coord.zk;
+
+import org.apache.curator.framework.api.ACLProvider;
+import org.apache.curator.framework.imps.DefaultACLProvider;
+import org.apache.drill.common.config.DrillConfig;
+import org.apache.drill.exec.ExecConstants;
+
+
+public class ZKACLProviderFactory {
+
+static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ZKACLProviderFactory.class);
+
+public static ACLProvider getACLProvider(DrillConfig config, String 
clusterId, String zkRoot) {
+if (config.getBoolean(ExecConstants.ZK_SECURE_ACL)) {
+if 
(config.getBoolean(ExecConstants.USER_AUTHENTICATION_ENABLED)){
+logger.trace("ZKACLProviderFactory: Using secure ZK ACL");
+return new ZKSecureACLProvider(clusterId, zkRoot);
+} else {
+logger.warn("ZKACLProviderFactory : Secure ZK ACL enabled 
but user authentication is disabled." +
--- End diff --

USER_AUTHENTICATION_ENABLED means that Drill clients log in to Drill as 
authenticated users, while enabling ZooKeeper ACLs means we are going to 
protect our znodes using ACLs, so other users cannot modify them. I think this 
is a valid scenario, meaning no warning is needed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Drill hangout today

2017-06-29 Thread Chunhui Shi
Hangout minutes:


Attendees: Jyothsna, Jinfeng, Pritesh, Boaz, Paul, Arina, John, Rob, etc

Arina, as the release manager for 1.11.0, asked some questions about the 
logistics of a new Drill release.

Jinfeng described the workflow, such as getting a PGP key and preparing 
release candidates.

Arina also wants the team home page to include the current PMC, PMC chair, and 
committer information, and filed a JIRA for it.


Then John did an interesting demo of how he set up a Mesos-managed Drill 
cluster and accessed the cluster through a Jupyter notebook

to run SQL queries in Drill and plot results directly in the notebook. 
Since the demo was so successful and Paul already had some conversation

with John about Mesos managing Drill clusters, there may be some follow-up 
work to come.


John, could you share the links on how you built the Mesos-managed cluster, 
and how you run the Jupyter notebook to make Spark and Drill

communicate with each other via Spark's DataFrame?  I believe the broader 
community will be very interested in the work too.


Best,


Chunhui




Re: Drill hangout today

2017-06-27 Thread Chunhui Shi
As usual, the links are:


Hangout link - 
https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc

Minutes will be posted at 
https://docs.google.com/document/d/1o2GvZUtJvKzN013JdM715ZBzhseT0VyZ9WgmLMeeUUk/edit?ts=5744c15c#heading=h.z8q6drmaybbj

Signup rotation for leading the hangouts is at 
https://docs.google.com/spreadsheets/d/1bEQKk16Kktb1XeZwKD8xCuhaO8FtNfF1Cr2rcTv1a6M/edit#gid=0


From: Chunhui Shi <c...@mapr.com>
Sent: Tuesday, June 27, 2017 9:53:37 AM
To: dev@drill.apache.org; u...@drill.apache.org
Subject: Drill hangout today

We are going to have a hangout today at 10:00am Pacific Time. Please feel free 
to raise topics of interest.

Chunhui


Re: [ANNOUNCE] New Committer: Paul Rogers

2017-05-19 Thread Chunhui Shi
Congrats Paul! Thank you for your contributions!


From: rahul challapalli 
Sent: Friday, May 19, 2017 9:20:52 AM
To: dev
Subject: Re: [ANNOUNCE] New Committer: Paul Rogers

Congratulations Paul. Well Deserved.

On Fri, May 19, 2017 at 8:46 AM, Gautam Parai  wrote:

> Congratulations Paul and thank you for your contributions!
>
>
> Gautam
>
> 
> From: Abhishek Girish 
> Sent: Friday, May 19, 2017 8:27:05 AM
> To: dev@drill.apache.org
> Subject: Re: [ANNOUNCE] New Committer: Paul Rogers
>
> Congrats Paul!
>
> On Fri, May 19, 2017 at 8:23 AM, Charles Givre  wrote:
>
> > Congrats Paul!!
> >
> > On Fri, May 19, 2017 at 11:22 AM, Aman Sinha 
> wrote:
> >
> > > The Project Management Committee (PMC) for Apache Drill has invited
> Paul
> > > Rogers to become a committer, and we are pleased to announce that he
> has
> > > accepted.
> > >
> > > Paul has a long list of contributions that have touched many aspects of
> > the
> > > product.
> > >
> > > Welcome Paul, and thank you for your contributions.  Keep up the good
> > work
> > > !
> > >
> > > - Aman
> > >
> > > (on behalf of the Apache Drill PMC)
> > >
> >
>


[GitHub] drill pull request #796: DRILL-5365: DrillFileSystem setConf in constructor....

2017-05-05 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/796#discussion_r115107663
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/DrillFileSystem.java
 ---
@@ -89,22 +89,36 @@ public DrillFileSystem(Configuration fsConf) throws 
IOException {
   }
 
   public DrillFileSystem(Configuration fsConf, OperatorStats 
operatorStats) throws IOException {
-this.underlyingFs = FileSystem.get(fsConf);
+this(fsConf, URI.create(fsConf.getRaw(FS_DEFAULT_NAME_KEY)), 
operatorStats);
+  }
+
+  public DrillFileSystem(Configuration fsConf, URI Uri, OperatorStats 
operatorStats) throws IOException {
--- End diff --

Yes, this can be removed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #797: DRILL-5286: No need to convert when the relNode and...

2017-03-30 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/797#discussion_r109025164
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/SubsetTransformer.java
 ---
@@ -45,15 +48,20 @@ public RelTraitSet newTraitSet(RelTrait... traits) {
 
   }
 
-  boolean go(T n, RelNode candidateSet) throws E {
+  public boolean go(T n, RelNode candidateSet) throws E {
 if ( !(candidateSet instanceof RelSubset) ) {
   return false;
 }
 
 boolean transform = false;
+Set<RelNode> transformedRels = Sets.newHashSet();
 for (RelNode rel : ((RelSubset)candidateSet).getRelList()) {
   if (isPhysical(rel)) {
 RelNode newRel = RelOptRule.convert(candidateSet, 
rel.getTraitSet().plus(Prel.DRILL_PHYSICAL));
+if(transformedRels.contains(newRel)) {
--- End diff --

Change to use newIdentityHashSet. 

And we cannot simply mark a node as transformed, since there could be 
many rules transforming the same node. Maintaining a map of applied rules in 
the RelNode might not be a good idea, since remembering which rules have been 
applied should not be the job of a RelNode (set). So what about maintaining 
such a map in the planner? Also not a good idea, since the same node might 
need to be converted into different sets. So I would leave this part as is.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #797: DRILL-5286: No need to convert when the relNode and...

2017-03-25 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/797#discussion_r108052498
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/SubsetTransformer.java
 ---
@@ -45,15 +48,20 @@ public RelTraitSet newTraitSet(RelTrait... traits) {
 
   }
 
-  boolean go(T n, RelNode candidateSet) throws E {
+  public boolean go(T n, RelNode candidateSet) throws E {
 if ( !(candidateSet instanceof RelSubset) ) {
   return false;
 }
 
 boolean transform = false;
+Set<RelNode> transformedRels = Sets.newHashSet();
 for (RelNode rel : ((RelSubset)candidateSet).getRelList()) {
   if (isPhysical(rel)) {
 RelNode newRel = RelOptRule.convert(candidateSet, 
rel.getTraitSet().plus(Prel.DRILL_PHYSICAL));
+if(transformedRels.contains(newRel)) {
--- End diff --

this " if(transformedRels.contains(newRel))" is to check if the newRel is 
the _same_ object we got before, it is not to check a 'equal' node. So no 
hashCode or equal function required.

And yes, I did step into this code and it is doing what we want: for the 
same node(n) to be converted in below convertChild() and the same equivalent 
set(newRel), we don't need to run the rule again, since we are going to get the 
same result, and that result will be added to the same set. So we are very sure 
we could skip convertChild() for this pair of input(n, newRel) since this has 
been called before.

In some cases it can reduce the same rule from running 16 times to 4 times 
and saved hundreds ms.
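
A minimal sketch of the identity-based check described above (assuming Guava's 
Sets utility; names follow the diff, and convertChild() is elided):

// Identity semantics: a newRel counts as seen only if it is the very same object.
Set<RelNode> transformedRels = Sets.newIdentityHashSet();
for (RelNode rel : ((RelSubset) candidateSet).getRelList()) {
  if (isPhysical(rel)) {
    RelNode newRel = RelOptRule.convert(candidateSet,
        rel.getTraitSet().plus(Prel.DRILL_PHYSICAL));
    if (!transformedRels.add(newRel)) {
      continue;  // this (n, newRel) pair was already converted; skip it
    }
    // ... convertChild(n, newRel) ...
  }
}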



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #798: DRILL-5297: when the generated plan mismatches, Pla...

2017-03-24 Thread chunhui-shi
GitHub user chunhui-shi opened a pull request:

https://github.com/apache/drill/pull/798

DRILL-5297: when the generated plan mismatches, PlanTest print the ge…

…nerated plan along with expected pattern

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chunhui-shi/drill DRILL-5297-pull

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/798.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #798


commit 5b92e13881c4c09e90f74073c436d374e7fd2254
Author: chunhui-shi <c...@maprtech.com>
Date:   2017-03-25T01:40:15Z

DRILL-5297: when the generated plan mismatches, PlanTest print the 
generated plan along with expected pattern




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #797: DRILL-5286: No need to convert when the relNode and...

2017-03-24 Thread chunhui-shi
GitHub user chunhui-shi opened a pull request:

https://github.com/apache/drill/pull/797

DRILL-5286: No need to convert when the relNode and target candidate …

…set are the same

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chunhui-shi/drill DRILL-5286

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/797.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #797


commit 9da01d602ed5e21a5525b00707127f0cd6211a2a
Author: chunhui-shi <c...@maprtech.com>
Date:   2017-03-25T01:09:04Z

DRILL-5286: No need to convert when the relNode and target candidate set 
are the same




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #796: DRILL-5365: DrillFileSystem setConf in constructor....

2017-03-24 Thread chunhui-shi
GitHub user chunhui-shi opened a pull request:

https://github.com/apache/drill/pull/796

DRILL-5365: DrillFileSystem setConf in constructor. DrillFileSystem c…

…ould be created based on provided URI.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chunhui-shi/drill DRILL_5365

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/796.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #796


commit e91755a4668346636dc1a33b6b0a86ad6a5654e2
Author: chunhui-shi <c...@maprtech.com>
Date:   2017-03-02T00:53:23Z

DRILL-5365: DrillFileSystem setConf in constructor. DrillFileSystem could 
be created based on provided URI.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #795: DRILL-5089: Get only partial schemas of relevant st...

2017-03-24 Thread chunhui-shi
GitHub user chunhui-shi opened a pull request:

https://github.com/apache/drill/pull/795

DRILL-5089: Get only partial schemas of relevant storage plugins inst…

…ead of all storages ahead.

1. For each query, rootSchema starts empty; schemas of a storage plugin are 
added only when needed -- when asked for a schemaPath.

2. Allow mock environments to provide dynamic schemas.

3. SchemaUtils.findSchema, used in many SQL handlers, is now handled by 
SqlConverter.catalog to allow expanding schemas dynamically.

4. Temp table resolution also cannot take a schema tree and assume it is 
complete.

NOTE: pom.xml in this pull request points to the temporary Calcite version 
'r20-test'; this should change to 'r20' once this pull request is ready to 
commit.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chunhui-shi/drill DRILL-5089

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/795.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #795


commit 4814e37eefd16ee7aa1ba6d21d66211e41922dbf
Author: chunhui-shi <c...@maprtech.com>
Date:   2017-01-06T09:13:02Z

DRILL-5089: Get only partial schemas of relevant storage plugins instead of 
all storages ahead.

1. For each query, rootSchema starts empty; schemas of a storage plugin are 
added only when needed -- when asked for a schemaPath.

2. Allow mock environments to provide dynamic schemas.

3. SchemaUtils.findSchema, used in many SQL handlers, is now handled by 
SqlConverter.catalog to allow expanding schemas dynamically.

4. Temp table resolution also cannot take a schema tree and assume it is 
complete.

NOTE: pom.xml in this pull request points to the temporary Calcite version 
'r20-test'; this should change to 'r20' once this pull request is ready to 
commit.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (DRILL-5383) Several impersonation unit tests fail in unit test

2017-03-24 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-5383:
--

 Summary: Several impersonation unit tests fail in unit test
 Key: DRILL-5383
 URL: https://issues.apache.org/jira/browse/DRILL-5383
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Sudheesh Katkam
Priority: Critical


Run several round unit tests and got these errors:
Failed tests: 
  TestInboundImpersonationPrivileges.twoTargetGroups:135->run:62 proxyName: 
user3_2 targetName: user4_2 expected: but was:
  TestInboundImpersonationPrivileges.oneTargetGroup:118->run:62 proxyName: 
user5_1 targetName: user4_2 expected: but was:
  TestInboundImpersonationPrivileges.twoTargetUsers:126->run:62 proxyName: 
user5_2 targetName: user0_2 expected: but was:

Tests in error: 
  
TestDrillbitResilience.memoryLeaksWhenCancelled:890->assertCancelledWithoutException:532
 » 
  TestInboundImpersonation.selectChainedView:136 »  
org.apache.drill.common.exce...
  
TestImpersonationQueries.org.apache.drill.exec.impersonation.TestImpersonationQueries
 » UserRemote

Note that if I run the unit tests in my setup, which has a different 
settings.xml for Maven pointing to our internal repository, I often (maybe 1 
out of 2 runs) get a different error, at 
TestOptionsAuthEnabled#updateSysOptAsUserInAdminGroup.

Since the errors are quite consistent when the unit tests are built on 
different nodes, I guess that when we introduced more jars (kerby?) for unit 
tests we may not have done enough exclusion, so the conflicts differ between 
builds.

We should be able to find out why they failed by remote debugging into these 
particular tests.

If we cannot address this issue in one or two days, this JIRA should be used 
to disable these tests for now. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


3/21 Hangout starts now

2017-03-21 Thread Chunhui Shi
Hi,

I don't have a topic for now. If you have anything you want to raise for 
discussion, please reply to this email or join the hangout at 
https://hangouts.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc


Thanks,

Chunhui


Re: Drill date & time types encoding

2017-03-16 Thread Chunhui Shi
I think they are using the same timezone data from IANA.

For Java, the timezone data can be found under jre/lib/zi, and Oracle has a 
timezone update tool too.
For enterprise software vendors, timezone updates actually are a big thing.

From: Boaz Ben-Zvi 
Sent: Thursday, March 16, 2017 6:08:01 PM
To: dev@drill.apache.org
Subject: Re: Drill date & time types encoding

  Timezone calculations are not simple ( e.g.,  “2017-03-11 23:30:00-PST” + 
INTERVAL ‘3’ HOURS  --> need to know about daylight savings time, etc.)



  Linux does have a timezone. The actual implementation is quite complex – it 
keeps an elaborate “database” under /usr/share/zoneinfo , (which needs to be 
updated periodically, e.g. by running “yum update tzdata”).



Is Java’s TZ support 
(https://docs.oracle.com/javase/8/docs/api/java/util/TimeZone.html) equivalent 
to Linux ?



-   Boaz



On 3/16/17, 4:48 PM, "Paul Rogers"  wrote:



Thanks all for the explanations!



Did a bit of poking around. See DRILL-5360. For the Timestamp type:



* Literals are claimed to be in UTC (have not yet tested)

* Value vectors store Timestamps in server local time

* Drill clients get the Timestamp in server local time

* JDBC clients try to convert server local time to UTC, but use the client 
timezone to do so.



The result is that clients must know the server timezone, but Drill does 
not provide this info. Drill clients must convert from server timezone to UTC 
to get the UTC value of a timestamp.



JDBC clients must convert from the “UTC” given from JDBC to true UTC by 
subtracting the difference between server and client timezone offsets.
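
A sketch of that adjustment (assuming the client somehow knows the server 
zone; offsets are evaluated at the value itself, which is only approximate 
near DST transitions):

    import java.util.TimeZone;

    // Undo the client-zone "UTC" conversion JDBC applied, then remove the
    // server-zone offset the stored value actually carries.
    static long toTrueUtcMillis(long jdbcMillis, TimeZone serverZone, TimeZone clientZone) {
      return jdbcMillis + clientZone.getOffset(jdbcMillis) - serverZone.getOffset(jdbcMillis);
    }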



I suspect, as Jinfeng points out, that much of the confusion comes from the 
conflicting use of the term “timestamp” in the SQL 2011 standard [1] and 
standard Linux/Java practice.



In Linux and Java, a “timestamp” is ms since the Unix epoch, UTC. (That is, 
the UTC timestamp is implied, so all machines anywhere agree on what a time 
means.)



The SQL TIMESTAMP is what most databases call a DATETIME: a combination of 
a date and time that are “free floating”: there is no implied time zone. “3 PM” 
is just that, it does not imply “3 PM in Paris.”



SQL provides a TIMESTAMP WITH TIME ZONE, but that also differs from Linux 
practice: it is not a UTC time but rather a DATETIME with an associated timezone.



The Drill Timestamp is neither of these. It is like a TIMESTAMP WITH TIMEZONE 
where the timezone is the server local timezone. But, Drill does not specify 
that timezone, so the client “just has to know.” Unlike the Linux timestamp, 
the client & server don’t agree ahead of time by convention; instead every 
server can have its own Timestamp timezone and the client must figure out the 
corresponding UTC or client local time.
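
A small example of why the zone matters, using only standard java.time: the 
same instant renders as different local date-times depending on the zone used 
to interpret it.

    import java.time.Instant;
    import java.time.LocalDateTime;
    import java.time.ZoneId;

    Instant i = Instant.ofEpochMilli(1489708800000L);  // 2017-03-17T00:00:00Z
    System.out.println(LocalDateTime.ofInstant(i, ZoneId.of("UTC")));                  // 2017-03-17T00:00
    System.out.println(LocalDateTime.ofInstant(i, ZoneId.of("America/Los_Angeles")));  // 2017-03-16T17:00

Since a value vector stores only the local rendering, a client that does not 
know the server's zone cannot recover the instant.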



What we have can work with clever adjustment programming. But it would be 
better (for wider adoption) to provide a cleaner, more deterministic API.



Unfortunately, we probably can’t fix the existing Timestamp as there is 
probably already code that tries (like JDBC) to work around the current 
behavior.



Instead, we should add the SQL TIMESTAMP WITH TIMEZONE. Or add a 
non-standard “LinuxTimezone” (or “TimezoneUTC”) that stores times in an 
agreed-upon UTC format.



Until then, tread carefully.



- Paul



[1] 
http://standards.iso.org/ittf/PubliclyAvailableStandards/c053681_ISO_IEC_9075-1_2011.zip



> On Mar 16, 2017, at 4:25 PM, Jinfeng Ni  wrote:

>

> My understanding is TIME/TIMESTAMP in Drill is TIME/TIMESTAMP without

> timezone. TimeStampTZ is for TIMESTAMP with timezone, which Drill

> probably does not fully support.

>

> SQL standards has  DATE, TIME WITHOUT TIME ZONE, TIMESTAMP WITHOUT

> TIME ZONE, TIME WITH TIME ZONE, or TIMESTAMP WITH TIME ZONE.

> Time/Timestamp without t/z should be interpreted as local time.

>

> Here is some descriptions in SQL 2011 : Sec 4.6.2.

>

> "

>

> A datetime data type that specifies WITH TIME ZONE is a data type that

> is datetime with time zone, while a datetime data type that specifies

> WITHOUT TIME ZONE is a data type that is datetime without time zone.

>

> The surface of the earth is divided into zones, called time zones, in

> which every correct clock tells the same time, known as local time.

> Local time is equal to UTC (Coordinated Universal Time) plus the time

> zone dis- placement, which is an interval value that ranges between

> INTERVAL '–14:00' HOUR TO MINUTE and INTERVAL '+14:00' HOUR TO MINUTE.

>

> A datetime value, of data type TIME WITHOUT TIME ZONE or TIMESTAMP

> WITHOUT TIME ZONE, may represent a local time, whereas a datetime

> value of data type TIME WITH TIME ZONE or TIMESTAMP WITH TIME ZONE

> represents UTC.

>

   

[GitHub] drill pull request #784: DRILL-5355: Misc. code cleanup

2017-03-14 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/784#discussion_r106044432
  
--- Diff: 
logical/src/main/java/org/apache/drill/common/expression/FieldReference.java ---
@@ -1,4 +1,4 @@
-/**
--- End diff --

I have seen you change '/**' to '/*' in several pull requests. What is the 
reason for that? I have seen other Apache projects, e.g. Hive, use the previous 
style.
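
For reference, the difference the two forms carry in Java, which is the usual 
motivation for such a change (the pull request itself does not state it):

    /** A comment opened with two asterisks is a Javadoc comment: tools attach
        it to the declaration that follows, so a license header in this form
        becomes the Javadoc of the first class in the file. */

    /*  A single-asterisk block comment is plain text to the tools, which is
        arguably the right form for a license header. */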




[GitHub] drill pull request #784: DRILL-5355: Misc. code cleanup

2017-03-14 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/784#discussion_r106044402
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/PrelVisualizerVisitor.java
 ---
@@ -0,0 +1,217 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.physical.visitor;
+
+import org.apache.drill.exec.planner.physical.ExchangePrel;
+import org.apache.drill.exec.planner.physical.JoinPrel;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.ProjectPrel;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.planner.physical.ScreenPrel;
+import org.apache.drill.exec.planner.physical.WriterPrel;
+
+/**
+ * Debug-time class that prints a PRel tree to the console for
+ * inspection. Insert this into code during development to see
+ * the state of the tree at various points of interest during
+ * the planning process.
+ */
+
+public class PrelVisualizerVisitor
--- End diff --

Would like to see a unit test for this class.




[jira] [Created] (DRILL-5353) Merge "Project on Project" generated in physical plan stage

2017-03-13 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-5353:
--

 Summary: Merge "Project on Project" generated in physical plan 
stage
 Key: DRILL-5353
 URL: https://issues.apache.org/jira/browse/DRILL-5353
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Chunhui Shi


There is a possibility that in the physical planning stage we will get a 
project-on-project plan, but the ProjectMergeRule (DrillMergeProjectRule) 
applies only during logical planning. We need to apply the rule in the 
physical planning stage as well.

And even after the planning stage, the JoinPrelRenameVisitor could also inject 
an extra Project, which can be merged with the Project underneath it (if there 
is one).





[GitHub] drill issue #776: DRILL-5165: limitRel to return correct rows for limit all ...

2017-03-08 Thread chunhui-shi
Github user chunhui-shi commented on the issue:

https://github.com/apache/drill/pull/776
  
It failed without the fix.


From: Jinfeng Ni <notificati...@github.com>
Sent: Wednesday, March 8, 2017 10:56:06 AM
To: apache/drill
    Cc: Chunhui Shi; Author
Subject: Re: [apache/drill] DRILL-5165: limitRel to return correct rows for 
limit all case (#776)


@jinfengni commented on this pull request.



In 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/limit/TestLimitWithExchanges.java<https://github.com/apache/drill/pull/776#discussion_r104993761>:

> @@ -125,6 +125,15 @@ public void testLimitImpactExchange() throws 
Exception {
 }
   }

+  @Test
+  public void TestLimitAllOnParquet() throws Exception {


Have you tried this new testcase without the change above? Will it 
succeed or fail?

—
You are receiving this because you authored the thread.
Reply to this email directly, view it on 
GitHub<https://github.com/apache/drill/pull/776#pullrequestreview-25853026>, or 
mute the 
thread<https://github.com/notifications/unsubscribe-auth/AJa9A-HOKR8SCsdrPSb_457T4kbuelsBks5rjvnGgaJpZM4MWjjd>.





[GitHub] drill pull request #776: DRILL-5165: limitRel to return correct rows for lim...

2017-03-08 Thread chunhui-shi
GitHub user chunhui-shi opened a pull request:

https://github.com/apache/drill/pull/776

DRILL-5165: limitRel to return correct rows for limit all case



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chunhui-shi/drill DRILL-5165

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/776.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #776


commit cd10342fbcf4b3779b006515fb940137d9035157
Author: chunhui-shi <c...@maprtech.com>
Date:   2017-03-08T07:39:32Z

DRILL-5165: limitRel to return correct rows for limit all case






[jira] [Created] (DRILL-5328) Trim down physical plan size - replace StoragePluginConfig with storage name

2017-03-07 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-5328:
--

 Summary: Trim down physical plan size - replace 
StoragePluginConfig with storage name
 Key: DRILL-5328
 URL: https://issues.apache.org/jira/browse/DRILL-5328
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Chunhui Shi


For a physical plan, we now pass the StoragePluginConfig as part of the plan; 
the destination then uses the config to fetch the storage plugin from the 
StoragePluginRegistry. However, we can also fetch a storage plugin by its 
name, which is identical across all Drillbits.

In the example of a simple 150-line physical plan shown below, the storage 
plugin config takes 60 lines. In a typical large system, a FileSystem 
StoragePluginConfig could be >500 lines, so this improvement should save the 
cost of passing a larger physical plan among nodes.
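
Before the example, a minimal sketch of the name-based lookup on the receiving 
side (the registry API names here are assumptions based on Drill 1.x):

    import org.apache.drill.exec.server.DrillbitContext;
    import org.apache.drill.exec.store.StoragePlugin;
    import org.apache.drill.exec.store.StoragePluginRegistry;

    // Resolve the plugin by its name, which is the same on every Drillbit,
    // instead of deserializing the full StoragePluginConfig from the plan.
    static StoragePlugin resolveByName(DrillbitContext context, String name) throws Exception {
      StoragePluginRegistry registry = context.getStorage();
      return registry.getPlugin(name);
    }

The plan would then carry only the plugin name in place of the 60-line config 
block visible below.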

0: jdbc:drill:zk=10.10.88.126:5181> explain plan for select * from 
dfs.tmp.employee1 where last_name='Blumberg';
+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  Project(*=[$0])
00-02Project(T1¦¦*=[$0])
00-03  SelectionVectorRemover
00-04Filter(condition=[=($1, 'Blumberg')])
00-05  Project(T1¦¦*=[$0], last_name=[$1])
00-06Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=/tmp/employee1/0_0_0.parquet]], 
selectionRoot=/tmp/employee1, numFiles=1, usedMetadataFile=true, 
cacheFileRoot=/tmp/employee1, columns=[`*`]]])
 | {
  "head" : {
"version" : 1,
"generator" : {
  "type" : "ExplainHandler",
  "info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"options" : [ ],
"queue" : 0,
"resultMode" : "EXEC"
  },
  "graph" : [ {
"pop" : "parquet-scan",
"@id" : 6,
"userName" : "root",
"entries" : [ {
  "path" : "/tmp/employee1/0_0_0.parquet"
} ],
"storage" : {
  "type" : "file",
  "enabled" : true,
  "connection" : "maprfs:///",
  "config" : null,
  "workspaces" : {
"root" : {
  "location" : "/",
  "writable" : false,
  "defaultInputFormat" : null
},
"tmp" : {
  "location" : "/tmp",
  "writable" : true,
  "defaultInputFormat" : null
},
"shi" : {
  "location" : "/user/shi",
  "writable" : true,
  "defaultInputFormat" : null
},
"dir700" : {
  "location" : "/user/shi/dir700",
  "writable" : true,
  "defaultInputFormat" : null
},
"dir775" : {
  "location" : "/user/shi/dir775",
  "writable" : true,
  "defaultInputFormat" : null
},
"xyz" : {
  "location" : "/user/xyz",
  "writable" : true,
  "defaultInputFormat" : null
}
  },
  "formats" : {
"psv" : {
  "type" : "text",
  "extensions" : [ "tbl" ],
  "delimiter" : "|"
},
"csv" : {
  "type" : "text",
  "extensions" : [ "csv" ],
  "delimiter" : ","
},
"tsv" : {
  "type" : "text",
  "extensions" : [ "tsv" ],
  "delimiter" : "\t"
},
"parquet" : {
  "type" : "parquet"
},
"json" : {
  "type" : "json",
  "extensions" : [ "json" ]
},
"maprdb" : {
  "type" : "maprdb"
}
  }
},
"format" : {
  "type" : "parquet"
},
"columns" : [ "`*`" ],
"selectionRoot" : "/tmp/employee1",
"filter" : "true",
"fileSet" : [ "/tmp/employee1/0_0_0.parquet" ],
"files" : [ "/tmp/employee1/0_0_0.parquet" ],
"cost" : 1155.0
  }, {
"pop" : "project",
"@id" : 5,
"exprs" : [ {
  "ref" : "`T1¦¦*`",
  "expr&qu

[jira] [Created] (DRILL-5297) Print the plan text when plan pattern check fails in unit tests

2017-02-24 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-5297:
--

 Summary: Print the plan text when plan pattern check fails in unit 
tests 
 Key: DRILL-5297
 URL: https://issues.apache.org/jira/browse/DRILL-5297
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Chunhui Shi


If a unit test does not generate the expected plan, we print only the 
expected pattern, like this:

Did not find expected pattern in plan: Scan.*FindLimit0Visitor"

We should also print the plan here for debugging purposes.
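
A minimal sketch of what the check could look like (the helper shape is 
hypothetical):

    import java.util.regex.Pattern;
    import static org.junit.Assert.fail;

    // Fail with both the missing pattern and the actual plan text, so the
    // test log alone is enough to debug a plan-pattern mismatch.
    static void assertPlanMatches(String plan, String expectedPattern) {
      if (!Pattern.compile(expectedPattern).matcher(plan).find()) {
        fail(String.format("Did not find expected pattern in plan: %s%nPlan was:%n%s",
            expectedPattern, plan));
      }
    }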






[GitHub] drill pull request #750: DRILL-5273: CompliantTextReader excessive memory us...

2017-02-21 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/750#discussion_r102377194
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/CompliantTextRecordReader.java
 ---
@@ -118,12 +118,21 @@ public boolean apply(@Nullable SchemaPath path) {
* @param outputMutator  Used to create the schema in the output record 
batch
* @throws ExecutionSetupException
*/
+  @SuppressWarnings("resource")
   @Override
   public void setup(OperatorContext context, OutputMutator outputMutator) 
throws ExecutionSetupException {
 
 oContext = context;
-readBuffer = context.getManagedBuffer(READ_BUFFER);
-whitespaceBuffer = context.getManagedBuffer(WHITE_SPACE_BUFFER);
+// Note: DO NOT use managed buffers here. They remain in existence
+// until the fragment is shut down. The buffers here are large.
--- End diff --

I think the reason you chose to use context.getAllocator() was that you don't 
want to fragment the managed buffer? Otherwise you might just call 
readBuffer.close()? Was there any problem with the managed buffer's release? 
Just curious about the "DO NOT use managed buffers here" part. Besides that, +1.
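
For reference, the ownership difference under discussion, as a sketch (the 
exact allocator calls are assumed from the diff):

    import io.netty.buffer.DrillBuf;
    import org.apache.drill.exec.memory.BufferAllocator;

    // A buffer taken directly from the allocator must be released explicitly;
    // a managed buffer is released only when the fragment shuts down.
    class BufferOwner implements AutoCloseable {
      private final DrillBuf readBuffer;

      BufferOwner(BufferAllocator allocator, int size) {
        readBuffer = allocator.buffer(size);  // direct allocation, not fragment-managed
      }

      @Override
      public void close() {
        readBuffer.release();  // leaks if close() is never reached
      }
    }

The trade-off is that the operator must reliably reach close(), whereas a 
managed buffer cannot leak but lives for the whole fragment.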




[jira] [Created] (DRILL-5286) When rel and target candidate set is the same, planner should not need to do convert for the relNode since it must have been done

2017-02-21 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-5286:
--

 Summary: When rel and target candidate set is the same, planner 
should not need to do convert for the relNode since it must have been done
 Key: DRILL-5286
 URL: https://issues.apache.org/jira/browse/DRILL-5286
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Chunhui Shi








[GitHub] drill issue #594: DRILL-4842: SELECT * on JSON data results in NumberFormatE...

2017-02-06 Thread chunhui-shi
Github user chunhui-shi commented on the issue:

https://github.com/apache/drill/pull/594
  
+1. LGTM. Need to address the merge conflict before this is ready to commit.




[GitHub] drill pull request #741: DRILL-5196: init MongoDB cluster when run a single ...

2017-02-03 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/741#discussion_r99455997
  
--- Diff: 
contrib/storage-mongo/src/test/java/org/apache/drill/exec/store/mongo/MongoTestSuit.java
 ---
@@ -204,16 +209,20 @@ private static void cleanup() {
   @BeforeClass
   public static void initMongo() throws Exception {
 synchronized (MongoTestSuit.class) {
-  if (distMode) {
-logger.info("Executing tests in distributed mode");
-DistributedMode.setup();
-  } else {
-logger.info("Executing tests in single mode");
-SingleMode.setup();
+  if (initCount.get() == 0) {
--- End diff --

Then it would increase the count even when the init has not completed. If init 
fails in the middle, another test has no chance to retry.




[GitHub] drill pull request #741: DRILL-5196: init MongoDB cluster when run a single ...

2017-02-03 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/741#discussion_r99453293
  
--- Diff: 
contrib/storage-mongo/src/test/java/org/apache/drill/exec/store/mongo/MongoTestSuit.java
 ---
@@ -234,15 +243,25 @@ private static void createDbAndCollections(String 
dbName,
 
   @AfterClass
   public static void tearDownCluster() throws Exception {
-if (mongoClient != null) {
-  mongoClient.dropDatabase(EMPLOYEE_DB);
-  mongoClient.close();
-}
 synchronized (MongoTestSuit.class) {
-  if (distMode) {
-DistributedMode.cleanup();
-  } else {
-SingleMode.cleanup();
+  if (initCount.decrementAndGet() == 0) {
+try {
+  if (mongoClient != null) {
--- End diff --

Then we would end up not cleaning up the cluster if mongoClient was for some 
reason set to null? I think it is better to leave it as is.




[GitHub] drill pull request #741: DRILL-5196: init MongoDB cluster when run a single ...

2017-02-03 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/741#discussion_r99452939
  
--- Diff: 
contrib/storage-mongo/src/test/java/org/apache/drill/exec/store/mongo/MongoTestSuit.java
 ---
@@ -234,15 +243,25 @@ private static void createDbAndCollections(String 
dbName,
 
   @AfterClass
   public static void tearDownCluster() throws Exception {
-if (mongoClient != null) {
-  mongoClient.dropDatabase(EMPLOYEE_DB);
-  mongoClient.close();
-}
 synchronized (MongoTestSuit.class) {
-  if (distMode) {
-DistributedMode.cleanup();
-  } else {
-SingleMode.cleanup();
+  if (initCount.decrementAndGet() == 0) {
--- End diff --

Without synchronization, there is still a chance that two threads reach this 
place: while thread 0 gets here and is in the middle of tearing down the 
cluster, thread 1 increases initCount to 1, somehow reports an error, then 
comes here to decrease initCount and continues... and we don't know what will 
happen.
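
A sketch of the guarded pattern being argued for here (the setup/teardown 
helpers are hypothetical):

    import java.util.concurrent.atomic.AtomicInteger;

    // Increment-plus-init and decrement-plus-teardown both run under the same
    // lock, so no test can start while another is mid-teardown, and a failed
    // init is never counted.
    final class ClusterRef {
      private static final AtomicInteger initCount = new AtomicInteger();

      static void acquire() throws Exception {
        synchronized (ClusterRef.class) {
          if (initCount.get() == 0) {
            setupCluster();               // hypothetical helper
          }
          initCount.incrementAndGet();    // counted only after a successful init
        }
      }

      static void release() throws Exception {
        synchronized (ClusterRef.class) {
          if (initCount.decrementAndGet() == 0) {
            teardownCluster();            // hypothetical helper
          }
        }
      }

      private static void setupCluster() {}
      private static void teardownCluster() {}
    }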




[GitHub] drill pull request #702: DRILL-5088: set default codec for toJson

2017-02-03 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/702#discussion_r99448154
  
--- Diff: 
contrib/storage-mongo/src/test/java/org/apache/drill/exec/store/mongo/TestTableGenerator.java
 ---
@@ -58,7 +59,16 @@ public static void generateTable(String dbName, String 
collection,
 .jsonArray(jsonArray).importFile(jsonFile).build();
 MongoImportExecutable importExecutable = MongoImportStarter
 .getDefaultInstance().prepare(mongoImportConfig);
-importExecutable.start();
+MongoImportProcess importProcess = importExecutable.start();
+
+try {
+  while (importProcess.isProcessRunning()) {
+Thread.sleep(1000);
+  }
+}catch (Exception ex) {
+  logger.error("Import mongoDb failed", ex);
--- End diff --

Paul, since there are two JIRAs here, I filed a new pull request, 
https://github.com/apache/drill/pull/741, which addresses your comments. Could 
you take a look at that pull request? Thanks.




[GitHub] drill pull request #741: DRILL-5196: init MongoDB cluster when run a single ...

2017-02-03 Thread chunhui-shi
GitHub user chunhui-shi opened a pull request:

https://github.com/apache/drill/pull/741

DRILL-5196: init MongoDB cluster when run a single test case directly…

… through command line or IDE.

Other fixes include:
Sync mongo-java-driver versions to newer 3.2.0
update flapdoodle package to latest accordingly

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chunhui-shi/drill DRILL-5196

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/741.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #741


commit 6068d7a21a0cf45ba9f5f3c76ad48dbd94c6e204
Author: chunhui-shi <c...@maprtech.com>
Date:   2017-01-14T01:20:46Z

DRILL-5196: init MongoDB cluster when run a single test case directly 
through command line or IDE.
Other fixes include:
Sync mongo-java-driver versions to newer 3.2.0
update flapdoodle package to latest accordingly






[GitHub] drill pull request #722: DRILL-5196: init MongoDB cluster when run a single ...

2017-02-03 Thread chunhui-shi
Github user chunhui-shi closed the pull request at:

https://github.com/apache/drill/pull/722




Re: Storage Plugin for accessing Hive ORC Table from Drill

2017-01-21 Thread Chunhui Shi
I guess you are using Hive 2.0 as the metastore server while Drill ships only 
the 1.2 libraries.

In Hive 2.0 and above, this delta format can have more than one '_' as a 
separator, while in 1.2 it has only one '_'.

I think Drill should eventually update to use Hive's 2.0/2.1 libraries.
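
As an illustration only (this is not Drill's actual parser), a reader written 
for the Hive 1.2 layout trips over the extra separator like this:

    // Hive 1.2 layout: delta_<minTxn>_<maxTxn>
    // Hive 2.x layout: delta_<minTxn>_<maxTxn>_<stmtId>
    static long[] parseDelta12(String dirName) {
      String rest = dirName.substring("delta_".length());
      int split = rest.indexOf('_');
      long min = Long.parseLong(rest.substring(0, split));
      // For "delta_0000004_0000004_0000" the tail is "0000004_0000", and
      // parseLong throws a NumberFormatException much like the one below.
      long max = Long.parseLong(rest.substring(split + 1));
      return new long[] { min, max };
    }

parseDelta12("delta_0000004_0000004") parses fine, while the Hive 2.x form 
throws, which matches the failure mode in the stack trace below.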


From: Anup Tiwari 
Sent: Friday, January 20, 2017 10:07:50 PM
To: u...@drill.apache.org; dev@drill.apache.org
Subject: Re: Storage Plugin for accessing Hive ORC Table from Drill

@Andries, We are using Hive 2.1.1 with Drill 1.9.0.

@Zelaine, regarding "Could this be a problem in your Hive metastore?" --> As I 
mentioned earlier, I am able to read Hive Parquet tables in Drill through the 
Hive storage plugin. So can you tell me a bit more about which type of 
configuration I am missing in the metastore?

Regards,
*Anup Tiwari*

On Sat, Jan 21, 2017 at 4:56 AM, Zelaine Fong  wrote:

> The stack trace shows the following:
>
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException:
> java.io.IOException: Failed to get numRows from HiveTable
>
> The Drill optimizer is trying to read rowcount information from Hive.
> Could this be a problem in your Hive metastore?
>
> Has anyone else seen this before?
>
> -- Zelaine
>
> On 1/20/17, 7:35 AM, "Andries Engelbrecht"  wrote:
>
> What version of Hive are you using?
>
>
> --Andries
>
> 
> From: Anup Tiwari 
> Sent: Friday, January 20, 2017 3:00:43 AM
> To: u...@drill.apache.org; dev@drill.apache.org
> Subject: Re: Storage Plugin for accessing Hive ORC Table from Drill
>
> Hi,
>
> Please find below Create Table Statement and subsequent Drill Error :-
>
> *Table Structure :*
>
> CREATE TABLE `logindetails_all`(
>   `sid` char(40),
>   `channel_id` tinyint,
>   `c_t` bigint,
>   `l_t` bigint)
> PARTITIONED BY (
>   `login_date` char(10))
> CLUSTERED BY (
>   channel_id)
> INTO 9 BUCKETS
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> LOCATION
>   'hdfs://hostname1:9000/usr/hive/warehouse/logindetails_all'
> TBLPROPERTIES (
>   'compactorthreshold.hive.compactor.delta.num.threshold'='6',
>   'compactorthreshold.hive.compactor.delta.pct.threshold'='0.5',
>   'transactional'='true',
>   'transient_lastDdlTime'='1484313383');
> ;
>
> *Drill Error :*
>
> *Query* : select * from hive.logindetails_all limit 1;
>
> *Error :*
> 2017-01-20 16:21:12,625 [277e145e-c6bc-3372-01d0-6c5b75b92d73:foreman]
> INFO  o.a.drill.exec.work.foreman.Foreman - Query text for query id
> 277e145e-c6bc-3372-01d0-6c5b75b92d73: select * from
> hive.logindetails_all
> limit 1
> 2017-01-20 16:21:12,831 [277e145e-c6bc-3372-01d0-6c5b75b92d73:foreman]
> ERROR o.a.drill.exec.work.foreman.Foreman - SYSTEM ERROR:
> NumberFormatException: For input string: "004_"
>
>
> [Error Id: 53fa92e1-477e-45d2-b6f7-6eab9ef1da35 on
> prod-hadoop-101.bom-prod.aws.games24x7.com:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR:
> NumberFormatException: For input string: "004_"
>
>
> [Error Id: 53fa92e1-477e-45d2-b6f7-6eab9ef1da35 on
> prod-hadoop-101.bom-prod.aws.games24x7.com:31010]
> at
> org.apache.drill.common.exceptions.UserException$
> Builder.build(UserException.java:543)
> ~[drill-common-1.9.0.jar:1.9.0]
> at
> org.apache.drill.exec.work.foreman.Foreman$ForemanResult.
> close(Foreman.java:825)
> [drill-java-exec-1.9.0.jar:1.9.0]
> at
> org.apache.drill.exec.work.foreman.Foreman.moveToState(
> Foreman.java:935)
> [drill-java-exec-1.9.0.jar:1.9.0]
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.
> java:281)
> [drill-java-exec-1.9.0.jar:1.9.0]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
> [na:1.8.0_72]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
> [na:1.8.0_72]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
> Caused by: org.apache.drill.exec.work.foreman.ForemanException:
> Unexpected
> exception during fragment initialization: Internal error: Error while
> applying rule DrillPushProjIntoScan, args
> [rel#4220197:LogicalProject.NONE.ANY([]).[](input=rel#
> 4220196:Subset#0.ENUMERABLE.ANY([]).[],sid=$0,channel_id=$
> 1,c_t=$2,l_t=$3,login_date=$4),
> rel#4220181:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[hive,
> logindetails_all])]
> ... 4 common frames omitted
> Caused by: java.lang.AssertionError: Internal error: Error while
> applying
> rule 
