Re: [VOTE] Release Apache Drill 1.18.0 - RC0

2020-09-02 Thread Ben-Zvi, Boaz
Hi Abhishek,

   Downloaded both the binary and src tarballs, and verified both the SHA 
checksums and the PGP signatures.

Built successfully on the Mac (using the latest Corretto-11 JDK).

Installed on both Linux (CentOS 8.1 / Corretto-11) and an old Mac (Catalina / 
Oracle JDK 14) and tested some old favorite queries.

+1 (binding) from me on RC0 

   Thanks,

   Boaz  

On 9/2/20, 12:18 PM, "Paul Rogers"  wrote:

Hi Abhishek,

Downloaded the tar file, installed Drill, cleaned my ZK and poked around in
the UI.

As you noted, you've already run the thousands of unit tests and the test
framework, so no point in trying to repeat that. Our tests, however, don't
cover the UI much at all, so I clicked around on the basics to ensure
things basically work. Seems good.

To catch the odd cases, it would be great if someone who uses Drill in
production could try it out. Until then, my vote is +1.

- Paul


On Tue, Sep 1, 2020 at 5:28 PM Abhishek Girish  wrote:

> Thanks Vova!
>
> Hey folks, we need more votes to validate the release. Please give RC0 a
> try.
>
> Special request to PMC members - please vote, as we only have 1 binding vote
> at this point. I am fine extending the voting window by a day or two if
> anyone is working on it or plans to soon.
>
> On Tue, Sep 1, 2020 at 12:09 PM Volodymyr Vysotskyi 
> wrote:
>
> > Verified checksums and signatures for binary and source tarballs and for
> > jars published to the maven repo.
> > Ran all unit tests on Ubuntu with JDK 8 using the tar with sources.
> > Ran Drill in embedded mode on Ubuntu, submitted several queries, verified
> > that profiles are displayed correctly.
> > Checked JDBC driver using SQuirreL SQL client and custom java client,
> > ensured that it works correctly with the custom authenticator.
> >
> > +1 (binding)
> >
> > Kind regards,
> > Volodymyr Vysotskyi
> >
> >
> > On Mon, Aug 31, 2020 at 1:37 PM Volodymyr Vysotskyi <
> volody...@apache.org>
> > wrote:
> >
> > > Hi all,
> > >
> > > I have looked into the DRILL-7785, and the problem is not in Drill, so
> it
> > > is not a blocker for the release.
> > > For more details please refer to my comment
> > > <
> >
> 
https://issues.apache.org/jira/browse/DRILL-7785?focusedCommentId=17187629&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17187629
> > >
> > > on this ticket.
> > >
> > > Kind regards,
> > > Volodymyr Vysotskyi
> > >
> > >
> > > On Mon, Aug 31, 2020 at 4:26 AM Abhishek Girish 
> > > wrote:
> > >
> > >> Yup we can certainly include it if RC0 fails. So far I’m inclined to
> not
> > >> consider it a blocker. I’ve requested Vova and Anton to take a look.
> > >>
> > >> So folks, please continue to test the candidate.
> > >>
> > >> On Sun, Aug 30, 2020 at 6:16 PM Charles Givre 
> wrote:
> > >>
> > >> > Ok.  Are you looking to include DRILL-7785?  I don't think it's a
> > >> blocker,
> > >> > but if we find anything with RC0... let's make sure we get it in.
> > >> >
> > >> > -- C
> > >> >
> > >> >
> > >> >
> > >> > > On Aug 30, 2020, at 9:14 PM, Abhishek Girish 
> > >> wrote:
> > >> >
> > >> > >
> > >> >
> > >> > > Hey Charles,
> > >> >
> > >> > >
> > >> >
> > >> > > I would have liked to. We did get one of the PRs merged after the
> > >> master
> > >> >
> > >> > > branch was closed as I hadn't made enough progress with the
> release
> > >> yet.
> > >> >
> > >> > > But that’s not the case now.
> > >> >
> > >> > >
> > >> >
> > >> > > Unless DRILL-7781 is a release blocker, we should probably skip
> it.
> > So
> > >> > far,
> > >> >
> > >> > > a lot of effort has gone into getting RC0 ready. So I'm hoping to
> > get
> > >> > this
> > >> >
> > >> > > closed asap.
> > >> >
> > >> > >
> > >> >
> > >> > > Regards,
> > >> >
> > >> > > Abhishek
> > >> >
> > >> > >
> > >> >
> > >> > > On Sun, Aug 30, 2020 at 6:07 PM Charles Givre 
> > >> wrote:
> > >> >
> > >> > >
> > >> >
> > >> > >> HI Abhishek,
> > >> >
> > >> > >>
> > >> >
> > >> > >> Can we merge DRILL-7781?  We really shouldn't ship something
> with a
> > >> > simple
> > >> >
> > >> > >> bug like this.
> > >> >
> > >> > >>
> > >> >
> > >> > >> -- C
> > >> > >>> On Aug 30, 2020, at 8:40 PM, Abhishek Girish <
> agir...@apache.org>
> > >> > wrote:
 

Re: New Drill build failing - Access disabled to "http://apache-drill.s3.amazonaws.com/"

2019-10-15 Thread Boaz Ben-Zvi
  Done  -   DRILL-7405 

        Boaz
 
On 10/15/19, 12:06 PM, "Abhishek Girish"  wrote:
 

    Hey Boaz,
    

    Can you please file a JIRA and assign it to me? It looks like we may need

    to move some of the datasets Drill references into the Drill source on

    GitHub itself, or to a different S3 bucket if needed.
    

    Regards,

    Abhishek


    On Mon, Oct 14, 2019 at 5:09 PM Boaz Ben-Zvi  wrote:
    

    >    A new Drill build fails after deleting the Maven local repository

    > (~/.m2) with

    > [WARNING] Could not get content

    > org.apache.maven.wagon.authorization.AuthorizationException: Access

    > denied to:

    > http://apache-drill.s3.amazonaws.com/files/sf-0.01_tpc-h_parquet_typed.tgz
 

    > Has something changed with that S3 service (e.g., expired) ?   Should this

    > test data tar be placed elsewhere ?

    >  Thanks,

    >    Boaz

    >

    


[jira] [Created] (DRILL-7405) Build fails due to inaccessible apache-drill on S3 storage

2019-10-15 Thread Boaz Ben-Zvi (Jira)
Boaz Ben-Zvi created DRILL-7405:
---

 Summary: Build fails due to inaccessible apache-drill on S3 storage
 Key: DRILL-7405
 URL: https://issues.apache.org/jira/browse/DRILL-7405
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Affects Versions: 1.16.0
Reporter: Boaz Ben-Zvi
Assignee: Abhishek Girish


  A new clean build (e.g. after deleting the ~/.m2 local repository) would fail 
now due to:  

Access denied to: 
http://apache-drill.s3.amazonaws.com/files/sf-0.01_tpc-h_parquet_typed.tgz
 

(e.g., for the test data sf-0.01_tpc-h_parquet_typed.tgz)

A new publicly available storage place is needed, plus appropriate changes in 
Drill to get to these resources.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


New Drill build failing - Access disabled to "http://apache-drill.s3.amazonaws.com/"

2019-10-14 Thread Boaz Ben-Zvi
   A new Drill build fails after deleting the Maven local repository (~/.m2) 
with 
[WARNING] Could not get content
org.apache.maven.wagon.authorization.AuthorizationException: Access 
denied to: 
http://apache-drill.s3.amazonaws.com/files/sf-0.01_tpc-h_parquet_typed.tgz
Has something changed with that S3 service (e.g., expired) ?   Should this test 
data tar be placed elsewhere ?
     Thanks,
           Boaz


[jira] [Resolved] (DRILL-7170) IllegalStateException: Record count not set for this vector container

2019-10-04 Thread Boaz Ben-Zvi (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boaz Ben-Zvi resolved DRILL-7170.
-
  Reviewer: Sorabh Hamirwasia
Resolution: Fixed

> IllegalStateException: Record count not set for this vector container
> -
>
> Key: DRILL-7170
> URL: https://issues.apache.org/jira/browse/DRILL-7170
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Sorabh Hamirwasia
>    Assignee: Boaz Ben-Zvi
>Priority: Major
> Fix For: 1.17.0
>
>
> {code:java}
> Query: 
> /root/drillAutomation/master/framework/resources/Advanced/tpcds/tpcds_sf1/original/maprdb/json/query95.sql
> WITH ws_wh AS
> (
> SELECT ws1.ws_order_number,
> ws1.ws_warehouse_sk wh1,
> ws2.ws_warehouse_sk wh2
> FROM   web_sales ws1,
> web_sales ws2
> WHERE  ws1.ws_order_number = ws2.ws_order_number
> ANDws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
> SELECT
> Count(DISTINCT ws_order_number) AS `order count` ,
> Sum(ws_ext_ship_cost)   AS `total shipping cost` ,
> Sum(ws_net_profit)  AS `total net profit`
> FROM web_sales ws1 ,
> date_dim ,
> customer_address ,
> web_site
> WHEREd_date BETWEEN '2000-04-01' AND  (
> Cast('2000-04-01' AS DATE) + INTERVAL '60' day)
> AND  ws1.ws_ship_date_sk = d_date_sk
> AND  ws1.ws_ship_addr_sk = ca_address_sk
> AND  ca_state = 'IN'
> AND  ws1.ws_web_site_sk = web_site_sk
> AND  web_company_name = 'pri'
> AND  ws1.ws_order_number IN
> (
> SELECT ws_order_number
> FROM   ws_wh)
> AND  ws1.ws_order_number IN
> (
> SELECT wr_order_number
> FROM   web_returns,
> ws_wh
> WHERE  wr_order_number = ws_wh.ws_order_number)
> ORDER BY count(DISTINCT ws_order_number)
> LIMIT 100
> Exception:
> java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Record count not 
> set for this vector container
> Fragment 2:3
> Please, refer to logs for more information.
> [Error Id: 4ed92fce-505b-40ba-ac0e-4a302c28df47 on drill87:31010]
>   (java.lang.IllegalStateException) Record count not set for this vector 
> container
> 
> org.apache.drill.shaded.guava.com.google.common.base.Preconditions.checkState():459
> org.apache.drill.exec.record.VectorContainer.getRecordCount():394
> org.apache.drill.exec.record.RecordBatchSizer.<init>():720
> org.apache.drill.exec.record.RecordBatchSizer.<init>():704
> 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.getActualSize():462
> 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate.getActualSize():964
> 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate.makeDebugString():973
> 
> org.apache.drill.exec.physical.impl.common.HashPartition.makeDebugString():601
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.makeDebugString():1313
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase():1105
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():525
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.test.generated.HashAggregatorGen1068899.doWork():642
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():296
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> 
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
> org.apache.drill.exec.work.fragment.FragmentExe

Apache Drill Hangout - June 25, 2019

2019-06-24 Thread Boaz Ben-Zvi

Hi Drillers,

  Our bi-weekly hangout is scheduled for tomorrow, Tuesday, June 25th, at 10 AM PST 
(link: https://meet.google.com/yki-iqdf-tai).

Please suggest any topics you would like to discuss during the hangout by 
replying to this email.

   Thanks,

 Boaz
 



[jira] [Created] (DRILL-7244) Run-time rowgroup pruning match() fails on casting a Long to an Integer

2019-05-07 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-7244:
---

 Summary: Run-time rowgroup pruning match() fails on casting a Long 
to an Integer
 Key: DRILL-7244
 URL: https://issues.apache.org/jira/browse/DRILL-7244
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.17.0
Reporter: Boaz Ben-Zvi
Assignee: Boaz Ben-Zvi
 Fix For: 1.17.0


See DRILL-7062, where a temporary workaround was created, skipping pruning (and 
logging) instead of this failure: 

After a Parquet table is refreshed with selected "interesting" columns, a query 
whose WHERE clause contains a condition on a "non interesting" INT64 column 
fails during run-time pruning (calling match()) with:
{noformat}
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
{noformat}
A long-term solution is to pass the whole schema (or the relevant part of it) 
to the runtime, instead of just passing the "interesting" columns.
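Until the full schema is available at run time, the type mismatch can also be sidestepped by widening both operands through `java.lang.Number` before comparing, instead of casting to a fixed box type. A minimal sketch of that idea (class and method names are hypothetical, not Drill's actual pruning code):

```java
// Hypothetical sketch: comparing a row-group statistic against a filter
// constant without assuming its boxed type. Parquet INT64 statistics come
// back as Long, while a matcher built for an INT32 column may expect
// Integer; casting directly throws ClassCastException.
public class StatCompareSketch {
    // Widen both operands via Number instead of casting to a fixed box type.
    static boolean mayMatch(Object statMin, Object filterValue) {
        long min = ((Number) statMin).longValue();
        long filter = ((Number) filterValue).longValue();
        return min <= filter; // row group may contain matching rows
    }

    public static void main(String[] args) {
        // Long statistic vs. Integer constant: safe with widening,
        // would throw with a direct (Integer) cast.
        System.out.println(mayMatch(5L, 10));   // true
        System.out.println(mayMatch(42L, 10));  // false
    }
}
```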

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7240) Run-time rowgroup pruning match() fails on casting a Long to an Integer

2019-05-03 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-7240:
---

 Summary: Run-time rowgroup pruning match() fails on casting a Long 
to an Integer
 Key: DRILL-7240
 URL: https://issues.apache.org/jira/browse/DRILL-7240
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.17.0
Reporter: Boaz Ben-Zvi
Assignee: Boaz Ben-Zvi
 Fix For: 1.17.0


After a Parquet table is refreshed with selected "interesting" columns, a query 
whose WHERE clause contains a condition on a "non interesting" INT64 column 
fails during run-time pruning (calling match()) with:
{noformat}
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
{noformat}
 Near-term fix suggestion: catch the exception thrown by match(), and instead do 
not prune (i.e., run-time pruning would be disabled in such cases).
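The shape of that near-term fix can be sketched as follows. Since pruning is only an optimization, "keep the row group" is always a safe fallback; the interface and class names here are made up for illustration, not Drill's actual operator code:

```java
// Hypothetical sketch of the near-term fix: if evaluating the filter over
// row-group statistics throws (e.g. a ClassCastException from a
// Long/Integer mismatch), keep the row group instead of failing the query.
public class PruneFallbackSketch {
    interface StatMatcher { boolean match(); } // stands in for the real match()

    static boolean keepRowGroup(StatMatcher matcher) {
        try {
            return matcher.match(); // true = row group may contain matches
        } catch (ClassCastException e) {
            // Mismatched statistic type: log and skip pruning for this group.
            System.err.println("run-time pruning disabled: " + e);
            return true;
        }
    }

    public static void main(String[] args) {
        boolean kept = keepRowGroup(() -> {
            throw new ClassCastException(
                "java.lang.Long cannot be cast to java.lang.Integer");
        });
        System.out.println(kept); // true: the row group is retained
    }
}
```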



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: May Apache Drill board report

2019-05-03 Thread Boaz Ben-Zvi

No comments; looks fine; +1

On 5/3/19 3:10 PM, Aman Sinha wrote:

+1

On Fri, May 3, 2019 at 1:40 PM Volodymyr Vysotskyi 
wrote:


Looks good, +1


On Fri, May 3, 2019 at 23:32, Arina Ielchiieva 
wrote:


Hi all,

please take a look at the draft board report for the last quarter and let
me know if you have any comments.

Thanks,
Arina

=

## Description:
- Drill is a Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud
   Storage.

## Issues:
  - There are no issues requiring board attention at this time.

## Activity:
- Since the last board report, Drill has released version 1.16.0,

including

   the following enhancements:
   - CREATE OR REPLACE SCHEMA command to define a schema for text files
   - REFRESH TABLE METADATA command can generate metadata cache files for
   specific columns
   - ANALYZE TABLE statement to compute statistics on Parquet data
   - SYSLOG (RFC-5424) Format Plugin
   - NEAREST DATE function to facilitate time series analysis
   - Format plugin for LTSV files
   - Ability to query Hive views
   - Upgrade to SQLLine 1.7
   - Apache Calcite upgrade to 1.18.0
   - Several Drill Web UI improvements, including:
  - Storage plugin management improvements
  - Query progress indicators and warnings
  - Ability to limit the result size for better UI response
  - Ability to sort the list of profiles in the Drill Web UI
  - Display query state in query result page
  - Button to reset the options filter

- Drill User Meetup will be held on May 22, 2019. Two talks are planned:
   - Alibaba's Usage of Apache Drill for querying a Time Series Database
   - What’s new with Apache Drill 1.16 & a demo of Schema Provisioning

## Health report:
- The project is healthy. Development activity as reflected in the pull
   requests and JIRAs is good.
- Activity on the dev and user mailing lists is stable.
- One PMC member was added in the last period.

## PMC changes:

- Currently 24 PMC members.
- Sorabh Hamirwasia was added to the PMC on Fri Apr 05 2019

## Committer base changes:

- Currently 51 committers.
- No new committers added in the last 3 months
- Last committer addition was Salim Achouche at Mon Dec 17 2018

## Releases:

- 1.16.0 was released on Thu May 02 2019

## Mailing list activity:

- dev@drill.apache.org:
- 406 subscribers (down -10 in the last 3 months):
- 2299 emails sent to list (1903 in previous quarter)

- iss...@drill.apache.org:
- 17 subscribers (down -1 in the last 3 months):
- 2373 emails sent to list (2233 in previous quarter)

- u...@drill.apache.org:
- 582 subscribers (down -15 in the last 3 months):
- 235 emails sent to list (227 in previous quarter)

## JIRA activity:

- 214 JIRA tickets created in the last 3 months
- 212 JIRA tickets closed/resolved in the last 3 months



Re: [VOTE] Apache Drill Release 1.16.0 - RC2

2019-04-30 Thread Boaz Ben-Zvi
Downloaded both the binary and src tarballs, and verified the SHA 
checksums and the PGP signatures.


Built and ran the full unit tests on both Linux and Mac.

Successfully ran some old favorite queries, and several manual tests of 
REFRESH METADATA with COLUMNS, and verified the metadata files and 
summaries.


  +1 from me for RC2 .

   -- Boaz

On 4/30/19 11:26 AM, Kunal Khatua wrote:

Ran manual tests with random queries, trying out the UI and running joins on 
small tables.

HOCON export of storage plugins does not actually export in HOCON format, but 
that is not a blocker.

+1 (binding)

~ Kunal

On 4/30/2019 4:53:48 AM, Arina Yelchiyeva  wrote:
Downloaded binary tarball and ran Drill in embedded mode.
Verified schema provisioning for text files, dynamic UDFs.
Ran random queries, including long-running, queried system tables, created 
tables with different formats.
Checked Web UI (queries, profiles, storage plugins, logs pages).

+1 (binding)

Kind regards,
Arina


On Apr 30, 2019, at 8:33 AM, Aman Sinha wrote:

Downloaded binary tarball on my Mac and ran in embedded mode.
Verified Sorabh's release signature and the tar file's checksum
Did a quick glance through maven artifacts
Did some manual tests with TPC-DS Web_Sales table and ran REFRESH METADATA
command against the same table
Checked runtime query profiles of above queries and verified COUNT(*),
COUNT(column) optimization is getting applied.
Also did a build from source on my linux VM.

RC2 looks good ! +1

On Fri, Apr 26, 2019 at 8:28 AM SorabhApache wrote:


Hi Drillers,
I'd like to propose the third release candidate (RC2) for the Apache Drill,
version 1.16.0.

Changes since the previous release candidate:
DRILL-7201: Strange symbols in error window (Windows)
DRILL-7202: Failed query shows warning that fragments has made no progress
DRILL-7207: Update the copyright year in NOTICE.txt file
DRILL-7212: Add gpg key with apache.org email for sorabh
DRILL-7213: drill-format-mapr.jar contains stale git.properties file

The RC2 includes total of 220 resolved JIRAs [1].
Thanks to everyone for their hard work to contribute to this release.

The tarball artifacts are hosted at [2] and the maven artifacts are hosted
at [3].

This release candidate is based on commit
751e87736c2ddbc184b52cfa56f4e29c68417cfe located at [4].

Please download and try out the release candidate.

The vote ends at 04:00 PM UTC (09:00 AM PDT, 07:00 PM EET, 09:30 PM IST),
May 1st, 2019

[ ] +1
[ ] +0
[ ] -1

Here is my vote: +1
[1]

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12344284
[2] http://home.apache.org/~sorabh/drill/releases/1.16.0/rc2/
[3]
https://repository.apache.org/content/repositories/orgapachedrill-1073/
[4] https://github.com/sohami/drill/commits/drill-1.16.0

Thanks,
Sorabh





Re: [VOTE] Apache Drill Release 1.16.0 - RC1

2019-04-24 Thread Boaz Ben-Zvi
   Downloaded both the binary and src tarballs, and verified the SHA 
checksums and the PGP signatures.


Built and ran the full unit tests on both Linux and Mac (took 3:05 hours 
on my Mac).


Successfully ran some old favorite queries with Sort/Hash-Join/Hash-Agg 
spilling.


Ran several manual tests of REFRESH METADATA with COLUMNS, and verified 
the metadata files and summaries.


Noticed that when specifying a COLUMN which is a sub-field in a complex 
type (e.g., a key in a map), the whole column (i.e., all the other keys 
as well) was marked as "interesting"; but this may be "by design", as 
the refresh granularity is the whole column.


Also noticed the sys.version issue (DRILL-7208 
) - should be minor as 
only affecting users of the SRC tarball, likely developers who 
build/modify the code anyway.


   Hence my vote is  +1 .

  -- Boaz

On 4/24/19 10:57 AM, Kunal Khatua wrote:

Downloaded the tarball and tried it in embedded mode.

Ran simple join queries and interacted with the WebUI.

Issues confirmed were DRILL-7192 and DRILL-7203.
I'm unable to repro DRILL-7201 and DRILL-7202, though I have a fix for the 
latter. Will work with Arina to identify repro steps.

None of these are blockers IMO, so I'll vote +1.

~ Kunal


On 4/24/2019 10:38:31 AM, Khurram Faraaz  wrote:
i see the correct version and commit, I deployed the binaries to test.

Apache Drill 1.16.0
"Start your SQL engine."
apache drill> select * from sys.version;
+---------+------------------------------------------+-----------------------------------------------------+---------------------------+-------------------+---------------------------+
| version |                commit_id                 |                   commit_message                    |        commit_time        |    build_email    |        build_time         |
+---------+------------------------------------------+-----------------------------------------------------+---------------------------+-------------------+---------------------------+
| 1.16.0  | cf5b758e0a4c22b75bfb02ac2653ff09415ddf53 | [maven-release-plugin] prepare release drill-1.16.0 | 22.04.2019 @ 09:08:36 PDT | sor...@apache.org | 22.04.2019 @ 09:53:25 PDT |
+---------+------------------------------------------+-----------------------------------------------------+---------------------------+-------------------+---------------------------+
1 row selected (0.274 seconds)
apache drill>

Thanks,
Khurram

On Wed, Apr 24, 2019 at 9:52 AM SorabhApache wrote:


Hi Volodymyr/Anton,
I can verify that I am seeing both the below issues as reported by Anton
and Volodymyr. I will investigate further why we are seeing these issues.
Thanks for catching this. Can you please open JIRA's for these issues ?

1) Wrong result for sys.version query when built from source tarball.
2) git.properties file in drill-format-mapr-1.16.0.jar has wrong commit id
but as Volodymyr mentioned because of order in which jars are picked up
it's not showing the issue when tried from prebuilt tarball.

@Volodymyr Vysotskyi
Regarding the GPG key I am not sure if we mandate it to use apache.org,
there other keys in the file which are using gmail address as well. As far
as the signing person is authenticating the key and details associated with
it, I think it should be fine. But since it's recommended I will use
apache.org email address instead.

Thanks,
Sorabh

On Wed, Apr 24, 2019 at 8:53 AM Volodymyr Vysotskyi
wrote:


Hi Aman,

There are two different issues connected with *git.properties* file.
Regarding the problem I have mentioned, prebuilt tar
(apache-drill-1.16.0.tar.gz) contains *drill-format-mapr-1.16.0.jar* jar
which contains a *git.properties* file with the incorrect version.
When *select * from sys.version* query is submitted, class loader finds

the

first file named as *git.properties* from the classpath (each drill jar
contains its own *git.properties* file) and for my case file from
*drill-format-mapr-1.16.0.jar *is picked up, so the incorrect result is
returned. But it may not be reproducible for other machines since it
depends on the order of files for the class loader.

Regarding the problem Anton has mentioned, Drill should be built from the
sources (apache-drill-1.16.0-src.tar.gz), and for that version, *select *
from sys.version* returns the result without information about commit.

Kind regards,
Volodymyr Vysotskyi


On Wed, Apr 24, 2019 at 6:33 PM Aman Sinha wrote:


This works fine for me with the binary tarball that I installed on my

Mac.

..it shows the correct commit message.

Apache Drill 1.16.0

"This isn't your grandfather's SQL."

apache drill> *select* * *from* sys.version;



+---------+-----------+----------------+-------------+-------------+------------+
| version | commit_id | commit_message | commit_time | build_email | build_time |
+---------+-----------+----------------+-------------+-------------+------------+




[jira] [Created] (DRILL-7173) Analyze table may fail when prefer_plain_java is set to true on codegen for resetValues

2019-04-11 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-7173:
---

 Summary: Analyze table may fail when prefer_plain_java is set to 
true on codegen for resetValues 
 Key: DRILL-7173
 URL: https://issues.apache.org/jira/browse/DRILL-7173
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Codegen
Affects Versions: 1.15.0
 Environment: *prefer_plain_java: true*

 
Reporter: Boaz Ben-Zvi
 Fix For: 1.17.0


  The *prefer_plain_java* compile option is useful for debugging generated 
code (it can be set in drill-override.conf; the default value is false). When set 
to true, some "analyze table" calls generate code that fails due to the addition of 
a SchemaChangeException throws clause which is not declared in the Streaming Aggr template.

For example:
{noformat}
apache drill (dfs.tmp)> create table lineitem3 as select * from 
cp.`tpch/lineitem.parquet`;
+--+---+
| Fragment | Number of records written |
+--+---+
| 0_0 | 60175 |
+--+---+
1 row selected (2.06 seconds)
apache drill (dfs.tmp)> analyze table lineitem3 compute statistics;
Error: SYSTEM ERROR: CompileException: File 
'org.apache.drill.exec.compile.DrillJavaFileObject[StreamingAggregatorGen4.java]',
 Line 7869, Column 20: StreamingAggregatorGen4.java:7869: error: resetValues() 
in org.apache.drill.exec.test.generated.StreamingAggregatorGen4 cannot override 
resetValues() in 
org.apache.drill.exec.physical.impl.aggregate.StreamingAggTemplate
 public boolean resetValues()
 ^
 overridden method does not throw 
org.apache.drill.exec.exception.SchemaChangeException 
(compiler.err.override.meth.doesnt.throw)
{noformat}
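The compile error follows from a general Java rule: an overriding method may not declare a checked exception that the overridden method does not declare. A minimal standalone illustration of the rule (class names are made up, not Drill's actual template or generated classes):

```java
// Minimal illustration of the Java rule behind the compile failure: an
// overriding method may not declare a checked exception that the
// overridden method does not. Names are illustrative only.
class SchemaChangeException extends Exception { }

abstract class AggTemplate {
    public abstract boolean resetValues(); // declares no checked exceptions
}

class GeneratedAgg extends AggTemplate {
    @Override
    public boolean resetValues() /* throws SchemaChangeException */ {
        // Uncommenting the throws clause above reproduces the error:
        //   overridden method does not throw SchemaChangeException
        return true;
    }
}

public class OverrideThrowsSketch {
    public static void main(String[] args) {
        System.out.println(new GeneratedAgg().resetValues()); // true
    }
}
```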
 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] New PMC member: Sorabh Hamirwasia

2019-04-05 Thread Boaz Ben-Zvi
 Congratulations Sorabh - welcome to the Project Management Committee !!


On Fri, Apr 5, 2019 at 10:58 AM Abhishek Ravi  wrote:

> Congratulations Sorabh! Well deserved!
>
> On Fri, Apr 5, 2019 at 10:49 AM hanu mapr  wrote:
>
> > Congratulations, Sorabh!
> >
> > On Fri, Apr 5, 2019 at 10:30 AM Jyothsna Reddy 
> > wrote:
> >
> > > Congratulations Sorabh :D :D
> > >
> > >
> > >
> > >
> > > On Fri, Apr 5, 2019 at 10:29 AM Paul Rogers  >
> > > wrote:
> > >
> > > > Congratulations Sorabh, well deserved!
> > > >
> > > > - Paul
> > > >
> > > >
> > > >
> > > > On Friday, April 5, 2019, 9:06:37 AM PDT, Arina Ielchiieva <
> > > > ar...@apache.org> wrote:
> > > >
> > > >  I am pleased to announce that Drill PMC invited Sorabh Hamirwasia to
> > > > the PMC and
> > > > he has accepted the invitation.
> > > >
> > > > Congratulations Sorabh and welcome!
> > > >
> > > > - Arina
> > > > (on behalf of Drill PMC)
> > > >
> > >
> >
>


Re: [DISCUSS] Including Features that Need Regular Updating?

2019-03-22 Thread Boaz Ben-Zvi

 Hi Charles,

    If these updates are only small, simple tasks, it would not be a big 
issue to add them to the Drill Release Process (see [1]).


BTW, most of the release work is automated via a script (see section 4 
in [1]); so if these updates could be automated as well, it would be a 
trivial matter.


   Thanks for your useful contributions,

    -- Boaz

[1] https://github.com/parthchandra/drill/wiki/Drill-Release-Process

On 3/22/19 11:13 AM, Charles Givre wrote:

Hello all,
I have a question regarding new Drill features.  I have two UDFs which I’ve 
been considering submitting but these UDFs will require regular updating.  The 
features in question are UDFs to do IP Geolocation and a User Agent Parser.  
The IP geolocation is dependent on the MaxMind database and associated 
libraries.  Basically if it were to be included in Drill, every release we 
would have to update the MaxMind DB.  (This is done in many tools that rely on 
it for IP Geolocation)

The other is the user agent parser.   Likewise, the only updating it would need 
would be to update the pom.xml file to reflect the latest version of the UA 
parser.  These are both very useful features for security analysis but I wanted 
to ask the Drill developer community if this is something we wanted to consider.
— C


Re: [DISCUSS]: Git instructions

2019-03-04 Thread Boaz Ben-Zvi

 Hi Charles,

    Here [1] is a page I created, which translates the English "I want 
to do ... using Git" into the actual cryptic git commands (using clear 
terms).


This is work in progress; while working on Drill, whenever I get stuck 
on some "how to do" git issue, I add the findings there to avoid wasting 
time next time.


    Hope this can save some time for others as well,

   -- Boaz

[1] https://github.com/Ben-Zvi/Misc/wiki/Git-for-Drill-developers


On 3/4/19 1:58 PM, Parth Chandra wrote:

Someone (even wiser) said to me : If git is your only problem when working
on open source, then you're doing great.

More seriously though, it is important to know git, especially if you have
committer privileges. And no matter how conversant you are with git, it
always helps to read this page [1].

In addition to having a list of commands for common git activities, we
could make things easier for newbies by providing scripts, but things like
rebase, resolving merge conflicts, etc. are sort of hard to script.

[1] https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell


On Sun, Mar 3, 2019 at 6:33 PM Paul Rogers 
wrote:


Hi Charles,

As someone who struggled though learning these topics over the last few
years, I'd point out that there is no right way to do this stuff. You can
use the Git command line tools, You can use a UI. You can keep branches
locally, or publish everything to GitHub. As Parth wisely noted back when I
started with Drill, Git will be confusing until you "get" what's going on,
then it seems pretty simple.

While Drill should probably not try to provide a full Git tutorial, I
notice that many projects do provide a set of instructions for common
tasks. These don't explain the why and how, they just act as a reference,
which is pretty handy. Anyone know of a good writeup for another project we
can reference?

Maybe we can draft something (ideally on a Wiki, but since we don't have
one, in the developer documentation) that follows the material of other
projects, but with a Drill-specific spin.

Thanks,
- Paul



 On Saturday, March 2, 2019, 5:00:13 PM PST, Charles Givre <
cgi...@gmail.com> wrote:

  All,
Speaking as a non-developer, I wonder if it might be helpful to put
instructions on the CONTRIBUTING.md file that explain how to:
1.  Rebase a branch
2.  Squash commits

I know for developers these things seem trivial, but for non-developers or
people who don’t work with git on a regular basis, it can be quite
confusing.  In the last few weeks, we’ve seen a few non-developers submit
PRs and they seem to be stuck on these steps.

Thanks,
— C


[jira] [Created] (DRILL-7069) Poor performance of transformBinaryInMetadataCache

2019-02-28 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-7069:
---

 Summary: Poor performance of transformBinaryInMetadataCache
 Key: DRILL-7069
 URL: https://issues.apache.org/jira/browse/DRILL-7069
 Project: Apache Drill
  Issue Type: Improvement
  Components: Metadata
Affects Versions: 1.15.0
Reporter: Boaz Ben-Zvi
Assignee: Boaz Ben-Zvi
 Fix For: 1.16.0


The performance of the method *transformBinaryInMetadataCache* scales poorly as 
the table's number of underlying files, row-groups and columns grows. This 
method is invoked during the planning of every query using this table.

     A test on a table using 219 directories (each with 20 files), 1 row-group 
in each file, and 94 columns, measured about *1340 milliseconds*.

    The main culprit is the version checking, which takes place in *every 
iteration* (i.e., about 400k times in the previous example) and involves the 
construction of 6 MetadataVersion objects (and possibly garbage collections).

     Removing the version checks from the loops improved this method's 
performance on the above test down to about *250 milliseconds*.
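A minimal illustration of the fix (the names below are simplified stand-ins, not Drill's actual MetadataVersion or loop structure): hoist the loop-invariant version comparison out of the per-iteration path so the comparison objects are built once instead of on every iteration.

```java
public class HoistVersionCheck {
    static final class MetadataVersion {
        final int major, minor;
        MetadataVersion(int major, int minor) { this.major = major; this.minor = minor; }
        boolean atLeast(MetadataVersion o) {
            return major > o.major || (major == o.major && minor >= o.minor);
        }
    }

    // Before: builds two MetadataVersion objects on every iteration.
    static int transformSlow(String[] columns, int major, int minor) {
        int transformed = 0;
        for (String ignored : columns) {
            if (new MetadataVersion(major, minor).atLeast(new MetadataVersion(3, 0))) {
                transformed++;          // stand-in for transforming the column
            }
        }
        return transformed;
    }

    // After: the check is loop-invariant, so evaluate it once.
    static int transformFast(String[] columns, int major, int minor) {
        final boolean supported =
            new MetadataVersion(major, minor).atLeast(new MetadataVersion(3, 0));
        int transformed = 0;
        for (String ignored : columns) {
            if (supported) {
                transformed++;
            }
        }
        return transformed;
    }

    public static void main(String[] args) {
        String[] cols = {"a", "b", "c"};
        System.out.println(transformSlow(cols, 3, 1) == transformFast(cols, 3, 1));
    }
}
```

Both versions return the same result; the fast one just avoids roughly 400k redundant object constructions in the scenario described above.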

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7043) Enhance Merge-Join to support Full Outer Join

2019-02-19 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-7043:
---

 Summary: Enhance Merge-Join to support Full Outer Join
 Key: DRILL-7043
 URL: https://issues.apache.org/jira/browse/DRILL-7043
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Relational Operators, Query Planning & Optimization
Affects Versions: 1.15.0
Reporter: Boaz Ben-Zvi
Assignee: Boaz Ben-Zvi


   Currently the Merge Join operator internally cannot support a Right Outer 
Join (and thus a Full Outer Join; for ROJ alone, the planner rotates the inputs 
and specifies a Left Outer Join).

   The actual reason for not supporting ROJ is the current MJ implementation: 
when a match is found, it puts a mark on the right side and iterates down the 
right, resetting back at the end (and moving on to the next left-side entry).  
This creates an ambiguity when the next left entry is bigger than the previous 
one - are those right entries unmatched (i.e., they need to be returned), or 
was there a prior match (i.e., just advance to the next right entry)?

   It seems that adding a relevant flag to the persisted state ({{status}}), 
plus some other code changes, would let the operator support a Right-Outer-Join 
as well (and thus a Full Outer Join).  The planner needs an update as well - to 
suggest the MJ in case of a FOJ, and maybe not to rotate the inputs in some MJ 
cases.
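The mark-and-flag idea can be sketched on a toy full-outer merge join over sorted integer keys (illustrative only, not Drill's operator code; a per-row boolean array stands in for the flag proposed for the persisted {{status}} state):

```java
import java.util.ArrayList;
import java.util.List;

public class ToyFullOuterMergeJoin {
    // Returns [left, right] pairs; Integer.MIN_VALUE stands in for NULL.
    static List<int[]> join(int[] left, int[] right) {
        List<int[]> out = new ArrayList<>();
        // The extra state: remembers whether a right row already matched,
        // resolving the ambiguity when the next left key jumps past it.
        boolean[] rightMatched = new boolean[right.length];
        int r = 0;
        for (int l : left) {
            // Emit right rows smaller than l that never matched.
            while (r < right.length && right[r] < l) {
                if (!rightMatched[r]) out.add(new int[]{Integer.MIN_VALUE, right[r]});
                r++;
            }
            int scan = r;                    // the "mark" on the right side
            boolean leftMatched = false;
            while (scan < right.length && right[scan] == l) {
                out.add(new int[]{l, right[scan]});
                rightMatched[scan] = true;
                leftMatched = true;
                scan++;
            }
            if (!leftMatched) out.add(new int[]{l, Integer.MIN_VALUE});
            // r stays at the mark so duplicate left keys can re-scan.
        }
        while (r < right.length) {           // trailing unmatched right rows
            if (!rightMatched[r]) out.add(new int[]{Integer.MIN_VALUE, right[r]});
            r++;
        }
        return out;
    }

    public static void main(String[] args) {
        for (int[] p : join(new int[]{1, 2, 2, 4}, new int[]{2, 3, 4}))
            System.out.println(p[0] + "," + p[1]);
    }
}
```

In a real streaming operator the flag would live with the mark/reset bookkeeping rather than in a per-batch array, but the disambiguation it provides is the same.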

   Currently trying a FOJ with MJ (i.e. HJ disabled) produces the following "no 
plan found" from Calcite:
{noformat}
0: jdbc:drill:zk=local> select * from temp t1 full outer join temp2 t2 on 
t1.d_date = t2.d_date;
Error: SYSTEM ERROR: CannotPlanException: Node 
[rel#2804:Subset#8.PHYSICAL.SINGLETON([]).[]] could not be implemented; planner 
state:

Root: rel#2804:Subset#8.PHYSICAL.SINGLETON([]).[]
Original rel:
DrillScreenRel(subset=[rel#2804:Subset#8.PHYSICAL.SINGLETON([]).[]]): rowcount 
= 6.0, cumulative cost = {0.6001 rows, 0.6001 cpu, 0.0 
io, 0.0 network, 0.0 memory}, id = 2802
  DrillProjectRel(subset=[rel#2801:Subset#7.LOGICAL.ANY([]).[]], **=[$0], 
**0=[$2]): rowcount = 6.0, cumulative cost = {6.0 rows, 12.0 cpu, 0.0 io, 0.0 
network, 0.0 memory}, id = 2800
DrillJoinRel(subset=[rel#2799:Subset#6.LOGICAL.ANY([]).[]], 
condition=[=($1, $3)], joinType=[full]): rowcount = 6.0, cumulative cost = 
{10.0 rows, 104.0 cpu, 0.0 io, 0.0 network, 70.4 memory}, id = 2798

{noformat}
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-6914) Query with RuntimeFilter and SemiJoin fails with IllegalStateException: Memory was leaked by query

2019-02-08 Thread Boaz Ben-Zvi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boaz Ben-Zvi resolved DRILL-6914.
-
Resolution: Fixed

The interaction between the Hash-Join spill and the runtime filter was fixed in 
PR #1622. Testing with the latest code works OK (no memory leaks).

 

> Query with RuntimeFilter and SemiJoin fails with IllegalStateException: 
> Memory was leaked by query
> --
>
> Key: DRILL-6914
> URL: https://issues.apache.org/jira/browse/DRILL-6914
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.15.0
>Reporter: Abhishek Ravi
>Assignee: Boaz Ben-Zvi
>Priority: Major
> Fix For: 1.16.0
>
> Attachments: 23cc1af3-0e8e-b2c9-a889-a96504988d6c.sys.drill, 
> 23cc1b7c-5b5c-d123-5e72-6d7d2719df39.sys.drill
>
>
> Following query fails on TPC-H SF 100 dataset when 
> exec.hashjoin.enable.runtime_filter = true AND planner.enable_semijoin = true.
> Note that the query does not fail if any one of them or both are disabled.
> {code:sql}
> set `exec.hashjoin.enable.runtime_filter` = true;
> set `exec.hashjoin.runtime_filter.max.waiting.time` = 1;
> set `planner.enable_broadcast_join` = false;
> set `planner.enable_semijoin` = true;
> select
>  count(*) as row_count
> from
>  lineitem l1
> where
>  l1.l_shipdate IN (
>  select
>  distinct(cast(l2.l_shipdate as date))
>  from
>  lineitem l2);
> reset `exec.hashjoin.enable.runtime_filter`;
> reset `exec.hashjoin.runtime_filter.max.waiting.time`;
> reset `planner.enable_broadcast_join`;
> reset `planner.enable_semijoin`;
> {code}
>  
> {noformat}
> Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. 
> Memory leaked: (134217728)
> Allocator(frag:1:0) 800/134217728/172453568/70126322567 
> (res/actual/peak/limit)
> Fragment 1:0
> Please, refer to logs for more information.
> [Error Id: ccee18b3-c3ff-4fdb-b314-23a6cfed0a0e on qa-node185.qa.lab:31010] 
> (state=,code=0)
> java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Memory was leaked 
> by query. Memory leaked: (134217728)
> Allocator(frag:1:0) 800/134217728/172453568/70126322567 
> (res/actual/peak/limit)
> Fragment 1:0
> Please, refer to logs for more information.
> [Error Id: ccee18b3-c3ff-4fdb-b314-23a6cfed0a0e on qa-node185.qa.lab:31010]
> at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:536)
> at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:640)
> at org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:217)
> at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:151)
> at sqlline.BufferedRows.<init>(BufferedRows.java:37)
> at sqlline.SqlLine.print(SqlLine.java:1716)
> at sqlline.Commands.execute(Commands.java:949)
> at sqlline.Commands.sql(Commands.java:882)
> at sqlline.SqlLine.dispatch(SqlLine.java:725)
> at sqlline.SqlLine.runCommands(SqlLine.java:1779)
> at sqlline.Commands.run(Commands.java:1485)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:38)
> at sqlline.SqlLine.dispatch(SqlLine.java:722)
> at sqlline.SqlLine.initArgs(SqlLine.java:458)
> at sqlline.SqlLine.begin(SqlLine.java:514)
> at sqlline.SqlLine.start(SqlLine.java:264)
> at sqlline.SqlLine.main(SqlLine.java:195)
> Caused by: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
> ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: 
> (134217728)
> Allocator(frag:1:0) 800/134217728/172453568/70126322567 
> (res/actual/peak/limit)
> Fragment 1:0
> Please, refer to logs for more information.
> [Error Id: ccee18b3-c3ff-4fdb-b314-23a6cfed0a0e on qa-node185.qa.lab:31010]
> at 
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
> at org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:422)
> at org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:96)
> at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:273)
> at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:243)
> at 
> io.netty.handler

[jira] [Created] (DRILL-7034) Window function over a malformed CSV file crashes the JVM

2019-02-08 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-7034:
---

 Summary: Window function over a malformed CSV file crashes the JVM 
 Key: DRILL-7034
 URL: https://issues.apache.org/jira/browse/DRILL-7034
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.15.0
Reporter: Boaz Ben-Zvi


The JVM crashes executing window functions over (an ordered) CSV file with a 
small format issue - an empty line.

To create: Take the following simple `a.csvh` file:
{noformat}
amount
10
11
{noformat}

And execute a simple window function like
{code:sql}
select max(amount) over(order by amount) FROM dfs.`/data/a.csvh`;
{code}

Then add an empty line between the `10` and the `11`:
{noformat}
amount
10

11
{noformat}

 and try again:
{noformat}
0: jdbc:drill:zk=local> select max(amount) over(order by amount) FROM 
dfs.`/data/a.csvh`;
+---------+
| EXPR$0  |
+---------+
| 10      |
| 11      |
+---------+
2 rows selected (3.554 seconds)
0: jdbc:drill:zk=local> select max(amount) over(order by amount) FROM 
dfs.`/data/a.csvh`;
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0001064aeae7, pid=23450, tid=0x6103
#
# JRE version: Java(TM) SE Runtime Environment (8.0_181-b13) (build 
1.8.0_181-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.181-b13 mixed mode bsd-amd64 
compressed oops)
# Problematic frame:
# J 6719% C2 
org.apache.drill.exec.expr.fn.impl.ByteFunctionHelpers.memcmp(JIIJII)I (188 
bytes) @ 0x0001064aeae7 [0x0001064ae920+0x1c7]
#
# Core dump written. Default location: /cores/core or core.23450
#
# An error report file with more information is saved as:
# /Users/boazben-zvi/IdeaProjects/drill/hs_err_pid23450.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
Abort trap: 6 (core dumped)
{noformat}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: January Apache Drill board report

2019-01-31 Thread Boaz Ben-Zvi

  The report looks good; thanks.

One more item that may deserve a mention is the publication of Charles' 
and Paul's O'Reilly book "Learning Apache Drill" in Nov. 2018.


    Thanks,

 Boaz

On 1/31/19 8:18 AM, Aman Sinha wrote:

Thanks for putting this together, Arina.
The Drill Developer Day and Meetup were separate events, so you can split
them up.
   - A half day Drill Developer Day was held on Nov 14.  A variety of
technical design issues were discussed.
   - A Drill user meetup was held on the same evening.  Two presentations
were given - one on a use case for Drill and one about indexing support in
Drill.

Rest of the report LGTM.

-Aman


On Thu, Jan 31, 2019 at 7:58 AM Arina Ielchiieva  wrote:


Hi all,

please take a look at the draft board report for the last quarter and let
me know if you have any comments.

Thanks,
Arina

=

## Description:
  - Drill is a Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud
Storage.

## Issues:
  - There are no issues requiring board attention at this time.

## Activity:
  - Since the last board report, Drill has released version 1.15.0,
including the following enhancements:
- Add capability to do index based planning and execution
- CROSS join support
- INFORMATION_SCHEMA FILES and FUNCTIONS were added
- Support for TIMESTAMPADD and TIMESTAMPDIFF functions
- Ability to secure znodes with custom ACLs
- Upgrade to SQLLine 1.6
- Parquet filter pushdown for VARCHAR and DECIMAL data types
- Support JPPD (Join Predicate Push Down)
- Lateral join functionality was enabled by default
- Multiple Web UI improvements to simplify the use of options and submit
queries
- Query performance with the semi-join functionality was improved
- Support for aliases in the GROUP BY clause
- Options to return null for empty strings and to prevent Drill from
  returning a result set for DDL statements
- Storage plugin names became case-insensitive

- Drill developer meet up was held on November 14, 2018.

## Health report:
  - The project is healthy. Development activity
as reflected in the pull requests and JIRAs is good.
  - Activity on the dev and user mailing lists is stable.
  - Three committers were added in the last period.

## PMC changes:

  - Currently 23 PMC members.
  - No new PMC members added in the last 3 months
  - Last PMC addition was Charles Givre on Mon Sep 03 2018

## Committer base changes:

  - Currently 51 committers.
  - New committers:
 - Hanumath Rao Maduri was added as a committer on Thu Nov 01 2018
 - Karthikeyan Manivannan was added as a committer on Fri Dec 07 2018
 - Salim Achouche was added as a committer on Mon Dec 17 2018

## Releases:

  - 1.15.0 was released on Mon Dec 31 2018

## Mailing list activity:

  - dev@drill.apache.org:
 - 415 subscribers (down -12 in the last 3 months):
 - 2066 emails sent to list (2653 in previous quarter)

  - iss...@drill.apache.org:
 - 18 subscribers (up 0 in the last 3 months):
 - 2480 emails sent to list (3228 in previous quarter)

  - u...@drill.apache.org:
 - 592 subscribers (down -5 in the last 3 months):
 - 249 emails sent to list (310 in previous quarter)


## JIRA activity:

  - 196 JIRA tickets created in the last 3 months
  - 171 JIRA tickets closed/resolved in the last 3 months



[jira] [Created] (DRILL-7015) Improve documentation for PARTITION BY

2019-01-29 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-7015:
---

 Summary: Improve documentation for PARTITION BY
 Key: DRILL-7015
 URL: https://issues.apache.org/jira/browse/DRILL-7015
 Project: Apache Drill
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.15.0
Reporter: Boaz Ben-Zvi
Assignee: Bridget Bevens
 Fix For: 1.16.0


The documentation for CREATE TABLE AS (CTAS) shows the syntax of the command, 
without the optional PARTITION BY clause. That option is only mentioned later 
under the usage notes.

*+_Suggestion_+*: Add this optional clause to the syntax (same as for CREATE 
TEMPORARY TABLE (CTTAS)). And mention that this option is only applicable when 
storing in Parquet. 

In the documentation for CREATE TEMPORARY TABLE (CTTAS), the comment says:
{panel}
An optional parameter that can *only* be used to create temporary tables with 
the Parquet data format. 
{panel}
This can mistakenly be understood as "only for temporary tables". 
*_+Suggestion+_*: erase the "to create temporary tables" part (it is not 
needed, as it is implied by the context of this page).

*_+Last suggestion+_*: In the documentation for the PARTITION BY clause, we can 
add an example using the implicit column "filename" to demonstrate how the 
partitioning column puts each distinct value into a separate file. For example, 
add to the "Other Examples" section:
{noformat}
0: jdbc:drill:zk=local> select distinct r_regionkey, filename from mytable1;
+--------------+----------------+
| r_regionkey  |    filename    |
+--------------+----------------+
| 2            | 0_0_3.parquet  |
| 1            | 0_0_2.parquet  |
| 0            | 0_0_1.parquet  |
| 3            | 0_0_4.parquet  |
| 4            | 0_0_5.parquet  |
+--------------+----------------+
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7013) Hash-Join and Hash-Aggr to handle incoming with selection vectors

2019-01-28 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-7013:
---

 Summary: Hash-Join and Hash-Aggr to handle incoming with selection 
vectors
 Key: DRILL-7013
 URL: https://issues.apache.org/jira/browse/DRILL-7013
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Relational Operators, Query Planning & Optimization
Affects Versions: 1.15.0
Reporter: Boaz Ben-Zvi


  The Hash-Join and Hash-Aggr operators copy each incoming row separately. When 
the incoming data has a selection vector (e.g., outgoing from a Filter), a 
_SelectionVectorRemover_ is added before the Hash operator, as the latter 
cannot handle the selection vector.  

  Thus every row is needlessly being copied twice!

+Suggestion+: Enhance the Hash operators to handle potential incoming selection 
vectors, thus eliminating  the need for the extra copy. The planner needs to be 
changed not to add that SelectionVectorRemover.

For example:
{code:sql}
select * from cp.`tpch/lineitem.parquet` L,  cp.`tpch/orders.parquet` O where 
O.o_custkey > 1498 and L.l_orderkey > 58999 and O.o_orderkey = L.l_orderkey 
{code}
And the plan:
{panel}
00-00 Screen : rowType = RecordType(DYNAMIC_STAR **, DYNAMIC_STAR **0): 
 00-01 ProjectAllowDup(**=[$0], **0=[$1]) : rowType = RecordType(DYNAMIC_STAR 
**, DYNAMIC_STAR **0): 
 00-02 Project(T44¦¦**=[$0], T45¦¦**=[$2]) : rowType = RecordType(DYNAMIC_STAR 
T44¦¦**, DYNAMIC_STAR T45¦¦**): 
 00-03 HashJoin(condition=[=($1, $4)], joinType=[inner], semi-join: =[false]) : 
rowType = RecordType(DYNAMIC_STAR T44¦¦**, ANY l_orderkey, DYNAMIC_STAR 
T45¦¦**, ANY o_custkey, ANY o_orderkey): 
 00-05 *SelectionVectorRemover* : rowType = RecordType(DYNAMIC_STAR T44¦¦**, 
ANY l_orderkey):
 00-07 Filter(condition=[>($1, 58999)]) : rowType = RecordType(DYNAMIC_STAR 
T44¦¦**, ANY l_orderkey):
 00-09 Project(T44¦¦**=[$0], l_orderkey=[$1]) : rowType = 
RecordType(DYNAMIC_STAR T44¦¦**, ANY l_orderkey): 
 00-11 Scan(table=[[cp, tpch/lineitem.parquet]], groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]], 
 00-04 *SelectionVectorRemover* : rowType = RecordType(DYNAMIC_STAR T45¦¦**, 
ANY o_custkey, ANY o_orderkey):
 00-06 Filter(condition=[AND(>($1, 1498), >($2, 58999))]) : rowType = 
RecordType(DYNAMIC_STAR T45¦¦**, ANY o_custkey, ANY o_orderkey): 
 00-08 Project(T45¦¦**=[$0], o_custkey=[$1], o_orderkey=[$2]) : rowType = 
RecordType(DYNAMIC_STAR T45¦¦**, ANY o_custkey, ANY o_orderkey):
 00-10 Scan(table=[[cp, tpch/orders.parquet]],
{panel}
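The needless double copy can be illustrated with a toy sketch (plain Java int arrays stand in for value vectors; method names are illustrative, not Drill's code):

```java
public class SelectionVectorSketch {
    // Today: the SelectionVectorRemover compacts the batch (copy 1),
    // then the hash operator copies the compacted rows into its own
    // structures (copy 2).
    static int[] twoCopies(int[] values, int[] sv2) {
        int[] compacted = new int[sv2.length];
        for (int i = 0; i < sv2.length; i++) {
            compacted[i] = values[sv2[i]];      // copy 1: remover
        }
        int[] hashed = new int[compacted.length];
        for (int i = 0; i < compacted.length; i++) {
            hashed[i] = compacted[i];           // copy 2: hash operator
        }
        return hashed;
    }

    // Proposed: the hash operator indirects through the selection
    // vector itself, so each surviving row is copied exactly once.
    static int[] oneCopy(int[] values, int[] sv2) {
        int[] hashed = new int[sv2.length];
        for (int i = 0; i < sv2.length; i++) {
            hashed[i] = values[sv2[i]];
        }
        return hashed;
    }

    public static void main(String[] args) {
        int[] values = {5, 6, 7, 8};
        int[] sv2 = {0, 2};                     // rows that passed the filter
        System.out.println(java.util.Arrays.equals(
            twoCopies(values, sv2), oneCopy(values, sv2)));
    }
}
```

Both paths produce the same compacted batch; the second simply skips the intermediate materialization that the SelectionVectorRemover performs today.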



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7012) Make SelectionVectorRemover project only the needed columns

2019-01-28 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-7012:
---

 Summary: Make SelectionVectorRemover project only the needed 
columns
 Key: DRILL-7012
 URL: https://issues.apache.org/jira/browse/DRILL-7012
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Relational Operators, Query Planning & Optimization
Affects Versions: 1.15.0
Reporter: Boaz Ben-Zvi


   A SelectionVectorRemover is often used after a Filter, to copy only the 
rows that passed the filter into a newly allocated batch. In some cases the 
columns used by the filter are not needed downstream; currently these columns 
are being needlessly allocated and copied, and later removed by a Project.

  _+Suggested improvement+_: The planner can pass the information about these 
columns to the SelectionVectorRemover, which would avoid this useless 
allocation and copy. The Planner would also eliminate that Project from the 
plan.

   Here is an example, the query:
{code:java}
select max(l_quantity) from cp.`tpch/lineitem.parquet` L where L.l_orderkey > 
58999 and L.l_shipmode = 'TRUCK' group by l_linenumber ;
{code}
And the result plan (trimmed for readability), where "l_orderkey" and 
"l_shipmode" are removed by the Project:
{noformat}
00-00 Screen : rowType = RecordType(ANY EXPR$0): 
 00-01 Project(EXPR$0=[$0]) : rowType = RecordType(ANY EXPR$0): 
 00-02 Project(EXPR$0=[$1]) : rowType = RecordType(ANY EXPR$0): 
 00-03 HashAgg(group=[\{0}], EXPR$0=[MAX($1)]) : rowType = RecordType(ANY 
l_linenumber, ANY EXPR$0): 
 00-04 *Project*(l_linenumber=[$2], l_quantity=[$3]) : rowType = RecordType(ANY 
l_linenumber, ANY l_quantity): 
 00-05 *SelectionVectorRemover* : rowType = RecordType(ANY *l_orderkey*, ANY 
*l_shipmode*, ANY l_linenumber, ANY l_quantity): 
 00-06 *Filter*(condition=[AND(>($0, 58999), =($1, 'TRUCK'))]) : rowType = 
RecordType(ANY l_orderkey, ANY l_shipmode, ANY l_linenumber, ANY l_quantity): 
 00-07 Scan(table=[[cp, tpch/lineitem.parquet]], groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]], 
selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, 
usedMetadataFile=false, columns=[`l_orderkey`, `l_shipmode`, `l_linenumber`, 
`l_quantity`]]]) : rowType = RecordType(ANY l_orderkey, ANY l_shipmode, ANY 
l_linenumber, ANY l_quantity):
{noformat}
The implementation will not be simple, as the relevant code (e.g., 
GenericSV2Copier) has no notion of individual columns.
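A sketch of the idea (hypothetical names; plain int[][] columns stand in for value vectors): the compacting copy takes a "needed columns" mask from the planner and never materializes filter-only columns.

```java
public class ProjectNeededColumns {
    // columns[c][row] holds the data; sv2 lists surviving row indices;
    // needed[c] says whether column c is required downstream.
    static int[][] copySelected(int[][] columns, int[] sv2, boolean[] needed) {
        int[][] out = new int[columns.length][];
        for (int c = 0; c < columns.length; c++) {
            if (!needed[c]) continue;           // skip filter-only columns
            out[c] = new int[sv2.length];
            for (int i = 0; i < sv2.length; i++) {
                out[c][i] = columns[c][sv2[i]]; // copy surviving rows only
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] batch = {{1, 2, 3}, {10, 20, 30}}; // two columns, three rows
        int[][] out = copySelected(batch, new int[]{0, 2},
                                   new boolean[]{false, true});
        System.out.println(out[0] == null);        // filter-only column dropped
        System.out.println(out[1][0] + "," + out[1][1]);
    }
}
```

In the query above, the mask would mark `l_orderkey` and `l_shipmode` as not needed, eliminating both their copy and the downstream Project.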



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Apache Drill Hangout - 22 Jan, 2019

2019-01-21 Thread Boaz Ben-Zvi


  Seems that the Apache mail server messed up the hangout link; here it is 
again:

 https://meet.google.com/yki-iqdf-tai?authuser=3 

   -- Boaz

On 2019/01/22 04:47:41, Boaz Ben-Zvi  wrote: 
>  Hi Drillers,
>    The bi-weekly Apache Drill hangout is scheduled for tomorrow, Tuesday, Jan 
> 22nd, at 10 AM PST. The original plan was for Arina to talk about Schema 
> Provisioning.If there are any other topics or questions, feel free to reply 
> or raise during the hangout.
> The hangout link: 
> 
>     Thanks,
>           Boaz
> 


Apache Drill Hangout - 22 Jan, 2019

2019-01-21 Thread Boaz Ben-Zvi
 Hi Drillers,
   The bi-weekly Apache Drill hangout is scheduled for tomorrow, Tuesday, Jan 
22nd, at 10 AM PST. The original plan was for Arina to talk about Schema 
Provisioning.If there are any other topics or questions, feel free to reply or 
raise during the hangout.
The hangout link: 

    Thanks,
          Boaz


Re: [VOTE] Apache Drill release 1.15.0 - RC2

2018-12-27 Thread Boaz Ben-Zvi

  -- Verified gpg signature on source and binaries.

  -- Checked the checksum sha512 - matched.

  -- Downloaded source to Linux VM - full build and unit tests passed.

  -- On the Mac - Build and unit tests passed, except for the 
`drill_derby_test` in `contrib/storage-jdbc`, which also fails for 
1.14.0 on my Mac (so it is a local environment issue).


  -- Manually ran on both Mac and Linux, and checked the Web-UI: All my 
`semijoin` tests, and memory spilling tests for hash-join and hash-aggr. 
And a select number of large queries. All passed OK.


   ==>    +1 (binding)

  Thanks,

   Boaz

On 12/27/18 12:54 PM, Abhishek Girish wrote:

+1

- Brought up Drill in distributed mode on a 4 node cluster with MapR
platform - looks good!
- Ran regression tests from [6] - looks good!
- Ran unit tests with default & mapr profile - looks good!
- Basic sanity tests on Sqlline, Web UI - looks good!

[6] 
https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_mapr_drill-2Dtest-2Dframework=DwIBaQ=cskdkSMqhcnjZxdQVpwTXg=PqKay2uOMZUqopDRKNfBtZSlsp2meGOxWNAVHxHnXCk=7tE7GD3UydzyDZaH_H0xw7V_m-XWe0tj8frqvjH2h7w=Q8PqbATc4VPUWvGcy_V_7iSQu9uyi1iCqLV5v1Mg31k=

On Thu, Dec 27, 2018 at 11:12 AM Aman Sinha  wrote:


- Downloaded source from [3] onto my Linux VM, built and ran unit tests.  I
had to run some test suites individually but got a clean run.
- Verified extraneous directory issue (DRILL-6916) is resolved
- Built the source using MapR profile and ran the secondary indexing tests
within mapr format plugin
- Downloaded binary tar ball from [3] on my Mac.  Verified checksum of the
file using shasum -a 512 *file *and comparing with the one on [3]
- Verified Vitalii's signature through the following command:  gpg --verify
Downloads/apache-drill-1.15.0.tar.gz.asc apache-drill-1.15.0.tar.gz
- Ran Drill in embedded mode and ran a few TPC-H queries.  Checked query
profiles through Web UI

LGTM.   +1

Aman

On Thu, Dec 27, 2018 at 6:17 AM Denys Ordynskiy 
wrote:


- downloaded source code, successfully built Drill with mapr profile;
- run Drill in distributed mode on Ubuntu on JDK8;
- connected from Drill Explorer, explored data on S3 and MapRFS storage;
- submitted some tests for Drill Web UI and Drill Rest API.

+1

On Wed, Dec 26, 2018 at 8:40 PM Arina Ielchiieva 

wrote:

Build from source on Linux,  started in embedded mode, ran random

queries.

Downloaded tarball on Windows, started Drill in embedded mode, run

random

queries.
Check Web UI: Profiles, Options, Plugins sections.

Additionally checked:
- information_schema files table;
- new SqlLine version;
- JDBC using Squirrel;
- ODBC using Drill Explorer;
- return result set option.

+1 (binding)

Kind regards,
Arina

On Wed, Dec 26, 2018 at 8:32 PM Volodymyr Vysotskyi <

volody...@apache.org>

wrote:


- Downloaded built tar, checked signatures and hashes for built and

source

tars
and for jars;
- run Drill in embedded mode on both Ubuntu and Windows on JDK8 and

JDK11;

- created views, submitted random TPCH queries from UI and SqlLine,

checked

that profiles are displayed correctly;
- downloaded source tar, ran unit tests and all tests are passed;
- built with mapr profile, started in distributed mode, submitted

several

tests for hive tables, checked logs, no errors are found;
- connected from SQuirrel, ran several queries, tested
exec.query.return_result_set_for_ddl
option;
- checked metadata correctness for decimal results;
- ran several queries from a Java application;
- built native client and submitted several queries.

+1 (binding)

Kind regards,
Volodymyr Vysotskyi


On Mon, Dec 24, 2018 at 9:27 PM Vitalii Diravka 
wrote:


Hi all,

I'd like to propose the second release candidate (rc2) of Apache

Drill,

version 1.15.0.

Changes since the previous release candidate: fixed the

show-stoppers:

DRILL-6919: Error: cannot find symbol in class ServerSocketUtil
DRILL-6920: Fix TestClient.testBasics() yarn test failure
DRILL-6922: QUERY-level options are shown on Profiles tab
DRILL-6925: Unable to generate Protobuf


The release candidate covers a total of 205 resolved JIRAs [1],

[2].

Thanks to everyone who contributed to this release.

The tarball artifacts are hosted at [3] and the maven artifacts are

hosted

at [4].

This release candidate is based on commit
8743e8f1e8d5bca4d67c94d07a8560ad356ff2b6 located at [5].

Please download and try out the release.

The vote ends at 7:00am UTC (11:00am PDT, 9:00pm EET, 12:30am (next

day)

IST), Dec
28, 2018. It is one day longer, since 12/25/2018 is a holiday.

[ ] +1
[ ] +0
[ ] -1

Here's my vote: +1

   [1]

https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_projects_DRILL_versions_12343317=DwIBaQ=cskdkSMqhcnjZxdQVpwTXg=PqKay2uOMZUqopDRKNfBtZSlsp2meGOxWNAVHxHnXCk=7tE7GD3UydzyDZaH_H0xw7V_m-XWe0tj8frqvjH2h7w=8UC4l_5h1brNiQ4MXawvPduc09kbi9k0wxN_fdwWwxk=

   [2]




Re: [VOTE] Apache Drill release 1.15.0 - RC2

2018-12-27 Thread Boaz Ben-Zvi

 Hi Karthik,

    I also see a (different) failure on the Mac when running the 
'contrib/storage-jdbc' test:


   [ERROR] Error starting the server for database 'drill_derby_test'.

However this test runs cleanly on Linux, so it has to do with my Mac 
environment. The same failures show on the older 1.14 as well.


To verify that yours is an environment issue too, try building 1.14 (see 
[1]) and then run the same test there:


mvn integration-test failsafe:integration-test -pl contrib/storage-jdbc

That 1.14 should have similar failures. (The above form of mvn is needed 
as those tests are disabled in 1.14.)

   Thanks,

    Boaz

[1] http://home.apache.org/~boaz/drill/releases/1.14.0/rc3/


On 12/27/18 1:46 PM, Karthikeyan Manivannan wrote:

Hi,

I am seeing a connection failure in a drill-jdbc-storage test on my Mac
with 1.15 rc2 . Is this because of some config issue on my Mac?

"[DEBUG] Configuring mojo org.codehaus.mojo:sql-maven-plugin:1.5:execute
from plugin realm ClassRealm[plugin>org.codehaus.mojo:sql-maven-plugin:1.5,
parent: sun.misc.Launcher$AppClassLoader@42a57993]
[DEBUG] Configuring mojo 'org.codehaus.mojo:sql-maven-plugin:1.5:execute'
with basic configurator -->
[DEBUG]   (f) autocommit = false
[DEBUG]   (s) delimiter = ;
[DEBUG]   (s) delimiterType = normal
[DEBUG]   (s) driver = com.mysql.cj.jdbc.Driver
[DEBUG]   (f) enableAnonymousPassword = false
[DEBUG]   (f) enableFiltering = false
[DEBUG]   (s) encoding = UTF-8
[DEBUG]   (s) escapeProcessing = true
[DEBUG]   (s) basedir =
/Users/karthik/test/drill/release/1.15/apache-drill-1.15.0-src/contrib/storage-jdbc/src/test/resources
[DEBUG]   (s) includes = [mysql-test-data.sql]
[DEBUG]   (f) fileset = org.codehaus.mojo.sql.Fileset@51e1e058
[DEBUG]   (f) forceMojoExecution = false
[DEBUG]   (s) keepFormat = false
[DEBUG]   (f) mavenSession =
org.apache.maven.execution.MavenSession@2af46afd
[DEBUG]   (s) onError = abort
[DEBUG]   (s) orderFile = ascending
[DEBUG]   (f) outputDelimiter = ,
[DEBUG]   (s) password = root
[DEBUG]   (s) printResultSet = false
[DEBUG]   (f) project = MavenProject:
org.apache.drill.contrib:drill-jdbc-storage:1.15.0 @
/Users/karthik/test/drill/release/1.15/apache-drill-1.15.0-src/contrib/storage-jdbc/pom.xml
[DEBUG]   (f) settings = org.apache.maven.execution.SettingsAdapter@23e0c200
[DEBUG]   (f) skip = false
[DEBUG]   (f) skipOnConnectionError = false
[DEBUG]   (s) url = jdbc:mysql://localhost:58278/drill_mysql_test
[DEBUG]   (s) username = root
[DEBUG] -- end configuration --
[DEBUG] connecting to jdbc:mysql://localhost:58278/drill_mysql_test
[INFO]

[INFO] Reactor Summary:
[INFO]
[INFO] contrib/jdbc-storage-plugin  FAILURE [01:00
min]
[INFO] contrib/hive-storage-plugin/Parent Pom . SKIPPED
[INFO] contrib/hive-storage-plugin/hive-exec-shaded ... SKIPPED
[INFO] contrib/mapr-format-plugin . SKIPPED
[INFO] contrib/hive-storage-plugin/core ... SKIPPED
[INFO] contrib/kafka-storage-plugin ... SKIPPED
[INFO] contrib/drill-udfs . SKIPPED
[INFO] Packaging and Distribution Assembly  SKIPPED
[INFO]

[INFO] BUILD FAILURE
[INFO]

[INFO] Total time: 01:01 min
[INFO] Finished at: 2018-12-27T13:18:50-08:00
[INFO] Final Memory: 303M/964M
[INFO]

[ERROR] Failed to execute goal
org.codehaus.mojo:sql-maven-plugin:1.5:execute (create-tables) on project
drill-jdbc-storage: Communications link failure
[ERROR]
[ERROR] The last packet sent successfully to the server was 0 milliseconds
ago. The driver has not received any packets from the server. Connection
refused (Connection refused)
[ERROR] -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute
goal org.codehaus.mojo:sql-maven-plugin:1.5:execute (create-tables) on
project drill-jdbc-storage: Communications link failure

The last packet sent successfully to the server was 0 milliseconds ago. The
driver has not received any packets from the server.
at
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212)
at
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
at
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
at
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
at
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
at
org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
at
org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
at 

[jira] [Created] (DRILL-6915) Unit test mysql-test-data.sql in contrib/jdbc-storage-plugin fails on newer MacOS

2018-12-19 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6915:
---

 Summary: Unit test mysql-test-data.sql in 
contrib/jdbc-storage-plugin fails on newer MacOS
 Key: DRILL-6915
 URL: https://issues.apache.org/jira/browse/DRILL-6915
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - JDBC
Affects Versions: 1.14.0
 Environment: MacOS, either High Sierra (10.13) or Mojave (10.14).

 
Reporter: Boaz Ben-Zvi


The newer MacOS file systems (10.13 and above) are case-insensitive by default. 
This leads to the following unit test failure:
{code:java}
~/drill > mvn clean install -rf :drill-jdbc-storage
[INFO] Scanning for projects...
[INFO] 
[INFO] Detecting the operating system and CPU architecture
[INFO] 
[INFO] os.detected.name: osx
[INFO] os.detected.arch: x86_64
[INFO] os.detected.version: 10.14
.
[INFO] 
[INFO] Building contrib/jdbc-storage-plugin 1.15.0-SNAPSHOT
[INFO] 
.
[INFO] >> 2018-12-19 15:11:32 7136 [Warning] Setting lower_case_table_names=2 
because file system for __drill/contrib/storage-jdbc/target/mysql-data/data/ is 
case insensitive
.
[ERROR] Failed to execute:
create table CASESENSITIVETABLE (
a BLOB,
b BLOB
)
[INFO] 
[INFO] Reactor Summary:
[INFO]
[INFO] contrib/jdbc-storage-plugin  FAILURE [01:30 min]
...
[ERROR] Failed to execute goal org.codehaus.mojo:sql-maven-plugin:1.5:execute 
(create-tables) on project drill-jdbc-storage: Table 'casesensitivetable' 
already exists -> [Help 1]{code}
The failure occurs because the test file *mysql-test-data.sql* creates +both+ 
tables *caseSensitiveTable* and *CASESENSITIVETABLE*.
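A hypothetical guard (not part of the plugin's actual test suite; the class and method names are invented) could probe the file system's case sensitivity before creating tables whose names differ only by case:

```java
import java.io.File;
import java.io.IOException;

public class CaseSensitivityProbe {
    // Probe whether the default temp directory's file system treats names
    // that differ only by case as the same file (the macOS default behavior).
    public static boolean isCaseInsensitive() throws IOException {
        File lower = File.createTempFile("casesensitivetable", ".probe");
        try {
            File upper = new File(lower.getParentFile(),
                lower.getName().toUpperCase());
            return upper.exists();   // same file on a case-insensitive FS
        } finally {
            lower.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("caseInsensitive=" + isCaseInsensitive());
    }
}
```

A test could skip the case-sensitive table creation when the probe returns true.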



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6888) Nested classes in HashAggTemplate break the plain Java for debugging codegen

2018-12-07 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6888:
---

 Summary: Nested classes in HashAggTemplate break the plain Java 
for debugging codegen
 Key: DRILL-6888
 URL: https://issues.apache.org/jira/browse/DRILL-6888
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Boaz Ben-Zvi
Assignee: Boaz Ben-Zvi


The *prefer_plain_java* compile option is useful for debugging of generated 
code.

  DRILL-6719 ("separate spilling logic for Hash Agg") introduced two nested 
classes into the HashAggTemplate class. However, those nested classes cause the 
prefer_plain_java compile option to fail when compiling the generated code, 
like:
{code:java}
Error: SYSTEM ERROR: CompileException: File 
'/tmp/janino5709636998794673307.java', Line 36, Column 35: No applicable 
constructor/method found for actual parameters 
"org.apache.drill.exec.test.generated.HashAggregatorGen11$HashAggSpilledPartition";
 candidates are: "protected 
org.apache.drill.exec.physical.impl.aggregate.HashAggTemplate$BatchHolder 
org.apache.drill.exec.physical.impl.aggregate.HashAggTemplate.injectMembers(org.apache.drill.exec.physical.impl.aggregate.HashAggTemplate$BatchHolder)"
{code}
+The proposed fix+: Move those nested classes outside HashAggTemplate.
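As a hypothetical illustration of the fix (names are invented, not Drill's actual classes): a top-level helper is a single shared type, whereas a nested class would be referenced through a template-qualified inner name (`...Template$Helper`) that the plain-Java compilation path fails to resolve against the generated subclass.

```java
// Top-level helper: generated subclasses and the template see one shared
// type, instead of an inner name re-rooted under the generated class.
class SpilledPartitionSketch {
    final int partitionId;
    SpilledPartitionSketch(int id) { this.partitionId = id; }
}

public class TemplateSketch {
    // Stand-in for a template method taking the helper as a parameter.
    public static int restore(SpilledPartitionSketch sp) {
        return sp.partitionId;
    }

    public static void main(String[] args) {
        System.out.println(restore(new SpilledPartitionSketch(3))); // 3
    }
}
```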



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6881) Hash-Table insert and probe: Compare hash values before keys

2018-12-04 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6881:
---

 Summary: Hash-Table insert and probe: Compare hash values before 
keys
 Key: DRILL-6881
 URL: https://issues.apache.org/jira/browse/DRILL-6881
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Boaz Ben-Zvi
Assignee: Boaz Ben-Zvi
 Fix For: 1.16.0


  When checking for existence of a key in the hash table (during _put_ or 
_probe_ operations), the value of that key is compared (using generated code) 
with a potential match key (same bucket). 
   This comparison is somewhat expensive (e.g., long keys, multi-column keys, 
checking null conditions, NaN, etc.). Instead, if the hash values of the two 
keys are compared first (at practically zero cost), then the costly comparison 
can be avoided whenever the hash values don't match.
 This code change is trivial, and given that the relevant Hash-Table code is 
*hot code*, even minute improvements could add up.
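A hypothetical sketch of the idea in plain Java (not Drill's generated code; class and method names are invented): cache each key's hash value and reject on a hash mismatch before invoking the expensive comparison.

```java
import java.util.Objects;

public class HashFirstCompare {
    static int costlyCompares = 0;

    // Stand-in for the generated (expensive) multi-column key comparison.
    public static boolean keysMatch(String a, String b) {
        costlyCompares++;
        return Objects.equals(a, b);
    }

    // Compare the cached hash values first; only fall through to the
    // costly key comparison when the hashes agree.
    public static boolean matches(String probeKey, int probeHash,
                                  String storedKey, int storedHash) {
        if (probeHash != storedHash) {
            return false;                 // practically zero-cost rejection
        }
        return keysMatch(probeKey, storedKey);
    }

    public static void main(String[] args) {
        String probe = "customer-42";
        String[] bucket = {"customer-7", "customer-13", "customer-42"};
        int found = 0;
        for (String stored : bucket) {
            if (matches(probe, probe.hashCode(), stored, stored.hashCode())) {
                found++;
            }
        }
        // Only bucket entries whose hash collides with the probe's reach
        // the expensive comparison.
        System.out.println("found=" + found
            + " costlyCompares=" + costlyCompares);
    }
}
```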



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6880) Hash-Join: Many null keys on the build side form a long linked chain in the Hash Table

2018-12-04 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6880:
---

 Summary: Hash-Join: Many null keys on the build side form a long 
linked chain in the Hash Table
 Key: DRILL-6880
 URL: https://issues.apache.org/jira/browse/DRILL-6880
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Boaz Ben-Zvi
 Fix For: 1.16.0


When building the Hash Table for the Hash-Join, each new key is matched with an 
existing key (same bucket) by calling the generated method 
`isKeyMatchInternalBuild`, which compares the two. However, when both keys are 
null, the method returns *false* (meaning not equal, i.e. it is a new key), 
thus the new key is added into the list following the old key. When a third 
null key is found, it is matched against the prior two, and added as well, 
and so on.

This way, N null values would perform on the order of N^2 / 2 comparisons.

Suggested improvement: The generated code should return a third result, meaning 
"two null keys". Then in case of Inner or Left joins all the duplicate nulls 
can be discarded.
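The suggested three-way result can be sketched as follows (hypothetical plain Java, not Drill's generated comparator; the enum and method names are invented):

```java
public class NullAwareMatch {
    public enum MatchResult { MATCH, NO_MATCH, BOTH_NULL }

    // Today the generated comparator effectively returns NO_MATCH for two
    // nulls, chaining every new null key after the old ones; the third
    // result lets the caller discard the duplicate null instead.
    public static MatchResult compareKeys(Integer newKey, Integer storedKey) {
        if (newKey == null && storedKey == null) {
            return MatchResult.BOTH_NULL;
        }
        if (newKey == null || storedKey == null) {
            return MatchResult.NO_MATCH;
        }
        return newKey.equals(storedKey)
            ? MatchResult.MATCH : MatchResult.NO_MATCH;
    }

    public static void main(String[] args) {
        // An inner or left join can drop the incoming duplicate null row.
        System.out.println(compareKeys(null, null)); // BOTH_NULL
        System.out.println(compareKeys(null, 5));    // NO_MATCH
        System.out.println(compareKeys(5, 5));       // MATCH
    }
}
```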

Below is a simple example; note the time difference between the non-null and 
the all-nulls joins (instrumentation also showed that for nulls, the method 
above was called 1,249,975,000 times!)
{code:java}
0: jdbc:drill:zk=local> use dfs.tmp;
0: jdbc:drill:zk=local> create table test as (select cast(null as int) mycol 
from 
 dfs.`/data/test128M.tbl` limit 5);
0: jdbc:drill:zk=local> create table test1 as (select cast(1 as int) mycol1 
from 
 dfs.`/data/test128M.tbl` limit 6);
0: jdbc:drill:zk=local> create table test2 as (select cast(2 as int) mycol2 
from dfs.`/data/test128M.tbl` limit 5);
0: jdbc:drill:zk=local> select count(*) from test1 join test2 on test1.mycol1 = 
test2.mycol2;
+-+
| EXPR$0  |
+-+
| 0   |
+-+
1 row selected (0.443 seconds)
0: jdbc:drill:zk=local> create table test1 as (select cast(1 as int) mycol1 
from dfs.`/data/test128M.tbl` limit 6);
+---++
| Fragment  | Number of records written  |
+---++
| 0_0   | 6  |
+---++
1 row selected (0.517 seconds)
0: jdbc:drill:zk=local> select count(*) from test1 join test on test1.mycol1 = 
test.mycol;
+-+
| EXPR$0  |
+-+
| 0   |
+-+
1 row selected (140.098 seconds)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Drill support for SQLPad

2018-11-29 Thread Boaz Ben-Zvi
   Just got it to run on my Mac; looks nice (though the results are 
flushed to the left, like  55_  instead of _55 )


Thanks Charles for making SQLPad work with Drill !

And for anyone else wanting to try (on a Mac), here are the steps used:

$ git clone https://github.com/cgivre/sqlpad.git

$ cd sqlpad/

$ git checkout drill

$ curl -o- 
https://raw.githubusercontent.com/creationix/nvm/v0.33.11/install.sh | bash


$ source ~/.bashrc    ### to get the nvm in the path

<< install "node" from https://nodejs.org/en/download/ >>

$ sudo npm i npm -g   ### need sudo for write access to 
/usr/local/lib/node_modules


$ npm install ### had some errors, so ran the following

$ npm audit fix ### still has some errors downloading fsevents-binaries

$ npm start

At this point, it opens your browser and connects to localhost:3000 .

Start Drill in embedded mode, and in the browser configure a Drill 
connection ( to 127.0.0.1:8048 , don't care about user/password).


Run queries ..

    Thanks,

   Boaz


On 11/29/18 7:26 AM, Charles Givre wrote:

All,
There is a really nice open source tool out there called SQLPad.  In addition 
to executing basic SQL queries, SQLPad enables you to export results and produce 
basic visualizations.  Until recently, SQLPad did not support Drill; however, I 
just wrote a first attempt at Drill support, which you can download here:

https://github.com/cgivre/sqlpad/tree/drill


Please check it out and let me know what you think.
Best,
— C




Re: Hangout Discussion Topics

2018-11-26 Thread Boaz Ben-Zvi
    I can present the list of Performance Projects (this was scheduled 
for the Developers Day two weeks ago, but was set aside for the lack of 
time then).


We can dive deeper into any specific project, or discuss a couple of 
general mechanisms that may be needed (preview: these are "shared 
memory" and "pass planner information to the operators").


  Thanks,

   Boaz

On 11/26/18 10:29 AM, Vitalii Diravka wrote:

Hi All,

Does anyone have any topics to discuss during the hangout tomorrow?

Kind regards
Vitalii





[jira] [Created] (DRILL-6864) Root POM: Update the git-commit-id plugin

2018-11-20 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6864:
---

 Summary: Root POM: Update the git-commit-id plugin
 Key: DRILL-6864
 URL: https://issues.apache.org/jira/browse/DRILL-6864
 Project: Apache Drill
  Issue Type: Improvement
  Components: Tools, Build & Test
Affects Versions: 1.14.0
Reporter: Boaz Ben-Zvi
Assignee: Boaz Ben-Zvi
 Fix For: 1.15.0


   The Maven git-commit-id plugin is of version 2.1.9, which is 4.5 years old. 
Executing this plugin seems to take a significant portion of the mvn build 
time. Newer versions run more than twice as fast (see below).

  Suggestion: Upgrade to the latest (2.2.5), to shorten the Drill mvn build 
time.

Here are the run times with our *current (2.1.9)* version:
{code:java}
[INFO]   git-commit-id-plugin:revision (for-jars) . [25.320s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [24.255s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [22.821s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [32.889s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [34.557s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [26.085s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [46.135s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [72.811s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [45.956s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [18.223s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [19.841s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [50.146s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [30.993s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [32.839s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [33.852s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [23.562s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [25.333s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [24.737s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [19.098s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [46.245s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [40.350s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [34.610s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [78.756s]
[INFO]   git-commit-id-plugin:revision (for-source-tarball) ... [52.551s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [10.940s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [24.573s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [24.404s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [43.501s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [25.041s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [39.149s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [40.310s]
{code}
And here are the run times with a newer (2.2.4) version:
{code:java}
[INFO]   git-commit-id-plugin:revision (for-jars) . [6.964s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [18.732s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [7.441s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [8.146s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [6.404s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [7.837s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [9.788s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [9.136s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [19.607s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [9.289s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [8.046s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [8.268s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [7.868s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [10.750s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [8.558s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [11.267s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [15.696s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [9.446s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [6.187s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [24.806s]
[INFO]   git-commit-id-plugin:revision (for-jars) . [14.591s]
[INFO]   git-commit-id

[jira] [Created] (DRILL-6861) Hash-Join: Spilled partitions are skipped following an empty probe side

2018-11-19 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6861:
---

 Summary: Hash-Join: Spilled partitions are skipped following an 
empty probe side
 Key: DRILL-6861
 URL: https://issues.apache.org/jira/browse/DRILL-6861
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Boaz Ben-Zvi
Assignee: Boaz Ben-Zvi
 Fix For: 1.15.0


     Following DRILL-6755 (_Avoid building a hash table when the probe side is 
empty_) - The special case of an empty spilled probe-partition was not handled. 
 When such a case happens, the Hash-Join terminates early (returns NONE) and 
the remaining partitions are not processed/returned (which may lead to 
incorrect results).

  A test case - force tpcds/query95 to spill :
{code:java}
0: jdbc:drill:zk=local> alter system set `exec.hashjoin.max_batches_in_memory` 
= 40;
+---+---+
|  ok   |summary|
+---+---+
| true  | exec.hashjoin.max_batches_in_memory updated.  |
+---+---+
1 row selected (1.325 seconds)
0: jdbc:drill:zk=local> WITH ws_wh AS
. . . . . . . . . . . > (
. . . . . . . . . . . >SELECT ws1.ws_order_number,
. . . . . . . . . . . >   ws1.ws_warehouse_sk wh1,
. . . . . . . . . . . >   ws2.ws_warehouse_sk wh2
. . . . . . . . . . . >FROM   dfs.`/data/tpcds/sf1/parquet/web_sales` 
ws1,
. . . . . . . . . . . >   dfs.`/data/tpcds/sf1/parquet/web_sales` 
ws2
. . . . . . . . . . . >WHERE  ws1.ws_order_number = ws2.ws_order_number
. . . . . . . . . . . >ANDws1.ws_warehouse_sk <> 
ws2.ws_warehouse_sk)
. . . . . . . . . . . > SELECT
. . . . . . . . . . . >  Count(DISTINCT ws1.ws_order_number) AS `order 
count` ,
. . . . . . . . . . . >  Sum(ws1.ws_ext_ship_cost)   AS `total 
shipping cost` ,
. . . . . . . . . . . >  Sum(ws1.ws_net_profit)  AS `total 
net profit`
. . . . . . . . . . . > FROM dfs.`/data/tpcds/sf1/parquet/web_sales` ws1 ,
. . . . . . . . . . . >  dfs.`/data/tpcds/sf1/parquet/date_dim` dd,
. . . . . . . . . . . >  dfs.`/data/tpcds/sf1/parquet/customer_address` 
ca,
. . . . . . . . . . . >  dfs.`/data/tpcds/sf1/parquet/web_site` wbst
. . . . . . . . . . . > WHEREdd.d_date BETWEEN '2000-04-01' AND  (
. . . . . . . . . . . >   Cast('2000-04-01' AS DATE) + INTERVAL 
'60' day)
. . . . . . . . . . . > AND  ws1.ws_ship_date_sk = dd.d_date_sk
. . . . . . . . . . . > AND  ws1.ws_ship_addr_sk = ca.ca_address_sk
. . . . . . . . . . . > AND  ca.ca_state = 'IN'
. . . . . . . . . . . > AND  ws1.ws_web_site_sk = wbst.web_site_sk
. . . . . . . . . . . > AND  wbst.web_company_name = 'pri'
. . . . . . . . . . . > AND  ws1.ws_order_number IN
. . . . . . . . . . . >  (
. . . . . . . . . . . > SELECT ws_wh.ws_order_number
. . . . . . . . . . . > FROM   ws_wh)
. . . . . . . . . . . > AND  ws1.ws_order_number IN
. . . . . . . . . . . >  (
. . . . . . . . . . . > SELECT wr.wr_order_number
. . . . . . . . . . . > FROM   
dfs.`/data/tpcds/sf1/parquet/web_returns` wr,
. . . . . . . . . . . >ws_wh
. . . . . . . . . . . > WHERE  wr.wr_order_number = 
ws_wh.ws_order_number)
. . . . . . . . . . . > ORDER BY count(DISTINCT ws1.ws_order_number)
. . . . . . . . . . . > LIMIT 100;
+--+--+-+
| order count  | total shipping cost  |  total net profit   |
+--+--+-+
| 17   | 38508.1305   | 20822.3   |
+--+--+-+
1 row selected (105.621 seconds)
{code}
The correct results should be:
{code:java}
+--+--+-+
| order count  | total shipping cost  |  total net profit   |
+--+--+-+
| 34   | 63754.72 | 15919.0098  |
+--+--+-+
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6860) SqlLine: EXPLAIN produces very long header lines

2018-11-15 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6860:
---

 Summary: SqlLine: EXPLAIN produces very long header lines
 Key: DRILL-6860
 URL: https://issues.apache.org/jira/browse/DRILL-6860
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - CLI
Affects Versions: 1.14.0
Reporter: Boaz Ben-Zvi
Assignee: Arina Ielchiieva
 Fix For: 1.15.0


Maybe a result of upgrading to SqlLine 1.5.0 (DRILL-3853 - PR #1462), the 
header dividing lines displayed when using EXPLAIN became very long:

{code}
0: jdbc:drill:zk=local> explain plan for select count(*) from 
dfs.`/data/tpcds/sf1/parquet/date_dim`;
+-+---+
|   
 text   
  | 





json





  |
+-+---+
| 00-00Screen
00-01  Project(EXPR$0=[$0])
00-02DirectScan(groupscan=[files = 
[/data/tpcds/sf1/parquet/date_dim/0_0_0.parquet], numFiles = 1, 
DynamicPojoRecordReader{records = [[73049]]}])
  | {
  "head" : {
"version" : 1,
"generator" : {
  "type" : "ExplainHandler",
  "info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"options" : [ {
  "kind" : "BOOLEAN",
  "accessibleScopes" : "ALL",
  "name" : "planner.enable_nljoin_for_scalar_only",
  "bool_val" : true,
  "scope" : "SESSION"
} ],
"queue" : 0,
"hasResourcePlan" : false,
"resultMode&

[jira] [Created] (DRILL-6859) BETWEEN dates with a slightly malformed DATE string returns false

2018-11-15 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6859:
---

 Summary: BETWEEN dates with a slightly malformed DATE string 
returns false
 Key: DRILL-6859
 URL: https://issues.apache.org/jira/browse/DRILL-6859
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.14.0
Reporter: Boaz Ben-Zvi
 Fix For: Future


(This may be a Calcite issue.)

In the following query using BETWEEN with dates, the "month" is specified as 
"4", instead of "04", which causes the BETWEEN clause to evaluate to FALSE. 
Note that rewriting the clause with less-than etc. does work correctly.
{code:java}
0: jdbc:drill:zk=local> select count(*) from `date_dim` dd where dd.d_date 
BETWEEN '2000-4-01' and ( Cast('2000-4-01' AS DATE) + INTERVAL '60' day) ;
+-+
| EXPR$0  |
+-+
| 0   |
+-+
1 row selected (0.184 seconds)
0: jdbc:drill:zk=local> select count(*) from `date_dim` dd where dd.d_date 
BETWEEN '2000-04-01' and ( Cast('2000-4-01' AS DATE) + INTERVAL '60' day) limit 
10;
+-+
| EXPR$0  |
+-+
| 61  |
+-+
1 row selected (0.209 seconds)
0: jdbc:drill:zk=local> select count(*) from `date_dim` dd where dd.d_date >= 
'2000-4-01' and dd.d_date <= '2000-5-31';
+-+
| EXPR$0  |
+-+
| 61  |
+-+
1 row selected (0.227 seconds)
{code}

The physical plan for the second (good) case implements the BETWEEN clause with 
a FILTER on top of the scanner. For the first (failed) case, there is a "limit 
0" on top of the scanner.

(This query was extracted from TPC-DS 95, used over Parquet files).
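As an illustration with the JDK's java.time (not Drill/Calcite internals), a strict ISO date parser likewise rejects the non-zero-padded month:

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

public class StrictDateParse {
    public static void main(String[] args) {
        // The well-formed literal parses as expected.
        System.out.println(LocalDate.parse("2000-04-01"));
        // The month "4" is not zero-padded, so strict ISO parsing fails --
        // consistent with the planner failing to match the literal above.
        try {
            LocalDate.parse("2000-4-01");
        } catch (DateTimeParseException e) {
            System.out.println("rejected: 2000-4-01");
        }
    }
}
```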




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-6798) Planner changes to support semi-join

2018-11-15 Thread Boaz Ben-Zvi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boaz Ben-Zvi resolved DRILL-6798.
-
Resolution: Fixed

Commit ID 71809ca6216d95540b2a41ce1ab2ebb742888671

 

> Planner changes to support semi-join
> 
>
> Key: DRILL-6798
> URL: https://issues.apache.org/jira/browse/DRILL-6798
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Query Planning & Optimization
>Affects Versions: 1.14.0
>    Reporter: Boaz Ben-Zvi
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.15.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6845) Eliminate duplicates for Semi Hash Join

2018-11-12 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6845:
---

 Summary: Eliminate duplicates for Semi Hash Join
 Key: DRILL-6845
 URL: https://issues.apache.org/jira/browse/DRILL-6845
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Boaz Ben-Zvi
Assignee: Boaz Ben-Zvi
 Fix For: 1.15.0


Following DRILL-6735: The performance of the new Semi Hash Join may degrade if 
the build side contains an excessive number of join-key-duplicate rows; this is 
mainly a result of the need to store all those rows first, before the hash 
table is built.

  Proposed solution: For Semi, the Hash Agg would create a Hash-Table 
initially, and use it to eliminate key-duplicate rows as they arrive.

  Proposed extra: That Hash-Table has an added cost (e.g. resizing). So perform 
"runtime stats" – check the initial number of incoming rows (e.g. 32K), and if 
the number of duplicates is less than some threshold (e.g. 20%) – cancel that 
"early" hash table.
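The runtime-stats heuristic can be sketched as follows (hypothetical Java; the sample size, threshold, and names are illustrative, not Drill's):

```java
import java.util.HashSet;
import java.util.Set;

public class EarlyDedup {
    // Returns true when the early hash table pays off: at least a
    // `threshold` fraction of the first `sampleSize` rows were duplicates
    // (and were therefore dropped on arrival).
    public static boolean keepEarlyTable(int[] keys, int sampleSize,
                                         double threshold) {
        Set<Integer> earlyTable = new HashSet<>();
        int sampled = 0, duplicates = 0;
        for (int key : keys) {
            if (!earlyTable.add(key)) {
                duplicates++;            // row dropped: key already seen
            }
            if (++sampled == sampleSize) {
                break;                   // sampling window complete
            }
        }
        return (double) duplicates / sampled >= threshold;
    }

    public static void main(String[] args) {
        int[] manyDuplicates = {1, 2, 2, 3, 1, 1, 4, 2, 1, 1};
        // 6 of 10 sampled rows are duplicates, well above the 20% threshold.
        System.out.println(keepEarlyTable(manyDuplicates, 10, 0.20)); // true
    }
}
```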

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6836) Eliminate StreamingAggr for COUNT DISTINCT

2018-11-08 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6836:
---

 Summary: Eliminate StreamingAggr for COUNT DISTINCT
 Key: DRILL-6836
 URL: https://issues.apache.org/jira/browse/DRILL-6836
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Relational Operators, Query Planning & 
Optimization
Affects Versions: 1.14.0
Reporter: Boaz Ben-Zvi
Assignee: Boaz Ben-Zvi
 Fix For: 1.16.0


The COUNT DISTINCT operation is often implemented with a Hash-Aggr operator for 
the DISTINCT, and a Streaming-Aggr above to perform the COUNT.  That 
Streaming-Aggr does the counting like any aggregation, counting each value, 
batch after batch.

  While very efficient, that counting work is basically not needed, as the 
Hash-Aggr already knows the number of distinct values (in the in-memory 
partitions).

  Hence _a possible small performance improvement_ - eliminate the 
Streaming-Aggr operator, and notify the Hash-Aggr to return a COUNT (these are 
Planner changes). The Hash-Aggr operator would need to generate the single 
Float8 column output schema, and output that batch with a single value, just 
like the Streaming-Aggr did (likely without generating code).

  In case of a spill, the Hash-Aggr still needs to read and process those 
partitions, to get the exact distinct number.

   The expected improvement is the elimination of the batch by batch output 
from the Hash-Aggr, and the batch by batch, row by row processing of the 
Streaming-Aggr.
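The core of the idea can be sketched as follows (hypothetical Java, not Drill's operator code; the partition sizes are illustrative): since each in-memory partition owns a disjoint set of distinct keys, the COUNT is just a sum.

```java
public class DirectDistinctCount {
    // Each in-memory partition owns a disjoint set of keys, so the total
    // distinct count is the sum of the partitions' key counts -- no
    // batch-by-batch, row-by-row counting needed downstream.
    public static long countDistinct(int[] partitionKeyCounts) {
        long total = 0;
        for (int c : partitionKeyCounts) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] partitionKeyCounts = {12000, 9500, 11250, 10300}; // illustrative
        System.out.println(countDistinct(partitionKeyCounts)); // 43050
    }
}
```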



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Handling schema change in blocking operators

2018-11-06 Thread Boaz Ben-Zvi

 Hi Paul,

(_a_)  Having a "schema file" sounds like a contradiction to calling Drill 
"schema free"; maybe we could "sweep it under the mat" by creating a new 
convention for scanners, such that if a scanner has multiple files to 
read (e.g. f1.csv, f2.csv, ...), then if there's some file named 
"MeFirst.csv", it would always be read first !! (With some option to 
skip some of the rows there, like "MeFirst0.csv" means skip all the rows).


(_b_) If the schema (hint) is kept somewhere, could it be updated 
automatically by the executing query? If so, running again a query that 
failed with "schema change" may succeed the second time. If there is an issue 
with permissions, maybe each user can keep such a cache in ~/.drill ...


(_c_) Indeed we can't have a general "schema change" solution; however 
we can focus on the low-hanging fruit, namely "schema evolution". In 
many cases, the change in the schema is "natural", and we could easily 
adapt the blocking operator. Cases like:


   * Column added

   * Fields added in a Json

   * Numeric "enlargement", like INT --> BIGINT, or INT --> DECIMAL, etc.

   * Non-Nullable to Nullable.

Further ideas:

- A blocking operator has a notion of the current schema; once the 
schema "evolves", it can either "pause and convert all the old ones", 
or work lazily -- just track the old ones, and make changes as needed 
(e.g., work with two sets of generated code).


- As these changes are rare, we could restrict to handling only "one 
active change at a time"


- Memory management could be an issue (with "pause and convert"), but 
may be simple if the computation starts using the newer bigger batch 
size (for "lazy").


- We should distinguish between "key" columns, and "non-key" columns 
(for Sort / Hash-Join) or "value" columns in the Hash-Agg. One 
possibility for the Hash operators is to have some hash function 
compatibility, like  HashFunc( INT 567 ) == HashFunc( BIGINT 567 ), to 
simplify (and avoid rehashing).
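The hash-compatibility idea can be illustrated as follows (hypothetical Java, not Drill's hash functions): widen numeric keys to a common 64-bit representation before hashing, so the same value hashes identically before and after an INT to BIGINT evolution.

```java
public class WideningHash {
    // Hash the 64-bit widening of the numeric key, whatever the incoming
    // type was, so the hash survives an INT -> BIGINT schema change and
    // no rehashing of existing entries is required.
    public static int hashKey(long widened) {
        return Long.hashCode(widened);
    }

    public static void main(String[] args) {
        int asInt = 567;       // value read before the schema change
        long asBigint = 567L;  // same value after INT -> BIGINT evolution
        System.out.println(hashKey(asInt) == hashKey(asBigint)); // true
    }
}
```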


    Thanks,

 Boaz

On 11/6/18 12:25 PM, Paul Rogers wrote:

HI Aman,

I would completely agree with the analysis -- except for the fact that we can't 
create a general solution, only a patchwork of incomplete ad-hoc solutions. The 
question is not whether it would be useful to have a general solution (it 
would), rather whether it is technically possible without some help from the 
user (it is not, IMHO.)

I like the scenario presented; it gives us a concrete example. Let's say an IoT 
device produced files with an evolving schema. A field in a JSON file started 
as BIGINT, later became DOUBLE, and finally became VARCHAR. What should Drill 
do? Maybe the values are:
1
1.1
1.33

The change of types might represent the idea that the above are money amounts, 
and the only way to represent values exactly is with a string (in JSON) and 
with a DECIMAL in Drill.

Or, maybe the values are:
1
1.1
1.1rev3

This shows that the value is a version string. Early developers thought to 
use an integer, later they wanted minor versions, and even later they realized 
they needed a patch value. The correct type is VARCHAR.

One can also invent a scenario in which the proper type is BIGINT, DOUBLE or 
even TIMESTAMP.

Since Drill can't know the user's intention, we can invest quite a bit of 
effort and still not solve the problem.

What is the alternative?

Suppose we simply let the query fail when we see a schema change, but we point 
the user to a solution:

Query failed: Schema conflict on column `foo`: BIGINT and DOUBLE.
Use a schema file to resolve the ambiguity.
See 
http://drill.apache.org/docs/schema-file for more information.

Now, the user is in control: we stated what we can and cannot do and gave the 
user the option to decide on the data type.

This is a special case of other use cases: it works just as well for specifying 
CSV types, refining JSON types and so on. A single solution that solves 
multiple problems.

This approach also solves the problem that the JDBC and ODBC clients can't 
handle a schema that changes during processing. (The native Drill client can, 
which is a rather cool feature. xDBC hasn't caught up, so we have to deal with 
them as they are.)

In fact, Drill could then say: if your data is nice and clean, query it without 
a schema since the data speaks for itself. If, however, your data is messy (as 
real-word data tends to be), just provide a schema to explain the intent and 
Drill will do the right thing.

And, again, if the team tried the schema solution first, you'd be in a much 
better position to see what additional benefits could be had by trying to guess 
the type (and solving the time-travel issue.) (This is the lazy approach: do 
the least amount of work...)

In fact, it may turn out that schema 

[jira] [Created] (DRILL-6799) Enhance the Hash-Join Operator to perform Anti-Semi-Join

2018-10-16 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6799:
---

 Summary: Enhance the Hash-Join Operator to perform Anti-Semi-Join
 Key: DRILL-6799
 URL: https://issues.apache.org/jira/browse/DRILL-6799
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Relational Operators, Query Planning & 
Optimization
Affects Versions: 1.14.0
Reporter: Boaz Ben-Zvi
Assignee: Boaz Ben-Zvi
 Fix For: 1.16.0


Similar to handling Semi-Join (see DRILL-6735), the Anti-Semi-Join can be 
enhanced by eliminating the extra DISTINCT (i.e. Hash-Aggr) operator.

Example (note the NOT IN):
select c.c_first_name, c.c_last_name from dfs.`/data/json/s1/customer` c where 
c.c_customer_sk NOT IN (select s.ss_customer_sk from 
dfs.`/data/json/s1/store_sales` s) limit 4;



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6798) Planner changes to support semi-join

2018-10-16 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6798:
---

 Summary: Planner changes to support semi-join
 Key: DRILL-6798
 URL: https://issues.apache.org/jira/browse/DRILL-6798
 Project: Apache Drill
  Issue Type: Sub-task
Affects Versions: 1.14.0
Reporter: Boaz Ben-Zvi
Assignee: Hanumath Rao Maduri
 Fix For: 1.15.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Multi Commit PRs (Re: Drill Hangout tomorrow 09/18)

2018-09-25 Thread Boaz Ben-Zvi
   More on splitting a PR into multiple commits - link [1] below shows 
how to take the last commit and break it (thanks Hanumath.)


I just practiced this method on a PR (1480 - see [2]); this separates 
the actual logic of the change from the less relevant definitions, 
cleanups, etc.


This does require careful manual work from the developer; e.g., if two 
changes are adjacent (i.e. become a single "hunk"), then you need to 
select the "e" option and edit that "hunk".


An open question: must we eventually squash those multiple commits, or 
would it work better to keep them as separate commits in master?


    Thanks,

 Boaz

[1] 
https://stackoverflow.com/questions/1440050/how-to-split-last-commit-into-two-in-git/1440200


[2] https://github.com/apache/drill/pull/1480/commits

On 9/24/18 1:49 PM, Jyothsna Reddy wrote:

Notes from the Hangout session

Attendes:
Jyothsna, Boaz, Sorabh, Arina, Bohdan, Ihor, Hanumath, Pritesh, Vitali,
Kunal, Robert

Interesting thing shared by Boaz : All the minor fragments are assigned to
Drillbits in round robin fashion and not in a sequential order.

Boaz brought up the topic of improving the quality of code reviews:
Topic of the Hangout: How do we improve the process of code review?

It is very difficult for a reviewer to do a code review if he/she
doesn't know the context, and hard to review a PR that contains too many
code changes.

Ideas to improve the code review process:

- One idea is to break the commits into smaller commits so that each
commit is coherent, keeping the refactoring changes in a separate
commit. But it's hard for developers to separate changes into multiple
commits if they are too deeply tangled. Although this creates more work for
developers, it makes the reviewer's job easier. It also helps in
finding bugs at earlier stages.
- It would be easier if someone could find the ways Git allows splitting
commits. Hanumath had tried this earlier.
- Mandate check style before code review; it shouldn't be the code
reviewer's job to point out style issues.
- Bring a reviewer early on into the code review process rather than
dumping a large code change all at once.
- Push smaller commits into master if they make sense.
- Do some live code review sessions where external contributors and
reviewer can have discussions related to pull requests in a hangout.
- Don't squash the commits unless needed.
- Reviewers should give a full set of comments in one go, and there
shouldn't be more than 4-5 rounds of code review.
- Check style should cover spacing and the like, and developers
should try to use the IntelliJ IDE and pay attention to its warnings.
- It's helpful for reviewers if developers provide screenshots of the UI for
UI changes, and attach before-and-after code if changes are made to code
generators.

Please feel free to add to the above if you have any ideas to
improve the code review process.


Thank you,
Jyothsna


On Mon, Sep 17, 2018 at 12:55 PM Jyothsna Reddy 
wrote:


The Apache Drill Hangout will be held tomorrow at 10:00am PST; please let
us know should you have a topic for tomorrow's hangout. We will also ask
for topics at the beginning of the hangout.

Hangout Link -
https://hangouts.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc

Thank you,
Jyothsna





[jira] [Created] (DRILL-6758) Hash Join should not return the join columns when they are not needed downstream

2018-09-21 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6758:
---

 Summary: Hash Join should not return the join columns when they 
are not needed downstream
 Key: DRILL-6758
 URL: https://issues.apache.org/jira/browse/DRILL-6758
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Relational Operators, Query Planning  
Optimization
Affects Versions: 1.14.0
Reporter: Boaz Ben-Zvi
Assignee: Hanumath Rao Maduri
 Fix For: 1.15.0


Currently the Hash-Join operator returns all of its incoming columns (from both 
sides). In cases where the join columns are not used further downstream, this is a 
waste (allocating vectors, copying each value, etc.).

  Suggestion: Have the planner pass this information to the Hash-Join operator, 
to enable skipping the return of these columns.
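To illustrate the suggestion, here is a hedged sketch in plain Python (illustrative names and row-of-dicts data, not Drill's vector-based implementation) of a hash-join probe that materializes only the columns needed downstream, never copying the join-key columns into the output:

```python
# Hypothetical sketch (not Drill code) of a hash-join probe that copies
# only the columns needed downstream, skipping the join-key columns.
def hash_join_project(build_rows, probe_rows, build_key, probe_key, out_cols):
    # Build phase: hash table mapping join key -> matching build-side rows.
    table = {}
    for row in build_rows:
        table.setdefault(row[build_key], []).append(row)
    # Probe phase: on each match, materialize only `out_cols` -- the join
    # keys are never allocated or copied into the output.
    out = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            merged = {**match, **row}
            out.append({c: merged[c] for c in out_cols})
    return out

customers = [{"c_sk": 1, "name": "Ann"}, {"c_sk": 2, "name": "Bob"}]
sales = [{"ss_sk": 1, "amt": 10}, {"ss_sk": 1, "amt": 20}]
rows = hash_join_project(customers, sales, "c_sk", "ss_sk", ["name", "amt"])
print(rows)  # [{'name': 'Ann', 'amt': 10}, {'name': 'Ann', 'amt': 20}]
```

In Drill's columnar execution the saving would be larger than this row-based sketch suggests: whole value vectors for the key columns need not be allocated or copied at all.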

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6735) Enhance the Hash-Join Operator to perform Semi and Anti-Semi joins

2018-09-07 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6735:
---

 Summary: Enhance the Hash-Join Operator to perform Semi and 
Anti-Semi joins
 Key: DRILL-6735
 URL: https://issues.apache.org/jira/browse/DRILL-6735
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Relational Operators, Query Planning  
Optimization
Affects Versions: 1.14.0
Reporter: Boaz Ben-Zvi
Assignee: Boaz Ben-Zvi
 Fix For: 1.15.0


Currently Drill implements Semi-Join (see DRILL-402) by using a regular join, 
with a DISTINCT operator upstream on the build side to eliminate duplicates. 
Typically a physical plan for the Semi uses a hash-join, with a hash-aggr 
performing the DISTINCT (see example below). 
    This effectively builds the same hash table(s) twice - a big waste of time 
and memory.

+Improvement+: Eliminate the Hash-Aggr from the plan, and notify the Hash-Join 
to perform a Semi-join. The HJ would then just skip the duplicates in its hash 
table(s), thus performing a Semi-Join.
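The proposed behavior can be sketched as follows (illustrative Python, not Drill's implementation): when the join runs in semi-join mode, the build side keeps at most one entry per key, so duplicates are skipped without a separate DISTINCT/Hash-Agg pass, and each probe row is emitted at most once:

```python
# Hypothetical sketch of a hash join running in semi-join mode: a set on
# the build side naturally skips duplicate keys, removing the need for a
# separate DISTINCT (Hash-Agg) operator in the plan.
def hash_semi_join(build_keys, probe_rows, probe_key):
    seen = set(build_keys)          # duplicates on the build side are skipped
    # Each probe row is emitted at most once, if its key found a match.
    return [row for row in probe_rows if row[probe_key] in seen]

store_sales_keys = [7, 7, 9, 9, 9]  # build side with duplicate customer keys
customers = [
    {"c_customer_sk": 7, "c_first_name": "Ann"},
    {"c_customer_sk": 8, "c_first_name": "Bob"},
]
result = hash_semi_join(store_sales_keys, customers, "c_customer_sk")
print(result)  # [{'c_customer_sk': 7, 'c_first_name': 'Ann'}]
```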

Example: 
{code}
select c.c_first_name, c.c_last_name from dfs.`/data/json/s1/customer` c where 
c.c_customer_sk in (select s.ss_customer_sk from 
dfs.`/data/json/s1/store_sales` s) limit 4;
{code}

And the result plan (see the HJ at 01-03, and the Hash Agg at 01-05):
{code}
00-00Screen : rowType = RecordType(ANY c_first_name, ANY c_last_name): 
rowcount = 4.0, cumulative cost = {4693752.96 rows, 2.309557672003E7 cpu, 
0.0 io, 2.1598011392E9 network, 3.589586176005E7 memory}, id = 1320
00-01  Project(c_first_name=[$1], c_last_name=[$2]) : rowType = 
RecordType(ANY c_first_name, ANY c_last_name): rowcount = 4.0, cumulative cost 
= {4693752.56 rows, 2.309557632004E7 cpu, 0.0 io, 2.1598011392E9 network, 
3.589586176005E7 memory}, id = 1319
00-02Project(c_customer_sk=[$1], c_first_name=[$2], c_last_name=[$3], 
ss_customer_sk=[$0]) : rowType = RecordType(ANY c_customer_sk, ANY 
c_first_name, ANY c_last_name, ANY ss_customer_sk): rowcount = 4.0, cumulative 
cost = {4693748.56 rows, 2.309556832004E7 cpu, 0.0 io, 2.1598011392E9 
network, 3.589586176005E7 memory}, id = 1318
00-03  SelectionVectorRemover : rowType = RecordType(ANY 
ss_customer_sk, ANY c_customer_sk, ANY c_first_name, ANY c_last_name): rowcount 
= 4.0, cumulative cost = {4693744.56 rows, 2.309555232004E7 cpu, 0.0 io, 
2.1598011392E9 network, 3.589586176005E7 memory}, id = 1317
00-04Limit(fetch=[4]) : rowType = RecordType(ANY ss_customer_sk, 
ANY c_customer_sk, ANY c_first_name, ANY c_last_name): rowcount = 4.0, 
cumulative cost = {4693740.56 rows, 2.309554832004E7 cpu, 0.0 io, 
2.1598011392E9 network, 3.589586176005E7 memory}, id = 1316
00-05  UnionExchange : rowType = RecordType(ANY ss_customer_sk, ANY 
c_customer_sk, ANY c_first_name, ANY c_last_name): rowcount = 4.0, cumulative 
cost = {4693736.56 rows, 2.309553232004E7 cpu, 0.0 io, 2.1598011392E9 
network, 3.589586176005E7 memory}, id = 1315
01-01SelectionVectorRemover : rowType = RecordType(ANY 
ss_customer_sk, ANY c_customer_sk, ANY c_first_name, ANY c_last_name): rowcount 
= 4.0, cumulative cost = {4693732.56 rows, 2.309550032004E7 cpu, 0.0 io, 
2.1597356032E9 network, 3.589586176005E7 memory}, id = 1314
01-02  Limit(fetch=[4]) : rowType = RecordType(ANY 
ss_customer_sk, ANY c_customer_sk, ANY c_first_name, ANY c_last_name): rowcount 
= 4.0, cumulative cost = {4693728.56 rows, 2.309549632004E7 cpu, 0.0 io, 
2.1597356032E9 network, 3.589586176005E7 memory}, id = 1313
01-03HashJoin(condition=[=($1, $0)], joinType=[inner]) : 
rowType = RecordType(ANY ss_customer_sk, ANY c_customer_sk, ANY c_first_name, 
ANY c_last_name): rowcount = 90182.8, cumulative cost = {4693724.56 rows, 
2.309548032004E7 cpu, 0.0 io, 2.1597356032E9 network, 3.589586176005E7 
memory}, id = 1312
01-05  HashAgg(group=[{0}]) : rowType = RecordType(ANY 
ss_customer_sk): rowcount = 18036.56, cumulative cost = {4509140.0 rows, 
2.182423760005E7 cpu, 0.0 io, 1.4775549952E9 network, 3.491878016004E7 
memory}, id = 1309
01-06Project(ss_customer_sk=[$0]) : rowType = 
RecordType(ANY ss_customer_sk): rowcount = 180365.6, cumulative cost = 
{4328774.4 rows, 2.038131280004E7 cpu, 0.0 io, 1.4775549952E9 network, 
3.17443456E7 memory}, id = 1308
01-07  HashToRandomExchange(dist0=[[$0]]) : rowType = 
RecordType(ANY ss_customer_sk, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 
180365.6, cumulative cost = {4148408.83 rows, 2.020094720003E7 cpu, 
0.0 io, 1.4775549952E9 network, 3.17443456E7 memory}, id = 1307
02-01UnorderedMuxExchange : rowType = 
RecordType(ANY ss_customer_sk, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 
180365.6, cumulative cost

Re: [ANNOUNCE] New Committer: Weijie Tong

2018-08-31 Thread Boaz Ben-Zvi
   Congrats Weijie - and thanks for implementing the Bloom Filters for 
Drill.


 Boaz


On 8/31/18 1:04 PM, Aman Sinha wrote:

Congratulations Weijie ! Thanks for your contributions.

On Fri, Aug 31, 2018 at 11:58 AM salim achouche 
wrote:


Congrats  Weijie!

On Fri, Aug 31, 2018 at 10:28 AM Paul Rogers 
wrote:


Congratulations Weijie, thanks for your contributions to Drill.
Thanks,
- Paul



 On Friday, August 31, 2018, 8:51:30 AM PDT, Arina Ielchiieva <
ar...@apache.org> wrote:

  The Project Management Committee (PMC) for Apache Drill has invited

Weijie

Tong to become a committer, and we are pleased to announce that he has
accepted.

Weijie Tong has become a very active contributor to Drill in recent

months.

He contributed the Join predicate push down feature which will be

available

in Apache Drill 1.15. The feature is non trivial and has covered changes
to all aspects of Drill: RPC layer, Planning, and Execution.

Welcome Weijie, and thank you for your contributions!

- Arina
(on behalf of Drill PMC)




--
Regards,
Salim





[jira] [Resolved] (DRILL-6566) Jenkins Regression: TPCDS query 66 fails with RESOURCE ERROR: One or more nodes ran out of memory while executing the query. AGGR OOM at First Phase.

2018-08-22 Thread Boaz Ben-Zvi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boaz Ben-Zvi resolved DRILL-6566.
-
Resolution: Fixed
  Reviewer: Timothy Farkas

Commit ID 71c6c689a083e7496f06e99b4d253f11866ee741 

 

> Jenkins Regression: TPCDS query 66 fails with RESOURCE ERROR: One or more 
> nodes ran out of memory while executing the query.  AGGR OOM at First Phase.
> --
>
> Key: DRILL-6566
> URL: https://issues.apache.org/jira/browse/DRILL-6566
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Robert Hou
>Assignee: Boaz Ben-Zvi
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
> Attachments: drillbit.log.6566
>
>
> This is TPCDS Query 66.
> Query: tpcds/tpcds_sf1/hive-generated-parquet/hive1_native/query66.sql
> SELECT w_warehouse_name,
> w_warehouse_sq_ft,
> w_city,
> w_county,
> w_state,
> w_country,
> ship_carriers,
> year1,
> Sum(jan_sales) AS jan_sales,
> Sum(feb_sales) AS feb_sales,
> Sum(mar_sales) AS mar_sales,
> Sum(apr_sales) AS apr_sales,
> Sum(may_sales) AS may_sales,
> Sum(jun_sales) AS jun_sales,
> Sum(jul_sales) AS jul_sales,
> Sum(aug_sales) AS aug_sales,
> Sum(sep_sales) AS sep_sales,
> Sum(oct_sales) AS oct_sales,
> Sum(nov_sales) AS nov_sales,
> Sum(dec_sales) AS dec_sales,
> Sum(jan_sales / w_warehouse_sq_ft) AS jan_sales_per_sq_foot,
> Sum(feb_sales / w_warehouse_sq_ft) AS feb_sales_per_sq_foot,
> Sum(mar_sales / w_warehouse_sq_ft) AS mar_sales_per_sq_foot,
> Sum(apr_sales / w_warehouse_sq_ft) AS apr_sales_per_sq_foot,
> Sum(may_sales / w_warehouse_sq_ft) AS may_sales_per_sq_foot,
> Sum(jun_sales / w_warehouse_sq_ft) AS jun_sales_per_sq_foot,
> Sum(jul_sales / w_warehouse_sq_ft) AS jul_sales_per_sq_foot,
> Sum(aug_sales / w_warehouse_sq_ft) AS aug_sales_per_sq_foot,
> Sum(sep_sales / w_warehouse_sq_ft) AS sep_sales_per_sq_foot,
> Sum(oct_sales / w_warehouse_sq_ft) AS oct_sales_per_sq_foot,
> Sum(nov_sales / w_warehouse_sq_ft) AS nov_sales_per_sq_foot,
> Sum(dec_sales / w_warehouse_sq_ft) AS dec_sales_per_sq_foot,
> Sum(jan_net)   AS jan_net,
> Sum(feb_net)   AS feb_net,
> Sum(mar_net)   AS mar_net,
> Sum(apr_net)   AS apr_net,
> Sum(may_net)   AS may_net,
> Sum(jun_net)   AS jun_net,
> Sum(jul_net)   AS jul_net,
> Sum(aug_net)   AS aug_net,
> Sum(sep_net)   AS sep_net,
> Sum(oct_net)   AS oct_net,
> Sum(nov_net)   AS nov_net,
> Sum(dec_net)   AS dec_net
> FROM   (SELECT w_warehouse_name,
> w_warehouse_sq_ft,
> w_city,
> w_county,
> w_state,
> w_country,
> 'ZOUROS'
> || ','
> || 'ZHOU' AS ship_carriers,
> d_year AS year1,
> Sum(CASE
> WHEN d_moy = 1 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS jan_sales,
> Sum(CASE
> WHEN d_moy = 2 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS feb_sales,
> Sum(CASE
> WHEN d_moy = 3 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS mar_sales,
> Sum(CASE
> WHEN d_moy = 4 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS apr_sales,
> Sum(CASE
> WHEN d_moy = 5 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS may_sales,
> Sum(CASE
> WHEN d_moy = 6 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS jun_sales,
> Sum(CASE
> WHEN d_moy = 7 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS jul_sales,
> Sum(CASE
> WHEN d_moy = 8 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS aug_sales,
> Sum(CASE
> WHEN d_moy = 9 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS sep_sales,
> Sum(CASE
> WHEN d_moy = 10 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS oct_sales,
> Sum(CASE
> WHEN d_moy = 11 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS nov_sales,
> Sum(CASE
> WHEN d_moy = 12 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END)  AS dec_sales,
> Sum(CASE
>

Re: [ANNOUNCE] New PMC member: Boaz Ben-Zvi

2018-08-17 Thread Boaz Ben-Zvi

   Thank you all for the greetings and nice words

  Boaz

On 8/17/18 6:09 PM, salim achouche wrote:

Congrats Boaz!

Regards,
Salim


On Aug 17, 2018, at 2:33 PM, Robert Wu  wrote:

Congratulations, Boaz!

Best regards,

Rob

-Original Message-
From: Abhishek Girish 
Sent: Friday, August 17, 2018 2:17 PM
To: dev 
Subject: Re: [ANNOUNCE] New PMC member: Boaz Ben-Zvi

Congratulations, Boaz!

On Fri, Aug 17, 2018 at 2:15 PM Sorabh Hamirwasia 
wrote:


Congratulations Boaz!

On Fri, Aug 17, 2018 at 11:42 AM, Karthikeyan Manivannan <
kmanivan...@mapr.com> wrote:


Congrats! Well deserved!

On Fri, Aug 17, 2018, 11:31 AM Timothy Farkas  wrote:


Congrats!

On Fri, Aug 17, 2018 at 11:27 AM, Gautam Parai 

wrote:

Congratulations Boaz!!

Gautam

On Fri, Aug 17, 2018 at 11:04 AM, Khurram Faraaz


wrote:

Congratulations Boaz.

On Fri, Aug 17, 2018 at 10:47 AM, shi.chunhui <
shi.chun...@aliyun.com.invalid> wrote:


Congrats Boaz!


--

Sender: Arina Ielchiieva
Sent at: 2018 Aug 17 (Fri) 17:51
To: dev; user
Subject: [ANNOUNCE] New PMC member: Boaz Ben-Zvi

I am pleased to announce that Drill PMC invited Boaz Ben-Zvi
to

the

PMC

and

he has accepted the invitation.

Congratulations Boaz and thanks for your contributions!

- Arina
(on behalf of Drill PMC)





Re: [DISCUSS] sqlline upgrade

2018-08-13 Thread Boaz Ben-Zvi

supports scrolling through multi-line SQL statements as single snippets in 
history


  I opened an issue (#73) for that back in April ( 
https://github.com/julianhyde/sqlline/issues/73 ), but no progress so 
far



On 8/13/18 10:15 AM, Abhishek Girish wrote:

+1. Need to try out (2) to understand its impact on usability. Okay with
(3). Not sure of the consequences of (4) - what will happen to options Drill
doesn't support?

Also, do we know if the latest sqlline supports scrolling through
multi-line SQL statements as single snippets in history?

On Mon, Aug 13, 2018 at 10:08 AM Arina Yelchiyeva <
arina.yelchiy...@gmail.com> wrote:


Hi all,

currently Apache Drill is using custom version of Sqlline [1]. It has some
custom fixes, the importance of which I propose to discuss.

*1. Isolation.*
Drill supports TRANSACTION_NONE only, while the default in Sqlline
is TRANSACTION_REPEATABLE_READ. Apparently because there was no way to
override the default (or for some other reason), setting the transaction
level is commented out in the custom version. After upgrading to the latest
version, Drill errors during connection because the default transaction
level is not acceptable, but this can be easily fixed by passing arguments
to Sqlline to change the defaults. An example is provided below.

*2.  Resizing of output.*
After upgrading to the latest version, output is weirdly resized. The custom
version has ResizingRowsProvider, which fixed this issue, but now it can be
easily fixed by passing incremental=false when calling Sqlline.

Example for points 1 and 2:
CMD="$JAVA $SHELL_OPTS -cp $CP sqlline.SqlLine -d
org.apache.drill.jdbc.Driver --maxWidth=1* --isolation=TRANSACTION_NONE
--incremental=false*"

I haven't noticed any other issues with Sqlline that might regress after
the upgrade. If I have missed something else, please feel free to correct
me.

*3. Output of Drill version at start up.*
Now:
*apache drill 1.15.0-SNAPSHOT *
*"got drill?"*

After the upgrade:
*sqlline version 1.4.0*
*0: jdbc:drill:zk=local>*

*4. Options that Drill did not support were commented out, so they are not
displayed in the help menu.*

If we upgrade, we'll lose the last two enhancements, though I don't think
they are crucial. Other projects like Apache Phoenix are doing fine without
them. Plus I think it's quite obvious why moving off the custom version is a
good choice.

Any thoughts?

[1] https://github.com/mapr/sqlline/commits/1.1.9-drill-r7

Kind regards,
Arina





Re: [ANNOUNCE] Apache Drill Release 1.14.0

2018-08-09 Thread Boaz Ben-Zvi
   Thanks Vlad for the advice and the link. Looking back at prior releases,
the 1.9.0 announcement (by Sudheesh) was also sent to the
annou...@apache.org list, but then he got feedback like "What's this project
about?" and "Why should I care?", so the following announcements avoided
the Apache general list.
   The link does suggest to "put 3-5 lines blurb for the project", which
could address such feedback.
 We should try and compose a good "blurb" first.

  Boaz


On Thu, Aug 9, 2018 at 11:14 AM, Vlad Rozov  wrote:

> I'd recommend announcing the release on the ASF-wide mailing list (
> annou...@apache.org) as well as there may be other community members who
> may get interested in the new functionality. Please see [1] for the
> requirements.
>
> Thank you,
>
> Vlad
>
> [1] http://www.apache.org/legal/release-policy.html#release-announcements
>
> On 2018/08/06 06:15:55, Charles Givre  wrote:
> > Thanks Boaz and great work everyone!
> >
> > Sent from my iPhone
> >
> > > On Aug 5, 2018, at 21:52, Abhishek Girish  wrote:
> > >
> > > Congratulations, everyone! And Boaz, thanks so much for coordinating
> the
> > > release.
> > >
> > > Folks, please try out 1.14 - it's our best release yet!
> > >
> > >> On Sat, Aug 4, 2018 at 11:35 PM Boaz Ben-Zvi  wrote:
> > >>
> > >> On behalf of the Apache Drill community, I am happy to announce the
> > >> release of Apache Drill 1.14.0.
> > >>
> > >> For information about Apache Drill, and to get involved, visit the
> project
> > >> website [1].
> > >>
> > >> This release of Drill provides many new features and
> > >> improvements:
> > >>
> > >> 
> =
> > >>
> > >> - Ability to run Drill in a Docker container. (DRILL-6346)
> > >>
> > >> - Ability to export and save your storage plugin configurations to a
> JSON
> > >> file for reuse. (DRILL-4580)
> > >>
> > >> - Ability to manage storage plugin configurations in the Drill
> > >> configuration file, storage-plugins-override.conf. (DRILL-6494)
> > >>
> > >> - Functions that return data type information. (DRILL-6361)
> > >>
> > >> - The Drill kafka storage plugin supports filter pushdown for query
> > >> conditions on certain Kafka metadata fields in messages. (DRILL-5977)
> > >>
> > >> - Spill to disk for the Hash Join operator. (DRILL-6027)
> > >>
> > >> - The dfs storage plugin supports a Logfile plugin extension that
> enables
> > >> Drill to directly read and query log files of any format. (DRILL-6104)
> > >>
> > >> - Phonetic and string distance functions. (DRILL-6519)
> > >>
> > >> - The store.hive.conf.properties option enables you to specify Hive
> > >> properties at the session level using the SET command. (DRILL-6575)
> > >>
> > >> - Drill can directly manage the CPU resources through the Drill
> start-up
> > >> script, drill-env.sh; you no longer have to manually add the PID to
> the
> > >> cgroup.procs file each time a Drillbit restarts. (DRILL-143)
> > >>
> > >> - Drill can query the metadata in various image formats with the image
> > >> metadata format plugin. (DRILL-4364)
> > >>
> > >> - Enhanced decimal data type support. (DRILL-6094)
> > >>
> > >> - Option to push LIMIT(0) on top of SCAN. (DRILL-6574)
> > >>
> > >> - Parquet filter pushdown improvements. (DRILL-6174)
> > >>
> > >> - Drill can infer filter conditions for join queries and push the
> filter
> > >> conditions down to the data source. (DRILL-6173)
> > >>
> > >> - Drill uses a native reader to read Hive tables when you enable the
> > >> store.hive.optimize_scan_with_native_readers option. When enabled,
> Drill
> > >> reads data faster and applies filter pushdown optimizations.
> (DRILL-6331)
> > >>
> > >> - Early release of lateral join. (DRILL-5999)
> > >>
> > >> ===
> >

[ANNOUNCE] Apache Drill Release 1.14.0

2018-08-05 Thread Boaz Ben-Zvi

On behalf of the Apache Drill community, I am happy to announce the release of 
Apache Drill 1.14.0.

For information about Apache Drill, and to get involved, visit the project 
website [1].

This release of Drill provides many new features and improvements:
=

- Ability to run Drill in a Docker container. (DRILL-6346)

- Ability to export and save your storage plugin configurations to a JSON file 
for reuse. (DRILL-4580)

- Ability to manage storage plugin configurations in the Drill configuration 
file, storage-plugins-override.conf. (DRILL-6494)

- Functions that return data type information. (DRILL-6361)

- The Drill kafka storage plugin supports filter pushdown for query conditions 
on certain Kafka metadata fields in messages. (DRILL-5977)

- Spill to disk for the Hash Join operator. (DRILL-6027)

- The dfs storage plugin supports a Logfile plugin extension that enables Drill 
to directly read and query log files of any format. (DRILL-6104)

- Phonetic and string distance functions. (DRILL-6519)

- The store.hive.conf.properties option enables you to specify Hive properties 
at the session level using the SET command. (DRILL-6575)

- Drill can directly manage the CPU resources through the Drill start-up 
script, drill-env.sh; you no longer have to manually add the PID to the 
cgroup.procs file each time a Drillbit restarts. (DRILL-143)

- Drill can query the metadata in various image formats with the image metadata 
format plugin. (DRILL-4364)

- Enhanced decimal data type support. (DRILL-6094)

- Option to push LIMIT(0) on top of SCAN. (DRILL-6574)

- Parquet filter pushdown improvements. (DRILL-6174)

- Drill can infer filter conditions for join queries and push the filter 
conditions down to the data source. (DRILL-6173)

- Drill uses a native reader to read Hive tables when you enable the 
store.hive.optimize_scan_with_native_readers option. When enabled, Drill reads 
data faster and applies filter pushdown optimizations. (DRILL-6331)

- Early release of lateral join. (DRILL-5999)

===

For the full list please see the release notes at [2].

The binary and source artifacts are available here [3].

1. https://drill.apache.org/
2. https://drill.apache.org/docs/apache-drill-1-14-0-release-notes/
3. https://drill.apache.org/download/

    Thanks to everyone in the community who contributed to this release!

   Boaz
 





Re: [RESULT] [VOTE] Apache Drill release 1.14.0 - RC3

2018-08-04 Thread Boaz Ben-Zvi

 Hi Vlad,

 Three of the PMC members actually tested RC3, which satisfies our 
bylaws. I did not want to clutter the RESULTS message with all the 
finesse of the RC2-RC3 differences.


   Boaz


On 8/4/18 8:07 AM, Vlad Rozov wrote:

Hi Boaz,

You may count only RC3 votes as cutting new RC invalidates any prior RC voting.

Thank you,

Vlad

On 2018/08/04 01:54:19, Boaz Ben-Zvi  wrote:

    The vote on RC3 for the Drill 1.14.0 release passed at last (sigh) !

Thanks to all the people who contributed to this release, and to all who
tested/validated/helped/commented/voted on the 1.14.0 release.

Final vote tally:

   +1 Binding:   Aman, Arina, Parth, Vitalii.

   +1 non-Binding:   Abhishek, Boaz, Charles, Karthik, Khurram, Kunal,
Sorabh, Volodymyr.

     0:   no one ...

   -1:   no one 

We will start now the process of promoting the release to production,
distribute the release to the Apache mirrors, update the documentation
with the new links, and finally announce the release.

This may take a day or two 

   Thanks,

     Boaz







[RESULT] [VOTE] Apache Drill release 1.14.0 - RC3

2018-08-03 Thread Boaz Ben-Zvi

  The vote on RC3 for the Drill 1.14.0 release passed at last (sigh) !

Thanks to all the people who contributed to this release, and to all who 
tested/validated/helped/commented/voted on the 1.14.0 release.


Final vote tally:

 +1 Binding:   Aman, Arina, Parth, Vitalii.

 +1 non-Binding:   Abhishek, Boaz, Charles, Karthik, Khurram, Kunal, 
Sorabh, Volodymyr.


   0:   no one ...

 -1:   no one 

We will start now the process of promoting the release to production, 
distribute the release to the Apache mirrors, update the documentation 
with the new links, and finally announce the release.


This may take a day or two 

 Thanks,

   Boaz




Re: [VOTE] Apache Drill release 1.14.0 - RC3

2018-08-03 Thread Boaz Ben-Zvi
 Have you tried running this test
(TestImpersonationMetadata.testShowFilesInWSWithOtherPermissionsForQueryUser())
in the IDE?


I tried to emulate this failure, but could not. The validation failure 
comes from getPlan() (in ShowFileHandler.java), where the drillSchema 
was not found. Maybe an issue with the query context ?


Anyway this failure does not happen for any other tester, so it should not 
be considered a blocker for the 1.14.0 release.


 Thanks,

   Boaz


On 8/3/18 4:03 PM, Jinfeng Ni wrote:
The failures were consistent, whether through "mvn clean install" or when
running the individual test case using maven.


On Fri, Aug 3, 2018 at 4:00 PM Abhishek Girish wrote:


I don't skip any tests specifically when I run them - and I don't see these
failures. So these tests should run by default. Not sure why they fail for
you - are they intermittent or consistent?

On Fri, Aug 3, 2018 at 3:57 PM Jinfeng Ni wrote:

> The test case seems to be different, the error seems to be different
> (permission error vs validation error) , also the machine I used is
> different. ( I used a new computer this time)
>
> I just want to check whether you guys run those test cases, or whether
> those test cases are skipped during the maven build process.
>
>
>
    > On Fri, Aug 3, 2018 at 3:47 PM Boaz Ben-Zvi wrote:
>
> >   Hi Jinfeng,
> >
> >      Interestingly, the last time this test failed was for you - on
> > March 2nd, 2017 (see below; on a local cluster node)
> >
> > The conclusion then was that this was one of
> > permission/configuration/timing issues.  Possibly same this time ?
> >
> >      Boaz
> >
> > =-=-=-=-=-=-=-=  3 / 2 / 2017 Jinfeng Ni  =-=-=-=-=-=-=
> >
> > Also, I'm seeing intermittent failures in unit test
> > TestInBoundImpersonation.  Seems to me the failure is caused by the
> > permission of the view file.
> >
> >
> > The failure for that testcase seems to be on node 104.58, but not on
> > 104.57. Is there something configured differently on the 104.58 node?
> >
> >
> > Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time
> > elapsed: 18.181 sec <<< FAILURE! - in
> > org.apache.drill.exec.impersonation.TestInboundImpersonation
> >
>

> > selectChainedView(org.apache.drill.exec.impersonation.TestInboundImpersonation)
> > Time elapsed: 0.182 sec  <<< ERROR!
> > java.lang.Exception:
> > org.apache.drill.common.exceptions.UserRemoteException: PERMISSION ERROR:
> > Not authorized to read view [u0_lineitem] in schema [miniDfsPlugin.user0_1]
> >
> >
> > On 8/3/18 2:52 PM, Jinfeng Ni wrote:
> > > I keep seeing multiple unit test failures in
TestImpersonationMetadata.
> > If
> > > I run the test case individually,  still hit the error.
> > >
> > > I notice those testcases are marked as "SlowTest",
"SecurityTest". Did
> > > people have a way to skip those tests, or those failures are
probably
> > > caused by environment setting?
> > >
> > > 05:44:32.626 [main] ERROR org.apache.drill.TestReporter - Test Failed (d:
> > > 16.0 KiB(1.0 MiB), h: 19.4 MiB(424.3 MiB), nh: 726.3 KiB(86.3 MiB)):
> > > testShowFilesInWSWithOtherPermissionsForQueryUser(org.apache.drill.exec.impersonation.TestImpersonationMetadata)
> > > org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR:
> > > Invalid FROM/IN clause [miniDfsPlugin.drillTestGrp0_755]
> > >
> > > Results :
> > >
> > > Tests in error:
> > >
> > >
> >
>
TestImpersonationMetadata.testShowFilesInWSWithOtherPermissionsForQueryUser
> > > » UserRemote
> > >
> > > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
> > >
> > >
> > >
> > >
> > > On Fri, Aug 3, 2018 at 2:22 PM, Karthikeyan Manivannan <
> > > kmanivan...@mapr.com> wrote:
> > >> Built drill from src tarball, on Linux.
> > >> Ran unit tests on Linux - 3 tests (in TestSimpleExternalSort) timed-out
> > >> but I guess it i

Re: [VOTE] Apache Drill release 1.14.0 - RC3

2018-08-03 Thread Boaz Ben-Zvi
- Downloaded the source tarball from [2] on my Linux VM, built and ran the
unit tests.  2 tests in 'TestUtf8SupportInQueryString' had errors but
passed when run independently.
- Downloaded the binary tarball from [2] onto my Macbook, untarred and ran
Drill in embedded mode
- Ran a few queries against a TPC-DS SF1 data set
- Examined the run-time query profiles of these queries with and without
parallelism.
- Checked the maven artifacts on [3].
- Checked the KEYS, README files

LGTM.  +1  (binding)
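Checks like the above (and the checksum verifications reported elsewhere on this thread) start from comparing the downloaded tarball against its published digest. A minimal sketch of that comparison in Python is below; the file name is a stand-in, and a real vote would compare against the published `.sha512` value from the staging area and also verify the PGP signature with `gpg --verify`:

```python
import hashlib

def sha512_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-512, as done when checking a release tarball."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# Stand-in artifact (a real check would download the staged tarball and
# its published .sha512 file from the RC staging area).
with open("apache-drill-1.14.0.tar.gz", "wb") as f:
    f.write(b"stand-in bytes for the release tarball")

# Normally `published` is read from the .sha512 file next to the tarball.
published = sha512_of("apache-drill-1.14.0.tar.gz")
ok = sha512_of("apache-drill-1.14.0.tar.gz") == published
print("checksum OK" if ok else "checksum MISMATCH")
```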

PS:  Regarding the regression reported by Sorabh, I presume this is not
tested through automated tests.  I cannot say what's the net impact of
that issue on the user base, but given the 230+ JIRAs fixed in this
release, I think the RC is pretty strong.



On Thu, Aug 2, 2018 at 11:20 PM Sorabh Hamirwasia <
shamirwa...@mapr.com> wrote:


Downloaded source from [4] and built and ran unit tests on linux and mac
environments.

- KafkaFilterPushdownTest is failing on mac environment and passing on
linux.  Same as DRILL-6625
- MongoDB storage plugin unit tests are failing on linux but passing on
mac for me.  [Platform dependent failure]
   - Is anyone else observing this ?

Downloaded tarball from [2] and installed Drillbit in distributed mode.

1. Tested graceful shutdown feature and it's not working when auth or
https is enabled.  [Regression]
   - Verified that it's working on 1.13.0 branch
2. Tested Kerberos and Plain mechanism with and without SASL encryption
enabled  [Pass]
   - Used sqlline for java client and querysubmitter to test for c++
   client.
3. Tested SSL encryption with and without cert/hostname verification
with a generated certificate. [Pass]
   - Used sqlline for java client and querysubmitter to test for c++
   client.

Based on above regression my vote is -1.

Thanks,
Sorabh


On Thu, Aug 2, 2018 at 9:53 PM, Abhishek Girish <
agir...@apache.org> wrote:


I am observing unit test failures in Kafka module - looks identical to
DRILL-6625 <https://issues.apache.org/jira/browse/DRILL-6625> - but only
intermittently (had multiple runs, but only one failed so far).  Is this
something to be investigated for the release?

On Thu, Aug 2, 2018 at 2:10 AM Volodymyr Vysotskyi <
volody...@apache.org> wrote:


- Downloaded source archive at [2], built and ran unit tests,
no issues were found.
- Downloaded built archive at [2], ran Drill in embedded mode,
created and queried views, submitted several TPC-DS queries from UI on sf1
data, checked that profiles are displayed correctly.
- Connected from SQuirrel to Drill in embedded mode using drill-jdbc-driver
from a built archive, ran several queries; ran queries from a java
application, no issues were found.
- Connected directly to drillbit on the cluster using drill-jdbc-driver
built with mapr profile, submitted several queries.
Have a problem with connecting to Drill on the cluster using zookeeper, but
looks like this is a config issue.

+1 (non-binding)

Kind regards,
Volodymyr Vysotskyi


On Thu, Aug 2, 2018 at 9:04 AM Khurram Faraaz <
kfar...@mapr.com> wrote:

- Downloaded tarballs and deployed binaries on 4 node cluster.
- Executed basic SQL queries from sqlline and from Web UI.
- Verified features on the Web UI.

Looks good.
+1 (non-binding)

On Wed, Aug 1, 2018 at 9:55 PM, Kunal Khatua <
ku...@apache.org> wrote:

Built from source and tried the binaries.

Tested spill-to-disk behavior, a couple of concurrent queries and general
UX checks. LGTM.

+1 (non-binding)
On 8/1/2018 4:58:08 PM, Boaz Ben-Zvi wrote:

Thanks Vlad for bringing these two points to our attention.

Therefore the vote on RC3 should be open till Friday, August 3rd, at 6
PM PDT.

And we should (sans any new issue) get enough PMC +1 votes on RC3 by
Friday.

Thanks,

Boaz


On 8/1/18 8:40 AM, Vlad Rozov wrote:

Apache release votes should be open for at least 72 hours [1] and every
new RC requires that PMC "Before voting +1 PMC members are required to
download the signed source code package, compile it as provided, and test
the resulting executable on their own platform, along with also verifying
that the package meets the requirements of the ASF policy on releases".

Thank you,

Vlad

[1]https://urldefense.proofpoint.com/v2/url?u=http-

3A__www.apache.org_legal_release-2Dpolicy.html=

DwIBaQ=

cskdkSMqhcnjZxdQV

Re: [VOTE] Apache Drill release 1.14.0 - RC3

2018-08-01 Thread Boaz Ben-Zvi

 Thanks Vlad for bringing these two points to our attention.

Therefore the vote on RC3 should be open till Friday, August 3rd, at 6 
PM PDT.


And we should (sans any new issue) get enough PMC +1 votes on RC3 by Friday.

   Thanks,

    Boaz


On 8/1/18 8:40 AM, Vlad Rozov wrote:

Apache release votes should be open for at least 72 hours [1] and every new RC requires 
that PMC "Before voting +1 PMC members are required to download the signed source 
code package, compile it as provided, and test the resulting executable on their own 
platform, along with also verifying that the package meets the requirements of the ASF 
policy on releases".
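The checksum half of that policy check is mechanical; a minimal sketch with GNU coreutils, assuming the release publishes a `<tarball>.sha512` companion file next to the tarball (file names are illustrative, and the PGP half is shown only as comments because it needs the signer's key from the project KEYS file):

```shell
#!/bin/sh
# Verify a release tarball against its .sha512 companion file
# (the file contains "<digest>  <tarball-name>", as sha512sum emits).
verify_sha512() {
    tarball=$1
    ( cd "$(dirname "$tarball")" && sha512sum -c "$(basename "$tarball").sha512" )
}

# The signature check would follow the same pattern (not run here):
#   gpg --import KEYS
#   gpg --verify "${tarball}.asc" "$tarball"
```

After both checks pass, "compile it as provided" is the usual `mvn clean install` from the unpacked source tree.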

Thank you,

Vlad

[1] http://www.apache.org/legal/release-policy.html

On 2018/07/31 22:09:04, Boaz Ben-Zvi  wrote:

   Hi RC reviewers and testers,

   There are a couple of RC2 minor issues that are flagged as errors
in the IDE (in Eclipse, and maybe in IntelliJ). See DRILL-6650 and
DRILL-6651 for details.

These two do *not* matter for the Maven build, or for testing the
tarballs, etc.  So if you started a long testing cycle with RC2, you may
continue.

I will produce a new RC3 soon to include the fixes for the above. (Note
that this RC3 would be *force-pushed* into branch 1.14.0 , thus erasing
the RC2 commit ID)
And if no one objects, the voting deadline would remain as is (Aug 2nd)
as the differences have a very minor impact.
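The force-push mechanics described above (re-point the published release branch at a new commit, erasing the old RC's commit ID) can be demonstrated on a throwaway repository. Everything below is a toy sketch - the repo paths, the amend standing in for cherry-picks, and the commit messages are illustrative:

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# A bare "origin" standing in for the public repo, plus a working clone.
git init -q --bare "$work/origin.git"
git clone -q "$work/origin.git" "$work/repo"
cd "$work/repo"
git config user.email rm@example.org
git config user.name "Release Manager"

# Publish the RC2 commit on the release branch.
echo rc2 > release.txt
git add release.txt
git commit -qm "RC2"
git push -q origin HEAD:1.14.0
rc2_id=$(git rev-parse HEAD)

# Fold the fixes in (amend here stands in for cherry-picking the real
# fix commits), then overwrite the published branch: the RC2 commit ID
# is no longer reachable from it.
git commit -q --amend -m "RC3"
git push -q --force origin HEAD:1.14.0
rc3_id=$(git rev-parse HEAD)

test "$rc2_id" != "$rc3_id" && echo "branch 1.14.0 now points at RC3"
```

In the real flow the amend would be one or more `git cherry-pick` commands for the DRILL-6650/DRILL-6651 fixes before the `git push --force origin 1.14.0`.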

     Thanks,

   Boaz


On 7/30/18 3:57 PM, Boaz Ben-Zvi wrote:

  Hi Drillers,

Proposing the *third* Release Candidate (RC2) for the Apache Drill,
version 1.14.0 .

This RC2 includes 235 committed Jiras [1]. Thanks to all the Drill
developers who worked hard and contributed to this release.

The RC2 tarballs are hosted at [2] , and the Maven artifacts are at [3].

This Release Candidate is based on (Apache Drill branch named
"1.14.0") commit ID: 4da8aff88966adee5d7438024a826bb599450a6f ,
available at [4].

Please download and try/test this Release Candidate.

Given that our bylaws require 3 business days, the vote would end on
Thursday, August 2nd, 2018 at 5:00 PM PDT .

[ ] +1
[ ] +0
[ ] -1

  My vote is +1 !!

    Thank you,

Boaz


[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12342097

[2] http://home.apache.org/~boaz/drill/releases/1.14.0/rc2/

[3]
https://repository.apache.org/content/repositories/orgapachedrill-1050

[4] https://github.com/apache/drill/tree/1.14.0

   OR

https://github.com/Ben-Zvi/drill/tree/drill-1.14.0










[VOTE] Apache Drill release 1.14.0 - RC3

2018-07-31 Thread Boaz Ben-Zvi

  Hi Drill RC Reviewers and Testers,

 Proposing the *fourth* Release Candidate (RC3) for the Apache 
Drill, version 1.14.0 .


(RC3 differs from RC2 only in two minor issues - DRILL-6650 and
DRILL-6651 - affecting builds in the IDE only - hence any testing of the
RC2 tarballs remains valid.)


This RC3 includes 237 committed Jiras [1]. Thanks to all the Drill 
developers who worked hard and contributed to this release.


 The RC3 tarballs are hosted at [2] , and the Maven artifacts are at [3].

 This Release Candidate is based on (Apache Drill branch named 
"1.14.0") commit ID: 0508a128853ce796ca7e99e13008e49442f83147 , 
available at [4].


 Please download and try/test this Release Candidate.

 Given the minor change to RC2, the vote would end on Thursday, August 
2nd, 2018 at 5:00 PM PDT .


 [ ] +1
 [ ] +0
 [ ] -1

  My vote is +1 !!

    Thank you,

 Boaz


 [1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820=12342097 



 [2] http://home.apache.org/~boaz/drill/releases/1.14.0/rc3/

 [3] 
https://repository.apache.org/content/repositories/orgapachedrill-1051


 [4] https://github.com/apache/drill/tree/1.14.0

   OR

 https://github.com/Ben-Zvi/drill/tree/drill-1.14.0





Re: [VOTE] Apache Drill release 1.14.0 - RC2

2018-07-31 Thread Boaz Ben-Zvi

 Hi RC reviewers and testers,

    There are a couple of RC2 minor issues that are flagged as errors
in the IDE (in Eclipse, and maybe in IntelliJ). See DRILL-6650 and
DRILL-6651 for details.


These two do *not* matter for the Maven build, or for testing the 
tarballs, etc.  So if you started a long testing cycle with RC2, you may 
continue.


I will produce a new RC3 soon to include the fixes for the above. (Note 
that this RC3 would be *force-pushed* into branch 1.14.0 , thus erasing 
the RC2 commit ID)
And if no one objects, the voting deadline would remain as is (Aug 2nd) 
as the differences have a very minor impact.


   Thanks,

 Boaz


On 7/30/18 3:57 PM, Boaz Ben-Zvi wrote:


 Hi Drillers,

Proposing the *third* Release Candidate (RC2) for the Apache Drill, 
version 1.14.0 .


This RC2 includes 235 committed Jiras [1]. Thanks to all the Drill
developers who worked hard and contributed to this release.


The RC2 tarballs are hosted at [2] , and the Maven artifacts are at [3].

This Release Candidate is based on (Apache Drill branch named 
"1.14.0") commit ID: 4da8aff88966adee5d7438024a826bb599450a6f , 
available at [4].


Please download and try/test this Release Candidate.

Given that our bylaws require 3 business days, the vote would end on 
Thursday, August 2nd, 2018 at 5:00 PM PDT .


[ ] +1
[ ] +0
[ ] -1

 My vote is +1 !!

   Thank you,

Boaz


[1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820=12342097 



[2] http://home.apache.org/~boaz/drill/releases/1.14.0/rc2/

[3] 
https://repository.apache.org/content/repositories/orgapachedrill-1050


[4] https://github.com/apache/drill/tree/1.14.0

  OR

https://github.com/Ben-Zvi/drill/tree/drill-1.14.0






[VOTE] Apache Drill release 1.14.0 - RC2

2018-07-30 Thread Boaz Ben-Zvi

 Hi Drillers,

    Proposing the *third* Release Candidate (RC2) for the Apache Drill, 
version 1.14.0 .


This RC2 includes 235 committed Jiras [1]. Thanks to all the Drill
developers who worked hard and contributed to this release.


The RC2 tarballs are hosted at [2] , and the Maven artifacts are at [3].

This Release Candidate is based on (Apache Drill branch named "1.14.0") 
commit ID: 4da8aff88966adee5d7438024a826bb599450a6f , available at [4].


Please download and try/test this Release Candidate.

Given that our bylaws require 3 business days, the vote would end on 
Thursday, August 2nd, 2018 at 5:00 PM PDT .


[ ] +1
[ ] +0
[ ] -1

 My vote is +1 !!

   Thank you,

Boaz


[1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820=12342097 



[2] http://home.apache.org/~boaz/drill/releases/1.14.0/rc2/

[3] https://repository.apache.org/content/repositories/orgapachedrill-1050

[4] https://github.com/apache/drill/tree/1.14.0

  OR

https://github.com/Ben-Zvi/drill/tree/drill-1.14.0




Re: [VOTE] Apache Drill release 1.14.0 - RC1

2018-07-30 Thread Boaz Ben-Zvi

 Hi Arina,

    The RC1 (and now RC2) are being built in a separate branch - 1.14.0;
building the RCs over master would require a forced push (as the
top of master is marked 1.15.0-SNAPSHOT).


We can just work by cherry-picking the needed commits into the 1.14.0 
branch. It is about the same (other than different commit IDs).


The master branch has been open for new work.

   Thanks,

    Boaz


On 7/30/18 1:56 PM, Arina Yelchiyeva wrote:

Hi Boaz,

it's unfortunate we have to have a second iteration of the RC. The new RC will
contain two commits (one from the previous iteration) from:
https://github.com/apache/drill/pull/1404
https://github.com/apache/drill/pull/1406


But I believe we should merge them into master first before creating the
new RC. Both have passed code review, so it's OK to merge them.


Kind regards,
Arina


Re: [VOTE] Apache Drill release 1.14.0 - RC1

2018-07-30 Thread Boaz Ben-Zvi
  OK -- RC1 is off; will produce RC2 soon with the PR #1406 (and thanks 
Vitalii for the other comments)


 Boaz


On 7/30/18 9:53 AM, Charles Givre wrote:

I attempted to build from source and got the following errors:

Results :

Failed tests:
TestConvertCountToDirectScan.ensureCorrectCountWithMissingStatistics:153->PlanTestBase.testPlanMatchingPatterns:84->PlanTestBase.testPlanMatchingPatterns:103 
Found unwanted pattern in plan: DynamicPojoRecordReader

00-00    Screen
00-01 Project(cnt_str=[$0], cnt_total=[$1])
00-02 Scan(groupscan=[files = 
[/Users/cgivre/github/drill-dev/rc1/apache-drill-1.14.0-src/exec/java-exec/target/org.apache.drill.exec.planner.logical.TestConvertCountToDirectScan/dfsTestTmp/1532967947836-0/wide_str_table/0_0_1.parquet, 
/Users/cgivre/github/drill-dev/rc1/apache-drill-1.14.0-src/exec/java-exec/target/org.apache.drill.exec.planner.logical.TestConvertCountToDirectScan/dfsTestTmp/1532967947836-0/wide_str_table/0_0_2.parquet], 
numFiles = 2, DynamicPojoRecordReader{records = [[0, 2]]}])



Tests run: 3331, Failures: 1, Errors: 0, Skipped: 156




On Jul 30, 2018, at 11:17, Vitalii Diravka <vitalii.dira...@gmail.com> wrote:


Hi all!

I'm in the process of verifying the Drill RC1.
And I always get BUILD FAILURE for "mvn clean install". It is related to
DRILL-6641 (PR is on review).
It fails randomly from Jira description, but for me the test fails every
time.
Does anybody else have this issue? Or maybe it is my local environment
setup that causes it.
I want to be sure, since I can not validate RC1 without clean build with
unit tests.

Boaz, could you please edit the [1] link, I assume it should be
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12342097

and add the space after the first link in [4].

Kind regards
Vitalii


On Sat, Jul 28, 2018 at 2:25 AM Boaz Ben-Zvi  wrote:


 Hi Drillers,

Proposing the *second* Release Candidate (RC1) for the Apache
Drill, version 1.14.0 .

This RC1 includes 234 committed Jiras [1]. Thanks to all the Drill
developers who worked hard and contributed to this release.

The RC1 tarballs are hosted at [2] , and the Maven artifacts are at [3].

This Release Candidate is based on (Apache Drill branch named "1.14.0")
commit ID: c705271d550a6adf0a874cd4a6bddd62d5ecc1d9 , available at [4].

Please download and try/test this Release Candidate.

Given that our bylaws require 3 business days, the vote would end on
Wednesday, August 1st, 2018 at 5:00 PM PDT .

[ ] +1
[ ] +0
[ ] -1

 My vote is +1 !!

   Thank you,

Boaz

[1] https://issues.apache.org/jira/browse/DRILL-6637?filter=12344431

[2] http://home.apache.org/~boaz/drill/releases/1.14.0/rc1/

[3] 
https://repository.apache.org/content/repositories/orgapachedrill-1049


[4] https://github.com/apache/drill/tree/1.14.0  OR
https://github.com/Ben-Zvi/drill/tree/drill-1.14.0 (commit id:
65a6cb5233058b24613aaecfa9c9c7007a12c7e8)









[VOTE] Apache Drill release 1.14.0 - RC1

2018-07-27 Thread Boaz Ben-Zvi

 Hi Drillers,

    Proposing the *second* Release Candidate (RC1) for the Apache 
Drill, version 1.14.0 .


This RC1 includes 234 committed Jiras [1]. Thanks to all the Drill
developers who worked hard and contributed to this release.


The RC1 tarballs are hosted at [2] , and the Maven artifacts are at [3].

This Release Candidate is based on (Apache Drill branch named "1.14.0") 
commit ID: c705271d550a6adf0a874cd4a6bddd62d5ecc1d9 , available at [4].


Please download and try/test this Release Candidate.

Given that our bylaws require 3 business days, the vote would end on 
Wednesday, August 1st, 2018 at 5:00 PM PDT .


[ ] +1
[ ] +0
[ ] -1

 My vote is +1 !!

   Thank you,

Boaz

[1] https://issues.apache.org/jira/browse/DRILL-6637?filter=12344431

[2] http://home.apache.org/~boaz/drill/releases/1.14.0/rc1/

[3] https://repository.apache.org/content/repositories/orgapachedrill-1049

[4] https://github.com/apache/drill/tree/1.14.0  OR
https://github.com/Ben-Zvi/drill/tree/drill-1.14.0 (commit id: 
65a6cb5233058b24613aaecfa9c9c7007a12c7e8)





Re: [VOTE] Apache Drill release 1.14.0 - RC0

2018-07-27 Thread Boaz Ben-Zvi
   OK -- opinions seem to concur that this issue is a blocker. So 
cancelling RC0 now 


I'll merge PR #1404 into the 1.14.0 branch (how nice - similar numbers 
:-) and start producing RC1 ..


    Thanks,

    Boaz


On 7/26/18 8:32 PM, Abhishek Girish wrote:
Looks like a blocker for the release? If we call it out earlier, we 
can avoid folks spending time validating the RC.


On Thu, Jul 26, 2018 at 2:24 PM Anton Gozhiy <anton5...@gmail.com> wrote:


Hi All,

I reported a bug, could you take a look?:
https://issues.apache.org/jira/browse/DRILL-6639
It was not reproducible with Drill 1.13.0.

Thanks!
On Thu, Jul 26, 2018 at 5:15 AM Boaz Ben-Zvi mailto:b...@apache.org>> wrote:

>   Hi Drillers,
>
>      Proposing the first Release Candidate (RC0) for the Apache
Drill,
> version 1.14.0 .
>
> This RC0 includes 233 committed Jiras [1]. Thanks to all the Drill
> developers who worked hard and contributed to this release.
>
> The RC0 tarballs are hosted at [2] , and the Maven artifacts are
at [3].
>
> This Release Candidate is based on (Apache Drill branch named
"1.14.0")
> commit ID: 31b88bafbc650fe674086815ec4b1a460efea013 , available
at [4].
>
> Please download and try/test this Release Candidate.
>
> Given that our bylaws require 3 business days, the vote would end on
> Monday, July 30, 2018 at 8:00 PM PDT .
>
> [ ] +1
> [ ] +0
> [ ] -1
>
>     Thank you,
>
>              Boaz
>
> [1] https://issues.apache.org/jira/browse/DRILL-6637?filter=12344431
>
> [2] http://home.apache.org/~boaz/drill/releases/1.14.0/rc0/
>
> [3] https://repository.apache.org/content/repositories/orgapachedrill-1048
>
> [4] https://github.com/Ben-Zvi/drill/tree/drill-1.14.0  OR
> https://github.com/apache/drill/tree/1.14.0
>
>
>
>

-- 
Sincerely, Anton Gozhiy

anton5...@gmail.com





Hangout Summary - July 24 (Re: Drill Hangout tomorrow at 10 am PST)

2018-07-25 Thread Boaz Ben-Zvi
   At the July 24th Hangout we mainly discussed some blocking issues 
with the 1.14 release candidate (such as the Javadoc tests dependencies).


Another important issue raised by Arina is the submission of new UDFs by 
contributors, which add some dependencies under /exec.


These dependencies are not getting well tested like the main Drill code.

A solution suggested - put these under either a new directory, or under 
/contrib.


   Thanks,

 Boaz


On 7/23/18 6:16 PM, Boaz Ben-Zvi wrote:
    The bi-weekly Drill Hangout shall take place tomorrow July 24th at 
10 am PDT


Any discussion topics are welcome (I'm currently busy with the 1.14 
RC; I could say a word or two about that)


The Hangout link:

https://hangouts.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc

  Thanks,

   Boaz






[VOTE] Apache Drill release 1.14.0 - RC0

2018-07-25 Thread Boaz Ben-Zvi

 Hi Drillers,

    Proposing the first Release Candidate (RC0) for the Apache Drill, 
version 1.14.0 .


This RC0 includes 233 committed Jiras [1]. Thanks to all the Drill
developers who worked hard and contributed to this release.


The RC0 tarballs are hosted at [2] , and the Maven artifacts are at [3].

This Release Candidate is based on (Apache Drill branch named "1.14.0") 
commit ID: 31b88bafbc650fe674086815ec4b1a460efea013 , available at [4].


Please download and try/test this Release Candidate.

Given that our bylaws require 3 business days, the vote would end on 
Monday, July 30, 2018 at 8:00 PM PDT .


[ ] +1
[ ] +0
[ ] -1

   Thank you,

    Boaz

[1] https://issues.apache.org/jira/browse/DRILL-6637?filter=12344431

[2] http://home.apache.org/~boaz/drill/releases/1.14.0/rc0/

[3] https://repository.apache.org/content/repositories/orgapachedrill-1048

[4] https://github.com/Ben-Zvi/drill/tree/drill-1.14.0  OR 
https://github.com/apache/drill/tree/1.14.0






New development version: 1.15.0-SNAPSHOT (was Re: Temporary Hold back on commits to the Apache master branch)

2018-07-25 Thread Boaz Ben-Zvi

  Hi All,

       The Apache master branch is now open for new commits; it has 
been upgraded to the new development version - 1.15.0-SNAPSHOT .


   Thanks for your patience,

    Boaz

On 7/23/18 12:00 PM, Boaz Ben-Zvi wrote:

  Hi Committers,
      Please hold back on any commits into the Apache master branch while we 
are working on creating a Release Candidate for 1.14.
   May take a few hours;
       Thanks,
               Boaz





[jira] [Created] (DRILL-6637) Root pom: Release build needs to remove dep to tests in maven-javadoc-plugin

2018-07-25 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6637:
---

 Summary: Root pom: Release build needs to remove dep to tests in 
maven-javadoc-plugin
 Key: DRILL-6637
 URL: https://issues.apache.org/jira/browse/DRILL-6637
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Affects Versions: 1.14.0
Reporter: Boaz Ben-Zvi
Assignee: Boaz Ben-Zvi
 Fix For: 1.14.0


Error in the "Preparing the release" build:
{code}
27111 [INFO] [ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-javadoc-plugin:2.10.3:jar (attach-javadocs) on 
project drill-fmpp-maven-plugin: MavenReportException: Error while generating 
Javadoc: artifact not found - Failure to find 
org.apache.drill.exec:drill-java-exec:jar:tests:1.14.0 in 
http://conjars.org/repo was cached in the local repository, resolution will not 
be reattempted until the update interval of conjars has elapsed or updates are 
forced
{code}

(Temporary ?) fix following Tim's suggestion: removing the dependencies to the 
test jars in the maven-javadoc-plugin in the drill-root pom.
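The shape of that change can be sketched against the root pom. The plugin version and the unresolved coordinates below come from the error message above; the surrounding structure is illustrative rather than copied from Drill's actual pom:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-javadoc-plugin</artifactId>
  <version>2.10.3</version>
  <!-- Removed for release builds: a <dependencies> entry pointing at
       org.apache.drill.exec:drill-java-exec:jar:tests, which is not
       published and therefore cannot be resolved from remote repos. -->
  <executions>
    <execution>
      <id>attach-javadocs</id>
      <goals>
        <goal>jar</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```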




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6632) drill-jdbc-all jar size limit too small for release build

2018-07-24 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6632:
---

 Summary: drill-jdbc-all jar size limit too small for release build
 Key: DRILL-6632
 URL: https://issues.apache.org/jira/browse/DRILL-6632
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Affects Versions: 1.14.0
Reporter: Boaz Ben-Zvi
Assignee: Boaz Ben-Zvi
 Fix For: 1.14.0


Among the changes for DRILL-6294, the limit for the drill-jdbc-all jar file 
size was increased to 3600, about what was needed to accommodate the new 
Calcite version.  

However a Release build requires a slightly larger size (probably due to adding 
several of those 
*org.codehaus.plexus.compiler.javac.JavacCompiler6931842185404907145arguments*).

Proposed Fix: Increase the size limit to 36,500,000
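One common way to impose such a jar-size ceiling is the Maven enforcer plugin's `requireFilesSize` rule. The snippet below illustrates that mechanism with the number from this issue; it is a sketch, not copied from Drill's actual pom, which may implement the check differently:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <executions>
    <execution>
      <id>enforce-jdbc-jar-compactness</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <phase>verify</phase>
      <configuration>
        <rules>
          <requireFilesSize>
            <message>The drill-jdbc-all jar has grown past the allowed size.</message>
            <maxsize>36500000</maxsize>
            <files>
              <file>${project.build.directory}/drill-jdbc-all-${project.version}.jar</file>
            </files>
          </requireFilesSize>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```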

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Drill Hangout tomorrow at 10 am PST

2018-07-23 Thread Boaz Ben-Zvi
    The bi-weekly Drill Hangout shall take place tomorrow July 24th at 
10 am PDT


Any discussion topics are welcome (I'm currently busy with the 1.14 RC; 
I could say a word or two about that)


The Hangout link:

https://hangouts.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc

  Thanks,

   Boaz




Temporary Hold back on commits to the Apache master branch

2018-07-23 Thread Boaz Ben-Zvi
 Hi Committers,
     Please hold back on any commits into the Apache master branch while we are 
working on creating a Release Candidate for 1.14.
  May take a few hours;
      Thanks,
              Boaz


[jira] [Created] (DRILL-6626) Hash Aggregate: Index out of bounds with small output batch size and spilling

2018-07-22 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6626:
---

 Summary: Hash Aggregate: Index out of bounds with small output 
batch size and spilling
 Key: DRILL-6626
 URL: https://issues.apache.org/jira/browse/DRILL-6626
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Boaz Ben-Zvi


   This new IOOB failure was seen while trying to recreate the NPE failure in 
DRILL-6622 (over TPC-DS SF1). The proposed fix for the latter (PR #1391) does 
not seem to make a difference.
This IOOB can easily be created with other large Hash-Agg queries that need to 
spill. 

The IOOB was caused after restricting the output batch size (to force many), 
and the Hash Aggr memory (to force a spill):

{code}
0: jdbc:drill:zk=local> alter system set 
`drill.exec.memory.operator.output_batch_size` = 262144;
+-------+--------------------------------------------------------+
|  ok   |                        summary                         |
+-------+--------------------------------------------------------+
| true  | drill.exec.memory.operator.output_batch_size updated.  |
+-------+--------------------------------------------------------+
1 row selected (0.106 seconds)
0: jdbc:drill:zk=local>
0: jdbc:drill:zk=local> alter session set `exec.errors.verbose` = true;
+-------+-------------------------------+
|  ok   |            summary            |
+-------+-------------------------------+
| true  | exec.errors.verbose updated.  |
+-------+-------------------------------+
1 row selected (0.081 seconds)
0: jdbc:drill:zk=local>
0: jdbc:drill:zk=local> alter session set `exec.hashagg.mem_limit` = 16777216;
+-------+----------------------------------+
|  ok   |             summary              |
+-------+----------------------------------+
| true  | exec.hashagg.mem_limit updated.  |
+-------+----------------------------------+
1 row selected (0.089 seconds)
0: jdbc:drill:zk=local>
0: jdbc:drill:zk=local> SELECT c_customer_id FROM 
dfs.`/data/tpcds/sf1/parquet/customer`
. . . . . . . . . . . > UNION
. . . . . . . . . . . > SELECT ca_address_id FROM 
dfs.`/data/tpcds/sf1/parquet/customer_address`
. . . . . . . . . . . > UNION
. . . . . . . . . . . > SELECT cd_credit_rating FROM 
dfs.`/data/tpcds/sf1/parquet/customer_demographics`
. . . . . . . . . . . > UNION
. . . . . . . . . . . > SELECT hd_buy_potential FROM 
dfs.`/data/tpcds/sf1/parquet/household_demographics`
. . . . . . . . . . . > UNION
. . . . . . . . . . . > SELECT i_item_id FROM dfs.`/data/tpcds/sf1/parquet/item`
. . . . . . . . . . . > UNION
. . . . . . . . . . . > SELECT p_promo_id FROM 
dfs.`/data/tpcds/sf1/parquet/promotion`
. . . . . . . . . . . > UNION
. . . . . . . . . . . > SELECT t_time_id FROM 
dfs.`/data/tpcds/sf1/parquet/time_dim`
. . . . . . . . . . . > UNION
. . . . . . . . . . . > SELECT d_date_id FROM 
dfs.`/data/tpcds/sf1/parquet/date_dim`
. . . . . . . . . . . > UNION
. . . . . . . . . . . > SELECT s_store_id FROM 
dfs.`/data/tpcds/sf1/parquet/store`
. . . . . . . . . . . > UNION
. . . . . . . . . . . > SELECT w_warehouse_id FROM 
dfs.`/data/tpcds/sf1/parquet/warehouse`
. . . . . . . . . . . > UNION
. . . . . . . . . . . > SELECT sm_ship_mode_id FROM 
dfs.`/data/tpcds/sf1/parquet/ship_mode`
. . . . . . . . . . . > UNION
. . . . . . . . . . . > SELECT r_reason_id FROM 
dfs.`/data/tpcds/sf1/parquet/reason`
. . . . . . . . . . . > UNION
. . . . . . . . . . . > SELECT cc_call_center_id FROM 
dfs.`/data/tpcds/sf1/parquet/call_center`
. . . . . . . . . . . > UNION
. . . . . . . . . . . > SELECT web_site_id FROM 
dfs.`/data/tpcds/sf1/parquet/web_site`
. . . . . . . . . . . > UNION
. . . . . . . . . . . > SELECT wp_web_page_id FROM 
dfs.`/data/tpcds/sf1/parquet/web_page`
. . . . . . . . . . . > UNION
. . . . . . . . . . . > SELECT cp_catalog_page_id FROM 
dfs.`/data/tpcds/sf1/parquet/catalog_page`;
Error: SYSTEM ERROR: IndexOutOfBoundsException: Index: 26474, Size: 7

Fragment 4:0

[Error Id: d44e64ea-f474-436e-94b0-61c61eec2227 on 172.30.8.176:31020]

  (java.lang.IndexOutOfBoundsException) Index: 26474, Size: 7
java.util.ArrayList.rangeCheck():653
java.util.ArrayList.get():429

org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.rehash():293

org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.access$1300():120

org.apache.drill.exec.physical.impl.common.HashTableTemplate.resizeAndRehashIfNeeded():805
org.apache.drill.exec.physical.impl.common.HashTableTemplate.put():682

org.apache.drill.exec.physical.impl.aggregate.HashAggTemplate.checkGroupAndAggrValues():1379
org.apache.drill.exec.physical.impl.aggregate.HashAggTemplate.doWork():604
org.apache.drill.exec.physi
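The stack trace points at `HashTableTemplate$BatchHolder.rehash()` indexing past the end of an `ArrayList` (Index: 26474, Size: 7). A minimal standalone sketch of that general failure mode, using hypothetical code and a made-up `BATCH_SIZE`, not Drill's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reproduction of the failure mode only, NOT Drill code.
// A "global" entry index derived from a hash is used directly to look up a
// batch holder, instead of first being scaled down to a holder index, so a
// list of 7 holders is probed with index 26474.
public class RehashIoobSketch {
    static final int BATCH_SIZE = 4096;   // assumed rows per batch holder

    public static void main(String[] args) {
        List<int[]> batchHolders = new ArrayList<>();
        for (int i = 0; i < 7; i++) {     // only 7 holders were allocated
            batchHolders.add(new int[BATCH_SIZE]);
        }
        int globalIndex = 26474;          // index from the error message
        try {
            batchHolders.get(globalIndex);          // throws IndexOutOfBoundsException
        } catch (IndexOutOfBoundsException e) {
            System.out.println("IOOB: " + e.getMessage());
        }
        // A correct lookup scales the global index to a holder index first:
        int holderIndex = globalIndex / BATCH_SIZE; // 26474 / 4096 == 6, in range
        System.out.println("holder " + holderIndex + " has "
            + batchHolders.get(holderIndex).length + " slots");
    }
}
```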

[jira] [Created] (DRILL-6625) Intermittent failures in Kafka unit tests

2018-07-20 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6625:
---

 Summary: Intermittent failures in Kafka unit tests
 Key: DRILL-6625
 URL: https://issues.apache.org/jira/browse/DRILL-6625
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Other
Affects Versions: 1.13.0
Reporter: Boaz Ben-Zvi
Assignee: Abhishek Ravi
 Fix For: 1.15.0


The following failures have been seen (consistently on my Mac, or occasionally 
on Jenkins) when running the unit tests in the Kafka test suite. After the 
failure, Maven hangs for a long time.

 Cost was 0.0 (instead of 26.0):

{code:java}
Running org.apache.drill.exec.store.kafka.KafkaFilterPushdownTest
16:46:57.748 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: -65.3 
KiB(73.6 KiB), h: -573.5 MiB(379.5 MiB), nh: 1.2 MiB(117.1 MiB)): 
testPushdownWithOr(org.apache.drill.exec.store.kafka.KafkaFilterPushdownTest)
java.lang.AssertionError: Unable to find expected string "kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
"cost" : 26.0 in plan: {
  "head" : {
"version" : 1,
"generator" : {
  "type" : "ExplainHandler",
  "info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"options" : [ {
  "kind" : "STRING",
  "accessibleScopes" : "ALL",
  "name" : "store.kafka.record.reader",
  "string_val" : 
"org.apache.drill.exec.store.kafka.decoders.JsonMessageReader",
  "scope" : "SESSION"
}, {
  "kind" : "LONG",
  "accessibleScopes" : "ALL",
  "name" : "planner.width.max_per_node",
  "num_val" : 2,
  "scope" : "SESSION"
}, {
  "kind" : "BOOLEAN",
  "accessibleScopes" : "ALL",
  "name" : "exec.errors.verbose",
  "bool_val" : true,
  "scope" : "SESSION"
}, {
  "kind" : "LONG",
  "accessibleScopes" : "ALL",
  "name" : "store.kafka.poll.timeout",
  "num_val" : 200,
  "scope" : "SESSION"
} ],
"queue" : 0,
"hasResourcePlan" : false,
"resultMode" : "EXEC"
  },
  "graph" : [ {
"pop" : "kafka-scan",
"@id" : 6,
"userName" : "",
"kafkaStoragePluginConfig" : {
  "type" : "kafka",
  "kafkaConsumerProps" : {
"bootstrap.servers" : "127.0.0.1:63751",
"group.id" : "drill-test-consumer"
  },
  "enabled" : true
},
"columns" : [ "`**`" ],
"kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
"cost" : 0.0
  }, {
{code}

Or occasionally:

{code}
---
 T E S T S
---
11:52:57.571 [main] ERROR o.a.d.e.s.k.KafkaMessageGenerator - 
org.apache.kafka.common.errors.NetworkException: The server disconnected before 
a response was received.
java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.NetworkException: The server disconnected before 
a response was received.
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] 1.14.0 release

2018-07-19 Thread Boaz Ben-Zvi

  That cracked blocker is DRILL-6453 (not 6475 ...), sorry ..

  Boaz

On 7/19/18 2:32 AM, Boaz Ben-Zvi wrote:

 Hi Charles,

 I merged the DRILL-6104 work into the Apache master; thanks for 
your useful contribution, it will be included in 1.14.


We may have just cracked the blocker DRILL-6475 , so we hopefully 
could start the RC process tomorrow.


    Thanks,

  Boaz


On 7/18/18 8:42 PM, Charles Givre wrote:

HI Boaz,
DRILL-6104 is ready to release.  Do you think we’ll have an RC this 
week?

Thanks,
— C


On Jul 2, 2018, at 23:01, Boaz Ben-Zvi  wrote:

   Let's try to make progress on the 1.14 release, aiming for a 
Release Candidate towards the end of this week (a little ambitious, 
with the July 4th and people on vacations).


Current Status of the previously requested Jiras:

==

In Progress - DRILL-6104: Generic Logfile Format Plugin

PR - DRILL-6422: Update Guava to 23.0 and shade it

PR - DRILL-5999 (DRILL-6516): Support for EMIT outcome in Streaming Agg

Ready2Commit: DRILL-5977: predicate pushdown support kafkaMsgOffset

Ready2Commit: DRILL-6519: Add String Distance and Phonetic Functions

Ready2Commit: DRILL-6577: Change Hash-Join default to not fallback 
(into pre-1.14 unlimited memory)


Committed: DRILL-6353: Upgrade Parquet MR dependencies

Committed: DRILL-6310: limit batch size for hash aggregate

===

And there are few more open or in a PR state.

    Let's try to have most of these ready by the end of the week.

    Boaz








Re: [DISCUSS] 1.14.0 release

2018-07-19 Thread Boaz Ben-Zvi

 Hi Charles,

 I merged the DRILL-6104 work into the Apache master; thanks for 
your useful contribution, it will be included in 1.14.


We may have just cracked the blocker DRILL-6475 , so we hopefully could 
start the RC process tomorrow.


    Thanks,

  Boaz


On 7/18/18 8:42 PM, Charles Givre wrote:

HI Boaz,
DRILL-6104 is ready to release.  Do you think we’ll have an RC this week?
Thanks,
— C


On Jul 2, 2018, at 23:01, Boaz Ben-Zvi  wrote:

   Let's try to make progress on the 1.14 release, aiming for a Release 
Candidate towards the end of this week (a little ambitious, with the July 4th 
and people on vacations).

Current Status of the previously requested Jiras:

==

In Progress - DRILL-6104: Generic Logfile Format Plugin

PR - DRILL-6422: Update Guava to 23.0 and shade it

PR - DRILL-5999 (DRILL-6516): Support for EMIT outcome in Streaming Agg

Ready2Commit: DRILL-5977: predicate pushdown support kafkaMsgOffset

Ready2Commit: DRILL-6519: Add String Distance and Phonetic Functions

Ready2Commit: DRILL-6577: Change Hash-Join default to not fallback (into 
pre-1.14 unlimited memory)

Committed: DRILL-6353: Upgrade Parquet MR dependencies

Committed: DRILL-6310: limit batch size for hash aggregate

===

And there are few more open or in a PR state.

Let's try to have most of these ready by the end of the week.

Boaz






Re: [ANNOUNCE] New PMC Chair of Apache Drill

2018-07-18 Thread Boaz Ben-Zvi

   "plus one" more congratulations 


On 7/18/18 3:20 PM, Parth Chandra wrote:

Congratulations

On Wed, Jul 18, 2018 at 3:14 PM, Kunal Khatua  wrote:


Congratulations, Arina !
On 7/18/2018 2:26:05 PM, Volodymyr Vysotskyi  wrote:
Congratulations, Arina! Well deserved!

Kind regards,
Volodymyr Vysotskyi


On Thu, Jul 19, 2018 at 12:24 AM Abhishek Girish wrote:


Congratulations, Arina!

On Wed, Jul 18, 2018 at 2:19 PM Aman Sinha wrote:


Drill developers,
Time flies and it is time for a new PMC chair ! Thank you all for your
support during the past year.

I am very pleased to announce that the Drill PMC has voted to elect

Arina

Ielchiieva as the new PMC chair of Apache Drill. She has also been
approved unanimously by the Apache Board in today's board meeting.

Please

join me in congratulating Arina !

Thanks,
Aman





Re: [DISCUSS] 1.14.0 release

2018-07-13 Thread Boaz Ben-Zvi
(Guessing ...) It is possible that the root cause for DRILL-6606 is similar
to that in  DRILL-6453 -- that is the new "early sniffing" in the
Hash-Join, which repeatedly invokes next() on the two "children" of the
join *during schema discovery* until non-empty data is returned (or NONE,
STOP, etc).  Last night Salim, Vlad and I briefly discussed alternatives,
like postponing the "sniffing" to a later time (beginning of the build for
the right child, and beginning of the probe for the left child).

However, this would take some time to implement. So what should we do about 1.14?
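The trade-off being discussed can be modeled in miniature; the names and structure below are assumptions for illustration only, not Drill's actual HashJoinBatch code:

```java
import java.util.Arrays;
import java.util.Iterator;

// Toy model of the trade-off: "early sniffing" pulls from BOTH children
// during schema discovery, so a slow or hanging child blocks the join before
// any work starts; deferring pulls each side only when that phase begins.
public class SniffSketch {
    static String earlySniff(Iterator<String> build, Iterator<String> probe) {
        // both children are polled up front, until data (or NONE) arrives
        String b = build.hasNext() ? build.next() : "NONE";
        String p = probe.hasNext() ? probe.next() : "NONE";
        return b + "/" + p;
    }

    static String deferredSniff(Iterator<String> build, Iterator<String> probe) {
        // only the build side is consumed now; the probe side stays
        // untouched until the probe phase actually starts
        String b = build.hasNext() ? build.next() : "NONE";
        return b + "/(probe deferred)";
    }

    public static void main(String[] args) {
        System.out.println(earlySniff(Arrays.asList("b1").iterator(),
                                      Arrays.asList("p1").iterator()));
        System.out.println(deferredSniff(Arrays.asList("b1").iterator(),
                                         Arrays.asList("p1").iterator()));
    }
}
```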

  Thanks,

  Boaz

On Fri, Jul 13, 2018 at 3:46 AM, Arina Yelchiyeva <
arina.yelchiy...@gmail.com> wrote:

> During implementing late limit 0 optimization, Bohdan has found one more
> regression after Hash Join spill to disk.
> https://issues.apache.org/jira/browse/DRILL-6606
> 
> Boaz please take a look.
>
> Kind regards,
> Arina
>


Re: [DISCUSS] 1.14.0 release

2018-07-12 Thread Boaz Ben-Zvi
  We are getting close to a Release Candidate, though some issues are 
still pending, and we need to make decisions soon.


Soliciting opinions -- which of the following issues should be 
considered a RELEASE BLOCKER for 1.14:


= OPEN ==

OPEN - DRILL-6453 : TPCDS query 72 is Hanging (on a cluster)   (( Boaz, 
Salim ))


    We still do not have a lead on the cause, nor a workaround to make 
this query run.


OPEN - DRILL-6475: Query with UNNEST causes a Null Pointer .  (( Hanumath ))

OPEN - DRILL-5495: convert_from causes ArrayIndexOutOfBounds exception. 
(( Vitalii ))


 In Review ===

DRILL-6589: Push Transitive Closure generated predicates past aggregates 
/ projects ((Gautam / Vitalii))


DRILL-6588: System table columns incorrectly marked as non-nullable 
((Kunal / Aman))


DRILL-6542: (May be Ready2Commit soon) IndexOutOfBounds exception for 
multilevel lateral ((Sorabh / Parth))


DRILL-6517: (May be Ready2Commit soon) IllegalState exception in 
Hash-Join ((Boaz / Padma, Tim))


DRILL-6496: VectorUtil.showVectorAccessibleContent does not log vector 
content ((Tim / Volodymyr))


DRILL-6410: Memory Leak in Parquet Reader during cancellation ((Vlad / 
Parth))


DRILL-6179: Added pcapng-format support ((Vlad / Paul))

DRILL-5796: Filter pruning for multi rowgroup parquet file ((Jean-Blas / 
Arina))


DRILL-5365: FileNotFoundException when reading a parquet file ((Tim / 
Vitalii))


==

    Thanks,

 -- Boaz
p.s.
   There's a batch commit in process now with some of the PRs listed in 
the prior email.


On 7/9/18 9:53 PM, Boaz Ben-Zvi wrote:

  We are making progress towards 1.14.

Let's aim for a Release Candidate branch off on  Thursday (July 12)  !!!

Below are the unfinished cases; can most be completed and checked in 
by 7/12 ?


(( Relevant people:

    Abhishek, Arina, Boaz, Charles, Hanumath, Jean-Blas, Karthik, Kunal,

    Parth, Paul, Salim, Sorabh, Tim, Vitalii, Vlad, Volodymyr ))

==

Open/blocker - DRILL-6453 + DRILL-6517:
   Two issues - Parquet Scanner (?) not setting container's record num 
(to zero), and a hang following this failure.

   Currently testing a fix / workaround ((Boaz))

In Progress - DRILL-6104: Generic Logfile Format Plugin  ((Charles + 
Paul -- can you be done by 7/12 ?))


PR - DRILL-5999 (DRILL-6516): Support for EMIT outcome in Streaming 
Agg ((Parth + Boaz reviewing))


Open - DRILL-6542: Index out of bounds ((Sorabh))

Open - DRILL-6475: Unnest Null fieldId pointer ((Hanumath))

 The following PRs are still waiting for reviews  

DRILL-6583: UI usability issue ((Kunal / Sorabh))

DRILL-6579: Add sanity checks to the Parquet Reader ((Salim / Vlad + 
Boaz))


DRILL-6578: handle query cancellation in Parquet Reader ((Salim / Vlad 
+ Boaz))


DRILL-6560: Allow options for controlling the batch size per operator 
((Salim / Karthik))


DRILL-6559: Travis timing out ((Vitalii / Tim))

DRILL-6496: VectorUtil.showVectorAccessibleContent does not log vector 
content ((Tim / Volodymyr))


DRILL-6410: Memory Leak in Parquet Reader during cancellation ((Vlad / 
Parth))


DRILL-6346: Create an Official Drill Docker Container ((Abhishek / Tim))

DRILL-6179: Added pcapng-format support ((Vlad / Paul))

DRILL-5796: Filter pruning for multi rowgroup parquet file ((Jean-Blas 
/ Arina))


DRILL-5365: FileNotFoundException when reading a parquet file ((Tim / 
Vitalii))


==

   Thanks,

  Boaz

On 7/6/18 2:51 PM, Pritesh Maker wrote:
Here is the release 1.14 dashboard 
(https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12332463) 
and agile board 
(https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=185)


I believe Volodymyr is targeting DRILL-6422 (Guava update) for 1.15 
release so it shouldn't be blocking the release. So overall, we have 
2 open bugs, 2 in progress bugs (+2 doc issues), and 12 in review (+1 
ready to commit).


If the reviewable commits won't be ready soon, can the developers 
please remove the 1.14 fix version for these issues.


Pritesh




On 7/6/18, 11:54 AM, "Boaz Ben-Zvi" b...@mapr.com> wrote:


   Current status: There's a blocker, and some work in progress 
that will

 stretch into next week.
  Current detail:
  ==
  Open/blocker - DRILL-6453 + DRILL-6517: Two issues - 
Parquet Scanner not setting record num (to zero), and a hang 
following this failure.

  In Progress - DRILL-6104: Generic Logfile Format Plugin
  PR - DRILL-6422: Update Guava to 2

Re: [DISCUSS] 1.14.0 release

2018-07-09 Thread Boaz Ben-Zvi

 Hi Charles,

    The main reason for cutting a Release Candidate early is so that we can 
give it enough testing.


Given that DRILL-6104 is a separate feature, with almost no impact on 
the current code, then it seems low risk to add it a few days later.


  Does anyone have an objection?

 Boaz

On 7/9/18 9:54 PM, Charles Givre wrote:

Hi Boaz,
I’m traveling at the moment, but I can have DRILL-6104 back in Paul’s hands by 
the end of the week.
—C


On Jul 10, 2018, at 00:53, Boaz Ben-Zvi  wrote:

   We are making progress towards 1.14.

Let's aim for a Release Candidate branch off on  Thursday (July 12)  !!!

Below are the unfinished cases; can most be completed and checked in by 7/12 ?

(( Relevant people:

 Abhishek, Arina, Boaz, Charles, Hanumath, Jean-Blas, Karthik, Kunal,

 Parth, Paul, Salim, Sorabh, Tim, Vitalii, Vlad, Volodymyr ))

==

Open/blocker - DRILL-6453 + DRILL-6517:
   Two issues - Parquet Scanner (?) not setting container's record num (to 
zero), and a hang following this failure.
   Currently testing a fix / workaround ((Boaz))

In Progress - DRILL-6104: Generic Logfile Format Plugin  ((Charles + Paul -- 
can you be done by 7/12 ?))

PR - DRILL-5999 (DRILL-6516): Support for EMIT outcome in Streaming Agg ((Parth 
+ Boaz reviewing))

Open - DRILL-6542: Index out of bounds ((Sorabh))

Open - DRILL-6475: Unnest Null fieldId pointer ((Hanumath))

 The following PRs are still waiting for reviews  

DRILL-6583: UI usability issue ((Kunal / Sorabh))

DRILL-6579: Add sanity checks to the Parquet Reader ((Salim / Vlad + Boaz))

DRILL-6578: handle query cancellation in Parquet Reader ((Salim / Vlad + Boaz))

DRILL-6560: Allow options for controlling the batch size per operator ((Salim / 
Karthik))

DRILL-6559: Travis timing out ((Vitalii / Tim))

DRILL-6496: VectorUtil.showVectorAccessibleContent does not log vector content 
((Tim / Volodymyr))

DRILL-6410: Memory Leak in Parquet Reader during cancellation ((Vlad / Parth))

DRILL-6346: Create an Official Drill Docker Container ((Abhishek / Tim))

DRILL-6179: Added pcapng-format support ((Vlad / Paul))

DRILL-5796: Filter pruning for multi rowgroup parquet file ((Jean-Blas / Arina))

DRILL-5365: FileNotFoundException when reading a parquet file ((Tim / Vitalii))

==

   Thanks,

  Boaz

On 7/6/18 2:51 PM, Pritesh Maker wrote:

Here is the release 1.14 dashboard 
(https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12332463) 
and agile board 
(https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=185)

I believe Volodymyr is targeting DRILL-6422 (Guava update) for 1.15 release so 
it shouldn't be blocking the release. So overall, we have 2 open bugs, 2 in 
progress bugs (+2 doc issues), and 12 in review (+1 ready to commit).

If the reviewable commits won't be ready soon, can the developers please remove 
the 1.14 fix version for these issues.

Pritesh




On 7/6/18, 11:54 AM, "Boaz Ben-Zvi"  wrote:

   Current status: There's a blocker, and some work in progress that will
 stretch into next week.
  Current detail:
  ==
  Open/blocker - DRILL-6453 + DRILL-6517: Two issues - Parquet Scanner 
not setting record num (to zero), and a hang following this failure.
  In Progress - DRILL-6104: Generic Logfile Format Plugin
  PR - DRILL-6422: Update Guava to 23.0 and shade it
  PR - DRILL-5999 (DRILL-6516): Support for EMIT outcome in Streaming 
Agg (I'm reviewing)
   Ready2Commit: DRILL-6519: Add String Distance and Phonetic Functions 
(Arina gave it a +1 ; is it "Ready-To-Commit" or waiting for more reviews ?)
    Committed: DRILL-6570: Mentioned as a blocker by Kunal (I just merged 
#1354; the Jira was (mistakenly?) marked "Resolved" so it missed the batch 
commit).
  Committed: DRILL-5977: predicate pushdown support kafkaMsgOffset
  Committed: DRILL-6577: Change Hash-Join default to not fallback (into
 pre-1.14 unlimited memory)
  Committed: DRILL-6353: Upgrade Parquet MR dependencies
  Committed: DRILL-6310: limit batch size for hash aggregate
  ===
  Thanks,
   Boaz
  On 7/2/18 9:51 PM, Khurram Faraaz wrote:
 > Do we plan to fix this one too, because this is a regression from Apache
 > Drill 1.13.0.
 > 
https://issues.apache.org/jira/browse/DRILL-6453

Re: [DISCUSS] 1.14.0 release

2018-07-09 Thread Boaz Ben-Zvi

  We are making progress towards 1.14.

Let's aim for a Release Candidate branch off on  Thursday (July 12)  !!!

Below are the unfinished cases; can most be completed and checked in by 
7/12 ?


(( Relevant people:

    Abhishek, Arina, Boaz, Charles, Hanumath, Jean-Blas, Karthik, Kunal,

    Parth, Paul, Salim, Sorabh, Tim, Vitalii, Vlad, Volodymyr ))

==

Open/blocker - DRILL-6453 + DRILL-6517:
   Two issues - Parquet Scanner (?) not setting container's record num (to 
zero), and a hang following this failure.
   Currently testing a fix / workaround ((Boaz))

In Progress - DRILL-6104: Generic Logfile Format Plugin  ((Charles + Paul -- 
can you be done by 7/12 ?))

PR - DRILL-5999 (DRILL-6516): Support for EMIT outcome in Streaming Agg ((Parth 
+ Boaz reviewing))

Open - DRILL-6542: Index out of bounds ((Sorabh))

Open - DRILL-6475: Unnest Null fieldId pointer ((Hanumath))

 The following PRs are still waiting for reviews  

DRILL-6583: UI usability issue ((Kunal / Sorabh))

DRILL-6579: Add sanity checks to the Parquet Reader ((Salim / Vlad + Boaz))

DRILL-6578: handle query cancellation in Parquet Reader ((Salim / Vlad + Boaz))

DRILL-6560: Allow options for controlling the batch size per operator ((Salim / 
Karthik))

DRILL-6559: Travis timing out ((Vitalii / Tim))

DRILL-6496: VectorUtil.showVectorAccessibleContent does not log vector content 
((Tim / Volodymyr))

DRILL-6410: Memory Leak in Parquet Reader during cancellation ((Vlad / Parth))

DRILL-6346: Create an Official Drill Docker Container ((Abhishek / Tim))

DRILL-6179: Added pcapng-format support ((Vlad / Paul))

DRILL-5796: Filter pruning for multi rowgroup parquet file ((Jean-Blas / Arina))

DRILL-5365: FileNotFoundException when reading a parquet file ((Tim / Vitalii))

==

   Thanks,

  Boaz

On 7/6/18 2:51 PM, Pritesh Maker wrote:

Here is the release 1.14 dashboard 
(https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12332463) 
and agile board 
(https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=185)

I believe Volodymyr is targeting DRILL-6422 (Guava update) for 1.15 release so 
it shouldn't be blocking the release. So overall, we have 2 open bugs, 2 in 
progress bugs (+2 doc issues), and 12 in review (+1 ready to commit).

If the reviewable commits won't be ready soon, can the developers please remove 
the 1.14 fix version for these issues.

Pritesh




On 7/6/18, 11:54 AM, "Boaz Ben-Zvi"  wrote:

   Current status: There's a blocker, and some work in progress that will
 stretch into next week.
 
 Current detail:
 
 ==
 
 Open/blocker - DRILL-6453 + DRILL-6517: Two issues - Parquet Scanner not setting record num (to zero), and a hang following this failure.
 
 In Progress - DRILL-6104: Generic Logfile Format Plugin
 
 PR - DRILL-6422: Update Guava to 23.0 and shade it
 
 PR - DRILL-5999 (DRILL-6516): Support for EMIT outcome in Streaming Agg (I'm reviewing)
 
 
 Ready2Commit: DRILL-6519: Add String Distance and Phonetic Functions (Arina gave it a +1 ; is it "Ready-To-Commit" or waiting for more reviews ?)
 
 
 Committed: DRILL-6570: Mentioned as a blocker by Kunal (I just merged #1354; the Jira was (mistakenly?) marked "Resolved" so it missed the batch commit).
 
 Committed: DRILL-5977: predicate pushdown support kafkaMsgOffset
 
 Committed: DRILL-6577: Change Hash-Join default to not fallback (into

 pre-1.14 unlimited memory)
 
 Committed: DRILL-6353: Upgrade Parquet MR dependencies
 
 Committed: DRILL-6310: limit batch size for hash aggregate
 
 ===
 
 Thanks,
 
  Boaz
 
 On 7/2/18 9:51 PM, Khurram Faraaz wrote:

 > Do we plan to fix this one too, because this is a regression from Apache
 > Drill 1.13.0.
 > 
https://issues.apache.org/jira/browse/DRILL-6453
 >
 > On Mon, Jul 2, 2018 at 9:33 PM, Kunal Khatua  wrote:
 >
 >> DRILL-6570 seems like a must-have (release blocker, IMHO).
 >> On 7/2/2018 8:02:00 PM, Boaz Ben-Zvi  wrote:
 >> Let's try to make progress on the 1.14 release, aiming for a Release
 >> Candidate towards the en

Re: [DISCUSS] 1.14.0 release

2018-07-02 Thread Boaz Ben-Zvi
  Let's try to make progress on the 1.14 release, aiming for a Release 
Candidate towards the end of this week (a little ambitious, with the 
July 4th and people on vacations).


Current Status of the previously requested Jiras:

==

In Progress - DRILL-6104: Generic Logfile Format Plugin

PR - DRILL-6422: Update Guava to 23.0 and shade it

PR - DRILL-5999 (DRILL-6516): Support for EMIT outcome in Streaming Agg

Ready2Commit: DRILL-5977: predicate pushdown support kafkaMsgOffset

Ready2Commit: DRILL-6519: Add String Distance and Phonetic Functions

Ready2Commit: DRILL-6577: Change Hash-Join default to not fallback (into 
pre-1.14 unlimited memory)


Committed: DRILL-6353: Upgrade Parquet MR dependencies

Committed: DRILL-6310: limit batch size for hash aggregate

===

And there are few more open or in a PR state.

   Let's try to have most of these ready by the end of the week.

   Boaz




[jira] [Created] (DRILL-6577) Change Hash-Join default to not fallback (into pre-1.14 unlimited memory)

2018-07-02 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6577:
---

 Summary: Change Hash-Join default to not fallback (into pre-1.14 
unlimited memory)
 Key: DRILL-6577
 URL: https://issues.apache.org/jira/browse/DRILL-6577
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Relational Operators
Affects Versions: 1.13.0
Reporter: Boaz Ben-Zvi
Assignee: Boaz Ben-Zvi
 Fix For: 1.14.0


Change the default for `drill.exec.hashjoin.fallback.enabled` to *false* (same 
as for the similar Hash-Agg option). This would force users to calculate and 
assign sufficient memory for the query, or explicitly choose to fallback.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6543) Options for memory mgmt: Reserve allowance for non-buffered, and Hash-Join default to not fallback

2018-06-26 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6543:
---

 Summary: Options for memory mgmt: Reserve allowance for 
non-buffered, and Hash-Join default to not fallback   
 Key: DRILL-6543
 URL: https://issues.apache.org/jira/browse/DRILL-6543
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Relational Operators
Affects Versions: 1.13.0
Reporter: Boaz Ben-Zvi
Assignee: Boaz Ben-Zvi
 Fix For: 1.14.0


Changes to options related to memory budgeting:

(1) Change the default for "drill.exec.hashjoin.fallback.enabled" to *false* 
(same as for the similar Hash-Agg option). This would force users to calculate 
and assign sufficient memory for the query, or explicitly choose to fallback.

(2) When the "planner.memory.max_query_memory_per_node" (MQMPN) option is set 
equal (or "nearly equal") to the allocated *Direct Memory*, an OOM is still 
possible. The reason is that the memory used by the "non-buffered" operators is 
not taken into account.

For example, MQMPN == Direct-Memory == 100 MB. Run a query with 5 buffered 
operators (e.g., 5 instances of a Hash-Join), so each gets "promised" 20 MB. 
When other non-buffered operators (e.g., a Scanner, or a Sender) also grab some 
of the Direct Memory, then less than 100 MB is left available. And if all those 
5 Hash-Joins are pushing their limits, then one HJ may have only allocated 12MB 
so far, but on the next 1MB allocation it will hit an OOM (from the JVM, as all 
the 100MB Direct memory is already used).

A solution -- a new option to _*reserve*_ some of the Direct Memory for those 
non-buffered operators (e.g., default 25%). This *allowance* may prevent many 
of the cases like the example above. The new option would return an error (when 
a query initiates) if the MQMPN is set too high. Note that this option +can 
not+ address concurrent queries.

This should also apply to the alternative to the MQMPN -- the 
{{"planner.memory.percent_per_query"}} option (PPQ). The PPQ does not 
_*reserve*_ such memory (e.g., it can be set to 100%); only its documentation 
clearly explains this issue (that doc suggests reserving a 50% allowance, as it 
was written when the Hash-Join was non-buffered; i.e., before spill was 
implemented).

The memory given to the buffered operators is the higher of the two figures 
calculated from the MQMPN and the PPQ. The new reserve option would verify that 
this figure still leaves room for the allowance.
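The arithmetic in the example above (100 MB, 5 Hash-Joins, 25% reserve) can be sketched as follows; this is a hypothetical helper for illustration, not Drill's memory-allocation code:

```java
// Back-of-the-envelope sketch of the budgeting problem: with
// MQMPN == Direct Memory == 100 MB and 5 buffered operators, each operator
// is "promised" 20 MB, but non-buffered operators also consume direct
// memory, so the promise cannot actually be honored.
public class MemoryBudgetSketch {
    static long perBufferedOperator(long maxQueryMemPerNode, int bufferedOps) {
        return maxQueryMemPerNode / bufferedOps;    // naive split, no reserve
    }

    // Proposed fix: reserve a fraction of direct memory for non-buffered
    // operators before dividing the remainder among buffered ones.
    static long perBufferedOperatorWithReserve(long directMemory, int bufferedOps,
                                               double reserveFraction) {
        long reserved = (long) (directMemory * reserveFraction);
        return (directMemory - reserved) / bufferedOps;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long direct = 100 * mb;
        System.out.println(perBufferedOperator(direct, 5) / mb);                  // 20
        System.out.println(perBufferedOperatorWithReserve(direct, 5, 0.25) / mb); // 15
    }
}
```

With the 25% allowance each Hash-Join is promised 15 MB instead of 20 MB, leaving 25 MB of direct memory for scanners, senders, and the like.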

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] New Committer: Padma Penumarthy

2018-06-15 Thread Boaz Ben-Zvi
 Congratulations Padma; welcome to our “club” 

On 6/15/18, 12:47 PM, "Jyothsna Reddy"  wrote:

Congratulations, Padma !!



On Fri, Jun 15, 2018 at 12:39 PM, AnilKumar B  wrote:

> Congratulations, Padma
>
> On Fri, Jun 15, 2018 at 12:36 PM Kunal Khatua  wrote:
>
> > Congratulations, Padma !
> >
> >
> > On 6/15/2018 12:34:15 PM, Robert Wu  wrote:
> > Congratulations, Padma!
> >
> > Best regards,
> >
> > Rob
> >
> > -Original Message-
> > From: Hanumath Rao Maduri
> > Sent: Friday, June 15, 2018 12:25 PM
> > To: dev@drill.apache.org
> > Subject: Re: [ANNOUNCE] New Committer: Padma Penumarthy
> >
> > Congratulations Padma!
> >
> > On Fri, Jun 15, 2018 at 12:04 PM, Gautam Parai wrote:
> >
> > > Congratulations Padma!!
> > >
> > >
> > > Gautam
> > >
> > > 
> > > From: Vlad Rozov
> > > Sent: Friday, June 15, 2018 11:56:37 AM
> > > To: dev@drill.apache.org
> > > Subject: Re: [ANNOUNCE] New Committer: Padma Penumarthy
> > >
> > > Congrats Padma!
> > >
> > > Thank you,
> > >
> > > Vlad
> > >
> > > On 6/15/18 11:38, Charles Givre wrote:
> > > > Congrats Padma!!
> > > >
> > > >> On Jun 15, 2018, at 13:57, Bridget Bevens wrote:
> > > >>
> > > >> Congratulations, Padma!!! 
> > > >>
> > > >> 
> > > >> From: Prasad Nagaraj Subramanya
> > > >> Sent: Friday, June 15, 2018 10:32:04 AM
> > > >> To: dev@drill.apache.org
> > > >> Subject: Re: [ANNOUNCE] New Committer: Padma Penumarthy
> > > >>
> > > >> Congratulations Padma!
> > > >>
> > > >> Thanks,
> > > >> Prasad
> > > >>
> > > >> On Fri, Jun 15, 2018 at 9:59 AM Vitalii Diravka <>
> > > vitalii.dira...@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> Congrats Padma!
> > > >>>
> > > >>> Kind regards
> > > >>> Vitalii
> > > >>>
> > > >>>
> > > >>> On Fri, Jun 15, 2018 at 7:40 PM Arina Ielchiieva
> > > >>>
> > > wrote:
> > > >>>
> > >  Padma, congratulations and welcome!
> > > 
> > >  Kind regards,
> > >  Arina
> > > 
> > >  On Fri, Jun 15, 2018 at 7:36 PM Aman Sinha
> > > wrote:
> > > 
> > > > The Project Management Committee (PMC) for Apache Drill has
> > > > invited
> > > >>> Padma
> > > > Penumarthy to become a committer, and we are pleased to announce
> > > > that
> > > >>> she
> > > > has
> > > > accepted.
> > > >
> > > > Padma has been contributing to Drill for about 1 1/2 years. She
> > > > has
> > > >>> made
> > > > improvements for work-unit assignment in the parallelizer,
> > > performance
> > > >>> of
> > > > filter operator for pattern matching and (more recently) on the
> > > > batch sizing for several operators: Flatten, MergeJoin, 
HashJoin,
> > UnionAll.
> > > >
> > > > Welcome Padma, and thank you for your contributions. Keep up
> > > > the
> > > good
> > >  work
> > > > !
> > > >
> > > > -Aman
> > > > (on behalf of Drill PMC)
> > > >
> > >
> > >
> >
> --
> Thanks & Regards,
> B Anil Kumar.
>




[DISCUSS] 1.14.0 release

2018-06-12 Thread Boaz Ben-Zvi
Hello Drillers,
  Nearly three months have passed since the 1.13.0 release, and it is time to 
start planning for 1.14.0; I volunteer to manage the new release.
  If there is any ongoing work not yet committed into the Apache Drill master 
that you strongly feel MUST be included in the 1.14 release, please reply to 
this thread. There are quite a few pending PRs, and we should prioritize and 
close the needed ones soon enough.
    Thanks,
       Boaz


[jira] [Created] (DRILL-6487) Negative row count when selecting from a json file with an OFFSET clause

2018-06-11 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6487:
---

 Summary: Negative row count when selecting from a json file with 
an OFFSET clause
 Key: DRILL-6487
 URL: https://issues.apache.org/jira/browse/DRILL-6487
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.13.0
Reporter: Boaz Ben-Zvi
 Fix For: 1.14.0


This simple query fails: 

{code}
select * from dfs.`/data/foo.json` offset 1 row;
{code}

where foo.json is 
{code}
{"key": "aa", "sales": 11}
{"key": "bb", "sales": 22}
{code}

The error returned is:
{code}
0: jdbc:drill:zk=local> select * from dfs.`/data/foo.json` offset 1 row;
Error: SYSTEM ERROR: AssertionError


[Error Id: 960d66a9-b480-4a7e-9a25-beb4928e8139 on 10.254.130.25:31020]

  (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
during fragment initialization: null
org.apache.drill.exec.work.foreman.Foreman.run():282
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745
  Caused By (java.lang.AssertionError) null
org.apache.calcite.rel.metadata.RelMetadataQuery.isNonNegative():900
org.apache.calcite.rel.metadata.RelMetadataQuery.validateResult():919
org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount():236
org.apache.calcite.rel.SingleRel.estimateRowCount():68

org.apache.drill.exec.planner.physical.visitor.ExcessiveExchangeIdentifier$MajorFragmentStat.add():103

org.apache.drill.exec.planner.physical.visitor.ExcessiveExchangeIdentifier.visitPrel():76

org.apache.drill.exec.planner.physical.visitor.ExcessiveExchangeIdentifier.visitPrel():32

org.apache.drill.exec.planner.physical.visitor.BasePrelVisitor.visitProject():50
org.apache.drill.exec.planner.physical.ProjectPrel.accept():98

org.apache.drill.exec.planner.physical.visitor.ExcessiveExchangeIdentifier.visitScreen():63

org.apache.drill.exec.planner.physical.visitor.ExcessiveExchangeIdentifier.visitScreen():32
org.apache.drill.exec.planner.physical.ScreenPrel.accept():65

org.apache.drill.exec.planner.physical.visitor.ExcessiveExchangeIdentifier.removeExcessiveEchanges():41

org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel():557
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():179
org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():145
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():83
org.apache.drill.exec.work.foreman.Foreman.runSQL():567
org.apache.drill.exec.work.foreman.Foreman.run():264
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745 (state=,code=0)
{code}





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6479) Support for EMIT outcome in Hash Aggregate

2018-06-07 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6479:
---

 Summary: Support for EMIT outcome in Hash Aggregate
 Key: DRILL-6479
 URL: https://issues.apache.org/jira/browse/DRILL-6479
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Relational Operators
Reporter: Boaz Ben-Zvi
Assignee: Boaz Ben-Zvi
 Fix For: 1.14.0


With the new Lateral and Unnest -- if a Hash-Aggregate operator is present in 
the sub-query, then it needs to handle the EMIT outcome correctly. This means 
that when an EMIT is received, the aggregation operation is performed on the 
records buffered so far and the output is produced from them. After handling an EMIT, 
the Hash-Aggr should refresh its state and continue to work on the next 
batches of incoming records until an EMIT is seen again.
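The EMIT handling described above can be sketched in plain Java. This is a hypothetical, heavily simplified model (invented class and method names, not the actual Drill operator code): on EMIT, flush the aggregation buffered so far and reset state before the next batches arrive.

```java
import java.util.HashMap;
import java.util.Map;

public class EmitSketch {
  enum Outcome { OK, EMIT, NONE }

  // Buffered per-key sums; cleared ("refreshed") after every EMIT.
  static Map<String, Integer> buffer = new HashMap<>();

  // Consume one incoming (key, value) row with its outcome; return the
  // aggregated output produced on EMIT, or null while still buffering.
  static Map<String, Integer> onBatch(String key, int value, Outcome outcome) {
    buffer.merge(key, value, Integer::sum);
    if (outcome == Outcome.EMIT) {
      Map<String, Integer> out = new HashMap<>(buffer);
      buffer.clear();              // refresh state for the next sub-query row
      return out;
    }
    return null;                   // keep aggregating until an EMIT arrives
  }
}
```

Calling `onBatch("a", 1, OK)` and then `onBatch("a", 2, EMIT)` yields an output map with `a -> 3`, and the buffer starts empty for whatever follows.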




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6475) Unnest: Null fieldId Pointer

2018-06-06 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6475:
---

 Summary: Unnest: Null fieldId Pointer 
 Key: DRILL-6475
 URL: https://issues.apache.org/jira/browse/DRILL-6475
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Reporter: Boaz Ben-Zvi
Assignee: Parth Chandra
 Fix For: 1.14.0


 Executing the following (in TestE2EUnnestAndLateral.java) causes an NPE as 
`fieldId` is null in `schemaChanged()`: 

```

@Test
public void testMultipleBatchesLateral_twoUnnests() throws Exception {
 String sql = "SELECT t5.l_quantity FROM dfs.`lateraljoin/multipleFiles/` t, 
LATERAL " +
 "(SELECT t2.ordrs FROM UNNEST(t.c_orders) t2(ordrs)) t3(ordrs), LATERAL " +
 "(SELECT t4.l_quantity FROM UNNEST(t3.ordrs) t4(l_quantity)) t5";
 test(sql);
}

```

 

And the error is:

```

Error: SYSTEM ERROR: NullPointerException

Fragment 0:0

[Error Id: 25f42765-8f68-418e-840a-ffe65788e1e2 on 10.254.130.25:31020]

(java.lang.NullPointerException) null
 
org.apache.drill.exec.physical.impl.unnest.UnnestRecordBatch.schemaChanged():381
 org.apache.drill.exec.physical.impl.unnest.UnnestRecordBatch.innerNext():199
 org.apache.drill.exec.record.AbstractRecordBatch.next():172
 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():229
 org.apache.drill.exec.record.AbstractRecordBatch.next():119
 
org.apache.drill.exec.physical.impl.join.LateralJoinBatch.prefetchFirstBatchFromBothSides():241
 org.apache.drill.exec.physical.impl.join.LateralJoinBatch.buildSchema():264
 org.apache.drill.exec.record.AbstractRecordBatch.next():152
 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():229
 org.apache.drill.exec.record.AbstractRecordBatch.next():119
 org.apache.drill.exec.record.AbstractRecordBatch.next():109
 org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
 org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
 org.apache.drill.exec.record.AbstractRecordBatch.next():172
 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():229
 org.apache.drill.exec.record.AbstractRecordBatch.next():119
 org.apache.drill.exec.record.AbstractRecordBatch.next():109
 org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
 org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
 org.apache.drill.exec.record.AbstractRecordBatch.next():172
 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():229
 org.apache.drill.exec.physical.impl.BaseRootExec.next():103
 org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
 org.apache.drill.exec.physical.impl.BaseRootExec.next():93
 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():292
 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():279
 java.security.AccessController.doPrivileged():-2
 javax.security.auth.Subject.doAs():422
 org.apache.hadoop.security.UserGroupInformation.doAs():1657
 org.apache.drill.exec.work.fragment.FragmentExecutor.run():279
 org.apache.drill.common.SelfCleaningRunnable.run():38
 java.util.concurrent.ThreadPoolExecutor.runWorker():1142
 java.util.concurrent.ThreadPoolExecutor$Worker.run():617
 java.lang.Thread.run():745 (state=,code=0)

```

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: How to generate hash code for each build side one of the hash join columns

2018-05-31 Thread Boaz Ben-Zvi
 Hi Weijie,

Another option is to totally avoid the generated code.
We were considering the idea of replacing the generated code used for computing 
hash values with “real java” code.

This idea is analogous to the usage of the copyEntry() method in the 
ValueVector interface (that Paul added last year).
See an example of using the copyEntry() (via the appendRow() in 
VectorContainer) in the new Hash-Join-Spill code.
Basically no need to generate “type specific” code, as the virtual copyEntry() 
method does the “type specific” work. 

Similarly we could have a hash64() method in ValueVector, which would perform 
the “type specific” computation.
(One difference from copyEntry() – the hash64() would also need to take the 
“seed” parameter, which is the hash value produced by the previous hash).
And similar to appendRow(), there would be evalHash() iterating over the key 
columns.
(And one difference from appendRow() – need to iterate only on the key columns; 
these are the first columns; their number can be found from the config: e.g., 
htConfig.getKeyExprsBuild().size() )

   With such an implementation, evalHash() could be used anywhere (e.g., to 
match the Bloom filters on the left side of the join).
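As a rough illustration of this idea, here is a minimal sketch with assumed, simplified interfaces (not Drill's actual ValueVector API, and a placeholder mixing function rather than a real hash): each vector type implements its own hash64(), and evalHash() chains the per-column results by passing each one on as the next seed.

```java
import java.util.List;

public class HashSketch {
  interface ValueVector {
    // Type-specific hashing via virtual dispatch; 'seed' is the hash
    // value produced by the previous key column.
    long hash64(int index, long seed);
  }

  static class IntVector implements ValueVector {
    final int[] data;
    IntVector(int[] data) { this.data = data; }
    public long hash64(int index, long seed) {
      long h = seed * 31 + data[index];  // placeholder mix, not Drill's hash
      return h ^ (h >>> 33);
    }
  }

  // Analogous to appendRow(): iterate only over the key columns,
  // chaining the seed through each one.
  static long evalHash(List<ValueVector> keyColumns, int rowIndex) {
    long hash = 0;
    for (ValueVector v : keyColumns) {
      hash = v.hash64(rowIndex, hash);
    }
    return hash;
  }
}
```

The point of the sketch is the dispatch structure: no generated "type specific" code is needed, because the virtual hash64() call does the type-specific work.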

   Thanks,

 Boaz


On 5/30/18, 7:49 PM, "weijie tong"  wrote:

Hi Aman:

  Thanks for your tips. I have rebased the latest code from the master
branch . Yes, the spill-to-disk feature does changed the original
implementation. I have adjusted my implementation according to the new
feature. But as you say, it will take some challenge to integration as I
noticed the spill-to-disk feature will continue to tune its implementation
performance.

  The BloomFilter was implemented natively in Drill , not an external
library. It's implemented the algorithm of the paper which was mentioned by
you.


On Thu, May 31, 2018 at 1:56 AM Aman Sinha  wrote:

> Hi Weijie,
> I was hoping you could leverage the existing methods..so its good that you
> found the ones that work for your use case.
> One thing I want to point out (maybe you're already aware) .. the Hash Join
> code has changed significantly in the master branch due to the
> spill-to-disk feature.
> So, this may pose some integration challenges for your run-time join
> pushdown feature.
> Also, one other question/clarification:  for the bloom filter itself are
> you implementing it natively in Drill or using an external library ?
>
> -Aman
>
> On Tue, May 29, 2018 at 8:23 PM, weijie tong 
> wrote:
>
> > I found ClassGenerator's nestEvalBlock(JBlock block) and
> unNestEvalBlock()
> > which has the same effect to what I change to the ClassGenerator. So I
> give
> > up what I change to the ClassGenerator and hope this can help someone
> else.
> >
> > On Tue, May 29, 2018 at 1:53 PM weijie tong 
> > wrote:
> >
> > > The code formatting is not nice. Put them again:
> > >
> > > private void setupGetBuild64Hash(ClassGenerator cg,
> > MappingSet
> > > incomingMapping, VectorAccessible batch, LogicalExpression[] keyExprs,
> > > TypedFieldId[] buildKeyFieldIds)
> > > throws SchemaChangeException {
> > > cg.setMappingSet(incomingMapping);
> > > if (keyExprs == null || keyExprs.length == 0) {
> > >   cg.getEvalBlock()._return(JExpr.lit(0));
> > > }
> > > String seedValue = "seedValue";
> > > String fieldId = "fieldId";
> > > LogicalExpression seed =
> > > ValueExpressions.getParameterExpression(seedValue, Types.required(
> > > TypeProtos.MinorType.INT));
> > >
> > > LogicalExpression fieldIdParamExpr =
> > > ValueExpressions.getParameterExpression(fieldId, Types.required(
> > > TypeProtos.MinorType.INT) );
> > > HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
> > > int i = 0;
> > >  for (LogicalExpression expr : keyExprs) {
> > >  TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
> > >  ValueExpressions.IntExpression targetBuildFieldIdExp = new
> > > ValueExpressions.IntExpression(targetTypeFieldId.getFieldIds()[0],
> > > ExpressionPosition.UNKNOWN);
> > >
> > > JFieldRef targetBuildSideFieldId =
> cg.addExpr(targetBuildFieldIdExp,
> > > ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
> > > JBlock ifBlock =
> > > cg.getEvalBlock()._if(fieldIdParamHolder.getValue().
> > eq(targetBuildSideFieldId))._then();
> > > //specify a special JBlock which is a inner one of the eval block
> to
> > > the ClassGenerator to substitute the returned JBlock of getEvalBlock()
> > > cg.setCustomizedEvalInnerBlock(ifBlock);
> > > LogicalExpression hashExpression =
> > > HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
> > > LogicalExpression materializedExpr =
> > > 

[jira] [Created] (DRILL-6444) Hash Join: Avoid partitioning when memory is sufficient

2018-05-24 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6444:
---

 Summary: Hash Join: Avoid partitioning when memory is sufficient 
 Key: DRILL-6444
 URL: https://issues.apache.org/jira/browse/DRILL-6444
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Relational Operators
Reporter: Boaz Ben-Zvi
Assignee: Boaz Ben-Zvi


The Hash Join Spilling feature introduced partitioning (of the incoming build 
side) which adds some overhead (copying the incoming data, row by row). That 
happens even when no spilling is needed.

Suggested optimization: Try reading the incoming build data without 
partitioning, while checking that enough memory is available. If the whole 
build side (plus hash table) fits in memory, then continue like a "single 
partition". If not, partition the data read so far and 
continue as usual (with partitions).

(See optimization 8.1 in the Hash Join Spill design document: 
[https://docs.google.com/document/d/1-c_oGQY4E5d58qJYv_zc7ka834hSaB3wDQwqKcMoSAI/edit]
 )

This is currently implemented only for the case of num_partitions = 1 (i.e., no 
spilling, and no memory checking).
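A minimal sketch of the suggested flow, with invented names and a trivial row-count "memory budget" standing in for the real memory checks: read the build side as a single in-memory partition while memory lasts, and only on overflow repartition what was read so far and continue with the usual fan-out.

```java
import java.util.ArrayList;
import java.util.List;

public class BuildPhaseSketch {
  static final int NUM_PARTITIONS = 32;   // assumed fallback fan-out

  // Returns the partitions after the build phase: a single one if the
  // whole build side fit in memory, NUM_PARTITIONS if we fell back.
  static List<List<Integer>> buildSide(List<Integer> rows, int memoryBudgetRows) {
    List<Integer> single = new ArrayList<>();
    for (int i = 0; i < rows.size(); i++) {
      if (single.size() >= memoryBudgetRows) {
        // Memory exhausted: repartition the rows read so far by hash,
        // then keep reading into the partitions (spilling as needed).
        List<List<Integer>> parts = new ArrayList<>();
        for (int p = 0; p < NUM_PARTITIONS; p++) parts.add(new ArrayList<>());
        for (int r : single) parts.get(Math.floorMod(r, NUM_PARTITIONS)).add(r);
        for (int j = i; j < rows.size(); j++) {
          int r = rows.get(j);
          parts.get(Math.floorMod(r, NUM_PARTITIONS)).add(r);
        }
        return parts;
      }
      single.add(rows.get(i));            // the cheap, no-copy happy path
    }
    return List.of(single);               // whole build side fit in memory
  }
}
```

In the happy path no row is ever copied into a partition, which is exactly the overhead the JIRA wants to avoid; the fallback pays the copying cost only once, on the rows read before the budget ran out.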

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Delete spurious branches from Apache

2018-05-22 Thread Boaz Ben-Zvi
  Done – all three were deleted, using github (thanks @Vlad Rozov).

There is also an active branch called “gh-pages” (and a non-active one called 
“gh-pages-master”) that seems to be used for the Drill documentation.
Maybe this work should be moved to its own Repo ?  It does not make sense to be 
a part of the Drill sources (e.g., note this branch is “3086 commits behind”).
@Bridget Bevens – what do you think ?

Thanks,

   Boaz

On 5/22/18, 1:02 PM, "Parth Chandra" <par...@apache.org> wrote:

Yes, please go ahead and remove these branches.

On Mon, May 21, 2018 at 8:06 PM, Vlad Rozov <vro...@apache.org> wrote:

> There is an option  to delete the branch on github directly (no need to
> use "git push").
>
> Thank you,
>
> Vlad
>
>
> On 5/21/18 18:35, Boaz Ben-Zvi wrote:
>
>> I mistakenly pushed a branch (“MERGE-180521-01”) into the Apache
>> repo, and plan to delete it soon
>> (i.e. do  “git push 
>> https://github.com/apache/drill.git --delete
>> MERGE-180521-01” ).
>>
>>Just in case someone notices 
>>
>>On this occasion: Looks like there are other similar such branches:
>> “DRILL-3478” and “DRILL-4235” ; any objection to deleting those as well ?
>>
>>  Thanks,
>>
>>  Boaz
>>
>>
>>
>




Delete spurious branches from Apache

2018-05-21 Thread Boaz Ben-Zvi
   I mistakenly pushed a branch (“MERGE-180521-01”) into the Apache repo, and 
plan to delete it soon 
(i.e. do  “git push https://github.com/apache/drill.git --delete 
MERGE-180521-01” ).

  Just in case someone notices 

  On this occasion: Looks like there are other similar such branches:  
“DRILL-3478” and “DRILL-4235” ; any objection to deleting those as well ?

Thanks,

Boaz




[jira] [Created] (DRILL-6400) Hash-Aggr: Avoid recreating common Hash-Table setups for every partition

2018-05-09 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6400:
---

 Summary: Hash-Aggr: Avoid recreating common Hash-Table setups for 
every partition
 Key: DRILL-6400
 URL: https://issues.apache.org/jira/browse/DRILL-6400
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Relational Operators
Affects Versions: 1.13.0
Reporter: Boaz Ben-Zvi
Assignee: Boaz Ben-Zvi
 Fix For: 1.14.0


 The current Hash-Aggr code (and soon the Hash-Join code) creates multiple 
partitions to hold the incoming data; each partition with its own HashTable. 

     The current code invokes the HashTable method _createAndSetupHashTable()_ 
for *each* partition. But most of the setups done by this method are identical 
for all the partitions (e.g., code generation).  Calling this method has a 
performance cost (some local tests measured between 3 - 30 milliseconds, 
depends on the key columns).

  Suggested performance improvement: Extract the common settings to be called 
*once*, and use the results later by all the partitions. When running with the 
default 32 partitions, this can have a measurable improvement (and if spilling, 
this method is used again).
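The suggested refactoring could look roughly like this (hypothetical classes, not the actual Drill code; a counted constructor stands in for the expensive shared code generation): do the common setup once and hand the shared result to every partition's hash table.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedSetupSketch {
  static final AtomicInteger codegenCalls = new AtomicInteger();

  // The part of createAndSetupHashTable() that is identical for all
  // partitions (e.g. code generation); expensive to compute.
  static class CommonSetup {
    CommonSetup() { codegenCalls.incrementAndGet(); }
  }

  // Per-partition hash table: only the cheap, partition-specific part
  // is done per instance; the shared setup is reused.
  static class HashTable {
    final CommonSetup shared;
    HashTable(CommonSetup shared) { this.shared = shared; }
  }

  static HashTable[] createPartitions(int n) {
    CommonSetup shared = new CommonSetup();   // done once, not per partition
    HashTable[] tables = new HashTable[n];
    for (int i = 0; i < n; i++) {
      tables[i] = new HashTable(shared);
    }
    return tables;
  }
}
```

With the default 32 partitions, a 3-30 ms setup cost paid once instead of 32 times is where the measurable improvement would come from.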

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [IMPORTANT] Gitbox enabled

2018-05-09 Thread Boaz Ben-Zvi
 Note *committers* , in case you get the same error:

After successfully enabling the needed Two Factor Authentication (2FA), my “git 
push” started failing, like:

~/drill > git push origin
Username for 'https://github.com': ben-zvi
Password for 'https://ben-...@github.com':
remote: Invalid username or password.
fatal: Authentication failed for 'https://github.com/Ben-Zvi/drill.git/'

The solution: Need to enter a personal access token instead of the github 
password. 
To generate a personal access token, go to https://github.com/settings/tokens 
The token is a long hash code ; just copy and paste it as a password.

   Thanks,

   Boaz


On 5/3/18, 11:36 AM, "Parth Chandra" <par...@apache.org> wrote:

Note to all the *committers* -

Gitbox integration has been enabled. This means you can merge in a PR
directly from Github (i.e. the apache/drill repository on github is now
the master repository, and is writable; it is no longer a mirror).

This also means that the original git-wip repository will not be available
and pushing to this repository will not achieve anything useful.

[IMPORTANT] Please visit 
https://gitbox.apache.org/setup/ to setup 2FA if
you'd like to use GitHub as a remote g...@github.com:apache/drill.git

You can also use GitBox as a remote

https://gitbox.apache.org/repos/asf/drill.git

Same thing for drill-site g...@github.com:apache/drill-site.git or

https://gitbox.apache.org/repos/asf/drill-site.git


[IMPORTANT] - The github UI currently enables the option to "Create a merge
commit" . Please *do not* use this option. Click on the drop down and chose
the "rebase and merge" or "squash and merge" option

@vrozov is the expert on this, so if you run into difficulties please
include him in the communication. (Better still just post on the list).

Thanks

Parth




[GitHub] drill pull request #1248: DRIL-6027: Implement Spilling for the Hash-Join

2018-05-01 Thread Ben-Zvi
GitHub user Ben-Zvi opened a pull request:

https://github.com/apache/drill/pull/1248

DRIL-6027: Implement Spilling for the Hash-Join

This PR covers the work to enable the Hash-Join operator (*HJ*) to spill - 
when its limited memory becomes too small to hold the incoming data. 
 @ilooner is a co-contributor of this work.

Below is a high level description of the main changes, to help the 
reviewers. More design detail is available in the design document 
(https://docs.google.com/document/d/1-c_oGQY4E5d58qJYv_zc7ka834hSaB3wDQwqKcMoSAI/)
Some of this work follows a prior similar work done for the Hash-Aggregate 
(*HAG*) operator; some similarity to the HAG is mentioned to help reviewers 
familiar with those changes.

h2. Partitions:
Just like the HAG spilling, the main idea to enable spilling is to split 
the incoming rows into separate *Partitions*, such that the HJ can gradually 
adapt to a memory-pressure situation by picking an in-memory partition and 
spilling it as the need arises, thus freeing some memory.
Unlike the HAG, the HJ has two incomings - the build/inner/right and the 
probe/outer/left. The HJ partitions its Build side first, and if needed, may 
spill some of these partitions as data is read. Later the Probe side is read 
and partitioned the same way, where outer partitions matching spilled inner 
partitions are spilled as well - unconditionally.

h6. {{HashPartition}} class:
A new class {{HashPartition}} was created to encapsulate the work of each 
partition; this class handles the pair - the build-side partition and its 
matching probe-side partition. Most of its code was extracted from prior code 
in {{HashJoinBatch}}.

h4. Hash Values:
The hash-values are computed at first time, then saved into a special 
column (named "Hash_Values"), which may be spilled, etc. This avoids 
recomputation (unlike the HAG, which recomputes). After reading a batch from a 
spill file, this Hash-values vector is separated (into {{read_HV_vector}}) and 
used instead of computing the hash values.

h4. Build Hash Table:
Unlike the HAG - the hash-table (and "helper") are built (per each inner 
partition) only *after* that whole partition was read into memory. (This avoids 
wasted work, in case the partition needs to spill). Another improvement: As the 
number of entries is known at that final time (ignoring duplicates), then the 
hash table can be initially sized right, avoiding the need for later costly 
resizings (see {{hashTable.updateInitialCapacity()}}). 

h4. Same as the HAG:
* Same metrics (NUM_PARTITIONS, SPILLED_PARTITIONS, SPILL_MB, 
SPILL_CYCLE) 
* Using the {{SpillSet}} class.
* Recursive spilling. (Nearly the same code - see {{innerNext()}} in 
{{HashJoinBatch.java}}). Except that the HJ may have duplicate entries - so 
when the spill cycle has consumed more than 20 bits of the hash value, the operator raises an error.
* Option controlling the number of partitions (and when that number is 1 
--> spilling is disabled).

h6. Avoid copying:
Copying the incoming build data into the partitions' batches is a new extra 
step, adding some overhead. To match performance with prior Drill, in case of a 
single partition (no spilling, no memory checks) -- the incoming vectors are 
used as is, without copying. Future work may extend this for the general case 
(involving memory checks, etc.)

h2. Memory Calculations:
h4. Initial memory allocation:
The HJ was made a "buffered" operator (see {{isBufferedOperator()}}, just 
like the HAG and the External Sort), hence gets assigned an equal memory share 
(out of the "memory per query per node"; see 
{{setupBufferedOpsMemoryAllocations()}}). Except when the number of partitions 
is forced to be 1, when it "falls back" to the "old uncontrolled" behavior 
(similar to what was done for the HAG).

h4. Memory Calculator:
The memory calculator is knowledgeable about the current and future memory needs 
(including current memory usage of all the partitions, an outgoing batch, an 
incoming outer batch, and the hash tables and "helpers"). The calculator is 
used first to find an optimal number of partitions (starting from a number 
controlled by {{hashjoin_num_partitions}}, default 32, and lowering if that 
number requires too much memory). The second use of the calculator is to 
determine if a spill is needed, prior to allocating more memory (see 
{{shouldSpill()}}). This check is performed in two places: when reading the 
build side and about to allocate a new batch (see {{appendInnerRow()}}). And 
when hash tables (and helpers) are allocated for the in-memory partitions (in 
{{executeBuildPhase()}}).
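A toy model of the shouldSpill() decision (invented fields; the real calculator also accounts for per-partition sizes, the outgoing batch, the incoming outer batch, and the hash tables and "helpers"): the operator asks the calculator before each allocation, and spills a partition if the allocation would exceed the budget.

```java
public class CalcSketch {
  long allocated;            // memory currently reserved by the operator
  final long limit;          // the operator's assigned memory budget

  CalcSketch(long limit) { this.limit = limit; }

  // True if reserving 'needed' more bytes would exceed the budget,
  // i.e. a partition must spill before the allocation proceeds.
  boolean shouldSpill(long needed) {
    return allocated + needed > limit;
  }

  void allocate(long bytes) { allocated += bytes; }
}
```

The two call sites described above map onto this as: check before allocating a new build-side batch, and check again before allocating each in-memory partition's hash table and helper.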

h6. Implementation:
The {{HashJoinMemoryCalculator}} is an interface, implemented by 
{{HashJoinMemoryCalculatorImpl}} for regular work. For testing, we can limit 
the numb

[GitHub] drill pull request #1239: DRILL-143: CGroup Support for Drill-on-YARN

2018-04-23 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1239#discussion_r183607110
  
--- Diff: distribution/src/resources/yarn-drillbit.sh ---
@@ -110,6 +114,36 @@
 # Enables Java GC logging. Passed from the drill.yarn.drillbit.log-gc
 # garbage collection option.
 
+### Function to enforce CGroup (Refer local drillbit.sh)
+check_and_enforce_cgroup(){
+dbitPid=$1;
+kill -0 $dbitPid
+if [ $? -gt 0 ]; then 
+  echo "ERROR: Failed to add Drillbit to CGroup ( $DRILLBIT_CGROUP ) 
for 'cpu'. Ensure that the Drillbit ( pid=$dbitPid ) started up." >&2
+  exit 1
+fi
+SYS_CGROUP_DIR=${SYS_CGROUP_DIR:-"/sys/fs/cgroup"}
+if [ -f $SYS_CGROUP_DIR/cpu/$DRILLBIT_CGROUP/cgroup.procs ]; then
+  echo $dbitPid > $SYS_CGROUP_DIR/cpu/$DRILLBIT_CGROUP/cgroup.procs
+  # Verify Enforcement
+  cgroupStatus=`grep -w $pid 
$SYS_CGROUP_DIR/cpu/${DRILLBIT_CGROUP}/cgroup.procs`
+  if [ -z "$cgroupStatus" ]; then
--- End diff --

I'm confused: Is this checking for $dbitPid (in cgroup.procs) or for $pid ?
In case the former, then need to negate the following "-z" condition.
 


---


[GitHub] drill issue #1227: Drill-6236: batch sizing for hash join

2018-04-20 Thread Ben-Zvi
Github user Ben-Zvi commented on the issue:

https://github.com/apache/drill/pull/1227
  
Need to be "DRILL" in capital letters ...



---


[GitHub] drill issue #1227: Drill 6236: batch sizing for hash join

2018-04-20 Thread Ben-Zvi
Github user Ben-Zvi commented on the issue:

https://github.com/apache/drill/pull/1227
  
Need to update the subject line of this PR: the '-' is missing between 
DRILL and 6236 (should be DRILL**-**6236) ; because of this missing '-' the PR 
is not listed in the Jira 



---


[GitHub] drill pull request #1227: Drill 6236: batch sizing for hash join

2018-04-19 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1227#discussion_r182929294
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchMemoryManager.java
 ---
@@ -188,12 +196,18 @@ public int getOutgoingRowWidth() {
   public void setRecordBatchSizer(int index, RecordBatchSizer sizer) {
 Preconditions.checkArgument(index >= 0 && index < numInputs);
 this.sizer[index] = sizer;
-inputBatchStats[index] = new BatchStats();
+if (inputBatchStats[index] == null) {
+  inputBatchStats[index] = new BatchStats();
+}
+updateIncomingStats(index);
   }
 
   public void setRecordBatchSizer(RecordBatchSizer sizer) {
--- End diff --

Can instead just call the above method with DEFAULT_INPUT_INDEX as the 
first parameter. 


---

