Re: May Apache Drill board report

2019-05-04 Thread Arina Ielchiieva
Thanks everybody for the feedback. The report has been posted.

Kind regards,
Arina

On Sat, May 4, 2019 at 2:29 AM SorabhApache  wrote:

> +1
>
> On Fri, May 3, 2019 at 4:22 PM Boaz Ben-Zvi  wrote:
>
> > No comments; looks fine; +1
> >
> > On 5/3/19 3:10 PM, Aman Sinha wrote:
> > > +1
> > >
> > > On Fri, May 3, 2019 at 1:40 PM Volodymyr Vysotskyi <
> volody...@apache.org
> > >
> > > wrote:
> > >
> > >> Looks good, +1
> > >>
> > >>
> > > >> On Fri, May 3, 2019 at 23:32, Arina Ielchiieva wrote:
> > >>
> > >>> Hi all,
> > >>>
> > >>> please take a look at the draft board report for the last quarter and
> > let
> > >>> me know if you have any comments.
> > >>>
> > >>> Thanks,
> > >>> Arina
> > >>>
> > >>> =
> > >>>
> > >>> ## Description:
> > >>> - Drill is a Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud
> > >>>Storage.
> > >>>
> > >>> ## Issues:
> > >>>   - There are no issues requiring board attention at this time.
> > >>>
> > >>> ## Activity:
> > >>> - Since the last board report, Drill has released version 1.16.0,
> > >> including
> > >>>the following enhancements:
> > >>>- CREATE OR REPLACE SCHEMA command to define a schema for text
> files
> > >>>- REFRESH TABLE METADATA command can generate metadata cache files
> > for
> > >>>specific columns
> > > >>>- ANALYZE TABLE statement to compute statistics on Parquet data
> > >>>- SYSLOG (RFC-5424) Format Plugin
> > >>>- NEAREST DATE function to facilitate time series analysis
> > >>>- Format plugin for LTSV files
> > >>>- Ability to query Hive views
> > >>>- Upgrade to SQLLine 1.7
> > >>>- Apache Calcite upgrade to 1.18.0
> > >>>- Several Drill Web UI improvements, including:
> > >>>   - Storage plugin management improvements
> > >>>   - Query progress indicators and warnings
> > >>>   - Ability to limit the result size for better UI response
> > >>>   - Ability to sort the list of profiles in the Drill Web UI
> > >>>   - Display query state in query result page
> > >>>   - Button to reset the options filter
> > >>>
> > >>> - Drill User Meetup will be held on May 22, 2019. Two talks are
> > planned:
> > >>>- Alibaba's Usage of Apache Drill for querying a Time Series
> > Database
> > >>>- What’s new with Apache Drill 1.16 & a demo of Schema
> Provisioning
> > >>>
> > >>> ## Health report:
> > >>> - The project is healthy. Development activity as reflected in the
> pull
> > >>>requests and JIRAs is good.
> > > >>> - Activity on the dev and user mailing lists is stable.
> > >>> - One PMC member was added in the last period.
> > >>>
> > >>> ## PMC changes:
> > >>>
> > >>> - Currently 24 PMC members.
> > >>> - Sorabh Hamirwasia was added to the PMC on Fri Apr 05 2019
> > >>>
> > >>> ## Committer base changes:
> > >>>
> > >>> - Currently 51 committers.
> > >>> - No new committers added in the last 3 months
> > >>> - Last committer addition was Salim Achouche at Mon Dec 17 2018
> > >>>
> > >>> ## Releases:
> > >>>
> > >>> - 1.16.0 was released on Thu May 02 2019
> > >>>
> > >>> ## Mailing list activity:
> > >>>
> > >>> - d...@drill.apache.org:
> > >>> - 406 subscribers (down -10 in the last 3 months):
> > >>> - 2299 emails sent to list (1903 in previous quarter)
> > >>>
> > >>> - iss...@drill.apache.org:
> > >>> - 17 subscribers (down -1 in the last 3 months):
> > >>> - 2373 emails sent to list (2233 in previous quarter)
> > >>>
> > >>> - user@drill.apache.org:
> > >>> - 582 subscribers (down -15 in the last 3 months):
> > >>> - 235 emails sent to list (227 in previous quarter)
> > >>>
> > >>> ## JIRA activity:
> > >>>
> > >>> - 214 JIRA tickets created in the last 3 months
> > >>> - 212 JIRA tickets closed/resolved in the last 3 months
> > >>>
> >
>


Serious performance problem in ctas json to parquet writing

2019-05-04 Thread Mehran.D [BR-PD]
Dear all,

I have some JSON files to convert to Parquet files.
Since version 1.13 I have had a very serious performance issue with this write.
A sample 10-record JSON document takes 5 minutes to finish.
I've included the plan.


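Overview

Operator ID | Type              | Avg Setup Time | Max Setup Time | Avg Process Time | Max Process Time | Min Wait Time | Avg Wait Time | Max Wait Time | % Fragment Time | % Query Time | Rows    | Avg Peak Memory | Max Peak Memory
------------+-------------------+----------------+----------------+------------------+------------------+---------------+---------------+---------------+-----------------+--------------+---------+-----------------+----------------
00-xx-00    | SCREEN            | 0.000s         | 0.000s         | 1.262s           | 2.510s           | 0.004s        | 0.075s        | 0.145s        | 0.82%           | 0.82%        | 111,491 | 10MB            | 20MB
00-xx-01    | PROJECT           | 0.002s         | 0.002s         | 0.001s           | 0.001s           | 0.000s        | 0.000s        | 0.000s        | 0.00%           | 0.00%        | 1       | -               | -
00-xx-02    | PARQUET_WRITER    | 0.293s         | 0.293s         | 50.750s          | 50.750s          | 0.000s        | 0.000s        | 0.000s        | 16.44%          | 16.44%       | 111,490 | -               | -
00-xx-03    | PROJECT_ALLOW_DUP | 0.032s         | 0.032s         | 2m0s             | 2m0s             | 0.000s        | 0.000s        | 0.000s        | 39.01%          | 39.01%       | 111,490 | 13MB            | 13MB
00-xx-04    | PROJECT           | 16.092s        | 16.092s        | 2m15s            | 2m15s            | 0.000s        | 0.000s        | 0.000s        | 43.73%          | 43.73%       | 111,490 | 13MB            | 13MB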



I do not know what the 'PROJECT_ALLOW_DUP' and 'PROJECT' operators do.
Please tell me what has changed from 1.13 up to now.
I have a similar problem in the 1.16 release.
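
As a rough cross-check of the profile above, the Avg Process Time values can be totaled to see which operators account for the roughly 5 minutes reported. This is only a sketch: the helper name `to_seconds` and the dictionary `avg_process_time` are made up for illustration, the durations are copied by hand from the profile, and the shares it prints are shares of summed process time, which is not necessarily how Drill computes its own "% Query Time" column.

```python
import re

# Avg Process Time per operator, copied by hand from the profile above.
avg_process_time = {
    "00-xx-00 SCREEN": "1.262s",
    "00-xx-01 PROJECT": "0.001s",
    "00-xx-02 PARQUET_WRITER": "50.750s",
    "00-xx-03 PROJECT_ALLOW_DUP": "2m0s",
    "00-xx-04 PROJECT": "2m15s",
}


def to_seconds(text):
    """Convert a duration such as '2m15s' or '50.750s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


seconds = {op: to_seconds(t) for op, t in avg_process_time.items()}
total = sum(seconds.values())
project_total = (
    seconds["00-xx-03 PROJECT_ALLOW_DUP"] + seconds["00-xx-04 PROJECT"]
)

# List operators from slowest to fastest with their share of process time.
for op, s in sorted(seconds.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{op:28s} {s:9.3f}s  {s / total:6.1%}")
print(f"total process time: {total:.3f}s ({total / 60:.1f} min)")
print(f"two slowest project operators: {project_total / total:.1%} of total")
```

Under these assumptions the two projection operators (00-xx-03 and 00-xx-04) together make up most of the summed process time, with PARQUET_WRITER a distant third, so the projections are the place to look first.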


and

Best Regards,

Mehran Dashti
Product Leader
09125902452