Running Drill in distributed mode

2018-12-09 Thread Mehran.D [BR-PD]
I wanted to know if it is possible to run Drill in distributed mode on the
local file systems of several machines.
Is it possible to run it the way Splunk does, with a search head and one or
more indexers, so that the search query is distributed and the search head
aggregates the responses to complete the query?
I ran Drill on a local file system with one specific ZooKeeper, but the second
machine shows as unavailable in the Drill monitoring web interface.
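For reference, a drillbit that does not show up in the web UI is often not
registered in the same cluster: every node must use the identical cluster-id
and zk.connect in drill-override.conf. A minimal sketch of such a config
(zkhost1 is a placeholder for the real ZooKeeper host, not from my setup):

  drill.exec: {
    cluster-id: "drillbits1",    # must be identical on every node
    zk.connect: "zkhost1:2181"   # same ZooKeeper quorum string on every node
  }

With that in place, starting bin/drillbit.sh start on each machine should make
both drillbits appear in the monitoring web interface. Note also that, as far
as I understand, querying a purely local file system from a multi-node cluster
only works reliably when the same files exist at the same path on every node.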

I wanted to know:

* Is it possible to run Apache Drill distributed over local file systems?

* Is it possible to run Drill in a way that queries another Apache Drill
instance as a storage plugin?

Best Regards,

Mehran Dashti
Product Leader
09125902452



RE: Serious performance problem in CTAS JSON-to-Parquet writing

2019-05-15 Thread Mehran.D [BR-PD]
I've added the plan for both versions.

Although the number of rows may differ, the plan is correct.

Please tell me what the Project and Project_allow_dup steps do, and why they
take such massive ingestion time.
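For reference, a text plan like the one below can be produced with EXPLAIN
PLAN FOR (assuming the version in use accepts EXPLAIN over a CTAS; otherwise
the same text is visible on the query profile page of the web UI). The table
and file names here are placeholders, not my real ones:

  EXPLAIN PLAN FOR
  CREATE TABLE dfs.tmp.`events_parquet` AS
  SELECT * FROM dfs.`/data/json/events.json`;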





Operator profile for 1.13



Operator ID  Type               Avg Setup  Max Setup  Avg Process  Max Process  Min Wait  Avg Wait  Max Wait  % Fragment  % Query  Rows    Avg Peak Mem  Max Peak Mem
00-xx-00     SCREEN             0.000s     0.000s     0.160s       0.319s       0.000s    0.006s    0.011s    33.19%      33.19%   23,838  4MB           8MB
00-xx-01     PROJECT            0.000s     0.000s     0.000s       0.000s       0.000s    0.000s    0.000s    0.00%       0.00%    1       -             -
00-xx-02     PARQUET_WRITER     0.001s     0.001s     0.469s       0.469s       0.000s    0.000s    0.000s    48.81%      48.81%   23,837  -             -
00-xx-03     PROJECT_ALLOW_DUP  0.001s     0.001s     0.002s       0.002s       0.000s    0.000s    0.000s    0.19%       0.19%    23,837  9MB           9MB
00-xx-04     PROJECT            0.009s     0.009s     0.171s       0.171s       0.000s    0.000s    0.000s    17.81%      17.81%   23,837  11MB          11MB



Text plan for 1.13


00-00    Screen : rowType = RecordType(VARCHAR(255) Fragment, BIGINT Number of
records written): rowcount = 22815.0, cumulative cost = {116356.5 rows,
1.47179565E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 31768366
00-01  Project(Fragment=[$0], Number of records written=[$1]) : rowType = 
RecordType(VARCHAR(255) Fragment, BIGINT Number of records written): rowcount = 
22815.0, cumulative cost = {114075.0 rows, 1.4715675E7 cpu, 0.0 io, 0.0 
network, 0.0 memory}, id = 31768365
00-02Writer : rowType = RecordType(VARCHAR(255) Fragment, BIGINT Number 
of records written): rowcount = 22815.0, cumulative cost = {91260.0 rows, 
1.4670045E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 31768364
00-03  ProjectAllowDup(id=[$0], src_port=[$1], dst_port=[$2], 
dst_ip=[$3], src_ip=[$4], secondary_event_type=[$5], 
secondary_event_source=[$6], hash=[$7], receiver_msisdn=[$8], 
calling_number=[$9], subscriber_id=[$10], bolton_type=[$11], source_tid=[$12], 
a_number_balance_after=[$13], called_number=[$14], 
b_number_balance_after=[$15], secondary_device_id=[$16], namespace=[$17], 
secondary_event_severity=[$18], secondary_rept_dev_ip_addr=[$19], 
reference_id=[$20], transaction_id=[$21], price_base=[$22], deposit=[$23], 
module=[$24], vul_result_id=[$25], dest_service_name=[$26], vul_id=[$27], 
reason=[$28], device_id=[$29], sub_module=[$30], user_name=[$31], 
virus_name=[$32], custom_str12=[$33], custom_str13=[$34], custom_str10=[$35], 
custom_str11=[$36], custom_str16=[$37], custom_str17=[$38], custom_str14=[$39], 
custom_str15=[$40], custom_str4=[$41], custom_str5=[$42], custom_str2=[$43], 
custom_str3=[$44], custom_str8=[$45], custom_str9=[$46], custom_str6=[$47], 
custom_str7=[$48], custom_int8=[$49], custom_int9=[$50], custom_int6=[$51], 
custom_int7=[$52], custom_dateand_time2=[$53], custom_dateand_time3=[$54], 
custom_int10=[$55], custom_dateand_time1=[$56], custom_str20=[$57], 
custom_int1=[$58], custom_str18=[$59], custom_str19=[$60], custom_int4=[$61], 
custom_int5=[$62], custom_int2=[$63], custom_int3=[$64], custom_str1=[$65], 
custom_dateand_time4=[$66], custom_dateand_time5=[$67], cp_username=[$68], 
min_value=[$69], circle_name=[$70], cp_account_name=[$71], average60=[$72], 
serial=[$73], max_value=[$74], average15=[$75], internal_error_code=[$76], 
bank_id=[$77], a_number_balance_before=[$78], transfer_type=[$79], state=[$80], 
correlation_id=[$81], auth_code=[$82], retry_attempt=[$83], peak_value=[$84], 
request=[$85], is_credit=[$86], ability_id=[$87], response=[$88], 
process_name=[$89], recv_month=[$90], recv_day=[$91], recv_year=[$92], 
reporting_ip=[$93], raw_event_msg=[$94], event_type_id=[$95], severity=[$96], 
recv_time=[$97], device_time=[$98], record_time=[$99], recv_hour=[$100], 
recv_minute=[$101], impact=[$102], risk=[$103], collector_id=[$104], 
event_source_id=[$105], protocol_id=[$106]) : rowType = RecordType(BIGINT id, 
INTEGER src_port, INTEGER dst_port, BIGINT dst_ip, BIGINT src_ip, INTEGER 
secondary_event_type, INTEGER secondary_event_source, VARCHAR(65535) hash, 
BIGINT receiver_msisdn, VARCHAR(65535) calling_number, BIGINT subscriber_id, 
VARCHAR(65535) bolton_type, BIGINT source_tid, BIGINT a_number_balance_after, 
VARCHAR(65535) called_number, BIGINT b_number_balance_after, INTEGER 
secondary_device_id, VARCHAR(65535) namespace, INTEGER 
secondary_event_severity, BIGINT secondary_rept_dev_ip_addr, BIGINT 
reference_id, BIGINT transaction_id, BIGINT price_base, BIGINT deposit, 
VARCHAR(65535) module, VARCHAR(65535) vul_result_id, INTEGER dest_service_name, 
VARCHAR(65535) vul_id, VARCHAR(65535) reason, INTEGER device_id, VARCHAR(65535) 
sub_module, VARCHAR(65535) user_name, 

Serious performance problem in CTAS JSON-to-Parquet writing

2019-05-04 Thread Mehran.D [BR-PD]
Dear all,

I have some JSON files to be converted to Parquet files.
After version 1.13, I have a very serious issue with this writing: a sample
JSON document of 10 records takes 5 minutes to finish.
I've included the plan.
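The statement in question is a plain CTAS from JSON into Parquet; a minimal
sketch of its shape (workspace, table name, and path are placeholders, not my
real ones):

  -- make CTAS write Parquet (this is also the default store.format)
  ALTER SESSION SET `store.format` = 'parquet';

  CREATE TABLE dfs.tmp.`events_parquet` AS
  SELECT *
  FROM dfs.`/data/json/events.json`;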


Overview

Operator ID  Type               Avg Setup  Max Setup  Avg Process  Max Process  Min Wait  Avg Wait  Max Wait  % Fragment  % Query  Rows     Avg Peak Mem  Max Peak Mem
00-xx-00     SCREEN             0.000s     0.000s     1.262s       2.510s       0.004s    0.075s    0.145s    0.82%       0.82%    111,491  10MB          20MB
00-xx-01     PROJECT            0.002s     0.002s     0.001s       0.001s       0.000s    0.000s    0.000s    0.00%       0.00%    1        -             -
00-xx-02     PARQUET_WRITER     0.293s     0.293s     50.750s      50.750s      0.000s    0.000s    0.000s    16.44%      16.44%   111,490  -             -
00-xx-03     PROJECT_ALLOW_DUP  0.032s     0.032s     2m0s         2m0s         0.000s    0.000s    0.000s    39.01%      39.01%   111,490  13MB          13MB
00-xx-04     PROJECT            16.092s    16.092s    2m15s        2m15s        0.000s    0.000s    0.000s    43.73%      43.73%   111,490  13MB          13MB




I do not know what the 'PROJECT_ALLOW_DUP' and 'PROJECT' operators do.
Please tell me what has changed from 1.13 up to now.
I have a similar problem in the 1.16 release.



Best Regards,

Mehran Dashti
Product Leader
09125902452