[jira] [Commented] (IMPALA-9857) Batch ALTER_PARTITION events

2021-10-06 Thread ASF subversion and git services (Jira)


[ https://issues.apache.org/jira/browse/IMPALA-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17425307#comment-17425307 ]

ASF subversion and git services commented on IMPALA-9857:
-

Commit d8d44f3f147ce0f98bdd9e0387ae080010f55965 in impala's branch 
refs/heads/master from Vihang Karajgaonkar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d8d44f3 ]

IMPALA-9857: Batching of consecutive partition events

This patch improves the performance of the events processor
by batching together consecutive ALTER_PARTITION or
INSERT events. Currently, without this patch, if
the event stream contains many consecutive
ALTER_PARTITION events that cannot be skipped, the
events processor refreshes the partition from each
event one by one. Similarly, for INSERT events into
partitions, the events processor refreshes one partition
at a time.

By batching together such consecutive ALTER_PARTITION or
INSERT events, the events processor needs to take the table
lock only once per batch and can refresh all the partitions
from the events using multiple threads. For transactional (ACID)
tables, this gives an even larger performance gain,
since currently we refresh the whole table for each
ALTER_PARTITION or INSERT partition event. With batching,
the events processor refreshes the table once per
batch.
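A minimal sketch of the "lock once per batch, refresh partitions in parallel" idea, under stated assumptions: BatchPartitionRefresher, refreshBatch and the refreshPartition callback are hypothetical names, and a plain ReentrantLock stands in for Impala's own table lock. It illustrates the pattern only and is not the code from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

// Hypothetical sketch: not Impala's real classes or API.
final class BatchPartitionRefresher {
  // Assumption: a ReentrantLock stands in for the catalog's table lock.
  private final ReentrantLock tableLock = new ReentrantLock();
  private int lockAcquisitions = 0;

  // refreshPartition is a hypothetical stand-in for reloading one partition.
  int refreshBatch(List<String> partitions, Consumer<String> refreshPartition, int numThreads) {
    tableLock.lock(); // taken once for the whole batch, not once per event
    lockAcquisitions++;
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (String p : partitions) {
        // Each partition of the batch is refreshed on a worker thread.
        futures.add(pool.submit(() -> refreshPartition.accept(p)));
      }
      for (Future<?> f : futures) {
        try {
          f.get(); // wait for every refresh and surface any failure
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("interrupted while refreshing", e);
        } catch (ExecutionException e) {
          throw new IllegalStateException("partition refresh failed", e.getCause());
        }
      }
      return partitions.size();
    } finally {
      pool.shutdown();
      tableLock.unlock(); // released only after the whole batch is done
    }
  }

  int lockAcquisitions() {
    return lockAcquisitions;
  }
}
```

Refreshing three partitions this way acquires the lock once instead of three times; for an ACID table, the same structure would wrap a single whole-table refresh per batch.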

A batch of eligible ALTER_PARTITION or INSERT events is
processed as a single ALTER_PARTITIONS or INSERT_PARTITIONS
event, respectively.
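The grouping step can be sketched as follows, assuming a simplified event model: PartitionEventBatcher, Event and EventType are hypothetical names, not Impala's classes. Consecutive events join the current batch only if they have the same batchable type and target the same table, so ALTER_PARTITION and INSERT events never share a batch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names are Impala's real classes.
final class PartitionEventBatcher {

  enum EventType { ALTER_PARTITION, INSERT, OTHER }

  // Minimal stand-in for a metastore notification event.
  static final class Event {
    final long id;
    final EventType type;
    final String dbName;
    final String tableName;

    Event(long id, EventType type, String dbName, String tableName) {
      this.id = id;
      this.type = type;
      this.dbName = dbName;
      this.tableName = tableName;
    }
  }

  private PartitionEventBatcher() {}

  // Splits the event stream into batches of consecutive, compatible events.
  // Each multi-event batch would then be handled as one combined event.
  static List<List<Event>> batch(List<Event> events) {
    List<List<Event>> batches = new ArrayList<>();
    List<Event> current = new ArrayList<>();
    for (Event e : events) {
      if (!current.isEmpty() && !canJoin(current.get(current.size() - 1), e)) {
        batches.add(current); // incompatible event ends the current batch
        current = new ArrayList<>();
      }
      current.add(e);
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }

  // Same batchable type on the same table; OTHER events always stand alone.
  private static boolean canJoin(Event prev, Event next) {
    return prev.type != EventType.OTHER
        && prev.type == next.type
        && prev.dbName.equals(next.dbName)
        && prev.tableName.equals(next.tableName);
  }
}
```

Because only consecutive events are grouped, any unrelated event in the stream closes the current batch.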

Performance tests:
To simulate a large number of ALTER_PARTITION and INSERT
events, a simple test was performed by running the following
query from Hive:
insert into store_sales_copy partition(ss_sold_date_sk)
select * from store_sales;

This query generates 1824 ALTER_PARTITION and 1824 INSERT
events. The time taken to process all the generated events
was measured before and after the patch, for an external
table and for an ACID table.

Table type        Before    After
=================================
External table     75 sec   25 sec
ACID table        313 sec   47 sec
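Expressed as speedups, the measurements above work out as follows (a ratio derived from the table, not a separate measurement):

```latex
S = \frac{T_{\text{before}}}{T_{\text{after}}}, \qquad
S_{\text{external}} = \frac{75\ \text{s}}{25\ \text{s}} = 3.0, \qquad
S_{\text{ACID}} = \frac{313\ \text{s}}{47\ \text{s}} \approx 6.7
```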

Additionally, the patch fixes a minor bug in the
evaluateSelfEvent() method, which should return false when
the serviceId does not match.
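A hedged sketch of the corrected check: SelfEventChecker, isSelfEvent and the in-flight version set are hypothetical names and an assumed model of how self-events are tracked. The only behavior taken from the commit message is that a serviceId mismatch must yield false.

```java
import java.util.Set;

// Hypothetical sketch: names are illustrative, not Impala's API.
final class SelfEventChecker {
  private final String localServiceId;
  // Assumption: versions of operations this catalog issued itself and
  // has not yet seen come back from the metastore event stream.
  private final Set<Long> inFlightVersions;

  SelfEventChecker(String localServiceId, Set<Long> inFlightVersions) {
    this.localServiceId = localServiceId;
    this.inFlightVersions = inFlightVersions;
  }

  boolean isSelfEvent(String serviceIdFromEvent, long versionFromEvent) {
    // The fixed behavior: if the service ID does not match, the event was
    // produced by someone else, so it is never treated as a self-event.
    if (serviceIdFromEvent == null || !localServiceId.equals(serviceIdFromEvent)) {
      return false;
    }
    return inFlightVersions.contains(versionFromEvent);
  }
}
```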

Testing Done:
1. Added new tests which cover the batching logic of events.
2. Exhaustive tests.

Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Reviewed-on: http://gerrit.cloudera.org:8080/17848
Tested-by: Impala Public Jenkins 
Reviewed-by: Sourabh Goyal 
Reviewed-by: Zoltan Borok-Nagy 


> Batch ALTER_PARTITION events
> 
>
> Key: IMPALA-9857
> URL: https://issues.apache.org/jira/browse/IMPALA-9857
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> When Hive inserts data into partitioned tables, it generates a lot of
> ALTER_PARTITION (and possibly INSERT_EVENT) events in quick succession. Currently,
> such events are processed one by one by EventsProcessor, which can be slow
> and can cause EventsProcessor to lag behind. This JIRA proposes batching
> such ALTER_PARTITION events so that all the successive
> ALTER_PARTITION events for the same table are batched together into one
> ALTER_PARTITIONS event and then processed together to refresh all the
> partitions from the events. This can significantly speed up event
> processing in such cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9857) Batch ALTER_PARTITION events

2021-10-04 Thread Vihang Karajgaonkar (Jira)


[ https://issues.apache.org/jira/browse/IMPALA-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424042#comment-17424042 ]

Vihang Karajgaonkar commented on IMPALA-9857:
-

IMPALA-10949 has been created as a follow-up that can improve the batching
logic significantly.
