[ 
https://issues.apache.org/jira/browse/ASTERIXDB-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938806#comment-16938806
 ] 

ASF subversion and git services commented on ASTERIXDB-2541:
------------------------------------------------------------

Commit 7f5b547328fe0a77d7b512f030df36d03187caf9 in asterixdb's branch 
refs/heads/master from luochen
[ https://gitbox.apache.org/repos/asf?p=asterixdb.git;h=7f5b547 ]

[ASTERIXDB-2541][STO] Introduce GreedyScheduler

- user model changes: yes.
  Add new option: storage.io.scheduler (async/greedy)
- storage format changes: no.
- interface changes: yes.
  Introduce IIndexCursorStats

Details:
- Introduce GreedyScheduler that always executes the merge
operation with the smallest number of remaining pages to minimize
the number of disk components
- Introduce IIndexCursorStats to collect the statistics of index scans.
This allows GreedyScheduler to know the remaining pages of merge
operations.
- Extend AbstractIoOperation so that GreedyScheduler can pause/resume
merge operations if needed.

Change-Id: I38fe394d1180d4e3f6796064c0e6c6630b6ad303
Reviewed-on: https://asterix-gerrit.ics.uci.edu/3284
Reviewed-by: Murtadha Hubail <mhub...@apache.org>
Contrib: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
Tested-by: Michael Blow <mb...@apache.org>


> Introduce GreedyScheduler
> -------------------------
>
>                 Key: ASTERIXDB-2541
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2541
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>          Components: STO - Storage
>            Reporter: Chen Luo
>            Assignee: Chen Luo
>            Priority: Major
>
> Our current AsynchronousScheduler tries to schedule all merge operations at 
> the same time without any control. This is not optimal in terms of minimizing 
> the number of disk components, which directly impacts query performance.
> Here we introduce GreedyScheduler to minimize the number of disk components 
> over time. It keeps track of all merge operations of an LSM index, and only 
> activates the merge operation with the smallest number of remaining I/Os. It 
> can be proven that if the number of components is the same for all merge 
> operations, then this GreedyScheduler is strictly optimal. Otherwise, this 
> will still be a good heuristic.
> In order for GreedyScheduler to work, we need the following two changes:
> * Keep track of the number of pages scanned by index cursors so that we 
> know how many pages are left;
> * Introduce a mechanism to activate/deactivate merge operations.
> NOTE: GreedyScheduler should only be used during runtime (with a controlled 
> data arrival process) so that it can reduce the number of disk components on 
> a best-effort basis. It CANNOT be used when benchmarking the system by writing 
> as fast as possible, since large merges will be starved. The measured write 
> throughput will be high but unsustainable.
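
The selection rule described above can be illustrated with a small, self-contained Java sketch. It is not the AsterixDB implementation: the class GreedySchedulerSketch, the MergeOp type, and the methods submit, advance, and reschedule are hypothetical names invented for this illustration. The sketch assumes each merge exposes a total page count and a scanned page count (the kind of information IIndexCursorStats is described as collecting), and it models pause/resume as a plain boolean flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the greedy rule; none of these names come from AsterixDB.
class GreedySchedulerSketch {

    // Stand-in for a merge operation whose progress is measured in pages.
    static final class MergeOp {
        final String name;
        final long totalPages;
        long scannedPages; // assumed to be reported by cursor statistics
        boolean active;    // true = running, false = paused

        MergeOp(String name, long totalPages) {
            this.name = name;
            this.totalPages = totalPages;
        }

        long remainingPages() {
            return totalPages - scannedPages;
        }

        boolean isDone() {
            return remainingPages() <= 0;
        }
    }

    private final List<MergeOp> ops = new ArrayList<>();

    // Register a new merge and re-evaluate which merge should run.
    void submit(MergeOp op) {
        ops.add(op);
        reschedule();
    }

    // Simulate a merge scanning some pages, then re-evaluate.
    void advance(MergeOp op, long pages) {
        op.scannedPages = Math.min(op.totalPages, op.scannedPages + pages);
        reschedule();
    }

    // Pause every merge, then activate only the unfinished merge
    // with the fewest remaining pages. Finished merges are dropped.
    MergeOp reschedule() {
        MergeOp best = null;
        for (MergeOp op : ops) {
            op.active = false;
            if (!op.isDone()
                    && (best == null || op.remainingPages() < best.remainingPages())) {
                best = op;
            }
        }
        ops.removeIf(MergeOp::isDone);
        if (best != null) {
            best.active = true;
        }
        return best;
    }

    public static void main(String[] args) {
        GreedySchedulerSketch scheduler = new GreedySchedulerSketch();
        MergeOp big = new MergeOp("big", 1000);
        MergeOp small = new MergeOp("small", 100);
        scheduler.submit(big);
        scheduler.submit(small);   // the smaller merge preempts the larger one
        System.out.println("active after submit: " + (small.active ? small.name : big.name));
        scheduler.advance(small, 100); // small finishes, so big resumes
        System.out.println("big active after small completes: " + big.active);
    }
}
```

In this model, a steady stream of newly submitted small merges would keep the large merge paused indefinitely, which is the starvation behavior the NOTE warns about under write-as-fast-as-possible workloads.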



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
