[jira] [Commented] (IGNITE-6930) Optionally to do not write free list updates to WAL

2019-10-14 Thread Aleksey Plekhanov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951185#comment-16951185
 ] 

Aleksey Plekhanov commented on IGNITE-6930:
---

[~ivan.glukos], thanks for the review!

Merged to master.

> Optionally to do not write free list updates to WAL
> ---
>
> Key: IGNITE-6930
> URL: https://issues.apache.org/jira/browse/IGNITE-6930
> Project: Ignite
>  Issue Type: Task
>  Components: cache
>Reporter: Vladimir Ozerov
>Assignee: Aleksey Plekhanov
>Priority: Major
>  Labels: IEP-8, performance
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When cache entry is created, we need to write update the free list. When 
> entry is updated, we need to update free list(s) several times. Currently 
> free list is persistent structure, so every update to it must be logged to be 
> able to recover after crash. This may incur significant overhead, especially 
> for small entries.
> E.g. this is how WAL for a single update looks like. "D" - updates with real 
> data, "F" - free-list management:
> {code}
>  1. [D] DataRecord [writeEntries=[UnwrapDataEntry[k = key, v = [ BinaryObject 
> [idHash=2053299190, hash=1986931360, typeId=-1580729813]], super = [DataEntry 
> [cacheId=94416770, op=UPDATE, writeVer=GridCacheVersion [topVer=122147562, 
> order=1510667560607, nodeOrder=1], partId=0, partCnt=4, super=WALRecord 
> [size=0, chainSize=0, pos=null, type=DATA_RECORD]]
>  2. [F] PagesListRemovePageRecord [rmvdPageId=00010005, 
> pageId=00010006, grpId=94416770, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010006, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=PAGES_LIST_REMOVE_PAGE]]]
>  3. [D] DataPageInsertRecord [super=PageDeltaRecord [grpId=94416770, 
> pageId=00010005, super=WALRecord [size=129, chainSize=0, pos=null, 
> type=DATA_PAGE_INSERT_RECORD]]]
>  4. [F] PagesListAddPageRecord [dataPageId=00010005, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010008, 
> super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_ADD_PAGE]]]
>  5. [F] DataPageSetFreeListPageRecord [freeListPage=281474976710664, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010005, 
> super=WALRecord [size=37, chainSize=0, pos=null, 
> type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
>  6. [D] ReplaceRecord [io=DataLeafIO[ver=1], idx=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010004, super=WALRecord [size=47, 
> chainSize=0, pos=null, type=BTREE_PAGE_REPLACE]]]
>  7. [F] DataPageRemoveRecord [itemId=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010005, super=WALRecord [size=30, 
> chainSize=0, pos=null, type=DATA_PAGE_REMOVE_RECORD]]]
>  8. [F] PagesListRemovePageRecord [rmvdPageId=00010005, 
> pageId=00010008, grpId=94416770, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010008, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=PAGES_LIST_REMOVE_PAGE]]]
>  9. [F] DataPageSetFreeListPageRecord [freeListPage=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010005, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
> 10. [F] PagesListAddPageRecord [dataPageId=00010005, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010006, 
> super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_ADD_PAGE]]]
> 11. [F] DataPageSetFreeListPageRecord [freeListPage=281474976710662, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010005, 
> super=WALRecord [size=37, chainSize=0, pos=null, 
> type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
> {code}
> If you sum all space required for operation (size in p.3 is shown incorrectly 
> here), you will see that data update required ~300 bytes, so do free list 
> update! 
> *Proposed solution*
> 1) Optionally do not write free list updates to WAL
> 2) In case of node restart we start with empty free lists, so data inserts 
> will have to allocate new pages
> 3) When old data page is read, add it to the free list
> 4) Start a background thread which will iterate over all old data pages and 
> re-create the free list, so that eventually all data pages are tracked.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-6930) Optionally to do not write free list updates to WAL

2019-10-14 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951178#comment-16951178
 ] 

Ignite TC Bot commented on IGNITE-6930:
---

{panel:title=Branch: [pull/6893/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *-- Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=4696109buildTypeId=IgniteTests24Java8_RunAll]

> Optionally to do not write free list updates to WAL
> ---
>
> Key: IGNITE-6930
> URL: https://issues.apache.org/jira/browse/IGNITE-6930
> Project: Ignite
>  Issue Type: Task
>  Components: cache
>Reporter: Vladimir Ozerov
>Assignee: Aleksey Plekhanov
>Priority: Major
>  Labels: IEP-8, performance
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When cache entry is created, we need to write update the free list. When 
> entry is updated, we need to update free list(s) several times. Currently 
> free list is persistent structure, so every update to it must be logged to be 
> able to recover after crash. This may incur significant overhead, especially 
> for small entries.
> E.g. this is how WAL for a single update looks like. "D" - updates with real 
> data, "F" - free-list management:
> {code}
>  1. [D] DataRecord [writeEntries=[UnwrapDataEntry[k = key, v = [ BinaryObject 
> [idHash=2053299190, hash=1986931360, typeId=-1580729813]], super = [DataEntry 
> [cacheId=94416770, op=UPDATE, writeVer=GridCacheVersion [topVer=122147562, 
> order=1510667560607, nodeOrder=1], partId=0, partCnt=4, super=WALRecord 
> [size=0, chainSize=0, pos=null, type=DATA_RECORD]]
>  2. [F] PagesListRemovePageRecord [rmvdPageId=00010005, 
> pageId=00010006, grpId=94416770, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010006, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=PAGES_LIST_REMOVE_PAGE]]]
>  3. [D] DataPageInsertRecord [super=PageDeltaRecord [grpId=94416770, 
> pageId=00010005, super=WALRecord [size=129, chainSize=0, pos=null, 
> type=DATA_PAGE_INSERT_RECORD]]]
>  4. [F] PagesListAddPageRecord [dataPageId=00010005, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010008, 
> super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_ADD_PAGE]]]
>  5. [F] DataPageSetFreeListPageRecord [freeListPage=281474976710664, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010005, 
> super=WALRecord [size=37, chainSize=0, pos=null, 
> type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
>  6. [D] ReplaceRecord [io=DataLeafIO[ver=1], idx=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010004, super=WALRecord [size=47, 
> chainSize=0, pos=null, type=BTREE_PAGE_REPLACE]]]
>  7. [F] DataPageRemoveRecord [itemId=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010005, super=WALRecord [size=30, 
> chainSize=0, pos=null, type=DATA_PAGE_REMOVE_RECORD]]]
>  8. [F] PagesListRemovePageRecord [rmvdPageId=00010005, 
> pageId=00010008, grpId=94416770, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010008, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=PAGES_LIST_REMOVE_PAGE]]]
>  9. [F] DataPageSetFreeListPageRecord [freeListPage=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010005, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
> 10. [F] PagesListAddPageRecord [dataPageId=00010005, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010006, 
> super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_ADD_PAGE]]]
> 11. [F] DataPageSetFreeListPageRecord [freeListPage=281474976710662, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010005, 
> super=WALRecord [size=37, chainSize=0, pos=null, 
> type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
> {code}
> If you sum all space required for operation (size in p.3 is shown incorrectly 
> here), you will see that data update required ~300 bytes, so do free list 
> update! 
> *Proposed solution*
> 1) Optionally do not write free list updates to WAL
> 2) In case of node restart we start with empty free lists, so data inserts 
> will have to allocate new pages
> 3) When old data page is read, add it to the free list
> 4) Start a background thread which will iterate over all old data pages and 
> re-create the free list, so that eventually all data pages are tracked.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-6930) Optionally to do not write free list updates to WAL

2019-10-14 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951163#comment-16951163
 ] 

Ivan Rakov commented on IGNITE-6930:


[~alex_pl] Thanks, please proceed to merge.

> Optionally to do not write free list updates to WAL
> ---
>
> Key: IGNITE-6930
> URL: https://issues.apache.org/jira/browse/IGNITE-6930
> Project: Ignite
>  Issue Type: Task
>  Components: cache
>Reporter: Vladimir Ozerov
>Assignee: Aleksey Plekhanov
>Priority: Major
>  Labels: IEP-8, performance
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When cache entry is created, we need to write update the free list. When 
> entry is updated, we need to update free list(s) several times. Currently 
> free list is persistent structure, so every update to it must be logged to be 
> able to recover after crash. This may incur significant overhead, especially 
> for small entries.
> E.g. this is how WAL for a single update looks like. "D" - updates with real 
> data, "F" - free-list management:
> {code}
>  1. [D] DataRecord [writeEntries=[UnwrapDataEntry[k = key, v = [ BinaryObject 
> [idHash=2053299190, hash=1986931360, typeId=-1580729813]], super = [DataEntry 
> [cacheId=94416770, op=UPDATE, writeVer=GridCacheVersion [topVer=122147562, 
> order=1510667560607, nodeOrder=1], partId=0, partCnt=4, super=WALRecord 
> [size=0, chainSize=0, pos=null, type=DATA_RECORD]]
>  2. [F] PagesListRemovePageRecord [rmvdPageId=00010005, 
> pageId=00010006, grpId=94416770, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010006, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=PAGES_LIST_REMOVE_PAGE]]]
>  3. [D] DataPageInsertRecord [super=PageDeltaRecord [grpId=94416770, 
> pageId=00010005, super=WALRecord [size=129, chainSize=0, pos=null, 
> type=DATA_PAGE_INSERT_RECORD]]]
>  4. [F] PagesListAddPageRecord [dataPageId=00010005, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010008, 
> super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_ADD_PAGE]]]
>  5. [F] DataPageSetFreeListPageRecord [freeListPage=281474976710664, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010005, 
> super=WALRecord [size=37, chainSize=0, pos=null, 
> type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
>  6. [D] ReplaceRecord [io=DataLeafIO[ver=1], idx=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010004, super=WALRecord [size=47, 
> chainSize=0, pos=null, type=BTREE_PAGE_REPLACE]]]
>  7. [F] DataPageRemoveRecord [itemId=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010005, super=WALRecord [size=30, 
> chainSize=0, pos=null, type=DATA_PAGE_REMOVE_RECORD]]]
>  8. [F] PagesListRemovePageRecord [rmvdPageId=00010005, 
> pageId=00010008, grpId=94416770, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010008, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=PAGES_LIST_REMOVE_PAGE]]]
>  9. [F] DataPageSetFreeListPageRecord [freeListPage=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010005, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
> 10. [F] PagesListAddPageRecord [dataPageId=00010005, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010006, 
> super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_ADD_PAGE]]]
> 11. [F] DataPageSetFreeListPageRecord [freeListPage=281474976710662, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010005, 
> super=WALRecord [size=37, chainSize=0, pos=null, 
> type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
> {code}
> If you sum all space required for operation (size in p.3 is shown incorrectly 
> here), you will see that data update required ~300 bytes, so do free list 
> update! 
> *Proposed solution*
> 1) Optionally do not write free list updates to WAL
> 2) In case of node restart we start with empty free lists, so data inserts 
> will have to allocate new pages
> 3) When old data page is read, add it to the free list
> 4) Start a background thread which will iterate over all old data pages and 
> re-create the free list, so that eventually all data pages are tracked.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-6930) Optionally to do not write free list updates to WAL

2019-10-04 Thread Aleksey Plekhanov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944281#comment-16944281
 ] 

Aleksey Plekhanov commented on IGNITE-6930:
---

# The test assumes that PDS size didn't change between the first checkpoint and 
after several checkpoints. It's not true anymore with caching since the only 
final free-list state is persisted on checkpoint, some changed, but currently 
empty buckets are not persisted. So with caching PDS size in this test after 
the first checkpoint about 0.5 size of the original test, and after several 
checkpoints about 0.75 size of the original test.
 # This test checks that free-list works and pages cache flush correctly under 
the concurrent load. It helps me to catch a couple of concurrent bugs (these 
bugs have also reproduced by yardstick benchmark, but haven't reproduced by 
other tests on TC). I will add a comment about this.
 # I think they are too low level for some configuration files, but can be 
configured by system properties. I will change it.
 # I think 64 and 4 it's reasonable values. I've benchmarked with higher 
values, but it almost gives no performance boost. 8 (2 per bucket)- it's too 
small. There will be big overhead for service objects (at least 16 bytes per 
object, at least 3 objects: lock, GridLongList and arr inside GridLongList), so 
we will have 48 bytes for service objects per bucket and only 16 bytes (2 
longs) of useful data. 64/4 is a more reliable configuration since we allocate 
more heap space (16*8=128 bytes) for useful data than for service objects. 
Also, I think choosing MAX_SIZE dynamically, it's not such a good idea, since, 
there can be more than one node inside one JVM and we don't know when and how 
many nodes will be started when we start first one.
 # Ok, I will implement counter of empty flushed buckets.

> Optionally to do not write free list updates to WAL
> ---
>
> Key: IGNITE-6930
> URL: https://issues.apache.org/jira/browse/IGNITE-6930
> Project: Ignite
>  Issue Type: Task
>  Components: cache
>Reporter: Vladimir Ozerov
>Assignee: Aleksey Plekhanov
>Priority: Major
>  Labels: IEP-8, performance
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When cache entry is created, we need to write update the free list. When 
> entry is updated, we need to update free list(s) several times. Currently 
> free list is persistent structure, so every update to it must be logged to be 
> able to recover after crash. This may incur significant overhead, especially 
> for small entries.
> E.g. this is how WAL for a single update looks like. "D" - updates with real 
> data, "F" - free-list management:
> {code}
>  1. [D] DataRecord [writeEntries=[UnwrapDataEntry[k = key, v = [ BinaryObject 
> [idHash=2053299190, hash=1986931360, typeId=-1580729813]], super = [DataEntry 
> [cacheId=94416770, op=UPDATE, writeVer=GridCacheVersion [topVer=122147562, 
> order=1510667560607, nodeOrder=1], partId=0, partCnt=4, super=WALRecord 
> [size=0, chainSize=0, pos=null, type=DATA_RECORD]]
>  2. [F] PagesListRemovePageRecord [rmvdPageId=00010005, 
> pageId=00010006, grpId=94416770, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010006, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=PAGES_LIST_REMOVE_PAGE]]]
>  3. [D] DataPageInsertRecord [super=PageDeltaRecord [grpId=94416770, 
> pageId=00010005, super=WALRecord [size=129, chainSize=0, pos=null, 
> type=DATA_PAGE_INSERT_RECORD]]]
>  4. [F] PagesListAddPageRecord [dataPageId=00010005, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010008, 
> super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_ADD_PAGE]]]
>  5. [F] DataPageSetFreeListPageRecord [freeListPage=281474976710664, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010005, 
> super=WALRecord [size=37, chainSize=0, pos=null, 
> type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
>  6. [D] ReplaceRecord [io=DataLeafIO[ver=1], idx=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010004, super=WALRecord [size=47, 
> chainSize=0, pos=null, type=BTREE_PAGE_REPLACE]]]
>  7. [F] DataPageRemoveRecord [itemId=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010005, super=WALRecord [size=30, 
> chainSize=0, pos=null, type=DATA_PAGE_REMOVE_RECORD]]]
>  8. [F] PagesListRemovePageRecord [rmvdPageId=00010005, 
> pageId=00010008, grpId=94416770, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010008, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=PAGES_LIST_REMOVE_PAGE]]]
>  9. [F] DataPageSetFreeListPageRecord [freeListPage=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010005, super=WALRecord [size=37, 
> chainSize=0, 

[jira] [Commented] (IGNITE-6930) Optionally to do not write free list updates to WAL

2019-10-04 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944278#comment-16944278
 ] 

Ignite TC Bot commented on IGNITE-6930:
---

{panel:title=Branch: [pull/6893/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *-- Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=4654547buildTypeId=IgniteTests24Java8_RunAll]

> Optionally to do not write free list updates to WAL
> ---
>
> Key: IGNITE-6930
> URL: https://issues.apache.org/jira/browse/IGNITE-6930
> Project: Ignite
>  Issue Type: Task
>  Components: cache
>Reporter: Vladimir Ozerov
>Assignee: Aleksey Plekhanov
>Priority: Major
>  Labels: IEP-8, performance
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When cache entry is created, we need to write update the free list. When 
> entry is updated, we need to update free list(s) several times. Currently 
> free list is persistent structure, so every update to it must be logged to be 
> able to recover after crash. This may incur significant overhead, especially 
> for small entries.
> E.g. this is how WAL for a single update looks like. "D" - updates with real 
> data, "F" - free-list management:
> {code}
>  1. [D] DataRecord [writeEntries=[UnwrapDataEntry[k = key, v = [ BinaryObject 
> [idHash=2053299190, hash=1986931360, typeId=-1580729813]], super = [DataEntry 
> [cacheId=94416770, op=UPDATE, writeVer=GridCacheVersion [topVer=122147562, 
> order=1510667560607, nodeOrder=1], partId=0, partCnt=4, super=WALRecord 
> [size=0, chainSize=0, pos=null, type=DATA_RECORD]]
>  2. [F] PagesListRemovePageRecord [rmvdPageId=00010005, 
> pageId=00010006, grpId=94416770, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010006, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=PAGES_LIST_REMOVE_PAGE]]]
>  3. [D] DataPageInsertRecord [super=PageDeltaRecord [grpId=94416770, 
> pageId=00010005, super=WALRecord [size=129, chainSize=0, pos=null, 
> type=DATA_PAGE_INSERT_RECORD]]]
>  4. [F] PagesListAddPageRecord [dataPageId=00010005, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010008, 
> super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_ADD_PAGE]]]
>  5. [F] DataPageSetFreeListPageRecord [freeListPage=281474976710664, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010005, 
> super=WALRecord [size=37, chainSize=0, pos=null, 
> type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
>  6. [D] ReplaceRecord [io=DataLeafIO[ver=1], idx=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010004, super=WALRecord [size=47, 
> chainSize=0, pos=null, type=BTREE_PAGE_REPLACE]]]
>  7. [F] DataPageRemoveRecord [itemId=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010005, super=WALRecord [size=30, 
> chainSize=0, pos=null, type=DATA_PAGE_REMOVE_RECORD]]]
>  8. [F] PagesListRemovePageRecord [rmvdPageId=00010005, 
> pageId=00010008, grpId=94416770, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010008, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=PAGES_LIST_REMOVE_PAGE]]]
>  9. [F] DataPageSetFreeListPageRecord [freeListPage=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010005, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
> 10. [F] PagesListAddPageRecord [dataPageId=00010005, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010006, 
> super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_ADD_PAGE]]]
> 11. [F] DataPageSetFreeListPageRecord [freeListPage=281474976710662, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010005, 
> super=WALRecord [size=37, chainSize=0, pos=null, 
> type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
> {code}
> If you sum all space required for operation (size in p.3 is shown incorrectly 
> here), you will see that data update required ~300 bytes, so do free list 
> update! 
> *Proposed solution*
> 1) Optionally do not write free list updates to WAL
> 2) In case of node restart we start with empty free lists, so data inserts 
> will have to allocate new pages
> 3) When old data page is read, add it to the free list
> 4) Start a background thread which will iterate over all old data pages and 
> re-create the free list, so that eventually all data pages are tracked.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-6930) Optionally to do not write free list updates to WAL

2019-10-03 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943808#comment-16943808
 ] 

Ivan Rakov commented on IGNITE-6930:


[~DmitriyGovorukhin] Do you want to take a look as well?

> Optionally to do not write free list updates to WAL
> ---
>
> Key: IGNITE-6930
> URL: https://issues.apache.org/jira/browse/IGNITE-6930
> Project: Ignite
>  Issue Type: Task
>  Components: cache
>Reporter: Vladimir Ozerov
>Assignee: Aleksey Plekhanov
>Priority: Major
>  Labels: IEP-8, performance
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When cache entry is created, we need to write update the free list. When 
> entry is updated, we need to update free list(s) several times. Currently 
> free list is persistent structure, so every update to it must be logged to be 
> able to recover after crash. This may incur significant overhead, especially 
> for small entries.
> E.g. this is how WAL for a single update looks like. "D" - updates with real 
> data, "F" - free-list management:
> {code}
>  1. [D] DataRecord [writeEntries=[UnwrapDataEntry[k = key, v = [ BinaryObject 
> [idHash=2053299190, hash=1986931360, typeId=-1580729813]], super = [DataEntry 
> [cacheId=94416770, op=UPDATE, writeVer=GridCacheVersion [topVer=122147562, 
> order=1510667560607, nodeOrder=1], partId=0, partCnt=4, super=WALRecord 
> [size=0, chainSize=0, pos=null, type=DATA_RECORD]]
>  2. [F] PagesListRemovePageRecord [rmvdPageId=00010005, 
> pageId=00010006, grpId=94416770, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010006, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=PAGES_LIST_REMOVE_PAGE]]]
>  3. [D] DataPageInsertRecord [super=PageDeltaRecord [grpId=94416770, 
> pageId=00010005, super=WALRecord [size=129, chainSize=0, pos=null, 
> type=DATA_PAGE_INSERT_RECORD]]]
>  4. [F] PagesListAddPageRecord [dataPageId=00010005, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010008, 
> super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_ADD_PAGE]]]
>  5. [F] DataPageSetFreeListPageRecord [freeListPage=281474976710664, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010005, 
> super=WALRecord [size=37, chainSize=0, pos=null, 
> type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
>  6. [D] ReplaceRecord [io=DataLeafIO[ver=1], idx=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010004, super=WALRecord [size=47, 
> chainSize=0, pos=null, type=BTREE_PAGE_REPLACE]]]
>  7. [F] DataPageRemoveRecord [itemId=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010005, super=WALRecord [size=30, 
> chainSize=0, pos=null, type=DATA_PAGE_REMOVE_RECORD]]]
>  8. [F] PagesListRemovePageRecord [rmvdPageId=00010005, 
> pageId=00010008, grpId=94416770, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010008, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=PAGES_LIST_REMOVE_PAGE]]]
>  9. [F] DataPageSetFreeListPageRecord [freeListPage=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010005, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
> 10. [F] PagesListAddPageRecord [dataPageId=00010005, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010006, 
> super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_ADD_PAGE]]]
> 11. [F] DataPageSetFreeListPageRecord [freeListPage=281474976710662, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010005, 
> super=WALRecord [size=37, chainSize=0, pos=null, 
> type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
> {code}
> If you sum all space required for operation (size in p.3 is shown incorrectly 
> here), you will see that data update required ~300 bytes, so do free list 
> update! 
> *Proposed solution*
> 1) Optionally do not write free list updates to WAL
> 2) In case of node restart we start with empty free lists, so data inserts 
> will have to allocate new pages
> 3) When old data page is read, add it to the free list
> 4) Start a background thread which will iterate over all old data pages and 
> re-create the free list, so that eventually all data pages are tracked.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-6930) Optionally to do not write free list updates to WAL

2019-10-03 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943806#comment-16943806
 ] 

Ivan Rakov commented on IGNITE-6930:


[~alex_pl], I've take a look. Some comments:
1) testRestoreFreeListCorrectlyAfterRandomStop - why do we need to disable 
caching here?
2) testFreeListUnderLoadMultipleCheckpoints - what is being tested? I think, we 
need to add comment that test is intended to cover weakened pageId != 0 
assertion.
3) MAX_SIZE, STRIPES_COUNT - don't you think that we should make these options 
configurable?
4) How did you choose 64 and 4 as defaults? Can you share some benchmarks? I 
think that 64 might be on overkill: in data load scenario, data pages traverse 
from biggest to lowest buckets by turn. I don't think that pages are likely to 
heavily accumulate in a certain bucket; maybe 8 as MAX_SIZE would show the same 
performance boost.
5) PagesList.PagesCache#flush: do we need to garbage-collect all allocated long 
lists when we flush page cache? We can just clear() them and reuse again after 
the checkpoint. It should reduce GC pressure.

> Optionally to do not write free list updates to WAL
> ---
>
> Key: IGNITE-6930
> URL: https://issues.apache.org/jira/browse/IGNITE-6930
> Project: Ignite
>  Issue Type: Task
>  Components: cache
>Reporter: Vladimir Ozerov
>Assignee: Aleksey Plekhanov
>Priority: Major
>  Labels: IEP-8, performance
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When cache entry is created, we need to write update the free list. When 
> entry is updated, we need to update free list(s) several times. Currently 
> free list is persistent structure, so every update to it must be logged to be 
> able to recover after crash. This may incur significant overhead, especially 
> for small entries.
> E.g. this is how WAL for a single update looks like. "D" - updates with real 
> data, "F" - free-list management:
> {code}
>  1. [D] DataRecord [writeEntries=[UnwrapDataEntry[k = key, v = [ BinaryObject 
> [idHash=2053299190, hash=1986931360, typeId=-1580729813]], super = [DataEntry 
> [cacheId=94416770, op=UPDATE, writeVer=GridCacheVersion [topVer=122147562, 
> order=1510667560607, nodeOrder=1], partId=0, partCnt=4, super=WALRecord 
> [size=0, chainSize=0, pos=null, type=DATA_RECORD]]
>  2. [F] PagesListRemovePageRecord [rmvdPageId=00010005, 
> pageId=00010006, grpId=94416770, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010006, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=PAGES_LIST_REMOVE_PAGE]]]
>  3. [D] DataPageInsertRecord [super=PageDeltaRecord [grpId=94416770, 
> pageId=00010005, super=WALRecord [size=129, chainSize=0, pos=null, 
> type=DATA_PAGE_INSERT_RECORD]]]
>  4. [F] PagesListAddPageRecord [dataPageId=00010005, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010008, 
> super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_ADD_PAGE]]]
>  5. [F] DataPageSetFreeListPageRecord [freeListPage=281474976710664, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010005, 
> super=WALRecord [size=37, chainSize=0, pos=null, 
> type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
>  6. [D] ReplaceRecord [io=DataLeafIO[ver=1], idx=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010004, super=WALRecord [size=47, 
> chainSize=0, pos=null, type=BTREE_PAGE_REPLACE]]]
>  7. [F] DataPageRemoveRecord [itemId=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010005, super=WALRecord [size=30, 
> chainSize=0, pos=null, type=DATA_PAGE_REMOVE_RECORD]]]
>  8. [F] PagesListRemovePageRecord [rmvdPageId=00010005, 
> pageId=00010008, grpId=94416770, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010008, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=PAGES_LIST_REMOVE_PAGE]]]
>  9. [F] DataPageSetFreeListPageRecord [freeListPage=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010005, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
> 10. [F] PagesListAddPageRecord [dataPageId=00010005, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010006, 
> super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_ADD_PAGE]]]
> 11. [F] DataPageSetFreeListPageRecord [freeListPage=281474976710662, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010005, 
> super=WALRecord [size=37, chainSize=0, pos=null, 
> type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
> {code}
> If you sum all space required for operation (size in p.3 is shown incorrectly 
> here), you will see that data update required ~300 bytes, so do free list 
> update! 
> *Proposed solution*
> 1) Optionally do not write free list updates to WAL
> 2) In 

[jira] [Commented] (IGNITE-6930) Optionally to do not write free list updates to WAL

2019-10-03 Thread Aleksey Plekhanov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943607#comment-16943607
 ] 

Aleksey Plekhanov commented on IGNITE-6930:
---

The patch is ready. To minimize WAL record I've used next approach:

There is a small on-heap pages list cache allocated for each bucket. There are 
three types of operations with free-lists: put the page to the tail of the 
bucket (after insert and remove row), take a page from the tail of the bucket 
(before insert row), remove the page from the bucket (before remove row), each 
of these operations first look into the pages cache, then work with page memory.

There is no WAL record needed if the page uses only buckets pages cache. So, 
it's possible then the page was put into free-list, moved through the bucket, 
leave the free list and hasn't produced any free-list WAL record at all.

On-heap pages cache is flushed to page memory before each checkpoint to ensure 
the same recovery guarantees as now (physical WAL records are restored from WAL 
only to the moment of the last unsuccessful checkpoint if it was started, so we 
need only final buckets state at the moment of checkpoint). 

[~ivan.glukos], could you please have a look?

 

> Optionally to do not write free list updates to WAL
> ---
>
> Key: IGNITE-6930
> URL: https://issues.apache.org/jira/browse/IGNITE-6930
> Project: Ignite
>  Issue Type: Task
>  Components: cache
>Reporter: Vladimir Ozerov
>Assignee: Aleksey Plekhanov
>Priority: Major
>  Labels: IEP-8, performance
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When cache entry is created, we need to write update the free list. When 
> entry is updated, we need to update free list(s) several times. Currently 
> free list is persistent structure, so every update to it must be logged to be 
> able to recover after crash. This may incur significant overhead, especially 
> for small entries.
> E.g. this is how WAL for a single update looks like. "D" - updates with real 
> data, "F" - free-list management:
> {code}
>  1. [D] DataRecord [writeEntries=[UnwrapDataEntry[k = key, v = [ BinaryObject 
> [idHash=2053299190, hash=1986931360, typeId=-1580729813]], super = [DataEntry 
> [cacheId=94416770, op=UPDATE, writeVer=GridCacheVersion [topVer=122147562, 
> order=1510667560607, nodeOrder=1], partId=0, partCnt=4, super=WALRecord 
> [size=0, chainSize=0, pos=null, type=DATA_RECORD]]
>  2. [F] PagesListRemovePageRecord [rmvdPageId=00010005, 
> pageId=00010006, grpId=94416770, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010006, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=PAGES_LIST_REMOVE_PAGE]]]
>  3. [D] DataPageInsertRecord [super=PageDeltaRecord [grpId=94416770, 
> pageId=00010005, super=WALRecord [size=129, chainSize=0, pos=null, 
> type=DATA_PAGE_INSERT_RECORD]]]
>  4. [F] PagesListAddPageRecord [dataPageId=00010005, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010008, 
> super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_ADD_PAGE]]]
>  5. [F] DataPageSetFreeListPageRecord [freeListPage=281474976710664, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010005, 
> super=WALRecord [size=37, chainSize=0, pos=null, 
> type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
>  6. [D] ReplaceRecord [io=DataLeafIO[ver=1], idx=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010004, super=WALRecord [size=47, 
> chainSize=0, pos=null, type=BTREE_PAGE_REPLACE]]]
>  7. [F] DataPageRemoveRecord [itemId=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010005, super=WALRecord [size=30, 
> chainSize=0, pos=null, type=DATA_PAGE_REMOVE_RECORD]]]
>  8. [F] PagesListRemovePageRecord [rmvdPageId=00010005, 
> pageId=00010008, grpId=94416770, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010008, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=PAGES_LIST_REMOVE_PAGE]]]
>  9. [F] DataPageSetFreeListPageRecord [freeListPage=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=00010005, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
> 10. [F] PagesListAddPageRecord [dataPageId=00010005, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010006, 
> super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_ADD_PAGE]]]
> 11. [F] DataPageSetFreeListPageRecord [freeListPage=281474976710662, 
> super=PageDeltaRecord [grpId=94416770, pageId=00010005, 
> super=WALRecord [size=37, chainSize=0, pos=null, 
> type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
> {code}
> If you sum all space required for operation (size in p.3 is shown incorrectly 
> here), you will see that data update required ~300 bytes, so do free list