[jira] [Comment Edited] (IGNITE-6930) Optionally to do not write free list updates to WAL
[ https://issues.apache.org/jira/browse/IGNITE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944281#comment-16944281 ] Aleksey Plekhanov edited comment on IGNITE-6930 at 10/4/19 7:09 AM:

[~ivan.glukos],

1) The test assumes that the PDS size doesn't change between the first checkpoint and after several checkpoints. That's no longer true with caching, since only the final free-list state is persisted on checkpoint; buckets that have changed but are currently empty are not persisted. So with caching, the PDS size in this test is about 0.5 of the original test's size after the first checkpoint, and about 0.75 after several checkpoints.

2) This test checks that the free list works and the pages cache flushes correctly under concurrent load. It helped me catch a couple of concurrency bugs (these bugs were also reproduced by the yardstick benchmark, but not by any other tests on TC). I will add a comment about this.

3) I think they are too low-level for the configuration files, but they can be made configurable via system properties. I will change it.

4) I think 64 and 4 are reasonable values. I've benchmarked with higher values, but they give almost no performance boost. 8 (2 per bucket) is too small: there would be a big overhead for service objects (at least 16 bytes per object, and at least 3 objects: the lock, the GridLongList, and the array inside the GridLongList), so we would have 48 bytes of service objects per bucket and only 16 bytes (2 longs) of useful data. 64/4 is a more reasonable configuration, since we allocate more heap space for useful data (16*8 = 128 bytes) than for service objects. Also, I don't think choosing MAX_SIZE dynamically is a good idea: there can be more than one node inside one JVM, and we don't know when, and how many, nodes will be started after we start the first one.

5) Ok, I will implement a counter of empty flushed buckets.
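The heap accounting behind point 4 can be sketched as a back-of-the-envelope calculation using the object-size estimates from the comment (`BucketOverhead`, `serviceBytes`, and `usefulBytes` are hypothetical names for illustration, not Ignite code):

```java
/** Back-of-the-envelope heap accounting for one page-cache bucket stripe,
 *  using the estimates from the comment: ~16 bytes per service object,
 *  and 3 service objects per stripe (lock, GridLongList, its backing array). */
public class BucketOverhead {
    static final int OBJ_OVERHEAD = 16;     // assumed minimum bytes per service object
    static final int SERVICE_OBJECTS = 3;   // lock + GridLongList + backing array

    /** Bytes spent on service objects per stripe. */
    static int serviceBytes() {
        return SERVICE_OBJECTS * OBJ_OVERHEAD; // 48 bytes
    }

    /** Bytes of useful data (cached page ids, stored as longs) per stripe. */
    static int usefulBytes(int maxSize, int stripes) {
        return (maxSize / stripes) * Long.BYTES;
    }

    public static void main(String[] args) {
        // MAX_SIZE=8 with 4 stripes: only 2 longs of useful data vs 48 service bytes.
        System.out.println("MAX_SIZE=8,  4 stripes: useful=" + usefulBytes(8, 4)
            + "B vs service=" + serviceBytes() + "B");   // useful=16B vs service=48B
        // MAX_SIZE=64 with 4 stripes: useful data finally outweighs the service objects.
        System.out.println("MAX_SIZE=64, 4 stripes: useful=" + usefulBytes(64, 4)
            + "B vs service=" + serviceBytes() + "B");   // useful=128B vs service=48B
    }
}
```

With MAX_SIZE=8 the fixed per-stripe object overhead dominates; 64/4 is the first configuration in this sketch where useful data clearly outweighs it.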
> Optionally to do not write free list updates to WAL
> ---
>
> Key: IGNITE-6930
> URL: https://issues.apache.org/jira/browse/IGNITE-6930
> Project: Ignite
> Issue Type: Task
> Components: cache
> Reporter: Vladimir Ozerov
> Assignee: Aleksey Plekhanov
> Priority: Major
> Labels: IEP-8, performance
> Fix For: 2.8
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When a cache entry is created, we need to update the free list. When an entry is updated, we need to update free list(s) several times. Currently the free list is a persistent structure, so every update to it must be logged to be able to recover after a crash. This may incur significant overhead, especially for small entries.
> E.g. this is how the WAL for a single update looks. "D" - updates with real data, "F" - free-list management:
> {code}
> 1. [D] DataRecord [writeEntries=[UnwrapDataEntry[k = key, v = [ BinaryObject [idHash=2053299190, hash=1986931360, typeId=-1580729813]], super = [DataEntry [cacheId=94416770, op=UPDATE, writeVer=GridCacheVersion [topVer=122147562, order=1510667560607, nodeOrder=1], partId=0, partCnt=4, super=WALRecord [size=0, chainSize=0, pos=null, type=DATA_RECORD]]
> 2. [F] PagesListRemovePageRecord [rmvdPageId=00010005, pageId=00010006, grpId=94416770, super=PageDeltaRecord [grpId=94416770, pageId=00010006, super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_REMOVE_PAGE]]]
> 3. [D] DataPageInsertRecord [super=PageDeltaRecord [grpId=94416770, pageId=00010005, super=WALRecord [size=129, chainSize=0, pos=null, type=DATA_PAGE_INSERT_RECORD]]]
> 4. [F] PagesListAddPageRecord [dataPageId=00010005, super=PageDeltaRecord [grpId=94416770, pageId=00010008, super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_ADD_PAGE]]]
> 5. [F] DataPageSetFreeListPageRecord [freeListPage=281474976710664, super=PageDeltaRecord [grpId=94416770, pageId=00010005, super=WALRecord [size=37, chainSize=0, pos=null, type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
> 6. [D] ReplaceRecord [io=DataLeafIO[ver=1], idx=0, super=PageDeltaRecord [grpId=94416770, pageId=00010004, super=WALRecord [size=47, chainSize=0, pos=null, type=BTREE_PAGE_REPLACE]]]
> 7. [F] DataPageRemoveRecord [itemId=0, super=PageDeltaRecord [grpId=94416770, pageId=00010005, super=WALRecord [size=30, chainSize=0, pos=null, type=DATA_PAGE_REMOVE_RECORD]]]
> 8. [F] PagesListRemovePageRecord [rmvdPageId=00010005, pageId=00010008, grpId=94416770, super=PageDeltaRecord [grpId=94416770,
> {code}
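The overhead the description points at can be quantified from the size= fields in the dump above (record 8's size is truncated and omitted). A toy tally by record kind (`WalTally` is illustrative, not Ignite code) shows the free-list ("F") records account for close to half of the logged bytes of this single update:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy tally of WAL bytes by record kind, using the record kinds and
 *  size= values copied from the dump in the issue description. */
public class WalTally {
    public static void main(String[] args) {
        String[] dump = {
            "[D] DataRecord size=0",  // payload is logged via chained delta records
            "[F] PagesListRemovePageRecord size=37",
            "[D] DataPageInsertRecord size=129",
            "[F] PagesListAddPageRecord size=37",
            "[F] DataPageSetFreeListPageRecord size=37",
            "[D] ReplaceRecord size=47",
            "[F] DataPageRemoveRecord size=30",
        };

        // Extract the [D]/[F] tag and the size= value from each line.
        Pattern p = Pattern.compile("\\[([DF])\\].*size=(\\d+)");
        int data = 0, freeList = 0;

        for (String rec : dump) {
            Matcher m = p.matcher(rec);
            if (m.find()) {
                int size = Integer.parseInt(m.group(2));
                if ("D".equals(m.group(1)))
                    data += size;
                else
                    freeList += size;
            }
        }

        System.out.println("data=" + data + "B, free-list=" + freeList + "B");
        // prints: data=176B, free-list=141B
    }
}
```

So even without record 8, free-list management contributes 141 of the 317 accounted bytes (~45%) for one small update, which is the overhead this issue proposes to eliminate by not WAL-logging free-list updates.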
[jira] [Comment Edited] (IGNITE-6930) Optionally to do not write free list updates to WAL
[ https://issues.apache.org/jira/browse/IGNITE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943806#comment-16943806 ] Ivan Rakov edited comment on IGNITE-6930 at 10/3/19 6:23 PM:

[~alex_pl], I've taken a look. Some comments:

1) testRestoreFreeListCorrectlyAfterRandomStop - why do we need to disable caching here?

2) testFreeListUnderLoadMultipleCheckpoints - what is being tested? I think we need to add a comment that the test is intended to cover the weakened pageId != 0 assertion.

3) MAX_SIZE, STRIPES_COUNT - don't you think we should make these options configurable?

4) How did you choose 64 and 4 as defaults? Can you share some benchmarks? I think 64 might be overkill: in a data-load scenario, data pages traverse from the highest to the lowest buckets in turn. I don't think pages are likely to accumulate heavily in a certain bucket; maybe 8 as MAX_SIZE would show the same performance boost. Another option is choosing MAX_SIZE dynamically based on the process -Xmx and the local (caches * partitions) count.

5) PagesList.PagesCache#flush: do we need to garbage-collect all allocated long lists when we flush the page cache? We can just clear() them and reuse them again after the checkpoint. It should reduce GC pressure. Another option: clear() the long lists, but remember the number of flush calls. If an absolutely empty cache bucket has been flushed a certain number of times (e.g. 10) in a row, then its long lists can finally be released and collected as garbage.
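The reuse-then-release pattern of point 5 (which Aleksey's "counter of empty flushed buckets" implements) could look roughly like this - a sketch with hypothetical names and a plain java.util list standing in for GridLongList, not the actual Ignite implementation:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch of a page-cache bucket stripe that reuses its backing list across
 *  checkpoints and only releases it after several consecutive empty flushes. */
public class CachedBucket {
    /** Threshold suggested in the comment (e.g. 10 empty flushes in a row). */
    private static final int EMPTY_FLUSHES_TO_RELEASE = 10;

    private List<Long> pageIds = new ArrayList<>();
    private int emptyFlushCnt;

    /** Caches a page id, re-allocating the list lazily after a release. */
    synchronized void add(long pageId) {
        if (pageIds == null)
            pageIds = new ArrayList<>();

        pageIds.add(pageId);
    }

    /** Called on checkpoint: returns the cached page ids and decides whether
     *  to keep the backing list for reuse or let it be garbage-collected. */
    synchronized List<Long> flush() {
        if (pageIds == null)
            return Collections.emptyList();

        List<Long> flushed = new ArrayList<>(pageIds);

        if (flushed.isEmpty()) {
            // Empty again: after enough consecutive empty flushes, release the list.
            if (++emptyFlushCnt >= EMPTY_FLUSHES_TO_RELEASE)
                pageIds = null;
        }
        else {
            emptyFlushCnt = 0;
            pageIds.clear(); // Keep the backing list for reuse -> less GC pressure.
        }

        return flushed;
    }
}
```

The point of the counter is to keep the steady-state path allocation-free (clear() and reuse) while still letting buckets that have gone permanently idle give their memory back.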