[GitHub] reddycharan commented on issue #570: Multiple active entrylogs
reddycharan commented on issue #570: Multiple active entrylogs URL: https://github.com/apache/bookkeeper/issues/570#issuecomment-369856263 As far as I understood, EntryLogger is the abstraction layer for all of the operations on entrylog files. Be it how entries are written to entrylog file (how active/rotated entrylog files are organized), read from entrylog files (how read files are organized), flush/checkpoint path and getting info about the state of entrylog files for garbage collection/compaction. Also, I tried to explain in detail why it is the right thing to do abstraction in EntryLogger in this comment https://github.com/apache/bookkeeper/issues/570#issuecomment-368597767 Entrylogger is the right place to deal with multiple entrylogs, it is more organic, less churn of the code, since ideally other components of the bookie doesn't need to be modified for anything to do with entry log files. But yes, regarding sub-task4, it is valid point to raise the need of SortedLedgerStorage if we are going to have entrylog per ledger. I need to have some perf numbers to validate the write/read scenarios in the case of entrylog per ledger using InterleavedLedgerStorage vs SortedLedgerStorage. **sub-task4** - introduce parallel (have just one Runnable per ledger) EntryMemTable flusher for SortedLedgerStorage which can be used in the case of entrylogperledger > @reddycharan is there any way you can sub-divide your subtask-5? Yes, I think it should be possible. Probably in the first task I could introduce EntryLogManager interface and EntryLogManagerForSingleEntryLog and in the final task I can introduce EntryLogManagerForEntryLogPerLedger. So **sub-task1** - Removal of unnecessary synchronization in InterleavedLedgerStorage methods. As described in http://mail-archives.apache.org/mod_mbox/bookkeeper-dev/201707.mbox/%3CCAO2yDyZ946fp2S_qR2iL178hPiXgrnFGb%3DpvkyK4ReSYAtNLBw%40mail.gmail.com%3E **sub-task2** - move the complete logic of flushIntervalInBytes from EntryLogger to BufferedChannel **sub-task3** - make changes to SyncThread/checkpoint logic, so that incase of entrylogperledger, checkpoint happens for every flushInterval but not when active entrylog is created/rolled over. **sub-task4** - introduce parallel (have just one Runnable per ledger) EntryMemTable flusher for SortedLedgerStorage which can be used in the case of entrylogperledger. (evaluate the need of SortedLedgerStorage by doing perf comparisons) **sub-task5** - introduce EntryLogManager interface and EntryLogManagerForSingleEntryLog **sub-task6** - introduce EntryLogManagerForEntryLogPerLedger Thanks guys for providing the feedback. Will proceed with the plan and start creating new pull requests for the sub-tasks. But I'll leave the existing pullrequest (https://github.com/apache/bookkeeper/pull/1201), since it might be helpful to refer to get the full picture. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] reddycharan commented on issue #570: Multiple active entrylogs
reddycharan commented on issue #570: Multiple active entrylogs URL: https://github.com/apache/bookkeeper/issues/570#issuecomment-369856263 As far as I understood, EntryLogger is the abstraction layer for all of the operations on entrylog files. Be it how entries are written to entrylog file (how active/rotated entrylog files are organized), read from entrylog files (how read files are organized), flush/checkpoint path and getting info about the state of entrylog files for garbage collection/compaction. Also, I tried to explain in detail why it is the right thing to do abstraction in EntryLogger in this comment https://github.com/apache/bookkeeper/issues/570#issuecomment-368597767 Entrylogger is the right place to deal with multiple entrylogs, it is more organic, less churn of the code, since ideally other components of the bookie doesn't need to be modified much. But yes, regarding sub-task4, it is valid point to raise the need of SortedLedgerStorage if we are going to have entrylog per ledger. I need to have some perf numbers to validate the write/read scenarios in the case of entrylog per ledger using InterleavedLedgerStorage vs SortedLedgerStorage. **sub-task4** - introduce parallel (have just one Runnable per ledger) EntryMemTable flusher for SortedLedgerStorage which can be used in the case of entrylogperledger > @reddycharan is there any way you can sub-divide your subtask-5? Yes, I think it should be possible. Probably in the first task I could introduce EntryLogManager interface and EntryLogManagerForSingleEntryLog and in the final task I can introduce EntryLogManagerForEntryLogPerLedger. So **sub-task1** - Removal of unnecessary synchronization in InterleavedLedgerStorage methods. As described in http://mail-archives.apache.org/mod_mbox/bookkeeper-dev/201707.mbox/%3CCAO2yDyZ946fp2S_qR2iL178hPiXgrnFGb%3DpvkyK4ReSYAtNLBw%40mail.gmail.com%3E **sub-task2** - move the complete logic of flushIntervalInBytes from EntryLogger to BufferedChannel **sub-task3** - make changes to SyncThread/checkpoint logic, so that incase of entrylogperledger, checkpoint happens for every flushInterval but not when active entrylog is created/rolled over. **sub-task4** - introduce parallel (have just one Runnable per ledger) EntryMemTable flusher for SortedLedgerStorage which can be used in the case of entrylogperledger. (evaluate the need of SortedLedgerStorage by doing perf comparisons) **sub-task5** - introduce EntryLogManager interface and EntryLogManagerForSingleEntryLog **sub-task6** - introduce EntryLogManagerForEntryLogPerLedger Thanks guys for providing the feedback. Will proceed with the plan and start creating new pull requests for the sub-tasks. But I'll leave the existing pullrequest (https://github.com/apache/bookkeeper/pull/1201), since it might be helpful to refer to get the full picture. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] reddycharan commented on issue #570: Multiple active entrylogs
reddycharan commented on issue #570: Multiple active entrylogs URL: https://github.com/apache/bookkeeper/issues/570#issuecomment-369537838 >Just a question, is some code based on this change running in some production system ? @jvrao No, not yet, we are in the process of pushing all of our internal changes to the community code and eventually use the community repo version, so that we can be close to the community. Since we are in this transition phase we kept hold of any big changes to our repo and instead work directly with the community repo. >Effectively it seems to me that this new kind of storage is vert different from InterleavedStorage. I remember we already discussed about this. Is it really a benefit to change InterleavedS and SortedS and not to start from scratch eventually creating a base class for common code? Ideally with the design approach i chose, there shouldn't be any changes to InterleavedLedgerStorage class and SortedLedgerStorage class. Even now the changes made to these two classes are minimal. These changes are required for the following reasons - 1) Removal of unnecessary synchronization in InterleavedLedgerStorage methods. As described in http://mail-archives.apache.org/mod_mbox/bookkeeper-dev/201707.mbox/%3CCAO2yDyZ946fp2S_qR2iL178hPiXgrnFGb%3DpvkyK4ReSYAtNLBw%40mail.gmail.com%3E 2) minor changes to checkpoint logic, since in entryLogPerLedger it should not checkpoint when a new entrylog is created but instead it should checkpoint for every "flushInterval" period and also during this time it should flush both rotatedLogs and current active logs. 3) couple of minor changes in SortedLedgerStorage regarding how entryLogger methods are called and creation of flushexecutor and passing it to EntryMemTable in the case of entryLogPerLedger. The crux of the implementation change is in EntryLogger class and for the reasons explained above abstraction is done in EntryLogger class. >I think a more confortable way to get this in is to split the patch into smaller changes. @reddycharan Ok, I've spent multiple weeks on this work item. If it is going to make things easier on reviewers I can consider splitting this PR into multiple PRs (hoping it wont take heavy toll on me and complicate things further). Probably I can split like following **sub-task1** - Removal of unnecessary synchronization in InterleavedLedgerStorage methods. As described in http://mail-archives.apache.org/mod_mbox/bookkeeper-dev/201707.mbox/%3CCAO2yDyZ946fp2S_qR2iL178hPiXgrnFGb%3DpvkyK4ReSYAtNLBw%40mail.gmail.com%3E **sub-task2** - move the complete logic of flushIntervalInBytes from EntryLogger to BufferedChannel **sub-task3** - make changes to SyncThread/checkpoint logic, so that incase of entrylogperledger, checkpoint happens for every flushInterval but not when active entrylog is created/rolled over. **sub-task4** - introduce parallel (have just one Runnable per ledger) EntryMemTable flusher for SortedLedgerStorage which can be used in the case of entrylogperledger **sub-task5** - Introduce EntryLogManager abstraction in EntryLogger and let it deal with the singleentrylog/entrylogperledger functionality and core business logic. As you can imagine, sub-task5 is going to be the crux of this work item and majority of my code change and the design choice of abstraction at EntryLogger is going to remain. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] reddycharan commented on issue #570: Multiple active entrylogs
reddycharan commented on issue #570: Multiple active entrylogs URL: https://github.com/apache/bookkeeper/issues/570#issuecomment-369537838 >Just a question, is some code based on this change running in some production system ? @jvrao No, not yet, we are in the process of pushing all of our internal changes to the community code and eventually use the community repo version, so that we can be close to the community. Since we are in this transition phase we kept hold of any big changes to our repo and instead work directly with the community repo. >Effectively it seems to me that this new kind of storage is vert different from InterleavedStorage. I remember we already discussed about this. Is it really a benefit to change InterleavedS and SortedS and not to start from scratch eventually creating a base class for common code? Ideally with the design approach i chose, there shouldn't be any changes to InterleavedLedgerStorage class and SortedLedgerStorage class. Even now the changes made to these two classes are minimal. These changes are required for the following reasons - 1) Removal of unnecessary synchronization in InterleavedLedgerStorage methods. As described in http://mail-archives.apache.org/mod_mbox/bookkeeper-dev/201707.mbox/%3CCAO2yDyZ946fp2S_qR2iL178hPiXgrnFGb%3DpvkyK4ReSYAtNLBw%40mail.gmail.com%3E 2) minor changes to checkpoint logic, since in entryLogPerLedger it should not checkpoint when a new entrylog is created but instead it should checkpoint for every "flushInterval" period and also during this time it should flush both rotatedLogs and current active logs. 3) couple of minor changes in SortedLedgerStorage regarding how entryLogger methods are called and creation of flushexecutor and passing it to EntryMemTable in the case of entryLogPerLedger. The crux of the implementation change is in EntryLogger class and for the reasons explained above abstraction is done in EntryLogger class. >I think a more confortable way to get this in is to split the patch into smaller changes. @reddycharan Ok, I've spent multiple weeks on this work item. If it is going to make things easier on reviewers I can consider splitting this PR into multiple PRs (hoping it wont take heavy toll on me and complicate things further). Probably I can split like following **sub-task1** - Removal of unnecessary synchronization in InterleavedLedgerStorage methods. As described in http://mail-archives.apache.org/mod_mbox/bookkeeper-dev/201707.mbox/%3CCAO2yDyZ946fp2S_qR2iL178hPiXgrnFGb%3DpvkyK4ReSYAtNLBw%40mail.gmail.com%3E **sub-task2** - moved the complete logic of flushIntervalInBytes from EntryLogger to BufferedChannel **sub-task3** - make changes to SyncThread, so that incase of entrylogperledger, checkpoint happens for every flushInterval but not when active entrylog is created/rolled over. **sub-task4** - introduce parallel (have just one Runnable per ledger) EntryMemTable flusher for SortedLedgerStorage which can be used in the case of entrylogperledger **sub-task5** - Introduce EntryLogManager abstraction in EntryLogger and let it deal with the singleentrylog/entrylogperledger functionality and core business logic. As you can imagine, sub-task5 is going to be the crux of this work item and majority of my code change and the design choice of abstraction at EntryLogger is going to remain. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] reddycharan commented on issue #570: Multiple active entrylogs
reddycharan commented on issue #570: Multiple active entrylogs URL: https://github.com/apache/bookkeeper/issues/570#issuecomment-369537838 >Just a question, is some code based on this change running in some production system ? @jvrao No, not yet, we are in the process of pushing all of our internal changes to the community code and eventually use the community repo version, so that we can be close to the community. Since we are in this transition phase we kept hold of any big changes to our repo and instead work directly with the community repo. >Effectively it seems to me that this new kind of storage is vert different from InterleavedStorage. I remember we already discussed about this. Is it really a benefit to change InterleavedS and SortedS and not to start from scratch eventually creating a base class for common code? Ideally with the design approach i chose, there shouldn't be any changes to InterleavedLedgerStorage class and SortedLedgerStorage class. Even now the changes made to these two classes are minimal. These changes are required for the following reasons - 1) Removal of unnecessary synchronization in InterleavedLedgerStorage methods. As described in http://mail-archives.apache.org/mod_mbox/bookkeeper-dev/201707.mbox/%3CCAO2yDyZ946fp2S_qR2iL178hPiXgrnFGb%3DpvkyK4ReSYAtNLBw%40mail.gmail.com%3E 2) minor changes to checkpoint logic, since in entryLogPerLedger it should not checkpoint when a new entrylog is created but instead it should checkpoint for every "flushInterval" period and also during this time it should flush both rotatedLogs and current active logs. 3) couple of minor changes in SortedLedgerStorage regarding how entryLogger methods are called and creation of flushexecutor and passing it to EntryMemTable in the case of entryLogPerLedger. The crux of the implementation change is in EntryLogger class and for the reasons explained above abstraction is done in EntryLogger class. >I think a more confortable way to get this in is to split the patch into smaller changes. @reddycharan Ok, I've spent multiple weeks on this work item. If it is going to make things easier on reviewers I can consider splitting this PR into multiple PRs (hoping it wont take heavy toll on me and complicate things further). Probably I can split like following **sub-task1** - Removal of unnecessary synchronization in InterleavedLedgerStorage methods. As described in http://mail-archives.apache.org/mod_mbox/bookkeeper-dev/201707.mbox/%3CCAO2yDyZ946fp2S_qR2iL178hPiXgrnFGb%3DpvkyK4ReSYAtNLBw%40mail.gmail.com%3E **sub-task2** - move the complete logic of flushIntervalInBytes from EntryLogger to BufferedChannel **sub-task3** - make changes to SyncThread, so that incase of entrylogperledger, checkpoint happens for every flushInterval but not when active entrylog is created/rolled over. **sub-task4** - introduce parallel (have just one Runnable per ledger) EntryMemTable flusher for SortedLedgerStorage which can be used in the case of entrylogperledger **sub-task5** - Introduce EntryLogManager abstraction in EntryLogger and let it deal with the singleentrylog/entrylogperledger functionality and core business logic. As you can imagine, sub-task5 is going to be the crux of this work item and majority of my code change and the design choice of abstraction at EntryLogger is going to remain. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] reddycharan commented on issue #570: Multiple active entrylogs
reddycharan commented on issue #570: Multiple active entrylogs URL: https://github.com/apache/bookkeeper/issues/570#issuecomment-369537838 >Just a question, is some code based on this change running in some production system ? @jvrao No, not yet, we are in the process of pushing all of our internal changes to the community code and eventually use the community repo version, so that we can be close to the community. Since we are in this transition phase we kept hold of any big changes to our repo and instead work directly with the community repo. >Effectively it seems to me that this new kind of storage is vert different from InterleavedStorage. I remember we already discussed about this. Is it really a benefit to change InterleavedS and SortedS and not to start from scratch eventually creating a base class for common code? Ideally with the design approach i chose, there shouldn't be any changes to InterleavedLedgerStorage class and SortedLedgerStorage class. Even now the changes made to these two classes are minimal. These changes are required for the following reasons - 1) Removal of unnecessary synchronization in InterleavedLedgerStorage methods. As described in http://mail-archives.apache.org/mod_mbox/bookkeeper-dev/201707.mbox/%3CCAO2yDyZ946fp2S_qR2iL178hPiXgrnFGb%3DpvkyK4ReSYAtNLBw%40mail.gmail.com%3E 2) minor changes to checkpoint logic, since in entryLogPerLedger it should not checkpoint when a new entrylog is created but instead it should checkpoint for every "flushInterval" period and also during this time it should flush both rotatedLogs and current active logs. 3) couple of minor changes in SortedLedgerStorage regarding how entryLogger methods are called and creation of flushexecutor and passing it to EntryMemTable in the case of entryLogPerLedger. The crux of the implementation change is in EntryLogger class and for the reasons explained above abstraction is done in EntryLogger class. >I think a more confortable way to get this in is to split the patch into smaller changes. @reddycharan Ok, I've spent multiple weeks on this work item. If it is going to make things easier on reviewers I can consider splitting this PR into multiple PRs (hoping it wont take heavy toll on me and complicate things further). Probably I can split like following sub-task1 - Removal of unnecessary synchronization in InterleavedLedgerStorage methods. As described in http://mail-archives.apache.org/mod_mbox/bookkeeper-dev/201707.mbox/%3CCAO2yDyZ946fp2S_qR2iL178hPiXgrnFGb%3DpvkyK4ReSYAtNLBw%40mail.gmail.com%3E sub-task2 - moved the complete logic of flushIntervalInBytes from EntryLogger to BufferedChannel sub-task3 - make changes to SyncThread, so that incase of entrylogperledger, checkpoint happens for every flushInterval but not when active entrylog is created/rolled over. sub-task4 - introduce parallel (have just one Runnable per ledger) EntryMemTable flusher for SortedLedgerStorage which can be used in the case of entrylogperledger sub-task5 - Introduce EntryLogManager abstraction in EntryLogger and let it deal with the singleentrylog/entrylogperledger functionality and core business logic. As you can imagine, sub-task5 is going to be the crux of this work item and majority of my code change and the design choice of abstraction at EntryLogger is going to remain. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] reddycharan commented on issue #570: Multiple active entrylogs
reddycharan commented on issue #570: Multiple active entrylogs URL: https://github.com/apache/bookkeeper/issues/570#issuecomment-369416511 Hey @ivankelly , to answer ?when/where was this discussed part?, I can point you to the meeting notes, mail exchanges, slack messages and git links which I shared with the community throughout the period. This work item was initially created by @jvrao during April 2017 https://issues.apache.org/jira/browse/BOOKKEEPER-1041. And by May 2017 I came up with design overview and implementation choices and presented to the community on May 18th community meeting. You can find the meeting minutes of what I presented on that day - https://cwiki.apache.org/confluence/display/BOOKKEEPER/2017-5-18+Meeting+Notes and I explained the design overview in the same jira issue. And yes, initially the design was to have configured number of entrylogs per ledgerdir. While implementing it, I noticed few issues with how checkpoint is done, regarding synchronization logic and preallocation of entry logs and raised the concerns regarding the same in the community by starting mail threads and having conversations in the community bi-weekly meets. In one of such mail conversation, @sijie asked same set of questions regarding where the abstraction/composition should happen. I explained in detail in the following mail http://mail-archives.apache.org/mod_mbox/bookkeeper-dev/201707.mbox/%3CCAAFz1KNJbPnhLGwp27kKKGgOtLiDPR7FHpSngdcN-C-Njxt7eQ%40mail.gmail.com%3E . Also in this email I provided git link for the initial implementation of code https://github.com/reddycharan/bookkeeper/tree/multipleentrylogs. Following this mail conversation @sijie , @jvrao and I had in length conversation regarding where the composition should happen and convinced @sijie that given the way code is structured it is right thing to do (abstraction/composition) at lowest level - Entrylogger level. But in the follow up, I?ve been questioned about the intricacies of multithreaded/synchronization aspects of slot map management in configured number of entrylogs per ledgerdir approach and the benefits of having configured number of entrylogs per ledgerdir vs explicit entry log per ledger in the compaction story (since it is binary decision of whether to keep the entry log or not during compaction in the entrylog per ledger case). So I changed the design to entrylog per ledger, which changed the logic of choosing entry log for the given ledger but the underlying changes of the abstraction of entry logger remained the same. This has been informed to the community formally in several community meet calls and Sijie in particular (multiple times) before proceeding. https://cwiki.apache.org/confluence/display/BOOKKEEPER/2017-06-01+Meeting+Notes https://cwiki.apache.org/confluence/display/BOOKKEEPER/2017-10-19+Meeting+Notes https://cwiki.apache.org/confluence/display/BOOKKEEPER/2017-11-30+Meeting+Notes https://cwiki.apache.org/confluence/display/BOOKKEEPER/2017-12-14+Meeting+Notes. As I requested in 2017/12/14 meeting I shared preview version of my code in community slack https://apachebookkeeper.slack.com/archives/C6G5104SF/p1513212259000262 and the code is shared in my GitHub repo - https://github.com/reddycharan/bookkeeper/tree/entrylogperledger . @eolivelli and @sijie were helpful to do early CR, provide some initial feedback and gave their approval. Finally as I envisioned, discussed and shared, after rebasing my changes on the recent community code I created formal pull request - https://github.com/apache/bookkeeper/pull/1201 and also updated this issue with formal description of entry log per ledger design which I?ve been discussing all along. Agreed, I could have updated this issue description with the new design back then itself. But I?m not sure if it would have made difference since this was communicated/explained multiple times in the community meets and particularly to the interested people. For the people who are coming across this work item for the first time, it is needed/required when I created the formal pull request and hence I made sure I updated issue with full description (new design) before sending the Pull Request. I can say community has changed a lot in the amount of activity, processes, systems used and the list of participants in the last year (2017). This Work Item has spanned during this transition phase (it took quite longer time, this is because of change in priorities on our side, me dealing with multiple repos - internals salesforce one and community, deluge of commits in the second half of 2017 and the need to rebase my commits on top of them every single time I wanted to push it forward) and I see why you might have different opinion, considering you weren?t present in any of the above mentioned conversations/communications. But I see you are the one who migrated jira work item to git issues, but I?m not sur
[GitHub] reddycharan commented on issue #570: Multiple active entrylogs
reddycharan commented on issue #570: Multiple active entrylogs URL: https://github.com/apache/bookkeeper/issues/570#issuecomment-369416511 Hey @ivankelly , to answer ?when/where was this discussed part?, I can point you to the meeting notes, mail exchanges, slack messages and git links which I shared with the community throughout the period. This work item was initially created by @jvrao during April 2017 https://issues.apache.org/jira/browse/BOOKKEEPER-1041. And by May 2017 I came up with design overview and implementation choices and presented to the community on May 18th community meeting. You can find the meeting minutes of what I presented on that day - https://cwiki.apache.org/confluence/display/BOOKKEEPER/2017-5-18+Meeting+Notes and I explained the design overview in the same jira issue. And yes, initially the design was to have configured number of entrylogs per ledgerdir. While implementing it, I noticed few issues with how checkpoint is done, regarding synchronization logic and preallocation of entry logs and raised the concerns regarding the same in the community by starting mail threads and having conversations in the community bi-weekly meets. In one of such mail conversation, @sijie asked same set of questions regarding where the abstraction/composition should happen. I explained in detail in the following mail http://mail-archives.apache.org/mod_mbox/bookkeeper-dev/201707.mbox/%3CCAAFz1KNJbPnhLGwp27kKKGgOtLiDPR7FHpSngdcN-C-Njxt7eQ%40mail.gmail.com%3E . Also in this email I provided git link for the initial implementation of code https://github.com/reddycharan/bookkeeper/tree/multipleentrylogs. Following this mail conversation @sijie , @jvrao and I had in length conversation regarding where the composition should happen and convinced @sijie that given the way code is structured it is right thing to do (abstraction/composition) at lowest level - Entrylogger level. But in the follow up, I?ve been questioned about the intricacies of multithreaded/synchronization aspects of slot map management in configured number of entrylogs per ledgerdir approach and the benefits of explicit entry log per ledger in the compaction story (binary decision of whether to keep the entry log or not during compaction). So I changed the design to entrylog per ledger, which changed the logic of choosing entry log for the given ledger but the underlying changes of the abstraction of entry logger remained the same. This has been informed to the community formally in several community meet calls and Sijie in particular (multiple times) before proceeding. https://cwiki.apache.org/confluence/display/BOOKKEEPER/2017-06-01+Meeting+Notes https://cwiki.apache.org/confluence/display/BOOKKEEPER/2017-10-19+Meeting+Notes https://cwiki.apache.org/confluence/display/BOOKKEEPER/2017-11-30+Meeting+Notes https://cwiki.apache.org/confluence/display/BOOKKEEPER/2017-12-14+Meeting+Notes. As I requested in 2017/12/14 meeting I shared preview version of my code in community slack https://apachebookkeeper.slack.com/archives/C6G5104SF/p1513212259000262 and the code is shared in my GitHub repo - https://github.com/reddycharan/bookkeeper/tree/entrylogperledger . @eolivelli and @sijie were helpful to do early CR, provide some initial feedback and gave their approval. Finally as I envisioned, discussed and shared, after rebasing my changes on the recent community code I created formal pull request - https://github.com/apache/bookkeeper/pull/1201 and also updated this issue with formal description of entry log per ledger design which I?ve been discussing all along. Agreed, I could have updated this issue description with the new design back then itself. But I?m not sure if it would have made difference since this was communicated/explained multiple times in the community meets and particularly to the interested people. For the people who are coming across this work item for the first time, it is needed/required when I created the formal pull request and hence I made sure I updated issue with full description (new design) before sending the Pull Request. I can say community has changed a lot in the amount of activity, processes, systems used and the list of participants in the last year (2017). This Work Item has spanned during this transition phase (it took quite longer time, this is because of change in priorities on our side, me dealing with multiple repos - internals salesforce one and community, deluge of commits in the second half of 2017 and the need to rebase my commits on top of them every single time I wanted to push it forward) and I see why you might have different opinion, considering you weren?t present in any of the above mentioned conversations/communications. But I see you are the one who migrated jira work item to git issues, but I?m not sure if it was automated task or well curated manual task. Hope this answers ?when/where was th
[GitHub] reddycharan commented on issue #570: Multiple active entrylogs
reddycharan commented on issue #570: Multiple active entrylogs URL: https://github.com/apache/bookkeeper/issues/570#issuecomment-369416511 Hey @ivankelly , to answer ?when/where was this discussed part?, I can point you to the meeting notes, mail exchanges, slack messages and git links which I shared with the community throughout the period. This work item was initially created by @jvrao during April 2017 https://issues.apache.org/jira/browse/BOOKKEEPER-1041. And by May 2017 I came up with design overview and implementation choices and presented to the community on May 18th community meeting. You can find the meeting minutes of what I presented on that day - https://cwiki.apache.org/confluence/display/BOOKKEEPER/2017-5-18+Meeting+Notes and I explained the design overview in the same jira issue. And yes, initially the design was to have configured number of entrylogs per ledgerdir. While implementing it, I noticed few issues with how checkpoint is done, regarding synchronization logic and preallocation of entry logs and raised the concerns regarding the same in the community by starting mail threads and having conversations in the community bi-weekly meets. In one of such mail conversation, @sijie asked same set of questions regarding where the abstraction/composition should happen. I explained in detail in the following mail http://mail-archives.apache.org/mod_mbox/bookkeeper-dev/201707.mbox/%3CCAAFz1KNJbPnhLGwp27kKKGgOtLiDPR7FHpSngdcN-C-Njxt7eQ%40mail.gmail.com%3E . Also in this email I provided git link for the initial implementation of code https://github.com/reddycharan/bookkeeper/tree/multipleentrylogs. Following this mail conversation @sijie , @jvrao and I had in length conversation regarding where the composition should happen and convinced @sijie that given the way code is structured it is right thing to do (abstraction/composition) at lowest level - Entrylogger level. But in the follow up, I?ve been questioned about the intricacies of multithreaded/synchronization aspects of slot map management in configured number of entrylogs per ledgerdir approach and the benefits of explicit entry log per ledger in the compaction story (binary decision of whether to keep the entry log or not during compaction). So I changed the design to entrylog per ledger, which changed the logic of choosing entry log for the given ledger but the underlying changes of the abstraction of entry logger remained the same. This has been informed to the community formally in several community meet calls and Sijie in particular (multiple times) before proceeding. https://cwiki.apache.org/confluence/display/BOOKKEEPER/2017-06-01+Meeting+Notes https://cwiki.apache.org/confluence/display/BOOKKEEPER/2017-10-19+Meeting+Notes https://cwiki.apache.org/confluence/display/BOOKKEEPER/2017-11-30+Meeting+Notes https://cwiki.apache.org/confluence/display/BOOKKEEPER/2017-12-14+Meeting+Notes. As I requested in 2017/12/14 meeting I shared preview version of my code in community slack https://apachebookkeeper.slack.com/archives/C6G5104SF/p1513212259000262 and the code is shared in my GitHub repo - https://github.com/reddycharan/bookkeeper/tree/entrylogperledger . @eolivelli and @sijie were helpful to do early CR, provide some initial feedback and gave their approval. Finally as I envisioned, discussed and shared, after rebasing my changes on the recent community code I created formal pull request - https://github.com/apache/bookkeeper/pull/1201 and also updated this issue with formal description of entry log per ledger design which I?ve been discussing all along. Agreed, I could have updated this issue description with the new design back then itself. But I?m not sure if it would have made difference since this was communicated/explained multiple times in the community meets and particularly to the interested people. For the people who are coming across this work item for the first time, it is needed/required when I created the formal pull request and hence I made sure I updated issue with full description (new design) before sending the Pull Request. I can say community has changed a lot in the amount of activity, processes, systems used and the list of participants in the last year (2017). This Work Item has spanned during this transition phase and I see why you might have different opinion, considering you weren?t present in any of the above mentioned conversations/communications. But I see you are the one who migrated jira work item to git issues, but I?m not sure if it was automated task or well curated manual task. Hope this answers ?when/where was this discussed part?. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infra
[GitHub] reddycharan commented on issue #570: Multiple active entrylogs
reddycharan commented on issue #570: Multiple active entrylogs URL: https://github.com/apache/bookkeeper/issues/570#issuecomment-369416511 Hey @ivankelly , to answer ?when/where was this discussed part?, I can point you to the meeting notes, mail exchanges, slack messages and git links which I shared with the community throughout the period. This work item was initially created by @jvrao during April 2017 https://issues.apache.org/jira/browse/BOOKKEEPER-1041. And by May 2017 I came up with design overview and implementation choices and presented to the community on May 18th community meeting. You can find the meeting minutes of what I presented on that day - https://cwiki.apache.org/confluence/display/BOOKKEEPER/2017-5-18+Meeting+Notes and I explained the design overview in the same jira issue. And yes, initially the design was to have configured number of entrylogs per ledgerdir. While implementing it, I noticed few issues with how checkpoint is done, regarding synchronization logic and preallocation of entry logs and raised the concerns regarding the same in the community by starting mail threads and having conversations in the community bi-weekly meets. In one of such mail conversation, @sijie asked same set of questions regarding where the abstraction/composition should happen. I explained in detail in the following mail http://mail-archives.apache.org/mod_mbox/bookkeeper-dev/201707.mbox/%3CCAAFz1KNJbPnhLGwp27kKKGgOtLiDPR7FHpSngdcN-C-Njxt7eQ%40mail.gmail.com%3E . Also in this email I provided git link for the initial implementation of code https://github.com/reddycharan/bookkeeper/tree/multipleentrylogs. Following this mail conversation @sijie , @jvrao and I had in length conversation regarding where the composition should happen and convinced @sijie that given the way code is structured it is right thing to do (abstraction/composition) at lowest level - Entrylogger level. But in the follow up, I?ve been questioned about the intricacies of multithreaded/synchronization aspects of slot map management in configured number of entrylogs per ledgerdir approach and the benefits of explicit entry log per ledger in the compaction story (binary decision of whether to keep the entry log or not during compaction). So I changed the design to entrylog per ledger, which changed the logic of choosing entry log for the given ledger but the underlying changes of the abstraction of entry logger remained the same. This has been informed to the community formally in several community meet calls and Sijie in particular (multiple times) before proceeding. https://cwiki.apache.org/confluence/display/BOOKKEEPER/2017-06-01+Meeting+Notes https://cwiki.apache.org/confluence/display/BOOKKEEPER/2017-10-19+Meeting+Notes https://cwiki.apache.org/confluence/display/BOOKKEEPER/2017-11-30+Meeting+Notes https://cwiki.apache.org/confluence/display/BOOKKEEPER/2017-12-14+Meeting+Notes. As I requested in 2017/12/14 meeting I shared preview version of my code in community slack https://apachebookkeeper.slack.com/archives/C6G5104SF/p1513212259000262 and the code is shared in my GitHub repo - https://github.com/reddycharan/bookkeeper/tree/entrylogperledger . @eolivelli and @sijie were helpful to do early CR, provide some initial feedback and gave their approval. Finally as I envisioned, discussed and shared, after rebasing my changes on the recent community code I created formal pull request - https://github.com/apache/bookkeeper/pull/1201 and also updated this issue with formal description of entry log per ledger design which I?ve been discussing all along. Agreed, I could have updated this issue description with the new design back then itself. But I?m not sure if it would have made difference since this was communicated/explained multiple times in the community meets and particularly to the interested people multiple times. For the people who are coming across this work item for the first time, it is needed/required when I created the formal pull request and hence I made sure I updated issue with full description (new design) before sending the Pull Request. I can say community has changed a lot in the amount of activity, processes, systems used and the list of participants in the last year (2017). This Work Item has spanned during this transition phase and I see why you might have different opinion, considering you weren?t present in any of the above mentioned conversations/communications. But I see you are the one who migrated jira work item to git issues, but I?m not sure if it was automated task or well curated manual task. Hope this answers ?when/where was this discussed part?. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, pleas
[GitHub] reddycharan commented on issue #570: Multiple active entrylogs
reddycharan commented on issue #570: Multiple active entrylogs URL: https://github.com/apache/bookkeeper/issues/570#issuecomment-368597767 Hey Ivan.. Sijie, JV and our internal team had long conversation about the same and I responded to community aswell in one of the mail Initially when we designed to implement this feature by providing config for number of entrylogs per ledgerdir and because of that the logic was convoluted with slot mapping management and Sijie felt that it is complicated and questioned about it. But now with entrylog per ledger, the management logic in entrylogger is simpler and straightforward. The same amount of complication will be applicable even in the case of composition at LedgerStorage level. When we get addEntry at ComposedLedgerStorage there should be logic to which LedgerStorage it should goto. So with this fact I can say that composing at LedgerStorage level it will be atleast as complicated as composing at EntryLogger level. with the ComposedLedgerStorage approach (Composite Pattern), ComposedLedgerStorage managing N InterleavedLedgerStorage/SortedLedgerStorages, there is going to be considerable change in the way the resources and state information are handled. Each Interleaved/SortedLedgerStorage will have its own EntryLogger (to serve our MultipleEntryLogs feature), but rest all needs to be shared, mainly 'LedgerCache', 'gcThread' and 'activeLedgers'. So there is going to be major churn in the code for moving the operations dealing with shared resources and state to ComposedLedgerStorage and leave the rest in InterleavedLedgerStorage. Amount of changes required in testcode to deal with these changes is even more. We have to do this while providing backward compatibility (Single EntryLog). Now this should make one question where should the composition happen. As far as I can say, it is supposed to happen at the lowest possible level rather than at higher level, which needs extra band-aid efforts to deal separately for the common resources/state and multiplexable resources. with composition at EntryLogger level, currently in my implementation getEntry/readEntry path, index manager and GC components are unchanged. but with composition at LedgerStorage even for readEntry/getEntry the multiplexing/redirection needs to happen. This is dealbreaker for me. Things get very messy with SortedLedgerSotrage entries in Memtable when Ledgerdir of the entryLog becomes full (we have to rotate current entrylog and create new one in writableledgerdir) while simulataneous writes and read are happening. Also this needs instance of LedgerStorage class for each entryLog in readpath and corresponding EntryMemTable if it is SortedLedgerStorage, though Memtable is not needed in read case. Most importantly we need to provide this feature with backward compatibility (we should be able to read previous data) and also provide existing behaviour (single entrylog) with config option. and mainly LedgerStorage is not the only consumer of EntryLogger, but also GarbageCollectorThread. It calls quite a few methods of EntryLogger - (entryLogger.addEntry, entryLogger.flush, entryLogger.removeEntryLog, entryLogger.scanEntryLog, entryLogger.getLeastUnflushedLogId, entryLogger.logExists, entryLogger.getEntryLogMetadata). It is going to be an issue with ComposedLedgerStorage, because with that EntryLogger is going to be responsible for just one EntryLog. For GarbageCollectorThread to work correctly with multiple EntryLogs, composition is required here as well. Which is double whammy when it comes down to implementation. Anyhow the only posiible shortcoming i see with multiplexing at Entrylogger level is that we are having single MemTable in the case of SortedLedgerStorage, but we mitigated that issue by doing parallel flushes. In terms of complexity and code churn if we go with multiplexing at ledgerstorage class level it will be several times higher than the current implementation with not much benefits. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] reddycharan commented on issue #570: Multiple active entrylogs
reddycharan commented on issue #570: Multiple active entrylogs URL: https://github.com/apache/bookkeeper/issues/570#issuecomment-368185454 **Design Overview** **Entrylog per Ledger** As name suggests, in this approach in a bookie we are going to have active entrylog dedicated to active ledger. But it is not a strict enforcement of one-to-one mapping of ledger to entrylog in a bookie. Strict one-to-one relationship between ledger and entrylog in a bookie is not possible for multiple reasons (ledgerdir might become full, entrylog might reach its capacity, segmentation/replication can happen, possibility of bookie crash,..). Besides, once entrylog is rotated it cannt be reopened. Since while rotating entrylog file, EntryLogger appends the ledger map at the end of the entry log and updates the entrylog file with the offset and size of the map. It is like sealing the entrylog file. And EntryLogger maintains a pointer called 'leastUnflushedLogId', which specifies the least entrylogid which is not yet rotated and closed and GC considers all the entrylogs with logid lesser than 'leastUnflushedLogId' (entrylogids are sequential numbers) are eligible for compaction/garbagecollection. In summary once the entrylog is rotated and closed we need to maintain immutable semantics on entrylog file. So instead we can provide relaxed constraint where an entrylog is dedicated/committed to a ledger but not otherway around. So in most cases there would be just one entrylog for a ledger in a bookie, but in situations like when entrylog reaches capacity it is rotated and new one is created for that ledger, when ledgerdir is full all the entrylogs in that ledgerdir are rotated and new ones are created for those ledgers, because of segmentation and replication various segments of ledger might end up in different entrylogs and because of a bookie crash while replaying the journal new entrylog will be created for the leftout entries in the journal., entries of a ledger might end up in different entrylogs in a bookie. To summarize briefly about this approach. - is to have server configuration specifying entrylogperledger is enabled - for the previous behavior (one active entrylog) that config can be set to false - when entrylogger receives addEntry call, it needs to know the entry log for the current ledger - so EntryLogger needs to maintain state information of mapping of ledgerId to entrylogid. If the in-memory map doesn't contain entry for the ledger, then EntryLogger will create a new Entrylog and add the mapping of ledgerId to EntryLog. - for creation of new entrylog, EntryLogger will pick writable ledgerdir with least number of active entrylogs - if entrylog reaches the capacity, then it will be rotated and new entrylog will be assigned to that ledger and mapping will be updated - If a ledgerdir becomes full, then all the entrylogs in that ledgerdir, should be rotated. New EntryLogs should be created in the available writable ledgerdirs for those ledgers and the mapping should be updated. - when ledgerdir becomes writable again that ledgerdir should become eligible for creation of new entrylogs - Currently Bookie is not informed about the writeclose of the ledger, so there needs to be a way to know when to remove the mapping entry from the map and rotate the entrylog. One simple way to handle it is to use cache (Guava Cache library) with timebased eviction policy (on last access) and as part of removal listener we can rotate the corresponding entrylog. - Time based eviction policy is simple to provide, but untill entrylog file is rotated and flushed, filehandles of entrylogs are kept open and it wont become eligible for compaction/garbage collection. So explicit writeclose call from client to bookies ensemble of that ledger is needed for better handling of entrylogs. - Both the time based eviction and removal policy and explicit writeclose call are required because not in all cases explicit write close calls to bookies are guaranteed, like during ensemble change of ledger, client crash and unreliable write close protocol. Advisory Write Close is explained below in detail. - For this feature I need to make changes to checkpoint logic. Currently with BOOKKEEPER-564 change, we are scheduling checkpoint only when current entrylog file is rotated. So we dont call 'flushCurrentLog' when we checkpoint. But for this feature, since there are going to be multiple active entrylogs, scheduling checkpoint when entrylog file is rotated, is not an option. So I need to call flushCurrentLogs when checkpoint is made for every 'flushinterval' period - With entrylogperledger feature we are not changing format of the entrylog contents in anyway, so it should be possible to switch entrylogperledger configuration back and forth. **Advisory Write Close implementation details** - Advisory write close messag
[GitHub] reddycharan commented on issue #570: Multiple active entrylogs
reddycharan commented on issue #570: Multiple active entrylogs URL: https://github.com/apache/bookkeeper/issues/570#issuecomment-368185000 For this work item, there are two possible approaches 1) is to have configurable number of active entrylogs per ledgerdir 2) active entrylog per ledger For the first approach I mentioned the design overview earlier. But we decided to go with the second approach for the following reasons There are couple of major benefits by going with entrylogperledger. The implementation would become simpler, because it is a complex logic to handle and maintain slotmap in configured number of Entrylogs per LedgerDir design/implementation and it needs to be thoroughly analyzed/tested from multi-thread perspective. But most importantly the garbagecollection component becomes much simpler because there is no need of compaction in entrylog per ledger approach. So there is huge gain in indirect performance because of absence of IO activity for compaction and huge improvement in space reclamation during compaction. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services