[GitHub] sijie commented on issue #774: Open should run recovery by default
sijie commented on issue #774: Open should run recovery by default URL: https://github.com/apache/bookkeeper/pull/774#issuecomment-347651531 @ivankelly : I think @jiazhai also put down his opinion as well (just to make sure everyone that express the opinion in this thread is not missed) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] sijie commented on issue #774: Open should run recovery by default
sijie commented on issue #774: Open should run recovery by default URL: https://github.com/apache/bookkeeper/pull/774#issuecomment-347371758 "Personally, I consider that I know what I'm doing in this area, and I got it wrong when updating the tutorial, and I only spotted the mistake by pure chance. As for reading the docs, I generally only do that when something is non-obvious. createOpenLedgerOp looks easy to use and its usage seems obvious, so why would people carefully read the docs for it?" @ivankelly "why would people carefully read the docs for it?" - Because you built the original API. In your mind, "open a ledger" is fencing and close the ledger. It doesn't mean people will have same assumption that "open a ledger" will fence and close the ledger. In contrast, if you ask most of the people who doesn't work with bookkeeper before, what are their expectations of a `open for read` behavior. I believe most of people will assume `open for read` generally has no side effects to the ledger or the writer. A default behavior for `open` should be as general as other storage system, no side effects (you can open as many time as you can without impacting others), easy to reason about without reading any documentation. `open-to-close` behavior is a special case for general `open`, it is used when a leader (writer) failed, it wants to fence the previous ledger and open a new one. because `open-to-close` is a special case, it should be called out explicitly. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] sijie commented on issue #774: Open should run recovery by default
sijie commented on issue #774: Open should run recovery by default URL: https://github.com/apache/bookkeeper/pull/774#issuecomment-347368506 @ivankelly I think this problem here is more about an assumption problem. All the points you made here assume one thing - bookkeeper can only used in the use case you described: people open a ledger is to fence and close the writer. Also you have this assumption in your mind is more because you added this feature before. This assumption was kept in your mind for so long. If we are growing bookkeeper beyond this use case, we should look beyond this assumption and learn what is the most common behavior for opening and what is more neutral behavior when people use bookkeeper. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] sijie commented on issue #774: Open should run recovery by default
sijie commented on issue #774: Open should run recovery by default URL: https://github.com/apache/bookkeeper/pull/774#issuecomment-347057001 @jvrao @jiazhai and other people are welcome to chime in. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] sijie commented on issue #774: Open should run recovery by default
sijie commented on issue #774: Open should run recovery by default URL: https://github.com/apache/bookkeeper/pull/774#issuecomment-347056937 the summary of my opinion on this : - from a storage system perspective, open should not have any implicit impacts on the current writer. if open wants to have a side effort (e.g. fence and recovery), it should be called out explicitly either by `withRecovery(true)` or `withOpenFlags(O_FENCE)`. - from user education/adoption awareness, a `multiple readers` semantic has more use cases that a `single reader` semantic. from this sense, we should encourage tailing and have recovery default to `false`. - the original api was designed for logging, `openLedgerNoRecovery` was added after `openLedger`. if we want to grow beyond logging into a storage system, we should have openLedger with no recovery as default. It is what most of the people would expect from a storage system. - regarding `safer` point, if a user already used bookkeeper and he/she knew bookkeeper, knew the difference between openLedger and openLedgerNoRecovery, when he migrates to the new API, they will find a way mapping openLedger and openLedgerNoRecovery into the new builder API. Javadoc, documentation, and release notes will provide the guide for existing users. It is not a problem for existing users. We should develop the storage mindshare for people who is new to bookkeeper, where *open ledger doesn't implicit close current writer* is making more sense as a storage. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] sijie commented on issue #774: Open should run recovery by default
sijie commented on issue #774: Open should run recovery by default URL: https://github.com/apache/bookkeeper/pull/774#issuecomment-346910714 Regarding, TOAB. LAC is the key to achieve TOAB, and make bookkeeper meaningful in multiple reader use cases. It is not a sidecar thing. Without LAC, you only can read entries after closing a ledger. The whole bookkeeper can only be used in single writer and single reader use cases. We have different views on LAC, is just because of the way how we have been used bookkeeper. In HDFS name mode, hedwig, pulsar, even pravega, bookkeeper is only used in single writer and single reader use cases, in those cases consistency on writer is important, fencing and recovery is the key. In distributed log, we use bk for streaming, multiple reader use cases, LAC is the key. Both of them are needed at some sense and key to the success of TOAB. There is no sense for the discussion to continue, I would stop here. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] sijie commented on issue #774: Open should run recovery by default
sijie commented on issue #774: Open should run recovery by default URL: https://github.com/apache/bookkeeper/pull/774#issuecomment-346908459 I don't think we should go down with split. It doesn't make any sense. The whole idea of a fluent style API is to avoid such split. Coming from storage background, it is more natural to have there in mind: create to get a handle to write, open to get a handle to read. Recovery or no recovery is the flag provided to open, indicating what will be the open behavior. Ideally this should be done with read flags, recovery is the flag of O_FENCE or O_RECOVERY. It should be as easy and natural as how people use a filesystem: open a file to read, open doesn't have any impacts to writers unless a O_FENCE flag is explicitly provided. from a storage system perspective, I would prefer a no recovery behavior as default. People shouldn't think too much about open implicitly fence and close a ledger. But again, I wouldn't block this PR, if other people want to keep the default behavior. Lastly, if we want BookKeeper to grow beyond a logging system to become a storage, we need to think and design a API that people have common sense on a storage system. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] sijie commented on issue #774: Open should run recovery by default
sijie commented on issue #774: Open should run recovery by default URL: https://github.com/apache/bookkeeper/pull/774#issuecomment-346891627 @ivankelly as I said in my previous comment, I am -0 for this. that means I wouldn't block the change making recovery as default. I don't like splitting the builder into two. the different views on LAC exactly reflects the views on how bookkeeper would look like and can be used for. LAC is not an optimization (it might be an optimization when you guys initially developed it for hdfs namenode). However, LAC is the most beautifully thing when you comes from streaming and tailing world. And it is the key differentiator how bookkeeper is different from a traditional storage system like filesystem, blobstore. LAC is the mechanism for propagating how your length of a stream grow and provides the consistency boundary for simple repeatable read consistency. while fencing and recovery can't. fencing and recovery are the mechanism for preventing split brain, which is more on writer side. Many other storage systems have fencing and recovery mechanism. It is just a way to prevent split brain. They might do it in many different ways. However in a bytes-orient storage system, like blob store, filesystem, you can't have LAC. LAC is unique to bookkeeper, because it is record-orient storage system. LAC makes bookkeeper so unique for streaming and tailing. That is something I would like us to emphasize and educate people to use bookkeeper. The reason why do you think fencing and recovery is cool in bookkeeper, is exactly because the way how you use bookkeeper. Most of the use cases like hdfs namenode, hedwig, pulsar are only use bookkeeper for writes. You need fencing and recovery to prevent split brain when ownership or leadership change. But this category of use cases is so limited. in distributedlog, at Twitter, we use bookkeeper in a lot of streaming and tailing use cases, where LAC is the soul. It guarantees TOAB and simple repeatable consistency while you do streaming and tailing. That's where bookkeeper is so unique from other storage systems. If we want bookkeeper to grow into a large community, we have to educate people in a way that people will realize this uniqueness. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] sijie commented on issue #774: Open should run recovery by default
sijie commented on issue #774: Open should run recovery by default URL: https://github.com/apache/bookkeeper/pull/774#issuecomment-346684200 yes we intentionally made no recovery as default. because that is what people would typically expect for a "single-writer, multiple-readers" semantic. If people want to open and recovery a ledger, `withRecovery(true)` should be used for people aware of what exactly happen regarding "recovery". This is a new API we are adding for bookkeeper. so it doesn't need to carry the old behavior. so I am -1 on reverting the default back to true. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services