On Thu, Jan 12, 2017 at 3:20 PM Sijie Guo <[email protected]> wrote:
> On Thu, Jan 12, 2017 at 9:12 AM, Sebastián Schepens <[email protected]> wrote:
>
> Pulsar is creating the ledgers. These ledgers should get written when a
> topic receives messages, I guess, but these must be idle topics. Ledgers
> should be closed on errors or once every period of time to allow rotation.
>
> The default grace period for open ledgers is 30s, theoretically, but
> shouldn't clients close ledgers when a node disconnects?
>
>
> Ideally I think the Pulsar broker should close ledgers periodically on
> rotations. I guess there are probably idle topics, so the ledgers used for
> those topics are empty and not closed by the Pulsar broker.
>
I'm gonna see if I can find this logic.

>
>
> Perhaps this is not happening because the ledgers aren't currently being
> written?
> Even so, ledgers in the grace period should be listed as underreplicated,
> shouldn't they? As I've said, before turning off another bookie I waited
> till all ledgers were replicated.
>
>
> So there is a logic in auto recovery:
>
> When it detects that a ledger is missing bookies, it marks the ledger as
> under-replicated, and the replication worker (which is the auto recovery
> daemon) starts replicating those under-replicated ledgers. If there are
> open ledgers, it doesn't replicate the ledger immediately. It defers the
> action for the openLedgerRereplicationGracePeriod period (which is the 30
> seconds). After the openLedgerRereplicationGracePeriod period, it forces
> fencing of the ledger and releases the lock for replicating this ledger, so
> that this ledger can be replicated later by any replication worker.
>
> In theory, if you waited until all ledgers were replicated (meaning no
> ledgers are marked as under-replicated), those ledgers should already
> have been re-replicated successfully.
>

This is precisely what I thought.

> There is one possibility that I can think of - there are ledgers created
> after the auditor of auto recovery audits all the existing ledgers. What is
> your auditorPeriodicBookieCheckInterval?
>

But wouldn't ledgers created after the audit exclude the failed bookie? I mean, the audit started because a node went down, so new ledgers should exclude that node.
We have auditorPeriodicBookieCheckInterval at the default, which is 86400 seconds. I understand that running this check very often could cause issues, as it stresses ZooKeeper a lot.

Another question about quorums: say I have a write quorum of 3 and an ack quorum of 3. That would theoretically be able to handle a loss of 2 nodes as well, wouldn't it?

Thanks,
Sebastian

> - Sijie
>
>
>
> Thanks for the tip on the quorums!
>
> Sebastian
>
> On Thu, Jan 12, 2017 at 1:31 PM Sijie Guo <[email protected]> wrote:
>
> I see. Let me ask one more question: how do you create ledgers? And when
> do you write to these ledgers, and when do you close them?
>
> I think they are probably just empty ledgers at the time you were rolling.
> There is a setting in the recovery tool to force close the open ledgers. I
> need to check and confirm that.
>
>
>
> On Jan 12, 2017 6:14 AM, "Sebastián Schepens" <[email protected]> wrote:
>
> Sijie,
> We were replacing all our nodes and testing how best to do it without
> affecting the cluster.
>
> This same thing happened again yesterday. I have 4 underreplicated
> ledgers, which are empty.
> But this time, I turned off bookies one by one, waiting for all
> underreplicated ledgers to replicate before turning off another bookie.
> Even while doing this 'rolling' replace, I ended up with inconsistent
> ledgers. How can this be possible?
> One would expect that when there are no underreplicated ledgers, it would
> be safe to lose a machine.
>
> What's the recommended quorum setup if I wanted to safely tolerate 2
> machine failures?
>
>
> If you want to tolerate 2 failures, you need write quorum size - ack
> quorum size to be larger than or equal to 2.
>
>
> Thanks,
> Sebastian
>
>
> On Wed, Jan 11, 2017 at 5:04 PM Sijie Guo <[email protected]> wrote:
>
> On Wed, Jan 11, 2017 at 11:15 AM, Sebastián Schepens <[email protected]> wrote:
>
> Hi guys,
> I'm doing some tests and turned off 2 bookies almost simultaneously, hoping
> that all the ledgers would still be able to replicate since we have an
> ensemble and quorum size of 3.
> Almost all ledgers managed to replicate using the autorecovery daemon
> except for 5. What's curious about these 5 ledgers is that they are all
> empty, and the only node which contains data for them claims they do not
> exist.
>
> Here's the ledger metadata for one of them:
> ledgerID: 772
> BookieMetadataFormatVersion 2
> quorumSize: 3
> ensembleSize: 3
> length: 0
> lastEntryId: -1
> state: IN_RECOVERY
> segment {
> ensembleMember: "10.64.103.57:3181"
> ensembleMember: "10.64.103.249:3181"
> ensembleMember: "10.64.102.95:3181"
> firstEntryId: 0
> }
> digestType: CRC32
> password: ""
> ackQuorumSize: 2
>
> Where all nodes except 10.64.103.249 are down.
>
> And that node contains these logs:
> ERROR - [BookieReadThread-3181-10-1:ReadEntryProcessorV3@123] - No ledger found while reading entry:-1 from ledger: 772
>
>
> They seem to be empty ledgers with no entries.
>
>
>
> I don't understand how these ledgers ended up in this state. Is it
> recoverable?
>
>
> If the ledgers are closed and you lose two bookies, re-replication can
> still replicate the data correctly. When a ledger is in the closed state,
> its metadata contains the last entry id, and re-replication uses that
> information to determine the state of the ledger and replicate the data
> correctly.
>
> However, if the ledgers are open and you lose two bookies (which is the
> majority of your quorum), the client can't decide what the last entry id
> is based on only the one remaining bookie, so it cannot close/seal the
> ledger correctly.
>
> Can you explain more about your tests? It would help me understand more
> about that.
>
>
>
> I could just delete the ledgers because they are empty too. By the way,
> bookkeeper shell should have a command for deleting ledgers.
>
>
> Yeah, this is a good suggestion. Do you mind creating a jira for adding
> the delete ledger command?
>
>
>
> Thanks,
> Sebastian
>
>
>
>
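
For readers following along, the situation of ledger 772 can be sketched in a few lines of Python. This is only a toy model of the explanation given in the thread, not BookKeeper code: the function names `parse_metadata` and `summarize` are hypothetical, the `CLOSED` state name is an assumption (the thread only shows `IN_RECOVERY`), and the "majority of the quorum" test simply mirrors the wording used above rather than BookKeeper's actual recovery rule.

```python
# Toy model of the ledger 772 situation from the thread; not BookKeeper code.
# The metadata text is copied from the message above (segment braces omitted).
METADATA = """\
ledgerID: 772
quorumSize: 3
ensembleSize: 3
length: 0
lastEntryId: -1
state: IN_RECOVERY
ensembleMember: "10.64.103.57:3181"
ensembleMember: "10.64.103.249:3181"
ensembleMember: "10.64.102.95:3181"
ackQuorumSize: 2
"""


def parse_metadata(text):
    """Split 'key: value' lines into scalar fields and the ensemble list."""
    fields, ensemble = {}, []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a 'key: value' shape
        key, value = key.strip(), value.strip().strip('"')
        if key == "ensembleMember":
            ensemble.append(value)
        else:
            fields[key] = value
    return fields, ensemble


def summarize(text, live_bookies):
    """Report whether the ledger is closed and whether a majority is alive.

    'CLOSED' is an assumed state name; the majority test follows the
    wording of the thread and is a simplification.
    """
    fields, ensemble = parse_metadata(text)
    live = [b for b in ensemble if b in live_bookies]
    quorum = int(fields["quorumSize"])
    return {
        "ledger_id": int(fields["ledgerID"]),
        "closed": fields["state"] == "CLOSED",
        "live_members": live,
        "has_majority": len(live) > quorum // 2,
    }


if __name__ == "__main__":
    # Only 10.64.103.249 is still up, as reported in the thread.
    print(summarize(METADATA, {"10.64.103.249:3181"}))
```

With only 10.64.103.249 alive, the ledger is neither closed nor backed by a majority of its quorum, which is the case described above where the last entry id cannot be decided from the one remaining bookie.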
