On Thu, Jan 12, 2017 at 10:33 AM, Sebastián Schepens < [email protected]> wrote:
> On Thu, Jan 12, 2017 at 3:20 PM Sijie Guo <[email protected]> wrote:
>
>> On Thu, Jan 12, 2017 at 9:12 AM, Sebastián Schepens
>> <sebastian.schepens@mercadolibre.com> wrote:
>>
>> Pulsar is creating the ledgers. These ledgers should get written when a
>> topic receives messages, I guess, but these must be idle topics. Ledgers
>> should be closed on errors or once every period of time to allow rotation.
>>
>> The default grace period for open ledgers is 30s, theoretically, but
>> shouldn't clients close ledgers when a node disconnects?
>>
>> Ideally I think the Pulsar broker should close ledgers periodically on
>> rotations. I guess there are probably idle topics, so the ledgers used for
>> those topics are empty and not closed by the Pulsar broker.
>
> I'm gonna see if I can find this logic.
>
>> Perhaps this is not happening because the ledgers aren't currently being
>> written?
>> Even so, ledgers in the grace period should be listed as underreplicated,
>> shouldn't they? As I've said, before turning off another bookie I waited
>> till all ledgers were replicated.
>>
>> So there is a logic in auto recovery:
>>
>> When it detects a ledger is missing bookies, it will mark it as
>> under-replicated and the replication worker (which is the auto recovery
>> daemon) will start replicating those under-replicated ledgers. If there are
>> open ledgers, it doesn't replicate the ledger immediately. It defers the
>> action for the openLedgerRereplicationGracePeriod period (which is the 30
>> seconds). After the openLedgerRereplicationGracePeriod period, it forces
>> fencing of the ledger and releases the lock for replicating this ledger, so
>> that this ledger can be replicated later by any replication worker.
>>
>> In theory, if you waited until all ledgers were replicated (meaning no
>> ledgers are marked as under-replicated), those ledgers should already
>> have been successfully re-replicated.
>
> This is precisely what I thought.
>
>> There is one possibility that I can think of - there are ledgers created
>> after the auditor of auto recovery audits all the existing ledgers. What is
>> your auditorPeriodicBookieCheckInterval?
>
> But wouldn't ledgers created after the audit exclude the failed bookie? I
> mean, the audit started because a node went down, so new ledgers should
> exclude that node.

That is correct. One possibility is that the Pulsar broker detects the failed
bookie later than the auditor detects it. One simple way to confirm whether
this is the case: can you do the rolling replacement when there is no traffic?

> We have auditorPeriodicBookieCheckInterval at the default, which is 86400
> seconds. I understand that running this check very often could bring issues
> as it stresses ZooKeeper a lot.

Never mind this part. The auditor will start auditing when it detects that a
bookie is lost from ZooKeeper.

> Another question about quorums: say I have a write quorum of 3 and an ack
> quorum of 3, that would theoretically be able to handle a loss of 2 nodes as
> well, wouldn't it?

Ah, you are right. My comment in the previous email was wrong - it should be
ack quorum size larger than the number of failures.

> Thanks,
> Sebastian
>
>> - Sijie
>>
>> Thanks for the tip on the quorums!
>>
>> Sebastian
>>
>> On Thu, Jan 12, 2017 at 1:31 PM Sijie Guo <[email protected]> wrote:
>>
>> I see. Let me ask one more question - how do you create ledgers? And
>> when do you write these ledgers and when do you close them?
>>
>> I think they are probably just empty ledgers at the time you were
>> rolling. There is a setting in the recovery tool to force close the open
>> ledgers. I need to check and confirm that.
>>
>> On Jan 12, 2017 6:14 AM, "Sebastián Schepens"
>> <sebastian.schepens@mercadolibre.com> wrote:
>>
>> Sijie,
>> We were replacing all our nodes and testing how to do it best without
>> affecting the cluster.
>>
>> This same thing happened again yesterday. I have 4 underreplicated
>> ledgers, which are empty.
>> But this time, I turned off bookies one by one, waiting for all
>> underreplicated ledgers to replicate before turning off another bookie.
>> Even while doing this 'rolling' replace, I ended up with inconsistent
>> ledgers. How can this be possible?
>> One would expect that when there are no underreplicated ledgers, it would
>> be safe to lose a machine.
>>
>> What's the recommended quorum setup if I wanted to safely tolerate 2
>> machine failures?
>>
>> If you want to tolerate 2 failures, you need write quorum size - ack
>> quorum size to be larger than or equal to 2.
>>
>> Thanks,
>> Sebastian
>>
>> On Wed, Jan 11, 2017 at 5:04 PM Sijie Guo <[email protected]> wrote:
>>
>> On Wed, Jan 11, 2017 at 11:15 AM, Sebastián Schepens
>> <sebastian.schepens@mercadolibre.com> wrote:
>>
>> Hi guys,
>> I'm doing some tests and turned off 2 bookies almost simultaneously
>> hoping that all the ledgers would still be able to replicate since we have
>> ensemble and quorum size of 3.
>> Almost all ledgers managed to replicate using the autorecovery daemon
>> except for 5. What's curious about these 5 ledgers is that they are all
>> empty and the only node which contains data for them claims they do not
>> exist.
>>
>> Here's the ledger metadata for one of them:
>> ledgerID: 772
>> BookieMetadataFormatVersion 2
>> quorumSize: 3
>> ensembleSize: 3
>> length: 0
>> lastEntryId: -1
>> state: IN_RECOVERY
>> segment {
>>   ensembleMember: "10.64.103.57:3181"
>>   ensembleMember: "10.64.103.249:3181"
>>   ensembleMember: "10.64.102.95:3181"
>>   firstEntryId: 0
>> }
>> digestType: CRC32
>> password: ""
>> ackQuorumSize: 2
>>
>> Where all nodes except 10.64.103.249 are down.
>>
>> And that node contains these logs:
>> ERROR - [BookieReadThread-3181-10-1:ReadEntryProcessorV3@123] - No ledger found while reading entry:-1 from ledger: 772
>>
>> They seem to be empty ledgers with no entries.
>>
>> I don't understand how these ledgers ended up in this state. Is it
>> recoverable?
>>
>> If the ledgers are closed and you lose two bookies, the re-replication
>> can replicate the data correctly. When a ledger is in the closed state, its
>> metadata contains the last entry id, and re-replication uses that
>> information to determine the state of the ledger and replicate the data
>> correctly.
>>
>> However, if the ledgers are open and you lose two bookies (which is the
>> majority of your quorum), the client can't decide what the last entry id
>> is based on the one remaining bookie, so it cannot close/seal the ledger
>> correctly.
>>
>> Can you explain more about your tests? It would help me understand more
>> about that.
>>
>> I could just delete the ledgers because they are empty too. By the way,
>> the bookkeeper shell should have a command for deleting ledgers.
>>
>> Yeah, this is a good suggestion. Do you mind creating a JIRA for adding
>> the delete ledger command?
>>
>> Thanks,
>> Sebastian
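
For reference, the corrected quorum rule of thumb from this thread (ack quorum size must be larger than the number of failed bookies) can be sketched in a few lines of Python. This only restates the rule as worded in the thread; the function name `tolerates_failures` is hypothetical, and nothing here is BookKeeper's own logic or API.

```python
def tolerates_failures(ack_quorum_size: int, failed_bookies: int) -> bool:
    """Rule of thumb as stated in this thread: the ack quorum size must be
    larger than the number of failed bookies.

    Hypothetical helper for illustration only; not part of BookKeeper.
    """
    if ack_quorum_size < 1:
        raise ValueError("ack_quorum_size must be at least 1")
    if failed_bookies < 0:
        raise ValueError("failed_bookies must not be negative")
    return ack_quorum_size > failed_bookies


# Ledger 772 from the thread has ackQuorumSize: 2, and two bookies were lost.
print(tolerates_failures(2, 2))  # False

# The setup asked about later: ack quorum of 3, two bookies lost.
print(tolerates_failures(3, 2))  # True
```

Under this rule, an ack quorum of 2 does not cover the loss of two bookies, which is consistent with the open, empty ledgers in the original report that could not be sealed.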
