Sebastian, sorry for the late response (I didn't have network access over the weekend).
Let me try to answer your questions in sequence, and then I will suggest a solution for you.

Re "2016-11-25 13:55:36,775 - ERROR - [New I/O boss #9:PerChannelBookieClient$1@255] - Could not connect to bookie: [id: 0x3dc4f6b8]/10.3.3.41:3181, current state CONNECTING :"

The logging here is expected, but it is a bit noisy. I created https://issues.apache.org/jira/browse/BOOKKEEPER-978 to turn that logging into debug.

Re "The strange thing is that 10.3.3.41 is the autorecovery node, it seems to be trying to replicate data to itself, is this ok?"

It is a problem when running autorecovery on separate nodes, because the replication worker will pick itself as the target bookie. This is a bug; we fixed it at https://github.com/twitter/bookkeeper/commit/fc7e171135a58cddb9e91f5f614b3ceb6f9f9fee and I will port the change to master in https://issues.apache.org/jira/browse/BOOKKEEPER-977.

Re "Is it normal for the recovery processes to run out of direct memory?"

I don't think it is normal. I suspect it is related to replicating too many ledgers at the same time. Do you mind me asking how many Pulsar topics you have?

Re "2016-11-25 18:15:44,792 - INFO - [main-EventThread:LedgerFragmentReplicator@139] - Fragment :Fragment(LedgerID: 14401, FirstEntryID: 0[-1], LastKnownEntryID: -1[-1], Host: 10.3.2.158:3181, Closed: true) is split into sub fragments :[Fragment(LedgerID: 14401, FirstEntryID: 0[-1], LastKnownEntryID: -1[-1], Host: 10.3.2.158:3181, Closed: true)]
2016-11-25 18:15:44,792 - ERROR - [main-EventThread:LedgerFragmentReplicator$2@251] - BK error reading ledger entry: -1"

You are right: the replication worker should skip replicating empty ledgers (https://issues.apache.org/jira/browse/BOOKKEEPER-979). I am going to port the fix today. For now, I would suggest ignoring that logging, or adjusting the log level for those classes.
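A side note on muting those lines until the JIRA fixes land: both messages are emitted at ERROR, so a per-logger threshold of WARN will not actually hide them; the threshold would need to be FATAL (or OFF). A hypothetical override for the stock log4j.properties, with logger names taken from the class names in your logs (caveat: this also hides genuine connection errors from those classes):

```properties
# Hypothetical log4j.properties overrides: mute only the two noisy
# loggers, leaving the root level untouched. ERROR is above WARN, so
# the threshold must be FATAL (or OFF) to silence ERROR-level lines.
log4j.logger.org.apache.bookkeeper.proto.PerChannelBookieClient=FATAL
log4j.logger.org.apache.bookkeeper.client.LedgerFragmentReplicator=FATAL
```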
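On the direct-memory question: if the cause is indeed too many ledgers being replicated at once, the mitigation is to bound the concurrency so entry buffers cannot pile up in direct memory faster than they are written out. This is only an illustrative sketch of that idea (it is not BookKeeper's actual replication code; the class and method names here are hypothetical):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: cap the number of ledgers being re-replicated at
// once, so a burst of under-replicated ledgers cannot exhaust direct memory.
public class ThrottledReplication {
    private final Semaphore slots;
    private final ExecutorService pool;
    // High-water mark of concurrent replications, for visibility.
    final AtomicInteger active = new AtomicInteger();
    final AtomicInteger maxActive = new AtomicInteger();

    ThrottledReplication(int maxConcurrentLedgers, int threads) {
        this.slots = new Semaphore(maxConcurrentLedgers);
        this.pool = Executors.newFixedThreadPool(threads);
    }

    void replicate(long ledgerId, Runnable copyFragments) throws InterruptedException {
        slots.acquire(); // block until a replication slot frees up
        pool.execute(() -> {
            int now = active.incrementAndGet();
            maxActive.accumulateAndGet(now, Math::max);
            try {
                copyFragments.run(); // copy this ledger's fragments
            } finally {
                active.decrementAndGet();
                slots.release();
            }
        });
    }

    void shutdown() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws Exception {
        // 50 ledgers queued, but never more than 4 replicated concurrently.
        ThrottledReplication r = new ThrottledReplication(4, 16);
        for (long ledgerId = 0; ledgerId < 50; ledgerId++) {
            r.replicate(ledgerId, () -> {
                try { Thread.sleep(5); } catch (InterruptedException ignored) { }
            });
        }
        r.shutdown();
        System.out.println("max concurrent replications: " + r.maxActive.get());
    }
}
```

The semaphore, not the thread-pool size, is what bounds memory pressure: each in-flight ledger holds entry buffers, so the permit count is the knob to turn down if direct memory is tight.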
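If you go the manual route, the recover command in the bookkeeper shell takes the failed bookie as the source and an optional destination bookie. A hypothetical invocation for your topology (addresses taken from your logs; please double-check the syntax against the shell help for your exact build):

```shell
# Re-replicate everything that lived on the failed bookie 10.3.2.158,
# explicitly targeting 10.3.3.41.
# Assumed syntax: recover <bookieSrc> [bookieDest]
bin/bookkeeper shell recover 10.3.2.158:3181 10.3.3.41:3181
```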
Then try to run the recover tool to replicate to a specific bookie (specify the target bookie), or run autorecovery as a daemon on the bookie servers for now. Once I finish porting the fixes for the above JIRAs, I will let you know, and you can then try running autorecovery as separate daemons from master.

Let me know if this helps.

- Sijie

On Fri, Nov 25, 2016 at 6:19 AM, Sebastián Schepens <sebastian.schep...@mercadolibre.com> wrote:

> Guys,
> Still seeing weird behavior.
> Trying autorecovery on separate nodes: they've been running for a long time just spitting error logs, apparently moving data, but they never seem to end and eventually throw OutOfDirectMemory errors (we have assigned 20G of direct memory to them...).
>
> I have now started autorecovery processes of version 4.4 (the rest of the cluster is 4.3); they throw the following errors and do not seem to be moving data, all nodes' IO metrics are flat:
>
> 2016-11-25 13:55:36,775 - ERROR - [New I/O boss #9:PerChannelBookieClient$1@255] - Could not connect to bookie: [id: 0x3dc4f6b8]/10.3.3.41:3181, current state CONNECTING :
> java.net.ConnectException: Connection refused: /10.3.3.41:3181
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>         at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
>         at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
>         at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
>         at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
>         at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
>         at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>         at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> 2016-11-25 13:46:46,458 - ERROR - [BookKeeperClientWorker-orderedsafeexecutor-1-0:BookkeeperInternalCallbacks$MultiCallback@125] - Error in multi callback : -8
> 2016-11-25 13:46:46,458 - ERROR - [BookKeeperClientWorker-orderedsafeexecutor-1-0:LedgerFragmentReplicator$2$1@274] - BK error writing entry for ledgerId: 9449, entryId: 2536703, bookie: 10.3.3.41:3181
> org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException
>         at org.apache.bookkeeper.client.BKException.create(BKException.java:62)
>         at org.apache.bookkeeper.client.LedgerFragmentReplicator$2$1.writeComplete(LedgerFragmentReplicator.java:279)
>         at org.apache.bookkeeper.proto.BookieClient$1$1.safeRun(BookieClient.java:189)
>         at org.apache.bookkeeper.util.SafeRunnable.run(SafeRunnable.java:31)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> The strange thing is that 10.3.3.41 is the autorecovery node; it seems to be trying to replicate data to itself, is this ok?
>
> It's also attempting to connect to the failed node:
>
> 2016-11-25 13:55:36,777 - ERROR - [New I/O boss #9:PerChannelBookieClient$1@255] - Could not connect to bookie: [id: 0xb1b27b5b]/10.3.2.158:3181, current state CONNECTING :
> java.net.ConnectException: Connection refused: /10.3.2.158:3181
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>         at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
>         at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
>         at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
>         at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
>         at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
>         at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>         at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> If I execute a manual recovery I get the same errors:
> 2016-11-25 14:16:56,438 - ERROR - [New I/O boss #9:PerChannelBookieClient$1@255] - Could not connect to bookie: [id: 0x3e258d5b]/10.3.2.158:3181, current state CONNECTING :
> java.net.ConnectException: Connection refused: /10.3.2.158:3181
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>         at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
>         at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
>         at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
>         at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
>         at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
>         at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>         at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> But it seems to be moving data now.
>
> Could you guys give me some insight?
> Are these logs normal? If so, why are they at level ERROR? Why is the autorecovery process trying to connect to itself and not replicating data to other nodes? Is there a way to know the progress of the recovery process? Is it normal for the recovery processes to run out of direct memory?
>
> Thanks,
> Sebastian
>
>
> On Wed, Nov 23, 2016 at 11:33 AM Sebastián Schepens <sebastian.schepens@mercadolibre.com> wrote:
>
>> Sijie,
>> Yes, that's precisely what I meant: we're running separate autorecovery processes, not daemons on all nodes.
>>
>> Autorecovery processes run quietly until I stop a node; as soon as I stop a node, they're plagued with logs like the following, where the stopped node is 10.3.2.56:
>>
>> 2016-11-22 17:34:06,085 - ERROR - [bookkeeper-io-1-1:PerChannelBookieClient$2@284] - Could not connect to bookie: [id: 0xd3b0c759, L:/10.3.3.42:45164]/10.3.2.56:3181, current state CONNECTING :
>> java.net.ConnectException: syscall:getsockopt(...): /10.3.2.56:3181
>>
>> There seem to be waves of thousands and thousands of these logs while some data movement seems to be occurring, but it's really weird that it's constantly trying to connect to the failed node.
>> Couldn't it realize it's down because it's not shown as available in zookeeper?
>>
>> We also see a couple of these logs, but really few of them compared to the previous:
>>
>> 2016-11-23 14:28:01,661 - WARN - [ReplicationWorker:RackawareEnsemblePlacementPolicy@543] - Failed to choose a bookie: excluded [<Bookie:10.3.2.57:3181>, <Bookie:10.3.2.195:3181>, <Bookie:10.3.2.158:3181>], fallback to choose bookie randomly from the cluster.
>>
>> The cluster currently has 6 nodes, and as I said before we're using ensemble size 3, write quorum 3 and ack quorum 2.
>>
>> Thanks,
>> Sebastian
>>
>> On Tue, Nov 22, 2016 at 2:10 PM Sijie Guo <si...@apache.org> wrote:
>>
>> I think what Sebastian said is that manual recovery didn't even work. This seems a bit strange to me. The autorecovery will check whether the bookie is available; after that, it should re-replicate the data from other nodes in the ensemble. This seems to indicate something is broken. Sebastian, can you point us to some of the loggings?
>>
>> Sijie
>>
>> On Nov 19, 2016 9:46 AM, "Rithin Shetty" <rit...@gmail.com> wrote:
>>
>> A few things to note: make sure 'autoRecoveryDaemonEnabled' is set to true in all the bookie conf files; by default it is false, and otherwise recovery will not work. The autorecovery process tries to make sure that the data doesn't exist on the source node before replicating to the destination. That might be why it is trying to talk to the source node.
>>
>> --Rithin
>>
>> On Fri, Nov 18, 2016 at 12:00 PM, Sebastián Schepens <sebastian.schepens@mercadolibre.com> wrote:
>>
>> Hi guys,
>> I'm running into some issues while trying to recover a decommissioned node.
>> Both the recovery command and autorecovery processes fail trying to connect to the failing node, which seems reasonable because the node is down.
>> But I don't get why it's trying to connect to that node; according to the documentation it should pull ledger data from other nodes in the ensemble (3) and replicate it.
>> The recovery command also seems to completely ignore the destination node given as the third argument.
>>
>> Could someone give us some help?
>> Thanks,
>> Sebastian