Thanks for reporting back, Aaron. Shall we close the jira you created? -Flavio
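Aaron reports below that raising initLimit resolved the failed restarts. For reference only, a minimal zoo.cfg sketch showing where that knob sits; the numbers are illustrative, not a recommendation, and need to be sized to how long a follower takes to load a large snapshot and replay its txn logs on startup.

    # zoo.cfg -- illustrative values only
    tickTime=2000                # one tick, in milliseconds
    initLimit=50                 # ticks a follower gets to connect to the leader and
                                 # finish the initial sync (50 * 2000 ms = 100 s);
                                 # raise this if startup sync of a big snapshot/log times out
    syncLimit=10                 # ticks a follower may lag behind the leader once synced
    dataDir=/var/lib/zookeeper
    clientPort=2181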
> -----Original Message-----
> From: Aaron Zimmerman [mailto:[email protected]]
> Sent: 14 July 2014 16:21
> To: [email protected]
> Subject: Re: entire cluster dies with EOFException
>
> Closing the loop on this, it appears that upping the initLimit did resolve
> the issue. Thanks all for the help.
>
> Thanks,
>
> Aaron Zimmerman
>
>
> On Tue, Jul 8, 2014 at 4:40 PM, Flavio Junqueira <[email protected]> wrote:
>
> > Agreed, but we need that check because we expect bytes for the checksum
> > computation right underneath. The bit that's odd is that we make the
> > same check again below:
> >
> >     try {
> >         long crcValue = ia.readLong("crcvalue");
> >         byte[] bytes = Util.readTxnBytes(ia);
> >         // Since we preallocate, we define EOF to be an
> >         if (bytes == null || bytes.length==0) {
> >             throw new EOFException("Failed to read " + logFile);
> >         }
> >         // EOF or corrupted record
> >         // validate CRC
> >         Checksum crc = makeChecksumAlgorithm();
> >         crc.update(bytes, 0, bytes.length);
> >         if (crcValue != crc.getValue())
> >             throw new IOException(CRC_ERROR);
> >         if (bytes == null || bytes.length == 0)
> >             return false;
> >         hdr = new TxnHeader();
> >         record = SerializeUtils.deserializeTxn(bytes, hdr);
> >     } catch (EOFException e) {
> >
> > I'm moving this discussion to the jira, btw.
> >
> > -Flavio
> >
> > On 07 Jul 2014, at 22:03, Aaron Zimmerman <[email protected]> wrote:
> >
> > > Flavio,
> > >
> > > Yes, that is the initial error, and then the nodes in the cluster are
> > > restarted but fail to restart with
> > >
> > > 2014-07-04 12:58:52,734 [myid:1] - INFO [main:FileSnap@83] - Reading
> > > snapshot /var/lib/zookeeper/version-2/snapshot.300011fc0
> > > 2014-07-04 12:58:52,896 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@575]
> > > - Created new input stream /var/lib/zookeeper/version-2/log.300000021
> > > 2014-07-04 12:58:52,915 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@578]
> > > - Created new input archive /var/lib/zookeeper/version-2/log.300000021
> > > 2014-07-04 12:59:25,870 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@618]
> > > - EOF excepton java.io.EOFException: Failed to read
> > > /var/lib/zookeeper/version-2/log.300000021
> > > 2014-07-04 12:59:25,871 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@575]
> > > - Created new input stream /var/lib/zookeeper/version-2/log.300011fc2
> > > 2014-07-04 12:59:25,872 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@578]
> > > - Created new input archive /var/lib/zookeeper/version-2/log.300011fc2
> > > 2014-07-04 12:59:48,722 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@618]
> > > - EOF excepton java.io.EOFException: Failed to read
> > > /var/lib/zookeeper/version-2/log.300011fc2
> > >
> > > Thanks,
> > >
> > > AZ
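The quoted check and the DEBUG lines above both come from FileTxnLog's iterator: transaction log files are preallocated, so reading a zero-length record is how the reader decides it has hit the end of the written data, and the EOFException is thrown and then caught internally to stop iteration. A stripped-down sketch of that pattern follows; the record layout and names are illustrative, not ZooKeeper's actual on-disk format.

    import java.io.DataInputStream;
    import java.io.EOFException;
    import java.io.IOException;
    import java.util.zip.Adler32;
    import java.util.zip.Checksum;

    // Sketch of "zero-length record means end of a preallocated log".
    // Assumed record layout: [long crc][int length][length bytes] -- illustrative only.
    class PreallocatedLogReader {

        /** Returns the next record, or null once the zero-filled tail is reached. */
        static byte[] readRecord(DataInputStream in, String logFile) throws IOException {
            try {
                long expectedCrc = in.readLong();
                int length = in.readInt();
                if (length <= 0) {
                    // Nothing was ever written here: the preallocated, zero-filled tail.
                    throw new EOFException("Failed to read " + logFile);
                }
                byte[] bytes = new byte[length];
                in.readFully(bytes);

                Checksum crc = new Adler32();   // same checksum family ZooKeeper uses
                crc.update(bytes, 0, bytes.length);
                if (crc.getValue() != expectedCrc) {
                    throw new IOException("CRC mismatch: truncated or corrupt record");
                }
                return bytes;
            } catch (EOFException e) {
                // Not a failure: the exception only signals end-of-log, as discussed above.
                return null;
            }
        }
    }

That reading is consistent with Aaron's "red herring" question further down: the per-file EOF during replay is expected, while the stack trace Flavio quotes next is a different EOFException, hit while a follower reads packets from the leader.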
> > > On Mon, Jul 7, 2014 at 3:33 PM, Flavio Junqueira <[email protected]> wrote:
> > >
> >> I'm a bit confused, the stack trace you reported was this one:
> >>
> >> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception
> >> when following the leader java.io.EOFException
> >>     at java.io.DataInputStream.readInt(DataInputStream.java:375)
> >>     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> >>     at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> >>     at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
> >>     at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
> >>     at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
> >>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
> >>
> >> That's in a different part of the code.
> >>
> >> -Flavio
> >>
> >> On 07 Jul 2014, at 18:50, Aaron Zimmerman <[email protected]> wrote:
> >>
> >>> Util.readTxnBytes reads from the buffer, and if the length is 0 it returns
> >>> the zero-length array, seemingly indicating the end of the file.
> >>>
> >>> Then this is detected in FileTxnLog.java:671:
> >>>
> >>>     byte[] bytes = Util.readTxnBytes(ia);
> >>>     // Since we preallocate, we define EOF to be an
> >>>     if (bytes == null || bytes.length==0) {
> >>>         throw new EOFException("Failed to read " + logFile);
> >>>     }
> >>>
> >>> This exception is caught a few lines later, and the streams closed, etc.
> >>>
> >>> So this seems to be not really an error condition, but a signal that the
> >>> entire file has been read? Is this exception a red herring?
> >>>
> >>>
> >>> On Mon, Jul 7, 2014 at 11:50 AM, Raúl Gutiérrez Segalés <[email protected]> wrote:
> >>>
> >>>> On 7 July 2014 09:39, Aaron Zimmerman <[email protected]> wrote:
> >>>>
> >>>>> What I don't understand is how the entire cluster could die in such a
> >>>>> situation. I was able to load zookeeper locally using the snapshot and
> >>>>> 10g log file without apparent issue.
> >>>>
> >>>> Sure, but it's syncing up with other learners that becomes challenging
> >>>> when having either big snapshots or too many txnlogs, right?
> >>>>
> >>>>> I can see how large amounts of data could cause latency issues in
> >>>>> syncing, causing a single worker to die, but how would that explain the
> >>>>> node's inability to restart? When the server replays the log file, does
> >>>>> it have to sync the transactions to other nodes while it does so?
> >>>>
> >>>> Given that your txn churn is so big, by the time it finishes reading from
> >>>> disk it'll need to catch up with the quorum... how many txns have happened
> >>>> by that point? By the way, we use this patch:
> >>>>
> >>>> https://issues.apache.org/jira/browse/ZOOKEEPER-1804
> >>>>
> >>>> to measure transaction rate. Do you have any approximation of what your
> >>>> transaction rate might be?
> >>>>
> >>>>> I can alter the settings as has been discussed, but I worry that I'm just
> >>>>> delaying the same thing from happening again, if I deploy another storm
> >>>>> topology or something. How can I get the cluster in a state where I can
> >>>>> be confident that it won't crash in a similar way as load increases, or
> >>>>> at least set up some kind of monitoring that will let me know something
> >>>>> is unhealthy?
> >>>>
> >>>> I think it depends on what your txn rate is, let's measure that first, I
> >>>> guess.
> >>>>
> >>>> -rgs
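On Raúl's last point, measuring the rate first: ZOOKEEPER-1804 adds that measurement on the server side. Without the patch, one rough way to get a number from the outside is to sample the mntr four-letter-word counters and diff them over an interval. A sketch below, with assumptions flagged: the host/port defaults and the class name are placeholders, and zk_packets_received counts every client packet (reads, writes, pings), so it only gives an upper bound on the write/transaction rate.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;

    // Rough, client-side approximation: sample "mntr" twice and diff zk_packets_received.
    // This counts all packets (reads, writes, pings), so it overestimates the txn rate.
    public class RateSample {

        static long packetsReceived(String host, int port) throws Exception {
            try (Socket s = new Socket(host, port)) {
                OutputStream out = s.getOutputStream();
                out.write("mntr".getBytes(StandardCharsets.US_ASCII));
                out.flush();
                BufferedReader r = new BufferedReader(
                        new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
                String line;
                while ((line = r.readLine()) != null) {
                    String[] kv = line.split("\t");   // mntr output is key<TAB>value per line
                    if (kv.length == 2 && kv[0].equals("zk_packets_received")) {
                        return Long.parseLong(kv[1].trim());
                    }
                }
            }
            throw new IllegalStateException("zk_packets_received not found in mntr output");
        }

        public static void main(String[] args) throws Exception {
            String host = args.length > 0 ? args[0] : "localhost";  // placeholder host
            long intervalMs = 10_000;
            long before = packetsReceived(host, 2181);
            Thread.sleep(intervalMs);
            long after = packetsReceived(host, 2181);
            System.out.printf("~%.1f packets/sec (upper bound on the txn rate)%n",
                    (after - before) * 1000.0 / intervalMs);
        }
    }

The server-side number from ZOOKEEPER-1804 is the one Raúl is asking for; this is only a quick sanity check in the meantime.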
