Util.readTxnBytes reads from the buffer and, if the length is 0, it returns
a zero-length array, seemingly indicating the end of the file.
Then this is detected in FileTxnLog.java:671:
    byte[] bytes = Util.readTxnBytes(ia);
    // Since we preallocate, we define EOF to be an
    if (bytes == null || bytes.length==0) {
        throw new EOFException("Failed to read " + logFile);
    }
This exception is caught a few lines later, where the streams are closed, etc.
So this seems to be not really an error condition but a signal that the
entire file has been read. Is this exception a red herring?
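
To make the question concrete, here is a minimal sketch of the pattern as I
understand it, not the actual ZooKeeper code. It assumes a preallocated log is
a run of length-prefixed records followed by zero padding. The class
PaddedLogDemo and the methods readRecord, readAll and sampleLog are
hypothetical names made up for this illustration; readRecord only stands in
for Util.readTxnBytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class PaddedLogDemo {

    // Hypothetical stand-in for Util.readTxnBytes: reads one length-prefixed
    // record and returns an empty array when the length is 0 (zero padding).
    static byte[] readRecord(DataInputStream in) throws IOException {
        int len = in.readInt();
        if (len == 0) {
            return new byte[0];
        }
        byte[] buf = new byte[len];
        in.readFully(buf);
        return buf;
    }

    // Reads records until the padding (or the real end of the stream) is hit.
    // The EOFException is used as an "end of data" signal, not as an error.
    static List<byte[]> readAll(DataInputStream in) throws IOException {
        List<byte[]> records = new ArrayList<>();
        try {
            while (true) {
                byte[] bytes = readRecord(in);
                if (bytes.length == 0) {
                    throw new EOFException("reached zero padding");
                }
                records.add(bytes);
            }
        } catch (EOFException e) {
            // Expected: either the zero padding or the true end of the stream.
        }
        return records;
    }

    // Builds a fake "preallocated" log: two records followed by 64 zero bytes.
    static byte[] sampleLog() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        for (String s : new String[] {"create /a", "delete /a"}) {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            out.writeInt(b.length);
            out.write(b);
        }
        out.write(new byte[64]);
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> recs =
            readAll(new DataInputStream(new ByteArrayInputStream(sampleLog())));
        System.out.println("records read: " + recs.size()); // prints "records read: 2"
    }
}
```

If the real code behaves like this sketch, the exception would only mark where
the written transactions stop and the preallocated zeros begin.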
On Mon, Jul 7, 2014 at 11:50 AM, Raúl Gutiérrez Segalés <[email protected]> wrote:
> On 7 July 2014 09:39, Aaron Zimmerman <[email protected]> wrote:
>
> > What I don't understand is how the entire cluster could die in such a
> > situation. I was able to load zookeeper locally using the snapshot and
> > 10g log file without apparent issue.
>
>
> Sure, but it's syncing up with other learners that becomes challenging when
> having either big snapshots or too many txnlogs, right?
>
>
> > I can see how large amounts of data could
> > cause latency issues in syncing causing a single worker to die, but how
> > would that explain the node's inability to restart? When the server
> > replays the log file, does it have to sync the transactions to other
> > nodes while it does so?
> >
>
> Given that your txn churn is so big, by the time it finishes reading from
> disk it'll need to catch up with the quorum. How many txns have happened by
> that point? By the way, we use this patch:
>
> https://issues.apache.org/jira/browse/ZOOKEEPER-1804
>
> to measure transaction rate, do you have any approximation of what your
> transaction rate might be?
>
>
> >
> > I can alter the settings as has been discussed, but I worry that I'm just
> > delaying the same thing from happening again, if I deploy another storm
> > topology or something. How can I get the cluster in a state where I can
> > be confident that it won't crash in a similar way as load increases, or at
> > least set up some kind of monitoring that will let me know something is
> > unhealthy?
> >
>
> I think it depends on what your txn rate is; let's measure that first, I
> guess.
>
>
> -rgs
>