Re: Write retry issues with ZooKeeperClient

2017-06-27 Thread Sam Just
JV: What do you mean by "May not be perfect for negative testing"?

I don't think there's anything inevitable about this particular class of
behavior.  ZK could have chosen to avoid this problem entirely by doing
duplicate op detection server-side with a per-session transaction log.

Since it doesn't, we'll need to solve it ourselves.
ZkLedgerUnderreplicationManager relies on either getting success or
NodeExists on the ephemeral lock node to determine whether a particular
ReplicationWorker is responsible for replicating a particular ledger.  I
haven't reproduced this one, but it seems to me that two workers could both
try to replicate the same ledger and *both get NodeExists* on the lock
node.  This would leave the ledger locked for replication until whichever
one actually wrote the node restarts.

Users like the above are pretty easy to fix.  One option would be to simply
include with the write a payload containing a nonce for the client.  Upon a
ConnectionLoss event, we read the node back and determine whether we "won".
I think ledger creation probably falls into the same category; the metadata
could include an identifier for the creator.
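
To make that concrete, a rough sketch of the nonce approach (the helper name
and lock path are made up for illustration, and a real version would also
need to cope with a ConnectionLoss on the read-back itself):

    import java.util.Arrays;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    class LockWithNonceSketch {
        // Try to take the replication lock; on an ambiguous outcome, read
        // the node back and compare its payload to our nonce to decide
        // whether we "won".
        static boolean acquireLock(ZooKeeper zk, String lockPath, byte[] nonce)
                throws KeeperException, InterruptedException {
            try {
                zk.create(lockPath, nonce, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                          CreateMode.EPHEMERAL);
                return true;                       // clean win
            } catch (KeeperException.NodeExistsException e) {
                // Could be another worker, or our own earlier attempt whose
                // reply was lost.
            } catch (KeeperException.ConnectionLossException e) {
                // Our create may or may not have been applied.
            }
            try {
                byte[] data = zk.getData(lockPath, false, null);
                return Arrays.equals(data, nonce); // we wrote it after all
            } catch (KeeperException.NoNodeException e) {
                return false;                      // nobody holds it; caller can retry
            }
        }
    }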

I'm less sure about AbstractZkLedgerManager.writeLedgerMetadata.  It relies
on setData success/BadVersion atomicity to ensure consistent rmw.  It's
certainly solvable by including a version vector, but I think there's
probably a simpler way to do it.  I'm not sure under what circumstances
there even *can* be racing writes to the metadata -- is the only case that
matters concurrent updates between the writer and a single replication
worker?

The outline is probably:
1) Add a way to do retries automatically on *unconditional* creates and
deletes (there are some users that don't care).
2) Audit and modify existing users to either use the methods introduced in
1 or handle the ConnectionLoss events explicitly.
3) Switch ZooKeeperClient to not retry conditional writes, instead
propagating the ConnectionLoss event; that should make its semantics match
the vanilla ZK client.
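
A minimal sketch of the retry policy 1) and 3) imply (the wrapper below is
illustrative only, not the actual ZooKeeperClient API):

    import org.apache.zookeeper.KeeperException;

    class RetryPolicySketch {
        interface ZkOp<T> {
            T run() throws KeeperException, InterruptedException;
        }

        // Unconditional creates/deletes: repeating them is harmless, so
        // ConnectionLoss can be swallowed and the op retried (step 1).
        static <T> T retryIdempotent(ZkOp<T> op, int maxRetries)
                throws KeeperException, InterruptedException {
            KeeperException.ConnectionLossException last = null;
            for (int i = 0; i <= maxRetries; i++) {
                try {
                    return op.run();
                } catch (KeeperException.ConnectionLossException e) {
                    last = e;   // transient; safe to retry this class of op
                }
            }
            throw last;
        }

        // Conditional writes (exclusive create, versioned setData/delete):
        // no retry here -- the caller sees ConnectionLoss and resolves the
        // ambiguity itself, matching the vanilla ZK client (step 3).
        static <T> T runConditional(ZkOp<T> op)
                throws KeeperException, InterruptedException {
            return op.run();
        }
    }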
-Sam

On Tue, Jun 27, 2017 at 8:59 AM, Sijie Guo <guosi...@gmail.com> wrote:

> I agree in current callback model, we only propagate error code to the
> outer client, so we lose the information about the detail cause.
>
> But I think we also tried to map some zk error codes to bk error codes.
>
> NoNode -> NoLedger
> NodeExists -> LedgerExists
> ··· (other codes)
>
> Also we tried to hide zk metadata related implementation behind interfaces,
> so some of the errors should be handled by the zk ledger manager
> implementation before propagating to the client.
>
> Sijie
>
> On Jun 27, 2017 7:32 AM, "Enrico Olivelli - Diennea" <
> enrico.olive...@diennea.com> wrote:
>
> > On Tue, 27/06/2017 at 13.45 +, Venkateswara Rao Jujjuri wrote:
> >
> > This is nothing different in any network based system. Like nfs. So we
> need
> > to have some kind of logic on the client side to make reasonable
> > assumption. May not be perfect for negative testing.
> >
> >
> > Many times I wanted to have some "exception cause" on BKException,
> > especially for ZK issues.
> > The way we use only int error codes hides the root causes of the error.
> > BookKeeper client writes to the log, but the "cause" cannot be reported
> > to higher-level logs, and sometimes this is annoying.
> > In the future I would like to add more details on errors
> >
> > -- Enrico
> >
> >
> >
> >
> >
> > JV
> >
> > On Mon, Jun 26, 2017 at 11:19 PM Sijie Guo <guosi...@gmail.com> wrote:
> >
> >
> >
> > Hi Sam,
> >
> > Let's assume there is no such retry logic. How do you expect to handle
> this
> > situation?
> >
> > Can your application try to create a new ledger or catch NodeExists
> > exception?
> >
> > - Sijie
> >
> > On Mon, Jun 26, 2017 at 5:49 PM, Sam Just <sj...@salesforce.com> wrote:
> >
> >
> >
> > Hmm, curator seems to have essentially the same problem:
> > https://issues.apache.org/jira/browse/CURATOR-268
> > I'm not sure there's a good way to solve this transparently -- the right
> > answer is
> > probably to plumb the ConnectionLoss event through ZooKeeperClient for
> > interested callers, audit for metadata users where we depend on
> >
> >
> > atomicity,
> >
> >
> > and update each one to handle it appropriately.
> > -Sam
> >
> > On Mon, Jun 26, 2017 at 4:34 PM, Sam Just <sj...@salesforce.com> wrote:
> >
> >
> >
> > BookKeeper has a wrapp

Re: Write retry issues with ZooKeeperClient

2017-06-27 Thread Sam Just
On Tue, Jun 27, 2017 at 2:29 PM, Sijie Guo <guosi...@gmail.com> wrote:

> On Tue, Jun 27, 2017 at 10:18 AM, Sam Just <sj...@salesforce.com> wrote:
>
> > JV: What do you mean by "May not be perfect for negative testing"?
> >
> > I don't think there's anything inevitable about this particular class of
> > behavior.  ZK could have chosen to avoid this problem entirely by doing
> > duplicate op detection server-side with a per-session transaction log.
> >
> > Since it doesn't, we'll need to solve it ourselves.
> > ZkLedgerUnderreplicationManager relies on either getting success or
> > NodeExists on the ephemeral lock node to determine whether a particular
> > ReplicationWorker is responsible for replicating a particular ledger.  I
> > haven't reproduced this one, but it seems to me that two workers could
> both
> > try to replicate the same ledger and *both get NodeExists* on the lock
> > node.  This would leave the ledger locked for replication until whichever
> > one actually wrote the node restarts.
> >
> > Users like the above are pretty easy to fix.  One option would be to
> simply
> > include with the write a payload including a nonce for the client.  Upon
> a
> > ConnectionLoss event, we read the node and determine whether we "won".  I
> > think ledger creation probably falls into the same category, the metadata
> > could include an identifier for the creator.
> >
>
> for ephemeral znode, you don't have to add extra payload.
>
> in the retry logic,
>
> - catch NodeExists exception
> - call exists to check the znode. you can get the Stat (
> https://zookeeper.apache.org/doc/r3.4.6/api/org/apache/
> zookeeper/data/Stat.html#getEphemeralOwner()
> ) of this znode.
> - you can compare the Stat#getEphemeralOwner with the client's current
> session id. if they match, the node was created by this session; otherwise
> it was created by another session.
>
>
Ah, I'll do that then, much easier.
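
For reference, a compact sketch of that check (assuming the code that
catches NodeExists has the ZooKeeper handle available):

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    class EphemeralOwnerCheckSketch {
        // After NodeExists on an ephemeral create that followed a retried
        // ConnectionLoss, decide whether the existing node is actually ours.
        static boolean ownedByThisSession(ZooKeeper zk, String path)
                throws KeeperException, InterruptedException {
            Stat stat = zk.exists(path, false);
            if (stat == null) {
                return false;   // node is gone; caller can just retry the create
            }
            // getEphemeralOwner() is the session id that created the node
            // (0 for persistent nodes), so compare it to our own session.
            return stat.getEphemeralOwner() == zk.getSessionId();
        }
    }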


>
> >
> > I'm less sure about AbstractZkLedgerManager.writeLedgerMetadata.  It
> > relies
> > on setData success/BadVersion atomicity to ensure consistent rmw.
>
> It's
> > certainly solvable by including a version vector, but I think there's
> > probably a simpler way to do it.  I'm not sure under what circumstances
> > there even *can* be racing writes to the metadata -- is the only case
> that
> > matters concurrent updates between the writer and a single replication
> > worker?
> >
>
> I don't think it can be addressed by a version vector, there is no public
> *history* about who changed what version.
> But the question is why do you need to address this #writeLedgerMetadata
> inside the ZooKeeperClient retry loop. I think there is already a logic on
> #writeLedgerMetadata to resolve conflicts due to BadVersion.
> The LedgerHandle will re-read the ledger metadata when encountering version
> conflicts.
>
>
That's the part I'm unsure of.  I know it has handling for BadVersion; I
don't know whether it's still correct if you get a BadVersion even though
your write actually succeeded, so I need to look into that more.  Also, I'm
not at all suggesting we handle this in the retry loop; I'm suggesting that
for non-idempotent writes, we should not retry in ZooKeeperClient and should
instead propagate the ConnectionLoss error and let the caller deal with it.
-Sam


>
> >
> > The outline is probably:
> > 1) Add a way to do retries automatically on *unconditional* creates and
> > deletes (there are some users that don't care).
> > 2) Audit and modify existing users to either use the methods introduced
> in
> > 1 or handle the ConnectionLoss events explicitly.
> > 3) Switch ZooKeeperClient to not retry conditional writes instead
> > propagating the ConnectionLoss event, that should make its semantics
> match
> > the vanilla ZK client.
> > -Sam
> >
> > On Tue, Jun 27, 2017 at 8:59 AM, Sijie Guo <guosi...@gmail.com> wrote:
> >
> > > I agree in current callback model, we only propagate error code to the
> > > outer client, so we lose the information about the detail cause.
> > >
> > > But I think we also tried to map some zk error codes to bk error codes.
> > >
> > > NoNode -> NoLedger
> > > NodeExists -> LedgerExists
> > > ··· (other codes)
> > >
> > > Also we tried to hide zk metadata related implementation behind
> > interfaces,
> > > so some of the errors should be handled by the zk ledger manager
> > > implementation before propagating to the client.
> > >
> > > Sijie
> > >
> 

Re: [VOTE] Release 4.5.0, release candidate #0

2017-08-08 Thread Sam Just
+1

On Tue, Aug 8, 2017 at 10:22 AM, Matteo Merli  wrote:

> +1
>
> Checked src and bin package
>  * Signatures ok
>  * Build
>  * Rat
>  * Run local bookie
>(I had to set allowLoopback=true in conf/bk_server.conf for that. I
> agree we can
> document it and improve it later.)
>
> Matteo
>
> On Mon, Aug 7, 2017 at 7:13 AM Venkateswara Rao Jujjuri  >
> wrote:
>
> > Hi everyone,
> >
> > Please review and vote on the release candidate #0 for version 4.5.0, as
> > follows:
> >
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > The complete staging area is available for your review, which includes:
> >
> > * Release Notes [1]
> > * The official Apache source and binary distributions to be deployed to
> > dist.apache.org [2]
> > * All artifacts to be deployed to the Maven Central Repository [3]
> > * Source code tag "release-4.5.0" [4]
> >
> > BookKeeper's KEY file contains PGP keys we use to sign this release:
> > https://dist.apache.org/repos/dist/release/bookkeeper/KEYS
> >
> > Please review this release candidate.
> >
> > - Review release notes
> > - Download the source package (verify md5, shasum, and asc) and follow
> the
> > instructions to build and run the bookkeeper service.
> > - Download the binary package (verify md5, shasum, and asc) and follow
> the
> > instructions to run the bookkeeper service.
> > - Review maven repo, release tag, licenses, and anything else you think
> > is important to a release.
> >
> > [1] https://github.com/apache/bookkeeper/pull/402
> > [2] https://dist.apache.org/repos/dist/dev/bookkeeper/4.5.0-rc0/
> > [3] https://repository.apache.org/content/repositories/
> > orgapachebookkeeper-1012/
> > [4] https://github.com/apache/bookkeeper/tree/release-4.5.0
> >
> > --
> > Jvrao
> > ---
> > First they ignore you, then they laugh at you, then they fight you, then
> > you win. - Mahatma Gandhi
> >
> --
> Matteo Merli
> 
>


Write retry issues with ZooKeeperClient

2017-06-26 Thread Sam Just
BookKeeper has a wrapper class for the ZooKeeper client called
ZooKeeperClient.
Its purpose appears to be to transparently perform retries in the case that
ZooKeeper returns ConnectionLoss on an operation due to a Disconnect event.

The trouble is that it's possible that a write which received a
ConnectionLoss
error as a return value actually succeeded.  Once ZooKeeperClient retries,
it'll get back NodeExists and propagate that error to the caller, even
though the write succeeded and the node did not in fact exist beforehand.

It seems as though the same issue would hold for setData and delete calls
which
use the version argument -- you could get a spurious BadVersion.
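
Roughly, the hazard with a blind retry wrapper looks like this (a sketch for
illustration only, not the actual ZooKeeperClient code):

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    class NaiveRetrySketch {
        // If the first create() actually reached the quorum but the reply
        // was lost (ConnectionLoss), the retry sees NodeExists for a node we
        // ourselves just created and reports a spurious failure.  The same
        // ambiguity turns a successful setData(path, data, version) or
        // delete(path, version) into a spurious BadVersion on retry.
        static void createWithBlindRetry(ZooKeeper zk, String path, byte[] data)
                throws KeeperException, InterruptedException {
            while (true) {
                try {
                    zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                              CreateMode.PERSISTENT);
                    return;
                } catch (KeeperException.ConnectionLossException e) {
                    // Ambiguous outcome; looping here is what converts a
                    // success into a NodeExists seen by the caller.
                }
            }
        }
    }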

I believe I've reproduced the variant with a spurious NodeExists.  It
manifested as a spurious BKLedgerExistException when running against a
3-instance ZooKeeper cluster with dm-delay under the ZooKeeper instance
storage
to force Disconnect events by injecting write delays.  This seems to make
sense
as AbstractZkLedgerManager.createLedgerMetadata uses
ZkUtils.asyncCreateFullPathOptimistic to create the metadata node and
appears
to depend on the create atomicity to avoid two writers overwriting each
other's
nodes.

AbstractZkLedgerManager.writeLedger would seem to have the same problem with
its dependence on using setData with the version argument to avoid lost
updates.

Am I missing something in this analysis?  It seems to me that this behavior
could actually be problematic during periods of spotty connectivity to the
ZooKeeper cluster.

Thanks!
-Sam


Re: Write retry issues with ZooKeeperClient

2017-06-26 Thread Sam Just
Hmm, curator seems to have essentially the same problem:
https://issues.apache.org/jira/browse/CURATOR-268
I'm not sure there's a good way to solve this transparently -- the right
answer is
probably to plumb the ConnectionLoss event through ZooKeeperClient for
interested callers, audit for metadata users where we depend on atomicity,
and update each one to handle it appropriately.
-Sam

On Mon, Jun 26, 2017 at 4:34 PM, Sam Just <sj...@salesforce.com> wrote:

> BookKeeper has a wrapper class for the ZooKeeper client called
> ZooKeeperClient.
> Its purpose appears to be to transparently perform retries in the case that
> ZooKeeper returns ConnectionLoss on an operation due to a Disconnect event.
>
> The trouble is that it's possible that a write which received a
> ConnectionLoss
> error as a return value actually succeeded.  Once ZooKeeperClient retries,
> it'll
> get back NodeExists and propagate that error to the caller, even though the
> write succeeded and the node in fact did not exist.
>
> It seems as though the same issue would hold for setData and delete calls
> which
> use the version argument -- you could get a spurious BadVersion.
>
> I believe I've reproduced the variant with a spurious NodeExists.  It
> manifested as a spurious BKLedgerExistException when running against a 3
> instance ZooKeeper cluster with dm-delay under the ZooKeeper instance
> storage
> to force Disconnect events by injecting write delays.  This seems to make
> sense
> as AbstractZkLedgerManager.createLedgerMetadata uses
> ZkUtils.asyncCreateFullPathOptimistic to create the metadata node and
> appears
> to depend on the create atomicity to avoid two writers overwriting each
> other's
> nodes.
>
> AbstractZkLedgerManager.writeLedger would seem to have the same problem
> with
> its dependence on using setData with the version argument to avoid lost
> updates.
>
> Am I missing something in this analysis?  It seems to me that behavior
> could
> be actually problematic during periods of spotty connectivity to the
> ZooKeeper cluster.
>
> Thanks!
> -Sam
>


Re: Write retry issues with ZooKeeperClient

2017-06-27 Thread Sam Just
Should EXPIRED be considered a recoverable error for retry purposes?
Retrying in that case would mean that operations which might have been
submitted under the assumption that ephemeral nodes were still present
would be retried after the ephemeral nodes disappeared.  Don't all users
have special handling for EXPIRED anyway?
-Sam

On Tue, Jun 27, 2017 at 4:08 PM, Sijie Guo <guosi...@gmail.com> wrote:

> On Tue, Jun 27, 2017 at 2:36 PM, Sam Just <sj...@salesforce.com> wrote:
>
> > On Tue, Jun 27, 2017 at 2:29 PM, Sijie Guo <guosi...@gmail.com> wrote:
> >
> > > On Tue, Jun 27, 2017 at 10:18 AM, Sam Just <sj...@salesforce.com>
> wrote:
> > >
> > > > JV: What do you mean by "May not be perfect for negative testing"?
> > > >
> > > > I don't think there's anything inevitable about this particular class
> > of
> > > > behavior.  ZK could have chosen to avoid this problem entirely by
> doing
> > > > duplicate op detection server-side with a per-session transaction
> log.
> > > >
> > > > Since it doesn't, we'll need to solve it ourselves.
> > > > ZkLedgerUnderreplicationManager relies on either getting success or
> > > > NodeExists on the ephemeral lock node to determine whether a
> particular
> > > > ReplicationWorker is responsible for replicating a particular ledger.
> > I
> > > > haven't reproduced this one, but it seems to me that two workers
> could
> > > both
> > > > try to replicate the same ledger and *both get NodeExists* on the
> lock
> > > > node.  This would leave the ledger locked for replication until
> > whichever
> > > > one actually wrote the node restarts.
> > > >
> > > > Users like the above are pretty easy to fix.  One option would be to
> > > simply
> > > > include with the write a payload including a nonce for the client.
> > Upon
> > > a
> > > > ConnectionLoss event, we read the node and determine whether we
> > "won".  I
> > > > think ledger creation probably falls into the same category, the
> > metadata
> > > > could include an identifier for the creator.
> > > >
> > >
> > > for ephemeral znode, you don't have to add extra payload.
> > >
> > > in the retry logic,
> > >
> > > - catch NodeExists exception
> > > - call exists to check the znode. you can get the Stat (
> > > https://zookeeper.apache.org/doc/r3.4.6/api/org/apache/
> > > zookeeper/data/Stat.html#getEphemeralOwner()
> > > ) of this znode.
> > > - you can compare the Stat#getEphemeralOwner with the client's current
> > > session id. if they match, the node is created by this session,
> otherwise
> > > this node is created by other session.
> > >
> > >
> > Ah, I'll do that then, much easier.
> >
> >
> > >
> > > >
> > > > I'm less sure about AbstractZkLedgerManager.writeLedgerMetadata.  It
> > > > relies
> > > > on setData success/BadVersion atomicity to ensure consistent rmw.
> > >
> > > It's
> > > > certainly solvable by including a version vector, but I think there's
> > > > probably a simpler way to do it.  I'm not sure under what
> circumstances
> > > > there even *can* be racing writes to the metadata -- is the only case
> > > that
> > > > matters concurrent updates between the writer and a single
> replication
> > > > worker?
> > > >
> > >
> > > I don't think it can be addressed by a version vector, there is no
> public
> > > *history* about who changed what version.
> > > But the question is why do you need to address this
> #writeLedgerMetadata
> > > inside the ZooKeeperClient retry loop. I think there is already a logic
> > on
> > > #writeLedgerMetadata to resolve conflicts due to BadVersion.
> > > The LedgerHandle will re-read the ledger metadata when encountering
> > version
> > > conflicts.
> > >
> > >
> > That's the part I'm unsure of.  I know it has handling for BadVersion, I
> > don't know
> > whether it's still correct if you get a BadVersion even though your write
> > actually
> > succeeded, I need to look into that more.  Also, I'm not at all
> suggesting
> > we handle
> > this in the retry loop, I'm suggesting that for non-idempotent writes, we
> > should not
> > retry in ZooKeeperClient and instead propagate the ConnectionLoss er

git/github commit hooks

2017-10-09 Thread Sam Just
Last Thursday, we had a short discussion about possibly changing the
merge process to allow unsquashed commits and the use of the github
merge button.  One sticking point is that we'd like an automatic way
to enforce some commit message metadata requirements and formatting.

Git lets you define some hooks for validating commits and commit
messages locally; see
https://git-scm.com/book/gr/v2/Customizing-Git-Git-Hooks.
Specifically, you can define a commit-msg hook which gets to validate
the file containing the commit message before allowing the commit.  I
think https://developer.github.com/webhooks/ can be leveraged to do
the same checks on a github PR prior to allowing the PR to be merged,
but haven't had time yet to figure out precisely how.
-Sam


Re: Usefulness of ensemble change during recovery

2018-08-13 Thread Sam Just
To flesh out JV's point a bit more, suppose we've got a 5/5/4 ledger which
needs to be recovery opened.  In such a scenario, suppose the last entries on
the 5 bookies (no holes) are 10, 10, 10, 10, 19.  Any entry in [10,19]
is valid as the end of the ledger, but the safest answer for the end of the
ledger is really 10 here -- 11-19 cannot have been ack'd to the client and
we have 5 copies of 0-10, but only 1 of 11-19.  Currently, a client
performing a recovery open on this ledger which is able to talk to all 5
bookies will read and rewrite up to 19 ensuring that at least 4 bookies end
up with 11-19.  I'd argue that rewriting the entries in that case is
important if we want to let 19 be the end of the ledger because once we
permit a client to read 19, losing that single copy would genuinely be data
loss.  In that case, it happens that we have enough information to mark 10
as the end of the ledger, but if the client performing recovery open has
access only to bookies 3 and 4, it would be forced to conclude that 19
could be the end of the ledger.  In that case, if we want to avoid exposing
entries which were never written to at least aQ bookies, we really
do have to either
1) do an ensemble change and write out the tail entries of the ledger to a
healthy ensemble
2) fail the recovery open

I'd therefore argue that repairing the tail of the ledger -- with an
ensemble change if necessary -- is actually required to allow readers to
access the ledger.
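
As a back-of-the-envelope illustration of where the durable point sits in
that example (purely illustrative, and assuming no holes below each bookie's
last entry):

    import java.util.Arrays;

    class RecoveryTailSketch {
        // An entry can only have been acked to the writer if at least
        // ackQuorum bookies have it, so the highest possibly-acked entry is
        // the ackQuorum-th largest "last entry" across the ensemble.
        static long highestPossiblyAcked(long[] lastEntryPerBookie, int ackQuorum) {
            long[] sorted = lastEntryPerBookie.clone();
            Arrays.sort(sorted);
            return sorted[sorted.length - ackQuorum];
        }

        public static void main(String[] args) {
            long[] last = {10, 10, 10, 10, 19};              // the 5/5/4 example
            System.out.println(highestPossiblyAcked(last, 4)); // prints 10
            // Entries 11..19 live on a single bookie; if recovery closes the
            // ledger at 19, they must be re-replicated (via ensemble change
            // if needed) before readers are allowed to see them.
        }
    }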
-Sam

On Mon, Aug 6, 2018 at 9:27 AM Venkateswara Rao Jujjuri 
wrote:

> I don't think it a good idea to leave the tail to the replication.
> This could lead to the perception of data loss, and it's more evident in
> the case of larger WQ and disparity with AQ.
> If we determine LLAC based on having 'a copy', which is never acknowledged
> to the client, and if that bookie goes down(or crashes and burns)
> before replication worker gets a chance, it gives the illusion of data
> loss. Moreover, we have no way to determine the real data loss vs
> this scenario where we have never acknowledged the client.
>
>
> On Mon, Aug 6, 2018 at 12:32 AM, Sijie Guo  wrote:
>
> > On Mon, Aug 6, 2018 at 12:08 AM Ivan Kelly  wrote:
> >
> > > >> Recovery operates on a few seconds of data (from the last LAC
> written
> > > >> to the end of the ledger, call this LLAC).
> > > >
> > > > the data during this duration can be very large if the traffic of the
> > > > ledger is large. That has
> > > > been observed at Twitter's production. so when we are talking about
> "a
> > > few
> > > > seconds of data",
> > > > we can't assume the amount of data is little. That says the recovery
> > can
> > > be
> > > > taking time than
> > >
> > > Yes, it can be large, but still it is only a few seconds worth of
> > > data. It is the amount of data that can be transmitted in the period
> > > of one roundtrip, as the next roundtrip will update the LAC.
> >
> >
> > > I didn't mean to imply the data was small. I was implying that the
> > > data was small in comparison to the overall size of that ledger.
> >
> >
> > > > what we can expect, so if we don't handle failures during recovery
> how
> > we
> > > > are able to ensure
> > > > we have enough data copy during recovery.
> > >
> > > Consider a e3w3a2 ledger, there's two cases where you can lose a
> > > bookie during recover.
> > >
> > > Case one, one bookie is lost. You can still recover from as ack=2 is
> > > available.
> > > Case two, two bookies are lost. You can't recover, but ledger is
> > > unavailable anyhow, since any entry in the ledger may only have been
> > > replicated to 2.
> > >
> > > However, with e3w3a3 I guess you wouldn't be able to recover at all,
> > > and we have to handle that case.
> > >
> > > > I am not sure "make ledger metadata immutable" == "getting rid of
> > merging
> > > > ledger metadata".
> > > > because I don't think these are same thing. making ledger metadata
> > > > immutable will make code
> > > > much clearer and simpler because the ledger metadata is immutable.
> how
> > > > getting rid of merging
> > > > ledger metadata is a different thing, when you make ledger metadata
> > > > immutable, it will help make
> > > > merging ledger metadata on conflicts clearer.
> > >
> > > I wouldn't call it merging in this case.
> >
> >
> > That's fine.
> >
> >
> > > Merging implies taking two
> > > valid pieces of metadata and getting another usable, valid metadata
> > > from it.
> > > What happens with immutable metadata, is that you are taking one valid
> > > metadata, and applying operations to it. So in the failure during
> > > recovery place, we would have a list of AddEnsemble operations which
> > > we add when we try to close.
> > >
> > > In theory this is perfectly valid and clean. It just can look messy in
> > > the code, due to how the PendingAddOp reaches back into the ledger
> > > handle to get the current ensemble.
> > >
> >
> > That's okay since it is reality which we have to face anyway. But the
> most
> > 

Re: Scanning the list of entries present on a bookie

2018-04-12 Thread Sam Just
IIRC, InterleavedLedgerStorage has for each ledger an index file
mapping the entries to entry logger offsets; you could probably scan
that directly (particularly if you included a lower bound -- probably
the client's current idea of the LAC).
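
Roughly the shape of the hook that suggests (a hypothetical method, not
something LedgerStorage exposes today, which is the gap being described):

    import java.io.IOException;
    import java.util.function.LongConsumer;

    // Hypothetical addition: walk the per-ledger index and report every
    // entry id actually present at or above a lower bound (e.g. the
    // client's current idea of the LAC); gaps in the sequence are simply
    // not reported.
    interface ScannableLedgerStorage {
        void scanEntries(long ledgerId, long fromEntryId, LongConsumer onEntry)
                throws IOException;
    }

    // Usage sketch, e.g. rebuilding the set of durably stored entries after
    // a bookie restart:
    //   storage.scanEntries(ledgerId, knownLac + 1, persisted::add);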
-Sam

On Thu, Apr 12, 2018 at 12:31 AM, Enrico Olivelli  wrote:
> Hi BookKeepers,
> during implementation of BP-14 I am facing a problem so I am asking for
> suggestions.
>
> My need is to be able to know the list of all entries stored on a
> LedgerStorage given a ledgerId.
>
> Scanning from 0 to LedgerStorage#getLastAddConfirmed() does not seem to
> work because we have to deal with WriteAdvHandle, so there can be temporary
> "gaps" in the sequence of entries.
>
> I can have a writer which writes entries 0,1,5,6,7. Its LAC will be at most
> 1 as entries 2,3,4 are not written yet.
> I need on the bookie to able to know that entries 0, 1, 5, 6, 7 are stored
> on LedgerStorage.
>
> I cannot issue a scan from 0 to Long.MAX_VALUE; my current 'solution' is to
> make the client (writer) send the 'maximum entry id' and perform a scan
> from 0 to maxEntryId.
> In the example the writer will send a forceLedger RPC with maxEntryId = 7.
>
> This is needed only for recovery after a bookie restart, because I have to
> reconstruct the knowledge about which entries have been persisted durably
> on the Bookie.
>
> I am not very expert about LedgerStorage implementations, and I don't know
> if it would be feasible to have such 'scan all entries' method.
>
> This is the code I am talking about
> https://github.com/apache/bookkeeper/pull/1317/files#diff-3b81b1c90d1f51017627b3c032676168R1210
>
> Any help is really appreciated
> Enrico


Re: Clusterwide vs Client configuration for metadata format version

2018-12-18 Thread Sam Just
I'll take a look.

On Tue, Dec 18, 2018 at 1:39 AM Ivan Kelly  wrote:

> JV, Sam, Charan, Andrey, could one of you chime in on this? It's
> holding up 4.9 release.
>
> -Ivan
>
> On Thu, Dec 13, 2018 at 5:38 PM Ivan Kelly  wrote:
> >
> > I'd be interested to see the opinion of the salesforce folks on this.
> > On Thu, Dec 13, 2018 at 5:35 PM Ivan Kelly  wrote:
> > >
> > > > I am not sure about this. If clients don't react the changes of
> ledger
> > > > layout,
> > > > the information in ledger layout is just informative, you still need
> to
> > > > coordinate
> > > > both readers and writers. so IMO the version in ledger layout is not
> really
> > > > useful.
> > >
> > > The clients react the next time they initialize the ledger manager.
> > > Which is exactly the same as would occur with a configuration setting.
> > >
> > > -Ivan
>




Re: Clusterwide vs Client configuration for metadata format version

2018-12-18 Thread Sam Just
I think both approaches are viable, but the max allowable
version is more naturally a bk cluster property than a bk client
property.  Controlling this from the client means that the same client
version deployed to two different clusters might need different settings
depending on the other clients deployed to those clusters.  Placing it in
the metadata means that the clients simply pick up the correct version for
the environment from the ledger metadata without needing additional
configuration.  However, client config management is likely to be managed
on a per-cluster basis anyway, so in practice there may be little
difference.
-Sam

On Tue, Dec 18, 2018 at 10:01 AM Sam Just  wrote:

> I'll take a look.
>
> On Tue, Dec 18, 2018 at 1:39 AM Ivan Kelly  wrote:
>
>> JV, Sam, Charan, Andrey, could one of you chime in on this? It's
>> holding up 4.9 release.
>>
>> -Ivan
>>
>> On Thu, Dec 13, 2018 at 5:38 PM Ivan Kelly  wrote:
>> >
>> > I'd be interested to see the opinion of the salesforce folks on this.
>> > On Thu, Dec 13, 2018 at 5:35 PM Ivan Kelly  wrote:
>> > >
>> > > > I am not sure about this. If clients don't react the changes of
>> ledger
>> > > > layout,
>> > > > the information in ledger layout is just informative, you still
>> need to
>> > > > coordinate
>> > > > both readers and writers. so IMO the version in ledger layout is
>> not really
>> > > > useful.
>> > >
>> > > The clients react the next time they initialize the ledger manager.
>> > > Which is exactly the same as would occur with a configuration setting.
>> > >
>> > > -Ivan
>>
>
>
>

