[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2015-05-19 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550876#comment-14550876
 ] 

Chris Nauroth commented on HDFS-5223:
-

I have deleted the design document and patch and moved them over to HDFS-8432, so 
that this jira can remain focused on discussion of the feature flags proposal.

 Allow edit log/fsimage format changes without changing layout version
 -

 Key: HDFS-5223
 URL: https://issues.apache.org/jira/browse/HDFS-5223
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.1.1-beta
Reporter: Aaron T. Myers
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5223.004.patch


 Currently all HDFS on-disk formats are versioned by the single layout version. 
 This means that even for changes which might be backward compatible, like the 
 addition of a new edit log op code, we must go through the full `namenode 
 -upgrade' process, which requires coordination with DNs, etc. HDFS should 
 support a lighter-weight alternative.
 Copied description from HDFS-8075, which is a duplicate and now closed. (by 
 sanjay on April 7 2015)
 Background
 * HDFS image layout was changed to use Protobufs to allow easier forward and 
 backward compatibility.
 * HDFS has a layout version which is changed on each change (even if only an 
 optional protobuf field was added).
 * Hadoop supports two ways of going back during an upgrade:
 ** downgrade: go back to the old binary version but use the existing image/edits 
 so that newly created files are not lost
 ** rollback: go back to the checkpoint created before the upgrade was started - 
 hence newly created files are lost.
 Layout needs to be revisited if we want to support downgrade in some 
 circumstances, which we don't today. Here are the use cases:
 * Some changes can support downgrade even though there was a change in layout, 
 since there is no real data loss but only loss of new functionality. E.g. 
 when we added ACLs one could have downgraded - there is no data loss, but you 
 will lose the newly created ACLs. That is acceptable to a user, since one 
 does not expect to retain the newly added ACLs in an old version.
 * Some changes may lead to data loss if the functionality was used. For 
 example, the recent truncate will cause data loss if the functionality was 
 actually used. Now one can tell admins NOT to use such new features 
 till the upgrade is finalized, in which case one could potentially support 
 downgrade.
 * A fairly fundamental change to layout where a downgrade is not possible but 
 a rollback is. Say we change the layout completely from protobuf to something 
 else. Another example is when HDFS moves to support a partial namespace in 
 memory - there is likely to be a fairly fundamental change in layout.





[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2015-05-05 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529416#comment-14529416
 ] 

Chris Nauroth commented on HDFS-5223:
-

bq. The fact that presently all features are always enabled means that we 
should consider ourselves obligated to make sure that all features work well 
with all other features.

Yes, definitely agreed.

I don't think I communicated this point clearly.  What I meant was that I see 
additional complexity arising from the new combinations of feature X being on 
while feature Y is off.  These represent new system states not covered by existing 
tests, and this is where we get a combinatorial explosion in the test matrix.  
It's particularly challenging if one feature is coupled to another, either in 
code or as an implementation prerequisite.

As an example, an early design proposal for ACLs would have involved 
implementing xattrs first, followed by implementing ACLs in terms of private 
xattrs.  (This is a common implementation in other file systems.)  We didn't do 
it this way, but if we had, then we'd have a situation where the 
{{EXTENDED_ACL}} feature is dependent upon {{XATTRS}}.  What is the effect of 
disabling {{XATTRS}} while {{EXTENDED_ACL}} is enabled?  I suppose the correct 
response is to block disabling {{XATTRS}} if {{EXTENDED_ACL}} is still on.  
This becomes extra code to write and test.  It also becomes extra knowledge for 
operators, who must be aware that both flags have to be enabled before the feature can be used.
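
As a rough illustration of the kind of extra validation that would be needed, here is a minimal Java sketch; the {{Feature}} enum, the dependency table, and the {{FeatureRegistry}} class are all invented for this example and are not HDFS code.

{code}
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical registry of toggleable metadata features and their dependencies. */
public class FeatureRegistry {
  enum Feature { XATTRS, EXTENDED_ACL }

  // Assumption for illustration only: EXTENDED_ACL is implemented on top of XATTRS.
  private static final Map<Feature, Set<Feature>> DEPENDENTS =
      Map.of(Feature.XATTRS, EnumSet.of(Feature.EXTENDED_ACL));

  private final Set<Feature> enabled = EnumSet.noneOf(Feature.class);

  void enable(Feature f) {
    enabled.add(f);
  }

  /** Refuse to disable a feature while anything that depends on it is still enabled. */
  void disable(Feature f) {
    for (Feature dependent : DEPENDENTS.getOrDefault(f, Set.of())) {
      if (enabled.contains(dependent)) {
        throw new IllegalStateException(
            "Cannot disable " + f + " while " + dependent + " is enabled");
      }
    }
    enabled.remove(f);
  }
}
{code}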

The monotonically increasing (well, technically decreasing!) layout version has 
the benefit of restricting possible system states, because it guarantees that 
prior features in the lineage are enabled.  The drawback is that it harms 
flexibility.  In this particular case, I prefer keeping that invariant and the 
safety it brings over the increased flexibility.

bq. One might want to use the OOB ack feature just when doing a rolling restart 
(no upgrade) to effect a configuration change, without the additional 
complexity of metadata changes, etc.

FWIW, the existing rolling upgrade functionality doesn't really dictate what it 
is that you're upgrading, and the design targeted a DN-only upgrade as one of 
its use cases.  It would be completely legitimate to skip the NN portion of the 
rolling upgrade procedure and do just the DN portion to push a configuration 
change with no code changes, like increasing the xceiver count.


[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2015-05-05 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529231#comment-14529231
 ] 

Aaron T. Myers commented on HDFS-5223:
--

Hey Chris, thanks a lot for working on this.

Seems like this approach would certainly help with the downgrade/rollback 
issue, but wouldn't do much to make the upgrade itself easier. In cases where 
the only NN metadata change between versions is just the introduction of new 
edit log op codes, I think it'd be much better if we could just swap the 
software during a rolling restart without having to use the {{-rollingUpgrade}} 
functionality at all, and then optionally enable the feature via an 
administrative command afterward - essentially the feature flags proposal 
discussed earlier. That approach will both make non-destructive downgrades 
possible from versions which introduce new op codes and make upgrades 
substantially easier.

What's your reasoning for wanting to stick with a linear layout version number 
approach when introducing new op codes? In general I think it'd be beneficial 
for HDFS to move toward a bit-set denoting which features/op codes are 
enabled/disabled, much like [~tlipcon] described earlier.



[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2015-05-05 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529313#comment-14529313
 ] 

Chris Nauroth commented on HDFS-5223:
-

Hi [~atm].

bq. Seems like this approach would certainly help with the downgrade/rollback 
issue, but wouldn't do much to make the upgrade itself easier.

That's correct.  The rolling upgrade procedure would still be required.  This 
document/patch focuses on expanding the use cases that can support downgrade.

bq. In general I think it'd be beneficial for HDFS to move toward a bit-set 
denoting which features/op codes are enabled/disabled, much like Todd Lipcon 
described earlier.

I share some of the concerns mentioned earlier about operational complexity.

https://issues.apache.org/jira/browse/HDFS-5223?focusedCommentId=13779177&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13779177

Complexity in HDFS often arises from combinations of its features rather than 
individual features in isolation.  If individual features can be toggled, then 
no two HDFS instances running the same software version are really guaranteed 
to be alike.  This becomes another layer of troubleshooting required for a 
technical support team.  Testing the possible combinations of features on and 
off becomes a combinatorial explosion that's difficult for a QA team to manage.

Aside from managing metadata upgrades, we've also found rolling upgrade to be 
valuable because of the OOB ack propagated through write pipelines (HDFS-5583) 
to tell clients to pause rather than aborting the connection.  Even if it 
wasn't required from a metadata standpoint, some users might continue to use 
rolling upgrade to get this benefit, even within a minor release line where the 
layout version hasn't changed.  Considering that use case, I see value in 
improving our ability to downgrade within the current rolling upgrade scheme.

If you prefer to keep the discussion here focused on building consensus around 
feature flags, then I could potentially move this work to a separate jira where 
it could move ahead independently.  Let me know your thoughts.  Thanks!



[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2015-05-05 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529336#comment-14529336
 ] 

Aaron T. Myers commented on HDFS-5223:
--

bq. Complexity in HDFS often arises from combinations of its features rather 
than individual features in isolation. If individual features can be toggled, 
then no two HDFS instances running the same software version are really 
guaranteed to be alike. This becomes another layer of troubleshooting required 
for a technical support team. Testing the possible combinations of features on 
and off becomes a combinatorial explosion that's difficult for a QA team to 
manage.

This is an issue, to be sure, but is this really different with or without 
feature flags present? Even today, users can always choose to use or not use 
all the various features of HDFS in any number of combinations. The fact that 
presently all features are always enabled means that we should consider 
ourselves obligated to make sure that all features work well with all other 
features.

bq. Aside from managing metadata upgrades, we've also found rolling upgrade to 
be valuable because of the OOB ack propagated through write pipelines 
(HDFS-5583) to tell clients to pause rather than aborting the connection. Even 
if it wasn't required from a metadata standpoint, some users might continue to 
use rolling upgrade to get this benefit, even within a minor release line where 
the layout version hasn't changed. Considering that use case, I see value in 
improving our ability to downgrade within the current rolling upgrade scheme.

Fair point, but this suggests to me that the OOB ack feature should perhaps be 
separated from the rolling upgrade feature, since those seem somewhat 
orthogonal. One might want to use the OOB ack feature just when doing a rolling 
restart (no upgrade) to effect a configuration change, without the additional 
complexity of metadata changes, etc.



[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2015-04-07 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483654#comment-14483654
 ] 

Sanjay Radia commented on HDFS-5223:


For the edits, one could require that in order to downgrade you must do a 
save-image and then delete the (now null) edits log. We would then limit our 
solution to the image. For the image we could do the following:
* Add a *second* layout version field (call it compatible-layout-version) 
that indicates which versions can safely read the image without data loss. A NN 
that starts up will compare this field with its current layout version and then 
proceed as long as the edit log is empty (see the sketch after this list).
 ** The ACL example (see the Jira description) would state that the previous version  
 can safely read the image without data loss. Of course newly created ACLs would 
 be lost.
 ** The truncate example is tricky: one can safely downgrade only if the truncate 
 operation was not used. We could add code to disallow such new features till 
 finalize is done.  This is somewhat analogous to what ext3 was trying to do 
 with its superblock feature flags (see Todd's comment above); what I am 
 proposing is slightly different since it limits such features till the upgrade is 
 finalized, while ext3's approach is more general in that you can downgrade at 
 any time as long as you have not used the feature.  Alternatively, we could simply 
 not support downgrade for such a feature and mark the 
 compatible-layout-version accordingly.
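
A minimal Java sketch of the startup check proposed above; the field and method names are invented for illustration, and the arithmetic assumes HDFS's convention that layout versions are negative and decrease as the format evolves.

{code}
/** Hypothetical check for the proposed second (compatible) layout version field. */
public class CompatibleLayoutCheck {
  /**
   * @param softwareLV        layout version of the running NameNode software
   * @param imageLV           layout version the image was written with
   * @param imageCompatibleLV oldest software layout version that can safely read the image
   * @param editLogIsEmpty    true if a saveNamespace left no edits to replay
   */
  static void checkCanLoad(int softwareLV, int imageLV, int imageCompatibleLV,
                           boolean editLogIsEmpty) {
    // Normal case: the software is at least as new as the image writer
    // (newer == smaller number under the negative, decreasing convention).
    if (softwareLV <= imageLV) {
      return;
    }
    // Downgrade case: older software may load the image only if the image
    // declares itself readable by that version, and only with an empty edit log.
    if (softwareLV <= imageCompatibleLV && editLogIsEmpty) {
      return;
    }
    throw new IllegalStateException("Software layout version " + softwareLV
        + " cannot safely load this image");
  }
}
{code}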







[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2015-04-07 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483704#comment-14483704
 ] 

Sanjay Radia commented on HDFS-5223:


The above solution was inspired by Hive's ORC. They have two complementary 
mechanisms for dealing with old and new binaries: they specify the 
oldest version that can safely read the new data (which inspired the solution I 
gave above), and new binaries can also write in the older format. This second 
mechanism is too burdensome for HDFS. Instead I would prefer to disable, until 
finalize, the new features whose use would prevent a downgrade. 



[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2014-01-15 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872634#comment-13872634
 ] 

Todd Lipcon commented on HDFS-5223:
---

Just skimmed really quick. I noticed that this doesn't separate flags into 
compatible ones and incompatible ones per the discussion above. 
Implementations like ext3 and nilfs2 do this so that it's possible to introduce 
features and label them as backward-compatible or at least 
readonly-back-compatible. 

As an example of a readonly-back-compatible flag, consider the case of adding 
new ops like "add cache pool" and "remove cache pool". An old NN could easily 
start up and simply ignore these opcodes that it doesn't understand (once we 
have protobuf-ified). Another example would be adding a new field to the inode 
structure, such as a preferred storage class. An old NN could simply ignore the 
new fields in read-only mode, or drop them relatively safely in a downgrade 
scenario. On the other hand, a feature such as compression, or adding 
OP_ADD_BLOCK instead of OP_UPDATE_BLOCK, would not be ro-compatible since an old 
NN wouldn't be able to reconstruct the user data.

I think it would be short-sighted of us not to include similar functionality 
in our flags, even if this initial patch doesn't handle the two types of flags 
differently.



[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2014-01-15 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872771#comment-13872771
 ] 

Colin Patrick McCabe commented on HDFS-5223:


I think the way to do backwards compatible feature flags would be to have a 
flag prefix such as "compat_".  This would let older NameNodes know that the 
(new, unknown to them) flag they were looking at was compatible.  The 
information needs to be in the flag name, since it can't be in the older 
software.
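
A minimal sketch of how an older NameNode might apply such a naming convention; the {{compat_}} prefix handling and the class below are hypothetical, not anything that exists in the codebase today.

{code}
import java.util.List;

/** Hypothetical handling of unknown feature flags based on a name prefix. */
public class FeatureFlagNames {
  private static final String COMPAT_PREFIX = "compat_";

  /** Flags this (older) software actually understands. */
  private static final List<String> KNOWN_FLAGS = List.of("xattrs", "extended_acl");

  /**
   * Returns true if loading may proceed: every flag in the header is either
   * known to this software or explicitly marked compatible by its name.
   */
  static boolean canLoad(List<String> flagsInHeader) {
    for (String flag : flagsInHeader) {
      boolean known = KNOWN_FLAGS.contains(flag);
      boolean markedCompatible = flag.startsWith(COMPAT_PREFIX);
      if (!known && !markedCompatible) {
        return false;  // unknown and not declared compatible: refuse to load
      }
    }
    return true;
  }
}
{code}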

However, I don't think we should implement this now.  We don't really have any 
infrastructure in place today to use backwards compatible feature flags.  
With the simple DataInputStream/DataOutputStream based decoding we have now, 
any extra field results in a loading failure.

So it's fair to say: we can't create even one backwards compatible feature 
flag without installing new software beyond what this patch provides.  Given 
that this is true, we should just implement compatible feature flags later when 
we know we need (or at least can use) them.  I am all for doing it after the 
protobuf merge.  But there's simply no reason to do it before because I don't 
believe a backwards compatible change can be made at this point.



[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2013-09-26 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13779177#comment-13779177
 ] 

Nathan Roberts commented on HDFS-5223:
--

Thanks Aaron and Todd for bringing this up.

I love the flexibility of feature bits; however, I'm very nervous about the 
complexity they tend to bring. As long as there are incredibly tight controls it 
can work, but more often than not I've seen this sort of approach lead to some 
incredibly unmaintainable code. The code can get very complex dealing with 
multiple combinations, and the testing/QA can also be very difficult to 
manage. Things can get overwhelmingly complex quite quickly. Having an 
-enableAllNewFeatures helps a bit, but I'm not sure it lowers the complexity 
all that much. 

Of the two options, I'd lean in the direction of #1 at this point. 

iiuc, option 2 basically means that V2 software has to remember how to both 
read and write in V1 format whereas option 1 only requires that V2 be able to 
read V1 format (like we do today). I kind of like the fact that new software 
doesn't ever have to write things according to the older format. 

* When we update the SBN to V2 it would be allowed to come up and it would 
still be able to process V1 images/edits
* The first time it tries to write a new image, it would do so in V2 format 
* When uploading a new V2 image to ANN, the upload would not proceed because of 
the version mismatch (this way the ANN's local storage stays purely V1)
* At this point we can still roll back by simply re-bootstrapping the SBN
* Now we fail over to the SBN; the SBN changes the shared edits area to indicate 
V2 (just an update to the VERSION file, I think)
* Upgrade the old ANN with V2 software
* The old ANN comes up as Standby, reads the new V2 image and starts processing new 
V2 edits (somewhere in here it also has to change local storage to V2)

What's not great about this approach is that as soon as V2 software becomes 
active, we're writing in V2 format and at that point can't go back without 
losing edits. However, that's basically very similar to today's -upgrade. The 
only difference is that we haven't done anything to protect the blocks on 
the datanodes (with -upgrade we hardlink everything and therefore guarantee 
data blocks can't go away). So, maybe we need a mode where HDFS stops deleting 
blocks both from the NN's perspective (won't issue invalidates any longer), as 
well as from the DN side where it will ignore block deletion requests. Kind of 
a semi-safe-mode where the filesystem acts pretty much normally except that it 
refuses to delete any blocks. If we get ourselves into a true disaster-recovery 
situation, we can go back to V1 software + last V1 fsimage + all V1 edits that 
applied to that image + all blocks from the datanodes.






[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2013-09-26 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13779511#comment-13779511
 ] 

Todd Lipcon commented on HDFS-5223:
---

bq. I love the flexibility of feature bits; however, I'm very nervous about the 
complexity they tend to bring. As long as there are incredibly tight controls it 
can work, but more often than not I've seen this sort of approach lead to some 
incredibly unmaintainable code. The code can get very complex dealing with 
multiple combinations, and the testing/QA can also be very difficult to 
manage. Things can get overwhelmingly complex quite quickly. 

I agree that it's a bit more complex, but I'm not sure it's quite as bad in our 
context as it might be in others. Most of our edit log changes to date have 
been fairly simple. Looking through the Feature enum, they tend to fall into 
the following categories:
- Entirely new opcodes (eg CONCAT) - these are easy to do on the writer side by 
just throwing an exception in logEdit() if the feature isn't supported (see the 
sketch after this list). Sometimes these also involve a new set of data written 
to the FSImage (eg in the case of delegation token persistence), but again it 
should be pretty orthogonal to other features.
- New container format features (eg fsimage compression, or checksums on edit 
entries). These are new features which are off by default and orthogonal to any 
other features.
- Single additional fields in existing opcodes. We'd need to be somewhat 
careful not to make use of any of these fields if the feature isn't enabled, 
but I think there's usually pretty clear semantics.
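
A rough sketch of that writer-side gate, assuming a hypothetical set of enabled features checked before the op is serialized; the class and feature names are invented for illustration, and the real FSEditLog is not shown.

{code}
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical writer-side gating of new opcodes behind feature flags. */
public class EditLogWriterSketch {
  enum Feature { CONCAT, CACHE_POOLS }

  private final Set<Feature> enabledFeatures = EnumSet.noneOf(Feature.class);

  /** Fails before any bytes reach the edit log if the feature has not been enabled. */
  void logConcat(String target, String[] sources) {
    if (!enabledFeatures.contains(Feature.CONCAT)) {
      throw new UnsupportedOperationException(
          "concat is not enabled on this filesystem; enable the feature first");
    }
    // ... serialize and append the concat record to the edit log here ...
  }
}
{code}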

Certainly it's more complex than option 1, but I think the ability to downgrade 
without data loss is pretty key. A lot of Hadoop operators are already hesitant 
to upgrade between minor versions, and losing the ability to roll back 
would make it a non-starter for a lot of shops. If that's the case, then I 
think it would be really tough to add new opcodes or other format changes even 
between minor releases (eg 2.3 to 2.4) and convince an operator to do the 
upgrade.

Am I being overly conservative in what operators will put up with, instead of 
overly conservative in the complexity we introduce?


(btw, I agree completely about the no-delete mode -- I think a TTL delete 
mode is also a nice feature we could build in at the same time, where block 
deletions are always delayed for a day, to mitigate potential for data loss 
even with bugs present)



[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2013-09-18 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771059#comment-13771059
 ] 

Aaron T. Myers commented on HDFS-5223:
--

I was chatting about this informally with [~tlipcon] a day or two ago, and we 
came up with the following two alternative implementations:

# Introduce a new separate NN metadata version number which is decoupled from 
the existing layout version. We will allow the NN to start up if its NN 
metadata version number is higher than what's in the fsimage/edit log headers 
without requiring the '-upgrade' flag. From now on the addition of new edit log 
opcodes would increment the NN metadata version, and we would require that 
changes made to the format of existing fsimage/edit log entries be done in a 
backward compatible fashion. We would freeze the existing layout version 
number and from now on only increment this in the case of more fundamental NN 
metadata version changes.
# Introduce a set of NN metadata format feature flags which can be enabled or 
disabled by the admin at runtime. These feature flags could be enabled/disabled 
entirely independently, so we would move away from a strictly-increasing NN 
metadata version number. The fsimage and edit log header would be changed to 
enumerate which of these features were enabled. We will allow the NN to start 
up only if its software supports the full set of features identified in the 
fsimage/edit log headers.

I'd love to solicit others' thoughts/feedback on these proposals; please suggest an 
alternative if you have one.
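
As a rough illustration of proposal 2, the metadata header would enumerate the enabled feature names and the NN would refuse to start if any of them is unsupported; the names and class below are invented for this sketch and are not the actual fsimage/edit log header format.

{code}
import java.util.Set;

/** Hypothetical startup check for proposal 2: feature names enumerated in the metadata header. */
public class MetadataFeatureCheck {
  /** Features this build of the NameNode software knows how to handle. */
  private static final Set<String> SUPPORTED = Set.of("acls", "xattrs", "snapshots");

  static void checkStartup(Set<String> featuresEnabledInHeader) {
    for (String feature : featuresEnabledInHeader) {
      if (!SUPPORTED.contains(feature)) {
        throw new IllegalStateException(
            "Cannot start: fsimage/edit log uses unsupported feature '" + feature + "'");
      }
    }
    // Features can be toggled independently; no single version number is
    // compared here, only set containment.
  }
}
{code}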



[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2013-09-18 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771100#comment-13771100
 ] 

Todd Lipcon commented on HDFS-5223:
---

To expand a little bit on Aaron's summary of our discussion above.

*Proposal 1*:
- note that we already include a version number in the header of the edit log 
and image formats. So, within a single image or edits directory, you might 
now have different edit log segments or images with different version numbers 
-- the ones written post-upgrade would have a higher version number.
- note that this allows for in-place software upgrade, but not in-place 
software downgrade. Once you've written an edit log with the new version, you 
couldn't downgrade the NN back to the previous version, because it would refuse 
to read the higher-versioned edit log segment.

bq. and we would require that changes made to the format of existing 
fsimage/edit log entries be done in a backward compatible fashion

This isn't quite the case -- because the new edit log segments would have a new 
version number, we have the same ability to evolve opcodes as today. I verified 
with Aaron that he mis-stated this above.

*Proposal 2*:
- This is basically the way that file systems such as ext3 handle version 
compatibility. Every ext3 filesystem's superblock contains a set of flags which 
determine which features have been enabled for it. Similarly, we'd add 
something to the edit log and fsimage headers with a set of feature names. 
Here's the docs from Documentation/filesystems/ext2.txt in the kernel tree:

{code}
These feature flags have specific meanings for the kernel as follows:

A COMPAT flag indicates that a feature is present in the filesystem,
but the on-disk format is 100% compatible with older on-disk formats, so
a kernel which didn't know anything about this feature could read/write
the filesystem without any chance of corrupting the filesystem (or even
making it inconsistent).  This is essentially just a flag which says
this filesystem has a (hidden) feature that the kernel or e2fsck may
want to be aware of (more on e2fsck and feature flags later).  The ext3
HAS_JOURNAL feature is a COMPAT flag because the ext3 journal is simply
a regular file with data blocks in it so the kernel does not need to
take any special notice of it if it doesn't understand ext3 journaling.

An RO_COMPAT flag indicates that the on-disk format is 100% compatible
with older on-disk formats for reading (i.e. the feature does not change
the visible on-disk format).  However, an old kernel writing to such a
filesystem would/could corrupt the filesystem, so this is prevented. The
most common such feature, SPARSE_SUPER, is an RO_COMPAT feature because
sparse groups allow file data blocks where superblock/group descriptor
backups used to live, and ext2_free_blocks() refuses to free these blocks,
which would leading to inconsistent bitmaps.  An old kernel would also
get an error if it tried to free a series of blocks which crossed a group
boundary, but this is a legitimate layout in a SPARSE_SUPER filesystem.

An INCOMPAT flag indicates the on-disk format has changed in some
way that makes it unreadable by older kernels, or would otherwise
cause a problem if an old kernel tried to mount it.  FILETYPE is an
INCOMPAT flag because older kernels would think a filename was longer
than 256 characters, which would lead to corrupt directory listings.
The COMPRESSION flag is an obvious INCOMPAT flag - if the kernel
doesn't understand compression, you would just get garbage back from
read() instead of it automatically decompressing your data.  The ext3
RECOVER flag is needed to prevent a kernel which does not understand the
ext3 journal from mounting the filesystem without replaying the journal.
{code}
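
Translated into HDFS terms, those three categories might map onto load-time decisions roughly as in the sketch below; this is a hypothetical illustration of the ext2/ext3 scheme, not something any patch on this jira implements.

{code}
/** Hypothetical mapping of ext3-style flag categories onto NN metadata loading. */
public class FlagCategories {
  enum Category { COMPAT, RO_COMPAT, INCOMPAT }

  /** What an old NN that does NOT understand a flag may do with the metadata. */
  static boolean mayLoad(Category category, boolean readOnly) {
    switch (category) {
      case COMPAT:
        return true;       // safe to read and write despite the unknown flag
      case RO_COMPAT:
        return readOnly;   // safe to read, but writing could corrupt the metadata
      case INCOMPAT:
        return false;      // format changed in a way the old NN cannot parse
      default:
        throw new AssertionError();
    }
  }

  public static void main(String[] args) {
    // Example: an unknown RO_COMPAT flag blocks a writable NN but not a read-only load.
    System.out.println(mayLoad(Category.RO_COMPAT, false));  // false
    System.out.println(mayLoad(Category.RO_COMPAT, true));   // true
  }
}
{code}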

This would allow us to do rolling upgrades, run mixed-version clusters, and 
still retain the ability to roll back to a prior version until the new feature 
was used. So, to take the example of a feature like snapshots which required a 
metadata change, the admin workflow would be:

# Shutdown standby node
# Upgrade standby software version
# Start standby node, failover to it
# Shutdown and upgrade the old active, start it back up.
# Note: at this point, the format for the edit logs and images is identical to 
the pre-upgrade format, so the user could still roll back. Trying to create a 
snapshot at this point would fail with an error like "Snapshots not enabled for 
this filesystem. Run dfsadmin -enableFeature snapshots to enable"
# User runs the above command, which forces an edit log roll. The new edit logs 
contain the flag indicating that snapshots are enabled, and may use the new 
opcodes (or add new fields to the old opcodes as necessary)

If the explicit enable doesn't sit well with people, we could also add a 
slightly simpler version like -enableAllNewFeatures or whatever, which a user 
can use after an upgrade.