[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2015-05-19 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550876#comment-14550876
 ] 

Chris Nauroth commented on HDFS-5223:
-

I have deleted the design document and patch and moved them over to HDFS-8432, so 
that this jira can remain focused on discussion of the feature flags proposal.

 Allow edit log/fsimage format changes without changing layout version
 -

 Key: HDFS-5223
 URL: https://issues.apache.org/jira/browse/HDFS-5223
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.1.1-beta
Reporter: Aaron T. Myers
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5223.004.patch


 Currently all HDFS on-disk formats are versioned by the single layout version. 
 This means that even for changes which might be backward compatible, like the 
 addition of a new edit log op code, we must go through the full `namenode 
 -upgrade' process, which requires coordination with DNs, etc. HDFS should 
 support a lighter-weight alternative.
 Copied description from HDFS-8075, which is a duplicate and now closed. (by 
 sanjay on April 7 2015)
 Background
 * HDFS image layout was changed to use Protobufs to allow easier forward and 
 backward compatibility.
 * HDFS has a layout version which is changed on each change (even if only an 
 optional protobuf field was added).
 * Hadoop supports two ways of going back during an upgrade:
 ** downgrade: go back to the old binary version but use the existing image/edits 
 so that newly created files are not lost
 ** rollback: go back to the checkpoint created before the upgrade was started - 
 hence newly created files are lost.
 Layout needs to be revisited if we want to support downgrade in some 
 circumstances, which we don't today. Here are the use cases:
 * Some changes can support downgrade even though there was a change in layout, 
 since there is no real data loss but only loss of new functionality. E.g. 
 when we added ACLs one could have downgraded - there is no data loss, but you 
 will lose the newly created ACLs. That is acceptable to a user, since one 
 does not expect to retain the newly added ACLs in an old version.
 * Some changes may lead to data loss if the functionality was used. For 
 example, the recent truncate will cause data loss if the functionality was 
 actually used. Now one can tell admins NOT to use such new features 
 till the upgrade is finalized, in which case one could potentially support 
 downgrade.
 * A fairly fundamental change to layout where a downgrade is not possible but 
 a rollback is. Say we change the layout completely from protobuf to something 
 else. Another example is when HDFS moves to support a partial namespace in 
 memory - there is likely to be a fairly fundamental change in layout.





[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2015-05-05 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529416#comment-14529416
 ] 

Chris Nauroth commented on HDFS-5223:
-

bq. The fact that presently all features are always enabled means that we 
should consider ourselves obligated to make sure that all features work well 
with all other features.

Yes, definitely agreed.

I don't think I communicated this point clearly.  What I meant was that I see 
additional complexity arising from the new combinations of feature X being on 
while feature Y is off.  These represent new system states not covered by existing 
tests, and this is where we get a combinatorial explosion in the test matrix.  
It's particularly challenging if one feature is coupled to another, either in 
code or as an implementation prerequisite.

As an example, an early design proposal for ACLs would have involved 
implementing xattrs first, followed by implementing ACLs in terms of private 
xattrs.  (This is a common implementation in other file systems.)  We didn't do 
it this way, but if we had, then we'd have a situation where the 
{{EXTENDED_ACL}} feature is dependent upon {{XATTRS}}.  What is the effect of 
disabling {{XATTRS}} while {{EXTENDED_ACL}} is enabled?  I suppose the correct 
response is to block disabling {{XATTRS}} if {{EXTENDED_ACL}} is still on.  
This becomes extra code to write and test.  It also becomes extra knowledge for 
operators, who must be aware that both flags have to be enabled before the feature can be used.
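
As a rough illustration of the kind of extra validation that would be needed, here is a minimal Java sketch; the {{Feature}} enum, the dependency table, and the {{FeatureRegistry}} class are all invented for this example and are not HDFS code.

{code}
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical registry of toggleable metadata features and their dependencies. */
public class FeatureRegistry {
  enum Feature { XATTRS, EXTENDED_ACL }

  // Assumption for illustration only: EXTENDED_ACL is implemented on top of XATTRS.
  private static final Map<Feature, Set<Feature>> DEPENDENTS =
      Map.of(Feature.XATTRS, EnumSet.of(Feature.EXTENDED_ACL));

  private final Set<Feature> enabled = EnumSet.noneOf(Feature.class);

  void enable(Feature f) {
    enabled.add(f);
  }

  /** Refuse to disable a feature while anything that depends on it is still enabled. */
  void disable(Feature f) {
    for (Feature dependent : DEPENDENTS.getOrDefault(f, Set.of())) {
      if (enabled.contains(dependent)) {
        throw new IllegalStateException(
            "Cannot disable " + f + " while " + dependent + " is enabled");
      }
    }
    enabled.remove(f);
  }
}
{code}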

The monotonically increasing (well, technically decreasing!) layout version has 
the benefit of restricting possible system states, because it guarantees that 
prior features in the lineage are enabled.  The drawback is that it harms 
flexibility.  In this particular case, I prefer keeping that invariant and the 
safety it brings over the increased flexibility.

bq. One might want to use the OOB ack feature just when doing a rolling restart 
(no upgrade) to effect a configuration change, without the additional 
complexity of metadata changes, etc.

FWIW, the existing rolling upgrade functionality doesn't really dictate what it 
is that you're upgrading, and the design targeted a DN-only upgrade as one of 
its use cases.  It would be completely legitimate to skip the NN portion of the 
rolling upgrade procedure and do just the DN portion to push a configuration 
change with no code changes, like increasing the xceiver count.


[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2015-05-05 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529231#comment-14529231
 ] 

Aaron T. Myers commented on HDFS-5223:
--

Hey Chris, thanks a lot for working on this.

Seems like this approach would certainly help with the downgrade/rollback 
issue, but wouldn't do much to make the upgrade itself easier. In cases where 
the only NN metadata change between versions is just the introduction of new 
edit log op codes, I think it'd be much better if we could just swap the 
software during a rolling restart without having to use the {{-rollingUpgrade}} 
functionality at all, and then optionally enable the feature via an 
administrative command afterward - essentially the feature flags proposal 
discussed earlier. That approach will both make non-destructive downgrades 
possible from versions which introduce new op codes and make upgrades 
substantially easier.

What's your reasoning for wanting to stick with a linear layout version number 
approach when introducing new op codes? In general I think it'd be beneficial 
for HDFS to move toward a bit-set denoting which features/op codes are 
enabled/disabled, much like [~tlipcon] described earlier.



[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2015-05-05 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529313#comment-14529313
 ] 

Chris Nauroth commented on HDFS-5223:
-

Hi [~atm].

bq. Seems like this approach would certainly help with the downgrade/rollback 
issue, but wouldn't do much to make the upgrade itself easier.

That's correct.  The rolling upgrade procedure would still be required.  This 
document/patch focuses on expanding the use cases that can support downgrade.

bq. In general I think it'd be beneficial for HDFS to move toward a bit-set 
denoting which features/op codes are enabled/disabled, much like Todd Lipcon 
described earlier.

I share some of the concerns mentioned earlier about operational complexity.

https://issues.apache.org/jira/browse/HDFS-5223?focusedCommentId=13779177&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13779177

Complexity in HDFS often arises from combinations of its features rather than 
individual features in isolation.  If individual features can be toggled, then 
no two HDFS instances running the same software version are really guaranteed 
to be alike.  This becomes another layer of troubleshooting required for a 
technical support team.  Testing the possible combinations of features on and 
off becomes a combinatorial explosion that's difficult for a QA team to manage.

Aside from managing metadata upgrades, we've also found rolling upgrade to be 
valuable because of the OOB ack propagated through write pipelines (HDFS-5583) 
to tell clients to pause rather than aborting the connection.  Even if it 
wasn't required from a metadata standpoint, some users might continue to use 
rolling upgrade to get this benefit, even within a minor release line where the 
layout version hasn't changed.  Considering that use case, I see value in 
improving our ability to downgrade within the current rolling upgrade scheme.

If you prefer to keep the discussion here focused on building consensus around 
feature flags, then I could potentially move this work to a separate jira where 
it could move ahead independently.  Let me know your thoughts.  Thanks!



[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2015-05-05 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529336#comment-14529336
 ] 

Aaron T. Myers commented on HDFS-5223:
--

bq. Complexity in HDFS often arises from combinations of its features rather 
than individual features in isolation. If individual features can be toggled, 
then no two HDFS instances running the same software version are really 
guaranteed to be alike. This becomes another layer of troubleshooting required 
for a technical support team. Testing the possible combinations of features on 
and off becomes a combinatorial explosion that's difficult for a QA team to 
manage.

This is an issue, to be sure, but is this really different with or without 
feature flags present? Even today, users can always choose to use or not use 
all the various features of HDFS in any number of combinations. The fact that 
presently all features are always enabled means that we should consider 
ourselves obligated to make sure that all features work well with all other 
features.

bq. Aside from managing metadata upgrades, we've also found rolling upgrade to 
be valuable because of the OOB ack propagated through write pipelines 
(HDFS-5583) to tell clients to pause rather than aborting the connection. Even 
if it wasn't required from a metadata standpoint, some users might continue to 
use rolling upgrade to get this benefit, even within a minor release line where 
the layout version hasn't changed. Considering that use case, I see value in 
improving our ability to downgrade within the current rolling upgrade scheme.

Fair point, but this suggests to me that the OOB ack feature should perhaps be 
separated from the rolling upgrade feature, since those seem somewhat 
orthogonal. One might want to use the OOB ack feature just when doing a rolling 
restart (no upgrade) to effect a configuration change, without the additional 
complexity of metadata changes, etc.



[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2015-04-07 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483654#comment-14483654
 ] 

Sanjay Radia commented on HDFS-5223:


For the edits, one could require that in order to downgrade you must do a 
save-image and then delete the (now null) edits log. We would then limit our 
solution to the image. For the image we could do the following:
* Add a *second* layout version field (call it compatible-layout-version) 
that indicates which versions can safely read the image without data loss. A NN 
that starts up will compare this field with its current layout version and then 
proceed as long as the edit log is empty (see the sketch after this list).
 ** The ACL example (see the Jira description) would state that the previous version  
 can safely read the image without data loss. Of course newly created ACLs would 
 be lost.
 ** The truncate example is tricky: one can safely downgrade only if the truncate 
 operation was not used. We could add code to disallow such new features till 
 finalize is done.  This is somewhat analogous to what ext3 was trying to do 
 with its superblock feature flags (see Todd's comment above); what I am 
 proposing is slightly different since it limits such features till the upgrade is 
 finalized, while ext3's approach is more general in that you can downgrade at 
 any time as long as you have not used the feature.  Alternatively, we could simply 
 not support downgrade for such a feature and mark the 
 compatible-layout-version accordingly.
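
A minimal Java sketch of the startup check proposed above; the field and method names are invented for illustration, and the arithmetic assumes HDFS's convention that layout versions are negative and decrease as the format evolves.

{code}
/** Hypothetical check for the proposed second (compatible) layout version field. */
public class CompatibleLayoutCheck {
  /**
   * @param softwareLV        layout version of the running NameNode software
   * @param imageLV           layout version the image was written with
   * @param imageCompatibleLV oldest software layout version that can safely read the image
   * @param editLogIsEmpty    true if a saveNamespace left no edits to replay
   */
  static void checkCanLoad(int softwareLV, int imageLV, int imageCompatibleLV,
                           boolean editLogIsEmpty) {
    // Normal case: the software is at least as new as the image writer
    // (newer == smaller number under the negative, decreasing convention).
    if (softwareLV <= imageLV) {
      return;
    }
    // Downgrade case: older software may load the image only if the image
    // declares itself readable by that version, and only with an empty edit log.
    if (softwareLV <= imageCompatibleLV && editLogIsEmpty) {
      return;
    }
    throw new IllegalStateException("Software layout version " + softwareLV
        + " cannot safely load this image");
  }
}
{code}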







[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2015-04-07 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483704#comment-14483704
 ] 

Sanjay Radia commented on HDFS-5223:


The above solution was inspired by Hive's ORC. They have two complementary 
mechanisms for dealing with old and new binaries: they specify the 
oldest version that can safely read the new data (which inspired the solution I 
gave above), and new binaries can also write in the older format. This second 
mechanism is too burdensome for HDFS. Instead I would prefer to disable, until 
finalize, the new features whose use would prevent a downgrade. 



[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2014-01-15 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872634#comment-13872634
 ] 

Todd Lipcon commented on HDFS-5223:
---

Just skimmed really quick. I noticed that this doesn't separate flags into 
compatible ones and incompatible ones per the discussion above. 
Implementations like ext3 and nilfs2 do this so that it's possible to introduce 
features and label them as backward-compatible or at least 
readonly-back-compatible. 

As an example of a readonly-back-compatible flag, consider the case of adding 
new ops like "add cache pool" and "remove cache pool". An old NN could easily 
start up and simply ignore these opcodes that it doesn't understand (once we 
have protobuf-ified). Another example would be adding a new field to the inode 
structure, such as a preferred storage class. An old NN could simply ignore the 
new fields in read-only mode, or drop them relatively safely in a downgrade 
scenario. On the other hand, a feature such as compression, or adding 
OP_ADD_BLOCK instead of OP_UPDATE_BLOCK, would not be ro-compatible since an old 
NN wouldn't be able to reconstruct the user data.

I think it would be short-sighted of us not to include similar functionality 
in our flags, even if this initial patch doesn't handle the two types of flags 
differently.



[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2014-01-15 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872771#comment-13872771
 ] 

Colin Patrick McCabe commented on HDFS-5223:


I think the way to do backwards compatible feature flags would be to have a 
flag prefix such as "compat_".  This would let older NameNodes know that the 
(new, unknown to them) flag they were looking at was compatible.  The 
information needs to be in the flag name, since it can't be in the older 
software.
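
A minimal sketch of how an older NameNode might apply such a naming convention; the {{compat_}} prefix handling and the class below are hypothetical, not anything that exists in the codebase today.

{code}
import java.util.List;

/** Hypothetical handling of unknown feature flags based on a name prefix. */
public class FeatureFlagNames {
  private static final String COMPAT_PREFIX = "compat_";

  /** Flags this (older) software actually understands. */
  private static final List<String> KNOWN_FLAGS = List.of("xattrs", "extended_acl");

  /**
   * Returns true if loading may proceed: every flag in the header is either
   * known to this software or explicitly marked compatible by its name.
   */
  static boolean canLoad(List<String> flagsInHeader) {
    for (String flag : flagsInHeader) {
      boolean known = KNOWN_FLAGS.contains(flag);
      boolean markedCompatible = flag.startsWith(COMPAT_PREFIX);
      if (!known && !markedCompatible) {
        return false;  // unknown and not declared compatible: refuse to load
      }
    }
    return true;
  }
}
{code}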

However, I don't think we should implement this now.  We don't really have any 
infrastructure in place today to use backwards compatible feature flags.  
With the simple DataInputStream/DataOutputStream based decoding we have now, 
any extra field results in a loading failure.

So it's fair to say: we can't create even one backwards compatible feature 
flag without installing new software beyond what this patch provides.  Given 
that this is true, we should just implement compatible feature flags later when 
we know we need (or at least can use) them.  I am all for doing it after the 
protobuf merge.  But there's simply no reason to do it before because I don't 
believe a backwards compatible change can be made at this point.



[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2013-09-26 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13779177#comment-13779177
 ] 

Nathan Roberts commented on HDFS-5223:
--

Thanks Aaron and Todd for bringing this up.

I love the flexibility of feature bits; however, I'm very nervous about the 
complexity they tend to bring. As long as there are incredibly tight controls it 
can work, but more often than not I've seen this sort of approach lead to some 
incredibly unmaintainable code. The code can get very complex dealing with 
multiple combinations, and the testing/QA can also be very difficult to 
manage. Things can get overwhelmingly complex quite quickly. Having an 
-enableAllNewFeatures helps a bit, but I'm not sure it lowers the complexity 
all that much. 

Of the two options, I'd lean in the direction of #1 at this point. 

iiuc, option 2 basically means that V2 software has to remember how to both 
read and write in V1 format whereas option 1 only requires that V2 be able to 
read V1 format (like we do today). I kind of like the fact that new software 
doesn't ever have to write things according to the older format. 

* When we update the SBN to V2 it would be allowed to come up and it would 
still be able to process V1 images/edits
* The first time it tries to write a new image, it would do so in V2 format 
* When uploading a new V2 image to ANN, the upload would not proceed because of 
the version mismatch (this way the ANN's local storage stays purely V1)
* At this point we can still roll back by simply re-bootstrapping the SBN
* Now we fail over to the SBN; the SBN changes the shared edits area to indicate 
V2 (just an update to the VERSION file, I think)
* Upgrade the old ANN with V2 software
* The old ANN comes up as Standby, reads the new V2 image and starts processing new 
V2 edits (somewhere in here it also has to change local storage to V2)

What's not great about this approach is that as soon as V2 software becomes 
active, we're writing in V2 format and at that point can't go back without 
losing edits. However, that's basically very similar to today's -upgrade. The 
only difference is that we haven't done anything to protect the blocks on 
the datanodes (with -upgrade we hardlink everything and therefore guarantee 
data blocks can't go away). So, maybe we need a mode where HDFS stops deleting 
blocks both from the NN's perspective (won't issue invalidates any longer), as 
well as from the DN side where it will ignore block deletion requests. Kind of 
a semi-safe-mode where the filesystem acts pretty much normally except that it 
refuses to delete any blocks. If we get ourselves into a true disaster-recovery 
situation, we can go back to V1 software + last V1 fsimage + all V1 edits that 
applied to that image + all blocks from the datanodes.






[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2013-09-26 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13779511#comment-13779511
 ] 

Todd Lipcon commented on HDFS-5223:
---

bq. I love the flexibility of feature bits; however, I'm very nervous about the 
complexity they tend to bring. As long as there are incredibly tight controls it 
can work, but more often than not I've seen this sort of approach lead to some 
incredibly unmaintainable code. The code can get very complex dealing with 
multiple combinations, and the testing/QA can also be very difficult to 
manage. Things can get overwhelmingly complex quite quickly. 

I agree that it's a bit more complex, but I'm not sure it's quite as bad in our 
context as it might be in others. Most of our edit log changes to date have 
been fairly simple. Looking through the Feature enum, they tend to fall into 
the following categories:
- Entirely new opcodes (eg CONCAT) - these are easy to do on the writer side by 
just throwing an exception in logEdit() if the feature isn't supported (see the 
sketch after this list). Sometimes these also involve a new set of data written 
to the FSImage (eg in the case of delegation token persistence), but again it 
should be pretty orthogonal to other features.
- New container format features (eg fsimage compression, or checksums on edit 
entries). These are new features which are off by default and orthogonal to any 
other features.
- Single additional fields in existing opcodes. We'd need to be somewhat 
careful not to make use of any of these fields if the feature isn't enabled, 
but I think there's usually pretty clear semantics.
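
A rough sketch of that writer-side gate, assuming a hypothetical set of enabled features checked before the op is serialized; the class and feature names are invented for illustration, and the real FSEditLog is not shown.

{code}
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical writer-side gating of new opcodes behind feature flags. */
public class EditLogWriterSketch {
  enum Feature { CONCAT, CACHE_POOLS }

  private final Set<Feature> enabledFeatures = EnumSet.noneOf(Feature.class);

  /** Fails before any bytes reach the edit log if the feature has not been enabled. */
  void logConcat(String target, String[] sources) {
    if (!enabledFeatures.contains(Feature.CONCAT)) {
      throw new UnsupportedOperationException(
          "concat is not enabled on this filesystem; enable the feature first");
    }
    // ... serialize and append the concat record to the edit log here ...
  }
}
{code}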

Certainly it's more complex than option 1, but I think the ability to downgrade 
without data loss is pretty key. A lot of Hadoop operators are already hesitant 
to upgrade between minor versions, and losing the ability to roll back 
would make it a non-starter for a lot of shops. If that's the case, then I 
think it would be really tough to add new opcodes or other format changes even 
between minor releases (eg 2.3 to 2.4) and convince an operator to do the 
upgrade.

Am I being overly conservative in what operators will put up with, instead of 
overly conservative in the complexity we introduce?


(btw, I agree completely about the no-delete mode -- I think a TTL delete 
mode is also a nice feature we could build in at the same time, where block 
deletions are always delayed for a day, to mitigate potential for data loss 
even with bugs present)



[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2013-09-18 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771059#comment-13771059
 ] 

Aaron T. Myers commented on HDFS-5223:
--

I was chatting about this informally with [~tlipcon] a day or two ago, and we 
came up with the following two alternative implementations:

# Introduce a new separate NN metadata version number which is decoupled from 
the existing layout version. We will allow the NN to start up if its NN 
metadata version number is higher than what's in the fsimage/edit log headers 
without requiring the '-upgrade' flag. From now on the addition of new edit log 
opcodes would increment the NN metadata version, and we would require that 
changes made to the format of existing fsimage/edit log entries be done in a 
backward compatible fashion. We would freeze the existing layout version 
number and from now on only increment this in the case of more fundamental NN 
metadata version changes.
# Introduce a set of NN metadata format feature flags which can be enabled or 
disabled by the admin at runtime. These feature flags could be enabled/disabled 
entirely independently, so we would move away from a strictly-increasing NN 
metadata version number. The fsimage and edit log header would be changed to 
enumerate which of these features were enabled. We will allow the NN to start 
up only if its software supports the full set of features identified in the 
fsimage/edit log headers.

I'd love to solicit others' thoughts/feedback on these proposals; please suggest an 
alternative if you have one.
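
As a rough illustration of proposal 2, the metadata header would enumerate the enabled feature names and the NN would refuse to start if any of them is unsupported; the names and class below are invented for this sketch and are not the actual fsimage/edit log header format.

{code}
import java.util.Set;

/** Hypothetical startup check for proposal 2: feature names enumerated in the metadata header. */
public class MetadataFeatureCheck {
  /** Features this build of the NameNode software knows how to handle. */
  private static final Set<String> SUPPORTED = Set.of("acls", "xattrs", "snapshots");

  static void checkStartup(Set<String> featuresEnabledInHeader) {
    for (String feature : featuresEnabledInHeader) {
      if (!SUPPORTED.contains(feature)) {
        throw new IllegalStateException(
            "Cannot start: fsimage/edit log uses unsupported feature '" + feature + "'");
      }
    }
    // Features can be toggled independently; no single version number is
    // compared here, only set containment.
  }
}
{code}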



[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2013-09-18 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771100#comment-13771100
 ] 

Todd Lipcon commented on HDFS-5223:
---

To expand a little bit on Aaron's summary of our discussion above.

*Proposal 1*:
- note that we already include a version number in the header of the edit log 
and image formats. So, within a single image or edits directory, you might 
now have different edit log segments or images with different version numbers 
-- the ones written post-upgrade would have a higher version number.
- note that this allows for in-place software upgrade, but not in-place 
software downgrade. Once you've written an edit log with the new version, you 
couldn't downgrade the NN back to the previous version, because it would refuse 
to read the higher-versioned edit log segment.

bq. and we would require that changes made to the format of existing 
fsimage/edit log entries be done in a backward compatible fashion

This isn't quite the case -- because the new edit log segments would have a new 
version number, we have the same ability to evolve opcodes as today. I verified 
with Aaron that he mis-stated this above.

*Proposal 2*:
- This is basically the way that file systems such as ext3 handle version 
compatibility. Every ext3 filesystem's superblock contains a set of flags which 
determine which features have been enabled for it. Similarly, we'd add 
something to the edit log and fsimage headers with a set of feature names. 
Here's the docs from Documentation/filesystems/ext2.txt in the kernel tree:

{code}
These feature flags have specific meanings for the kernel as follows:

A COMPAT flag indicates that a feature is present in the filesystem,
but the on-disk format is 100% compatible with older on-disk formats, so
a kernel which didn't know anything about this feature could read/write
the filesystem without any chance of corrupting the filesystem (or even
making it inconsistent).  This is essentially just a flag which says
this filesystem has a (hidden) feature that the kernel or e2fsck may
want to be aware of (more on e2fsck and feature flags later).  The ext3
HAS_JOURNAL feature is a COMPAT flag because the ext3 journal is simply
a regular file with data blocks in it so the kernel does not need to
take any special notice of it if it doesn't understand ext3 journaling.

An RO_COMPAT flag indicates that the on-disk format is 100% compatible
with older on-disk formats for reading (i.e. the feature does not change
the visible on-disk format).  However, an old kernel writing to such a
filesystem would/could corrupt the filesystem, so this is prevented. The
most common such feature, SPARSE_SUPER, is an RO_COMPAT feature because
sparse groups allow file data blocks where superblock/group descriptor
backups used to live, and ext2_free_blocks() refuses to free these blocks,
which would leading to inconsistent bitmaps.  An old kernel would also
get an error if it tried to free a series of blocks which crossed a group
boundary, but this is a legitimate layout in a SPARSE_SUPER filesystem.

An INCOMPAT flag indicates the on-disk format has changed in some
way that makes it unreadable by older kernels, or would otherwise
cause a problem if an old kernel tried to mount it.  FILETYPE is an
INCOMPAT flag because older kernels would think a filename was longer
than 256 characters, which would lead to corrupt directory listings.
The COMPRESSION flag is an obvious INCOMPAT flag - if the kernel
doesn't understand compression, you would just get garbage back from
read() instead of it automatically decompressing your data.  The ext3
RECOVER flag is needed to prevent a kernel which does not understand the
ext3 journal from mounting the filesystem without replaying the journal.
{code}
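
Translated into HDFS terms, those three categories might map onto load-time decisions roughly as in the sketch below; this is a hypothetical illustration of the ext2/ext3 scheme, not something any patch on this jira implements.

{code}
/** Hypothetical mapping of ext3-style flag categories onto NN metadata loading. */
public class FlagCategories {
  enum Category { COMPAT, RO_COMPAT, INCOMPAT }

  /** What an old NN that does NOT understand a flag may do with the metadata. */
  static boolean mayLoad(Category category, boolean readOnly) {
    switch (category) {
      case COMPAT:
        return true;       // safe to read and write despite the unknown flag
      case RO_COMPAT:
        return readOnly;   // safe to read, but writing could corrupt the metadata
      case INCOMPAT:
        return false;      // format changed in a way the old NN cannot parse
      default:
        throw new AssertionError();
    }
  }

  public static void main(String[] args) {
    // Example: an unknown RO_COMPAT flag blocks a writable NN but not a read-only load.
    System.out.println(mayLoad(Category.RO_COMPAT, false));  // false
    System.out.println(mayLoad(Category.RO_COMPAT, true));   // true
  }
}
{code}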

This would allow us to do rolling upgrades, run mixed-version clusters, and 
still retain the ability to roll back to a prior version until the new feature 
was used. So, to take the example of a feature like snapshots which required a 
metadata change, the admin workflow would be:

# Shutdown standby node
# Upgrade standby software version
# Start standby node, failover to it
# Shutdown and upgrade the old active, start it back up.
# Note: at this point, the format for the edit logs and images is identical to 
the pre-upgrade format, so the user could still roll back. Trying to create a 
snapshot at this point would fail with an error like "Snapshots not enabled for 
this filesystem. Run dfsadmin -enableFeature snapshots to enable"
# User runs the above command, which forces an edit log roll. The new edit logs 
contain the flag indicating that snapshots are enabled, and may use the new 
opcodes (or add new fields to the old opcodes as necessary)

If the explicit enable doesn't sit well with people, we could also add a 
slightly simpler version like -enableAllNewFeatures or whatever, which a user 
can use after an upgrade.