[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode


[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14515030#comment-14515030
 ] 

Allen Wittenauer commented on HDFS-7859:


test-patch.sh reads the name of the patch, not any of the JIRA metadata.  So if 
the patch is named something generic, it assumes the patch is for trunk.  See 
HowToContribute for the official rules, but as you can see from the name of the 
patch above, it knows about a few different ways to name them.
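
To illustrate the naming point, here is a minimal Java sketch of how a target branch could be inferred from a patch file name. It is not the logic of test-patch.sh (which is a shell script); the class name, method name, and regular expression are hypothetical, and the sketch assumes only the two name shapes visible in this issue's attachments: `<ISSUE>.<NNN>.patch` and `<ISSUE>-<BRANCH>.<NNN>.patch`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; NOT the real test-patch.sh logic. */
final class PatchNameSketch {
    // Assumed shape: <ISSUE>[-<BRANCH>].<NNN>.patch
    // e.g. HDFS-7859-HDFS-7285.002.patch or HDFS-7859.001.patch
    private static final Pattern NAME = Pattern.compile(
        "^([A-Z]+-\\d+)(?:-(.+?))?\\.(\\d+)\\.patch$");

    /** Returns the branch encoded in the file name, or "trunk" if none is found. */
    static String branchOf(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (m.matches() && m.group(2) != null) {
            return m.group(2);
        }
        // Generic or unrecognized names fall back to trunk.
        return "trunk";
    }
}
```

Under these assumptions, `PatchNameSketch.branchOf("HDFS-7859-HDFS-7285.002.patch")` yields `HDFS-7285`, while `HDFS-7859.001.patch` and any generic name fall back to `trunk`.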

 Erasure Coding: Persist EC schemas in NameNode
 --

 Key: HDFS-7859
 URL: https://issues.apache.org/jira/browse/HDFS-7859
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Kai Zheng
Assignee: Xinwei Qin 
 Attachments: HDFS-7859-HDFS-7285.002.patch, HDFS-7859.001.patch, 
 HDFS-7859.002.patch


 In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we 
 persist EC schemas in NameNode centrally and reliably, so that EC zones can 
 reference them by name efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode

2015-04-27 Thread Zhe Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514890#comment-14514890
 ] 

Zhe Zhang commented on HDFS-7859:
-

[~aw] I quickly went through HDFS-7285 sub tasks. If you'd like you can try 
with HDFS-8236. 

I actually tried with HDFS-8033 earlier but it still tried to apply the patch 
against trunk. Maybe it's because I didn't set target version to HDFS-7285 
_when submitting patch_.


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode


[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514233#comment-14514233
 ] 

Allen Wittenauer commented on HDFS-7859:


(now we just need a submit button. lol)


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode


[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516309#comment-14516309
 ] 

Allen Wittenauer commented on HDFS-7859:


FYI, there are now two of these running:

https://builds.apache.org/job/PreCommit-HDFS-Build/10424/console
https://builds.apache.org/job/PreCommit-HDFS-Build/10425/console

It's still churning through hadoop-hdfs unit tests on the one that [~xinwei] 
submitted earlier.  hadoop-hdfs is one of the slowest sets of unit tests we 
have. I have a hunch that you folks have added code in this branch which has 
made it even slower ... to the point that Jenkins will likely kill the test 
patch job before it finishes. 


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode

[
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514718#comment-14514718
]

Allen Wittenauer commented on HDFS-7859:

I did some playing with a test JIRA this morning. IIRC, it looks like Submit
Patch is only available to the requester and the assignee when the JIRA is in
the 'in progress' status. The 'in progress' status can only be changed by the
assignee and/or the requester. I then thought, well, I'll force it through
Jenkins... but test-patch.sh is smart in that it will only process JIRAs that
are in 'patch available' status. So while I could have changed the meta info in
the JIRA to force it to kick off, I didn't want to freak anyone out more than I
already had by popping up in here. I thought it was going to be an easy/quick
test. :(

Running test-patch.sh as a developer against this JIRA # *does* run it against
the HDFS-7285 branch though, as expected. :D (I had tested patches against
branch-2, but hadn't had a chance to test against a dev branch... so when this
updated last night I thought it'd be a good guinea pig)


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode

2015-04-27 Thread Zhe Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514629#comment-14514629
 ] 

Zhe Zhang commented on HDFS-7859:
-

Thanks Allen. Do you know why Submit Patch isn't available here?


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode

2015-04-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516433#comment-14516433
 ] 

Hadoop QA commented on HDFS-7859:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 15m 23s | Pre-patch HDFS-7285 compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests.  Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. |
| {color:green}+1{color} | javac | 7m 34s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. |
| {color:red}-1{color} | release audit | 0m 16s | The applied patch generated 1 release audit warnings. |
| {color:red}-1{color} | checkstyle | 4m 1s | The applied patch generated 9 additional checkstyle issues. |
| {color:green}+1{color} | install | 1m 50s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 3m 13s | The patch appears to introduce 11 new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | native | 3m 17s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 217m 55s | Tests failed in hadoop-hdfs. |
| | | 263m 53s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
| | Inconsistent synchronization of org.apache.hadoop.hdfs.DFSOutputStream.streamer; locked 89% of time  Unsynchronized access at DFSOutputStream.java:[line 142] |
| | Result of integer multiplication cast to long in org.apache.hadoop.hdfs.DFSStripedInputStream.planReadPortions(int, int, long, int, int)  At DFSStripedInputStream.java:[line 95] |
| | Dead store to offSuccess in org.apache.hadoop.hdfs.StripedDataStreamer.endBlock()  At StripedDataStreamer.java:[line 104] |
| | Result of integer multiplication cast to long in org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStriped.spaceConsumed()  At BlockInfoStriped.java:[line 208] |
| | Possible null pointer dereference of arr$ in org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStripedUnderConstruction.initializeBlockRecovery(long)  Dereferenced at BlockInfoStripedUnderConstruction.java:[line 206] |
| | Found reliance on default encoding in org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.createErasureCodingZone(String, ECSchema): String.getBytes()  At ErasureCodingZoneManager.java:[line 116] |
| | Found reliance on default encoding in org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.getECZoneInfo(INodesInPath): new String(byte[])  At ErasureCodingZoneManager.java:[line 81] |
| | org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$AddECSchemaOp.toString() makes inefficient use of keySet iterator instead of entrySet iterator  At FSEditLogOp.java:[line 4552] |
| | org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$ModifyECSchemaOp.toString() makes inefficient use of keySet iterator instead of entrySet iterator  At FSEditLogOp.java:[line 4624] |
| | org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.writeECSchema(DataOutputStream, ECSchema) makes inefficient use of keySet iterator instead of entrySet iterator  At FSImageSerialization.java:[line 792] |
| | Result of integer multiplication cast to long in org.apache.hadoop.hdfs.util.StripedBlockUtil.constructInternalBlock(LocatedStripedBlock, int, int, int, int)  At StripedBlockUtil.java:
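
Three of the warning kinds in the report above (integer multiplication cast to long, reliance on default encoding, and keySet iteration) have well-known remedies in plain Java. The sketch below shows them in isolation. It is not code from the patch: the class, method, and parameter names are hypothetical, and UTF-8 is only an assumed choice of charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical examples of common fixes; not code from the HDFS-7859 patch. */
final class FindbugsPatternsSketch {
    // "Integer multiplication cast to long": widen one operand first, so the
    // product is computed in 64 bits instead of overflowing in 32 bits.
    static long totalBytes(int numBlocks, int blockSize) {
        return (long) numBlocks * blockSize;
    }

    // "Reliance on default encoding": name the charset explicitly rather
    // than depending on the platform default.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // "keySet iterator instead of entrySet iterator": iterate entries to
    // avoid an extra map lookup for every key.
    static String describe(Map<String, String> options) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : options.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return sb.toString();
    }
}
```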

[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode

2015-04-16 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498895#comment-14498895
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7859:
---

HDFS-8062 does not require this since the default schema can be hard coded.


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode

2015-04-16 Thread Kai Zheng (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498847#comment-14498847
 ] 

Kai Zheng commented on HDFS-7859:
-

[~szetszwo] I haven't had time to sort out the complete list yet, but I think 
HDFS-8062 would be one.


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode

2015-04-15 Thread Kai Zheng (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497306#comment-14497306
]

Kai Zheng commented on HDFS-7859:
-

[~szetszwo],
bq.Since we don't yet support add/delete/update/rename schema operations,
we don't need to persist anything in NN at this moment. We will support some of
these schema operations down the road. We may persist schemas at that time.
Sound good?
Please note it's not true that we don't need to persist anything in NN at this
moment. We have already persisted some hard-coded values that should be
covered by a schema in the image. Without this, we will definitely need to
revisit the image format some time later. As I said above, the schema
definition is flexible enough that if we persist the whole schema object in the
image, we would not likely need to change the image later. Please note this
issue blocks many subsequent issues, and I think we still have enough time for
them before the merge happens.


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode

2015-04-15 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497447#comment-14497447
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7859:
---

 ... We had already persisted some hard-coded values that should be covered by 
 a schema ...

What do you mean?  Could you give an example?

 ... Please note this issue blocks many subsequent issues and I thought we 
 still have enough time for them right before the merge happening.

What are the subsequent issues?


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode

2015-04-15 Thread Kai Zheng (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497485#comment-14497485
 ] 

Kai Zheng commented on HDFS-7859:
-

bq.What do you mean? Could you give an example?
Well, my last statement was bad and inaccurate. After double-checking the 
related code, I saw that only striped blocks derived from the following 
hard-coded values are persisted in the image. So please ignore that statement. 
bq.What are the subsequent issues?
We do have some and will sort them out later. I have opened HDFS-8156 to 
resolve some dependencies caused by HDFS-7866, originally planned to be done 
here.


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode

2015-04-15 Thread Kai Zheng (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495853#comment-14495853
 ] 

Kai Zheng commented on HDFS-7859:
-

Hi [~szetszwo],

Per your request I updated the doc in HDFS-7337 accordingly. I entirely 
rewrote the schema section; it mainly reflects existing related discussions and 
even implementations. I hope it addresses your questions here well. Your 
further comments and questions are very welcome. Thanks in advance!


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode


[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493702#comment-14493702
 ] 

Xinwei Qin  commented on HDFS-7859:
---

OK, I will track it.


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode


[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493695#comment-14493695
 ] 

Kai Zheng commented on HDFS-7859:
-

Note I have updated the patch in HDFS-7866 to align with this. Once it gets 
in, this one can be rebased and committed.


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode


[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493699#comment-14493699
 ] 

Xinwei Qin  commented on HDFS-7859:
---

[~drankye], thanks for your comments.
{quote}
1. Looks like this couples with HDFS-7866. Maybe I could commit HDFS-7866 first 
and then this gets all the left work done. Will it work for you this way?
{quote}
Yes, committing HDFS-7866 first is better.
bq. 2. What methods can ECSchemaManager call to make it happen? 
Some methods like {{logAddECSchema()}} in {{FSEditLog.java}} are missing; I 
will add them in the next patch.
bq. 3. In ECSchemaManager, new methods like addECSchema are not necessarily 
public.
I will change them to package-private ("friendly") access.
bq. 4. Are we supporting the two formats? Please add Javadoc to explain them, 
thanks.
Yes, two formats are supported. These methods are only called during 
NameNode startup or when doing a checkpoint, and which method is called depends 
on the FSImage format. I will add detailed Javadoc on them.
bq.  5. Would you have separate issue(s) for the following?
I will create a new issue for it.



[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode

[
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495498#comment-14495498
]

Tsz Wo Nicholas Sze commented on HDFS-7859:
---

The patch under this JIRA handles saving / loading these default schemas in
fsimage. I think this is necessary even without loading custom schemas from
XML. Otherwise we cannot guarantee the NameNode which loads the fsimage has
the same default schemas as the NameNode which saved it. It is obviously even
more necessary when we add custom schemas ...

I think we should not persist anything to NN before we have a clear design
since we don't know what to persist. For example, should we persist schema ID?
We are not able to answer this question since we don't even know if a schema
should have an ID.

If we change the layout later on, it requires cluster upgrade for the new
layout and we have to support the old layout for backward compatibility.

For now, I suggest we just hard code the only (6,3)-Reed-Solomon schema. We
don't even need the xml file.


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode


[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495546#comment-14495546
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7859:
---

 ... schema name for the ID purpose. ...

There are a few choices:
# Using schema name as ID
# A schema name and a separate numeric ID
# Multiple schema names and a numeric ID

Why use #1?


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode


[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495553#comment-14495553
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7859:
---

 ... We would persist the whole schema object ...

How can we be sure that the schema object format won't change?

Since we don't yet support add/delete/update/rename schema operations, we 
don't need to persist anything in NN at this moment.  We will support some of 
these schema operations down the road.  We may persist schemas at that time.  
Sound good?


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode

2015-04-14 Thread Zhe Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495442#comment-14495442
 ] 

Zhe Zhang commented on HDFS-7859:
-

[~szetszwo] / [~drankye]: The [phasing plan | 
https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14391207&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14391207]
 I posted might be a little confusing with regard to schemas. My apologies.

In the offline meetup on 03/31, we didn't reach a clear conclusion on how much 
of schema work to include before merging. Therefore I left it in phase I, but 
marked it as optional. My thought was that we could make a better decision 
after observing how fast the work could proceed. Up to this point I think this 
thread is going pretty well and it seems we can have a multi-schema 
implementation when other HDFS-7285 tasks are done (see details below).

Good [questions | 
https://issues.apache.org/jira/browse/HDFS-7859?focusedCommentId=14494933&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14494933]
 on schema design. I think we eventually need to answer them in the broader 
scope of HDFS-7337. IIUC HDFS-7859 / HDFS-7866 are not touching most of the 
tricky scenarios. Based on Kai's latest [comment | 
https://issues.apache.org/jira/browse/HDFS-7866?focusedCommentId=14494050&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14494050],
 HDFS-7866 will mostly handle _default_ schemas embedded in the 
{{ECSchemaManager}} code. 

The patch under this JIRA handles saving / loading these default schemas in 
fsimage. I think this is necessary even without loading custom schemas from 
XML. Otherwise we cannot guarantee the NameNode which loads the fsimage has the 
same default schemas as the NameNode which saved it. It is obviously even more 
necessary when we add custom schemas. The logic in the patch is quite 
straightforward; it's mostly about serialize / deserialize schemas.
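
As a rough illustration of what "serialize / deserialize schemas" can look like, here is a minimal Java sketch that writes a schema-like object to a byte stream and reads it back. It is not the patch's code and not the real fsimage format: the class {{SchemaSketch}}, its fields (name, codec, data units, parity units, an options map), and the field order on the wire are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for an EC schema; not the real ECSchema class. */
final class SchemaSketch {
    final String name;
    final String codec;
    final int dataUnits;
    final int parityUnits;
    final Map<String, String> options;

    SchemaSketch(String name, String codec, int dataUnits, int parityUnits,
                 Map<String, String> options) {
        this.name = name;
        this.codec = codec;
        this.dataUnits = dataUnits;
        this.parityUnits = parityUnits;
        this.options = new LinkedHashMap<>(options);
    }

    /** Writes the fields in a fixed, assumed order, then the options as key/value pairs. */
    void write(DataOutputStream out) throws IOException {
        out.writeUTF(name);
        out.writeUTF(codec);
        out.writeInt(dataUnits);
        out.writeInt(parityUnits);
        out.writeInt(options.size());
        for (Map.Entry<String, String> e : options.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
    }

    /** Reads the fields back in the same order that write() produced them. */
    static SchemaSketch read(DataInputStream in) throws IOException {
        String name = in.readUTF();
        String codec = in.readUTF();
        int dataUnits = in.readInt();
        int parityUnits = in.readInt();
        int count = in.readInt();
        Map<String, String> options = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            String key = in.readUTF();
            options.put(key, in.readUTF());
        }
        return new SchemaSketch(name, codec, dataUnits, parityUnits, options);
    }

    /** Convenience for the example: serialize to memory and load it again. */
    static SchemaSketch roundTrip(SchemaSketch schema) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buf)) {
                schema.write(out);
            }
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(buf.toByteArray()))) {
                return read(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the reader and writer must agree on the field order, any later change to the fields is a format change, which is the compatibility concern raised elsewhere in this thread.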

So here's my proposal:
# Shrink this patch by removing the logic for modifying and removing schemas 
({{ECSchemaManager#modifyECSchema}} and {{OP_MODIFY_EC_SCHEMA}}). 
# Repurpose HDFS-7866 to focus on loading custom schemas from site xml files.

[~szetszwo], [~drankye], [~vinayrpet]: let me know if you agree with the above. 
If we are all synced on this, how about moving this JIRA back to HDFS-7285 and 
keeping HDFS-7866 under HDFS-8031?


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode


[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495556#comment-14495556
 ] 

Kai Zheng commented on HDFS-7859:
-

bq.Using schema name as ID
As we would not make it heavy, we don't have a field like {{description}} for 
an {{ECSchema}}, so a friendly name like {{RS-6-3}} makes more sense than a 
numeric ID. Users should clearly understand a schema before using it to create 
any zone, and the name will help with identifying it. 
bq.We don't even need the xml file.
Yeah, if we instead defined a schema through a command by specifying the 
schema parameters, that would also be OK. I don't have a strong preference 
about that. Any file format, or even not using a file at all, would also work I 
guess. We talked about this in the meetup, and it looks like the XML file was 
agreed on.


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode


[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14495493#comment-14495493
 ] 

Kai Zheng commented on HDFS-7859:
-

Hi [~zhz],

Thanks for taking care of this and for your good suggestion. It looks 
reasonable to me and sounds like a more solid base for the merge. 

To summarize further:
1. This issue, HDFS-7859, would provide two system-defined schemas in Java code: 
the system default schema (rs-6-3), which is already there, and a new one, 
suggested to be rs-10-4. It also ensures that the two schemas are persisted in 
the image/editlog for later querying.
2. The remaining gaps will be handled as follow-on work in HDFS-7866, mainly 
about how to customize site-specific schemas through an XML file. The design 
will also be updated.
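
The two system-defined schemas from the summary above could be sketched roughly as follows. This is only an illustration under assumptions: {{EcSchemaEntry}} and {{SystemSchemas}} are hypothetical names, not classes from the patch, and the upper-case spelling of the schema names is chosen arbitrarily.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified schema record; NOT the real ECSchema class.
final class EcSchemaEntry {
  final String name;
  final int dataUnits;
  final int parityUnits;

  EcSchemaEntry(String name, int dataUnits, int parityUnits) {
    this.name = name;
    this.dataUnits = dataUnits;
    this.parityUnits = parityUnits;
  }
}

// Hypothetical holder for the schemas that are defined in code.
final class SystemSchemas {
  static final String DEFAULT_NAME = "RS-6-3";

  private static final Map<String, EcSchemaEntry> BUILT_IN;
  static {
    Map<String, EcSchemaEntry> m = new LinkedHashMap<>();
    m.put("RS-6-3", new EcSchemaEntry("RS-6-3", 6, 3));     // system default
    m.put("RS-10-4", new EcSchemaEntry("RS-10-4", 10, 4));  // suggested second schema
    BUILT_IN = Collections.unmodifiableMap(m);
  }

  // Schemas are looked up by name; returns null for an unknown name.
  static EcSchemaEntry byName(String name) {
    return BUILT_IN.get(name);
  }

  static EcSchemaEntry getDefault() {
    return BUILT_IN.get(DEFAULT_NAME);
  }

  static Map<String, EcSchemaEntry> all() {
    return BUILT_IN;
  }
}
```

Keeping the map read-only reflects the reduced scope discussed here: no modification or removal of schemas, only lookup by name.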


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode


[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14495517#comment-14495517
 ] 

Kai Zheng commented on HDFS-7859:
-

bq.we don't know what to persist. For example, should we persist schema ID? We 
are not able to answer this question since we don't even know if a schema 
should have an ID.
That's not true. We have {{ECSchema}} defined, and it uses the schema name as 
the ID. We would persist the whole schema object. Although the ongoing work 
isn't reflected in the design doc, we did do it following our related 
discussion. In the meetup with [~zhz] and [~jingzhao], we already covered this 
aspect and even your questions. It's my mistake that I didn't write it down 
clearly and update the doc accordingly. 


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode


[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14495560#comment-14495560
 ] 

Kai Zheng commented on HDFS-7859:
-

bq.How can we be sure that the schema object format won't change?
Good question. In the {{ECSchema}} class, in addition to the common parameters 
widely used by typical erasure codecs, an {{options}} map is also included, so 
potentially any complex codec can use it to hold its own specific parameters 
as key-value pairs. Such parameters are left to the corresponding erasure 
coders to interpret. We try to make it flexible enough to avoid such a change, 
but in case a change is needed anyway, I think it's supported through the image 
layout version.
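
As a rough illustration of the idea, the sketch below pairs a few common parameters with a free-form options map. It is an assumption-laden example, not the actual {{ECSchema}} code: the class name {{FlexibleSchema}}, the codec name {{examplecodec}}, and the option key {{exampleCodec.groupSize}} are all hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a schema with an extensible options map;
// NOT the real ECSchema class.
final class FlexibleSchema {
  final String name;
  final String codec;
  final int dataUnits;
  final int parityUnits;
  private final Map<String, String> options;

  FlexibleSchema(String name, String codec, int dataUnits, int parityUnits,
                 Map<String, String> options) {
    this.name = name;
    this.codec = codec;
    this.dataUnits = dataUnits;
    this.parityUnits = parityUnits;
    // Defensive copy so later changes to the caller's map do not leak in.
    this.options = Collections.unmodifiableMap(new HashMap<>(options));
  }

  // Codec-specific parameters are opaque strings here; the coder that
  // understands the key is expected to interpret the value.
  String option(String key, String defaultValue) {
    return options.getOrDefault(key, defaultValue);
  }
}

final class FlexibleSchemaDemo {
  public static void main(String[] args) {
    Map<String, String> opts = new HashMap<>();
    opts.put("exampleCodec.groupSize", "2");  // hypothetical codec-specific key
    FlexibleSchema s = new FlexibleSchema("EXAMPLE-6-3", "examplecodec", 6, 3, opts);
    System.out.println(s.option("exampleCodec.groupSize", "1"));
    System.out.println(s.option("missing.key", "fallback"));
  }
}
```

Because unknown keys simply sit in the map, a new codec can carry extra parameters without changing the set of fixed fields.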


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode

[
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14494933#comment-14494933
]

Tsz Wo Nicholas Sze commented on HDFS-7859:
---

I have the following questions:
- How to add a schema? Using a command?
- Then, is it possible to delete a schema? It seems that we have to support
deletion since a schema may be created by mistake or there could be typos when
creating a schema.
- If deletion is supported, what to do with the existing files with that schema?
- Do we support renaming schema?
- Does an EC schema have a schema ID?

I think we need a design for EC schema to answer all these questions and
specify what operations are supported.

BTW, we only support one schema (6,3)-Reed-Solomon in the first phase
HDFS-7285. I think we should focus on finishing a complete, working basic EC
feature and get HDFS-7285 merged to trunk. How about moving this JIRA and
related JIRAs to HDFS-8031 and deferring the work? Sorry for commenting on this
late, and thanks for all the good work.


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode


[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14495040#comment-14495040
 ] 

Kai Zheng commented on HDFS-7859:
-

Hi [~szetszwo], thanks for taking care of this.
These questions were considered in the related work. The overall design and 
discussion are in HDFS-7337; would you take a look at it? Let's discuss further 
there. I will sort out the latest discussions and clearly answer your questions 
there. Thanks. 


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode


[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14495049#comment-14495049
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7859:
---

 ... The overall design and discussion are in HDFS-7337, would you take a look 
 at it. ...

Yes, I looked at it earlier but it did not answer my questions.  Since 
HDFS-7337 is already under HDFS-8031, let's move all the related work to 
HDFS-8031.


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode


[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14495059#comment-14495059
 ] 

Kai Zheng commented on HDFS-7859:
-

HDFS-7337 is rather large, so we're implementing its related tasks 
incrementally. In your view, what is the difficulty that makes this sub-task 
hard to include in the merge?


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode