[jira] [Commented] (HDFS-8137) Sends the EC schema to DataNode as well in EC encoding/recovering command

2015-05-04 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527762#comment-14527762
 ] 

Kai Zheng commented on HDFS-8137:
-

Thanks Uma for the update. The patch looks great. +1

> Sends the EC schema to DataNode as well in EC encoding/recovering command
> -
>
> Key: HDFS-8137
> URL: https://issues.apache.org/jira/browse/HDFS-8137
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-8137-0.patch, HDFS-8137-1.patch, HDFS-8137-2.patch
>
>
> As discussed with [~umamaheswararao] and [~vinayrpet], we should also send the 
> EC schema to the DataNode, contained in the EC encoding/recovering command. 
> The target DataNode will use it to guide the execution of the task. 
> Alternatively, the DataNode could actively request the schema through a 
> separate RPC call and, as an optimization, cache schemas to avoid 
> repeatedly asking for the same schema.
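
The alternative mentioned in the description (the DataNode requests a schema on demand and caches it) could look roughly like the sketch below. This is only an illustration under stated assumptions: `SketchSchema`, `SketchSchemaCache`, and the `fetcher` function standing in for the RPC to the NameNode are hypothetical names, not HDFS classes or APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical, simplified stand-in for an EC schema (not the real HDFS class). */
final class SketchSchema {
  final String name;
  final int numDataUnits;
  final int numParityUnits;

  SketchSchema(String name, int numDataUnits, int numParityUnits) {
    this.name = name;
    this.numDataUnits = numDataUnits;
    this.numParityUnits = numParityUnits;
  }
}

/** Hypothetical DataNode-side cache: a cached schema is not fetched again. */
final class SketchSchemaCache {
  private final Map<String, SketchSchema> cache = new ConcurrentHashMap<>();
  // Assumption: this function stands in for a separate RPC call to the NameNode.
  private final Function<String, SketchSchema> fetcher;
  private final AtomicInteger fetchCount = new AtomicInteger();

  SketchSchemaCache(Function<String, SketchSchema> fetcher) {
    this.fetcher = fetcher;
  }

  /** Returns the cached schema, fetching it only when it is not cached yet. */
  SketchSchema get(String schemaName) {
    return cache.computeIfAbsent(schemaName, n -> {
      fetchCount.incrementAndGet();
      return fetcher.apply(n);
    });
  }

  /** Number of times the fetcher was actually invoked. */
  int fetchCount() {
    return fetchCount.get();
  }
}
```

Sending the schema inside the command, as this issue proposes, avoids the extra round trip altogether; the cache only matters for the RPC-based alternative.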



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8137) Sends the EC schema to DataNode as well in EC encoding/recovering command

2015-05-04 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526509#comment-14526509
 ] 

Kai Zheng commented on HDFS-8137:
-

Thanks for your update, Uma. The patch looks good; just two minor comments:
* Regarding the following code, I'm not very sure about the log output. Would 
it be better as:
{code}
blockLog.warn("Failed to get the ECSchema for the file {} ", filename_of_the_block);
...
blockLog.warn("No EC Schema found for the file {} ", filename_of_the_block);
{code}
{code}
+ECSchema ecSchema = null;
+try {
+  ecSchema = namesystem.getECSchemaForPath(block
+  .getBlockCollection().getName());
+} catch (IOException e) {
+  blockLog.warn(
+  "Failed to get the ECSchema for the blockGroup {} ", block);
+}
+if (ecSchema == null) {
+  blockLog.warn("No EC Schema found for the blockGroup {} , "
+  + "so ignoring the block group for EC", block);
+  // TODO: we may have to revisit later for what we can do better to
+  // handle this case.
+  continue;
+}
{code}
* In the code below, it might be better to compare ecSchema1 and ecSchema2 
against the system default schema in use, rather than only with each other.
{code}
+ECSchema ecSchema1 = blkECRecoveryInfo1.getECSchema();
+ECSchema ecSchema2 = blkECRecoveryInfo2.getECSchema();
+assertEquals(ecSchema1.getSchemaName(), ecSchema2.getSchemaName());
+assertEquals(ecSchema1.getNumDataUnits(), ecSchema2.getNumDataUnits());
+assertEquals(ecSchema1.getNumParityUnits(), ecSchema2.getNumParityUnits());
{code}
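
A rough sketch of what comparing against the default schema could look like in a test follows. It assumes the test can obtain the system default schema somehow (the thread does not show how), and `DemoSchema` and `SchemaChecks` are hypothetical stand-ins rather than the real ECSchema class or any HDFS test utility.

```java
/** Hypothetical stand-in exposing the same three getters used in the patch. */
final class DemoSchema {
  private final String schemaName;
  private final int numDataUnits;
  private final int numParityUnits;

  DemoSchema(String schemaName, int numDataUnits, int numParityUnits) {
    this.schemaName = schemaName;
    this.numDataUnits = numDataUnits;
    this.numParityUnits = numParityUnits;
  }

  String getSchemaName() { return schemaName; }
  int getNumDataUnits() { return numDataUnits; }
  int getNumParityUnits() { return numParityUnits; }
}

/** Hypothetical helper: checks a schema against an expected (default) one. */
final class SchemaChecks {
  private SchemaChecks() {}

  static void assertSameSchema(DemoSchema expected, DemoSchema actual) {
    if (!expected.getSchemaName().equals(actual.getSchemaName())) {
      throw new AssertionError("schema name differs: " + actual.getSchemaName());
    }
    if (expected.getNumDataUnits() != actual.getNumDataUnits()) {
      throw new AssertionError("data units differ: " + actual.getNumDataUnits());
    }
    if (expected.getNumParityUnits() != actual.getNumParityUnits()) {
      throw new AssertionError("parity units differ: " + actual.getNumParityUnits());
    }
  }
}
```

Checking each recovered schema against the default catches the case where both schemas are wrong in the same way, which a comparison of ecSchema1 with ecSchema2 alone would miss.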



[jira] [Commented] (HDFS-8137) Sends the EC schema to DataNode as well in EC encoding/recovering command

2015-04-30 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521585#comment-14521585
 ] 

Uma Maheswara Rao G commented on HDFS-8137:
---

Thanks a lot for the review, Kai. 
Good catch. You are right, we are storing it in xattrs along with the zone. 

{quote}
ECSchemaManager might not be supposed to get a schema associated with a zone, 
dir/file, but ErasureCodingZoneManager may do.
{quote}
I said ECSchemaManager by mistake. You are right, I should have said 
ErasureCodingZoneManager, as it has the related code I was talking about.

Also, I added the getECSchema API in the namesystem itself, as we have already 
added some ECSchema-related APIs in FSNamesystem. Keeping this new API in the 
namesystem gives us the flexibility to reuse code from ECZoneManager; we cannot 
get the same flexibility from BlockCollection, as we cannot access FSDirectory 
details there. 

Please check whether the latest patch makes sense to you.




[jira] [Commented] (HDFS-8137) Sends the EC schema to DataNode as well in EC encoding/recovering command

2015-04-30 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521460#comment-14521460
 ] 

Kai Zheng commented on HDFS-8137:
-

Hi Uma,
bq. We are supposed to get schema values from ECSchemaManager, but right now I 
don't see a better way to get them from ECSchemaManager, so I added an API to 
get them from BlockCollection itself, like the isStriped API in it.
{{ECSchemaManager}} might not be supposed to get a schema associated with a 
zone, dir/file, but {{ErasureCodingZoneManager}} may do. We could query the 
schema info from a zone using {{ErasureCodingZoneManager}}. I thought it's good 
to add the method {{getECSchema}} along with the existing method {{isStriped}}, 
as it's essential to erasure coded files.
A quick look at the patch suggests it might need to align with some of the 
latest changes regarding how to get the schema from a zone/dir/xattr. Would you 
double check? Thanks.
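
To illustrate the idea of querying the schema from a zone, here is a minimal sketch. It assumes a schema name is recorded on each zone root directory; a plain in-memory map stands in for the xattr storage, and `ZoneSchemaLookupSketch` with its methods is entirely hypothetical, not the actual ErasureCodingZoneManager.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: resolve a path's schema via its nearest enclosing zone. */
final class ZoneSchemaLookupSketch {
  // Assumption: this map stands in for the schema stored with each zone root.
  private final Map<String, String> zoneRootToSchema = new HashMap<>();

  void createZone(String zoneRoot, String schemaName) {
    zoneRootToSchema.put(zoneRoot, schemaName);
  }

  /** Walks up from the path to the root; returns null if no zone encloses it. */
  String schemaForPath(String path) {
    String current = path;
    while (current != null && !current.isEmpty()) {
      String schema = zoneRootToSchema.get(current);
      if (schema != null) {
        return schema;
      }
      if (current.equals("/")) {
        return null; // reached the root without finding a zone
      }
      int slash = current.lastIndexOf('/');
      if (slash < 0) {
        return null; // not an absolute path
      }
      current = (slash == 0) ? "/" : current.substring(0, slash);
    }
    return null;
  }
}
```

A caller would still need to handle the null result, for example by logging a warning and skipping the block group.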



[jira] [Commented] (HDFS-8137) Sends the EC schema to DataNode as well in EC encoding/recovering command

2015-04-30 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521123#comment-14521123
 ] 

Kai Zheng commented on HDFS-8137:
-

Uma, thanks for the patch and the good comments. I'd like to look at this and 
give my thoughts later today.
