[jira] [Created] (HADOOP-18238) Hadoop 3.3.1 SFTPFileSystem.close() method has a problem

2022-05-16 Thread yi liu (Jira)
yi liu created HADOOP-18238:
---

 Summary: Hadoop 3.3.1 SFTPFileSystem.close() method has a problem
 Key: HADOOP-18238
 URL: https://issues.apache.org/jira/browse/HADOOP-18238
 Project: Hadoop Common
  Issue Type: Bug
  Components: common
Affects Versions: 3.3.1
Reporter: yi liu


{code}
@Override
public void close() throws IOException {
  if (closed.getAndSet(true)) {
    return;
  }
  try {
    super.close();
  } finally {
    if (connectionPool != null) {
      connectionPool.shutdown();
    }
  }
}
{code}

 

If you call this method, the fs can no longer execute deleteOnExit, because the
fs is closed.

If close() is called manually so that the SFTP fs shuts down its connection pool
and the JVM can exit normally, deleteOnExit will fail because the fs is already
closed. If close() is not called, the connection pool is never released and the
JVM cannot exit.
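
A minimal sketch of the scenario (the SFTP URI and path below are placeholders,
not taken from this report):

{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SftpCloseScenario {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder URI and path, for illustration only.
    FileSystem fs = FileSystem.get(URI.create("sftp://example-host/"), conf);
    Path tmp = new Path("/tmp/example-temp-file");

    fs.deleteOnExit(tmp);  // schedule tmp for deletion

    // As reported: closing manually shuts down the connection pool so the
    // JVM can exit, but deleteOnExit can no longer succeed because the fs
    // is already closed; skipping close() leaves the pool unreleased and
    // the JVM cannot exit.
    fs.close();
  }
}
{code}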

https://issues.apache.org/jira/browse/HADOOP-17528 is the corresponding SFTPFileSystem issue in 3.2.0.

 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)




[jira] [Commented] (HADOOP-13184) Add "Apache" to Hadoop project logo

2016-06-07 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320013#comment-15320013
 ] 

Yi Liu commented on HADOOP-13184:
-

option 1 is more beautiful, +1.

> Add "Apache" to Hadoop project logo
> ---
>
> Key: HADOOP-13184
> URL: https://issues.apache.org/jira/browse/HADOOP-13184
> Project: Hadoop Common
>  Issue Type: Task
>Reporter: Chris Douglas
>Assignee: Abhishek
>
> Many ASF projects include "Apache" in their logo. We should add it to Hadoop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Commented] (HADOOP-12756) Incorporate Aliyun OSS file system implementation

2016-05-27 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303697#comment-15303697
 ] 

Yi Liu commented on HADOOP-12756:
-

{quote}
 I'd recommend a wider conversation on the dev mailing lists before filing any 
specific requests to infra.
{quote}
+1 for this.

Another thing about "auth-keys.xml": currently we use the credential file 
instead of a normal Hadoop configuration property. I think the reason is that 
it is more secure and the user can control the Linux file permissions of 
"auth-keys.xml". Could we also allow a normal Hadoop configuration property for 
the credentials? Then we could specify the credentials through the mvn build 
command line, which would be easier for INFRA to support, while users could 
still use "auth-keys.xml" in practice. 
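
For example (a rough sketch only; the property names "test.oss.access.key" and 
"test.oss.secret.key" are hypothetical, not existing keys), the test setup could 
read a normal configuration property, which "-D" on the mvn command line can 
supply (forwarded to the test JVM), and fall back to the auth-keys.xml resource:

{code}
// Rough sketch; the property names and fallback order are hypothetical.
Configuration conf = new Configuration();
conf.addResource("auth-keys.xml");  // keep supporting the existing credential file

// -Dtest.oss.access.key=... / -Dtest.oss.secret.key=... from the mvn command
// line would take precedence over auth-keys.xml here.
String accessKey = System.getProperty("test.oss.access.key",
    conf.get("test.oss.access.key"));
String secretKey = System.getProperty("test.oss.secret.key",
    conf.get("test.oss.secret.key"));

// Skip the live tests when no credentials are configured at all.
org.junit.Assume.assumeTrue("OSS credentials not configured",
    accessKey != null && secretKey != null);
{code}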



> Incorporate Aliyun OSS file system implementation
> -
>
> Key: HADOOP-12756
> URL: https://issues.apache.org/jira/browse/HADOOP-12756
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: shimingfei
>Assignee: shimingfei
> Attachments: HADOOP-12756-v02.patch, HCFS User manual.md, OSS 
> integration.pdf, OSS integration.pdf
>
>
> Aliyun OSS is widely used among China’s cloud users, but currently it is not 
> easy to access data laid on OSS storage from user’s Hadoop/Spark application, 
> because of no original support for OSS in Hadoop.
> This work aims to integrate Aliyun OSS with Hadoop. By simple configuration, 
> Spark/Hadoop applications can read/write data from OSS without any code 
> change. Narrowing the gap between user’s APP and data storage, like what have 
> been done for S3 in Hadoop 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Commented] (HADOOP-12756) Incorporate Aliyun OSS file system implementation

2016-05-27 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303667#comment-15303667
 ] 

Yi Liu commented on HADOOP-12756:
-

Agree with [~cnauroth].  The credentials need to go somewhere accessible by 
each Jenkins host that runs a Hadoop pre-commit build.   

{code}
have a dedicated host (or vm) equipped with all these credentials and run all 
the tests daily
{code}
Kai, I think it's not about finding a dedicated host; instead, we need to make 
the auth-keys.xml available on all the Jenkins hosts that run Hadoop pre-commit 
builds.  Not sure whether it's easy for INFRA to support this.

{code}
It seems these two files should not be included in source code, as what 
.gitingore has excluded. Maybe we can provide these two files separately?
{code}
[~lingzhou], please don't add the credentials to the patch; they are not expected there.

> Incorporate Aliyun OSS file system implementation
> -
>
> Key: HADOOP-12756
> URL: https://issues.apache.org/jira/browse/HADOOP-12756
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: shimingfei
>Assignee: shimingfei
> Attachments: HADOOP-12756-v02.patch, HCFS User manual.md, OSS 
> integration.pdf, OSS integration.pdf
>
>
> Aliyun OSS is widely used among China’s cloud users, but currently it is not 
> easy to access data laid on OSS storage from user’s Hadoop/Spark application, 
> because of no original support for OSS in Hadoop.
> This work aims to integrate Aliyun OSS with Hadoop. By simple configuration, 
> Spark/Hadoop applications can read/write data from OSS without any code 
> change. Narrowing the gap between user’s APP and data storage, like what have 
> been done for S3 in Hadoop 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Commented] (HADOOP-11180) Fix warning of "token.Token: Cannot find class for token kind kms-dt" for KMS when running jobs on Encryption zones

2016-04-27 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261461#comment-15261461
 ] 

Yi Liu commented on HADOOP-11180:
-

Sure, thanks Andrew and Steve.  Here the log level change should be safe.

> Fix warning of "token.Token: Cannot find class for token kind kms-dt" for KMS 
> when running jobs on Encryption zones
> ---
>
> Key: HADOOP-11180
> URL: https://issues.apache.org/jira/browse/HADOOP-11180
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: kms, security
>Affects Versions: 2.6.0
>Reporter: Yi Liu
>Assignee: Yi Liu
>  Labels: BB2015-05-TBR
> Attachments: HADOOP-11180.001.patch
>
>
> This issue is produced when running MapReduce job and encryption zones are 
> configured.
> {quote}
> 14/10/09 05:06:02 INFO security.TokenCache: Got dt for 
> hdfs://hnode1.sh.intel.com:9000; Kind: HDFS_DELEGATION_TOKEN, Service: 
> 10.239.47.8:9000, Ident: (HDFS_DELEGATION_TOKEN token 21 for user)
> 14/10/09 05:06:02 WARN token.Token: Cannot find class for token kind kms-dt
> 14/10/09 05:06:02 INFO security.TokenCache: Got dt for 
> hdfs://hnode1.sh.intel.com:9000; Kind: kms-dt, Service: 10.239.47.8:16000, 
> Ident: 00 04 75 73 65 72 04 79 61 72 6e 00 8a 01 48 f1 8e 85 07 8a 01 49 15 
> 9b 09 07 04 02
> 14/10/09 05:06:03 INFO input.FileInputFormat: Total input paths to process : 1
> 14/10/09 05:06:03 INFO mapreduce.JobSubmitter: number of splits:1
> 14/10/09 05:06:03 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
> job_141272197_0004
> 14/10/09 05:06:03 WARN token.Token: Cannot find class for token kind kms-dt
> 14/10/09 05:06:03 WARN token.Token: Cannot find class for token kind kms-dt
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12756) Incorporate Aliyun OSS file system implementation

2016-04-25 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255941#comment-15255941
 ] 

Yi Liu commented on HADOOP-12756:
-

Also, the name "oss" is an abbreviation of Object Store Service, which is too 
generic. I think we need to change it to ali-oss or some other name that people 
can understand at first glance. 

> Incorporate Aliyun OSS file system implementation
> -
>
> Key: HADOOP-12756
> URL: https://issues.apache.org/jira/browse/HADOOP-12756
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Reporter: shimingfei
>Assignee: shimingfei
> Attachments: 0001-OSS-filesystem-integration-with-Hadoop.patch, HCFS 
> User manual.md, OSS integration.pdf, OSS integration.pdf
>
>
> Aliyun OSS is widely used among China’s cloud users, but currently it is not 
> easy to access data laid on OSS storage from user’s Hadoop/Spark application, 
> because of no original support for OSS in Hadoop.
> This work aims to integrate Aliyun OSS with Hadoop. By simple configuration, 
> Spark/Hadoop applications can read/write data from OSS without any code 
> change. Narrowing the gap between user’s APP and data storage, like what have 
> been done for S3 in Hadoop 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12756) Incorporate Aliyun OSS file system implementation

2016-04-24 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255929#comment-15255929
 ] 

Yi Liu commented on HADOOP-12756:
-

Thanks to Mingfei and Lei for the work. 

Hi [~ste...@apache.org] and [~cnauroth], regarding testability, they talked with 
me offline: Aliyun created a test account and retained it for Hadoop, and they 
want to pass the username/password through "-D" on the mvn command line, so the 
basic functionality can be verified by unit tests.  Does this make sense to you?

Mingfei and Lei:
About the ali-oss client, does it rely on a different version of httpclient? 
Could we use the version that Hadoop is already using?

I will post my detailed comments later.



> Incorporate Aliyun OSS file system implementation
> -
>
> Key: HADOOP-12756
> URL: https://issues.apache.org/jira/browse/HADOOP-12756
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Reporter: shimingfei
>Assignee: shimingfei
> Attachments: 0001-OSS-filesystem-integration-with-Hadoop.patch, HCFS 
> User manual.md, OSS integration.pdf, OSS integration.pdf
>
>
> Aliyun OSS is widely used among China’s cloud users, but currently it is not 
> easy to access data laid on OSS storage from user’s Hadoop/Spark application, 
> because of no original support for OSS in Hadoop.
> This work aims to integrate Aliyun OSS with Hadoop. By simple configuration, 
> Spark/Hadoop applications can read/write data from OSS without any code 
> change. Narrowing the gap between user’s APP and data storage, like what have 
> been done for S3 in Hadoop 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12040) Adjust inputs order for the decode API in raw erasure coder

2015-10-28 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977915#comment-14977915
 ] 

Yi Liu commented on HADOOP-12040:
-

Will commit shortly

> Adjust inputs order for the decode API in raw erasure coder
> ---
>
> Key: HADOOP-12040
> URL: https://issues.apache.org/jira/browse/HADOOP-12040
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Attachments: HADOOP-12040-HDFS-7285-v1.patch, HADOOP-12040-v2.patch, 
> HADOOP-12040-v3.patch, HADOOP-12040-v4.patch
>
>
> Currently we used the parity units + data units order for the inputs, 
> erasedIndexes and outputs parameters in the decode call in raw erasure coder, 
> which inherited from HDFS-RAID due to impact enforced by {{GaliosField}}. As 
> [~zhz] pointed and [~hitliuyi] felt, we'd better change the order to make it 
> natural for HDFS usage, where usually data blocks are before parity blocks in 
> a group. Doing this would avoid some reordering tricky logic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-12040) Adjust inputs order for the decode API in raw erasure coder

2015-10-28 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-12040:

        Resolution: Fixed
      Hadoop Flags: Reviewed
     Fix Version/s: 2.8.0
  Target Version/s: 2.8.0
            Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2.

> Adjust inputs order for the decode API in raw erasure coder
> ---
>
> Key: HADOOP-12040
> URL: https://issues.apache.org/jira/browse/HADOOP-12040
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Fix For: 2.8.0
>
> Attachments: HADOOP-12040-HDFS-7285-v1.patch, HADOOP-12040-v2.patch, 
> HADOOP-12040-v3.patch, HADOOP-12040-v4.patch
>
>
> Currently we used the parity units + data units order for the inputs, 
> erasedIndexes and outputs parameters in the decode call in raw erasure coder, 
> which inherited from HDFS-RAID due to impact enforced by {{GaliosField}}. As 
> [~zhz] pointed and [~hitliuyi] felt, we'd better change the order to make it 
> natural for HDFS usage, where usually data blocks are before parity blocks in 
> a group. Doing this would avoid some reordering tricky logic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12040) Adjust inputs order for the decode API in raw erasure coder

2015-10-27 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14976024#comment-14976024
 ] 

Yi Liu commented on HADOOP-12040:
-

Generally looks good, Kai.

1. You need to clean up the checkstyle issues. For example, some lines are 
longer than 80 characters. 
2. Some related tests show failures, such as TestRecoverStripedFile.
3. 
{code}
+for (int i = 0; i < erasedIndexes.length; i++) {
+  if (erasedIndexes[i] >= getNumDataUnits()) {
+    erasedIndexes2[idx++] = erasedIndexes[i] - getNumDataUnits();
+    numErasedParityUnits++;
+  }
+}
+for (int i = 0; i < erasedIndexes.length; i++) {
+  if (erasedIndexes[i] < getNumDataUnits()) {
+    erasedIndexes2[idx++] = erasedIndexes[i] + getNumParityUnits();
+    numErasedDataUnits++;
+  }
+}
{code}
This can be done in a {{for}}.



> Adjust inputs order for the decode API in raw erasure coder
> ---
>
> Key: HADOOP-12040
> URL: https://issues.apache.org/jira/browse/HADOOP-12040
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Attachments: HADOOP-12040-HDFS-7285-v1.patch, HADOOP-12040-v2.patch, 
> HADOOP-12040-v3.patch
>
>
> Currently we used the parity units + data units order for the inputs, 
> erasedIndexes and outputs parameters in the decode call in raw erasure coder, 
> which inherited from HDFS-RAID due to impact enforced by {{GaliosField}}. As 
> [~zhz] pointed and [~hitliuyi] felt, we'd better change the order to make it 
> natural for HDFS usage, where usually data blocks are before parity blocks in 
> a group. Doing this would avoid some reordering tricky logic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12040) Adjust inputs order for the decode API in raw erasure coder

2015-10-27 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977510#comment-14977510
 ] 

Yi Liu commented on HADOOP-12040:
-

For comment #3, it's not convenient to do, so no need to address it.

> Adjust inputs order for the decode API in raw erasure coder
> ---
>
> Key: HADOOP-12040
> URL: https://issues.apache.org/jira/browse/HADOOP-12040
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Attachments: HADOOP-12040-HDFS-7285-v1.patch, HADOOP-12040-v2.patch, 
> HADOOP-12040-v3.patch
>
>
> Currently we used the parity units + data units order for the inputs, 
> erasedIndexes and outputs parameters in the decode call in raw erasure coder, 
> which inherited from HDFS-RAID due to impact enforced by {{GaliosField}}. As 
> [~zhz] pointed and [~hitliuyi] felt, we'd better change the order to make it 
> natural for HDFS usage, where usually data blocks are before parity blocks in 
> a group. Doing this would avoid some reordering tricky logic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12040) Adjust inputs order for the decode API in raw erasure coder

2015-10-27 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977564#comment-14977564
 ] 

Yi Liu commented on HADOOP-12040:
-

+1 pending Jenkins.

> Adjust inputs order for the decode API in raw erasure coder
> ---
>
> Key: HADOOP-12040
> URL: https://issues.apache.org/jira/browse/HADOOP-12040
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Attachments: HADOOP-12040-HDFS-7285-v1.patch, HADOOP-12040-v2.patch, 
> HADOOP-12040-v3.patch, HADOOP-12040-v4.patch
>
>
> Currently we used the parity units + data units order for the inputs, 
> erasedIndexes and outputs parameters in the decode call in raw erasure coder, 
> which inherited from HDFS-RAID due to impact enforced by {{GaliosField}}. As 
> [~zhz] pointed and [~hitliuyi] felt, we'd better change the order to make it 
> natural for HDFS usage, where usually data blocks are before parity blocks in 
> a group. Doing this would avoid some reordering tricky logic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12483) Maintain wrapped SASL ordering for postponed IPC responses

2015-10-18 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962749#comment-14962749
 ] 

Yi Liu commented on HADOOP-12483:
-

+1, looks good to me. Thanks [~daryn], will commit shortly.

> Maintain wrapped SASL ordering for postponed IPC responses
> --
>
> Key: HADOOP-12483
> URL: https://issues.apache.org/jira/browse/HADOOP-12483
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HADOOP-12483.patch
>
>
> A SASL encryption algorithm (wrapping) may have a required ordering for 
> encrypted responses.  The IPC layer encrypts when the response is set based 
> on the assumption it is being immediately sent.  Postponed responses violate 
> that assumption.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-12483) Maintain wrapped SASL ordering for postponed IPC responses

2015-10-18 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-12483:

     Resolution: Fixed
   Hadoop Flags: Reviewed
  Fix Version/s: 2.8.0
         Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2.

> Maintain wrapped SASL ordering for postponed IPC responses
> --
>
> Key: HADOOP-12483
> URL: https://issues.apache.org/jira/browse/HADOOP-12483
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: HADOOP-12483.patch
>
>
> A SASL encryption algorithm (wrapping) may have a required ordering for 
> encrypted responses.  The IPC layer encrypts when the response is set based 
> on the assumption it is being immediately sent.  Postponed responses violate 
> that assumption.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-10300) Allowed deferred sending of call responses

2015-10-12 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-10300:

     Resolution: Fixed
   Hadoop Flags: Reviewed
  Fix Version/s: 2.8.0
         Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2.

> Allowed deferred sending of call responses
> --
>
> Key: HADOOP-10300
> URL: https://issues.apache.org/jira/browse/HADOOP-10300
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: ipc
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>  Labels: BB2015-05-TBR
> Fix For: 2.8.0
>
> Attachments: HADOOP-10300.patch, HADOOP-10300.patch, 
> HADOOP-10300.patch
>
>
> RPC handlers currently do not return until the RPC call completes and 
> response is sent, or a partially sent response has been queued for the 
> responder.  It would be useful for a proxy method to notify the handler to 
> not yet the send the call's response.
> An potential use case is a namespace handler in the NN might want to return 
> before the edit log is synced so it can service more requests and allow 
> increased batching of edits per sync.  Background syncing could later trigger 
> the sending of the call response to the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-10300) Allowed deferred sending of call responses

2015-10-12 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14952778#comment-14952778
 ] 

Yi Liu commented on HADOOP-10300:
-

+1, thanks [~daryn]. 

> Allowed deferred sending of call responses
> --
>
> Key: HADOOP-10300
> URL: https://issues.apache.org/jira/browse/HADOOP-10300
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: ipc
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>  Labels: BB2015-05-TBR
> Attachments: HADOOP-10300.patch, HADOOP-10300.patch, 
> HADOOP-10300.patch
>
>
> RPC handlers currently do not return until the RPC call completes and 
> response is sent, or a partially sent response has been queued for the 
> responder.  It would be useful for a proxy method to notify the handler to 
> not yet the send the call's response.
> An potential use case is a namespace handler in the NN might want to return 
> before the edit log is synced so it can service more requests and allow 
> increased batching of edits per sync.  Background syncing could later trigger 
> the sending of the call response to the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12448) TestTextCommand: use mkdirs rather than mkdir to create test directory

2015-09-29 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936280#comment-14936280
 ] 

Yi Liu commented on HADOOP-12448:
-

+1, thanks [~cmccabe] and [~cnauroth], will commit it shortly.

> TestTextCommand: use mkdirs rather than mkdir to create test directory
> --
>
> Key: HADOOP-12448
> URL: https://issues.apache.org/jira/browse/HADOOP-12448
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HADOOP-12448.001.patch, HADOOP-12448.002.patch
>
>
> TestTextCommand should use mkdirs rather than mkdir to create the test 
> directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-12448) TestTextCommand: use mkdirs rather than mkdir to create test directory

2015-09-29 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-12448:

     Resolution: Fixed
   Hadoop Flags: Reviewed
  Fix Version/s: 2.8.0
         Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2.

> TestTextCommand: use mkdirs rather than mkdir to create test directory
> --
>
> Key: HADOOP-12448
> URL: https://issues.apache.org/jira/browse/HADOOP-12448
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.8.0
>
> Attachments: HADOOP-12448.001.patch, HADOOP-12448.002.patch
>
>
> TestTextCommand should use mkdirs rather than mkdir to create the test 
> directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-12367) Move TestFileUtil's test resources to resources folder

2015-09-01 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-12367:

     Resolution: Fixed
   Hadoop Flags: Reviewed
  Fix Version/s: 2.8.0
         Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2, thanks [~andrew.wang].

> Move TestFileUtil's test resources to resources folder
> --
>
> Key: HADOOP-12367
> URL: https://issues.apache.org/jira/browse/HADOOP-12367
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: HADOOP-12367.001.patch, HADOOP-12367.002.patch
>
>
> Little cleanup. Right now we do an antrun step to copy the tar and tgz from 
> the source folder to target folder. We can skip this by just putting it in 
> the resources folder like all the other test resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12367) Move TestFileUtil's test resources to resources folder

2015-08-31 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724540#comment-14724540
 ] 

Yi Liu commented on HADOOP-12367:
-

+1 pending Jenkins.  Thanks for the cleanup.

> Move TestFileUtil's test resources to resources folder
> --
>
> Key: HADOOP-12367
> URL: https://issues.apache.org/jira/browse/HADOOP-12367
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Minor
> Attachments: HADOOP-12367.001.patch
>
>
> Little cleanup. Right now we do an antrun step to copy the tar and tgz from 
> the source folder to target folder. We can skip this by just putting it in 
> the resources folder like all the other test resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-10300) Allowed deferred sending of call responses

2015-08-18 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702373#comment-14702373
 ] 

Yi Liu commented on HADOOP-10300:
-

Yes, it's OK with me.  The original patch looks good to me.
The trunk has had some changes since then; please rebase it and let me take 
another look. Thanks.

 Allowed deferred sending of call responses
 --

 Key: HADOOP-10300
 URL: https://issues.apache.org/jira/browse/HADOOP-10300
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: ipc
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
  Labels: BB2015-05-TBR
 Attachments: HADOOP-10300.patch, HADOOP-10300.patch


 RPC handlers currently do not return until the RPC call completes and 
 response is sent, or a partially sent response has been queued for the 
 responder.  It would be useful for a proxy method to notify the handler to 
 not yet the send the call's response.
 An potential use case is a namespace handler in the NN might want to return 
 before the edit log is synced so it can service more requests and allow 
 increased batching of edits per sync.  Background syncing could later trigger 
 the sending of the call response to the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12295) Improve NetworkTopology#InnerNode#remove logic

2015-08-13 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14694910#comment-14694910
 ] 

Yi Liu commented on HADOOP-12295:
-

Thanks [~vinayrpet] for the review; committed to trunk and branch-2.  I can 
address the comments if Chris has any, thanks.

 Improve NetworkTopology#InnerNode#remove logic
 --

 Key: HADOOP-12295
 URL: https://issues.apache.org/jira/browse/HADOOP-12295
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: HADOOP-12295.001.patch


 In {{NetworkTopology#InnerNode#remove}}, We can use {{childrenMap}} to get 
 the parent node, no need to loop the {{children}} list. Then it is more 
 efficient since in most cases deleting parent node doesn't happen.
 Another nit in current code is:
 {code}
   String parent = n.getNetworkLocation();
   String currentPath = getPath(this);
 {code}
 can be in closure of {{\!isAncestor\(n\)}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-12295) Improve NetworkTopology#InnerNode#remove logic

2015-08-13 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-12295:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

 Improve NetworkTopology#InnerNode#remove logic
 --

 Key: HADOOP-12295
 URL: https://issues.apache.org/jira/browse/HADOOP-12295
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Yi Liu
Assignee: Yi Liu
 Fix For: 2.8.0

 Attachments: HADOOP-12295.001.patch


 In {{NetworkTopology#InnerNode#remove}}, We can use {{childrenMap}} to get 
 the parent node, no need to loop the {{children}} list. Then it is more 
 efficient since in most cases deleting parent node doesn't happen.
 Another nit in current code is:
 {code}
   String parent = n.getNetworkLocation();
   String currentPath = getPath(this);
 {code}
 can be in closure of {{\!isAncestor\(n\)}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-12295) Improve NetworkTopology#InnerNode#remove logic

2015-08-03 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-12295:

Description: 
In {{NetworkTopology#InnerNode#remove}}, We can use {{childrenMap}} to get the 
parent node, no need to loop the {{children}} list.
Another nit in current code is:
{code}
  String parent = n.getNetworkLocation();
  String currentPath = getPath(this);
{code}
can be in closure of {{\!isAncestor\(n\)}}

  was:In {{NetworkTopology#InnerNode#remove}}, We can use {{childrenMap}} to 
get the parent node, no need to loop the {{children}} list.


 Improve NetworkTopology#InnerNode#remove logic
 --

 Key: HADOOP-12295
 URL: https://issues.apache.org/jira/browse/HADOOP-12295
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Yi Liu
Assignee: Yi Liu

 In {{NetworkTopology#InnerNode#remove}}, We can use {{childrenMap}} to get 
 the parent node, no need to loop the {{children}} list.
 Another nit in current code is:
 {code}
   String parent = n.getNetworkLocation();
   String currentPath = getPath(this);
 {code}
 can be in closure of {{\!isAncestor\(n\)}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-12295) Improve NetworkTopology#InnerNode#remove logic

2015-08-03 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-12295:

Description: 
In {{NetworkTopology#InnerNode#remove}}, We can use {{childrenMap}} to get the 
parent node, no need to loop the {{children}} list. Then it is more efficient 
since in most cases deleting parent node doesn't happen.
Another nit in current code is:
{code}
  String parent = n.getNetworkLocation();
  String currentPath = getPath(this);
{code}
can be in closure of {{\!isAncestor\(n\)}}

  was:
In {{NetworkTopology#InnerNode#remove}}, We can use {{childrenMap}} to get the 
parent node, no need to loop the {{children}} list.
Another nit in current code is:
{code}
  String parent = n.getNetworkLocation();
  String currentPath = getPath(this);
{code}
can be in closure of {{\!isAncestor\(n\)}}


 Improve NetworkTopology#InnerNode#remove logic
 --

 Key: HADOOP-12295
 URL: https://issues.apache.org/jira/browse/HADOOP-12295
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: HADOOP-12295.001.patch


 In {{NetworkTopology#InnerNode#remove}}, We can use {{childrenMap}} to get 
 the parent node, no need to loop the {{children}} list. Then it is more 
 efficient since in most cases deleting parent node doesn't happen.
 Another nit in current code is:
 {code}
   String parent = n.getNetworkLocation();
   String currentPath = getPath(this);
 {code}
 can be in closure of {{\!isAncestor\(n\)}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-12295) Improve NetworkTopology#InnerNode#remove logic

2015-08-03 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-12295:

Description: In {{NetworkTopology#InnerNode#remove}}, We can use 
{{childrenMap}} to get the parent node, no need to loop the {{children}} list.  
(was: In {{ NetworkTopology#InnerNode#remove}}, We can use {{childrenMap}} to 
get the parent node, no need to loop the {{children}} list.)

 Improve NetworkTopology#InnerNode#remove logic
 --

 Key: HADOOP-12295
 URL: https://issues.apache.org/jira/browse/HADOOP-12295
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Yi Liu
Assignee: Yi Liu

 In {{NetworkTopology#InnerNode#remove}}, We can use {{childrenMap}} to get 
 the parent node, no need to loop the {{children}} list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12295) Improve NetworkTopology#InnerNode#remove logic

2015-08-03 Thread Yi Liu (JIRA)
Yi Liu created HADOOP-12295:
---

 Summary: Improve NetworkTopology#InnerNode#remove logic
 Key: HADOOP-12295
 URL: https://issues.apache.org/jira/browse/HADOOP-12295
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Yi Liu
Assignee: Yi Liu


In {{ NetworkTopology#InnerNode#remove}}, We can use {{childrenMap}} to get the 
parent node, no need to loop the {{children}} list.
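
Roughly (a sketch of the idea only; it assumes {{childrenMap}} maps a child's 
name to the child node and that a {{getNextAncestorName}} helper is available, 
as in the add path):

{code}
// Sketch: look the parent subtree up by name instead of scanning children.
String parentName = getNextAncestorName(n);
InnerNode parentNode = (InnerNode) childrenMap.get(parentName);
if (parentNode == null) {
  return false;  // no such subtree, nothing to remove
}
boolean isRemoved = parentNode.remove(n);
{code}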



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-12295) Improve NetworkTopology#InnerNode#remove logic

2015-08-03 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-12295:

Attachment: HADOOP-12295.001.patch

 Improve NetworkTopology#InnerNode#remove logic
 --

 Key: HADOOP-12295
 URL: https://issues.apache.org/jira/browse/HADOOP-12295
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: HADOOP-12295.001.patch


 In {{NetworkTopology#InnerNode#remove}}, We can use {{childrenMap}} to get 
 the parent node, no need to loop the {{children}} list.
 Another nit in current code is:
 {code}
   String parent = n.getNetworkLocation();
   String currentPath = getPath(this);
 {code}
 can be in closure of {{\!isAncestor\(n\)}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-12295) Improve NetworkTopology#InnerNode#remove logic

2015-08-03 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-12295:

Status: Patch Available  (was: Open)

 Improve NetworkTopology#InnerNode#remove logic
 --

 Key: HADOOP-12295
 URL: https://issues.apache.org/jira/browse/HADOOP-12295
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: HADOOP-12295.001.patch


 In {{NetworkTopology#InnerNode#remove}}, We can use {{childrenMap}} to get 
 the parent node, no need to loop the {{children}} list. Then it is more 
 efficient since in most cases deleting parent node doesn't happen.
 Another nit in current code is:
 {code}
   String parent = n.getNetworkLocation();
   String currentPath = getPath(this);
 {code}
 can be in closure of {{\!isAncestor\(n\)}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12178) NPE during handling of SASL setup if problem with SASL resolver class

2015-07-16 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629705#comment-14629705
 ] 

Yi Liu commented on HADOOP-12178:
-

Steve, sorry for the late response.
I agree with you that not all exceptions indicate a problem with the SASL 
connection, and some can be rethrown. It seems {{setupSaslConnection}} only 
throws IOException, which must be handled, but I'm not sure whether other 
exceptions can be thrown out. 
To be safe, could we keep the
{code}
} catch (Exception ex) {
{code}
Thanks.

 NPE during handling of SASL setup if problem with SASL resolver class
 -

 Key: HADOOP-12178
 URL: https://issues.apache.org/jira/browse/HADOOP-12178
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc
Affects Versions: 2.7.1
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Attachments: HADOOP-12178-001.patch


 If there's any problem in the constructor of {{SaslRpcClient}}, then IPC 
 Client throws an NPE rather than forwarding the stack trace. This is because 
 the exception handler assumes that {{saslRpcClient}} is not null, that the 
 exception is related to the SASL setup itself.
 The exception handler needs to check for {{saslRpcClient}} being null, and if 
 so, rethrow the exception



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12178) NPE during handling of SASL setup if problem with SASL resolver class

2015-07-13 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625644#comment-14625644
 ] 

Yi Liu commented on HADOOP-12178:
-

Thanks Steve.

{code}
-} catch (Exception ex) {
+} catch (IOException ex) {
{code}
I think changing {{Exception}} to {{IOException}} is unnecessary.
If {{SaslPropertiesResolver.getInstance(conf)}} throws an RTE, then {{doAs}} will 
also throw an RTE. If we change the catch to IOE, the RTE can't be caught, so the 
{{if (saslRpcClient == null)}} branch can never be reached, and furthermore we 
would need to handle the other exceptions here.  

Others look good, just no need to change the exception.
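
To make the shape concrete, here is a generic, self-contained sketch with 
placeholder names (this is not the real {{org.apache.hadoop.ipc.Client}} code):

{code}
import java.io.IOException;

public final class SaslSetupSketch {
  private Object saslClient;  // stands in for SaslRpcClient

  void setup() throws IOException {
    try {
      saslClient = createSaslClient();  // construction itself may fail
      connectWithSasl(saslClient);
    } catch (Exception ex) {            // broad catch so RTEs land here too
      if (saslClient == null) {
        // Failure happened before SASL setup really started: rethrow rather
        // than falling into SASL-specific handling and hitting an NPE.
        throw ex instanceof IOException ? (IOException) ex : new IOException(ex);
      }
      handleSaslFailure(ex);            // existing SASL fallback path
    }
  }

  private Object createSaslClient() throws IOException { return new Object(); }
  private void connectWithSasl(Object client) throws IOException { }
  private void handleSaslFailure(Exception ex) throws IOException { }
}
{code}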

 NPE during handling of SASL setup if problem with SASL resolver class
 -

 Key: HADOOP-12178
 URL: https://issues.apache.org/jira/browse/HADOOP-12178
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc
Affects Versions: 2.7.1
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Attachments: HADOOP-12178-001.patch


 If there's any problem in the constructor of {{SaslRpcClient}}, then IPC 
 Client throws an NPE rather than forwarding the stack trace. This is because 
 the exception handler assumes that {{saslRpcClient}} is not null, that the 
 exception is related to the SASL setup itself.
 The exception handler needs to check for {{saslRpcClient}} being null, and if 
 so, rethrow the exception



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12201) Add tracing to FileSystem#createFileSystem and Globber#glob

2015-07-07 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617788#comment-14617788
 ] 

Yi Liu commented on HADOOP-12201:
-

+1, thanks Colin.

 Add tracing to FileSystem#createFileSystem and Globber#glob
 ---

 Key: HADOOP-12201
 URL: https://issues.apache.org/jira/browse/HADOOP-12201
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HADOOP-12201.001.patch, createfilesystem.png


 Add tracing to FileSystem#createFileSystem and Globber#glob



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12172) FsShell mkdir -p makes an unnecessary check for the existence of the parent.

2015-07-01 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611287#comment-14611287
 ] 

Yi Liu commented on HADOOP-12172:
-

+1, thanks Chris

 FsShell mkdir -p makes an unnecessary check for the existence of the parent.
 

 Key: HADOOP-12172
 URL: https://issues.apache.org/jira/browse/HADOOP-12172
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
 Attachments: HADOOP-12172.001.patch


 The {{mkdir}} command in {{FsShell}} checks for the existence of the parent 
 of the directory and returns an error if it doesn't exist.  The {{-p}} option 
 suppresses the error and allows the directory creation to continue, 
 implicitly creating all missing intermediate directories.  However, the 
 existence check still runs even with {{-p}} specified, and its result is 
 ignored.  Depending on the file system, this is a wasteful RPC call (HDFS) or 
 HTTP request (WebHDFS/S3/Azure) imposing extra latency for the client and 
 extra load for the server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12124) Add HTrace support for FsShell

2015-06-29 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606740#comment-14606740
 ] 

Yi Liu commented on HADOOP-12124:
-

+1, thanks Colin.

 Add HTrace support for FsShell
 --

 Key: HADOOP-12124
 URL: https://issues.apache.org/jira/browse/HADOOP-12124
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HADOOP-12124.001.patch, HADOOP-12124.002.patch


 Add HTrace support for FsShell



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11847) Enhance raw coder allowing to read least required inputs in decoding

2015-05-26 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14558731#comment-14558731
 ] 

Yi Liu commented on HADOOP-11847:
-

Kai, the patch looks good. One comment, +1 after addressing it:
In RSRawDecoder#doDecode

{code}
+for (int bufferIdx = 0, i = 0; i < erasedOrNotToReadIndexes.length; i++) {
+  if (adjustedDirectBufferOutputsParameter[i] == null) {
+    ByteBuffer buffer = checkGetDirectBuffer(bufferIdx, dataLen);
+    buffer.limit(dataLen);
+    adjustedDirectBufferOutputsParameter[i] = resetBuffer(buffer);
+    bufferIdx++;
+  }
+}
{code}
Here, we need to set the buffer position to 0.
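
In other words, something like this (a sketch of the adjustment only, assuming 
{{checkGetDirectBuffer}} hands back a reused direct buffer whose position may 
not be zero):

{code}
ByteBuffer buffer = checkGetDirectBuffer(bufferIdx, dataLen);
buffer.position(0);   // reused buffer: rewind before setting the limit
buffer.limit(dataLen);
adjustedDirectBufferOutputsParameter[i] = resetBuffer(buffer);
bufferIdx++;
{code}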



 Enhance raw coder allowing to read least required inputs in decoding
 

 Key: HADOOP-11847
 URL: https://issues.apache.org/jira/browse/HADOOP-11847
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
  Labels: BB2015-05-TBR
 Attachments: HADOOP-11847-HDFS-7285-v3.patch, 
 HADOOP-11847-HDFS-7285-v4.patch, HADOOP-11847-HDFS-7285-v5.patch, 
 HADOOP-11847-HDFS-7285-v6.patch, HADOOP-11847-HDFS-7285-v7.patch, 
 HADOOP-11847-HDFS-7285-v8.patch, HADOOP-11847-v1.patch, HADOOP-11847-v2.patch


 This is to enhance raw erasure coder to allow only reading least required 
 inputs while decoding. It will also refine and document the relevant APIs for 
 better understanding and usage. When using least required inputs, it may add 
 computating overhead but will possiblly outperform overall since less network 
 traffic and disk IO are involved.
 This is something planned to do but just got reminded by [~zhz]' s question 
 raised in HDFS-7678, also copied here:
 bq.Kai Zheng I have a question about decoding: in a (6+3) schema, if block #2 
 is missing, and I want to repair it with blocks 0, 1, 3, 4, 5, 8, how should 
 I construct the inputs to RawErasureDecoder#decode?
 With this work, hopefully the answer to above question would be obvious.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11847) Enhance raw coder allowing to read least required inputs in decoding

2015-05-26 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14558742#comment-14558742
 ] 

Yi Liu commented on HADOOP-11847:
-

+1, thanks Kai

 Enhance raw coder allowing to read least required inputs in decoding
 

 Key: HADOOP-11847
 URL: https://issues.apache.org/jira/browse/HADOOP-11847
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
  Labels: BB2015-05-TBR
 Attachments: HADOOP-11847-HDFS-7285-v3.patch, 
 HADOOP-11847-HDFS-7285-v4.patch, HADOOP-11847-HDFS-7285-v5.patch, 
 HADOOP-11847-HDFS-7285-v6.patch, HADOOP-11847-HDFS-7285-v7.patch, 
 HADOOP-11847-HDFS-7285-v8.patch, HADOOP-11847-HDFS-7285-v9.patch, 
 HADOOP-11847-v1.patch, HADOOP-11847-v2.patch


 This is to enhance raw erasure coder to allow only reading least required 
 inputs while decoding. It will also refine and document the relevant APIs for 
 better understanding and usage. When using least required inputs, it may add 
 computating overhead but will possiblly outperform overall since less network 
 traffic and disk IO are involved.
 This is something planned to do but just got reminded by [~zhz]' s question 
 raised in HDFS-7678, also copied here:
 bq.Kai Zheng I have a question about decoding: in a (6+3) schema, if block #2 
 is missing, and I want to repair it with blocks 0, 1, 3, 4, 5, 8, how should 
 I construct the inputs to RawErasureDecoder#decode?
 With this work, hopefully the answer to above question would be obvious.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11847) Enhance raw coder allowing to read least required inputs in decoding

2015-05-25 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14558602#comment-14558602
 ] 

Yi Liu commented on HADOOP-11847:
-

Thanks Kai for the patch. I will check it later (I was out of town again 
yesterday).

 Enhance raw coder allowing to read least required inputs in decoding
 

 Key: HADOOP-11847
 URL: https://issues.apache.org/jira/browse/HADOOP-11847
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
  Labels: BB2015-05-TBR
 Attachments: HADOOP-11847-HDFS-7285-v3.patch, 
 HADOOP-11847-HDFS-7285-v4.patch, HADOOP-11847-HDFS-7285-v5.patch, 
 HADOOP-11847-HDFS-7285-v6.patch, HADOOP-11847-HDFS-7285-v7.patch, 
 HADOOP-11847-HDFS-7285-v8.patch, HADOOP-11847-v1.patch, HADOOP-11847-v2.patch


 This is to enhance raw erasure coder to allow only reading least required 
 inputs while decoding. It will also refine and document the relevant APIs for 
 better understanding and usage. When using least required inputs, it may add 
 computating overhead but will possiblly outperform overall since less network 
 traffic and disk IO are involved.
 This is something planned to do but just got reminded by [~zhz]' s question 
 raised in HDFS-7678, also copied here:
 bq.Kai Zheng I have a question about decoding: in a (6+3) schema, if block #2 
 is missing, and I want to repair it with blocks 0, 1, 3, 4, 5, 8, how should 
 I construct the inputs to RawErasureDecoder#decode?
 With this work, hopefully the answer to above question would be obvious.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11847) Enhance raw coder allowing to read least required inputs in decoding

2015-05-21 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555446#comment-14555446
 ] 

Yi Liu commented on HADOOP-11847:
-

*in AbstractRawErasureDecoder.java*
For findFirstValidInput, there is still one comment not addressed:
{code}
+if (inputs[0] != null) {
+  return inputs[0];
+}
+
+for (int i = 1; i < inputs.length; i++) {
+  if (inputs[i] != null) {
+    return inputs[i];
+  }
+}
{code}
It can be:
{code}
for (int i = 0; i < inputs.length; i++) {
  if (inputs[i] != null) {
    return inputs[i];
  }
}
{code}

*In RSRawDecoder.java*
{code}
private void ensureBytesArrayBuffers(int dataLen) {
if (bytesArrayBuffers == null || bytesArrayBuffers[0].length < dataLen) {
  /**
   * Create this set of buffers on demand, which is only needed at the first
   * time running into this, using bytes array.
   */
  // Erased or not to read
  int maxInvalidUnits = getNumParityUnits();
  adjustedByteArrayOutputsParameter = new byte[maxInvalidUnits][];
  adjustedOutputOffsets = new int[maxInvalidUnits];

  // These are temp buffers for both inputs and outputs
  bytesArrayBuffers = new byte[maxInvalidUnits * 2][];
  for (int i = 0; i < bytesArrayBuffers.length; ++i) {
bytesArrayBuffers[i] = new byte[dataLen];
  }
}
  }

  private void ensureDirectBuffers(int dataLen) {
if (directBuffers == null || directBuffers[0].capacity() < dataLen) {
  /**
   * Create this set of buffers on demand, which is only needed at the first
   * time running into this, using DirectBuffer.
   */
  // Erased or not to read
  int maxInvalidUnits = getNumParityUnits();
  adjustedDirectBufferOutputsParameter = new ByteBuffer[maxInvalidUnits];

  // These are temp buffers for both inputs and outputs
  directBuffers = new ByteBuffer[maxInvalidUnits * 2];
  for (int i = 0; i < directBuffers.length; i++) {
directBuffers[i] = ByteBuffer.allocateDirect(dataLen);
  }
}
  }
{code}
1. Do we need {{maxInvalidUnits * 2}} for bytesArrayBuffers and directBuffers? 
Since we don't need additional buffers for the inputs, the correct size should 
be {{parityUnitNum - outputs.length}}. If next time there are not enough 
buffers, then allocate new ones.
2. The shared buffer size should always be the chunk size, otherwise they can't 
be shared, since dataLen may be different.  

In {{doDecode}}
{code}
for (int i = 0; i < adjustedByteArrayOutputsParameter.length; i++) {
  adjustedByteArrayOutputsParameter[i] =
  resetBuffer(bytesArrayBuffers[bufferIdx++], 0, dataLen);
  adjustedOutputOffsets[i] = 0; // Always 0 for such temp output
}

int outputIdx = 0;
for (int i = 0; i < erasedIndexes.length; i++, outputIdx++) {
  for (int j = 0; j < erasedOrNotToReadIndexes.length; j++) {
// If this index is one requested by the caller via erasedIndexes, then
// we use the passed output buffer to avoid copying data thereafter.
if (erasedIndexes[i] == erasedOrNotToReadIndexes[j]) {
  adjustedByteArrayOutputsParameter[j] =
  resetBuffer(outputs[outputIdx], 0, dataLen);
  adjustedOutputOffsets[j] = outputOffsets[outputIdx];
}
  }
}
{code}
1. We should check that erasedOrNotToReadIndexes contains erasedIndexes. 
2. We just need one loop: go through {{adjustedByteArrayOutputsParameter}} and 
assign the buffer from outputs if it exists, otherwise from 
{{bytesArrayBuffers}} (a rough sketch follows below).
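
Roughly what I mean by one loop (a sketch only; it assumes 
erasedOrNotToReadIndexes is a superset of erasedIndexes and that both are 
sorted the same way):

{code}
// Sketch: single pass over the adjusted outputs.
int outputIdx = 0;
int bufferIdx = 0;
for (int i = 0; i < erasedOrNotToReadIndexes.length; i++) {
  if (outputIdx < erasedIndexes.length
      && erasedIndexes[outputIdx] == erasedOrNotToReadIndexes[i]) {
    // Requested by the caller: use the caller's output buffer directly.
    adjustedByteArrayOutputsParameter[i] =
        resetBuffer(outputs[outputIdx], outputOffsets[outputIdx], dataLen);
    adjustedOutputOffsets[i] = outputOffsets[outputIdx];
    outputIdx++;
  } else {
    // Not requested: decode into a shared temp buffer.
    adjustedByteArrayOutputsParameter[i] =
        resetBuffer(bytesArrayBuffers[bufferIdx++], 0, dataLen);
    adjustedOutputOffsets[i] = 0;
  }
}
{code}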


 Enhance raw coder allowing to read least required inputs in decoding
 

 Key: HADOOP-11847
 URL: https://issues.apache.org/jira/browse/HADOOP-11847
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
  Labels: BB2015-05-TBR
 Attachments: HADOOP-11847-HDFS-7285-v3.patch, 
 HADOOP-11847-HDFS-7285-v4.patch, HADOOP-11847-HDFS-7285-v5.patch, 
 HADOOP-11847-HDFS-7285-v6.patch, HADOOP-11847-v1.patch, HADOOP-11847-v2.patch


 This is to enhance raw erasure coder to allow only reading least required 
 inputs while decoding. It will also refine and document the relevant APIs for 
 better understanding and usage. When using least required inputs, it may add 
 computating overhead but will possiblly outperform overall since less network 
 traffic and disk IO are involved.
 This is something planned to do but just got reminded by [~zhz]' s question 
 raised in HDFS-7678, also copied here:
 bq.Kai Zheng I have a question about decoding: in a (6+3) schema, if block #2 
 is missing, and I want to repair it with blocks 0, 1, 3, 4, 5, 8, how should 
 I construct the inputs to RawErasureDecoder#decode?
 With this work, hopefully the answer to above question would be obvious.



--

[jira] [Commented] (HADOOP-11847) Enhance raw coder allowing to read least required inputs in decoding

2015-05-21 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1414#comment-1414
 ] 

Yi Liu commented on HADOOP-11847:
-

{quote}
Sorry I missed to explain why the codes are like that. It was thinking that 
it's rarely the first units that's erased, so in most cases just checking 
inputs\[0\] will return the wanted result, avoiding involving into the loop.
{quote}
If the first element is not null, it returns right away; would it even enter the loop?

{quote}
How about simply having maxInvalidUnits = numParityUnits? The good is we don't 
have to re-allocate the shared buffers for different erasures.
{quote}
We don't need to allocate {{numParityUnits}} buffers; the outputs should contain 
at least one, right?  Maybe more than one.  I don't think we have to re-allocate 
the shared buffers for different erasures: if the buffers are not enough, we 
allocate new ones and add them to the shared pool, which is the typical 
behavior.

{quote}
We don't have or use chunkSize now. Please note the check is:
{quote}
Right, we don't need to use chunkSize now.  I think 
{{bytesArrayBuffers\[0\].length < dataLen}} is OK.
{{ensureBytesArrayBuffer}} and {{ensureDirectBuffers}} need to be renamed and 
rewritten per the above comments.

{quote}
Would you check again, thanks.
{quote}
{code}
for (int i = 0; i < adjustedByteArrayOutputsParameter.length; i++) {
  adjustedByteArrayOutputsParameter[i] =
  resetBuffer(bytesArrayBuffers[bufferIdx++], 0, dataLen);
  adjustedOutputOffsets[i] = 0; // Always 0 for such temp output
}

int outputIdx = 0;
for (int i = 0; i < erasedIndexes.length; i++, outputIdx++) {
  for (int j = 0; j < erasedOrNotToReadIndexes.length; j++) {
// If this index is one requested by the caller via erasedIndexes, then
// we use the passed output buffer to avoid copying data thereafter.
if (erasedIndexes[i] == erasedOrNotToReadIndexes[j]) {
  adjustedByteArrayOutputsParameter[j] =
  resetBuffer(outputs[outputIdx], 0, dataLen);
  adjustedOutputOffsets[j] = outputOffsets[outputIdx];
}
  }
}
{code}
You call {{resetBuffer}} parityNum + erasedIndexes times; is that intended?


 Enhance raw coder allowing to read least required inputs in decoding
 

 Key: HADOOP-11847
 URL: https://issues.apache.org/jira/browse/HADOOP-11847
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
  Labels: BB2015-05-TBR
 Attachments: HADOOP-11847-HDFS-7285-v3.patch, 
 HADOOP-11847-HDFS-7285-v4.patch, HADOOP-11847-HDFS-7285-v5.patch, 
 HADOOP-11847-HDFS-7285-v6.patch, HADOOP-11847-v1.patch, HADOOP-11847-v2.patch


 This is to enhance raw erasure coder to allow only reading least required 
 inputs while decoding. It will also refine and document the relevant APIs for 
 better understanding and usage. When using least required inputs, it may add 
 computating overhead but will possiblly outperform overall since less network 
 traffic and disk IO are involved.
 This is something planned to do but just got reminded by [~zhz]' s question 
 raised in HDFS-7678, also copied here:
 bq.Kai Zheng I have a question about decoding: in a (6+3) schema, if block #2 
 is missing, and I want to repair it with blocks 0, 1, 3, 4, 5, 8, how should 
 I construct the inputs to RawErasureDecoder#decode?
 With this work, hopefully the answer to above question would be obvious.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11847) Enhance raw coder allowing to read least required inputs in decoding

2015-05-20 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553605#comment-14553605
 ] 

Yi Liu commented on HADOOP-11847:
-

I will give comments later today, I was out of town yesterday.

 Enhance raw coder allowing to read least required inputs in decoding
 

 Key: HADOOP-11847
 URL: https://issues.apache.org/jira/browse/HADOOP-11847
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
  Labels: BB2015-05-TBR
 Attachments: HADOOP-11847-HDFS-7285-v3.patch, 
 HADOOP-11847-HDFS-7285-v4.patch, HADOOP-11847-HDFS-7285-v5.patch, 
 HADOOP-11847-HDFS-7285-v6.patch, HADOOP-11847-v1.patch, HADOOP-11847-v2.patch


 This is to enhance raw erasure coder to allow only reading least required 
 inputs while decoding. It will also refine and document the relevant APIs for 
 better understanding and usage. When using least required inputs, it may add 
 computing overhead but will possibly outperform overall since less network 
 traffic and disk IO are involved.
 This is something planned to do but just got reminded by [~zhz]' s question 
 raised in HDFS-7678, also copied here:
 bq.Kai Zheng I have a question about decoding: in a (6+3) schema, if block #2 
 is missing, and I want to repair it with blocks 0, 1, 3, 4, 5, 8, how should 
 I construct the inputs to RawErasureDecoder#decode?
 With this work, hopefully the answer to above question would be obvious.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11847) Enhance raw coder allowing to read least required inputs in decoding

2015-05-19 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549929#comment-14549929
 ] 

Yi Liu commented on HADOOP-11847:
-

Kai had an offline discussion with me.  
1. In RSRawDecoder.java, for the additional input buffers, we don't need them; 
we can use the inputs directly and make some modifications to {{RSUtil.GF}} to 
check whether an input is null. Then it's more efficient and simpler.
2. For output, we have looked into the RS implementation; all the outputs are 
related to each other, so in the current phase we still decode all outputs. 
For example, for 6+3, if 2 chunks are missed, ideally we just need to 
reconstruct 2 chunks, but because of the relationship among outputs, currently 
we still reconstruct 3 chunks. HADOOP-11871 is for further improvement of 
this. So for the output buffers, we may need to allocate some buffer(s); a 
small illustration follows below.
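A rough illustration of the output-buffer point, assuming RS(6,3) with 2 erased units; the variable names are purely illustrative:
{code}
// The caller asks for 2 outputs, but the current decoder reconstructs up to
// numParityUnits units, so temp buffers are borrowed for the extra outputs.
int numParityUnits = 3;
int dataLen = 1024;
byte[][] callerOutputs = new byte[2][dataLen];       // the 2 units we want back
byte[][] decoderOutputs = new byte[numParityUnits][];
for (int i = 0; i < numParityUnits; i++) {
  decoderOutputs[i] = (i < callerOutputs.length)
      ? callerOutputs[i]               // caller-provided buffer
      : new byte[dataLen];             // temp buffer for the extra output
}
{code}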

 Enhance raw coder allowing to read least required inputs in decoding
 

 Key: HADOOP-11847
 URL: https://issues.apache.org/jira/browse/HADOOP-11847
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
  Labels: BB2015-05-TBR
 Attachments: HADOOP-11847-HDFS-7285-v3.patch, 
 HADOOP-11847-HDFS-7285-v4.patch, HADOOP-11847-HDFS-7285-v5.patch, 
 HADOOP-11847-v1.patch, HADOOP-11847-v2.patch


 This is to enhance raw erasure coder to allow only reading least required 
 inputs while decoding. It will also refine and document the relevant APIs for 
 better understanding and usage. When using least required inputs, it may add 
 computing overhead but will possibly outperform overall since less network 
 traffic and disk IO are involved.
 This is something planned to do but just got reminded by [~zhz]' s question 
 raised in HDFS-7678, also copied here:
 bq.Kai Zheng I have a question about decoding: in a (6+3) schema, if block #2 
 is missing, and I want to repair it with blocks 0, 1, 3, 4, 5, 8, how should 
 I construct the inputs to RawErasureDecoder#decode?
 With this work, hopefully the answer to above question would be obvious.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11847) Enhance raw coder allowing to read least required inputs in decoding

2015-05-18 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549697#comment-14549697
 ] 

Yi Liu commented on HADOOP-11847:
-

Thanks Kai for the patch.

*In AbstractRawErasureCoder.java*
{code}
+  if (buffers[i] == null) {
+    if (allowNull) {
+      continue;
+    }
+    throw new HadoopIllegalArgumentException(
+        "Invalid buffer found, not allowing null");
+  }
{code}
Using the following code may be simpler
{code}
if (buffers[i] == null && !allowNull) {
  throw new HadoopIllegalArgumentException(
      "Invalid buffer found, not allowing null");
}
{code}


*In AbstractRawErasureDecoder.java*
Rename {{findGoodInput}} to {{getFirstNullInput}}; we can use Java generics, and 
the implementation can be simplified: 
{code}
  /**
   * Find the first null input.
   * @param inputs
   * @return the first null input
   */
  protected <T> T getFirstNullInput(T[] inputs) {
    for (T input : inputs) {
      if (input != null) {
        return input;
      }
    }

    throw new HadoopIllegalArgumentException(
        "Invalid inputs are found, all being null");
  }
{code}
Look at the above, isn't it cleaner? 
Then you can change
{code}
ByteBuffer goodInput = (ByteBuffer) findGoodInput(inputs);
{code} to
{code}
ByteBuffer firstNullInput = getFirstNullInput(inputs);
{code}

{code}
protected int[] getErasedOrNotToReadIndexes(Object[] inputs) {
int[] invalidIndexes = new int[inputs.length];
{code}
We can accept an {{int erasedNum}} parameter; then we can allocate the exact 
array size and there is no need for an array copy (a sketch follows below).
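A minimal sketch of that suggestion (the extra parameter is hypothetical and not in the current patch):
{code}
protected int[] getErasedOrNotToReadIndexes(Object[] inputs, int erasedNum) {
  // With the count known up front we can allocate the exact-size array.
  int[] invalidIndexes = new int[erasedNum];
  int idx = 0;
  for (int i = 0; i < inputs.length && idx < erasedNum; i++) {
    if (inputs[i] == null) {
      invalidIndexes[idx++] = i;  // record each null (erased or not-to-read) slot
    }
  }
  return invalidIndexes;
}
{code}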


*In RSRawDecoder.java*
{code}
/**
   * We need a set of reusable buffers either for the bytes array
   * decoding version or direct buffer decoding version. Normally not both.
   *
   * For both input and output, in addition to the valid buffers from the caller
   * passed from above, we need to provide extra buffers for the internal
   * decoding implementation. For input, the caller should provide at least
   * numDataUnits valid buffers (non-NULL); for output, the caller should 
   * provide no more than numParityUnits but at least one buffers. And the left
   * buffers will be borrowed from either bytesArrayBuffersForInput or 
   * bytesArrayBuffersForOutput, for the bytes array version.
   *
   */
  // Reused buffers for decoding with bytes arrays
  private byte[][] bytesArrayBuffers;
  private byte[][] adjustedByteArrayInputsParameter;
  private byte[][] adjustedByteArrayOutputsParameter;
  private int[] adjustedInputOffsets;
  private int[] adjustedOutputOffsets;

  // Reused buffers for decoding with direct ByteBuffers
  private ByteBuffer[] directBuffers;
  private ByteBuffer[] adjustedDirectBufferInputsParameter;
  private ByteBuffer[] adjustedDirectBufferOutputsParameter;
{code}
I don't think we need these.

{code}
@Override
  protected void doDecode(byte[][] inputs, int[] inputOffsets,
  int dataLen, int[] erasedIndexes,
  byte[][] outputs, int[] outputOffsets) {
ensureBytesArrayBuffers(dataLen);

/**
 * As passed parameters are friendly to callers but not to the underlying
 * implementations, so we have to adjust them before calling doDecoder.
 */

int[] erasedOrNotToReadIndexes = getErasedOrNotToReadIndexes(inputs);
int bufferIdx = 0, erasedIdx;

// Prepare for adjustedInputsParameter and adjustedInputOffsets
System.arraycopy(inputs, 0, adjustedByteArrayInputsParameter,
0, inputs.length);
System.arraycopy(inputOffsets, 0, adjustedInputOffsets,
0, inputOffsets.length);
for (int i = 0; i < erasedOrNotToReadIndexes.length; i++) {
  // Borrow it from bytesArrayBuffersForInput for the temp usage.
  erasedIdx = erasedOrNotToReadIndexes[i];
  adjustedByteArrayInputsParameter[erasedIdx] =
  resetBuffer(bytesArrayBuffers[bufferIdx++], 0, dataLen);
  adjustedInputOffsets[erasedIdx] = 0; // Always 0 for such temp input
}

// Prepare for adjustedOutputsParameter
for (int i = 0; i < adjustedByteArrayOutputsParameter.length; i++) {
  adjustedByteArrayOutputsParameter[i] =
  resetBuffer(bytesArrayBuffers[bufferIdx++], 0, dataLen);
  adjustedOutputOffsets[i] = 0; // Always 0 for such temp output
}
for (int outputIdx = 0, i = 0;
     i < erasedIndexes.length; i++, outputIdx++) {
  for (int j = 0; j < erasedOrNotToReadIndexes.length; j++) {
// If this index is one requested by the caller via erasedIndexes, then
// we use the passed output buffer to avoid copying data thereafter.
if (erasedIndexes[i] == erasedOrNotToReadIndexes[j]) {
  adjustedByteArrayOutputsParameter[j] =
  resetBuffer(outputs[outputIdx], 0, dataLen);
  adjustedOutputOffsets[j] = outputOffsets[outputIdx];
}
  }
}

doDecodeImpl(adjustedByteArrayInputsParameter, 

[jira] [Comment Edited] (HADOOP-11847) Enhance raw coder allowing to read least required inputs in decoding

2015-05-18 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549697#comment-14549697
 ] 

Yi Liu edited comment on HADOOP-11847 at 5/19/15 3:17 AM:
--

Thanks Kai for the patch.

*In AbstractRawErasureCoder.java*
{code}
+  if (buffers[i] == null) {
+    if (allowNull) {
+      continue;
+    }
+    throw new HadoopIllegalArgumentException(
+        "Invalid buffer found, not allowing null");
+  }
{code}
Using following code may be more simpler
{code}
if (buffers[i] == null && !allowNull) {
  throw new HadoopIllegalArgumentException(
      "Invalid buffer found, not allowing null");
}
{code}


*In AbstractRawErasureDecoder.java*
Rename {{findGoodInput}} to {{getFirstNotNullInput}}; we can use Java generics, 
and the implementation can be simplified: 
{code}
  /**
   * Find the first not null input.
   * @param inputs
   * @return the first not null input
   */
  protected <T> T getFirstNotNullInput(T[] inputs) {
    for (T input : inputs) {
      if (input != null) {
        return input;
      }
    }

    throw new HadoopIllegalArgumentException(
        "Invalid inputs are found, all being null");
  }
{code}
Look at the above, isn't it cleaner? 
Then you can change
{code}
ByteBuffer goodInput = (ByteBuffer) findGoodInput(inputs);
{code} to
{code}
ByteBuffer firstNotNullInput = getFirstNotNullInput(inputs);
{code}

{code}
protected int[] getErasedOrNotToReadIndexes(Object[] inputs) {
int[] invalidIndexes = new int[inputs.length];
{code}
We can accept an {{int erasedNum}} parameter; then we can allocate the exact 
array size and there is no need for an array copy.


*In RSRawDecoder.java*
{code}
/**
   * We need a set of reusable buffers either for the bytes array
   * decoding version or direct buffer decoding version. Normally not both.
   *
   * For both input and output, in addition to the valid buffers from the caller
   * passed from above, we need to provide extra buffers for the internal
   * decoding implementation. For input, the caller should provide at least
   * numDataUnits valid buffers (non-NULL); for output, the caller should 
   * provide no more than numParityUnits but at least one buffers. And the left
   * buffers will be borrowed from either bytesArrayBuffersForInput or 
   * bytesArrayBuffersForOutput, for the bytes array version.
   *
   */
  // Reused buffers for decoding with bytes arrays
  private byte[][] bytesArrayBuffers;
  private byte[][] adjustedByteArrayInputsParameter;
  private byte[][] adjustedByteArrayOutputsParameter;
  private int[] adjustedInputOffsets;
  private int[] adjustedOutputOffsets;

  // Reused buffers for decoding with direct ByteBuffers
  private ByteBuffer[] directBuffers;
  private ByteBuffer[] adjustedDirectBufferInputsParameter;
  private ByteBuffer[] adjustedDirectBufferOutputsParameter;
{code}
I don't think we need these.

{code}
@Override
  protected void doDecode(byte[][] inputs, int[] inputOffsets,
  int dataLen, int[] erasedIndexes,
  byte[][] outputs, int[] outputOffsets) {
ensureBytesArrayBuffers(dataLen);

/**
 * As passed parameters are friendly to callers but not to the underlying
 * implementations, so we have to adjust them before calling doDecoder.
 */

int[] erasedOrNotToReadIndexes = getErasedOrNotToReadIndexes(inputs);
int bufferIdx = 0, erasedIdx;

// Prepare for adjustedInputsParameter and adjustedInputOffsets
System.arraycopy(inputs, 0, adjustedByteArrayInputsParameter,
0, inputs.length);
System.arraycopy(inputOffsets, 0, adjustedInputOffsets,
0, inputOffsets.length);
for (int i = 0; i < erasedOrNotToReadIndexes.length; i++) {
  // Borrow it from bytesArrayBuffersForInput for the temp usage.
  erasedIdx = erasedOrNotToReadIndexes[i];
  adjustedByteArrayInputsParameter[erasedIdx] =
  resetBuffer(bytesArrayBuffers[bufferIdx++], 0, dataLen);
  adjustedInputOffsets[erasedIdx] = 0; // Always 0 for such temp input
}

// Prepare for adjustedOutputsParameter
for (int i = 0; i < adjustedByteArrayOutputsParameter.length; i++) {
  adjustedByteArrayOutputsParameter[i] =
  resetBuffer(bytesArrayBuffers[bufferIdx++], 0, dataLen);
  adjustedOutputOffsets[i] = 0; // Always 0 for such temp output
}
for (int outputIdx = 0, i = 0;
     i < erasedIndexes.length; i++, outputIdx++) {
  for (int j = 0; j < erasedOrNotToReadIndexes.length; j++) {
// If this index is one requested by the caller via erasedIndexes, then
// we use the passed output buffer to avoid copying data thereafter.
if (erasedIndexes[i] == erasedOrNotToReadIndexes[j]) {
  adjustedByteArrayOutputsParameter[j] =
  resetBuffer(outputs[outputIdx], 0, dataLen);
  adjustedOutputOffsets[j] = outputOffsets[outputIdx];
}
  

[jira] [Commented] (HADOOP-11938) Fix ByteBuffer version encode/decode API of raw erasure coder

2015-05-15 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14545154#comment-14545154
 ] 

Yi Liu commented on HADOOP-11938:
-

Looks good now. One nit, +1 after addressing it:
in TestRawCoderBase.java
{code}
Assert.fail("Encoding test with bad input passed");
{code}
We should write "Encoding test with bad input should fail"; you wrote it the 
opposite way. Same for a few other Assert.fail calls.
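For example, the intended pattern could look like the following (the exercised call and the exception type are illustrative stand-ins, not the actual test code):
{code}
try {
  testCoding(true);  // stand-in for whatever the bad-input test exercises
  Assert.fail("Encoding test with bad input should fail");
} catch (HadoopIllegalArgumentException e) {
  // expected
}
{code}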

Furthermore, we need to fix the Jenkins warnings (release 
audit/checkstyle/whitespace) if they are related to this patch.

 Fix ByteBuffer version encode/decode API of raw erasure coder
 -

 Key: HADOOP-11938
 URL: https://issues.apache.org/jira/browse/HADOOP-11938
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HADOOP-11938-HDFS-7285-v1.patch, 
 HADOOP-11938-HDFS-7285-v2.patch, HADOOP-11938-HDFS-7285-v3.patch, 
 HADOOP-11938-HDFS-7285-workaround.patch


 While investigating a test failure in {{TestRecoverStripedFile}}, one issue 
 in the raw erasure coder was found, caused by an optimization in the code below. 
 It assumes the heap buffer backed by the bytes array available for reading or 
 writing always starts with zero and takes the whole space.
 {code}
 protected static byte[][] toArrays(ByteBuffer[] buffers) {
   byte[][] bytesArr = new byte[buffers.length][];
   ByteBuffer buffer;
   for (int i = 0; i < buffers.length; i++) {
     buffer = buffers[i];
     if (buffer == null) {
       bytesArr[i] = null;
       continue;
     }
     if (buffer.hasArray()) {
       bytesArr[i] = buffer.array();
     } else {
       throw new IllegalArgumentException("Invalid ByteBuffer passed, " +
           "expecting heap buffer");
     }
   }
   return bytesArr;
 }
 {code} 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11938) Fix ByteBuffer version encode/decode API of raw erasure coder

2015-05-13 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543076#comment-14543076
 ] 

Yi Liu commented on HADOOP-11938:
-

Looks much better than the original.

Some lines are longer than 80 chars.

*In AbstractRawErasureCoder.java*

{code}
+for (int i = pos; i < buffer.remaining(); ++i) {
+  buffer.put(i, (byte) 0);
  }
{code}
it should be {{buffer.limit()}} instead of remaining
And we can just use {{buffer.put((byte)0)}}

{code}
@return the buffer itself, with ZERO bytes written, remaining the original
+   * position
{code}
"remaining the original position": maybe "the position and limit are not 
changed after the call" is clearer.

use {{HadoopIllegalArgumentException}} instead of {{IllegalArgumentException}}

*in XORRawDecoder.java and XORRawEncoder.java*
{code}
inputs[i].position() + inputs[0].remaining()
{code}
Just need use inputs\[i\].limit()

*in RSRawDecoder.java and RSRawEncoder.java*
{code}
+int dataLen = inputs[0].remaining();
{code}
is it necessary?  I think we don't need to pass {{dataLen}} to 
{{RSUtil.GF.solveVandermondeSystem}}

*in GaloisField.java*
{code}
public void solveVandermondeSystem(int[] x, ByteBuffer[] y,
  int len, int dataLen) {
{code}
As in previous comment, {{dataLen}} is unnecessary, so {{idx1 < p.position() + 
dataLen}} can be {{idx1 < p.limit()}}


{code}
  public void substitute(ByteBuffer[] p, ByteBuffer q, int x) {
-int y = 1;
+int y = 1, iIdx, oIdx;
+int len = p[0].remaining();
 for (int i = 0; i  p.length; i++) {
   ByteBuffer pi = p[i];
-  int len = pi.remaining();
-  for (int j = 0; j < len; j++) {
-    int pij = pi.get(j) & 0x00FF;
-    q.put(j, (byte) (q.get(j) ^ mulTable[pij][y]));
+  for (iIdx = pi.position(), oIdx = q.position();
+       iIdx < pi.position() + len; iIdx++, oIdx++) {
+    int pij = pi.get(iIdx) & 0x00FF;
+    q.put(oIdx, (byte) (q.get(oIdx) ^ mulTable[pij][y]));
   }
   y = mulTable[x][y];
 }
{code}
{{len}} is unnecessary.

Same for
{code}
public void remainder(ByteBuffer[] dividend, int len, int[] divisor) {
{code}

*in TestCoderBase.java*

{code}
+  private byte[] zeroChunkBytes;
..

   protected void eraseDataFromChunk(ECChunk chunk) {
 ByteBuffer chunkBuffer = chunk.getBuffer();
-// erase the data
-chunkBuffer.position(0);
-for (int i = 0; i < chunkSize; i++) {
-  chunkBuffer.put((byte) 0);
-}
+// erase the data at the position, and restore the buffer ready for reading
+// chunkSize bytes but all ZERO.
+int pos = chunkBuffer.position();
+chunkBuffer.flip();
+chunkBuffer.position(pos);
+chunkBuffer.limit(pos + chunkSize);
+chunkBuffer.put(zeroChunkBytes);
 chunkBuffer.flip();
+chunkBuffer.position(pos);
+chunkBuffer.limit(pos + chunkSize);
{code}

{code}
-  protected static ECChunk cloneChunkWithData(ECChunk chunk) {
+  protected ECChunk cloneChunkWithData(ECChunk chunk) {
 ByteBuffer srcBuffer = chunk.getBuffer();
-ByteBuffer destBuffer;
+ByteBuffer destBuffer = allocateOutputChunkBuffer();
  
-byte[] bytesArr = new byte[srcBuffer.remaining()];
+byte[] bytesArr = new byte[chunkSize];
 srcBuffer.mark();
 srcBuffer.get(bytesArr);
 srcBuffer.reset();
  
-if (srcBuffer.hasArray()) {
-  destBuffer = ByteBuffer.wrap(bytesArr);
-} else {
-  destBuffer = ByteBuffer.allocateDirect(srcBuffer.remaining());
-  destBuffer.put(bytesArr);
-  destBuffer.flip();
-}
+int pos = destBuffer.position();
+destBuffer.put(bytesArr);
+destBuffer.flip();
+destBuffer.position(pos);
{code}

{{destBuffer}} is still assumed to be chunkSize.
Furthermore, there are some unnecessary flips.


{code}
+  /**
+   * Convert an array of this chunks to an array of byte array.
+   * Note the chunk buffers are not affected.
+   * @param chunks
+   * @return an array of byte array
+   */
+  protected byte[][] toArrays(ECChunk[] chunks) {
+byte[][] bytesArr = new byte[chunks.length][];
+
+ByteBuffer buffer;
+for (int i = 0; i < chunks.length; i++) {
+  buffer = chunks[i].getBuffer();
+  if (buffer.hasArray() && buffer.position() == 0 &&
+      buffer.remaining() == chunkSize) {
+bytesArr[i] = buffer.array();
+  } else {
+bytesArr[i] = new byte[buffer.remaining()];
+// Avoid affecting the original one
+buffer.mark();
+buffer.get(bytesArr[i]);
+buffer.reset();
+  }
+}
+
+return bytesArr;
+  }
{code}
We already have this method; can we use {{ECChunk.toBuffers}}? Is converting to 
ByteBuffer enough? If not, should we have this method in the main code, not only 
in the test?


*In TestRawCoderBase.java*
You should add more description to your tests, for example what the negative 
test is for and how it tests that; you also need to find a good name for it.
{code}
protected 

[jira] [Commented] (HADOOP-11847) Enhance raw coder allowing to read least required inputs in decoding

2015-05-13 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543105#comment-14543105
 ] 

Yi Liu commented on HADOOP-11847:
-

Let's wait for HADOOP-11938, then back to this one.

 Enhance raw coder allowing to read least required inputs in decoding
 

 Key: HADOOP-11847
 URL: https://issues.apache.org/jira/browse/HADOOP-11847
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
  Labels: BB2015-05-TBR
 Attachments: HADOOP-11847-HDFS-7285-v3.patch, 
 HADOOP-11847-HDFS-7285-v4.patch, HADOOP-11847-v1.patch, 
 HADOOP-11847-v2.patch, HADOOP-11847-v5.patch


 This is to enhance raw erasure coder to allow only reading least required 
 inputs while decoding. It will also refine and document the relevant APIs for 
 better understanding and usage. When using least required inputs, it may add 
 computing overhead but will possibly outperform overall since less network 
 traffic and disk IO are involved.
 This is something planned to do but just got reminded by [~zhz]' s question 
 raised in HDFS-7678, also copied here:
 bq.Kai Zheng I have a question about decoding: in a (6+3) schema, if block #2 
 is missing, and I want to repair it with blocks 0, 1, 3, 4, 5, 8, how should 
 I construct the inputs to RawErasureDecoder#decode?
 With this work, hopefully the answer to above question would be obvious.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11938) Fix ByteBuffer version encode/decode API of raw erasure coder

2015-05-13 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543078#comment-14543078
 ] 

Yi Liu commented on HADOOP-11938:
-

One more comment:
Add more javadoc for the XOR coder, so that others can understand it better.


 Fix ByteBuffer version encode/decode API of raw erasure coder
 -

 Key: HADOOP-11938
 URL: https://issues.apache.org/jira/browse/HADOOP-11938
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HADOOP-11938-HDFS-7285-v1.patch, 
 HADOOP-11938-HDFS-7285-v2.patch, HADOOP-11938-HDFS-7285-workaround.patch


 While investigating a test failure in {{TestRecoverStripedFile}}, one issue 
 in the raw erasure coder was found, caused by an optimization in the code below. 
 It assumes the heap buffer backed by the bytes array available for reading or 
 writing always starts with zero and takes the whole space.
 {code}
 protected static byte[][] toArrays(ByteBuffer[] buffers) {
   byte[][] bytesArr = new byte[buffers.length][];
   ByteBuffer buffer;
   for (int i = 0; i < buffers.length; i++) {
     buffer = buffers[i];
     if (buffer == null) {
       bytesArr[i] = null;
       continue;
     }
     if (buffer.hasArray()) {
       bytesArr[i] = buffer.array();
     } else {
       throw new IllegalArgumentException("Invalid ByteBuffer passed, " +
           "expecting heap buffer");
     }
   }
   return bytesArr;
 }
 {code} 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-11908) Erasure coding: Should be able to encode part of parity blocks.

2015-05-13 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu resolved HADOOP-11908.
-
Resolution: Duplicate

 Erasure coding: Should be able to encode part of parity blocks.
 ---

 Key: HADOOP-11908
 URL: https://issues.apache.org/jira/browse/HADOOP-11908
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Yi Liu
Assignee: Kai Zheng

 {code}
 public void encode(ByteBuffer[] inputs, ByteBuffer[] outputs);
 {code}
 Currently when we do encode, the outputs are all parity blocks, we should be 
 able to encode part of parity blocks. 
 This is required when we do datanode striped block recovery, if one or more 
 parity blocks are missed, we need to do encode to recover them. Encoding only 
 part of the parity blocks should be more efficient than encoding all of them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11961) Add isLinear interface to Erasure coder

2015-05-12 Thread Yi Liu (JIRA)
Yi Liu created HADOOP-11961:
---

 Summary: Add isLinear interface to Erasure coder
 Key: HADOOP-11961
 URL: https://issues.apache.org/jira/browse/HADOOP-11961
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Yi Liu
Assignee: Yi Liu


Today we had a discussion including [~zhz], [~drankye], etc.; it is also 
discussed in HDFS-8347.
Some coders like {{RS}} and {{XOR}} are linear, while some, like Hitchhiker, 
have a coding boundary. If the coder is linear, we can decode at any size and 
don't need to pad the inputs to *chunksize*; if the coder is not linear, the 
inputs need to be padded to *chunksize* before decoding.

This interface is important for performance, and it can save memory/disk space 
since the parity cells can be the same size as the first data cell (less than 
the codec chunksize).  
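Purely as an illustration of the proposal (not committed code, and the placement of the method is hypothetical), the flag could look like this:
{code}
public interface RawErasureCoder {
  // ... existing methods ...

  /**
   * Whether this codec is linear. Linear codecs (e.g. RS, XOR) can decode
   * inputs of any length, so callers need not pad to the chunk size;
   * non-linear codecs (with a coding boundary, like Hitchhiker) require
   * chunk-aligned inputs.
   */
  boolean isLinear();
}
{code}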



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11961) Add interface of whether codec has chunk boundary to Erasure coder

2015-05-12 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-11961:

Description: (was: Today, we have a discussion including [~zhz], 
[~drankye], etc., also discuss in HDFS-8347.
Some coder like {{RS}} and {{XOR}} is linear, some have coding boundary like 
HitchHicker.  If the coder is linear, we can decode at any size, and we don't 
need to padding inputs to *chunksize*,  if the coder is not linear, the inputs 
need to padding to *chunksize*, then do decode.

This interface is important for performance, and can save memory/disk space 
since the parity cells are the same as first data cell (less than codec 
chunksize).  )
   Assignee: (was: Yi Liu)
Summary: Add interface of whether codec has chunk boundary to Erasure 
coder  (was: Add isLinear interface to Erasure coder)

 Add interface of whether codec has chunk boundary to Erasure coder
 --

 Key: HADOOP-11961
 URL: https://issues.apache.org/jira/browse/HADOOP-11961
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Yi Liu





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11938) Fix ByteBuffer version encode/decode API of raw erasure coder

2015-05-12 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539776#comment-14539776
 ] 

Yi Liu commented on HADOOP-11938:
-

OK, I see. Thanks for updating the patch; I will give comments on your updated 
patch tomorrow.

 Fix ByteBuffer version encode/decode API of raw erasure coder
 -

 Key: HADOOP-11938
 URL: https://issues.apache.org/jira/browse/HADOOP-11938
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HADOOP-11938-HDFS-7285-v1.patch, 
 HADOOP-11938-HDFS-7285-v2.patch, HADOOP-11938-HDFS-7285-workaround.patch


 While investigating a test failure in {{TestRecoverStripedFile}}, one issue 
 in the raw erasure coder was found, caused by an optimization in the code below. 
 It assumes the heap buffer backed by the bytes array available for reading or 
 writing always starts with zero and takes the whole space.
 {code}
 protected static byte[][] toArrays(ByteBuffer[] buffers) {
   byte[][] bytesArr = new byte[buffers.length][];
   ByteBuffer buffer;
   for (int i = 0; i < buffers.length; i++) {
     buffer = buffers[i];
     if (buffer == null) {
       bytesArr[i] = null;
       continue;
     }
     if (buffer.hasArray()) {
       bytesArr[i] = buffer.array();
     } else {
       throw new IllegalArgumentException("Invalid ByteBuffer passed, " +
           "expecting heap buffer");
     }
   }
   return bytesArr;
 }
 {code} 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-11961) Add interface of whether codec has chunk boundary to Erasure coder

2015-05-12 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu resolved HADOOP-11961.
-
Resolution: Invalid

 Add interface of whether codec has chunk boundary to Erasure coder
 --

 Key: HADOOP-11961
 URL: https://issues.apache.org/jira/browse/HADOOP-11961
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Yi Liu





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11938) Fix ByteBuffer version encode/decode API of raw erasure coder

2015-05-12 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539753#comment-14539753
 ] 

Yi Liu commented on HADOOP-11938:
-

{quote}
Yes. XOR coder can only recover one erasure of unit
{quote}
Interesting. If so, why do we need the XOR coder, and what is it used for? We 
should remove the XOR coder, and then there is no need to maintain that code anymore.
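For reference, a tiny standalone illustration of that property: XOR parity can recover exactly one erased unit, which is why an XOR coder is cheap but limited to a single failure.
{code}
public class XorDemo {
  public static void main(String[] args) {
    byte[] d0 = {1, 2, 3}, d1 = {4, 5, 6}, d2 = {7, 8, 9};
    byte[] parity = new byte[3];
    for (int i = 0; i < 3; i++) {
      parity[i] = (byte) (d0[i] ^ d1[i] ^ d2[i]);          // encode
    }
    byte[] recovered = new byte[3];
    for (int i = 0; i < 3; i++) {
      recovered[i] = (byte) (d0[i] ^ d2[i] ^ parity[i]);   // recover erased d1
    }
    System.out.println(java.util.Arrays.equals(d1, recovered));  // prints true
  }
}
{code}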

 Fix ByteBuffer version encode/decode API of raw erasure coder
 -

 Key: HADOOP-11938
 URL: https://issues.apache.org/jira/browse/HADOOP-11938
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HADOOP-11938-HDFS-7285-v1.patch, 
 HADOOP-11938-HDFS-7285-v2.patch, HADOOP-11938-HDFS-7285-workaround.patch


 While investigating a test failure in {{TestRecoverStripedFile}}, one issue 
 in the raw erasure coder was found, caused by an optimization in the code below. 
 It assumes the heap buffer backed by the bytes array available for reading or 
 writing always starts with zero and takes the whole space.
 {code}
 protected static byte[][] toArrays(ByteBuffer[] buffers) {
   byte[][] bytesArr = new byte[buffers.length][];
   ByteBuffer buffer;
   for (int i = 0; i < buffers.length; i++) {
     buffer = buffers[i];
     if (buffer == null) {
       bytesArr[i] = null;
       continue;
     }
     if (buffer.hasArray()) {
       bytesArr[i] = buffer.array();
     } else {
       throw new IllegalArgumentException("Invalid ByteBuffer passed, " +
           "expecting heap buffer");
     }
   }
   return bytesArr;
 }
 {code} 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11938) Fix ByteBuffer version encode/decode API of raw erasure coder

2015-05-11 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537713#comment-14537713
 ] 

Yi Liu commented on HADOOP-11938:
-

Some comments:

In *AbstractRawErasureCoder.java*
{code}
+  protected ByteBuffer resetOutputBuffer(ByteBuffer buffer) {
+int pos = buffer.position();
+buffer.put(zeroChunkBytes);
+buffer.position(pos);
+
+return buffer;
   }
{code}
The length of zeroChunkBytes could be larger than buffer.remaining(); just use 
a *for* loop to put 0 into the buffer.
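A hedged sketch of that suggestion (assuming java.nio.ByteBuffer; this is not the patch code):
{code}
protected ByteBuffer resetOutputBuffer(ByteBuffer buffer) {
  // Zero only the window between position() and limit(); absolute puts do not
  // move the position.
  for (int i = buffer.position(); i < buffer.limit(); i++) {
    buffer.put(i, (byte) 0);
  }
  return buffer;
}
{code}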

{code}
protected ByteBuffer resetInputBuffer(ByteBuffer buffer) {
{code}
What's the reason we need to reset the input buffer? Input buffers are given by 
the caller.

In AbstractRawErasureDecoder.java
{code}
boolean usingDirectBuffer = ! inputs[0].hasArray();
{code}
use {{inputs\[0\].isDirect();}}

{code}
@Override
public void decode(ByteBuffer[] inputs, int[] erasedIndexes, ByteBuffer[] 
outputs)
@Override
public void decode(byte[][] inputs, int[] erasedIndexes, byte[][] outputs) {
{code}
We should do the following checks:
1. All the inputs have the same length (aside from the inputs that are null), 
and the outputs have enough space.
2. {{erasedIndexes}} matches the {{null}} positions of the inputs.
We should also enhance the description in RawErasureDecoder#decode to describe 
more about decode/reconstruct. A sketch of such checks is below.
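A hedged sketch of those checks ({{checkDecodeArgs}} is a hypothetical helper name, imports of java.nio.ByteBuffer and HadoopIllegalArgumentException are assumed, and the output-space check is omitted for brevity):
{code}
private static void checkDecodeArgs(ByteBuffer[] inputs, int[] erasedIndexes) {
  int dataLen = -1;
  for (ByteBuffer in : inputs) {
    if (in == null) {
      continue;
    }
    if (dataLen == -1) {
      dataLen = in.remaining();          // remember the first valid length
    } else if (in.remaining() != dataLen) {
      throw new HadoopIllegalArgumentException("Inputs have different lengths");
    }
  }
  for (int idx : erasedIndexes) {
    if (idx < 0 || idx >= inputs.length || inputs[idx] != null) {
      throw new HadoopIllegalArgumentException(
          "erasedIndexes entry does not match a null input: " + idx);
    }
  }
}
{code}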

{code}
+for (int i = 0; i < outputs.length; ++i) {
+  buffer = outputs[i];
+  // to be ready for read dataLen bytes
+  buffer.flip();
+  buffer.position(outputOffsets[i]);
+  buffer.limit(outputOffsets[i] + dataLen);
 }
{code}
This is unnecessary, remove it.

In *AbstractRawErasureEncoder.java*
{code}
boolean usingDirectBuffer = ! inputs[0].hasArray();
{code}
use {{inputs\[0\].isDirect();}}

all comments are same as in *AbstractRawErasureDecoder*

in *RSRawDecoder.java*
{code}
assert (getNumDataUnits() + getNumParityUnits() < RSUtil.GF.getFieldSize());

this.errSignature = new int[getNumParityUnits()];
this.primitivePower = RSUtil.getPrimitivePower(getNumDataUnits(),
getNumParityUnits());
{code}
Why not use {{numDataUnits}} and {{numParityUnits}} directly?

In *RSRawEncoder.java*
in {{initialize}}, use {{numDataUnits}} and {{numParityUnits}} directly.


 Fix ByteBuffer version encode/decode API of raw erasure coder
 -

 Key: HADOOP-11938
 URL: https://issues.apache.org/jira/browse/HADOOP-11938
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HADOOP-11938-HDFS-7285-v1.patch, 
 HADOOP-11938-HDFS-7285-workaround.patch


 While investigating a test failure in {{TestRecoverStripedFile}}, one issue 
 in the raw erasure coder was found, caused by an optimization in the code below. 
 It assumes the heap buffer backed by the bytes array available for reading or 
 writing always starts with zero and takes the whole space.
 {code}
 protected static byte[][] toArrays(ByteBuffer[] buffers) {
   byte[][] bytesArr = new byte[buffers.length][];
   ByteBuffer buffer;
   for (int i = 0; i < buffers.length; i++) {
     buffer = buffers[i];
     if (buffer == null) {
       bytesArr[i] = null;
       continue;
     }
     if (buffer.hasArray()) {
       bytesArr[i] = buffer.array();
     } else {
       throw new IllegalArgumentException("Invalid ByteBuffer passed, " +
           "expecting heap buffer");
     }
   }
   return bytesArr;
 }
 {code} 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11938) Fix ByteBuffer version encode/decode API of raw erasure coder

2015-05-11 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537714#comment-14537714
 ] 

Yi Liu commented on HADOOP-11938:
-

Will post more later.

 Fix ByteBuffer version encode/decode API of raw erasure coder
 -

 Key: HADOOP-11938
 URL: https://issues.apache.org/jira/browse/HADOOP-11938
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HADOOP-11938-HDFS-7285-v1.patch, 
 HADOOP-11938-HDFS-7285-workaround.patch


 While investigating a test failure in {{TestRecoverStripedFile}}, one issue 
 in the raw erasure coder was found, caused by an optimization in the code below. 
 It assumes the heap buffer backed by the bytes array available for reading or 
 writing always starts with zero and takes the whole space.
 {code}
 protected static byte[][] toArrays(ByteBuffer[] buffers) {
   byte[][] bytesArr = new byte[buffers.length][];
   ByteBuffer buffer;
   for (int i = 0; i < buffers.length; i++) {
     buffer = buffers[i];
     if (buffer == null) {
       bytesArr[i] = null;
       continue;
     }
     if (buffer.hasArray()) {
       bytesArr[i] = buffer.array();
     } else {
       throw new IllegalArgumentException("Invalid ByteBuffer passed, " +
           "expecting heap buffer");
     }
   }
   return bytesArr;
 }
 {code} 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11938) Fix ByteBuffer version encode/decode API of raw erasure coder

2015-05-11 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539254#comment-14539254
 ] 

Yi Liu commented on HADOOP-11938:
-

In *XORRawDecoder.java*
{code}
int dataLen = inputs[0].remaining();
int erasedIdx = erasedIndexes[0];

// Process the inputs.
int iPos, oPos, iIdx, oIdx;
oPos = output.position();
for (int i = 0; i < inputs.length; i++) {
  // Skip the erased location.
  if (i == erasedIdx) {
continue;
  }

  iPos = inputs[i].position();
  for (iIdx = iPos, oIdx = oPos;
       iIdx < iPos + dataLen; iIdx++, oIdx++) {
output.put(oIdx, (byte) (output.get(oIdx) ^ inputs[i].get(iIdx)));
  }
}
{code}
{{dataLen/iPos/oPos}} are not necessary; we can use buffer.limit() and 
buffer.position() instead.

{code}
@Override
  protected void doDecode(ByteBuffer[] inputs, int[] erasedIndexes,
  ByteBuffer[] outputs) {
ByteBuffer output = outputs[0];
resetOutputBuffer(output);

int dataLen = inputs[0].remaining();
int erasedIdx = erasedIndexes[0];

// Process the inputs.
int iPos, oPos, iIdx, oIdx;
oPos = output.position();
for (int i = 0; i < inputs.length; i++) {
  // Skip the erased location.
  if (i == erasedIdx) {
continue;
  }

  iPos = inputs[i].position();
  for (iIdx = iPos, oIdx = oPos;
       iIdx < iPos + dataLen; iIdx++, oIdx++) {
output.put(oIdx, (byte) (output.get(oIdx) ^ inputs[i].get(iIdx)));
  }
}
  }
{code}
I wonder whether this works; I see it only decodes *output\[0\]*.
Do we ever test this? If not, we should have more tests in this patch. 

In *XORRawEncoder.java*
same comments as in XORRawDecoder

In *GaloisField.java*
{code}
+ByteBuffer p, prev, after;
+int pos1, idx1, pos2, idx2;
{code}
Besides {{idx1/idx2}}, the others are unnecessary; without them the code is 
clearer. The same holds in some other places: if a value is only used once or 
twice, we don't need to declare a separate variable.


*For the tests, I want to see more tests:*
1) The length of inputs/outputs is not equal to chunksize, and we can still decode.
2) Some negative tests, where we catch the expected exception (see the sketch 
after this list).
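A hedged sketch of the negative-test shape asked for in 2); {{newDecoder()}}, {{buildInputs()}} and {{CHUNK_SIZE}} are hypothetical stand-ins, not the actual test fixture:
{code}
@Test
public void testDecodeWithInvalidErasedIndexes() {
  RawErasureDecoder decoder = newDecoder(6, 3);   // hypothetical factory helper
  ByteBuffer[] inputs = buildInputs();            // hypothetical valid inputs
  ByteBuffer[] outputs = new ByteBuffer[] { ByteBuffer.allocate(CHUNK_SIZE) };
  int[] badErasedIndexes = new int[] { 99 };      // no input slot has this index
  try {
    decoder.decode(inputs, badErasedIndexes, outputs);
    Assert.fail("Decoding test with bad erasedIndexes should fail");
  } catch (HadoopIllegalArgumentException e) {
    // expected: invalid arguments must be rejected
  }
}
{code}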


 Fix ByteBuffer version encode/decode API of raw erasure coder
 -

 Key: HADOOP-11938
 URL: https://issues.apache.org/jira/browse/HADOOP-11938
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HADOOP-11938-HDFS-7285-v1.patch, 
 HADOOP-11938-HDFS-7285-workaround.patch


 While investigating a test failure in {{TestRecoverStripedFile}}, one issue 
 in the raw erasure coder was found, caused by an optimization in the code below. 
 It assumes the heap buffer backed by the bytes array available for reading or 
 writing always starts with zero and takes the whole space.
 {code}
 protected static byte[][] toArrays(ByteBuffer[] buffers) {
   byte[][] bytesArr = new byte[buffers.length][];
   ByteBuffer buffer;
   for (int i = 0; i < buffers.length; i++) {
     buffer = buffers[i];
     if (buffer == null) {
       bytesArr[i] = null;
       continue;
     }
     if (buffer.hasArray()) {
       bytesArr[i] = buffer.array();
     } else {
       throw new IllegalArgumentException("Invalid ByteBuffer passed, " +
           "expecting heap buffer");
     }
   }
   return bytesArr;
 }
 {code} 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11938) Fix ByteBuffer version encode/decode API of raw erasure coder

2015-05-08 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533954#comment-14533954
 ] 

Yi Liu commented on HADOOP-11938:
-

Thanks Kai for the catch.

 Fix ByteBuffer version encode/decode API of raw erasure coder
 -

 Key: HADOOP-11938
 URL: https://issues.apache.org/jira/browse/HADOOP-11938
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng

 While investigating a test failure in {{TestRecoverStripedFile}}, one issue 
 in the raw erasure coder was found: a bad optimization in the code below. It 
 assumes the heap buffer backed by the bytes array available for reading or 
 writing always starts with zero and takes the whole space.
 {code}
 protected static byte[][] toArrays(ByteBuffer[] buffers) {
   byte[][] bytesArr = new byte[buffers.length][];
   ByteBuffer buffer;
   for (int i = 0; i < buffers.length; i++) {
     buffer = buffers[i];
     if (buffer == null) {
       bytesArr[i] = null;
       continue;
     }
     if (buffer.hasArray()) {
       bytesArr[i] = buffer.array();
     } else {
       throw new IllegalArgumentException("Invalid ByteBuffer passed, " +
           "expecting heap buffer");
     }
   }
   return bytesArr;
 }
 {code} 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11920) Refactor some codes for erasure coders

2015-05-06 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14531892#comment-14531892
 ] 

Yi Liu commented on HADOOP-11920:
-

Yes, the caller can pass a direct buffer, but can also use a Java heap byte 
buffer. Why should it be a DirectByteBuffer?

In RawErasureEncoder, the encode declares it accepts {{ByteBuffer}}
{code}
public void encode(ByteBuffer[] inputs, ByteBuffer[] outputs);
{code}

If we want to accept only direct ByteBuffers in {{XORRawEncoder#doEncode}}, we 
should check that the buffer is a direct buffer.

 Refactor some codes for erasure coders
 --

 Key: HADOOP-11920
 URL: https://issues.apache.org/jira/browse/HADOOP-11920
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HADOOP-11920-HDFS-7285-02.patch, 
 HADOOP-11920-HDFS-7285-v4.patch, HADOOP-11920-v1.patch, 
 HADOOP-11920-v2.patch, HADOOP-11920-v3.patch


 While working on native erasure coders and also HADOOP-11847, it was found in 
 some chances better to refine a little bit of codes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11920) Refactor some codes for erasure coders

2015-05-06 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14531858#comment-14531858
 ] 

Yi Liu commented on HADOOP-11920:
-

Thanks Kai for the patch.

{quote}
resetDirectBuffer(outputs[0]);
{quote}
The name should be resetBuffer, since it could be a Java heap buffer?


For the Jenkins results, can you run the related test cases locally? If they 
pass, I think it's OK to go.

 Refactor some codes for erasure coders
 --

 Key: HADOOP-11920
 URL: https://issues.apache.org/jira/browse/HADOOP-11920
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HADOOP-11920-HDFS-7285-02.patch, 
 HADOOP-11920-HDFS-7285-v4.patch, HADOOP-11920-v1.patch, 
 HADOOP-11920-v2.patch, HADOOP-11920-v3.patch


 While working on native erasure coders and also HADOOP-11847, it was found in 
 some chances better to refine a little bit of codes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11920) Refactor some codes for erasure coders

2015-05-06 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14531994#comment-14531994
 ] 

Yi Liu commented on HADOOP-11920:
-

+1 , thanks Kai.

 Refactor some codes for erasure coders
 --

 Key: HADOOP-11920
 URL: https://issues.apache.org/jira/browse/HADOOP-11920
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HADOOP-11920-HDFS-7285-02.patch, 
 HADOOP-11920-HDFS-7285-v4.patch, HADOOP-11920-HDFS-7285-v5.patch, 
 HADOOP-11920-v1.patch, HADOOP-11920-v2.patch, HADOOP-11920-v3.patch


 While working on native erasure coders and also HADOOP-11847, it was found in 
 some chances better to refine a little bit of codes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11847) Enhance raw coder allowing to read least required inputs in decoding

2015-05-04 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526741#comment-14526741
 ] 

Yi Liu commented on HADOOP-11847:
-

Hi Zhe, for striped block recovery, there are several situations:
1) only parity blocks missed
2) only data blocks missed
3) both parity and data blocks missed.

Before this patch is committed: in HDFS-7348, for #1, I use encode as a 
workaround, but it encodes all parity blocks. For #2, I found decode only works 
for data blocks, and the erasureIndices needs some special handling (see the 
decode test), so in the HDFS-7348 test I made parityBlkNum data blocks missing 
and then it works, but we need to have full inputs and allocate more buffers. 
For #3, it doesn't work and there is no test.

So without this fix, the decode in HDFS-7348 and HDFS-7678 is just a workaround 
and we will still need to update it after this patch. 
Even though the decode interface is the same, there are different requirements 
for the input parameters, so the code logic will be different. 

Should we review and push this patch as soon as possible? It's a blocking issue.
Ideally for {{decode}}: 1) the inputs are the minimal set of blocks needed 
(which may include data or parity blocks), 2) the indices of those input blocks, 
or some other way to let the decode function know which they are, 3) the outputs 
are the blocks to be recovered (one or more), and 4) the indices of the output blocks.
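Purely as an illustration, the "ideal" shape listed above could look like the following; this is not the current RawErasureDecoder API, and all parameter names are hypothetical:
{code}
void decode(ByteBuffer[] minimalInputs,   // 1) only the blocks actually read
            int[] inputIndexes,           // 2) which block each input holds
            ByteBuffer[] outputs,         // 3) buffers for the recovered blocks
            int[] erasedIndexes);         // 4) which blocks to reconstruct
{code}
In the current API the indices of the unread inputs are conveyed implicitly by leaving those slots null; the explicit-index form above is just another way to express the same information.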

 Enhance raw coder allowing to read least required inputs in decoding
 

 Key: HADOOP-11847
 URL: https://issues.apache.org/jira/browse/HADOOP-11847
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HADOOP-11847-HDFS-7285-v3.patch, 
 HADOOP-11847-HDFS-7285-v4.patch, HADOOP-11847-v1.patch, HADOOP-11847-v2.patch


 This is to enhance raw erasure coder to allow only reading least required 
 inputs while decoding. It will also refine and document the relevant APIs for 
 better understanding and usage. When using least required inputs, it may add 
 computing overhead but will possibly outperform overall since less network 
 traffic and disk IO are involved.
 This is something planned to do but just got reminded by [~zhz]' s question 
 raised in HDFS-7678, also copied here:
 bq.Kai Zheng I have a question about decoding: in a (6+3) schema, if block #2 
 is missing, and I want to repair it with blocks 0, 1, 3, 4, 5, 8, how should 
 I construct the inputs to RawErasureDecoder#decode?
 With this work, hopefully the answer to above question would be obvious.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11887) Introduce Intel ISA-L erasure coding library for the native support

2015-05-04 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527736#comment-14527736
 ] 

Yi Liu commented on HADOOP-11887:
-

Hi [~cmccabe] and [~andrew.wang], do you have time to review this? I would 
appreciate it if you guys can help, since you are the native-code experts :) 

 Introduce Intel ISA-L erasure coding library for the native support
 ---

 Key: HADOOP-11887
 URL: https://issues.apache.org/jira/browse/HADOOP-11887
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HADOOP-11887-v1.patch


 This is to introduce Intel ISA-L erasure coding library for the native 
 support, via dynamic loading mechanism (dynamic module, like *.so in *nix and 
 *.dll on Windows).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11908) Erasure coding: Should be able to encode part of parity blocks.

2015-05-03 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-11908:

Assignee: Kai Zheng  (was: Yi Liu)

 Erasure coding: Should be able to encode part of parity blocks.
 ---

 Key: HADOOP-11908
 URL: https://issues.apache.org/jira/browse/HADOOP-11908
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Yi Liu
Assignee: Kai Zheng

 {code}
 public void encode(ByteBuffer[] inputs, ByteBuffer[] outputs);
 {code}
 Currently when we do encode, the outputs are all parity blocks, we should be 
 able to encode part of parity blocks. 
 This is required when we do datanode striped block recovery, if one or more 
 parity blocks are missed, we need to do encode to recover them. Encoding only 
 part of the parity blocks should be more efficient than encoding all of them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11908) Erasure coding: Should be able to encode part of parity blocks.

2015-05-03 Thread Yi Liu (JIRA)
Yi Liu created HADOOP-11908:
---

 Summary: Erasure coding: Should be able to encode part of parity 
blocks.
 Key: HADOOP-11908
 URL: https://issues.apache.org/jira/browse/HADOOP-11908
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Yi Liu
Assignee: Yi Liu


{code}
public void encode(ByteBuffer[] inputs, ByteBuffer[] outputs);
{code}
Currently when we do encode, the outputs are all parity blocks, we should be 
able to encode part of parity blocks. 
This is required when we do datanode striped block recovery, if one or more 
parity blocks are missed, we need to do encode to recover them. Encoding only 
part of the parity blocks should be more efficient than encoding all of them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11887) Introduce Intel ISA-L erasure coding library for the native support

2015-04-29 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520689#comment-14520689
 ] 

Yi Liu commented on HADOOP-11887:
-

Thanks Kai for the work; I will take a look at it in the following few days.

 Introduce Intel ISA-L erasure coding library for the native support
 ---

 Key: HADOOP-11887
 URL: https://issues.apache.org/jira/browse/HADOOP-11887
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HADOOP-11887-v1.patch


 This is to introduce Intel ISA-L erasure coding library for the native 
 support, via dynamic loading mechanism (dynamic module, like *.so in *nix and 
 *.dll on Windows).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11766) Generic token authentication support for Hadoop

2015-04-17 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-11766:

Assignee: Kai Zheng

 Generic token authentication support for Hadoop
 ---

 Key: HADOOP-11766
 URL: https://issues.apache.org/jira/browse/HADOOP-11766
 Project: Hadoop Common
  Issue Type: New Feature
  Components: security
Reporter: Kai Zheng
Assignee: Kai Zheng

 As a major goal of Rhino project, we proposed *TokenAuth* effort in 
 HADOOP-9392, where it's to provide a common token authentication framework to 
 integrate multiple authentication mechanisms, by adding a new 
 {{AuthenticationMethod}} in lieu of {{KERBEROS}} and {{SIMPLE}}. To minimize 
 the required changes and risk, we thought of another approach to achieve the 
 general goals based on Kerberos as Kerberos itself supports a 
 pre-authentication framework in both spec and implementation, which was 
 discussed in HADOOP-10959 as *TokenPreauth*. In both approaches, we had 
 performed workable prototypes covering both command line console and Hadoop 
 web UI. 
 As HADOOP-9392 is rather lengthy and heavy, HADOOP-10959 is mostly focused on 
 the concrete implementation approach based on Kerberos, we open this for more 
 general and updated discussions about requirement, use cases, and concerns 
 for the generic token authentication support for Hadoop. We distinguish this 
 token from existing Hadoop tokens as the token in this discussion is mainly 
 for the initial and primary authentication. We will refine our existing codes 
 in HADOOP-9392 and HADOOP-10959, break them down into smaller patches based 
 on latest trunk. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11766) Generic token authentication support for Hadoop

2015-04-16 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497640#comment-14497640
 ] 

Yi Liu commented on HADOOP-11766:
-

Hi Kai, could you upload a design doc first? Then people can understand it 
better.

 Generic token authentication support for Hadoop
 ---

 Key: HADOOP-11766
 URL: https://issues.apache.org/jira/browse/HADOOP-11766
 Project: Hadoop Common
  Issue Type: New Feature
  Components: security
Reporter: Kai Zheng

 As a major goal of Rhino project, we proposed *TokenAuth* effort in 
 HADOOP-9392, where it's to provide a common token authentication framework to 
 integrate multiple authentication mechanisms, by adding a new 
 {{AuthenticationMethod}} in lieu of {{KERBEROS}} and {{SIMPLE}}. To minimize 
 the required changes and risk, we thought of another approach to achieve the 
 general goals based on Kerberos as Kerberos itself supports a 
 pre-authentication framework in both spec and implementation, which was 
 discussed in HADOOP-10959 as *TokenPreauth*. In both approaches, we had 
 performed workable prototypes covering both command line console and Hadoop 
 web UI. 
 As HADOOP-9392 is rather lengthy and heavy, HADOOP-10959 is mostly focused on 
 the concrete implementation approach based on Kerberos, we open this for more 
 general and updated discussions about requirement, use cases, and concerns 
 for the generic token authentication support for Hadoop. We distinguish this 
 token from existing Hadoop tokens as the token in this discussion is mainly 
 for the initial and primary authentication. We will refine our existing codes 
 in HADOOP-9392 and HADOOP-10959, break them down into smaller patches based 
 on latest trunk. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11789) NPE in TestCryptoStreamsWithOpensslAesCtrCryptoCodec

2015-04-09 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487399#comment-14487399
 ] 

Yi Liu commented on HADOOP-11789:
-

{quote}
but the NPE on Jenkins needs to be fixed on the Jenkins side.
{quote}
Sorry, I missed this comment from Andrew :)   The new patch addresses that, 
thanks.

 NPE in TestCryptoStreamsWithOpensslAesCtrCryptoCodec 
 -

 Key: HADOOP-11789
 URL: https://issues.apache.org/jira/browse/HADOOP-11789
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 3.0.0, 2.8.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Yi Liu
 Attachments: HADOOP-11789.001.patch, HADOOP-11789.002.patch


 NPE surfacing in {{TestCryptoStreamsWithOpensslAesCtrCryptoCodec}} on  Jenkins



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11789) NPE in TestCryptoStreamsWithOpensslAesCtrCryptoCodec

2015-04-09 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-11789:

Status: Patch Available  (was: Reopened)

 NPE in TestCryptoStreamsWithOpensslAesCtrCryptoCodec 
 -

 Key: HADOOP-11789
 URL: https://issues.apache.org/jira/browse/HADOOP-11789
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 3.0.0, 2.8.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Yi Liu
 Attachments: HADOOP-11789.001.patch, HADOOP-11789.002.patch


 NPE surfacing in {{TestCryptoStreamsWithOpensslAesCtrCryptoCodec}} on  Jenkins



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11789) NPE in TestCryptoStreamsWithOpensslAesCtrCryptoCodec

2015-04-09 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-11789:

Attachment: HADOOP-11789.002.patch

[~ste...@apache.org], good idea, we should have a better message, thanks.

Update the patch to give a better message.


 NPE in TestCryptoStreamsWithOpensslAesCtrCryptoCodec 
 -

 Key: HADOOP-11789
 URL: https://issues.apache.org/jira/browse/HADOOP-11789
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 3.0.0, 2.8.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Yi Liu
 Attachments: HADOOP-11789.001.patch, HADOOP-11789.002.patch


 NPE surfacing in {{TestCryptoStreamsWithOpensslAesCtrCryptoCodec}} on  Jenkins



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11789) NPE in TestCryptoStreamsWithOpensslAesCtrCryptoCodec

2015-04-08 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484799#comment-14484799
 ] 

Yi Liu commented on HADOOP-11789:
-

Colin, if {{-Pnative}} is set but the OS doesn't have a correct version of 
OpenSSL, should we make the test fail? Could we test the crypto streams with 
OpensslAesCtrCryptoCodec only when a correct OpenSSL is loaded? Otherwise 
people will still see the failure in environments without a correct OpenSSL.

 NPE in TestCryptoStreamsWithOpensslAesCtrCryptoCodec 
 -

 Key: HADOOP-11789
 URL: https://issues.apache.org/jira/browse/HADOOP-11789
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 3.0.0, 2.8.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Yi Liu
 Attachments: HADOOP-11789.001.patch


 NPE surfacing in {{TestCryptoStreamsWithOpensslAesCtrCryptoCodec}} on  Jenkins



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11789) NPE in TestCryptoStreamsWithOpensslAesCtrCryptoCodec

2015-04-08 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14486456#comment-14486456
 ] 

Yi Liu commented on HADOOP-11789:
-

Colin, I'm OK to close it as WONTFIX. 
[~steve_l] and [~xyao], do you have comments? If not, I will close it.

 NPE in TestCryptoStreamsWithOpensslAesCtrCryptoCodec 
 -

 Key: HADOOP-11789
 URL: https://issues.apache.org/jira/browse/HADOOP-11789
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 3.0.0, 2.8.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Yi Liu
 Attachments: HADOOP-11789.001.patch


 NPE surfacing in {{TestCryptoStreamsWithOpensslAesCtrCryptoCodec}} on  Jenkins



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11789) NPE in TestCryptoStreamsWithOpensslAesCtrCryptoCodec

2015-04-08 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-11789:

Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

Thanks Colin and Andrew for the comments; closing it as WON'T FIX.

 NPE in TestCryptoStreamsWithOpensslAesCtrCryptoCodec 
 -

 Key: HADOOP-11789
 URL: https://issues.apache.org/jira/browse/HADOOP-11789
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 3.0.0, 2.8.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Yi Liu
 Attachments: HADOOP-11789.001.patch


 NPE surfacing in {{TestCryptoStreamsWithOpensslAesCtrCryptoCodec}} on  Jenkins



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11789) NPE in TestCryptoStreamsWithOpensslAesCtrCryptoCodec

2015-04-02 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-11789:

Attachment: HADOOP-11789.001.patch

The failure is because OpenSSL is not loaded or the test is not run with the 
{{-Pnative}} flag. Updated the patch.

 NPE in TestCryptoStreamsWithOpensslAesCtrCryptoCodec 
 -

 Key: HADOOP-11789
 URL: https://issues.apache.org/jira/browse/HADOOP-11789
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 3.0.0, 2.8.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Yi Liu
 Attachments: HADOOP-11789.001.patch


 NPE surfacing in {{TestCryptoStreamsWithOpensslAesCtrCryptoCodec}} on  Jenkins



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11789) NPE in TestCryptoStreamsWithOpensslAesCtrCryptoCodec

2015-04-02 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-11789:

Status: Patch Available  (was: Open)

 NPE in TestCryptoStreamsWithOpensslAesCtrCryptoCodec 
 -

 Key: HADOOP-11789
 URL: https://issues.apache.org/jira/browse/HADOOP-11789
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 3.0.0, 2.8.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Yi Liu
 Attachments: HADOOP-11789.001.patch


 NPE surfacing in {{TestCryptoStreamsWithOpensslAesCtrCryptoCodec}} on  Jenkins



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HADOOP-11789) NPE in TestCryptoStreamsWithOpensslAesCtrCryptoCodec

2015-04-02 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu reassigned HADOOP-11789:
---

Assignee: Yi Liu

 NPE in TestCryptoStreamsWithOpensslAesCtrCryptoCodec 
 -

 Key: HADOOP-11789
 URL: https://issues.apache.org/jira/browse/HADOOP-11789
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 3.0.0, 2.8.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Yi Liu

 NPE surfacing in {{TestCryptoStreamsWithOpensslAesCtrCryptoCodec}} on  Jenkins



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-10300) Allowed deferred sending of call responses

2015-03-30 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14386708#comment-14386708
 ] 

Yi Liu commented on HADOOP-10300:
-

{code}
  public void sendResponse() throws IOException {
  int count = responseWaitCount.decrementAndGet();
  assert count >= 0 : "response has already been sent";
  if (count == 0) {
if (rpcResponse == null) {
  // needed by postponed operations to indicate an exception has
  // occurred.  it's too late to re-encode the response so just
  // drop the connection.  unlikely to occur in practice but in tests
  connection.close();
} else {
  connection.sendResponse(this);
}
  }
}
{code}

In the real case, {{rpcResponse}} has a value before {{sendResponse}}, so it 
seems {{if (rpcResponse == null)}} will not happen. Can we remove 
{{connection.close()}} and modify the test that makes this happen?

 Allowed deferred sending of call responses
 --

 Key: HADOOP-10300
 URL: https://issues.apache.org/jira/browse/HADOOP-10300
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: ipc
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: HADOOP-10300.patch, HADOOP-10300.patch


 RPC handlers currently do not return until the RPC call completes and 
 response is sent, or a partially sent response has been queued for the 
 responder.  It would be useful for a proxy method to notify the handler to 
 not yet send the call's response.
 A potential use case is that a namespace handler in the NN might want to return 
 before the edit log is synced so it can service more requests and allow 
 increased batching of edits per sync.  Background syncing could later trigger 
 the sending of the call response to the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11710) Make CryptoOutputStream behave like DFSOutputStream wrt synchronization

2015-03-15 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362630#comment-14362630
 ] 

Yi Liu commented on HADOOP-11710:
-

{quote}
I cherry-picked this to branch-2.7
{quote}
Oh, I missed that. Thanks for committing to branch-2.7, [~ozawa].

 Make CryptoOutputStream behave like DFSOutputStream wrt synchronization
 ---

 Key: HADOOP-11710
 URL: https://issues.apache.org/jira/browse/HADOOP-11710
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical
 Fix For: 2.7.0

 Attachments: HADOOP-11710.1.patch.txt, HADOOP-11710.2.patch.txt, 
 HADOOP-11710.3.patch.txt


 per discussion on parent, as an intermediate solution make CryptoOutputStream 
 behave like DFSOutputStream



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11711) Provide a default value for AES/CTR/NoPadding CryptoCodec classes

2015-03-12 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359879#comment-14359879
 ] 

Yi Liu commented on HADOOP-11711:
-

+1, Thanks Andrew!

 Provide a default value for AES/CTR/NoPadding CryptoCodec classes
 -

 Key: HADOOP-11711
 URL: https://issues.apache.org/jira/browse/HADOOP-11711
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Minor
 Attachments: hadoop-11711.001.patch, hadoop-11711.002.patch


 Users can configure the desired class to use for a given codec via a property 
 like {{hadoop.security.crypto.codec.classes.aes.ctr.nopadding}}. However, 
 even though we provide a default value for this codec in 
 {{core-default.xml}}, this default is not also set in the code.
 As a result, client deployments that do not include {{core-default.xml}} 
 cannot resolve any codecs, and get an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11710) Make CryptoOutputStream behave like DFSOutputStream wrt synchronization

2015-03-12 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-11710:

  Resolution: Fixed
   Fix Version/s: 2.7.0
Target Version/s: 2.7.0
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2.

 Make CryptoOutputStream behave like DFSOutputStream wrt synchronization
 ---

 Key: HADOOP-11710
 URL: https://issues.apache.org/jira/browse/HADOOP-11710
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical
 Fix For: 2.7.0

 Attachments: HADOOP-11710.1.patch.txt, HADOOP-11710.2.patch.txt, 
 HADOOP-11710.3.patch.txt


 per discussion on parent, as an intermediate solution make CryptoOutputStream 
 behave like DFSOutputStream



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HADOOP-11710) Make CryptoOutputStream behave like DFSOutputStream wrt synchronization

2015-03-12 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359876#comment-14359876
 ] 

Yi Liu edited comment on HADOOP-11710 at 3/13/15 3:25 AM:
--

Committed to trunk and branch-2.

The test failure is not related.


was (Author: hitliuyi):
Committed to trunk and branch-2.

 Make CryptoOutputStream behave like DFSOutputStream wrt synchronization
 ---

 Key: HADOOP-11710
 URL: https://issues.apache.org/jira/browse/HADOOP-11710
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical
 Fix For: 2.7.0

 Attachments: HADOOP-11710.1.patch.txt, HADOOP-11710.2.patch.txt, 
 HADOOP-11710.3.patch.txt


 per discussion on parent, as an intermediate solution make CryptoOutputStream 
 behave like DFSOutputStream



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11711) Provide a default value for AES/CTR/NoPadding CryptoCodec classes

2015-03-12 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359839#comment-14359839
 ] 

Yi Liu commented on HADOOP-11711:
-

Thanks [~andrew.wang] for the patch, it looks good to me, +1 pending Jenkins.
I found a really small nit in the test; it would be better if you could address it:
{code}
public static final String
  HADOOP_SECURITY_CRYPTO_CODEC_CLASSES_AES_CTR_NOPADDING_KEY =
  HADOOP_SECURITY_CRYPTO_CODEC_CLASSES_KEY_PREFIX
  + CipherSuite.AES_CTR_NOPADDING.getConfigSuffix();
{code}
{{HADOOP_SECURITY_CRYPTO_CODEC_CLASSES_AES_CTR_NOPADDING_KEY}} is already 
defined in {{CommonConfigurationKeysPublic.java}}; we could use it in 
TestCryptoStreamsWithJceAesCtrCryptoCodec.java and 
TestCryptoStreamsWithOpensslAesCtrCryptoCodec.java instead of constructing the 
string again.
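Separately, for readers without the patch at hand, the general idea of the fix 
is just to back the property with a code-level default, roughly like the sketch 
below (the property name and default classes are taken from 
{{core-default.xml}}; the actual patch may structure this differently):
{code}
// Sketch only: fall back to the same default that core-default.xml ships,
// so client deployments without core-default.xml can still resolve a codec.
String codecClasses = conf.get(
    "hadoop.security.crypto.codec.classes.aes.ctr.nopadding",
    "org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec, "
        + "org.apache.hadoop.crypto.JceAesCtrCryptoCodec");
{code}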

 Provide a default value for AES/CTR/NoPadding CryptoCodec classes
 -

 Key: HADOOP-11711
 URL: https://issues.apache.org/jira/browse/HADOOP-11711
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Minor
 Attachments: hadoop-11711.001.patch


 Users can configure the desired class to use for a given codec via a property 
 like {{hadoop.security.crypto.codec.classes.aes.ctr.nopadding}}. However, 
 even though we provide a default value for this codec in 
 {{core-default.xml}}, this default is not also set in the code.
 As a result, client deployments that do not include {{core-default.xml}} 
 cannot resolve any codecs, and get an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11710) Make CryptoOutputStream behave like DFSOutputStream wrt synchronization

2015-03-12 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359815#comment-14359815
 ] 

Yi Liu commented on HADOOP-11710:
-

+1 pending Jenkins.

 Make CryptoOutputStream behave like DFSOutputStream wrt synchronization
 ---

 Key: HADOOP-11710
 URL: https://issues.apache.org/jira/browse/HADOOP-11710
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical
 Attachments: HADOOP-11710.1.patch.txt, HADOOP-11710.2.patch.txt, 
 HADOOP-11710.3.patch.txt


 per discussion on parent, as an intermediate solution make CryptoOutputStream 
 behave like DFSOutputStream



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11708) CryptoOutputStream synchronization differences from DFSOutputStream break HBase

2015-03-12 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359718#comment-14359718
 ] 

Yi Liu commented on HADOOP-11708:
-

I also +1 for changing CryptoOutputStream to behave the same as HDFS.
We cannot make DFSOutputStream or CryptoOutputStream *synchronized* for all 
methods, since that would hurt performance; in most cases applications should 
handle the synchronization, so it's enough that we keep the same behavior as 
HDFS.

Sorry that I could not find time to work on HDFS-7911 in the past two days for 
personal reasons. Since [~busbey] has a patch in HADOOP-11710, I will mark 
HDFS-7911 as duplicated.

 CryptoOutputStream synchronization differences from DFSOutputStream break 
 HBase
 ---

 Key: HADOOP-11708
 URL: https://issues.apache.org/jira/browse/HADOOP-11708
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical

 For the write-ahead-log, HBase writes to DFS from a single thread and sends 
 sync/flush/hflush from a configurable number of other threads (default 5).
 FSDataOutputStream does not document anything about being thread safe, and it 
 is not thread safe for concurrent writes.
 However, DFSOutputStream is thread safe for concurrent writes + syncs. When 
 it is the stream FSDataOutputStream wraps, the combination is threadsafe for 
 1 writer and multiple syncs (the exact behavior HBase relies on).
 When HDFS Transparent Encryption is turned on, CryptoOutputStream is inserted 
 between FSDataOutputStream and DFSOutputStream. It is proactively labeled as 
 not thread safe, and this composition is not thread safe for any operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11710) Make CryptoOutputStream behave like DFSOutputStream wrt synchronization

2015-03-12 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359737#comment-14359737
 ] 

Yi Liu commented on HADOOP-11710:
-

Sean, don't move {{closed = true;}}. {{super.close();}} will invoke flush to 
flush the remaining data in the buffer; if we set *closed* to true before 
invoking {{super.close()}}, we will get an error. 
I think the test failure is related to this.

 Make CryptoOutputStream behave like DFSOutputStream wrt synchronization
 ---

 Key: HADOOP-11710
 URL: https://issues.apache.org/jira/browse/HADOOP-11710
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical
 Attachments: HADOOP-11710.1.patch.txt, HADOOP-11710.2.patch.txt


 per discussion on parent, as an intermediate solution make CryptoOutputStream 
 behave like DFSOutputStream



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HADOOP-11710) Make CryptoOutputStream behave like DFSOutputStream wrt synchronization

2015-03-12 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359737#comment-14359737
 ] 

Yi Liu edited comment on HADOOP-11710 at 3/13/15 1:20 AM:
--

Sean, don't move {{closed = true;}}. 
{{super.close();}} will invoke flush to flush the remaining data in the buffer; 
if we set *closed* to true before invoking {{super.close()}}, we will get an 
error. 
I think the test failure is related to this.


was (Author: hitliuyi):
Sean, don't move {{closed = true;}}. {{super.close();}} will invoke flush to 
flush the remaining data in the buffer, if we set *closed* to true before 
invoking {{super.close()}}, we will get error. 
I think the test failure should be related to this.

 Make CryptoOutputStream behave like DFSOutputStream wrt synchronization
 ---

 Key: HADOOP-11710
 URL: https://issues.apache.org/jira/browse/HADOOP-11710
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical
 Attachments: HADOOP-11710.1.patch.txt, HADOOP-11710.2.patch.txt


 per discussion on parent, as an intermediate solution make CryptoOutputStream 
 behave like DFSOutputStream



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11710) Make CryptoOutputStream behave like DFSOutputStream wrt synchronization

2015-03-12 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359743#comment-14359743
 ] 

Yi Liu commented on HADOOP-11710:
-

Oh, I just saw Steve's comments:
{quote}
However, I would recommend one change, which is in close(), move the close=true 
operation up immediately after the close check, just in case something in 
{{freeBuffers() }} raised an exception or the parent did -it'll stop a second 
close() call getting into a mess. This is not really related to the rest of the 
patch, except in the general improve re-entrancy context
{quote}
I agree we should make sure {{closed}} is set correctly; also, I think Sun's 
API for releasing the direct buffer rarely fails. Maybe we can put it in a 
{{try ... finally}}.
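To make the ordering point concrete, a rough sketch of the shape I mean 
(illustration only, not the actual CryptoOutputStream code):
{code}
// Illustration only; field and method names follow the discussion above.
@Override
public synchronized void close() throws IOException {
  if (closed) {
    return;
  }
  try {
    // super.close() drives flush(), so the remaining buffered data must be
    // written out before the stream is marked closed.
    super.close();
    freeBuffers();
  } finally {
    // Setting closed in finally still guards against a second close() call
    // even if super.close() or freeBuffers() threw.
    closed = true;
  }
}
{code}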



 Make CryptoOutputStream behave like DFSOutputStream wrt synchronization
 ---

 Key: HADOOP-11710
 URL: https://issues.apache.org/jira/browse/HADOOP-11710
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical
 Attachments: HADOOP-11710.1.patch.txt, HADOOP-11710.2.patch.txt


 per discussion on parent, as an intermediate solution make CryptoOutputStream 
 behave like DFSOutputStream



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11674) oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static

2015-03-04 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-11674:

Summary: oneByteBuf in CryptoInputStream and CryptoOutputStream should be 
non static  (was: data corruption for parallel CryptoInputStream and 
CryptoOutputStream)

 oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static
 ---

 Key: HADOOP-11674
 URL: https://issues.apache.org/jira/browse/HADOOP-11674
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical
 Attachments: HADOOP-11674.1.patch


 A common optimization in the io classes for Input/Output Streams is to save a 
 single length-1 byte array to use in single byte read/write calls.
 CryptoInputStream and CryptoOutputStream both attempt to follow this practice 
 but mistakenly mark the array as static. That means that only a single 
 instance of each can be present in a JVM safely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11674) oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static

2015-03-04 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-11674:

  Resolution: Fixed
   Fix Version/s: 2.7.0
Target Version/s: 2.7.0  (was: 3.0.0, 2.7.0, 2.6.1)
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2. Thanks [~busbey] for the contribution.

 oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static
 ---

 Key: HADOOP-11674
 URL: https://issues.apache.org/jira/browse/HADOOP-11674
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical
 Fix For: 2.7.0

 Attachments: HADOOP-11674.1.patch


 A common optimization in the io classes for Input/Output Streams is to save a 
 single length-1 byte array to use in single byte read/write calls.
 CryptoInputStream and CryptoOutputStream both attempt to follow this practice 
 but mistakenly mark the array as static. That means that only a single 
 instance of each can be present in a JVM safely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11674) oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static

2015-03-04 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348315#comment-14348315
 ] 

Yi Liu commented on HADOOP-11674:
-

+1, {{oneByteBuf}} should be non-static; otherwise there may be issues with 
{{read()}} across multiple threads.
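For reference, the shape of the fix is simply making the buffer an instance 
field, something like the sketch below (not the exact patch):
{code}
// Sketch only: a per-instance single-byte scratch buffer instead of a static
// one, so multiple streams in the same JVM no longer share state.
private final byte[] oneByteBuf = new byte[1];

@Override
public int read() throws IOException {
  int n = read(oneByteBuf, 0, 1);
  return (n == -1) ? -1 : (oneByteBuf[0] & 0xff);
}
{code}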

 oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static
 ---

 Key: HADOOP-11674
 URL: https://issues.apache.org/jira/browse/HADOOP-11674
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical
 Attachments: HADOOP-11674.1.patch


 A common optimization in the io classes for Input/Output Streams is to save a 
 single length-1 byte array to use in single byte read/write calls.
 CryptoInputStream and CryptoOutputStream both attempt to follow this practice 
 but mistakenly mark the array as static. That means that only a single 
 instance of each can be present in a JVM safely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11664) Loading predefined EC schemas from configuration

2015-03-02 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-11664:

Issue Type: Sub-task  (was: Task)
Parent: HADOOP-11264

 Loading predefined EC schemas from configuration
 

 Key: HADOOP-11664
 URL: https://issues.apache.org/jira/browse/HADOOP-11664
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HDFS-7371_v1.patch


 A system administrator can configure multiple EC codecs in the hdfs-site.xml 
 file, and codec instances or schemas in a new configuration file named 
 ec-schema.xml in the conf folder. A codec can be referenced by its instance 
 or schema using the codec name, and a schema can be specified by the schema 
 name for a folder or EC ZONE to enforce EC. Once a schema is used to define 
 an EC ZONE, its associated parameter values will be stored as xattributes and 
 respected thereafter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (HADOOP-11664) Loading predefined EC schemas from configuration

2015-03-02 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu moved HDFS-7371 to HADOOP-11664:
---

Key: HADOOP-11664  (was: HDFS-7371)
Project: Hadoop Common  (was: Hadoop HDFS)

 Loading predefined EC schemas from configuration
 

 Key: HADOOP-11664
 URL: https://issues.apache.org/jira/browse/HADOOP-11664
 Project: Hadoop Common
  Issue Type: Task
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HDFS-7371_v1.patch


 A system administrator can configure multiple EC codecs in the hdfs-site.xml 
 file, and codec instances or schemas in a new configuration file named 
 ec-schema.xml in the conf folder. A codec can be referenced by its instance 
 or schema using the codec name, and a schema can be specified by the schema 
 name for a folder or EC ZONE to enforce EC. Once a schema is used to define 
 an EC ZONE, its associated parameter values will be stored as xattributes and 
 respected thereafter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11647) Reed-Solomon ErasureCoder

2015-02-27 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-11647:

Issue Type: Sub-task  (was: Task)
Parent: HADOOP-11264

 Reed-Solomon ErasureCoder
 -

 Key: HADOOP-11647
 URL: https://issues.apache.org/jira/browse/HADOOP-11647
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HDFS-7664-v1.patch


 This is to implement Reed-Solomon ErasureCoder using the API defined in 
 HDFS-7662. It supports plugging in a concrete RawErasureCoder via 
 configuration, using either JRSErasureCoder added in HDFS-7418 or 
 IsaRSErasureCoder added in HDFS-7338.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (HADOOP-11645) Erasure Codec API covering the essential aspects for an erasure code

2015-02-27 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu moved HDFS-7699 to HADOOP-11645:
---

Key: HADOOP-11645  (was: HDFS-7699)
Project: Hadoop Common  (was: Hadoop HDFS)

 Erasure Codec API covering the essential aspects for an erasure code
 

 Key: HADOOP-11645
 URL: https://issues.apache.org/jira/browse/HADOOP-11645
 Project: Hadoop Common
  Issue Type: Task
Reporter: Kai Zheng
Assignee: Kai Zheng

 This is to define the even higher level API *ErasureCodec* to possibly 
 consider all the essential aspects of an erasure code, as discussed in 
 HDFS-7337 in detail. Generally, it will cover the necessary configuration 
 about which *RawErasureCoder* to use for the code scheme, how to form and 
 lay out the BlockGroup, etc. It will also discuss how an *ErasureCodec* 
 will be used in both the client and the DataNode, in all the supported modes 
 related to EC.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11645) Erasure Codec API covering the essential aspects for an erasure code

2015-02-27 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-11645:

Issue Type: Sub-task  (was: Task)
Parent: HADOOP-11264

 Erasure Codec API covering the essential aspects for an erasure code
 

 Key: HADOOP-11645
 URL: https://issues.apache.org/jira/browse/HADOOP-11645
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Kai Zheng
Assignee: Kai Zheng

 This is to define the even higher level API *ErasureCodec* to possibly 
 consider all the essential aspects of an erasure code, as discussed in 
 HDFS-7337 in detail. Generally, it will cover the necessary configuration 
 about which *RawErasureCoder* to use for the code scheme, how to form and 
 lay out the BlockGroup, etc. It will also discuss how an *ErasureCodec* 
 will be used in both the client and the DataNode, in all the supported modes 
 related to EC.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (HADOOP-11646) Erasure Coder API for encoding and decoding of block group

2015-02-27 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu moved HDFS-7662 to HADOOP-11646:
---

Fix Version/s: (was: HDFS-7285)
   HDFS-7285
  Key: HADOOP-11646  (was: HDFS-7662)
  Project: Hadoop Common  (was: Hadoop HDFS)

 Erasure Coder API for encoding and decoding of block group
 --

 Key: HADOOP-11646
 URL: https://issues.apache.org/jira/browse/HADOOP-11646
 Project: Hadoop Common
  Issue Type: Task
Reporter: Kai Zheng
Assignee: Kai Zheng
 Fix For: HDFS-7285

 Attachments: HDFS-7662-v1.patch, HDFS-7662-v2.patch, 
 HDFS-7662-v3.patch


 This is to define ErasureCoder API for encoding and decoding of BlockGroup. 
 Given a BlockGroup, ErasureCoder extracts data chunks from the blocks and 
 leverages RawErasureCoder defined in HDFS-7353 to perform concrete encoding 
 or decoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11646) Erasure Coder API for encoding and decoding of block group

2015-02-27 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-11646:

Issue Type: Sub-task  (was: Task)
Parent: HADOOP-11264

 Erasure Coder API for encoding and decoding of block group
 --

 Key: HADOOP-11646
 URL: https://issues.apache.org/jira/browse/HADOOP-11646
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Kai Zheng
Assignee: Kai Zheng
 Fix For: HDFS-7285

 Attachments: HDFS-7662-v1.patch, HDFS-7662-v2.patch, 
 HDFS-7662-v3.patch


 This is to define ErasureCoder API for encoding and decoding of BlockGroup. 
 Given a BlockGroup, ErasureCoder extracts data chunks from the blocks and 
 leverages RawErasureCoder defined in HDFS-7353 to perform concrete encoding 
 or decoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (HADOOP-11647) Reed-Solomon ErasureCoder

2015-02-27 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu moved HDFS-7664 to HADOOP-11647:
---

Key: HADOOP-11647  (was: HDFS-7664)
Project: Hadoop Common  (was: Hadoop HDFS)

 Reed-Solomon ErasureCoder
 -

 Key: HADOOP-11647
 URL: https://issues.apache.org/jira/browse/HADOOP-11647
 Project: Hadoop Common
  Issue Type: Task
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HDFS-7664-v1.patch


 This is to implement Reed-Solomon ErasureCoder using the API defined in 
 HDFS-7662. It supports plugging in a concrete RawErasureCoder via 
 configuration, using either JRSErasureCoder added in HDFS-7418 or 
 IsaRSErasureCoder added in HDFS-7338.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11595) Add default implementation for AbstractFileSystem#truncate

2015-02-19 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-11595:

   Resolution: Fixed
Fix Version/s: 2.7.0
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2, thanks again Chris.
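For anyone following along without the patch, the general shape of such a 
default (a sketch assuming the usual truncate signature; not necessarily the 
committed code verbatim) is to reject the operation until a concrete file 
system overrides it:
{code}
// Sketch only; the exception list on the real AbstractFileSystem#truncate
// signature may be longer than plain IOException.
public boolean truncate(Path f, long newLength) throws IOException {
  throw new UnsupportedOperationException(getClass().getSimpleName()
      + " doesn't support truncate");
}
{code}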

 Add default implementation for AbstractFileSystem#truncate
 --

 Key: HADOOP-11595
 URL: https://issues.apache.org/jira/browse/HADOOP-11595
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.7.0
Reporter: Yi Liu
Assignee: Yi Liu
 Fix For: 2.7.0

 Attachments: HADOOP-11595.001.patch


 As [~cnauroth] commented in HADOOP-11510, we should add a default 
 implementation for AbstractFileSystem#truncate to avoid breaking 
 backwards compatibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11595) Add default implementation for AbstractFileSystem#truncate

2015-02-19 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327149#comment-14327149
 ] 

Yi Liu commented on HADOOP-11595:
-

Chris, thank you very much for the review and verification! 
Sorry for the late response (I am on holiday this week for the Chinese 
traditional new year); I will commit it later.

 Add default implementation for AbstractFileSystem#truncate
 --

 Key: HADOOP-11595
 URL: https://issues.apache.org/jira/browse/HADOOP-11595
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.7.0
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: HADOOP-11595.001.patch


 As [~cnauroth] commented in HADOOP-11510, we should add a default 
 implementation for AbstractFileSystem#truncate to avoid breaking 
 backwards compatibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

