[jira] [Commented] (HDFS-12737) Thousands of sockets lingering in TIME_WAIT state due to frequent file open operations

2017-11-01 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235221#comment-16235221
 ] 

Todd Lipcon commented on HDFS-12737:


In the data transfer protocol we just pass tokens with each operation. Could 
the relevant RPCs just be modified to take tokens as parameters rather than 
using them as part of the connection context?

> Thousands of sockets lingering in TIME_WAIT state due to frequent file open 
> operations
> --
>
> Key: HDFS-12737
> URL: https://issues.apache.org/jira/browse/HDFS-12737
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ipc
> Environment: CDH5.10.2, HBase Multi-WAL=2, 250 replication peers
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
>
> On an HBase cluster we found that HBase RegionServers had thousands of sockets
> in the TIME_WAIT state. This depleted system resources and caused other
> services to fail.
> After months of troubleshooting, we found that the cluster has hundreds of
> replication peers and multi-WAL = 2. That creates hundreds of replication
> threads in each HBase RegionServer, and each thread opens a WAL file *every
> second*.
> We found that the IPC client closes the socket right away and does not reuse
> the connection. Since each closed socket stays in the TIME_WAIT state for 60
> seconds by default on Linux, this produces thousands of TIME_WAIT sockets.
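For a rough sense of scale: the steady-state number of TIME_WAIT sockets is about the rate at which connections are closed multiplied by how long each one lingers. Assuming (this is not stated in the report) one replication thread per peer per WAL group, i.e. 250 x 2 = 500 threads, and one short-lived IPC connection per WAL open:

$$
N_{\mathrm{TIME\_WAIT}} \approx \lambda \, T_{\mathrm{TIME\_WAIT}} = (500\ \mathrm{s}^{-1}) \times (60\ \mathrm{s}) = 30{,}000
$$

Even if only a fraction of the opens create a new connection, this comfortably reaches the thousands of sockets reported.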
> {code:title=ClientDatanodeProtocolTranslatorPB:createClientDatanodeProtocolProxy}
> // Since we're creating a new UserGroupInformation here, we know that no
> // future RPC proxies will be able to re-use the same connection. And
> // usages of this proxy tend to be one-off calls.
> //
> // This is a temporary fix: callers should really achieve this by using
> // RPC.stopProxy() on the resulting object, but this is currently not
> // working in trunk. See the discussion on HDFS-1965.
> Configuration confWithNoIpcIdle = new Configuration(conf);
> confWithNoIpcIdle.setInt(CommonConfigurationKeysPublic
> .IPC_CLIENT_CONNECTION_MAXIDLETIME_KEY, 0);
> {code}
> This piece of code is used in DistributedFileSystem#open()
> {noformat}
> 2017-10-27 14:01:44,152 DEBUG org.apache.hadoop.ipc.Client: New connection 
> Thread[IPC Client (1838187805) connection to /172.131.21.48:20001 from 
> blk_1013754707_14032,5,main] for remoteId /172.131.21.48:20001
> java.lang.Throwable: For logging stack trace, not a real exception
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1556)
> at org.apache.hadoop.ipc.Client.call(Client.java:1482)
> at org.apache.hadoop.ipc.Client.call(Client.java:1443)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy28.getReplicaVisibleLength(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientDatanodeProtocolTranslatorPB.getReplicaVisibleLength(ClientDatanodeProtocolTranslatorPB.java:198)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:365)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:335)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:271)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:263)
> at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1585)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:326)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:322)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:322)
> at 
> org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:162)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:783)
> at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:293)
> at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:267)
> at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:255)
> at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:414)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationWALReaderManager.openReader(ReplicationWALReaderManager.java:70)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.openReader(ReplicationSource.java:747)
> at 
> 

[jira] [Commented] (HDFS-12443) Ozone: Improve SCM block deletion throttling algorithm

2017-11-01 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235211#comment-16235211
 ] 

Weiwei Yang commented on HDFS-12443:


All sound good to me, [~linyiqun]. Please go ahead. Thanks a lot.

> Ozone: Improve SCM block deletion throttling algorithm 
> ---
>
> Key: HDFS-12443
> URL: https://issues.apache.org/jira/browse/HDFS-12443
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, scm
>Reporter: Weiwei Yang
>Assignee: Yiqun Lin
>Priority: Major
>  Labels: OzonePostMerge
> Attachments: HDFS-12443-HDFS-7240.001.patch, 
> HDFS-12443-HDFS-7240.002.patch, HDFS-12443-HDFS-7240.002.patch, 
> HDFS-12443-SCM-blockdeletion-throttle.pdf
>
>
> Currently SCM periodically scans delLog to send deletion transactions to
> datanodes. The throttling algorithm is simple: it scans at most
> {{BLOCK_DELETE_TX_PER_REQUEST_LIMIT}} (50 by default) transactions at a time.
> This is suboptimal: in the worst case it might cache 50 TXs for 50 different
> DNs, so each DN gets only 1 TX to process per interval, which makes deletion
> slow. An improvement is to throttle per datanode, e.g. 50 TXs per datanode
> per interval.
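To make the difference concrete, here is a minimal Java sketch contrasting the current global cap with the proposed per-datanode cap. The types and method names (`DeletionTx`, `DeletionThrottleSketch`, `selectGlobal`, `selectPerDatanode`) are hypothetical and are not SCM's actual API; the limit of 50 is only the default quoted above for `BLOCK_DELETE_TX_PER_REQUEST_LIMIT`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** A pending deletion transaction targeted at one datanode (hypothetical type). */
final class DeletionTx {
  final long txId;
  final String datanodeId;

  DeletionTx(long txId, String datanodeId) {
    this.txId = txId;
    this.datanodeId = datanodeId;
  }
}

final class DeletionThrottleSketch {
  /** Current behaviour: stop after `limit` transactions in total, whatever their target. */
  static Map<String, List<DeletionTx>> selectGlobal(List<DeletionTx> log, int limit) {
    Map<String, List<DeletionTx>> perNode = new HashMap<>();
    int taken = 0;
    for (DeletionTx tx : log) {
      if (taken == limit) {
        break;
      }
      perNode.computeIfAbsent(tx.datanodeId, k -> new ArrayList<>()).add(tx);
      taken++;
    }
    return perNode;
  }

  /** Proposed behaviour: allow up to `limit` transactions for each datanode. */
  static Map<String, List<DeletionTx>> selectPerDatanode(List<DeletionTx> log, int limit) {
    Map<String, List<DeletionTx>> perNode = new HashMap<>();
    for (DeletionTx tx : log) {
      List<DeletionTx> batch = perNode.computeIfAbsent(tx.datanodeId, k -> new ArrayList<>());
      if (batch.size() < limit) {
        batch.add(tx); // this node still has room in the current interval
      }
    }
    return perNode;
  }
}
```

With 50 datanodes whose transactions are interleaved in the log, the global cap hands each node a single transaction per interval, while the per-datanode cap gives each node up to 50. A real implementation would also stop scanning once every target node is full rather than walking the whole log.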



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12737) Thousands of sockets lingering in TIME_WAIT state due to frequent file open operations

2017-11-01 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235182#comment-16235182
 ] 

Jitendra Nath Pandey commented on HDFS-12737:
-

[~yzhangal], the block token is also used to authorize access to a block, so
the connection context must be established with that particular block token.
In {{DataNode#checkReadAccess}}, the block ID from the token identifier in the
UGI is used to authorize the access. Therefore, sharing connections across
different block tokens would likely expose a security risk.
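The trade-off between the two comments can be modelled as a connection cache. The `createClientDatanodeProtocolProxy` comment quoted in the issue notes that no future proxy can reuse a connection once a new `UserGroupInformation` is created per proxy. This toy Java model (all names hypothetical; it is not Hadoop's `ipc.Client`) shows the effect: if the token is part of the connection key, every new block token opens a new connection, whereas passing the token with each call, as suggested earlier in this thread, lets calls to the same datanode share one connection, at the cost of the server authorizing every call against the token it carries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Identifies a cached connection (hypothetical; keyed by address plus optional token). */
final class ConnKey {
  final String address;
  final String token; // null when tokens travel with each call instead

  ConnKey(String address, String token) {
    this.address = address;
    this.token = token;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ConnKey)) {
      return false;
    }
    ConnKey k = (ConnKey) o;
    return address.equals(k.address) && Objects.equals(token, k.token);
  }

  @Override
  public int hashCode() {
    return Objects.hash(address, token);
  }
}

final class ConnectionCacheSketch {
  private final Map<ConnKey, Integer> connections = new HashMap<>();
  private final boolean tokenInConnectionContext;
  private int opened = 0;

  ConnectionCacheSketch(boolean tokenInConnectionContext) {
    this.tokenInConnectionContext = tokenInConnectionContext;
  }

  /** Returns the id of the connection that a call carrying `blockToken` would use. */
  int call(String datanode, String blockToken) {
    ConnKey key = new ConnKey(datanode, tokenInConnectionContext ? blockToken : null);
    Integer id = connections.get(key);
    if (id == null) {
      id = ++opened; // a new TCP connection, later closed and left in TIME_WAIT
      connections.put(key, id);
    }
    // With per-call tokens, the server would have to check blockToken here on every call.
    return id;
  }

  int openedConnections() {
    return opened;
  }
}
```

For 100 reads of different blocks on one datanode, the token-in-context model opens 100 connections and the per-call model opens one.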


[jira] [Commented] (HDFS-12719) Ozone: Fix checkstyle, javac, whitespace issues in HDFS-7240 branch

2017-11-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235176#comment-16235176
 ] 

Hadoop QA commented on HDFS-12719:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} | {color:red} HDFS-12719 does not apply to HDFS-7240. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-12719 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12894723/HDFS-12719-HDFS-7240.001.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/21924/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Ozone: Fix checkstyle, javac, whitespace issues in HDFS-7240 branch
> ---
>
> Key: HDFS-12719
> URL: https://issues.apache.org/jira/browse/HDFS-12719
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-12719-HDFS-7240.001.patch
>
>
> There are outstanding whitespace/javac/checkstyle issues on the HDFS-7240 
> branch. These were observed by uploading the branch diff to the trunk via 
> parent jira HDFS-7240. This jira will fix all the valid outstanding issues.



[jira] [Updated] (HDFS-12719) Ozone: Fix checkstyle, javac, whitespace issues in HDFS-7240 branch

2017-11-01 Thread Mukul Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDFS-12719:
-
Status: Patch Available  (was: Open)



[jira] [Commented] (HDFS-12390) Support to refresh DNS to switch mapping

2017-11-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235134#comment-16235134
 ] 

Hadoop QA commented on HDFS-12390:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 27s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 58s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  6s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 25s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 46s{color} | {color:red} hadoop-hdfs-project in the patch failed. {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red}  0m 46s{color} | {color:red} hadoop-hdfs-project in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 46s{color} | {color:red} hadoop-hdfs-project in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 46s{color} | {color:orange} hadoop-hdfs-project: The patch generated 9 new + 611 unchanged - 0 fixed = 620 total (was 611) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 29s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  3m 12s{color} | {color:red} patch has errors when building and testing our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 16s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  7s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 13s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 28s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 16s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 47m 18s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-12390 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12885292/HDFS-12390.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux 00cbfc182bf5 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool 

[jira] [Commented] (HDFS-12443) Ozone: Improve SCM block deletion throttling algorithm

2017-11-01 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235120#comment-16235120
 ] 

Yiqun Lin commented on HDFS-12443:
--

Thanks for the comments, [~cheersyang]. I think we are heading in the same
direction now. There are some details I'd like to confirm with you.
bq. How you plan to define the max number of containers for each node?
I'd like to calculate this based on the configured container size and block
size. I described the calculation in my comment above; please have a look.
bq. I think we need a in-memory data structure to handle this...
For this new data structure, I'd like to build on the current class
{{DatanodeBlockDeletionTransactions}} and make it an independent class. That
will make it easier to test.
Please let me know if this looks good to you or if you have any suggestions.
Then I will start working on this. Thank you.



[jira] [Commented] (HDFS-12739) Add Support for SCM --init command

2017-11-01 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235095#comment-16235095
 ] 

Yiqun Lin commented on HDFS-12739:
--

Thanks [~shashikant] for updating the patch. I'm +1 for the change.
Please wait for [~nandakumar131]'s review comments on the latest patch. We can
attach the same patch again to re-trigger Jenkins. Thanks.

> Add Support for SCM --init command
> --
>
> Key: HDFS-12739
> URL: https://issues.apache.org/jira/browse/HDFS-12739
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: HDFS-7240
>Affects Versions: HDFS-7240
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-12739-HDFS-7240.001.patch, 
> HDFS-12739-HDFS-7240.002.patch, HDFS-12739-HDFS-7240.003.patch, 
> HDFS-12739-HDFS-7240.004.patch
>
>
> The SCM --init command will generate a cluster ID and persist it locally. The
> same cluster ID will be shared with KSM and the datanodes. If the cluster ID is
> already present in the local version file, the command will just read it.
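The generate-once-otherwise-read behaviour described above can be sketched in Java as follows. The file layout, the `clusterID` property key, the `CID-` prefix and the class name are assumptions for illustration, not the patch's actual code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.UUID;

final class ClusterIdInitSketch {
  // Hypothetical property key; the real VERSION file layout may differ.
  static final String CLUSTER_ID_KEY = "clusterID";

  /** Returns the cluster ID stored in versionFile, generating and persisting one if absent. */
  static String initClusterId(Path versionFile) {
    try {
      Properties props = new Properties();
      if (Files.exists(versionFile)) {
        try (InputStream in = Files.newInputStream(versionFile)) {
          props.load(in);
        }
        String existing = props.getProperty(CLUSTER_ID_KEY);
        if (existing != null && !existing.isEmpty()) {
          return existing; // already initialized: just read the ID back
        }
      }
      // Not initialized yet: generate a new ID (the "CID-" prefix is an assumption).
      String clusterId = "CID-" + UUID.randomUUID();
      props.setProperty(CLUSTER_ID_KEY, clusterId);
      Files.createDirectories(versionFile.toAbsolutePath().getParent());
      try (OutputStream out = Files.newOutputStream(versionFile)) {
        props.store(out, "SCM version file (sketch)");
      }
      return clusterId;
    } catch (IOException e) {
      throw new UncheckedIOException("Cannot initialize cluster ID in " + versionFile, e);
    }
  }
}
```

Running the sketch twice against the same file returns the same ID, matching the "just read it" case above.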



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-01 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235080#comment-16235080
 ] 

Konstantin Shvachko commented on HDFS-7240:
---

??I hope this addresses your concerns.??

I don't think _that_ addressed any of my concerns.
* Ozone by itself does not solve any of the HDFS problems. It uses an
HDFS-agnostic, S3-like API, and I cannot use it on my clusters unless I can
convince thousands of my users to rewrite their thousands of applications,
along with the existing computational frameworks (YARN, Hive, Pig, Spark,
etc.) created over the past 10 years.
* I was talking about the futuristic architecture, in which you start using
Ozone for block management and rewrite the NameNode to store its namespace in
LevelDB, if this is still your plan. I agree this architecture solves the
object-count problem. But it does not solve the problem of scaling RPC
requests, which is more important to me than the number of objects, since you
still cannot grow the cluster beyond the single NameNode's RPC-processing
capability.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
>Priority: Major
> Attachments: HDFS-7240.001.patch, HDFS-7240.002.patch, 
> HDFS-7240.003.patch, HDFS-7240.003.patch, HDFS-7240.004.patch, 
> Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



[jira] [Commented] (HDFS-12744) More logs when short-circuit read is failed and disabled

2017-11-01 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235061#comment-16235061
 ] 

Weiwei Yang commented on HDFS-12744:


Thanks [~subru], that's nice.

> More logs when short-circuit read is failed and disabled
> 
>
> Key: HDFS-12744
> URL: https://issues.apache.org/jira/browse/HDFS-12744
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
>  Labels: supportability
> Fix For: 2.9.0, 3.0.0
>
> Attachments: HDFS-12744.001.patch, HDFS-12744.002.patch
>
>
> Short-circuit read (SCR) failed with the following error:
> {noformat}
> 2017-10-21 16:42:28,024 WARN  
> [B.defaultRpcServer.handler=7,queue=7,port=16020] 
> impl.BlockReaderFactory: BlockReaderFactory(xxx): unknown response code ERROR
> while attempting to set up short-circuit access. Block xxx is not valid
> {noformat}
> After that, short-circuit read was disabled for *10 minutes* without any
> warning message in the log. This caused us to spend extra time figuring out
> why SCR was not working for such a long window. We propose adding a warning
> log (as other places already do) to indicate that SCR is disabled, plus some
> more logging in the DN to show what happened.
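A minimal sketch of the proposed behaviour, assuming a simple time-based disable window: when short-circuit setup fails, the client records a "disabled until" deadline and emits a WARN line saying so. The class, method names and log wording are hypothetical; this is not the actual `BlockReaderFactory` code. The clock is injected so the 10-minute window can be tested without waiting.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;
import java.util.logging.Logger;

/** Single-threaded sketch; real client code would need synchronization. */
final class ShortCircuitGateSketch {
  private static final Logger LOG = Logger.getLogger(ShortCircuitGateSketch.class.getName());
  static final long DISABLE_MILLIS = TimeUnit.MINUTES.toMillis(10);

  private final LongSupplier clockMillis;
  private long disabledUntil = Long.MIN_VALUE; // enabled until the first failure

  ShortCircuitGateSketch(LongSupplier clockMillis) {
    this.clockMillis = clockMillis;
  }

  boolean isEnabled() {
    return clockMillis.getAsLong() >= disabledUntil;
  }

  /** Called when short-circuit setup fails: disables SCR and says so in the log. */
  void onSetupFailure(String block, String reason) {
    disabledUntil = clockMillis.getAsLong() + DISABLE_MILLIS;
    LOG.warning(String.format(
        "Short-circuit read disabled for %d ms after failure on %s: %s",
        DISABLE_MILLIS, block, reason));
  }
}
```

The key point is that the same code path that starts the disable window also logs it, so an operator can match a period of non-SCR reads to a specific failure.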



[jira] [Updated] (HDFS-11096) Support rolling upgrade between 2.x and 3.x

2017-11-01 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-11096:
---
Target Version/s: 3.0.1  (was: 3.0.0)

Thanks Sean, I'm going to bump this to 3.0.1 then.

> Support rolling upgrade between 2.x and 3.x
> ---
>
> Key: HDFS-11096
> URL: https://issues.apache.org/jira/browse/HDFS-11096
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rolling upgrades
>Affects Versions: 3.0.0-alpha1
>Reporter: Andrew Wang
>Assignee: Sean Mackrory
>Priority: Blocker
> Attachments: HDFS-11096.001.patch, HDFS-11096.002.patch, 
> HDFS-11096.003.patch, HDFS-11096.004.patch, HDFS-11096.005.patch, 
> HDFS-11096.006.patch, HDFS-11096.007.patch
>
>
> trunk has a minimum software version of 3.0.0-alpha1. This means we can't do
> a rolling upgrade between branch-2 and trunk.
> This is a showstopper for large deployments. Unless there are very compelling
> reasons to break compatibility, let's restore the ability to do rolling
> upgrades to 3.x releases.
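To illustrate why a minimum software version of 3.0.0-alpha1 rules out a rolling upgrade from branch-2: during such an upgrade, nodes still running 2.x must keep talking to already-upgraded nodes, and a version gate of this shape rejects them. This is a simplified Java sketch, not Hadoop's actual check: only the numeric major.minor.patch prefix is compared and pre-release suffixes such as `-alpha1` are ignored (Hadoop has its own `VersionUtil.compareVersions` utility that handles suffixes properly).

```java
final class MinVersionGateSketch {
  /** Parses the leading numeric components, e.g. "3.0.0-alpha1" -> [3, 0, 0]. */
  static int[] numericPrefix(String version) {
    String core = version.split("-", 2)[0]; // drop any pre-release suffix
    String[] parts = core.split("\\.");
    int[] nums = new int[3];
    for (int i = 0; i < nums.length && i < parts.length; i++) {
      nums[i] = Integer.parseInt(parts[i]);
    }
    return nums;
  }

  /** Negative if a < b, zero if equal (ignoring suffixes), positive if a > b. */
  static int compare(String a, String b) {
    int[] x = numericPrefix(a);
    int[] y = numericPrefix(b);
    for (int i = 0; i < 3; i++) {
      if (x[i] != y[i]) {
        return Integer.compare(x[i], y[i]);
      }
    }
    return 0;
  }

  /** A peer is accepted only if its software version is at least the minimum. */
  static boolean accepts(String minimumVersion, String peerVersion) {
    return compare(peerVersion, minimumVersion) >= 0;
  }
}
```

Under this model, a branch-2 peer is only admitted once the minimum is lowered to a 2.x version.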



[jira] [Comment Edited] (HDFS-12744) More logs when short-circuit read is failed and disabled

2017-11-01 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234697#comment-16234697
 ] 

Subru Krishnan edited comment on HDFS-12744 at 11/2/17 1:16 AM:


[~cheersyang]/[~jzhuge], I cherry-picked this to branch-2.9 since you want to
include it in the 2.9.0 release.


was (Author: subru):
[~cheersyang]/[~jzhuge], you should cherry-pick to branch-2.9 if you want to
include this in the 2.9.0 release. Thanks.



[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks

2017-11-01 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235007#comment-16235007
 ] 

Xiao Chen commented on HDFS-12618:
--

Thanks for the new patch, Wellington.

From a quick look this seems to work, nice job. I'd like to see:
- more thorough unit tests covering the scenario in the description (2
snapshots referring to a deleted file)
- tests covering some combinations of creating/deleting snapshots, verifying
that the number is correct
- I'm not an expert on lambda expressions, but it seems {{DirTypeCheck}} could
be private.
- In general we'd need to acquire the FSDirectory lock as well as the
FSNamesystem lock. So we need dir.readLock() after the namesystem readLock(),
and dir.readUnlock() before the FSNamesystem unlock.
- It looks like you applied a formatter to the entire NamenodeFsck.java
(instead of just the changed code), which resulted in some unnecessary
changes. Let's not make those changes.
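The acquire/release order described in the lock bullet can be sketched with two `ReentrantReadWriteLock`s. The class and field names are hypothetical stand-ins for the FSNamesystem and FSDirectory locks, not their real APIs; the point is only the nesting: outer lock first, inner lock second, released in reverse order in `finally` blocks.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

final class NestedReadLockSketch {
  // Stand-ins for the FSNamesystem (outer) and FSDirectory (inner) locks.
  final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();
  final ReentrantReadWriteLock dirLock = new ReentrantReadWriteLock();

  /** Runs a read-only action while holding both read locks, in a fixed order. */
  <T> T readUnderBothLocks(Supplier<T> action) {
    fsnLock.readLock().lock();         // 1. namesystem read lock
    try {
      dirLock.readLock().lock();       // 2. directory read lock, after the namesystem lock
      try {
        return action.get();
      } finally {
        dirLock.readLock().unlock();   // 3. release the directory lock first
      }
    } finally {
      fsnLock.readLock().unlock();     // 4. then release the namesystem lock
    }
  }
}
```

Keeping one global order (namesystem, then directory) is what prevents two threads from each holding one lock while waiting for the other.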

Will provide a more complete review later this week.

> fsck -includeSnapshots reports wrong amount of total blocks
> ---
>
> Key: HDFS-12618
> URL: https://issues.apache.org/jira/browse/HDFS-12618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Minor
> Attachments: HDFS-121618.initial, HDFS-12618.001.patch, 
> HDFS-12618.002.patch, HDFS-12618.003.patch
>
>
> When snapshots are enabled, if a file is deleted but is still contained in a
> snapshot, *fsck* will not report blocks for that file, showing a different
> number of *total blocks* than what is exposed in the Web UI.
> This should be fine, as *fsck* provides the *-includeSnapshots* option. The
> problem is that *-includeSnapshots* causes *fsck* to count blocks for every
> occurrence of a file in snapshots, which is wrong because these blocks should
> be counted only once (for instance, if a 100MB file is present in 3
> snapshots, it still maps to only one block in HDFS). This causes fsck to
> report many more blocks than actually exist in HDFS and are reported in the
> Web UI.
> Here's an example:
> 1) HDFS has two files of 2 blocks each:
> {noformat}
> $ hdfs dfs -ls -R /
> drwxr-xr-x   - root supergroup  0 2017-10-07 21:21 /snap-test
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 /snap-test/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 /snap-test/file2
> drwxr-xr-x   - root supergroup  0 2017-05-13 13:03 /test
> {noformat} 
> 2) There are two snapshots, with the two files present on each of the 
> snapshots:
> {noformat}
> $ hdfs dfs -ls -R /snap-test/.snapshot
> drwxr-xr-x   - root supergroup  0 2017-10-07 21:21 
> /snap-test/.snapshot/snap1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 
> /snap-test/.snapshot/snap1/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 
> /snap-test/.snapshot/snap1/file2
> drwxr-xr-x   - root supergroup  0 2017-10-07 21:21 
> /snap-test/.snapshot/snap2
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 
> /snap-test/.snapshot/snap2/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 
> /snap-test/.snapshot/snap2/file2
> {noformat}
> 3) *fsck -includeSnapshots* reports 12 blocks in total (4 blocks for the 
> normal file path, plus 4 blocks for each snapshot path):
> {noformat}
> $ hdfs fsck / -includeSnapshots
> FSCK started by root (auth:SIMPLE) from /127.0.0.1 for path / at Mon Oct 09 
> 15:15:36 BST 2017
> Status: HEALTHY
>  Number of data-nodes:1
>  Number of racks: 1
>  Total dirs:  6
>  Total symlinks:  0
> Replicated Blocks:
>  Total size:  1258291200 B
>  Total files: 6
>  Total blocks (validated):12 (avg. block size 104857600 B)
>  Minimally replicated blocks: 12 (100.0 %)
>  Over-replicated blocks:  0 (0.0 %)
>  Under-replicated blocks: 0 (0.0 %)
>  Mis-replicated blocks:   0 (0.0 %)
>  Default replication factor:  1
>  Average block replication:   1.0
>  Missing blocks:  0
>  Corrupt blocks:  0
>  Missing replicas:0 (0.0 %)
> {noformat}
> 4) Web UI shows the correct number (4 blocks only):
> {noformat}
> Security is off.
> Safemode is off.
> 5 files and directories, 4 blocks = 9 total filesystem object(s).
> {noformat}
> I would like to work on this and will propose an initial solution 
> shortly.
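The double counting in the example above can be reproduced with a small model. This is a sketch, assuming a hypothetical map from each path (live or snapshot) to the IDs of the blocks it references, with made-up block IDs; it is not the NamenodeFsck implementation. Counting per path gives the 12 blocks fsck reports, while collecting block IDs into a set gives the 4 blocks the Web UI shows.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SnapshotBlockCount {
    // Counts a block once per path that references it (the over-counting behaviour).
    static int countOccurrences(Map<String, List<Long>> pathToBlocks) {
        int total = 0;
        for (List<Long> blocks : pathToBlocks.values()) {
            total += blocks.size();
        }
        return total;
    }

    // Counts each distinct block ID once, however many snapshots contain it.
    static int countUnique(Map<String, List<Long>> pathToBlocks) {
        Set<Long> seen = new HashSet<>();
        for (List<Long> blocks : pathToBlocks.values()) {
            seen.addAll(blocks);
        }
        return seen.size();
    }

    // Hypothetical model of the example: two 2-block files, each visible via
    // the live path and via snap1 and snap2. Block IDs are illustrative only.
    static Map<String, List<Long>> example() {
        List<Long> file1 = Arrays.asList(1001L, 1002L);
        List<Long> file2 = Arrays.asList(1003L, 1004L);
        Map<String, List<Long>> m = new LinkedHashMap<>();
        m.put("/snap-test/file1", file1);
        m.put("/snap-test/file2", file2);
        for (String snap : Arrays.asList("snap1", "snap2")) {
            m.put("/snap-test/.snapshot/" + snap + "/file1", file1);
            m.put("/snap-test/.snapshot/" + snap + "/file2", file2);
        }
        return m;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> m = example();
        System.out.println("per-path count: " + countOccurrences(m));
        System.out.println("unique blocks:  " + countUnique(m));
    }
}
```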



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12756) Ozone: Add datanodeID to heartbeat responses and container protocol

2017-11-01 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-12756:

Attachment: HDFS-12756-HDFS-7240.001.patch

cc: [~xyao], [~nandakumar131], [~elek], [~Weiwei Yang] Please take a look when 
you get a chance. This is the first step towards having a cluster simulator as 
part of MiniOzoneCluster. 

> Ozone: Add datanodeID to heartbeat responses and container protocol
> ---
>
> Key: HDFS-12756
> URL: https://issues.apache.org/jira/browse/HDFS-12756
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-12756-HDFS-7240.001.patch
>
>
> If we have the datanode ID in heartbeat (HB) responses and in commands sent 
> to the datanode, we will be able to do additional sanity checking on the 
> datanode before executing the command. This is also very helpful in creating 
> a MiniOzoneCluster with 1000s of simulated nodes, which is needed for 
> scale-based unit tests of SCM.



[jira] [Created] (HDFS-12756) Ozone: Add datanodeID to heartbeat responses and container protocol

2017-11-01 Thread Anu Engineer (JIRA)
Anu Engineer created HDFS-12756:
---

 Summary: Ozone: Add datanodeID to heartbeat responses and 
container protocol
 Key: HDFS-12756
 URL: https://issues.apache.org/jira/browse/HDFS-12756
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Anu Engineer
Assignee: Anu Engineer


If we have the datanode ID in heartbeat (HB) responses and in commands sent to 
the datanode, we will be able to do additional sanity checking on the datanode 
before executing the command. This is also very helpful in creating a 
MiniOzoneCluster with 1000s of simulated nodes, which is needed for scale-based 
unit tests of SCM.
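A minimal sketch of the sanity check described above, assuming a hypothetical command envelope that carries the target datanode's ID; the class, field, and action names are illustrative and are not the Ozone/SCM protocol classes. Each datanode rejects commands not addressed to it, so even 1000 simulated nodes in one process can detect a misrouted command.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class DatanodeIdCheckSketch {
    // Hypothetical command from SCM, stamped with the target datanode ID.
    static final class Command {
        final String targetDatanodeId;
        final String action;
        Command(String targetDatanodeId, String action) {
            this.targetDatanodeId = targetDatanodeId;
            this.action = action;
        }
    }

    // Hypothetical (possibly simulated) datanode that validates commands first.
    static final class Datanode {
        final String id;
        int executed = 0;
        Datanode(String id) { this.id = id; }

        boolean handle(Command cmd) {
            if (!id.equals(cmd.targetDatanodeId)) {
                return false; // sanity check failed: meant for another node
            }
            executed++;       // a real node would execute cmd.action here
            return true;
        }
    }

    // Builds n simulated datanodes keyed by random UUID-based IDs.
    static Map<String, Datanode> simulate(int n) {
        Map<String, Datanode> nodes = new HashMap<>();
        for (int i = 0; i < n; i++) {
            String id = UUID.randomUUID().toString();
            nodes.put(id, new Datanode(id));
        }
        return nodes;
    }

    public static void main(String[] args) {
        Map<String, Datanode> nodes = simulate(1000);
        int accepted = 0;
        for (Datanode dn : nodes.values()) {
            if (dn.handle(new Command(dn.id, "closeContainer"))) accepted++;
        }
        Datanode any = nodes.values().iterator().next();
        boolean misrouted = any.handle(new Command("some-other-id", "closeContainer"));
        System.out.println("accepted=" + accepted + ", misrouted accepted=" + misrouted);
    }
}
```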



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-01 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234959#comment-16234959
 ] 

Anu Engineer commented on HDFS-7240:


[~ste...@apache.org] Thank you for the comments. 
bq. For now, biggest issue I have is that OzoneException needs to become an IOE
I have filed HDFS-12755 for converting the OzoneException to an IOException. 

bq. What's your scale limit? I see a single PUT for the upload, GET path > tmp 
in open() . Is there a test for different sizes of file?
We have tested file sizes from 1 byte to 2 GB. The Ozone architecture imposes 
no size limit; however, we have always planned to follow the S3 limit of 5 GB. 
We can certainly add tests for different file sizes, but creating these data 
files during unit tests takes time. So far we have strived to keep the Ozone 
unit tests under 4 minutes, and large keys would make unit test times 
prohibitive. So our approach is to use Corona, a load-generation tool for 
Ozone, which we run 4 times daily with different key sizes. It is trivial to 
set up and run.

For the comments on the OzoneFileSystem, I will let the appropriate person 
respond.





> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
>Priority: Major
> Attachments: HDFS-7240.001.patch, HDFS-7240.002.patch, 
> HDFS-7240.003.patch, HDFS-7240.003.patch, HDFS-7240.004.patch, 
> Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



[jira] [Created] (HDFS-12755) Ozone: OzoneException needs to become an IOException

2017-11-01 Thread Anu Engineer (JIRA)
Anu Engineer created HDFS-12755:
---

 Summary: Ozone: OzoneException needs to become an IOException
 Key: HDFS-12755
 URL: https://issues.apache.org/jira/browse/HDFS-12755
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Anu Engineer
Assignee: Anu Engineer
Priority: Critical
 Fix For: HDFS-7240


From review comments by [~ste...@apache.org]:

For now, the biggest issue I have is that OzoneException needs to become an 
IOE, so simplifying exception handling all round, preserving information, not 
losing stack traces, and generally leading to happy support teams as well as 
developers. Changing the base class isn't itself traumatic, but it will 
implicate the client code as there's almost no longer any need to catch & wrap 
things.
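A minimal sketch of what the review asks for, assuming a hypothetical standalone `OzoneException` (not the real Ozone class) that extends `IOException`. The Ozone layer adds context while keeping the original exception as the cause, so the stack trace survives, and a FileSystem-style caller simply declares `IOException` instead of catching and wrapping.

```java
import java.io.IOException;

public class OzoneExceptionSketch {
    // Hypothetical stand-in: an Ozone-specific exception whose base class is IOException.
    static class OzoneException extends IOException {
        private static final long serialVersionUID = 1L;
        OzoneException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Lower layer: fails with a plain I/O error.
    static void writeChunk(String key) throws IOException {
        throw new IOException("disk full while writing " + key);
    }

    // Ozone layer: adds context but preserves the original exception as the cause.
    static void putKey(String key) throws IOException {
        try {
            writeChunk(key);
        } catch (IOException e) {
            throw new OzoneException("putKey failed for " + key, e);
        }
    }

    // FileSystem-style caller: no catch-and-wrap needed.
    static void create(String key) throws IOException {
        putKey(key);
    }

    public static void main(String[] args) {
        try {
            create("volume/bucket/key1");
        } catch (IOException e) {
            System.out.println(e.getClass().getSimpleName() + ": " + e.getMessage());
            System.out.println("caused by: " + e.getCause().getMessage());
        }
    }
}
```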



[jira] [Commented] (HDFS-12682) ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as DISABLED

2017-11-01 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234947#comment-16234947
 ] 

Xiao Chen commented on HDFS-12682:
--

Fixing checkstyle...

> ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as 
> DISABLED
> 
>
> Key: HDFS-12682
> URL: https://issues.apache.org/jira/browse/HDFS-12682
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Blocker
>  Labels: hdfs-ec-3.0-must-do
> Attachments: HDFS-12682.01.patch, HDFS-12682.02.patch, 
> HDFS-12682.03.patch, HDFS-12682.04.patch, HDFS-12682.05.patch, 
> HDFS-12682.06.patch, HDFS-12682.07.patch, HDFS-12682.08.patch
>
>
> On a real cluster, {{hdfs ec -listPolicies}} will always show policy state as 
> DISABLED.
> {noformat}
> [hdfs@nightly6x-1 root]$ hdfs ec -listPolicies
> Erasure Coding Policies:
> ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, 
> Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], 
> CellSize=1048576, Id=3, State=DISABLED]
> ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, 
> numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4, State=DISABLED]
> [hdfs@nightly6x-1 root]$ hdfs ec -getPolicy -path /ecec
> XOR-2-1-1024k
> {noformat}
> This is because when [deserializing 
> protobuf|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java#L2942],
>  the static instance of [SystemErasureCodingPolicies 
> class|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SystemErasureCodingPolicies.java#L101]
>  is first checked, and always returns the cached policy objects, which are 
> created by default with state=DISABLED.
> All the existing unit tests pass because in unit tests the client (e.g. 
> ECAdmin) runs in the same JVM as the NN, so the static instance it reads is 
> the one the NN has updated. :)
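The failure mode described above can be reduced to a few lines. This is a simplified sketch, assuming hypothetical `Policy` and `State` types rather than the real `ErasureCodingPolicy` and `PBHelperClient` code. The buggy conversion returns the client's cached system policy and drops the state sent by the NameNode. One possible fix reuses the cached definition but applies the state from the wire; the attached patches may take a different approach.

```java
import java.util.HashMap;
import java.util.Map;

public class PolicyCacheSketch {
    enum State { DISABLED, ENABLED }

    // Hypothetical immutable policy: id, name and state only.
    static final class Policy {
        final byte id;
        final String name;
        final State state;
        Policy(byte id, String name, State state) {
            this.id = id;
            this.name = name;
            this.state = state;
        }
        Policy withState(State newState) {
            return new Policy(id, name, newState);
        }
    }

    // Client-side static cache of built-in policies, created with the default state.
    static final Map<Byte, Policy> SYSTEM_POLICIES = new HashMap<>();
    static {
        SYSTEM_POLICIES.put((byte) 4, new Policy((byte) 4, "XOR-2-1-1024k", State.DISABLED));
    }

    // Buggy shape: a cache hit wins, so the state from the NameNode is ignored.
    static Policy convertBuggy(byte id, String nameFromWire, State stateFromWire) {
        Policy cached = SYSTEM_POLICIES.get(id);
        return cached != null ? cached : new Policy(id, nameFromWire, stateFromWire);
    }

    // Possible fix: keep the cached definition but apply the state that was sent.
    static Policy convertFixed(byte id, String nameFromWire, State stateFromWire) {
        Policy cached = SYSTEM_POLICIES.get(id);
        return cached != null ? cached.withState(stateFromWire)
                              : new Policy(id, nameFromWire, stateFromWire);
    }

    public static void main(String[] args) {
        System.out.println(convertBuggy((byte) 4, "XOR-2-1-1024k", State.ENABLED).state);
        System.out.println(convertFixed((byte) 4, "XOR-2-1-1024k", State.ENABLED).state);
    }
}
```

Returning a copy in the fixed version also leaves the shared cache untouched, so one response cannot change what other callers see.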



[jira] [Updated] (HDFS-12682) ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as DISABLED

2017-11-01 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-12682:
-
Attachment: HDFS-12682.08.patch

[jira] [Updated] (HDFS-12682) ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as DISABLED

2017-11-01 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-12682:
-
Attachment: (was: HDFS-12682.08.patch)

[jira] [Commented] (HDFS-12682) ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as DISABLED

2017-11-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234926#comment-16234926
 ] 

Hadoop QA commented on HDFS-12682:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m 56s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 36s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 29s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 49s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m  6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 24s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  2m 11s{color} | {color:orange} root: The patch generated 1 new + 647 unchanged - 2 fixed = 648 total (was 649) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 18s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 47s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 28s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}117m 56s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}125m 26s{color} | {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  1m  1s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}353m 55s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
| Timed out junit tests | org.apache.hadoop.mapred.pipes.TestPipeApplication |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-12682 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12895226/HDFS-12682.08.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 2c8d0b595d67 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | 

[jira] [Commented] (HDFS-12720) Ozone: Ratis options are not passed from KSM Client protobuf helper correctly.

2017-11-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234925#comment-16234925
 ] 

Hadoop QA commented on HDFS-12720:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 33s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 56s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 37s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 42s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 39s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 20s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 38s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 46s{color} | {color:green} HDFS-7240 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  6s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 30s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 36s{color} | {color:orange} hadoop-hdfs-project: The patch generated 1 new + 1 unchanged - 1 fixed = 2 total (was 2) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  9m 19s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 42s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 22s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 99m  8s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 23s{color} | {color:red} The patch generated 3 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}165m  7s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestErasureCodingPolicies |
|   | hadoop.hdfs.TestHdfsAdmin |
|   | hadoop.hdfs.TestMaintenanceState |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithRandomECPolicy |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure140 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure180 |
|   | hadoop.hdfs.TestReconstructStripedFile |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure110 |
|   | hadoop.ozone.scm.container.TestContainerMapping |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure010 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure130 |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.server.balancer.TestBalancerRPCDelay |
|   | 

[jira] [Commented] (HDFS-12754) Lease renewal can hit a deadlock

2017-11-01 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234905#comment-16234905
 ] 

Kuhu Shukla commented on HDFS-12754:


We found this deadlock during testing on our end about a year ago. The fix 
(attached patch) has been deployed to our production clusters ever since and 
has accumulated a significant amount of run time.

> Lease renewal can hit a deadlock 
> -
>
> Key: HDFS-12754
> URL: https://issues.apache.org/jira/browse/HDFS-12754
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: HDFS-12754.001.patch
>
>
> The client and the lease renewer can hit a deadlock during a close operation, 
> since closeFile() calls back into DFSClient#removeFileBeingWritten. This can 
> happen if the client calls close while the renewer is renewing a lease.
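The general shape of this kind of deadlock, and one common way to avoid it, can be sketched with two monitors. This is a generic illustration, assuming hypothetical `Client` and `Renewer` classes rather than `DFSClient` and `LeaseRenewer`, and it is not necessarily the approach taken in HDFS-12754.001.patch. The close path takes the client lock and then the renewer lock, so the renewer must never hold its own lock while calling back into a client.

```java
import java.util.ArrayList;
import java.util.List;

public class LeaseRenewerSketch {
    // Hypothetical stand-in for a client that tracks files being written.
    static final class Client {
        private final List<String> filesBeingWritten = new ArrayList<>();
        private final Renewer renewer;
        Client(Renewer renewer) { this.renewer = renewer; }

        synchronized void openFile(String path) { filesBeingWritten.add(path); }

        // Lock order on the close path: client lock, then renewer lock.
        synchronized void closeFile(String path) {
            filesBeingWritten.remove(path);
            renewer.onFileClosed(this);
        }

        synchronized boolean renewLease() { return !filesBeingWritten.isEmpty(); }
    }

    // Hypothetical stand-in for the lease renewer's shared state.
    static final class Renewer {
        private final List<Client> clients = new ArrayList<>();
        private int closeEvents = 0;

        synchronized void register(Client c) { clients.add(c); }
        synchronized void onFileClosed(Client c) { closeEvents++; }

        // A synchronized renewAll() calling c.renewLease() would take renewer then
        // client (the reverse of closeFile) and could deadlock. Instead, copy the
        // client list under the renewer lock and call out without holding it.
        int renewAll() {
            List<Client> snapshot;
            synchronized (this) {
                snapshot = new ArrayList<>(clients);
            }
            int renewed = 0;
            for (Client c : snapshot) {
                if (c.renewLease()) renewed++;
            }
            return renewed;
        }
    }

    // Runs close and renew concurrently; true if both threads finish in time.
    static boolean runConcurrently(int iterations, long timeoutMs) {
        Renewer renewer = new Renewer();
        Client client = new Client(renewer);
        renewer.register(client);
        Thread closer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                client.openFile("/f" + i);
                client.closeFile("/f" + i);
            }
        });
        Thread renewerThread = new Thread(() -> {
            for (int i = 0; i < iterations; i++) renewer.renewAll();
        });
        closer.setDaemon(true);
        renewerThread.setDaemon(true);
        closer.start();
        renewerThread.start();
        try {
            closer.join(timeoutMs);
            renewerThread.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !closer.isAlive() && !renewerThread.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("finished without deadlock: " + runConcurrently(100_000, 10_000));
    }
}
```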



[jira] [Commented] (HDFS-12474) Ozone: SCM: Handling container report with key count and container usage.

2017-11-01 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234876#comment-16234876
 ] 

Xiaoyu Yao commented on HDFS-12474:
---

Thanks [~nandakumar131] for the update. +1 for the v001 patch. I will commit it 
tomorrow if [~linyiqun] and others don't have additional comments.

> Ozone: SCM: Handling container report with key count and container usage.
> -
>
> Key: HDFS-12474
> URL: https://issues.apache.org/jira/browse/HDFS-12474
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7240
>Reporter: Xiaoyu Yao
>Assignee: Nanda kumar
>Priority: Major
>  Labels: ozoneMerge
> Attachments: HDFS-12474-HDFS-7240.000.patch, 
> HDFS-12474-HDFS-7240.001.patch
>
>
> Currently, the container report only contains the number of reports sent to 
> SCM. We will need to provide the key count and the usage of each individual 
> container to update the SCM container state maintained by 
> ContainerStateManager. This has a dependency on HDFS-12387.
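A minimal sketch of the report contents described above, assuming hypothetical `ContainerStat` and `ScmContainerState` types; these are not the Ozone protobuf messages or the real ContainerStateManager. Each report entry carries a container's key count and used bytes, and the SCM side keeps the latest values per container (the sketch assumes a newer report simply replaces an older one).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ContainerReportSketch {
    // Hypothetical per-container entry a datanode could include in its report.
    static final class ContainerStat {
        final String containerName;
        final long keyCount;
        final long usedBytes;
        ContainerStat(String containerName, long keyCount, long usedBytes) {
            this.containerName = containerName;
            this.keyCount = keyCount;
            this.usedBytes = usedBytes;
        }
    }

    // Hypothetical SCM-side view, standing in for state kept by ContainerStateManager.
    static final class ScmContainerState {
        private final Map<String, ContainerStat> latest = new HashMap<>();

        // Assumes the most recent report for a container replaces the previous one.
        void processReport(List<ContainerStat> report) {
            for (ContainerStat stat : report) {
                latest.put(stat.containerName, stat);
            }
        }

        long keyCount(String name) {
            ContainerStat s = latest.get(name);
            return s == null ? 0 : s.keyCount;
        }

        long usedBytes(String name) {
            ContainerStat s = latest.get(name);
            return s == null ? 0 : s.usedBytes;
        }
    }

    public static void main(String[] args) {
        ScmContainerState scm = new ScmContainerState();
        scm.processReport(Arrays.asList(
            new ContainerStat("c1", 10, 4096),
            new ContainerStat("c2", 3, 1024)));
        scm.processReport(Arrays.asList(new ContainerStat("c1", 12, 8192)));
        System.out.println("c1 keys=" + scm.keyCount("c1") + " used=" + scm.usedBytes("c1"));
    }
}
```

A real implementation would also need to reconcile reports from multiple replicas of the same container, which this sketch does not attempt.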



[jira] [Commented] (HDFS-12754) Lease renewal can hit a deadlock

2017-11-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234842#comment-16234842
 ] 

Hadoop QA commented on HDFS-12754:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  9m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 58s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 15s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs-client: The patch generated 1 new + 96 unchanged - 0 fixed = 97 total (was 96) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 14s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 13s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 58m 15s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-12754 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12895265/HDFS-12754.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux cb15286aecc2 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 70f1a94 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/21920/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs-client.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/21920/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-client U: hadoop-hdfs-project/hadoop-hdfs-client |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/21920/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT   http://yetus.apache.org |



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-01 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234770#comment-16234770
 ] 

Steve Loughran commented on HDFS-7240:
--

I'm starting with hadoop-common and hadoop-ozone; more to follow on Thursday.

For now, the biggest issue I have is that OzoneException needs to become an IOE, 
simplifying exception handling all round, preserving information, not losing 
stack traces, and generally leading to happy support teams as well as 
developers. Changing the base class isn't itself traumatic, but it will 
implicate the client code, as there's almost no longer any need to catch & wrap 
things.


Other: what's your scale limit? I see a single PUT for the upload, and a GET of 
the path to a tmp file in open(). Is there a test for different file sizes?

h2. hadoop-common

h3. Config


I've filed some comments on the newly created HADOOP-15007, "Stabilize and 
document Configuration  element", to make sure the tests & docs are in place 
for this to go in.

* HDFSPropertyTag: s/DEPRICATED/DEPRECATED/
* OzonePropertyTag: s/there/their/ 
* OzoneConfig Property.toString() is going to be "key valuenull" if there is no 
tag defined. Space?
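For the Property.toString() point, here is a minimal sketch of a separator-aware toString(). The Property class and its key/value/tag fields are hypothetical stand-ins, not the actual OzoneConfig code. Leaving the tag out when it is null is one possible choice, not necessarily what the patch should do.

```java
import java.util.Objects;

// Hypothetical stand-in for the OzoneConfig Property; field names are assumptions.
final class Property {
  private final String key;
  private final String value;
  private final String tag; // null when no tag is defined

  Property(String key, String value, String tag) {
    this.key = Objects.requireNonNull(key, "key");
    this.value = value;
    this.tag = tag;
  }

  @Override
  public String toString() {
    // Space-separate the fields, and leave the tag out entirely when it is
    // absent, rather than producing "key valuenull".
    StringBuilder sb = new StringBuilder(key).append(' ').append(value);
    if (tag != null) {
      sb.append(' ').append(tag);
    }
    return sb.toString();
  }
}
```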



h3. FileUtils

Minor: the imports are all shuffled about compared to trunk & branch-2; revert.

h3. OzoneException

This is its own exception, not an IOE, and at least in OzoneFileSystem the 
process to build an IOE from it invariably loses the inner stack trace and all 
meaningful information about the exception type. Equally, OzoneBucket catches 
all forms of IOException and converts them to an {{OzoneRestClientException}}. 

We don't need to do this. 

It loses stack trace data, causes confusion, and is already making the client 
code over-complex: catching IOEs, wrapping them in OzoneException, catching 
OzoneException and converting it back to an IOE, at which point all core 
information is lost. 

1. Make this a subclass of IOE, consistent with the rest of our code; clients 
can then throw it up untouched, except in the special case that they need to do 
something specific with the exception.
1. Except for (any?) special cases, pass up IOEs raised in the http client as 
is.


Also:
* I'm confused by the overriding of message/getMessage(). Is it for serialization? 
* Consider adding a setMessage(String format, Object... args) that calls 
String.format(): it would tie in with uses in the code.
* Override setThrowable() so that it sets the nested exception (hence the full 
stack) and calls setMessage(), handling the case where the exception returns 
null for getMessage().

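{code}
@Override
public OzoneException initCause(Throwable t) {
  super.initCause(t);  // keep the nested exception, and with it the full stack
  if (t != null) {
    // fall back to toString() when the cause has no message of its own
    setMessage(t.getMessage() != null ? t.getMessage() : t.toString());
  }
  return this;
}
{code}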
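Here is a rough sketch of what OzoneException could look like once it extends IOException, including the suggested setMessage(String format, Object... args). Everything in it is hypothetical and only illustrates the shape: the constructor, the client method, and the use of a FileNotFoundException as the cause. The real class has other fields.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Sketch only: OzoneException rebased onto IOException (constructor assumed).
class OzoneException extends IOException {
  private static final long serialVersionUID = 1L;
  private String message;

  OzoneException(String message, Throwable cause) {
    super(message, cause); // the cause, and so the full nested stack, is kept
    this.message = message;
  }

  // The suggested helper: format the message in place.
  OzoneException setMessage(String format, Object... args) {
    this.message = String.format(format, args);
    return this;
  }

  @Override
  public String getMessage() {
    return message != null ? message : super.getMessage();
  }
}

// Hypothetical client code: no catch-and-wrap needed, the IOE just propagates.
final class OzoneClientSketch {
  static void readKey(String key) throws IOException {
    throw new OzoneException("read failed", new FileNotFoundException(key))
        .setMessage("Cannot read key %s", key);
  }
}
```

Callers that already handle IOException get the Ozone failure, its message and its cause chain without any translation layer.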

h2. OzoneFileSystem

h3. general


* Various places use LOG.info("text " + something); they should all move to 
LOG.info("text {}", something).
* Once OzoneException -> IOE, you can cut the catch and translate here.
* Qualify paths before all uses. That's needed to stop them being relative, and 
to catch things like someone calling ozfs.rename("o3://bucket/src", 
"s3a://bucket/dest"), delete("s3a://bucket/path"), etc., as well as problems 
with validation happening before paths are made absolute.
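To illustrate "qualify before use", here is a minimal sketch built on plain java.net.URI rather than Hadoop's Path and FileSystem.makeQualified(). PathQualifier and its behaviour are assumptions made for illustration. It resolves relative paths against a working directory, collapses "..", and rejects a URI that belongs to a different filesystem, which is the check that catches rename(o3, s3a) style mix-ups.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Objects;

// Hypothetical helper; real code would use Path and FileSystem.makeQualified().
final class PathQualifier {
  private final URI fsUri;         // e.g. o3://bucket
  private final String workingDir; // absolute, stored with a trailing slash

  PathQualifier(URI fsUri, String workingDir) {
    this.fsUri = fsUri;
    this.workingDir = workingDir.endsWith("/") ? workingDir : workingDir + "/";
  }

  URI qualify(String path) throws URISyntaxException {
    URI p = new URI(path);
    // Reject a fully qualified path that belongs to another filesystem.
    if (p.getScheme() != null
        && (!p.getScheme().equalsIgnoreCase(fsUri.getScheme())
            || !Objects.equals(p.getAuthority(), fsUri.getAuthority()))) {
      throw new IllegalArgumentException("Wrong FS: " + path + ", expected: " + fsUri);
    }
    String raw = p.getPath().isEmpty() ? "/" : p.getPath();
    // Make relative paths absolute against the working directory.
    String abs = raw.startsWith("/") ? raw : workingDir + raw;
    // Collapse "." and ".." so later validation sees the real target.
    URI normalized = new URI(null, null, abs, null).normalize();
    return fsUri.resolve(normalized);
  }
}
```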



* {{RenameIterator.iterate()}} is going to log @ warn whenever it can't 
delete a temp file because the file doesn't exist, which may be a distraction 
when diagnosing failures. Better: {{if(!tmpFile.delete() && tmpFile.exists())}}, 
as that will only warn if the temp file is actually there. 
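The suggested check, as a small self-contained sketch. java.util.logging stands in for whatever logger RenameIterator actually uses, and TempFiles.deleteQuietly() is a hypothetical helper.

```java
import java.io.File;
import java.util.logging.Logger;

final class TempFiles {
  private static final Logger LOG = Logger.getLogger(TempFiles.class.getName());

  // Returns true only if a warning was logged (convenient for testing).
  static boolean deleteQuietly(File tmpFile) {
    // delete() returns false both when deletion failed and when the file never
    // existed; the exists() check filters out the harmless second case.
    if (!tmpFile.delete() && tmpFile.exists()) {
      LOG.warning("Failed to delete temp file " + tmpFile);
      return true;
    }
    return false;
  }
}
```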

h3. OzoneFileSystem.rename()
Rename() is the operation to fear on an object store. I haven't looked at it in 
full detail.
* Qualify all the paths before doing directory validation. Otherwise you can 
defeat the "don't rename into self" checks, e.g. rename("/path/src", 
"/path/../path/src/dest").
* Log @ debug all the paths taken before returning, so you can debug if needed. 
* S3A rename ended up with a special RenameFailedException, which innerRename() 
raises with text and a return code. The outer rename() logs the text and 
returns the return code. This means that every failing path has an exception 
clearly thrown, and when we eventually make rename/3 public, it's lined up to 
throw exceptions back to the caller. Consider copying this code.
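Here is a hedged sketch of the RenameFailedException pattern, combined with normalizing paths before the rename-into-self check. The class names and the boolean exit code follow the description above but are hypothetical; this is not the S3A source. It also assumes the paths have already been made absolute.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical exception modelled on the description of the S3A approach.
class RenameFailedException extends IOException {
  private static final long serialVersionUID = 1L;
  private final boolean exitCode;

  RenameFailedException(String src, String dst, String reason, boolean exitCode) {
    super("rename(" + src + ", " + dst + "): " + reason);
    this.exitCode = exitCode;
  }

  boolean getExitCode() {
    return exitCode;
  }
}

final class RenameSketch {
  // Outer rename(): keeps the boolean contract and logs why it returned false.
  static boolean rename(String src, String dst) throws IOException {
    try {
      innerRename(src, dst);
      return true;
    } catch (RenameFailedException e) {
      System.err.println(e.getMessage());
      return e.getExitCode();
    }
  }

  // Every failing path throws, ready for a future public rename/3.
  static void innerRename(String src, String dst) throws IOException {
    // Normalize first: "/path/../path/src/dest" must be seen as "/path/src/dest".
    String s = normalize(src);
    String d = normalize(dst);
    if (d.startsWith(s.endsWith("/") ? s : s + "/")) {
      throw new RenameFailedException(src, dst, "cannot rename a directory into itself", false);
    }
    // ... the actual copy + delete of keys would go here ...
  }

  private static String normalize(String path) throws IOException {
    try {
      return new URI(null, null, path, null).normalize().getPath();
    } catch (URISyntaxException e) {
      throw new IOException("Invalid path: " + path, e);
    }
  }
}
```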

h3. OzoneFileSystem.delete

* Qualify the path before use.
* Don't log at error if you can't delete a nonexistent path; delete() is used 
everywhere for silent cleanup. Cut it.

h3. OzoneFileSystem.ListStatusIterator

* Make the status field final.

h3. OzoneFileSystem.mkdir

I liked your algorithm here; it took me a moment to understand how rollback 
didn't need to track all created directories. Nice.
* Do qualify the path first.

h3. OzoneFileSystem.getFileStatus

{{getKeyInfo()}} catches all exceptions and maps them to null, which is 
interpreted as not found and eventually surfaces as an FNFE. This is misleading 
if the failure is for any other reason.

Once OzoneException is an IOException, {{getKeyInfo()}} should only catch & 
downgrade the explicit not-found case (404?).
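Here is a sketch of narrowing the catch. It assumes a hypothetical HttpStatusException that carries the REST status code, and a hypothetical KeyFetcher for the client call. Only a 404 maps to null; every other IOException reaches the caller intact instead of being reported as a missing file.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

final class KeyInfoSketch {
  // Hypothetical stand-in for the REST client call that fetches key metadata.
  interface KeyFetcher {
    String fetch(String key) throws IOException;
  }

  // Hypothetical exception carrying the HTTP status from the Ozone REST endpoint.
  static class HttpStatusException extends IOException {
    private static final long serialVersionUID = 1L;
    final int status;

    HttpStatusException(int status, String msg) {
      super(msg);
      this.status = status;
    }
  }

  // Returns null only for "not found"; every other failure propagates unchanged.
  static String getKeyInfo(KeyFetcher fetcher, String key) throws IOException {
    try {
      return fetcher.fetch(key);
    } catch (HttpStatusException e) {
      if (e.status == 404) {
        return null;
      }
      throw e;
    }
  }

  static String getFileStatus(KeyFetcher fetcher, String key) throws IOException {
    String info = getKeyInfo(fetcher, key);
    if (info == null) {
      throw new FileNotFoundException("No such file or directory: " + key);
    }
    return info;
  }
}
```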

[jira] [Commented] (HDFS-12737) Thousands of sockets lingering in TIME_WAIT state due to frequent file open operations

2017-11-01 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234752#comment-16234752
 ] 

Yongjun Zhang commented on HDFS-12737:
--

Many thanks [~jnp], that makes sense!

If we could make the BlockTokenSelector also check the block ID when it finds a 
block token, it would help, but that doesn't look easy to do at all.






> Thousands of sockets lingering in TIME_WAIT state due to frequent file open 
> operations
> --
>
> Key: HDFS-12737
> URL: https://issues.apache.org/jira/browse/HDFS-12737
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ipc
> Environment: CDH5.10.2, HBase Multi-WAL=2, 250 replication peers
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
>
> On an HBase cluster, we found HBase RegionServers with thousands of sockets in 
> TIME_WAIT state. This depleted system resources and caused other services to 
> fail.
> After months of troubleshooting, we found the cause: the cluster has 
> hundreds of replication peers and multi-WAL = 2. That creates hundreds 
> of replication threads in the HBase RS, and each thread opens a WAL file *every 
> second*.
> We found that the IPC client closes the socket right away and does not reuse 
> the socket connection. Since each closed socket stays in TIME_WAIT state for 60 
> seconds in Linux by default, that generates thousands of TIME_WAIT sockets.
> {code:title=ClientDatanodeProtocolTranslatorPB:createClientDatanodeProtocolProxy}
> // Since we're creating a new UserGroupInformation here, we know that no
> // future RPC proxies will be able to re-use the same connection. And
> // usages of this proxy tend to be one-off calls.
> //
> // This is a temporary fix: callers should really achieve this by using
> // RPC.stopProxy() on the resulting object, but this is currently not
> // working in trunk. See the discussion on HDFS-1965.
> Configuration confWithNoIpcIdle = new Configuration(conf);
> confWithNoIpcIdle.setInt(CommonConfigurationKeysPublic
> .IPC_CLIENT_CONNECTION_MAXIDLETIME_KEY, 0);
> {code}
> This piece of code is used in DistributedFileSystem#open()
> {noformat}
> 2017-10-27 14:01:44,152 DEBUG org.apache.hadoop.ipc.Client: New connection 
> Thread[IPC Client (1838187805) connection to /172.131.21.48:20001 from 
> blk_1013754707_14032,5,main] for remoteId /172.131.21.48:20001
> java.lang.Throwable: For logging stack trace, not a real exception
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1556)
> at org.apache.hadoop.ipc.Client.call(Client.java:1482)
> at org.apache.hadoop.ipc.Client.call(Client.java:1443)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy28.getReplicaVisibleLength(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientDatanodeProtocolTranslatorPB.getReplicaVisibleLength(ClientDatanodeProtocolTranslatorPB.java:198)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:365)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:335)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:271)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:263)
> at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1585)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:326)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:322)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:322)
> at 
> org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:162)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:783)
> at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:293)
> at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:267)
> at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:255)
> at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:414)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationWALReaderManager.openReader(ReplicationWALReaderManager.java:70)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.openReader(ReplicationSource.java:747)
> at 
> 
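Here is a back-of-envelope check of why this produces thousands of sockets, using Little's law: sockets in TIME_WAIT ≈ rate of closes × time spent in TIME_WAIT. The inputs are assumptions:
* one replication thread per peer per WAL group (250 peers × 2 = 500 threads);
* each open costs exactly one short-lived DataNode connection (the getReplicaVisibleLength() call in the trace above);
* the 60-second Linux default for TIME_WAIT.

```java
// Little's law: average number in TIME_WAIT = close rate x time spent there.
final class TimeWaitEstimate {
  static long steadyState(int threads, double closesPerThreadPerSecond, int timeWaitSeconds) {
    double closesPerSecond = threads * closesPerThreadPerSecond;
    return Math.round(closesPerSecond * timeWaitSeconds);
  }
}
```

With those inputs, steadyState(500, 1.0, 60) gives 30,000 sockets. The real count depends on how many opens actually need the DataNode round trip.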

[jira] [Commented] (HDFS-12744) More logs when short-circuit read is failed and disabled

2017-11-01 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234697#comment-16234697
 ] 

Subru Krishnan commented on HDFS-12744:
---

[~cheersyang]/[~jzhuge], you should cherry-pick this to branch-2.9 if you want 
to include it in the 2.9.0 release. Thanks.

> More logs when short-circuit read is failed and disabled
> 
>
> Key: HDFS-12744
> URL: https://issues.apache.org/jira/browse/HDFS-12744
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
>  Labels: supportability
> Fix For: 2.9.0, 3.0.0
>
> Attachments: HDFS-12744.001.patch, HDFS-12744.002.patch
>
>
> Short-circuit read (SCR) failed with the following error:
> {noformat}
> 2017-10-21 16:42:28,024 WARN  
> [B.defaultRpcServer.handler=7,queue=7,port=16020] 
> impl.BlockReaderFactory: BlockReaderFactory(xxx): unknown response code ERROR
> while attempting to set up short-circuit access. Block xxx is not valid
> {noformat}
> then short-circuit read is disabled for *10 minutes* without any warning 
> message in the log. This cost us extra time figuring out why SCR was not 
> working for such a long time window. We propose adding a warning log (as 
> other places already do) to indicate that SCR is disabled, and more logging 
> in the DN to show what happened.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes

2017-11-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234685#comment-16234685
 ] 

Hadoop QA commented on HDFS-12725:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 17s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 35s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 4 unchanged - 1 fixed = 5 total (was 5) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 53s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 3s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 19s{color} | {color:red} The patch generated 96 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}124m 0s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestSafeModeWithStripedFile |
|   | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
|   | hadoop.hdfs.web.TestWebHdfsFileSystemContract |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure190 |
|   | hadoop.hdfs.TestFileLengthOnClusterRestart |
|   | hadoop.hdfs.TestWriteReadStripedFile |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150 |
|   | hadoop.hdfs.TestReconstructStripedFile |
| Timed out junit tests | org.apache.hadoop.hdfs.TestReadStripedFileWithDecodingCorruptData |
|   | org.apache.hadoop.hdfs.TestReadStripedFileWithDecodingDeletedData |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-12725 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12895241/HDFS-12725.05.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 2598baa15338 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 56b88b0 |
| maven 

[jira] [Commented] (HDFS-12720) Ozone: Ratis options are not passed from KSM Client protobuf helper correctly.

2017-11-01 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234673#comment-16234673
 ] 

Tsz Wo Nicholas Sze commented on HDFS-12720:


+1 for the v5 patch.

> Ozone: Ratis options are not passed from KSM Client protobuf helper correctly.
> --
>
> Key: HDFS-12720
> URL: https://issues.apache.org/jira/browse/HDFS-12720
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>  Labels: ozoneMerge
> Fix For: HDFS-7240
>
> Attachments: HDFS-12720-HDFS-7240.001.patch, 
> HDFS-12720-HDFS-7240.002.patch, HDFS-12720-HDFS-7240.003.patch, 
> HDFS-12720-HDFS-7240.004.patch, HDFS-12720-HDFS-7240.005.patch
>
>
> {{KeySpaceManagerProtocolClientSideTranslatorPB#allocateBlock}} and 
> {{KeySpaceManagerProtocolClientSideTranslatorPB#openKey}} do not pass the 
> Ratis replication factor and replication type to the KSM server. This causes 
> allocations using the Ratis model to fall back to standalone mode even when 
> Ratis mode is specified.






[jira] [Commented] (HDFS-12754) Lease renewal can hit a deadlock

2017-11-01 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234646#comment-16234646
 ] 

Kuhu Shukla commented on HDFS-12754:


CC: [~kihwal].

> Lease renewal can hit a deadlock 
> -
>
> Key: HDFS-12754
> URL: https://issues.apache.org/jira/browse/HDFS-12754
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: HDFS-12754.001.patch
>
>
> The client and the renewer can hit a deadlock during a close operation, since 
> closeFile() reaches back to DFSClient#removeFileBeingWritten. This is 
> possible if the client calls close while the renewer is renewing a lease.






[jira] [Updated] (HDFS-12754) Lease renewal can hit a deadlock

2017-11-01 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated HDFS-12754:
---
Attachment: HDFS-12754.001.patch

This patch calls removal only when necessary in endFileLease().  







[jira] [Updated] (HDFS-12754) Lease renewal can hit a deadlock

2017-11-01 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated HDFS-12754:
---
Status: Patch Available  (was: Open)







[jira] [Created] (HDFS-12754) Lease renewal can hit a deadlock

2017-11-01 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created HDFS-12754:
--

 Summary: Lease renewal can hit a deadlock 
 Key: HDFS-12754
 URL: https://issues.apache.org/jira/browse/HDFS-12754
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.8.1
Reporter: Kuhu Shukla
Assignee: Kuhu Shukla
Priority: Major


The client and the renewer can hit a deadlock during a close operation, since 
closeFile() reaches back to DFSClient#removeFileBeingWritten. This is 
possible if the client calls close while the renewer is renewing a lease.






[jira] [Commented] (HDFS-12753) Getting file not found exception while using distcp with s3a

2017-11-01 Thread Logesh Rangan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234627#comment-16234627
 ] 

Logesh Rangan commented on HDFS-12753:
--

But our production environment doesn't offer a DynamoDB instance for S3Guard. 
Is there a way to tune the distcp options for copying huge files? I'm looking 
for the following information:

1) How to select the number of maps and their size. I have a directory which 
has ~1+ files with a total size of ~250 GB. When I run with the options below, 
it takes ~1.30 hours.

{noformat}
hadoop distcp -D HADOOP_OPTS=-Xmx12g \
  -D HADOOP_CLIENT_OPTS='-Xmx12g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled' \
  -D 'mapreduce.map.memory.mb=12288' -D 'mapreduce.map.java.opts=-Xmx10g' \
  -D 'mapreduce.reduce.memory.mb=12288' -D 'mapreduce.reduce.java.opts=-Xmx10g' \
  '-Dfs.s3a.proxy.host=edhmgrn-prod.cloud.capitalone.com' '-Dfs.s3a.proxy.port=8088' \
  '-Dfs.s3a.access.key=XXX' '-Dfs.s3a.secret.key=XXX' \
  '-Dfs.s3a.connection.timeout=18' '-Dfs.s3a.attempts.maximum=5' \
  '-Dfs.s3a.fast.upload=true' '-Dfs.s3a.fast.upload.buffer=array' '-Dfs.s3a.fast.upload.active.blocks=50' \
  '-Dfs.s3a.multipart.size=262144000' '-Dfs.s3a.threads.max=500' '-Dfs.s3a.threads.keepalivetime=600' \
  '-Dfs.s3a.server-side-encryption-algorithm=AES256' \
  -bandwidth 3072 -strategy dynamic -m 200 -numListstatusThreads 30 \
  /src/ s3a://bucket/dest
{noformat}

2) I'm not seeing a throughput of 3 Gbps even after setting -bandwidth to 3072. 

3) How to configure the Java heap and map size for huge files, so that distcp 
gives better performance.

4) With the fast upload option, I'm writing the files to S3 using threads. Could 
you please suggest some tuning options for this?
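For questions 3 and 4, here is a back-of-envelope estimate of the worst-case on-heap buffering for a single S3A output stream with the settings above. It rests on two assumptions:
* with fs.s3a.fast.upload.buffer=array, each block is held in a heap byte array of up to fs.s3a.multipart.size bytes;
* fs.s3a.fast.upload.active.blocks caps how many blocks one stream may have uploading or queued at once.

Note that 262144000 bytes is exactly 250 MiB.

```java
// Worst-case on-heap buffering for one S3A output stream (assumptions in the lead-in).
final class S3aBufferEstimate {
  static long worstCaseBytes(int activeBlocks, long multipartSizeBytes) {
    return activeBlocks * multipartSizeBytes; // int widened to long before multiplying
  }

  static double toGiB(long bytes) {
    return bytes / (double) (1L << 30);
  }
}
```

With active.blocks=50, that is 50 × 250 MiB = 12,500 MiB, roughly 12.2 GiB. This exceeds the 10 GiB heap that -Xmx10g gives each map, if the assumptions above hold.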

I appreciate your help.

> Getting file not found exception while using distcp with s3a
> 
>
> Key: HDFS-12753
> URL: https://issues.apache.org/jira/browse/HDFS-12753
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Logesh Rangan
>
> I'm using the distcp option to copy the huge files from Hadoop to S3. 
> Sometimes I'm getting the error below:
> *Command:* (Copying 378 GB data)
> _hadoop distcp -D HADOOP_OPTS=-Xmx12g -D HADOOP_CLIENT_OPTS='-Xmx12g 
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled' -D 
> 'mapreduce.map.memory.mb=12288' -D 'mapreduce.map.java.opts=-Xmx10g' -D 
> 'mapreduce.reduce.memory.mb=12288' -D 'mapreduce.reduce.java.opts=-Xmx10g' 
> '-Dfs.s3a.proxy.host=edhmgrn-prod.cloud.capitalone.com' 
> '-Dfs.s3a.proxy.port=8088' '-Dfs.s3a.access.key=XXX' 
> '-Dfs.s3a.secret.key=XXX' '-Dfs.s3a.connection.timeout=18' 
> '-Dfs.s3a.attempts.maximum=5' '-Dfs.s3a.fast.upload=true' 
> '-Dfs.s3a.fast.upload.buffer=array' '-Dfs.s3a.fast.upload.active.blocks=50' 
> '-Dfs.s3a.multipart.size=262144000' '-Dfs.s3a.threads.max=500' 
> '-Dfs.s3a.threads.keepalivetime=600' 
> '-Dfs.s3a.server-side-encryption-algorithm=AES256' -bandwidth 3072 -strategy 
> dynamic -m 220 -numListstatusThreads 30 /src/ s3a://bucket/dest
> _
> 17/11/01 12:23:27 INFO mapreduce.Job: Task Id : 
> attempt_1497120915913_2792335_m_000165_0, Status : FAILED
> Error: java.io.FileNotFoundException: No such file or directory: 
> s3a://bucketname/filename
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1132)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:78)
> at 
> org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:197)
> at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:256)
> at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1912)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> 17/11/01 12:28:32 INFO mapreduce.Job: Task Id : 
> attempt_1497120915913_2792335_m_10_0, Status : FAILED
> Error: java.io.IOException: File copy failed: hdfs://nameservice1/filena --> 
> s3a://cof-prod-lake-card/src/seam/acct_scores/acctmdlscore_card_cobna_anon_vldtd/instnc_id=2016102300/04_0_copy_6
> at 
> org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:284)
> at 

[jira] [Commented] (HDFS-12753) Getting file not found exception while using distcp with s3a

2017-11-01 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234533#comment-16234533
 ] 

Wei-Chiu Chuang commented on HDFS-12753:


It looks like you're being hit by S3's eventual consistency.

Check out S3Guard, which should help with your problem:
https://blog.cloudera.com/blog/2017/08/introducing-s3guard-s3-consistency-for-apache-hadoop/
https://hortonworks.com/blog/s3guard-amazon-s3-consistency/

> Getting file not found exception while using distcp with s3a
> 
>
> Key: HDFS-12753
> URL: https://issues.apache.org/jira/browse/HDFS-12753
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Logesh Rangan
>
> I'm using the distcp option to copy the huge files from Hadoop to S3. 
> Sometimes I'm getting the error below:
> *Command:* (Copying 378 GB data)
> _hadoop distcp -D HADOOP_OPTS=-Xmx12g -D HADOOP_CLIENT_OPTS='-Xmx12g 
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled' -D 
> 'mapreduce.map.memory.mb=12288' -D 'mapreduce.map.java.opts=-Xmx10g' -D 
> 'mapreduce.reduce.memory.mb=12288' -D 'mapreduce.reduce.java.opts=-Xmx10g' 
> '-Dfs.s3a.proxy.host=edhmgrn-prod.cloud.capitalone.com' 
> '-Dfs.s3a.proxy.port=8088' '-Dfs.s3a.access.key=XXX' 
> '-Dfs.s3a.secret.key=XXX' '-Dfs.s3a.connection.timeout=18' 
> '-Dfs.s3a.attempts.maximum=5' '-Dfs.s3a.fast.upload=true' 
> '-Dfs.s3a.fast.upload.buffer=array' '-Dfs.s3a.fast.upload.active.blocks=50' 
> '-Dfs.s3a.multipart.size=262144000' '-Dfs.s3a.threads.max=500' 
> '-Dfs.s3a.threads.keepalivetime=600' 
> '-Dfs.s3a.server-side-encryption-algorithm=AES256' -bandwidth 3072 -strategy 
> dynamic -m 220 -numListstatusThreads 30 /src/ s3a://bucket/dest
> _
> 17/11/01 12:23:27 INFO mapreduce.Job: Task Id : 
> attempt_1497120915913_2792335_m_000165_0, Status : FAILED
> Error: java.io.FileNotFoundException: No such file or directory: 
> s3a://bucketname/filename
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1132)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:78)
> at 
> org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:197)
> at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:256)
> at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1912)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> 17/11/01 12:28:32 INFO mapreduce.Job: Task Id : 
> attempt_1497120915913_2792335_m_10_0, Status : FAILED
> Error: java.io.IOException: File copy failed: hdfs://nameservice1/filena --> 
> s3a://cof-prod-lake-card/src/seam/acct_scores/acctmdlscore_card_cobna_anon_vldtd/instnc_id=2016102300/04_0_copy_6
> at 
> org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:284)
> at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:252)
> at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1912)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.io.IOException: Couldn't run retriable-command: Copying 
> hdfs://nameservice1/filename to s3a://bucketname/filename
> at 
> org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
> at 
> org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:280)
> ... 10 more
> Caused by: com.cloudera.com.amazonaws.AmazonClientException: Failed to parse 
> XML document with handler class 
> com.cloudera.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
> at 
> 

[jira] [Updated] (HDFS-12720) Ozone: Ratis options are not passed from KSM Client protobuf helper correctly.

2017-11-01 Thread Mukul Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDFS-12720:
-
Attachment: HDFS-12720-HDFS-7240.005.patch

Patch v5 fixes the unit test failures and checkstyle issues.

> Ozone: Ratis options are not passed from KSM Client protobuf helper correctly.
> --
>
> Key: HDFS-12720
> URL: https://issues.apache.org/jira/browse/HDFS-12720
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>  Labels: ozoneMerge
> Fix For: HDFS-7240
>
> Attachments: HDFS-12720-HDFS-7240.001.patch, 
> HDFS-12720-HDFS-7240.002.patch, HDFS-12720-HDFS-7240.003.patch, 
> HDFS-12720-HDFS-7240.004.patch, HDFS-12720-HDFS-7240.005.patch
>
>
> {{KeySpaceManagerProtocolClientSideTranslatorPB#allocateBlock}} and 
> {{KeySpaceManagerProtocolClientSideTranslatorPB#openKey}} do not pass the 
> Ratis replication factor and replication type to the KSM server. This causes 
> allocations using the Ratis model to fall back to standalone mode even when 
> Ratis mode is specified.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12720) Ozone: Ratis options are not passed from KSM Client protobuf helper correctly.

2017-11-01 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-12720:
---
Hadoop Flags: Reviewed

+1, the new patch looks good. Thanks.

> Ozone: Ratis options are not passed from KSM Client protobuf helper correctly.
> --
>
> Key: HDFS-12720
> URL: https://issues.apache.org/jira/browse/HDFS-12720
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>  Labels: ozoneMerge
> Fix For: HDFS-7240
>
> Attachments: HDFS-12720-HDFS-7240.001.patch, 
> HDFS-12720-HDFS-7240.002.patch, HDFS-12720-HDFS-7240.003.patch, 
> HDFS-12720-HDFS-7240.004.patch
>
>
> {{KeySpaceManagerProtocolClientSideTranslatorPB#allocateBlock}} and 
> {{KeySpaceManagerProtocolClientSideTranslatorPB#openKey}} do not pass the 
> Ratis replication factor and replication type to the KSM server. This causes 
> allocations using the Ratis model to fall back to standalone mode even when 
> Ratis mode is specified.






[jira] [Updated] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes

2017-11-01 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-12725:
-
Attachment: HDFS-12725.05.patch

I was thinking about this patch and IMO we should still WARN in NN logs even if 
the block is placed, so the situation doesn't go unnoticed.

It will now emit a message like:
{noformat}
2017-11-01 10:49:27,081 [IPC Server handler 8 on 55407] WARN  
blockmanagement.BlockPlacementPolicy 
(BlockPlacementPolicyRackFaultTolerant.java:chooseTargetInOrder(142)) - Only 
able to place 7 of total expected 9 (maxNodesPerRack=2, numOfReplicas=4) nodes 
evenly across racks, falling back to uneven placement.
{noformat}
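A minimal, hypothetical sketch of the two-pass idea behind that message: first place at most `maxNodesPerRack` targets per rack so they spread evenly, and if that cannot reach the requested count, warn and fill the remaining slots from any unused node (best effort). The rack map, node names, and `chooseTargets` method are illustrative assumptions, not the actual `BlockPlacementPolicyRackFaultTolerant` code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RackPlacementSketch {

  /** Pass 1 caps each rack at maxNodesPerRack; pass 2 ignores the cap. */
  static List<String> chooseTargets(Map<String, List<String>> racks,
                                    int total, int maxNodesPerRack) {
    List<String> chosen = new ArrayList<>();
    Map<String, Integer> perRack = new LinkedHashMap<>();
    // Pass 1: even placement, round-robin across racks, capped per rack.
    boolean progress = true;
    while (chosen.size() < total && progress) {
      progress = false;
      for (Map.Entry<String, List<String>> e : racks.entrySet()) {
        if (chosen.size() >= total) break;
        int used = perRack.getOrDefault(e.getKey(), 0);
        if (used < maxNodesPerRack && used < e.getValue().size()) {
          chosen.add(e.getValue().get(used));
          perRack.put(e.getKey(), used + 1);
          progress = true;
        }
      }
    }
    if (chosen.size() < total) {
      System.err.printf("WARN Only able to place %d of total expected %d "
          + "(maxNodesPerRack=%d) nodes evenly across racks, "
          + "falling back to uneven placement.%n",
          chosen.size(), total, maxNodesPerRack);
      // Pass 2: best effort, take any unused node regardless of its rack.
      for (List<String> nodes : racks.values()) {
        for (String node : nodes) {
          if (chosen.size() >= total) break;
          if (!chosen.contains(node)) chosen.add(node);
        }
      }
    }
    return chosen;
  }

  public static void main(String[] args) {
    Map<String, List<String>> racks = new LinkedHashMap<>();
    racks.put("/r1", Arrays.asList("n1", "n2", "n3", "n4", "n5"));
    racks.put("/r2", Arrays.asList("n6"));
    racks.put("/r3", Arrays.asList("n7"));
    // 5 targets with a cap of 2 per rack: only 4 fit evenly, so pass 2 runs.
    System.out.println(chooseTargets(racks, 5, 2)); // [n1, n6, n7, n2, n3]
  }
}
```

The warning is emitted only when the even pass falls short, which matches the intent of not letting the uneven fallback go unnoticed.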

> BlockPlacementPolicyRackFaultTolerant still fails with racks with very few 
> nodes
> 
>
> Key: HDFS-12725
> URL: https://issues.apache.org/jira/browse/HDFS-12725
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
>  Labels: hdfs-ec-3.0-must-do
> Attachments: HDFS-12725.01.patch, HDFS-12725.02.patch, 
> HDFS-12725.03.patch, HDFS-12725.04.patch, HDFS-12725.05.patch
>
>
> HDFS-12567 tries to fix the scenario where EC blocks may not be allocated in 
> an extremely rack-imbalanced cluster.
> The added fall-back step of the fix could be improved to do a best-effort 
> placement. This is more likely to happen in testing than in real clusters.






[jira] [Commented] (HDFS-12564) Add the documents of swebhdfs configurations on the client side

2017-11-01 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234398#comment-16234398
 ] 

Xiaoyu Yao commented on HDFS-12564:
---

Thanks [~tasanuma0829]. The patch looks good to me overall, here are a few 
comments:

Distcp.md.vm

Line 423: suggest adding a separate section and putting the content (links) under it.

"#H3 Secure Copy over the wire with distcp"


ServerSetup.md.vm

This page is for HttpFS. To avoid confusion, I would suggest we add a detailed 
ssl-client.xml example instead of linking to the swebhdfs document. 


Webhdfs.md

Line 161: /etc/hadoop/hdfs-site.xml has a configuration key to enable secure 
HTTP, i.e., dfs.http.policy=HTTPS_ONLY.

Also note that dfs.http.policy is not only for swebhdfs. It also affects 
all the HTTP endpoints of HDFS, such as the NN and DN web UIs, JMX, and QJM.
Line 198: suggest giving a full path: ssl-client.xml -> 
/etc/hadoop/ssl-client.xml 


We also need to document the server-side settings, e.g., 
ssl-server.xml. 

> Add the documents of swebhdfs configurations on the client side
> ---
>
> Key: HDFS-12564
> URL: https://issues.apache.org/jira/browse/HDFS-12564
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, webhdfs
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
> Attachments: HDFS-12564.1.patch, HDFS-12564.2.patch
>
>
> Documentation does not cover the swebhdfs configurations on the client side. 
> We can reuse the hftp/hsftp documents, which were removed from Hadoop-3.0 in 
> HDFS-5570 and HDFS-9640.






[jira] [Updated] (HDFS-12735) Make ContainerStateMachine#applyTransaction async

2017-11-01 Thread Lokesh Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDFS-12735:
---
Attachment: HDFS-12735-HDFS-7240.000.patch

> Make ContainerStateMachine#applyTransaction async
> -
>
> Key: HDFS-12735
> URL: https://issues.apache.org/jira/browse/HDFS-12735
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: performance
> Attachments: HDFS-12735-HDFS-7240.000.patch
>
>
> Currently ContainerStateMachine#applyTransaction makes a synchronous call to 
> dispatch client requests. The idea is to have a thread pool that dispatches 
> client requests and returns a CompletableFuture.
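A minimal sketch of the proposed shape, assuming a generic request/response `dispatcher` function: the synchronous path blocks the caller, while the async path submits the dispatch to a fixed thread pool via `CompletableFuture.supplyAsync` and returns the future immediately. `AsyncApplySketch` and its methods are hypothetical stand-ins, not the Ratis `StateMachine` interface or the actual `ContainerStateMachine` patch.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

class AsyncApplySketch implements AutoCloseable {
  // Pool that runs client-request dispatch off the caller's thread.
  private final ExecutorService executor;
  // Hypothetical dispatcher: turns a request into a response.
  private final Function<String, String> dispatcher;

  AsyncApplySketch(int threads, Function<String, String> dispatcher) {
    this.executor = Executors.newFixedThreadPool(threads);
    this.dispatcher = dispatcher;
  }

  /** Current behavior: the caller blocks until dispatch finishes. */
  String applyTransactionSync(String request) {
    return dispatcher.apply(request);
  }

  /** Proposed behavior: dispatch runs on the pool; the caller gets a future. */
  CompletableFuture<String> applyTransaction(String request) {
    return CompletableFuture.supplyAsync(() -> dispatcher.apply(request), executor);
  }

  @Override
  public void close() {
    executor.shutdown();
  }

  public static void main(String[] args) {
    try (AsyncApplySketch sm = new AsyncApplySketch(4, req -> "OK:" + req)) {
      CompletableFuture<String> f = sm.applyTransaction("writeChunk");
      // The caller is free to do other work here before waiting.
      System.out.println(f.join()); // OK:writeChunk
    }
  }
}
```

Passing an explicit executor keeps dispatch off the common ForkJoinPool, so the pool size can be tuned independently.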






[jira] [Updated] (HDFS-12753) Getting file not found exception while using distcp with s3a

2017-11-01 Thread Logesh Rangan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Logesh Rangan updated HDFS-12753:
-
Summary: Getting file not found exception while using distcp with s3a  
(was: Getting file not founf exception while using distcp with s3a)

> Getting file not found exception while using distcp with s3a
> 
>
> Key: HDFS-12753
> URL: https://issues.apache.org/jira/browse/HDFS-12753
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Logesh Rangan
>
> I'm using distcp to copy huge files from Hadoop to S3. 
> Sometimes I'm getting the error below:
> *Command:* (Copying 378 GB data)
> _hadoop distcp -D HADOOP_OPTS=-Xmx12g -D HADOOP_CLIENT_OPTS='-Xmx12g 
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled' -D 
> 'mapreduce.map.memory.mb=12288' -D 'mapreduce.map.java.opts=-Xmx10g' -D 
> 'mapreduce.reduce.memory.mb=12288' -D 'mapreduce.reduce.java.opts=-Xmx10g' 
> '-Dfs.s3a.proxy.host=edhmgrn-prod.cloud.capitalone.com' 
> '-Dfs.s3a.proxy.port=8088' '-Dfs.s3a.access.key=XXX' 
> '-Dfs.s3a.secret.key=XXX' '-Dfs.s3a.connection.timeout=18' 
> '-Dfs.s3a.attempts.maximum=5' '-Dfs.s3a.fast.upload=true' 
> '-Dfs.s3a.fast.upload.buffer=array' '-Dfs.s3a.fast.upload.active.blocks=50' 
> '-Dfs.s3a.multipart.size=262144000' '-Dfs.s3a.threads.max=500' 
> '-Dfs.s3a.threads.keepalivetime=600' 
> '-Dfs.s3a.server-side-encryption-algorithm=AES256' -bandwidth 3072 -strategy 
> dynamic -m 220 -numListstatusThreads 30 /src/ s3a://bucket/dest
> _
> 17/11/01 12:23:27 INFO mapreduce.Job: Task Id : 
> attempt_1497120915913_2792335_m_000165_0, Status : FAILED
> Error: java.io.FileNotFoundException: No such file or directory: 
> s3a://bucketname/filename
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1132)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:78)
> at 
> org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:197)
> at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:256)
> at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1912)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> 17/11/01 12:28:32 INFO mapreduce.Job: Task Id : 
> attempt_1497120915913_2792335_m_10_0, Status : FAILED
> Error: java.io.IOException: File copy failed: hdfs://nameservice1/filena --> 
> s3a://cof-prod-lake-card/src/seam/acct_scores/acctmdlscore_card_cobna_anon_vldtd/instnc_id=2016102300/04_0_copy_6
> at 
> org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:284)
> at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:252)
> at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1912)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.io.IOException: Couldn't run retriable-command: Copying 
> hdfs://nameservice1/filename to s3a://bucketname/filename
> at 
> org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
> at 
> org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:280)
> ... 10 more
> Caused by: com.cloudera.com.amazonaws.AmazonClientException: Failed to parse 
> XML document with handler class 
> com.cloudera.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
> at 
> com.cloudera.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:164)
> at 
> 

[jira] [Created] (HDFS-12753) Getting file not founf exception while using distcp with s3a

2017-11-01 Thread Logesh Rangan (JIRA)
Logesh Rangan created HDFS-12753:


 Summary: Getting file not founf exception while using distcp with 
s3a
 Key: HDFS-12753
 URL: https://issues.apache.org/jira/browse/HDFS-12753
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Logesh Rangan


I'm using distcp to copy huge files from Hadoop to S3. Sometimes 
I'm getting the error below:

*Command:* (Copying 378 GB data)

_hadoop distcp -D HADOOP_OPTS=-Xmx12g -D HADOOP_CLIENT_OPTS='-Xmx12g 
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 
-XX:+CMSParallelRemarkEnabled' -D 'mapreduce.map.memory.mb=12288' -D 
'mapreduce.map.java.opts=-Xmx10g' -D 'mapreduce.reduce.memory.mb=12288' -D 
'mapreduce.reduce.java.opts=-Xmx10g' 
'-Dfs.s3a.proxy.host=edhmgrn-prod.cloud.capitalone.com' 
'-Dfs.s3a.proxy.port=8088' '-Dfs.s3a.access.key=XXX' 
'-Dfs.s3a.secret.key=XXX' '-Dfs.s3a.connection.timeout=18' 
'-Dfs.s3a.attempts.maximum=5' '-Dfs.s3a.fast.upload=true' 
'-Dfs.s3a.fast.upload.buffer=array' '-Dfs.s3a.fast.upload.active.blocks=50' 
'-Dfs.s3a.multipart.size=262144000' '-Dfs.s3a.threads.max=500' 
'-Dfs.s3a.threads.keepalivetime=600' 
'-Dfs.s3a.server-side-encryption-algorithm=AES256' -bandwidth 3072 -strategy 
dynamic -m 220 -numListstatusThreads 30 /src/ s3a://bucket/dest
_
17/11/01 12:23:27 INFO mapreduce.Job: Task Id : 
attempt_1497120915913_2792335_m_000165_0, Status : FAILED
Error: java.io.FileNotFoundException: No such file or directory: 
s3a://bucketname/filename

at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1132)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:78)
at 
org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:197)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:256)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1912)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

17/11/01 12:28:32 INFO mapreduce.Job: Task Id : 
attempt_1497120915913_2792335_m_10_0, Status : FAILED
Error: java.io.IOException: File copy failed: hdfs://nameservice1/filena --> 
s3a://cof-prod-lake-card/src/seam/acct_scores/acctmdlscore_card_cobna_anon_vldtd/instnc_id=2016102300/04_0_copy_6
at 
org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:284)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:252)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1912)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying 
hdfs://nameservice1/filename to s3a://bucketname/filename
at 
org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
at 
org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:280)
... 10 more
Caused by: com.cloudera.com.amazonaws.AmazonClientException: Failed to parse 
XML document with handler class 
com.cloudera.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
at 
com.cloudera.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:164)
at 
com.cloudera.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseListBucketObjectsResponse(XmlResponsesSaxParser.java:299)
at 
com.cloudera.com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:77)
at 
com.cloudera.com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:74)
at 
com.cloudera.com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
at 

[jira] [Commented] (HDFS-11661) GetContentSummary uses excessive amounts of memory

2017-11-01 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234342#comment-16234342
 ] 

Xiao Chen commented on HDFS-11661:
--

{quote}
There are more bugs related to snapshots and content summary and quota usage 
discrepancies. I almost have a patch ready that optimizes content summary and 
appears to fix the snapshot issues.
{quote}
Hi [~daryn] and [~shahrs87],
Just wanted to check whether this was eventually done, and if so, could you 
share the JIRA?

Thanks!

> GetContentSummary uses excessive amounts of memory
> --
>
> Key: HDFS-11661
> URL: https://issues.apache.org/jira/browse/HDFS-11661
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Nathan Roberts
>Assignee: Wei-Chiu Chuang
>Priority: Blocker
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.2
>
> Attachments: HDFS-11661.001.patch, HDFs-11661.002.patch, Heap 
> growth.png
>
>
> ContentSummaryComputationContext::nodeIncluded() is being used to keep track 
> of all INodes visited during the current content summary calculation. This 
> can be all of the INodes in the filesystem, making for a VERY large hash 
> table. This simply won't work on large filesystems. 
> We noticed this after upgrading a NameNode with ~100 million filesystem 
> objects: it was spending significantly more time in GC. Fortunately this 
> system had some memory breathing room; other clusters we have will not run 
> with this additional demand on memory.
> This was added as part of HDFS-10797 as a way of keeping track of INodes that 
> have already been accounted for, to avoid double counting.






[jira] [Updated] (HDFS-12739) Add Support for SCM --init command

2017-11-01 Thread Shashikant Banerjee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-12739:
---
Attachment: HDFS-12739-HDFS-7240.004.patch

[~linyiqun], Thanks for the review comments.
The patch addresses the review comments. Please have a look.

> Add Support for SCM --init command
> --
>
> Key: HDFS-12739
> URL: https://issues.apache.org/jira/browse/HDFS-12739
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: HDFS-7240
>Affects Versions: HDFS-7240
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-12739-HDFS-7240.001.patch, 
> HDFS-12739-HDFS-7240.002.patch, HDFS-12739-HDFS-7240.003.patch, 
> HDFS-12739-HDFS-7240.004.patch
>
>
> The SCM --init command will generate a cluster ID and persist it locally. The 
> same cluster ID will be shared with KSM and the datanodes. If the cluster ID is 
> already present in the local version file, the command will just read the 
> existing cluster ID.
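A minimal sketch of the described read-or-generate behavior, assuming the ID is kept under a hypothetical `clusterID` key in a Java properties-style `VERSION` file. The file layout, key name, and `CID-` prefix are illustrative assumptions, not the actual SCM storage format.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.UUID;

class ScmInitSketch {
  // Hypothetical key name inside the VERSION file.
  static final String CLUSTER_ID_KEY = "clusterID";

  /** Returns the existing cluster ID, or generates, persists and returns a new one. */
  static String initClusterId(Path versionFile) throws IOException {
    Properties props = new Properties();
    if (Files.exists(versionFile)) {
      try (InputStream in = Files.newInputStream(versionFile)) {
        props.load(in);
      }
      String existing = props.getProperty(CLUSTER_ID_KEY);
      if (existing != null && !existing.isEmpty()) {
        return existing; // Already initialized: just read it back.
      }
    }
    String clusterId = "CID-" + UUID.randomUUID();
    props.setProperty(CLUSTER_ID_KEY, clusterId);
    Path parent = versionFile.toAbsolutePath().getParent();
    if (parent != null) {
      Files.createDirectories(parent);
    }
    try (OutputStream out = Files.newOutputStream(versionFile)) {
      props.store(out, "SCM version file (sketch)");
    }
    return clusterId;
  }

  public static void main(String[] args) throws IOException {
    Path version = Files.createTempDirectory("scm").resolve("current").resolve("VERSION");
    String first = initClusterId(version);
    String second = initClusterId(version); // Re-running --init reuses the ID.
    System.out.println(first.equals(second)); // true
  }
}
```

Making the command idempotent in this way means re-running it on an initialized node cannot silently change the ID that KSM and the datanodes already share.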






[jira] [Updated] (HDFS-12682) ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as DISABLED

2017-11-01 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-12682:
-
Attachment: HDFS-12682.08.patch

Thanks for the review, Rakesh! Patch 8 addresses all the comments.

> ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as 
> DISABLED
> 
>
> Key: HDFS-12682
> URL: https://issues.apache.org/jira/browse/HDFS-12682
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Blocker
>  Labels: hdfs-ec-3.0-must-do
> Attachments: HDFS-12682.01.patch, HDFS-12682.02.patch, 
> HDFS-12682.03.patch, HDFS-12682.04.patch, HDFS-12682.05.patch, 
> HDFS-12682.06.patch, HDFS-12682.07.patch, HDFS-12682.08.patch
>
>
> On a real cluster, {{hdfs ec -listPolicies}} will always show policy state as 
> DISABLED.
> {noformat}
> [hdfs@nightly6x-1 root]$ hdfs ec -listPolicies
> Erasure Coding Policies:
> ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, 
> Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], 
> CellSize=1048576, Id=3, State=DISABLED]
> ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, 
> numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4, State=DISABLED]
> [hdfs@nightly6x-1 root]$ hdfs ec -getPolicy -path /ecec
> XOR-2-1-1024k
> {noformat}
> This is because when [deserializing 
> protobuf|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java#L2942],
>  the static instance of [SystemErasureCodingPolicies 
> class|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SystemErasureCodingPolicies.java#L101]
>  is checked first and always returns the cached policy objects, which are 
> created by default with state=DISABLED.
> All the existing unit tests pass because, in unit tests, the static instance 
> that the client (e.g. ECAdmin) reads is the same one the NN updates. :)
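A hypothetical, heavily simplified sketch of the bug pattern described above and one possible fix: the client-side conversion looks up a static, JVM-wide policy object by ID and returns it as-is, so the state sent by the NameNode is dropped. The class names, fields, and the fix shown (copying the cached definition with the received state) are illustrative assumptions, not the actual `PBHelperClient` / `SystemErasureCodingPolicies` code or the committed patch.

```java
import java.util.HashMap;
import java.util.Map;

class EcPolicyCacheSketch {
  enum State { DISABLED, ENABLED }

  // Immutable, simplified stand-in for an erasure coding policy.
  static final class Policy {
    final int id;
    final String name;
    final State state;
    Policy(int id, String name, State state) {
      this.id = id; this.name = name; this.state = state;
    }
    Policy withState(State s) { return new Policy(id, name, s); }
  }

  // Static, JVM-wide cache of built-in policies, created as DISABLED.
  static final Map<Integer, Policy> SYSTEM_POLICIES = new HashMap<>();
  static {
    SYSTEM_POLICIES.put(1, new Policy(1, "RS-6-3-1024k", State.DISABLED));
    SYSTEM_POLICIES.put(4, new Policy(4, "XOR-2-1-1024k", State.DISABLED));
  }

  /** Buggy conversion: a cache hit ignores the state sent over the wire. */
  static Policy convertBuggy(int id, State wireState) {
    Policy cached = SYSTEM_POLICIES.get(id);
    return cached != null ? cached : new Policy(id, "custom", wireState);
  }

  /** Fixed conversion: reuse the cached definition but keep the wire state. */
  static Policy convertFixed(int id, State wireState) {
    Policy cached = SYSTEM_POLICIES.get(id);
    return cached != null ? cached.withState(wireState) : new Policy(id, "custom", wireState);
  }

  public static void main(String[] args) {
    // The NameNode reports XOR-2-1-1024k (id 4) as ENABLED.
    System.out.println(convertBuggy(4, State.ENABLED).state); // DISABLED
    System.out.println(convertFixed(4, State.ENABLED).state); // ENABLED
  }
}
```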






[jira] [Commented] (HDFS-12681) Fold HdfsLocatedFileStatus into HdfsFileStatus

2017-11-01 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234304#comment-16234304
 ] 

Chris Douglas commented on HDFS-12681:
--

Test failures are unrelated to the patch; all are due to resource exhaustion. 
Checkstyle errors are from the builder pattern.

> Fold HdfsLocatedFileStatus into HdfsFileStatus
> --
>
> Key: HDFS-12681
> URL: https://issues.apache.org/jira/browse/HDFS-12681
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chris Douglas
>Priority: Minor
> Attachments: HDFS-12681.00.patch, HDFS-12681.01.patch, 
> HDFS-12681.02.patch, HDFS-12681.03.patch, HDFS-12681.04.patch, 
> HDFS-12681.05.patch, HDFS-12681.06.patch, HDFS-12681.07.patch, 
> HDFS-12681.08.patch, HDFS-12681.09.patch, HDFS-12681.10.patch
>
>
> {{HdfsLocatedFileStatus}} is a subtype of {{HdfsFileStatus}}, but not of 
> {{LocatedFileStatus}}. Conversion requires copying common fields and shedding 
> unknown data. It would be cleaner and sufficient for {{HdfsFileStatus}} to 
> extend {{LocatedFileStatus}}.






[jira] [Commented] (HDFS-12750) Ozone: Fix TestStorageContainerManager#testBlockDeletionTransactions

2017-11-01 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234261#comment-16234261
 ] 

Xiaoyu Yao commented on HDFS-12750:
---

Thanks [~cheersyang] for the commit. This is a very low-risk, unit-test-only 
change; given that you have done all the local verification, that should be OK. 

> Ozone: Fix TestStorageContainerManager#testBlockDeletionTransactions
> 
>
> Key: HDFS-12750
> URL: https://issues.apache.org/jira/browse/HDFS-12750
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7240
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-12750-HDFS-7240.001.patch
>
>
> Some of the newly added Ozone tests need to shut down the MiniOzoneCluster so 
> that the metadata DB and test files are cleaned up for subsequent tests. This 
> affects TestStorageContainerManager#testBlockDeletionTransactions.






[jira] [Commented] (HDFS-12350) Support meta tags in configs

2017-11-01 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234240#comment-16234240
 ] 

Steve Loughran commented on HDFS-12350:
---

This is a change to hadoop-common. It should have been filed and discussed 
there. Please don't make changes to hadoop-common in HDFS patches without some 
publicity. Thanks.

> Support meta tags in configs
> 
>
> Key: HDFS-12350
> URL: https://issues.apache.org/jira/browse/HDFS-12350
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HDFS-12350.01.patch, HDFS-12350.02.patch, 
> HDFS-12350.03.patch
>
>
> We should tag the Hadoop/HDFS config so that we can retrieve properties by 
> their usage/application, such as PERFORMANCE, NAMENODE, etc. Right now we 
> don't have an option to group or list related properties together. 
> Grouping properties through a restricted set of meta tags and then 
> exposing them in the Configuration class would be useful for end users.
> For example, here is a config file with tags.
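> {code}
> <configuration>
>   <property>
>     <name>dfs.namenode.servicerpc-bind-host</name>
>     <value>localhost</value>
>     <tag>REQUIRED</tag>
>   </property>
>   <property>
>     <name>dfs.namenode.fs-limits.min-block-size</name>
>     <value>1048576</value>
>     <tag>PERFORMANCE,REQUIRED</tag>
>   </property>
>   <property>
>     <name>dfs.namenode.logging.level</name>
>     <value>Info</value>
>     <tag>HDFS, DEBUG</tag>
>   </property>
> </configuration>
> {code}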
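A minimal sketch, using only the JDK's DOM parser, of how properties from a config file in that shape could be grouped by tag. It assumes each `<property>` carries a comma-separated `<tag>` child element (the element name is an assumption here); `ConfigTagSketch.groupByTag` is hypothetical and not part of the Hadoop `Configuration` API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ConfigTagSketch {

  /** Returns tag -> property names, reading each property's name and tag children. */
  static Map<String, List<String>> groupByTag(String xml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    Map<String, List<String>> byTag = new TreeMap<>();
    NodeList props = doc.getElementsByTagName("property");
    for (int i = 0; i < props.getLength(); i++) {
      Element prop = (Element) props.item(i);
      String name = text(prop, "name");
      String tags = text(prop, "tag");
      if (name == null || tags == null) continue; // Untagged properties are skipped.
      // Tags are comma-separated and may carry stray whitespace ("HDFS, DEBUG").
      for (String tag : tags.split(",")) {
        String t = tag.trim();
        if (!t.isEmpty()) {
          byTag.computeIfAbsent(t, k -> new ArrayList<>()).add(name);
        }
      }
    }
    return byTag;
  }

  /** Trimmed text of the first descendant element with the given name, or null. */
  private static String text(Element parent, String child) {
    NodeList nodes = parent.getElementsByTagName(child);
    return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
  }

  public static void main(String[] args) throws Exception {
    String xml = "<configuration>"
        + "<property><name>dfs.namenode.servicerpc-bind-host</name>"
        + "<value>localhost</value><tag>REQUIRED</tag></property>"
        + "<property><name>dfs.namenode.fs-limits.min-block-size</name>"
        + "<value>1048576</value><tag>PERFORMANCE,REQUIRED</tag></property>"
        + "<property><name>dfs.namenode.logging.level</name>"
        + "<value>Info</value><tag>HDFS, DEBUG</tag></property>"
        + "</configuration>";
    // [dfs.namenode.servicerpc-bind-host, dfs.namenode.fs-limits.min-block-size]
    System.out.println(groupByTag(xml).get("REQUIRED"));
  }
}
```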






[jira] [Updated] (HDFS-12219) Javadoc for FSNamesystem#getMaxObjects is incorrect

2017-11-01 Thread Erik Krogen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-12219:
---
Fix Version/s: (was: 3.1.0)

> Javadoc for FSNamesystem#getMaxObjects is incorrect
> ---
>
> Key: HDFS-12219
> URL: https://issues.apache.org/jira/browse/HDFS-12219
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Erik Krogen
>Assignee: Erik Krogen
> Fix For: 3.0.0
>
> Attachments: HDFS-12219.000.patch
>
>
> The Javadoc states that this represents the total number of objects in the 
> system, but it really represents the maximum allowed number of objects (as 
> correctly stated on the Javadoc for {{FSNamesystemMBean#getMaxObjects()}}).






[jira] [Updated] (HDFS-12219) Javadoc for FSNamesystem#getMaxObjects is incorrect

2017-11-01 Thread Erik Krogen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-12219:
---
Fix Version/s: 3.1.0

> Javadoc for FSNamesystem#getMaxObjects is incorrect
> ---
>
> Key: HDFS-12219
> URL: https://issues.apache.org/jira/browse/HDFS-12219
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Erik Krogen
>Assignee: Erik Krogen
> Fix For: 3.0.0, 3.1.0
>
> Attachments: HDFS-12219.000.patch
>
>
> The Javadoc states that this represents the total number of objects in the 
> system, but it really represents the maximum allowed number of objects (as 
> correctly stated on the Javadoc for {{FSNamesystemMBean#getMaxObjects()}}).






[jira] [Commented] (HDFS-11096) Support rolling upgrade between 2.x and 3.x

2017-11-01 Thread Sean Mackrory (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234197#comment-16234197
 ] 

Sean Mackrory commented on HDFS-11096:
--

From an HDFS standpoint, definitely: I've run many successful rolling upgrade 
and distcp-over-webhdfs tests this week and updated the patch. The only thing 
remaining is to get the automation itself in place after this is committed.

I looked into the YARN issues. I'm still seeing very similar symptoms to the 
YARN-6457 issue mentioned above in both branch-3.0 and trunk. In trunk I'm also 
seeing this:

{quote}
17/10/31 23:05:49 INFO security.AMRMTokenSecretManager: Creating password for appattempt_1509490231144_0628_02
17/10/31 23:05:49 INFO amlauncher.AMLauncher: Error launching appattempt_1509490231144_0628_02. Got exception: org.apache.hadoop.security.token.SecretManager$InvalidToken: Invalid container token used for starting container on : container-5.docker:35151
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.verifyAndGetContainerTokenIdentifier(ContainerManagerImpl.java:974)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:789)
at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:70)
at org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:127)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:845)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:788)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2455)

at sun.reflect.GeneratedConstructorAccessor70.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateIOException(RPCUtil.java:80)
at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:119)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:131)
at sun.reflect.GeneratedMethodAccessor85.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy89.startContainers(Unknown Source)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:123)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:304)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid container token used for starting container on : container-5.docker:35151
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.verifyAndGetContainerTokenIdentifier(ContainerManagerImpl.java:974)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:789)
at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:70)
at 

[jira] [Comment Edited] (HDFS-11096) Support rolling upgrade between 2.x and 3.x

2017-11-01 Thread Sean Mackrory (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234197#comment-16234197
 ] 

Sean Mackrory edited comment on HDFS-11096 at 11/1/17 3:16 PM:
---

From an HDFS standpoint, definitely - I've run many successful rolling upgrade 
and distcp-over-webhdfs tests this week and updated the patch. The only thing 
remaining is to get automation itself in place after this is committed.

I looked into the YARN issues. I'm still seeing very similar symptoms to the 
YARN-6457 issue mentioned above in both branch-3.0 and trunk. In trunk I'm also 
seeing this:

{code}
17/10/31 23:05:49 INFO security.AMRMTokenSecretManager: Creating password for appattempt_1509490231144_0628_02
17/10/31 23:05:49 INFO amlauncher.AMLauncher: Error launching appattempt_1509490231144_0628_02. Got exception: org.apache.hadoop.security.token.SecretManager$InvalidToken: Invalid container token used for starting container on : container-5.docker:35151
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.verifyAndGetContainerTokenIdentifier(ContainerManagerImpl.java:974)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:789)
at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:70)
at org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:127)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:845)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:788)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2455)

at sun.reflect.GeneratedConstructorAccessor70.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateIOException(RPCUtil.java:80)
at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:119)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:131)
at sun.reflect.GeneratedMethodAccessor85.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy89.startContainers(Unknown Source)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:123)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:304)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid container token used for starting container on : container-5.docker:35151
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.verifyAndGetContainerTokenIdentifier(ContainerManagerImpl.java:974)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:789)
at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:70)
at 

[jira] [Updated] (HDFS-12708) Fix hdfs haadmin usage

2017-11-01 Thread fang zhenyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fang zhenyi updated HDFS-12708:
---
Attachment: (was: HDFS-15004.001.patch)

> Fix  hdfs haadmin usage
> ---
>
> Key: HDFS-12708
> URL: https://issues.apache.org/jira/browse/HDFS-12708
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha4
>Reporter: fang zhenyi
>Assignee: fang zhenyi
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: HDFS-12708.001.patch
>
>







[jira] [Updated] (HDFS-12708) Fix hdfs haadmin usage

2017-11-01 Thread fang zhenyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fang zhenyi updated HDFS-12708:
---
Attachment: HDFS-15004.001.patch

> Fix  hdfs haadmin usage
> ---
>
> Key: HDFS-12708
> URL: https://issues.apache.org/jira/browse/HDFS-12708
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha4
>Reporter: fang zhenyi
>Assignee: fang zhenyi
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: HDFS-12708.001.patch
>
>







[jira] [Updated] (HDFS-12711) deadly hdfs test

2017-11-01 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-12711:

Attachment: fakepatch.branch-2.txt

> deadly hdfs test
> 
>
> Key: HDFS-12711
> URL: https://issues.apache.org/jira/browse/HDFS-12711
> Project: Hadoop HDFS
>  Issue Type: Test
>Affects Versions: 2.9.0, 2.8.2
>Reporter: Allen Wittenauer
>Priority: Critical
> Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt
>
>







[jira] [Commented] (HDFS-10323) transient deleteOnExit failure in ViewFileSystem due to close() ordering

2017-11-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234060#comment-16234060
 ] 

Hadoop QA commented on HDFS-10323:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m 25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 26s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 53s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 13s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 39s{color} | {color:orange} hadoop-common-project/hadoop-common: The patch generated 4 new + 76 unchanged - 0 fixed = 80 total (was 76) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m  1s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 56s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 91m  1s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-10323 |
| GITHUB PR | https://github.com/apache/hadoop/pull/287 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ee35e48afebf 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 56b88b0 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/21913/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/21913/testReport/ |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/21913/console |
| Powered by | Apache 

[jira] [Updated] (HDFS-11807) libhdfs++: Get minidfscluster tests running under valgrind

2017-11-01 Thread Anatoli Shein (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anatoli Shein updated HDFS-11807:
-
Attachment: HDFS-11807.HDFS-8707.004.patch

Whitespace fix

> libhdfs++: Get minidfscluster tests running under valgrind
> --
>
> Key: HDFS-11807
> URL: https://issues.apache.org/jira/browse/HDFS-11807
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: Anatoli Shein
>Priority: Major
> Attachments: HDFS-11807.HDFS-8707.000.patch, 
> HDFS-11807.HDFS-8707.001.patch, HDFS-11807.HDFS-8707.002.patch, 
> HDFS-11807.HDFS-8707.003.patch, HDFS-11807.HDFS-8707.004.patch
>
>
> The gmock-based unit tests generally don't expose race conditions and memory 
> stomps. A good way to expose these is to run libhdfs++ stress tests and 
> tools under valgrind against a real cluster. Right now the CI 
> tools don't do that, so bugs occasionally slip in and aren't caught until they 
> cause trouble in applications that use libhdfs++ for HDFS access.
> The minidfscluster tests don't run under valgrind because the 
> GC and JIT compiler in the embedded JVM do things that look like errors to 
> valgrind. I'd like these tests to do some basic setup and then fork 
> into two processes: one for the minidfscluster and one for the 
> libhdfs++ client test. A small amount of shared memory can provide a 
> place for the minidfscluster to store the hdfsBuilder object that 
> the client needs to learn which port to connect to. A condition variable 
> can also be placed there to let the minidfscluster know when it can shut 
> down.






[jira] [Commented] (HDFS-12443) Ozone: Improve SCM block deletion throttling algorithm

2017-11-01 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233972#comment-16233972
 ] 

Weiwei Yang commented on HDFS-12443:


Hi [~linyiqun]

bq. add datanode info into delLog

This might not be a good option. A fixed containerName-to-datanode mapping is 
not flexible, because a container might be replicated to other nodes if the 
original DN is lost.

bq. Scan the entire delLog from the beginning to end, getting blocks list info 
for each node. If one node reach maximum container number, then its record will 
be skipped.

I think this approach is the best for now. How do you plan to define the max 
number of containers for each node? Actually, I am fine with a fixed number, 
e.g. 50, to simplify the problem.

bq.  If not, keep scanning log until it reach the maximum value.

Yes, good idea. I think we need an in-memory data structure to handle this. It 
maintains a map whose key is the datanodeID and whose value is a list of 
{{DeletedBlocksTransaction}}, e.g. DatanodeBlockDeletionTransactions. The length 
of each datanodeID's {{DeletedBlocksTransaction}} list is bounded by a max size, 
and it behaves like this:
# a KV entry is full once its value reaches the max length; further elements added for this datanodeID are skipped
# the map is full only when all KV entries are full
# each value has no duplicate elements, as distinguished by TXID

Each time, we ensure the delLog is scanned at most once. I suggest writing a 
separate test case for this structure to make sure its behavior is well tested; 
then the implementation in SCMBlockDeletingService will be straightforward.
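A minimal Java sketch of the bounded per-datanode map described above, written under stated assumptions rather than as the SCM implementation. DatanodeBlockDeletionTransactions is the name proposed in the comment; DeletedTx is a hypothetical stand-in for DeletedBlocksTransaction (only a TXID and container name are modelled), and the constructor's set of known datanode IDs plus the addTransaction/isFull/getTransactions methods are illustrative, not an existing Ozone API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: datanodeID -> bounded, TXID-deduplicated list of
 * pending block deletion transactions.
 */
public class DatanodeBlockDeletionTransactions {

  /** Hypothetical stand-in for DeletedBlocksTransaction (TXID + container name only). */
  public static final class DeletedTx {
    final long txId;
    final String containerName;

    public DeletedTx(long txId, String containerName) {
      this.txId = txId;
      this.containerName = containerName;
    }
  }

  private final int maxTxPerDatanode;
  // Per datanode: TXID -> transaction in insertion (scan) order; keying by TXID prevents duplicates.
  private final Map<String, LinkedHashMap<Long, DeletedTx>> byDatanode = new HashMap<>();

  public DatanodeBlockDeletionTransactions(Collection<String> datanodeIds, int maxTxPerDatanode) {
    this.maxTxPerDatanode = maxTxPerDatanode;
    for (String id : datanodeIds) {
      byDatanode.put(id, new LinkedHashMap<Long, DeletedTx>());
    }
  }

  /**
   * Rules 1 and 3: the transaction is skipped (returns false) if the datanode is
   * unknown, its entry is already full, or the TXID is already queued for it.
   */
  public boolean addTransaction(String datanodeId, DeletedTx tx) {
    LinkedHashMap<Long, DeletedTx> txs = byDatanode.get(datanodeId);
    if (txs == null || txs.size() >= maxTxPerDatanode || txs.containsKey(tx.txId)) {
      return false;
    }
    txs.put(tx.txId, tx);
    return true;
  }

  /** Rule 2: the map is full only when every datanode's entry is full. */
  public boolean isFull() {
    for (LinkedHashMap<Long, DeletedTx> txs : byDatanode.values()) {
      if (txs.size() < maxTxPerDatanode) {
        return false;
      }
    }
    return true;
  }

  /** Transactions queued for one datanode, in the order they were added. */
  public List<DeletedTx> getTransactions(String datanodeId) {
    LinkedHashMap<Long, DeletedTx> txs = byDatanode.get(datanodeId);
    if (txs == null) {
      return new ArrayList<>();
    }
    return new ArrayList<>(txs.values());
  }
}
```

In this sketch, a single pass over delLog would call addTransaction for each datanode holding the transaction's container and stop as soon as isFull() returns true, so the log is scanned at most once per interval. Keying each value by TXID (a LinkedHashMap instead of a plain list) is a design choice that makes the duplicate check constant-time while preserving scan order.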

Thanks for driving this forward, much appreciated.

> Ozone: Improve SCM block deletion throttling algorithm 
> ---
>
> Key: HDFS-12443
> URL: https://issues.apache.org/jira/browse/HDFS-12443
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, scm
>Reporter: Weiwei Yang
>Assignee: Yiqun Lin
>Priority: Major
>  Labels: OzonePostMerge
> Attachments: HDFS-12443-HDFS-7240.001.patch, 
> HDFS-12443-HDFS-7240.002.patch, HDFS-12443-HDFS-7240.002.patch, 
> HDFS-12443-SCM-blockdeletion-throttle.pdf
>
>
> Currently SCM periodically scans delLog to send deletion transactions to 
> datanodes. The throttling algorithm is simple: it scans at most 
> {{BLOCK_DELETE_TX_PER_REQUEST_LIMIT}} (by default 50) transactions at a time. 
> This is non-optimal: in the worst case it might cache 50 TXs for 50 different 
> DNs, so each DN gets only 1 TX to process per interval, which makes deletion 
> slow. An improvement is to throttle per datanode, e.g. 50 TXs per datanode 
> per interval.






[jira] [Comment Edited] (HDFS-10323) transient deleteOnExit failure in ViewFileSystem due to close() ordering

2017-11-01 Thread Wenxin He (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233962#comment-16233962
 ] 

Wenxin He edited comment on HDFS-10323 at 11/1/17 11:51 AM:


I hit this problem too when using Spark, and the undeleted files caused the HDFS 
cluster to run out of space.

So, following [~bpodgursky]'s suggestion and [~cmccabe]'s comment
bq. 2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then 
all other FileSystems.

I submitted the 001 patch to fix the problem.
In this patch:
# FileSystem.Cache.map is changed to a {color:red}LinkedHashMap{color}, which stores FileSystems in {color:red}insertion order{color}.
When a ViewFileSystem is initialized, its DistributedFileSystem is stored in FileSystem.Cache.map first, and then the ViewFileSystem.
# When FileSystem.Cache.closeAll is invoked, all cached FileSystems are {color:red}closed in reverse order{color}, i.e. LIFO. So a ViewFileSystem is closed before the DistributedFileSystems it refers to, and all deleteOnExit files are deleted safely before those DistributedFileSystems close.


was (Author: vincent he):
I hit this problem too when using Spark, and the undeleted files caused the HDFS 
cluster to run out of space.

So, following [~bpodgursky]'s suggestion and [~cmccabe]'s comment
bq. 2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then 
all other FileSystems.

I submitted the 001 patch to fix the problem:
In this patch FileSystem.Cache.map is changed to a LinkedHashMap, which stores 
FileSystems in insertion order.
When a ViewFileSystem is initialized, its DistributedFileSystem is stored in 
FileSystem.Cache.map first, and then the ViewFileSystem.
When FileSystem.Cache.closeAll is invoked, all cached FileSystems are closed in 
reverse order, i.e. LIFO. So a ViewFileSystem is closed before the 
DistributedFileSystems it refers to, and all deleteOnExit files are deleted 
safely before those DistributedFileSystems close.
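A simplified Java model of the ordering idea described in the 001 patch comment: entries are kept in a LinkedHashMap (insertion order) and closed newest-first. LifoCloseCache and its methods are hypothetical stand-ins for FileSystem.Cache; the real cache keys, locking, and shutdown-hook wiring are not modelled, and this is a sketch rather than the patch itself.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical stand-in for FileSystem.Cache: entries are kept in insertion
 * order and closed in reverse (LIFO) order.
 */
public class LifoCloseCache<K> {
  // LinkedHashMap iterates in insertion order; re-putting an existing key keeps its original slot.
  private final Map<K, Closeable> map = new LinkedHashMap<>();

  public synchronized void put(K key, Closeable fs) {
    map.put(key, fs);
  }

  /**
   * Closes all cached entries newest-first and returns the keys in closing order.
   * Every entry is attempted; the first failure is rethrown afterwards.
   */
  public synchronized List<K> closeAll() {
    List<K> keys = new ArrayList<>(map.keySet());
    Collections.reverse(keys); // newest entry first
    IOException failure = null;
    for (K key : keys) {
      try {
        map.remove(key).close();
      } catch (IOException e) {
        if (failure == null) {
          failure = e;
        } else {
          failure.addSuppressed(e);
        }
      }
    }
    if (failure != null) {
      throw new UncheckedIOException(failure);
    }
    return keys;
  }
}
```

The reverse order only helps when each ViewFileSystem's child FileSystems were inserted into the cache before the ViewFileSystem itself, which is the ordering the comment relies on. An alternative along the lines of option 2) in the description would be to close all ViewFileSystem entries explicitly first, which does not depend on insertion order.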

> transient deleteOnExit failure in ViewFileSystem due to close() ordering
> 
>
> Key: HDFS-10323
> URL: https://issues.apache.org/jira/browse/HDFS-10323
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 2.6.0, 2.7.4, 3.0.0-beta1
>Reporter: Ben Podgursky
>Assignee: Wenxin He
>Priority: Major
> Attachments: HDFS-10323.001.patch
>
>
> After switching to using a ViewFileSystem, fs.deleteOnExit calls began 
> failing frequently, displaying this error on failure:
> 16/04/21 13:56:24 INFO fs.FileSystem: Ignoring failure to deleteOnExit for 
> path /tmp/delete_on_exit_test_123/a438afc0-a3ca-44f1-9eb5-010ca4a62d84
> Since FileSystem eats the error involved, it is difficult to be sure what the 
> error is, but I believe what is happening is that the ViewFileSystem’s child 
> FileSystems are being close()’d before the ViewFileSystem, due to the random 
> order ClientFinalizer closes FileSystems; so then when the ViewFileSystem 
> tries to close(), it tries to forward the delete() calls to the appropriate 
> child, and fails because the child is already closed.
> I’m unsure how to write an actual Hadoop test to reproduce this, since it 
> involves testing behavior on actual JVM shutdown.  However, I can verify that 
> while
> {code:java}
> fs.deleteOnExit(randomTemporaryDir);
> {code}
> regularly (~50% of the time) fails to delete the temporary directory, this 
> code:
> {code:java}
> ViewFileSystem viewfs = (ViewFileSystem) fs1;
> for (FileSystem fileSystem : viewfs.getChildFileSystems()) {
>   if (fileSystem.exists(randomTemporaryDir)) {
>     fileSystem.deleteOnExit(randomTemporaryDir);
>   }
> }
> {code}
> always successfully deletes the temporary directory on JVM shutdown.
> I am not very familiar with FileSystem inheritance hierarchies, but at first 
> glance I see two ways to fix this behavior:
> 1)  ViewFileSystem could forward deleteOnExit calls to the appropriate child 
> FileSystem, and not hold onto that path itself.
> 2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then all 
> other FileSystems.  
> Would appreciate any thoughts of whether this seems accurate, and thoughts 
> (or help) on the fix.






[jira] [Updated] (HDFS-10323) transient deleteOnExit failure in ViewFileSystem due to close() ordering

2017-11-01 Thread Wenxin He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenxin He updated HDFS-10323:
-
Attachment: HDFS-10323.001.patch

> transient deleteOnExit failure in ViewFileSystem due to close() ordering
> 
>
> Key: HDFS-10323
> URL: https://issues.apache.org/jira/browse/HDFS-10323
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 2.6.0, 2.7.4, 3.0.0-beta1
>Reporter: Ben Podgursky
>Assignee: Wenxin He
>Priority: Major
> Attachments: HDFS-10323.001.patch
>
>
> After switching to using a ViewFileSystem, fs.deleteOnExit calls began 
> failing frequently, displaying this error on failure:
> 16/04/21 13:56:24 INFO fs.FileSystem: Ignoring failure to deleteOnExit for 
> path /tmp/delete_on_exit_test_123/a438afc0-a3ca-44f1-9eb5-010ca4a62d84
> Since FileSystem eats the error involved, it is difficult to be sure what the 
> error is, but I believe what is happening is that the ViewFileSystem’s child 
> FileSystems are being close()’d before the ViewFileSystem, due to the random 
> order ClientFinalizer closes FileSystems; so then when the ViewFileSystem 
> tries to close(), it tries to forward the delete() calls to the appropriate 
> child, and fails because the child is already closed.
> I’m unsure how to write an actual Hadoop test to reproduce this, since it 
> involves testing behavior on actual JVM shutdown.  However, I can verify that 
> while
> {code:java}
> fs.deleteOnExit(randomTemporaryDir);
> {code}
> regularly (~50% of the time) fails to delete the temporary directory, this 
> code:
> {code:java}
> ViewFileSystem viewfs = (ViewFileSystem) fs1;
> for (FileSystem fileSystem : viewfs.getChildFileSystems()) {
>   if (fileSystem.exists(randomTemporaryDir)) {
>     fileSystem.deleteOnExit(randomTemporaryDir);
>   }
> }
> {code}
> always successfully deletes the temporary directory on JVM shutdown.
> I am not very familiar with FileSystem inheritance hierarchies, but at first 
> glance I see two ways to fix this behavior:
> 1)  ViewFileSystem could forward deleteOnExit calls to the appropriate child 
> FileSystem, and not hold onto that path itself.
> 2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then all 
> other FileSystems.  
> Would appreciate any thoughts of whether this seems accurate, and thoughts 
> (or help) on the fix.






[jira] [Updated] (HDFS-10323) transient deleteOnExit failure in ViewFileSystem due to close() ordering

2017-11-01 Thread Wenxin He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenxin He updated HDFS-10323:
-
Affects Version/s: 2.7.4
   3.0.0-beta1
   Status: Patch Available  (was: Open)

I hit this problem too when using Spark, and the undeleted files caused the HDFS 
cluster to run out of space.

So, following [~bpodgursky]'s suggestion and [~cmccabe]'s comment
bq. 2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then 
all other FileSystems.

I submitted the 001 patch to fix the problem:
In this patch FileSystem.Cache.map is changed to a LinkedHashMap, which stores 
FileSystems in insertion order.
When a ViewFileSystem is initialized, its DistributedFileSystem is stored in 
FileSystem.Cache.map first, and then the ViewFileSystem.
When FileSystem.Cache.closeAll is invoked, all cached FileSystems are closed in 
reverse order, i.e. LIFO. So a ViewFileSystem is closed before the 
DistributedFileSystems it refers to, and all deleteOnExit files are deleted 
safely before those DistributedFileSystems close.

> transient deleteOnExit failure in ViewFileSystem due to close() ordering
> 
>
> Key: HDFS-10323
> URL: https://issues.apache.org/jira/browse/HDFS-10323
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.0.0-beta1, 2.7.4, 2.6.0
>Reporter: Ben Podgursky
>Assignee: Wenxin He
>Priority: Major
>
> After switching to using a ViewFileSystem, fs.deleteOnExit calls began 
> failing frequently, displaying this error on failure:
> 16/04/21 13:56:24 INFO fs.FileSystem: Ignoring failure to deleteOnExit for 
> path /tmp/delete_on_exit_test_123/a438afc0-a3ca-44f1-9eb5-010ca4a62d84
> Since FileSystem eats the error involved, it is difficult to be sure what the 
> error is, but I believe what is happening is that the ViewFileSystem’s child 
> FileSystems are being close()’d before the ViewFileSystem, due to the random 
> order ClientFinalizer closes FileSystems; so then when the ViewFileSystem 
> tries to close(), it tries to forward the delete() calls to the appropriate 
> child, and fails because the child is already closed.
> I’m unsure how to write an actual Hadoop test to reproduce this, since it 
> involves testing behavior on actual JVM shutdown.  However, I can verify that 
> while
> {code:java}
> fs.deleteOnExit(randomTemporaryDir);
> {code}
> regularly (~50% of the time) fails to delete the temporary directory, this 
> code:
> {code:java}
> ViewFileSystem viewfs = (ViewFileSystem) fs1;
> for (FileSystem fileSystem : viewfs.getChildFileSystems()) {
>   if (fileSystem.exists(randomTemporaryDir)) {
>     fileSystem.deleteOnExit(randomTemporaryDir);
>   }
> }
> {code}
> always successfully deletes the temporary directory on JVM shutdown.
> I am not very familiar with FileSystem inheritance hierarchies, but at first 
> glance I see two ways to fix this behavior:
> 1)  ViewFileSystem could forward deleteOnExit calls to the appropriate child 
> FileSystem, and not hold onto that path itself.
> 2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then all 
> other FileSystems.  
> Would appreciate any thoughts of whether this seems accurate, and thoughts 
> (or help) on the fix.






[jira] [Commented] (HDFS-12682) ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as DISABLED

2017-11-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233952#comment-16233952
 ] 

Hadoop QA commented on HDFS-12682:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 51s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m  5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 28s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 42s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m  4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 16s{color} | {color:green} root: The patch generated 0 new + 647 unchanged - 2 fixed = 647 total (was 649) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 39s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 42s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 20s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}123m 30s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}120m 20s{color} | {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 43s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}338m 25s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.TestEncryptionZones |
|   | hadoop.hdfs.server.blockmanagement.TestReconstructStripedBlocksWithRackAwareness |
|   | hadoop.hdfs.server.datanode.TestDataNodeUUID |
| Timed out junit tests | org.apache.hadoop.mapred.pipes.TestPipeApplication |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-12682 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12895124/HDFS-12682.07.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  

[jira] [Updated] (HDFS-12750) Ozone: Fix TestStorageContainerManager#testBlockDeletionTransactions

2017-11-01 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12750:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-7240
   Status: Resolved  (was: Patch Available)

> Ozone: Fix TestStorageContainerManager#testBlockDeletionTransactions
> 
>
> Key: HDFS-12750
> URL: https://issues.apache.org/jira/browse/HDFS-12750
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7240
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-12750-HDFS-7240.001.patch
>
>
> Some of the newly added ozone tests need to shut down the MiniOzoneCluster so
> that the metadata db and test files are cleaned up for subsequent tests.
> TestStorageContainerManager#testBlockDeletionTransactions
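The cleanup requirement above is the usual pattern of tying a cluster's lifetime to a try-with-resources block (or a JUnit @After/@AfterClass method) so that shutdown runs even when an assertion fails. A minimal sketch of that pattern follows; `ToyCluster` and `TeardownDemo` are hypothetical stand-ins that do not reflect the MiniOzoneCluster API, and the on-disk `metadata.db` file merely simulates leftover state.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.function.Consumer;
import java.util.stream.Stream;

// Hypothetical stand-in for a mini cluster; NOT the MiniOzoneCluster API.
class ToyCluster implements AutoCloseable {
    final Path metadataDir;

    ToyCluster() {
        Path dir;
        try {
            dir = Files.createTempDirectory("toy-cluster-");
            // Simulate a metadata DB file that a test run leaves on disk.
            Files.write(dir.resolve("metadata.db"), new byte[] {1, 2, 3});
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.metadataDir = dir;
    }

    // "Shut down": remove all on-disk state so the next test starts clean.
    @Override
    public void close() {
        try (Stream<Path> paths = Files.walk(metadataDir)) {
            // Reverse order deletes files before their parent directories.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class TeardownDemo {
    // Runs a test body against a fresh cluster; try-with-resources guarantees
    // close() runs even when the body throws (e.g. a failed assertion).
    static Path runTest(Consumer<ToyCluster> body) {
        ToyCluster cluster = new ToyCluster();
        try (ToyCluster c = cluster) {
            body.accept(c);
        }
        return cluster.metadataDir;
    }
}
```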






[jira] [Commented] (HDFS-12750) Ozone: Fix TestStorageContainerManager#testBlockDeletionTransactions

2017-11-01 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233838#comment-16233838
 ] 

Weiwei Yang commented on HDFS-12750:


Oops... I just realized the patch did not trigger a Jenkins job when I
committed it. The patch no longer applies because it has already been
committed, which is why the error was reported just now. I verified the patch
in my local environment before (and after) committing it, so the patch was OK.
I don't think we should revert the patch and do it all over again, so I am
closing this now. But feel free to revert and reopen if you disagree.
Apologies again.

> Ozone: Fix TestStorageContainerManager#testBlockDeletionTransactions
> 
>
> Key: HDFS-12750
> URL: https://issues.apache.org/jira/browse/HDFS-12750
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7240
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HDFS-12750-HDFS-7240.001.patch
>
>
> Some of the newly added ozone tests need to shut down the MiniOzoneCluster so
> that the metadata db and test files are cleaned up for subsequent tests.
> TestStorageContainerManager#testBlockDeletionTransactions






[jira] [Commented] (HDFS-12744) More logs when short-circuit read is failed and disabled

2017-11-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233837#comment-16233837
 ] 

Hudson commented on HDFS-12744:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13174 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13174/])
HDFS-12744. More logs when short-circuit read is failed and disabled. (wwei: 
rev 56b88b06705441f6f171eec7fb2fa77946ca204b)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/BlockReaderFactory.java


> More logs when short-circuit read is failed and disabled
> 
>
> Key: HDFS-12744
> URL: https://issues.apache.org/jira/browse/HDFS-12744
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
>  Labels: supportability
> Fix For: 2.9.0, 3.0.0
>
> Attachments: HDFS-12744.001.patch, HDFS-12744.002.patch
>
>
> Short-circuit read (SCR) failed with the following error
> {noformat}
> 2017-10-21 16:42:28,024 WARN  
> [B.defaultRpcServer.handler=7,queue=7,port=16020] 
> impl.BlockReaderFactory: BlockReaderFactory(xxx): unknown response code ERROR
> while attempting to set up short-circuit access. Block xxx is not valid
> {noformat}
> then short-circuit read is disabled for *10 minutes* without any warning
> message in the log. This cost us extra time figuring out why SCR stopped
> working for such a long window. We propose adding a warning log (as other
> code paths already do) to indicate that SCR is disabled, plus more logging in
> the DN to show what happened.
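The proposal boils down to emitting a warning at the moment the client stops attempting short-circuit reads, including how long the back-off lasts. The sketch below illustrates that idea only: `ShortCircuitGate`, its methods, and the injectable clock are hypothetical and are not taken from BlockReaderFactory or the committed patch; the 10-minute window comes from the report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.logging.Logger;

// Hypothetical gate that disables short-circuit reads for a fixed window
// and, unlike the behavior reported above, always says so in the log.
class ShortCircuitGate {
    private static final Logger LOG = Logger.getLogger(ShortCircuitGate.class.getName());

    private final long disableMillis;
    private final LongSupplier clock; // injected so tests can control time
    private long disabledUntil = 0L;
    final List<String> warnings = new ArrayList<>(); // kept for inspection in tests

    ShortCircuitGate(long disableMillis, LongSupplier clock) {
        this.disableMillis = disableMillis;
        this.clock = clock;
    }

    boolean isEnabled() {
        return clock.getAsLong() >= disabledUntil;
    }

    // Called when the DataNode answers the short-circuit request with an error.
    void disable(String block, String reason) {
        disabledUntil = clock.getAsLong() + disableMillis;
        String msg = String.format(
            "Short-circuit read disabled for %d ms after failure on block %s: %s",
            disableMillis, block, reason);
        warnings.add(msg);
        LOG.warning(msg);
    }
}
```

Injecting the clock keeps the back-off window testable without actually waiting ten minutes.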






[jira] [Commented] (HDFS-10323) transient deleteOnExit failure in ViewFileSystem due to close() ordering

2017-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233836#comment-16233836
 ] 

ASF GitHub Bot commented on HDFS-10323:
---

GitHub user wenxinhe opened a pull request:

https://github.com/apache/hadoop/pull/287

HDFS-10323. transient deleteOnExit failure in ViewFileSystem due to close() 
ordering



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wenxinhe/hadoop trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hadoop/pull/287.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #287


commit a8b39e070b09005b2781ee46a9b2f3a09c04246e
Author: wenxinhe 
Date:   2017-11-01T09:05:16Z

HDFS-10323. transient deleteOnExit failure in ViewFileSystem due to close() 
ordering




> transient deleteOnExit failure in ViewFileSystem due to close() ordering
> 
>
> Key: HDFS-10323
> URL: https://issues.apache.org/jira/browse/HDFS-10323
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 2.6.0
>Reporter: Ben Podgursky
>Assignee: Wenxin He
>Priority: Major
>
> After switching to using a ViewFileSystem, fs.deleteOnExit calls began 
> failing frequently, displaying this error on failure:
> 16/04/21 13:56:24 INFO fs.FileSystem: Ignoring failure to deleteOnExit for 
> path /tmp/delete_on_exit_test_123/a438afc0-a3ca-44f1-9eb5-010ca4a62d84
> Since FileSystem swallows the underlying exception, it is difficult to be sure
> what the error is, but I believe the ViewFileSystem’s child FileSystems are
> being close()’d before the ViewFileSystem itself, because ClientFinalizer
> closes FileSystems in an arbitrary order. When the ViewFileSystem is then
> closed, it tries to forward the delete() calls to the appropriate child and
> fails because that child is already closed.
> I’m unsure how to write an actual Hadoop test to reproduce this, since it
> involves testing behavior on an actual JVM shutdown. However, I can verify
> that while
> {code:java}
> fs.deleteOnExit(randomTemporaryDir);
> {code}
> regularly (~50% of the time) fails to delete the temporary directory, this 
> code:
> {code:java}
> ViewFileSystem viewfs = (ViewFileSystem) fs1;
> for (FileSystem fileSystem : viewfs.getChildFileSystems()) {
>   if (fileSystem.exists(randomTemporaryDir)) {
>     fileSystem.deleteOnExit(randomTemporaryDir);
>   }
> }
> {code}
> always successfully deletes the temporary directory on JVM shutdown.
> I am not very familiar with FileSystem inheritance hierarchies, but at first 
> glance I see two ways to fix this behavior:
> 1) ViewFileSystem could forward deleteOnExit calls to the appropriate child
> FileSystem, and not hold onto that path itself.
> 2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then all
> other FileSystems.
> I would appreciate any thoughts on whether this seems accurate, and any
> thoughts (or help) on the fix.






[jira] [Commented] (HDFS-12750) Ozone: Fix TestStorageContainerManager#testBlockDeletionTransactions

2017-11-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233831#comment-16233831
 ] 

Hadoop QA commented on HDFS-12750:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} | {color:red} HDFS-12750 does not apply to HDFS-7240. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-12750 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12895081/HDFS-12750-HDFS-7240.001.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/21912/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (HDFS-12750) Ozone: Fix TestStorageContainerManager#testBlockDeletionTransactions

2017-11-01 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233817#comment-16233817
 ] 

Weiwei Yang commented on HDFS-12750:


+1, committing the patch now, thanks [~xyao] for fixing this.




[jira] [Commented] (HDFS-11902) [READ] Merge BlockFormatProvider and FileRegionProvider.

2017-11-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233812#comment-16233812
 ] 

Hadoop QA commented on HDFS-11902:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  4m  0s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} |
|| || || || {color:brown} HDFS-9806 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  5m 36s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m  6s{color} | {color:green} HDFS-9806 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 22s{color} | {color:green} HDFS-9806 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 14s{color} | {color:green} HDFS-9806 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 34s{color} | {color:green} HDFS-9806 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 41s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 34s{color} | {color:red} hadoop-tools/hadoop-fs2img in HDFS-9806 has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 19s{color} | {color:green} HDFS-9806 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m  8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 17s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  2m  7s{color} | {color:orange} root: The patch generated 9 new + 448 unchanged - 11 fixed = 457 total (was 459) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  9m  9s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 87m 13s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 43s{color} | {color:green} hadoop-fs2img in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 58s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}182m 34s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery |
|   | hadoop.hdfs.server.blockmanagement.TestReplicationPolicy |
|   | hadoop.hdfs.server.namenode.TestStartup |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure140 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure210 |
|   | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
|   | hadoop.hdfs.TestSetrepDecreasing |
| Timed out junit tests | org.apache.hadoop.hdfs.TestReadStripedFileWithDecodingCorruptData |
|   | 

[jira] [Updated] (HDFS-12744) More logs when short-circuit read is failed and disabled

2017-11-01 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12744:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   2.9.0
   Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2 and branch-3.0. Thanks [~jzhuge] for the review.




[jira] [Commented] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes

2017-11-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233783#comment-16233783
 ] 

Hadoop QA commented on HDFS-12725:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 54s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 25s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 58s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 49s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 36s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 4 unchanged - 1 fixed = 5 total (was 5) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 40s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 46s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}110m 36s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}164m 38s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
|   | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-12725 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12895133/HDFS-12725.04.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 871cf8bcf635 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b8c8b5b |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/21908/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/21908/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
|  Test Results | 

[jira] [Commented] (HDFS-12748) NameNode memory leak when accessing webhdfs GETHOMEDIRECTORY

2017-11-01 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233761#comment-16233761
 ] 

Weiwei Yang commented on HDFS-12748:


Thanks [~daryn], your comment makes sense to me. I just uploaded the v2 patch.
It pulls some common methods out for reuse and removes the FileSystem call for
GETHOMEDIRECTORY. Please help review it, thanks.

Note that GETTRASHROOT has the same issue, but fixing it requires more
refactoring (related to EC) to make it behave consistently in WebHDFS and
HDFS, so I think we need a separate JIRA for it.

Please let me know if this makes sense, thanks.
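Dropping the FileSystem call means deriving the home path directly from the request's user instead of instantiating a client. A minimal sketch, assuming the default `/user` home-directory prefix and an already-resolved short user name; `HomeDirectory`, `homeDirectory`, and `toJson` are hypothetical helpers and are not the code in the v2 patch.

```java
// Hypothetical helpers; NOT the NamenodeWebHdfsMethods code or the v2 patch.
class HomeDirectory {
    // Assumption: the default HDFS home-directory prefix.
    static final String DEFAULT_PREFIX = "/user";

    // Builds "<prefix>/<shortUserName>" without instantiating any FileSystem.
    static String homeDirectory(String prefix, String shortUserName) {
        if (shortUserName == null || shortUserName.isEmpty()) {
            throw new IllegalArgumentException("short user name is required");
        }
        String base = prefix.endsWith("/")
            ? prefix.substring(0, prefix.length() - 1)
            : prefix;
        return base + "/" + shortUserName;
    }

    // Same {"Path": ...} shape as the GETHOMEDIRECTORY response.
    // Simplified: assumes the path needs no JSON escaping.
    static String toJson(String path) {
        return "{\"Path\":\"" + path + "\"}";
    }
}
```

For example, `HomeDirectory.toJson(HomeDirectory.homeDirectory("/user", "alice"))` yields `{"Path":"/user/alice"}` with no FileSystem instance created or cached.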

> NameNode memory leak when accessing webhdfs GETHOMEDIRECTORY
> 
>
> Key: HDFS-12748
> URL: https://issues.apache.org/jira/browse/HDFS-12748
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.8.2
>Reporter: Jiandan Yang 
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: HDFS-12748.001.patch, HDFS-12748.002.patch
>
>
> In our production environment, the standby NN often does full GCs. Using MAT,
> we found that the largest object is FileSystem$Cache, which contains 7,844,890
> DistributedFileSystem instances.
> By viewing the call hierarchy of FileSystem.get(), I found that only
> NamenodeWebHdfsMethods#get calls FileSystem.get(). I don't know why a
> different DistributedFileSystem is created every time instead of getting a
> FileSystem from the cache.
> {code:java}
> case GETHOMEDIRECTORY: {
>   final String js = JsonUtil.toJsonString("Path",
>       FileSystem.get(conf != null ? conf : new Configuration())
>           .getHomeDirectory().toUri().getPath());
>   return Response.ok(js).type(MediaType.APPLICATION_JSON).build();
> }
> {code}
> When we close the FileSystem in GETHOMEDIRECTORY, the NN no longer does full GCs.
> {code:java}
> case GETHOMEDIRECTORY: {
>   FileSystem fs = null;
>   try {
>     fs = FileSystem.get(conf != null ? conf : new Configuration());
>     final String js = JsonUtil.toJsonString("Path",
>         fs.getHomeDirectory().toUri().getPath());
>     return Response.ok(js).type(MediaType.APPLICATION_JSON).build();
>   } finally {
>     if (fs != null) {
>       fs.close();
>     }
>   }
> }
> {code}
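One plausible reading of the numbers above, stated as an assumption rather than a confirmed diagnosis: FileSystem's cache key includes the caller's user identity, and if each WebHDFS request runs under a freshly created identity object, every FileSystem.get() misses the cache and adds an entry that nothing ever removes. The toy cache below (`ToyFsCache`, `Key`, and `Identity` are hypothetical, not Hadoop classes) reproduces that growth with reference-equality identities; `remove` plays the role that close() plays in the second snippet.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical toy model of a FileSystem-style cache; NOT Hadoop's FileSystem.Cache.
class ToyFsCache {
    // Identity compared by reference: two objects for the same user name are
    // still different keys (an assumption modeled on per-request identities).
    static final class Identity {
        final String user;
        Identity(String user) { this.user = user; }
        // equals/hashCode intentionally inherited from Object (reference equality)
    }

    static final class Key {
        final String scheme;
        final String authority;
        final Identity identity;

        Key(String scheme, String authority, Identity identity) {
            this.scheme = scheme;
            this.authority = authority;
            this.identity = identity;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            return scheme.equals(k.scheme) && authority.equals(k.authority)
                && identity == k.identity;
        }

        @Override
        public int hashCode() {
            return Objects.hash(scheme, authority, System.identityHashCode(identity));
        }
    }

    private final Map<Key, Object> map = new HashMap<>();

    // Returns the cached "client" for this key, creating one on a miss.
    Object get(String scheme, String authority, Identity id) {
        return map.computeIfAbsent(new Key(scheme, authority, id), k -> new Object());
    }

    // Counterpart of close(): drops the entry so it can be garbage collected.
    void remove(String scheme, String authority, Identity id) {
        map.remove(new Key(scheme, authority, id));
    }

    int size() {
        return map.size();
    }
}
```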






[jira] [Updated] (HDFS-12748) NameNode memory leak when accessing webhdfs GETHOMEDIRECTORY

2017-11-01 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12748:
---
Attachment: HDFS-12748.002.patch

> NameNode memory leak when accessing webhdfs GETHOMEDIRECTORY
> 
>
> Key: HDFS-12748
> URL: https://issues.apache.org/jira/browse/HDFS-12748
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.8.2
>Reporter: Jiandan Yang 
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: HDFS-12748.001.patch, HDFS-12748.002.patch
>
>
> In our production environment, the standby NN often do fullgc, through mat we 
> found the largest object is FileSystem$Cache, which contains 7,844,890 
> DistributedFileSystem.
> By view hierarchy of method FileSystem.get() , I found only 
> NamenodeWebHdfsMethods#get call FileSystem.get(). I don't know why creating 
> different DistributedFileSystem every time instead of get a FileSystem from 
> cache.
> {code:java}
> case GETHOMEDIRECTORY: {
>   final String js = JsonUtil.toJsonString("Path",
>   FileSystem.get(conf != null ? conf : new Configuration())
>   .getHomeDirectory().toUri().getPath());
>   return Response.ok(js).type(MediaType.APPLICATION_JSON).build();
> }
> {code}
> When we close the FileSystem in GETHOMEDIRECTORY, the NN no longer does full GCs.
> {code:java}
> case GETHOMEDIRECTORY: {
>   FileSystem fs = null;
>   try {
>     fs = FileSystem.get(conf != null ? conf : new Configuration());
>     final String js = JsonUtil.toJsonString("Path",
>         fs.getHomeDirectory().toUri().getPath());
>     return Response.ok(js).type(MediaType.APPLICATION_JSON).build();
>   } finally {
>     if (fs != null) {
>       fs.close();
>     }
>   }
> }
> {code}
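A plausible explanation for the cache growth, offered as an assumption rather than a confirmed diagnosis: in Hadoop 2.x the FileSystem cache key includes the caller's UserGroupInformation, and UGI equality is based on the identity of its underlying Subject. If each WebHDFS request runs under a newly created UGI, every FileSystem.get() call misses the cache and adds a new entry. The toy model below uses plain Java and no Hadoop classes (IdentityKeyedCacheDemo, Caller and Key are hypothetical names) to reproduce that pattern with a map whose key compares the caller by reference.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Toy model of a cache whose key includes a per-request caller identity (not Hadoop code). */
public class IdentityKeyedCacheDemo {

  /** Hypothetical caller identity; like a Subject, it does not override equals(). */
  public static final class Caller {
    final String userName;

    public Caller(String userName) {
      this.userName = userName;
    }
  }

  /** Hypothetical cache key: scheme, authority and caller. */
  static final class Key {
    private final String scheme;
    private final String authority;
    private final Caller caller;

    Key(String scheme, String authority, Caller caller) {
      this.scheme = scheme;
      this.authority = authority;
      this.caller = caller;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof Key)) {
        return false;
      }
      Key other = (Key) o;
      // The caller is compared by reference: two Caller objects for the
      // same user name are treated as different keys.
      return scheme.equals(other.scheme)
          && authority.equals(other.authority)
          && caller == other.caller;
    }

    @Override
    public int hashCode() {
      // Identity hash for the caller keeps hashCode consistent with equals.
      return Objects.hash(scheme, authority, System.identityHashCode(caller));
    }
  }

  private final Map<Key, Object> cache = new HashMap<>();

  /** Returns the cached client for this key, creating one on a cache miss. */
  public Object get(String scheme, String authority, Caller caller) {
    return cache.computeIfAbsent(new Key(scheme, authority, caller), k -> new Object());
  }

  public int size() {
    return cache.size();
  }

  public static void main(String[] args) {
    IdentityKeyedCacheDemo perRequest = new IdentityKeyedCacheDemo();
    for (int i = 0; i < 1000; i++) {
      // A fresh identity object per request, even though the user is the same.
      perRequest.get("hdfs", "nn1:8020", new Caller("alice"));
    }
    IdentityKeyedCacheDemo shared = new IdentityKeyedCacheDemo();
    Caller alice = new Caller("alice");
    for (int i = 0; i < 1000; i++) {
      shared.get("hdfs", "nn1:8020", alice);
    }
    System.out.println("fresh caller per request: " + perRequest.size() + " entries");
    System.out.println("shared caller: " + shared.size() + " entries");
  }
}
```

With a fresh Caller per request the map grows by one entry per call, while reusing one Caller keeps it at a single entry. FileSystem.close() also removes the closed instance from the cache, which would be consistent with the observation that closing the FileSystem in GETHOMEDIRECTORY stops the growth.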






[jira] [Commented] (HDFS-12622) Fix enumerate in HDFSErasureCoding.md

2017-11-01 Thread Akira Ajisaka (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233754#comment-16233754
 ] 

Akira Ajisaka commented on HDFS-12622:
--

Thanks!

> Fix enumerate in HDFSErasureCoding.md
> -
>
> Key: HDFS-12622
> URL: https://issues.apache.org/jira/browse/HDFS-12622
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: Akira Ajisaka
>Assignee: Yiqun Lin
>Priority: Minor
>  Labels: newbie
> Fix For: 3.0.0
>
> Attachments: HDFS-12622.001.patch, HDFS-12622.001.patch, Screen Shot 
> 2017-10-10 at 17.36.16.png, screenshot.png
>
>
> {noformat}
>   HDFS native implementation of default RS codec leverages Intel ISA-L 
> library to improve the encoding and decoding calculation. To enable and use 
> Intel ISA-L, there are three steps.
>   1. Build ISA-L library. Please refer to the official site 
> "https://github.com/01org/isa-l/" for detail information.
>   2. Build Hadoop with ISA-L support. Please refer to "Intel ISA-L build 
> options" section in "Build instructions for Hadoop" in (BUILDING.txt) in the 
> source code.
>   3. Use `-Dbundle.isal` to copy the contents of the `isal.lib` directory 
> into the final tar file. Deploy Hadoop with the tar file. Make sure ISA-L is 
> available on HDFS clients and DataNodes.
> {noformat}
> An empty line is missing before the enumerated list.






[jira] [Updated] (HDFS-12219) Javadoc for FSNamesystem#getMaxObjects is incorrect

2017-11-01 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-12219:
-
Fix Version/s: (was: 3.1.0)
   3.0.0

Cherry-picked to branch-3.0. Thanks!

> Javadoc for FSNamesystem#getMaxObjects is incorrect
> ---
>
> Key: HDFS-12219
> URL: https://issues.apache.org/jira/browse/HDFS-12219
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Erik Krogen
>Assignee: Erik Krogen
> Fix For: 3.0.0
>
> Attachments: HDFS-12219.000.patch
>
>
> The Javadoc states that this represents the total number of objects in the 
> system, but it really represents the maximum allowed number of objects (as 
> correctly stated on the Javadoc for {{FSNamesystemMBean#getMaxObjects()}}).
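A Javadoc along the lines the issue asks for might read as follows. The class, field and hasLimit() helper are hypothetical stand-ins for the relevant part of FSNamesystem, not the committed wording. The note that 0 means "no limit" is an assumption based on the description of dfs.namenode.max.objects in hdfs-default.xml and should be checked against it.

```java
/** Hypothetical stand-in for the relevant part of FSNamesystem. */
public class MaxObjectsSketch {

  /** Configured cap on namespace objects; 0 is assumed to mean "no limit". */
  private final long maxFsObjects;

  public MaxObjectsSketch(long maxFsObjects) {
    this.maxFsObjects = maxFsObjects;
  }

  /**
   * Get the maximum allowed number of objects (files, directories and
   * blocks) in the system. This is a configured limit, not the number of
   * objects currently in use.
   *
   * @return the maximum allowed number of objects; 0 means no limit
   */
  public long getMaxObjects() {
    return maxFsObjects;
  }

  /** Hypothetical helper: whether any limit is configured at all. */
  public boolean hasLimit() {
    return maxFsObjects > 0;
  }

  public static void main(String[] args) {
    MaxObjectsSketch limited = new MaxObjectsSketch(1_000_000L);
    System.out.println("max objects: " + limited.getMaxObjects() + ", limited: " + limited.hasLimit());
  }
}
```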






[jira] [Commented] (HDFS-12219) Javadoc for FSNamesystem#getMaxObjects is incorrect

2017-11-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233730#comment-16233730
 ] 

Hudson commented on HDFS-12219:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13173 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13173/])
HDFS-12219. Javadoc for FSNamesystem#getMaxObjects is incorrect. (yqlin: rev 
20304b91cc1513e3d82a01d36f4ee9c4c81b60e4)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java


> Javadoc for FSNamesystem#getMaxObjects is incorrect
> ---
>
> Key: HDFS-12219
> URL: https://issues.apache.org/jira/browse/HDFS-12219
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Erik Krogen
>Assignee: Erik Krogen
> Fix For: 3.1.0
>
> Attachments: HDFS-12219.000.patch
>
>
> The Javadoc states that this represents the total number of objects in the 
> system, but it really represents the maximum allowed number of objects (as 
> correctly stated on the Javadoc for {{FSNamesystemMBean#getMaxObjects()}}).






[jira] [Assigned] (HDFS-10323) transient deleteOnExit failure in ViewFileSystem due to close() ordering

2017-11-01 Thread Wenxin He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenxin He reassigned HDFS-10323:


Assignee: Wenxin He

> transient deleteOnExit failure in ViewFileSystem due to close() ordering
> 
>
> Key: HDFS-10323
> URL: https://issues.apache.org/jira/browse/HDFS-10323
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 2.6.0
>Reporter: Ben Podgursky
>Assignee: Wenxin He
>Priority: Major
>
> After switching to using a ViewFileSystem, fs.deleteOnExit calls began 
> failing frequently, displaying this error on failure:
> 16/04/21 13:56:24 INFO fs.FileSystem: Ignoring failure to deleteOnExit for 
> path /tmp/delete_on_exit_test_123/a438afc0-a3ca-44f1-9eb5-010ca4a62d84
> Since FileSystem swallows the underlying error, it is difficult to be sure what 
> it is, but I believe the ViewFileSystem’s child FileSystems are being close()’d 
> before the ViewFileSystem because ClientFinalizer closes FileSystems in a 
> random order. When the ViewFileSystem then runs close(), it tries to forward 
> the delete() calls to the appropriate child and fails, because the child is 
> already closed.
> I’m unsure how to write an actual Hadoop test to reproduce this, since it 
> involves testing behavior on actual JVM shutdown. However, I can verify that 
> while
> {code:java}
> fs.deleteOnExit(randomTemporaryDir);
> {code}
> regularly (~50% of the time) fails to delete the temporary directory, this 
> code:
> {code:java}
> ViewFileSystem viewfs = (ViewFileSystem) fs1;
> for (FileSystem fileSystem : viewfs.getChildFileSystems()) {
>   if (fileSystem.exists(randomTemporaryDir)) {
>     fileSystem.deleteOnExit(randomTemporaryDir);
>   }
> }
> {code}
> always successfully deletes the temporary directory on JVM shutdown.
> I am not very familiar with FileSystem inheritance hierarchies, but at first 
> glance I see two ways to fix this behavior:
> 1) ViewFileSystem could forward deleteOnExit calls to the appropriate child 
> FileSystem, and not hold onto that path itself.
> 2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then all 
> other FileSystems.
> I would appreciate any thoughts on whether this seems accurate, and thoughts 
> (or help) on the fix.
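The reporter's hypothesis can be illustrated with a toy model, assuming (as the description suggests) that a view file system forwards its pending deleteOnExit deletes to a child while closing, and that a closed child rejects requests. ChildFs, ViewFs, ToyFs and closeAll are hypothetical names, not Hadoop classes; the viewsFirst flag models the second proposed fix, closing all ViewFileSystems before the other FileSystems.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy model of the close-ordering problem described in HDFS-10323 (not Hadoop code). */
public class CloseOrderDemo {

  /** Anything the hypothetical shutdown hook can close. */
  public interface ToyFs {
    void close();

    default boolean isView() {
      return false;
    }
  }

  /** Hypothetical child file system: deletes only succeed while it is open. */
  public static final class ChildFs implements ToyFs {
    private boolean closed;
    public final List<String> deleted = new ArrayList<>();

    boolean delete(String path) {
      if (closed) {
        return false; // a closed client can no longer serve requests
      }
      deleted.add(path);
      return true;
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  /** Hypothetical view file system that forwards deleteOnExit paths to its child on close. */
  public static final class ViewFs implements ToyFs {
    private final ChildFs child;
    private final List<String> pendingDeletes = new ArrayList<>();
    public int failedDeletes;

    public ViewFs(ChildFs child) {
      this.child = child;
    }

    public void deleteOnExit(String path) {
      pendingDeletes.add(path);
    }

    @Override
    public void close() {
      for (String path : pendingDeletes) {
        if (!child.delete(path)) {
          failedDeletes++; // as in the reported log, the failure is only counted and ignored
        }
      }
      pendingDeletes.clear();
    }

    @Override
    public boolean isView() {
      return true;
    }
  }

  /** Closes all file systems; with viewsFirst, views are closed before the others. */
  public static void closeAll(List<ToyFs> all, boolean viewsFirst) {
    List<ToyFs> order = new ArrayList<>(all);
    if (viewsFirst) {
      // false sorts before true, so views (key false) move to the front.
      order.sort(Comparator.comparing((ToyFs fs) -> !fs.isView()));
    }
    for (ToyFs fs : order) {
      fs.close();
    }
  }

  public static void main(String[] args) {
    for (boolean viewsFirst : new boolean[] {false, true}) {
      ChildFs child = new ChildFs();
      ViewFs view = new ViewFs(child);
      view.deleteOnExit("/tmp/delete_on_exit_test");
      // The child is listed first to mimic an unlucky iteration order.
      closeAll(Arrays.<ToyFs>asList(child, view), viewsFirst);
      System.out.println("viewsFirst=" + viewsFirst + ", failed deletes=" + view.failedDeletes);
    }
  }
}
```

With the child closed first the forwarded delete fails; with viewsFirst set the view closes while its child is still open, so the delete succeeds.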






[jira] [Updated] (HDFS-12219) Javadoc for FSNamesystem#getMaxObjects is incorrect

2017-11-01 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-12219:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.1.0
   Status: Resolved  (was: Patch Available)

Just committed this to trunk. Thanks [~xkrogen] for the contribution and thanks 
[~hanishakoneru], [~ajisakaa] for the review.

> Javadoc for FSNamesystem#getMaxObjects is incorrect
> ---
>
> Key: HDFS-12219
> URL: https://issues.apache.org/jira/browse/HDFS-12219
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Erik Krogen
>Assignee: Erik Krogen
> Fix For: 3.1.0
>
> Attachments: HDFS-12219.000.patch
>
>
> The Javadoc states that this represents the total number of objects in the 
> system, but it really represents the maximum allowed number of objects (as 
> correctly stated on the Javadoc for {{FSNamesystemMBean#getMaxObjects()}}).






[jira] [Commented] (HDFS-12219) Javadoc for FSNamesystem#getMaxObjects is incorrect

2017-11-01 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233705#comment-16233705
 ] 

Yiqun Lin commented on HDFS-12219:
--

+1. I'd like to help commit this. :)

> Javadoc for FSNamesystem#getMaxObjects is incorrect
> ---
>
> Key: HDFS-12219
> URL: https://issues.apache.org/jira/browse/HDFS-12219
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Erik Krogen
>Assignee: Erik Krogen
> Attachments: HDFS-12219.000.patch
>
>
> The Javadoc states that this represents the total number of objects in the 
> system, but it really represents the maximum allowed number of objects (as 
> correctly stated on the Javadoc for {{FSNamesystemMBean#getMaxObjects()}}).






[jira] [Commented] (HDFS-12682) ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as DISABLED

2017-11-01 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233703#comment-16233703
 ] 

Rakesh R commented on HDFS-12682:
-

Good work! [~xiaochen]. Apart from the below comments, overall patch looks good 
to me.
# Please make ErasureCodingPolicyInfo {{implements Serializable}}
# Could you rename {{DFSTestUtil#getPolicyState}} method to 
{{DFSTestUtil#getECPolicyState}}
# It returns both system and user defined policies, so please change message to 
{{ErasureCodingPolicy <" + policy + "> doesn't exist in the policies:" + 
Arrays.toString(policyInfos)}}
{code}
DFSTestUtil#getPolicyState(policy)

throw new IllegalArgumentException("Policy <" + policy + "> is not in"
+ " system policies:" + Arrays.toString(policyInfos));
{code}
# Since we are making the ECP class {{InterfaceAudience.Private}}, can we also 
make ECPS {{@InterfaceAudience.Private}}?
{code}
@InterfaceAudience.Public
@InterfaceStability.Evolving
public enum ErasureCodingPolicyState {
{code}

> ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as 
> DISABLED
> 
>
> Key: HDFS-12682
> URL: https://issues.apache.org/jira/browse/HDFS-12682
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Blocker
>  Labels: hdfs-ec-3.0-must-do
> Attachments: HDFS-12682.01.patch, HDFS-12682.02.patch, 
> HDFS-12682.03.patch, HDFS-12682.04.patch, HDFS-12682.05.patch, 
> HDFS-12682.06.patch, HDFS-12682.07.patch
>
>
> On a real cluster, {{hdfs ec -listPolicies}} will always show policy state as 
> DISABLED.
> {noformat}
> [hdfs@nightly6x-1 root]$ hdfs ec -listPolicies
> Erasure Coding Policies:
> ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, 
> Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], 
> CellSize=1048576, Id=3, State=DISABLED]
> ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, 
> numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4, State=DISABLED]
> [hdfs@nightly6x-1 root]$ hdfs ec -getPolicy -path /ecec
> XOR-2-1-1024k
> {noformat}
> This is because when [deserializing 
> protobuf|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java#L2942],
>  the static instance of [SystemErasureCodingPolicies 
> class|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SystemErasureCodingPolicies.java#L101]
>  is first checked, and always returns the cached policy objects, which are 
> created by default with state=DISABLED.
> All the existing unit tests pass because, in unit tests, the static instance that the 
> client (e.g. ECAdmin) reads is the same one the NN updates. :)
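The deserialization problem described above can be sketched with a toy model. Everything here is hypothetical (Policy, PolicyInfo, SYSTEM_POLICIES, convertBuggy and convertFixed are not the Hadoop classes): a client-side static table of policies created with state DISABLED is consulted first, so the state sent by the NameNode is dropped. The fixed variant keeps the shared policy object but carries the wire state separately, which appears to be the direction of the ErasureCodingPolicyInfo class mentioned in the review comment above, though the actual patch may differ in detail.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the policy-state problem described in HDFS-12682 (not Hadoop code). */
public class PolicyStateDemo {

  public enum State { DISABLED, ENABLED }

  /** Hypothetical policy object that carries a mutable state. */
  public static final class Policy {
    public final String name;
    public State state = State.DISABLED; // created DISABLED by default

    public Policy(String name) {
      this.name = name;
    }
  }

  /** Hypothetical policy plus the state reported by the server, kept separately. */
  public static final class PolicyInfo {
    public final Policy policy;
    public final State state;

    public PolicyInfo(Policy policy, State state) {
      this.policy = policy;
      this.state = state;
    }
  }

  /** Hypothetical client-side static table of built-in policies. */
  static final Map<String, Policy> SYSTEM_POLICIES = new HashMap<>();

  static {
    SYSTEM_POLICIES.put("XOR-2-1-1024k", new Policy("XOR-2-1-1024k"));
  }

  /** Buggy conversion: the cached static object wins and the wire state is dropped. */
  public static Policy convertBuggy(String name, State wireState) {
    Policy cached = SYSTEM_POLICIES.get(name);
    if (cached != null) {
      return cached; // wireState is silently ignored
    }
    Policy p = new Policy(name);
    p.state = wireState;
    return p;
  }

  /** Fixed conversion: reuse the shared policy, but report the state sent by the server. */
  public static PolicyInfo convertFixed(String name, State wireState) {
    Policy cached = SYSTEM_POLICIES.get(name);
    Policy policy = (cached != null) ? cached : new Policy(name);
    return new PolicyInfo(policy, wireState);
  }

  public static void main(String[] args) {
    // The NameNode reports the policy as ENABLED.
    System.out.println("buggy: " + convertBuggy("XOR-2-1-1024k", State.ENABLED).state);
    System.out.println("fixed: " + convertFixed("XOR-2-1-1024k", State.ENABLED).state);
  }
}
```

If the NameNode ran in the same JVM and updated the shared static Policy object, convertBuggy would return the updated state, which matches the explanation of why the existing unit tests pass.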






[jira] [Commented] (HDFS-12739) Add Support for SCM --init command

2017-11-01 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233697#comment-16233697
 ] 

Yiqun Lin commented on HDFS-12739:
--

Thanks for working on this, [~shashikant]. The following are some comments from 
me:

# The usage of {{GENCLUSTERID}} is missing in {{USAGE}}.
# The return value of {{StorageContainerManager#scmInit}} is confusing: the method 
returns false when SCM initialization succeeds and true when it fails. Can we invert 
this so that the caller can write {{aborted = !scmInit(conf);}}?
# The following line prints only the cluster id without any description; could you 
make the message more descriptive?
{code}
  private static boolean scmInit(OzoneConfiguration conf) throws IOException {
    ...
    if (state != StorageState.NORMAL) {
      try {
        scmStorage.createStorageDir();
        clusterId = StartupOption.INIT.getClusterId();
        if (clusterId == null || clusterId.isEmpty()) {
          //Generate a new cluster id
          clusterId = SCMStorage.newClusterID();
        }
        scmStorage.setClusterID(clusterId);
        scmStorage.writeProperties();
        System.out.println(clusterId);   <=
        return false;

      }
{code}
# Since this introduces new SCM commands, we need to add some tests to verify the 
behaviour of these commands.
Thanks.
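Review items 2 and 3 can be sketched as follows. This is a simplified, hypothetical stand-in for StorageContainerManager#scmInit with no Ozone classes: the parameters replace the real storage checks, and the "CID-" prefix is only a placeholder, not necessarily the format SCMStorage.newClusterID() produces. It returns true on success so the caller can write aborted = !scmInit(...), and it reports a descriptive message instead of a bare cluster id.

```java
import java.util.UUID;

/** Hypothetical, simplified stand-in for StorageContainerManager#scmInit (not Ozone code). */
public class ScmInitSketch {

  /**
   * Returns true on success and false on failure, so callers can write
   * {@code aborted = !scmInit(...)}.
   *
   * @param existingClusterId cluster id already persisted locally, or null
   * @param requestedClusterId cluster id given on the command line, or null/empty
   * @param canWriteStorage whether the simulated storage directory is writable
   * @param out receives a human-readable status message
   */
  public static boolean scmInit(String existingClusterId, String requestedClusterId,
      boolean canWriteStorage, StringBuilder out) {
    if (existingClusterId != null) {
      // Already initialized: just report the persisted cluster id.
      out.append("SCM already initialized. Cluster ID: ").append(existingClusterId);
      return true;
    }
    if (!canWriteStorage) {
      out.append("SCM initialization failed: storage directory is not writable.");
      return false;
    }
    String clusterId = requestedClusterId;
    if (clusterId == null || clusterId.isEmpty()) {
      // Placeholder id format; the real generator may differ.
      clusterId = "CID-" + UUID.randomUUID();
    }
    out.append("SCM initialization succeeded. Cluster ID: ").append(clusterId);
    return true;
  }

  public static void main(String[] args) {
    StringBuilder message = new StringBuilder();
    boolean aborted = !scmInit(null, "", true, message);
    System.out.println(message);
    System.out.println("aborted=" + aborted);
  }
}
```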

> Add Support for SCM --init command
> --
>
> Key: HDFS-12739
> URL: https://issues.apache.org/jira/browse/HDFS-12739
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: HDFS-7240
>Affects Versions: HDFS-7240
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-12739-HDFS-7240.001.patch, 
> HDFS-12739-HDFS-7240.002.patch, HDFS-12739-HDFS-7240.003.patch
>
>
> The SCM --init command will generate a cluster ID and persist it locally. The same 
> cluster ID will be shared with KSM and the datanodes. If the cluster ID is 
> already present in the local version file, the command will just read the 
> cluster ID.






[jira] [Commented] (HDFS-12714) Hadoop 3 missing fix for HDFS-5169

2017-11-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233687#comment-16233687
 ] 

Hudson commented on HDFS-12714:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13172 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13172/])
HDFS-12714. Hadoop 3 missing fix for HDFS-5169. Contributed by Joe (jzhuge: rev 
b8c8b5bc274211b29be125e5463662795a363f84)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c


> Hadoop 3 missing fix for HDFS-5169
> --
>
> Key: HDFS-12714
> URL: https://issues.apache.org/jira/browse/HDFS-12714
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: native
>Affects Versions: 3.0.0-alpha1, 3.0.0-beta1, 3.0.0-alpha2, 3.0.0-alpha4, 
> 3.0.0-alpha3
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: 3.0.0-beta1, 3.1.0
>
> Attachments: HDFS-12714.001.patch
>
>
> HDFS-5169 is a fix for a null pointer dereference in translateZCRException. 
> This line in hdfs.c:
> ret = printExceptionAndFree(env, jthr, PRINT_EXC_ALL, "hadoopZeroCopyRead: 
> ZeroCopyCursor#read failed");
> should be:
> ret = printExceptionAndFree(env, exc, PRINT_EXC_ALL, "hadoopZeroCopyRead: 
> ZeroCopyCursor#read failed");
> Plainly, translateZCRException should print the exception (exc) passed in to 
> the function rather than the uninitialized local jthr.
> The fix for HDFS-5169 (part of HDFS-4949) exists on hadoop 2.* branches, but 
> it is missing on hadoop 3 branches including trunk.
> Hadoop 2.8:
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c#L2514
> Hadoop 3.0:
> https://github.com/apache/hadoop/blob/branch-3.0/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c#L2691






[jira] [Commented] (HDFS-12482) Provide a configuration to adjust the weight of EC recovery tasks to adjust the speed of recovery

2017-11-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233686#comment-16233686
 ] 

Hudson commented on HDFS-12482:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13172 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13172/])
HDFS-12482. Provide a configuration to adjust the weight of EC recovery (lei: 
rev 9367c25dbdfedf60cdbd65611281cf9c667829e6)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReconstructStripedFile.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/erasurecode/ErasureCodingWorker.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSErasureCoding.md
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeFaultInjector.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/erasurecode/StripedBlockReconstructor.java


> Provide a configuration to adjust the weight of EC recovery tasks to adjust 
> the speed of recovery
> -
>
> Key: HDFS-12482
> URL: https://issues.apache.org/jira/browse/HDFS-12482
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Affects Versions: 3.0.0-alpha4
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: hdfs-ec-3.0-nice-to-have
> Fix For: 3.0.0
>
> Attachments: HDFS-12482.00.patch, HDFS-12482.01.patch, 
> HDFS-12482.02.patch, HDFS-12482.03.patch, HDFS-12482.04.patch, 
> HDFS-12482.05.patch
>
>
> The relative speed of EC recovery compared to 3x replica recovery is a 
> function of the EC codec, the number of sources, NIC speed, CPU speed, etc. 
> Currently EC recovery has a fixed {{xmitsInProgress}} of {{max(# of 
> sources, # of targets)}}, compared to {{1}} for 3x replica recovery, and the NN 
> uses {{xmitsInProgress}} to decide how many recovery tasks to schedule to the 
> DataNode. Thus we can add a coefficient for users to tune the weight of EC 
> recovery tasks.
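One way the proposed coefficient could enter the calculation is sketched below. The scaling, the rounding up and the floor of 1 are assumptions for illustration; the committed patch may compute the weighted value differently and exposes it through its own configuration key, which is not shown here. EcXmitsSketch and weightedXmits are hypothetical names.

```java
/** Hypothetical sketch of weighting EC reconstruction xmits (not the committed HDFS-12482 code). */
public class EcXmitsSketch {

  /**
   * Scales max(sources, targets) by a user-supplied weight, rounding up and
   * never returning less than 1 so a running task is always counted.
   */
  public static int weightedXmits(int numSources, int numTargets, double weight) {
    if (weight < 0) {
      throw new IllegalArgumentException("weight must be non-negative: " + weight);
    }
    int base = Math.max(numSources, numTargets);
    return Math.max(1, (int) Math.ceil(weight * base));
  }

  public static void main(String[] args) {
    // Reconstructing one block of an RS-6-3 stripe: 6 sources, 1 target.
    for (double weight : new double[] {0.0, 0.5, 1.0}) {
      System.out.println("weight=" + weight + " -> xmits=" + weightedXmits(6, 1, weight));
    }
  }
}
```

For example, a reconstruction that reads from 6 sources and writes 1 target counts as 6 xmits with a weight of 1.0 (the current fixed behaviour), 3 with a weight of 0.5, and 1 with a weight of 0, the same as a single 3x replica recovery task.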






[jira] [Commented] (HDFS-5169) hdfs.c: translateZCRException: null pointer deref when translating some exceptions

2017-11-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233688#comment-16233688
 ] 

Hudson commented on HDFS-5169:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13172 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13172/])
HDFS-12714. Hadoop 3 missing fix for HDFS-5169. Contributed by Joe (jzhuge: rev 
b8c8b5bc274211b29be125e5463662795a363f84)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c


> hdfs.c: translateZCRException: null pointer deref when translating some 
> exceptions
> --
>
> Key: HDFS-5169
> URL: https://issues.apache.org/jira/browse/HDFS-5169
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: HDFS-4949
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>Priority: Minor
> Fix For: HDFS-4949
>
> Attachments: HDFS-5169-caching.001.patch
>
>
> hdfs.c: translateZCRException: there is a null pointer deref when translating 
> some exceptions.






[jira] [Comment Edited] (HDFS-12714) Hadoop 3 missing fix for HDFS-5169

2017-11-01 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233678#comment-16233678
 ] 

John Zhuge edited comment on HDFS-12714 at 11/1/17 6:03 AM:


Committed to trunk and branch-3.0. Thanks [~joemcdonnell] for reporting and 
fixing the issue!


was (Author: jzhuge):
Committed to trunk and branch-3.0. Thanks [~joemcdonnell] for the contribution!

> Hadoop 3 missing fix for HDFS-5169
> --
>
> Key: HDFS-12714
> URL: https://issues.apache.org/jira/browse/HDFS-12714
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: native
>Affects Versions: 3.0.0-alpha1, 3.0.0-beta1, 3.0.0-alpha2, 3.0.0-alpha4, 
> 3.0.0-alpha3
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: 3.0.0-beta1, 3.1.0
>
> Attachments: HDFS-12714.001.patch
>
>
> HDFS-5169 is a fix for a null pointer dereference in translateZCRException. 
> This line in hdfs.c:
> ret = printExceptionAndFree(env, jthr, PRINT_EXC_ALL, "hadoopZeroCopyRead: 
> ZeroCopyCursor#read failed");
> should be:
> ret = printExceptionAndFree(env, exc, PRINT_EXC_ALL, "hadoopZeroCopyRead: 
> ZeroCopyCursor#read failed");
> Plainly, translateZCRException should print the exception (exc) passed in to 
> the function rather than the uninitialized local jthr.
> The fix for HDFS-5169 (part of HDFS-4949) exists on hadoop 2.* branches, but 
> it is missing on hadoop 3 branches including trunk.
> Hadoop 2.8:
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c#L2514
> Hadoop 3.0:
> https://github.com/apache/hadoop/blob/branch-3.0/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c#L2691






[jira] [Resolved] (HDFS-12714) Hadoop 3 missing fix for HDFS-5169

2017-11-01 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge resolved HDFS-12714.
---
   Resolution: Fixed
Fix Version/s: 3.0.0-beta1
   3.1.0

Committed to trunk and branch-3.0. Thanks [~joemcdonnell] for the contribution!

> Hadoop 3 missing fix for HDFS-5169
> --
>
> Key: HDFS-12714
> URL: https://issues.apache.org/jira/browse/HDFS-12714
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: native
>Affects Versions: 3.0.0-alpha1, 3.0.0-beta1, 3.0.0-alpha2, 3.0.0-alpha4, 
> 3.0.0-alpha3
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: 3.1.0, 3.0.0-beta1
>
> Attachments: HDFS-12714.001.patch
>
>
> HDFS-5169 is a fix for a null pointer dereference in translateZCRException. 
> This line in hdfs.c:
> ret = printExceptionAndFree(env, jthr, PRINT_EXC_ALL, "hadoopZeroCopyRead: 
> ZeroCopyCursor#read failed");
> should be:
> ret = printExceptionAndFree(env, exc, PRINT_EXC_ALL, "hadoopZeroCopyRead: 
> ZeroCopyCursor#read failed");
> Plainly, translateZCRException should print the exception (exc) passed in to 
> the function rather than the uninitialized local jthr.
> The fix for HDFS-5169 (part of HDFS-4949) exists on hadoop 2.* branches, but 
> it is missing on hadoop 3 branches including trunk.
> Hadoop 2.8:
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c#L2514
> Hadoop 3.0:
> https://github.com/apache/hadoop/blob/branch-3.0/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c#L2691


