[jira] [Commented] (HDFS-14832) RBF : Add Icon for ReadOnly False
[ https://issues.apache.org/jira/browse/HDFS-14832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928287#comment-16928287 ]

hemanthboyina commented on HDFS-14832:
--------------------------------------

An icon is a fancier way to show the state; text is an easier way to understand it. I feel that if we want icons, we should have them for both ReadOnly false and ReadOnly true. If not, text is clearer and leaves no confusion.

> RBF : Add Icon for ReadOnly False
> ---------------------------------
>
> Key: HDFS-14832
> URL: https://issues.apache.org/jira/browse/HDFS-14832
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: hemanthboyina
> Assignee: hemanthboyina
> Priority: Minor
>
> In the Router Web UI for Mount Table information, add an icon for read-only state false.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14832) RBF : Add Icon for ReadOnly False
[ https://issues.apache.org/jira/browse/HDFS-14832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928275#comment-16928275 ]

Takanobu Asanuma commented on HDFS-14832:
-----------------------------------------

Thanks for your comment, [~hemanthboyina].
* I feel a lock icon means read-only, and judging by the icon color may confuse admins.
* If the read-only false (which means read-write) status is shown there, I would prefer to rename the column from "Read Only" to "Mount Option".
* I also listened to my colleagues' opinions. In summary, using an icon for read-only may be a bit much. Text strings may be clearer.

[~hemanthboyina] [~elgoiri] How does that look?

> RBF : Add Icon for ReadOnly False
> ---------------------------------
>
> Key: HDFS-14832
> URL: https://issues.apache.org/jira/browse/HDFS-14832
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: hemanthboyina
> Assignee: hemanthboyina
> Priority: Minor
>
> In the Router Web UI for Mount Table information, add an icon for read-only state false.
[jira] [Issue Comment Deleted] (HDFS-14833) RBF: Router Update Doesn't Sync Quota
[ https://issues.apache.org/jira/browse/HDFS-14833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surendra Singh Lilhore updated HDFS-14833: -- Comment: was deleted (was: Thanks [~ayushtkn], {quote}The sync only can success when the cache hasn't been refreshed before the comparison {quote} Good point, this is possible. You can handle this in HDFS-14833.) > RBF: Router Update Doesn't Sync Quota > - > > Key: HDFS-14833 > URL: https://issues.apache.org/jira/browse/HDFS-14833 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > > HDFS-14777 Added a check to prevent RPC call, It checks whether in the > present state whether quota is changing. > But ignores the part that if the locations are changed. if the location is > changed the new destination should be synchronized with the mount entry > quota. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14777) RBF: Set ReadOnly is failing for mount Table but actually readonly succeed to set
[ https://issues.apache.org/jira/browse/HDFS-14777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928265#comment-16928265 ] Surendra Singh Lilhore commented on HDFS-14777: --- Thanks [~ayushtkn], {quote}The sync only can success when the cache hasn't been refreshed before the comparison {quote} Good point, this is possible. You can handle this in HDFS-14833. > RBF: Set ReadOnly is failing for mount Table but actually readonly succeed to > set > - > > Key: HDFS-14777 > URL: https://issues.apache.org/jira/browse/HDFS-14777 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ranith Sardar >Assignee: Ranith Sardar >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14777.001.patch, HDFS-14777.002.patch, > HDFS-14777.003.patch, HDFS-14777.004.patch > > > # hdfs dfsrouteradmin -update /test hacluster /test -readonly /opt/client # > hdfs dfsrouteradmin -update /test hacluster /test -readonly update: /test is > in a read only mount > pointorg.apache.hadoop.ipc.RemoteException(java.io.IOException): /test is in > a read only mount point at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:1419) > at > org.apache.hadoop.hdfs.server.federation.router.Quota.getQuotaRemoteLocations(Quota.java:217) > at > org.apache.hadoop.hdfs.server.federation.router.Quota.setQuota(Quota.java:75) > at > org.apache.hadoop.hdfs.server.federation.router.RouterAdminServer.synchronizeQuota(RouterAdminServer.java:288) > at > org.apache.hadoop.hdfs.server.federation.router.RouterAdminServer.updateMountTableEntry(RouterAdminServer.java:267) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14833) RBF: Router Update Doesn't Sync Quota
[ https://issues.apache.org/jira/browse/HDFS-14833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928264#comment-16928264 ] Surendra Singh Lilhore commented on HDFS-14833: --- Thanks [~ayushtkn], {quote}The sync only can success when the cache hasn't been refreshed before the comparison {quote} Good point, this is possible. You can handle this in HDFS-14833. > RBF: Router Update Doesn't Sync Quota > - > > Key: HDFS-14833 > URL: https://issues.apache.org/jira/browse/HDFS-14833 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > > HDFS-14777 Added a check to prevent RPC call, It checks whether in the > present state whether quota is changing. > But ignores the part that if the locations are changed. if the location is > changed the new destination should be synchronized with the mount entry > quota. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14777) RBF: Set ReadOnly is failing for mount Table but actually readonly succeed to set
[ https://issues.apache.org/jira/browse/HDFS-14777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928215#comment-16928215 ]

Ayush Saxena commented on HDFS-14777:
-------------------------------------

Thanks, everyone, for the work here.

I was trying to fix HDFS-14833 on the assumption that quota simply doesn't sync when locations are updated, but while testing I realized that, after this change, synchronization also fails on a quota update in most cases. The comparison is between the value in the update request and the mount entry (*after the update command has been executed on the entry*). The entry is therefore already the updated one with respect to the update request, so the values will be equal. The sync can only succeed when the cache hasn't been refreshed before the comparison, which happens randomly and is beyond our control.

I guess the entry should have been fetched before calling update, so that there is something to compare against.

Please check, [~elgoiri] [~surendrasingh]; if this is so, I will fix it in HDFS-14833.

Apologies in advance if I have messed up. :)

> RBF: Set ReadOnly is failing for mount Table but actually readonly succeed to set
> ---------------------------------------------------------------------------------
>
> Key: HDFS-14777
> URL: https://issues.apache.org/jira/browse/HDFS-14777
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Ranith Sardar
> Assignee: Ranith Sardar
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14777.001.patch, HDFS-14777.002.patch, HDFS-14777.003.patch, HDFS-14777.004.patch
>
> # hdfs dfsrouteradmin -update /test hacluster /test -readonly
> /opt/client # hdfs dfsrouteradmin -update /test hacluster /test -readonly
> update: /test is in a read only mount point
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): /test is in a read only mount point
>   at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:1419)
>   at org.apache.hadoop.hdfs.server.federation.router.Quota.getQuotaRemoteLocations(Quota.java:217)
>   at org.apache.hadoop.hdfs.server.federation.router.Quota.setQuota(Quota.java:75)
>   at org.apache.hadoop.hdfs.server.federation.router.RouterAdminServer.synchronizeQuota(RouterAdminServer.java:288)
>   at org.apache.hadoop.hdfs.server.federation.router.RouterAdminServer.updateMountTableEntry(RouterAdminServer.java:267)
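The ordering problem described in the comment above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `QuotaSyncSketch`, `Entry`, and the in-memory map are hypothetical stand-ins, not the Router's actual classes or state store, and the model covers only the case where the cached entry has already been refreshed by the time of the comparison.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of "compare after update" versus "compare before update". */
class QuotaSyncSketch {

  /** Hypothetical stand-in for a mount table entry: one destination plus a namespace quota. */
  static final class Entry {
    final String dest;
    final long nsQuota;

    Entry(String dest, long nsQuota) {
      this.dest = dest;
      this.nsQuota = nsQuota;
    }
  }

  /** Stand-in for the (already refreshed) mount table cache. */
  private final Map<String, Entry> store = new HashMap<>();

  void put(String path, Entry entry) {
    store.put(path, entry);
  }

  /** A sync is needed when either the quota or the destination differs. */
  static boolean needsSync(Entry before, Entry requested) {
    return before.nsQuota != requested.nsQuota || !before.dest.equals(requested.dest);
  }

  /** Reads the entry after the update, so it always equals the request. */
  boolean updateCompareAfter(String path, Entry requested) {
    store.put(path, requested);
    Entry current = store.get(path);
    return needsSync(current, requested);
  }

  /** Reads the old entry first, then updates, then compares old against the request. */
  boolean updateCompareBefore(String path, Entry requested) {
    Entry old = store.get(path);
    store.put(path, requested);
    return old == null || needsSync(old, requested);
  }

  public static void main(String[] args) {
    QuotaSyncSketch sketch = new QuotaSyncSketch();
    sketch.put("/test", new Entry("ns0:/test", 10L));
    System.out.println(sketch.updateCompareAfter("/test", new Entry("ns0:/test", 20L)));
    sketch.put("/test", new Entry("ns0:/test", 10L));
    System.out.println(sketch.updateCompareBefore("/test", new Entry("ns0:/test", 20L)));
  }
}
```

Taking the snapshot before the write also lets the same comparison cover a changed destination, which is the case HDFS-14833 describes.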
[jira] [Comment Edited] (HDFS-14818) Check native pmdk lib by 'hadoop checknative' command
[ https://issues.apache.org/jira/browse/HDFS-14818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928209#comment-16928209 ] Rakesh R edited comment on HDFS-14818 at 9/12/19 5:22 AM: -- Thanks [~PhiloHe] for the patch. Overall looks good to me, just few comments. # {{SupportState.PMDK_LIB_NOT_FOUND}} - its unused now, can you remove it. {code:java} SupportState.PMDK_LIB_NOT_FOUND(1), {code} {code:java} case 1: msg = "The native code is built with PMDK support, but PMDK libs " + "are NOT found in execution environment or failed to be loaded."; break; {code} # Any reason to change 'NAME' to 'REALPATH'. was (Author: rakeshr): Thanks [~PhiloHe] for the patch. Overall looks good to me, just a comment. # {{SupportState.PMDK_LIB_NOT_FOUND}} - its unused now, can you remove it. {code} SupportState.PMDK_LIB_NOT_FOUND(1), {code} {code} case 1: msg = "The native code is built with PMDK support, but PMDK libs " + "are NOT found in execution environment or failed to be loaded."; break; {code} > Check native pmdk lib by 'hadoop checknative' command > - > > Key: HDFS-14818 > URL: https://issues.apache.org/jira/browse/HDFS-14818 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: native >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Attachments: HDFS-14818.000.patch > > > Currently, 'hadoop checknative' command supports checking native libs, such > as zlib, snappy, openssl and ISA-L etc. It's necessary to include pmdk lib in > the checking. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14818) Check native pmdk lib by 'hadoop checknative' command
[ https://issues.apache.org/jira/browse/HDFS-14818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928209#comment-16928209 ] Rakesh R commented on HDFS-14818: - Thanks [~PhiloHe] for the patch. Overall looks good to me, just a comment. # {{SupportState.PMDK_LIB_NOT_FOUND}} - its unused now, can you remove it. {code} SupportState.PMDK_LIB_NOT_FOUND(1), {code} {code} case 1: msg = "The native code is built with PMDK support, but PMDK libs " + "are NOT found in execution environment or failed to be loaded."; break; {code} > Check native pmdk lib by 'hadoop checknative' command > - > > Key: HDFS-14818 > URL: https://issues.apache.org/jira/browse/HDFS-14818 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: native >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Attachments: HDFS-14818.000.patch > > > Currently, 'hadoop checknative' command supports checking native libs, such > as zlib, snappy, openssl and ISA-L etc. It's necessary to include pmdk lib in > the checking. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14754) Erasure Coding : The number of Under-Replicated Blocks never reduced
[ https://issues.apache.org/jira/browse/HDFS-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928202#comment-16928202 ] Surendra Singh Lilhore commented on HDFS-14754: --- [~hemanthboyina], pls attach the patch again with proper comment in test case. > Erasure Coding : The number of Under-Replicated Blocks never reduced > - > > Key: HDFS-14754 > URL: https://issues.apache.org/jira/browse/HDFS-14754 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Critical > Attachments: HDFS-14754.001.patch, HDFS-14754.002.patch, > HDFS-14754.003.patch, HDFS-14754.004.patch, HDFS-14754.005.patch, > HDFS-14754.006.patch, HDFS-14754.007.patch > > > Using EC RS-3-2, 6 DN > We came accross a scenario where in the EC 5 blocks , same block is > replicated thrice and two blocks got missing > Replicated block was not deleting and missing block is not able to ReConstruct -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14754) Erasure Coding : The number of Under-Replicated Blocks never reduced
[ https://issues.apache.org/jira/browse/HDFS-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928202#comment-16928202 ] Surendra Singh Lilhore edited comment on HDFS-14754 at 9/12/19 5:06 AM: [~hemanthboyina], pls attach the patch again with proper comment in test case. {code:java} // update blocksMap cluster.triggerBlockReports(); // add to invalidates cluster.triggerHeartbeats(); // datanode delete block cluster.triggerHeartbeats(); // update blocksMap cluster.triggerBlockReports(); {code} was (Author: surendrasingh): [~hemanthboyina], pls attach the patch again with proper comment in test case. > Erasure Coding : The number of Under-Replicated Blocks never reduced > - > > Key: HDFS-14754 > URL: https://issues.apache.org/jira/browse/HDFS-14754 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Critical > Attachments: HDFS-14754.001.patch, HDFS-14754.002.patch, > HDFS-14754.003.patch, HDFS-14754.004.patch, HDFS-14754.005.patch, > HDFS-14754.006.patch, HDFS-14754.007.patch > > > Using EC RS-3-2, 6 DN > We came accross a scenario where in the EC 5 blocks , same block is > replicated thrice and two blocks got missing > Replicated block was not deleting and missing block is not able to ReConstruct -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14699) Erasure Coding: Storage not considered in live replica when replication streams hard limit reached to threshold
[ https://issues.apache.org/jira/browse/HDFS-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928199#comment-16928199 ]

Surendra Singh Lilhore commented on HDFS-14699:
-----------------------------------------------

{quote}[~surendrasingh] I saw +1 for this patch, can you merge the patch? Thanks!
{quote}
Sure, I will commit this today...

> Erasure Coding: Storage not considered in live replica when replication streams hard limit reached to threshold
> ---------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-14699
> URL: https://issues.apache.org/jira/browse/HDFS-14699
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ec
> Affects Versions: 3.2.0, 3.1.1, 3.3.0
> Reporter: Zhao Yi Ming
> Assignee: Zhao Yi Ming
> Priority: Critical
> Labels: patch
> Attachments: HDFS-14699.00.patch, HDFS-14699.01.patch, HDFS-14699.02.patch, HDFS-14699.03.patch, HDFS-14699.04.patch, HDFS-14699.05.patch, image-2019-08-20-19-58-51-872.png, image-2019-09-02-17-51-46-742.png
>
> We tried the EC function on an 80-node cluster with Hadoop 3.1.1 and hit the same scenario as described in https://issues.apache.org/jira/browse/HDFS-8881. Following are our testing steps; we hope they are helpful. (The following DNs hold the internal blocks under test.)
> # We customized a new 10-2-1024k policy and used it on a path; now we have 12 internal blocks (12 live blocks).
> # Decommission one DN. After the decommission completes, we have 13 internal blocks (12 live blocks and 1 decommissioned block).
> # Then shut down one DN that does not hold the same block ID as the 1 decommissioned block; now we have 12 internal blocks (11 live blocks and 1 decommissioned block).
> # After waiting about 600s (before the heartbeat comes), recommission the decommissioned DN; now we have 12 internal blocks (11 live blocks and 1 duplicate block).
> # Then EC does not reconstruct the missing block.
> We think this is a critical issue for using the EC function in a production environment. Could you help? Thanks a lot!
[jira] [Commented] (HDFS-14609) RBF: Security should use common AuthenticationFilter
[ https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928195#comment-16928195 ] CR Hota commented on HDFS-14609: [~zhangchen] Thanks for the clarification. I think we are good. Anyways InvalidToken is captured in other tests as well. +1 for 006.patch. > RBF: Security should use common AuthenticationFilter > > > Key: HDFS-14609 > URL: https://issues.apache.org/jira/browse/HDFS-14609 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: CR Hota >Assignee: Chen Zhang >Priority: Major > Attachments: HDFS-14609.001.patch, HDFS-14609.002.patch, > HDFS-14609.003.patch, HDFS-14609.004.patch, HDFS-14609.005.patch, > HDFS-14609.006.patch > > > We worked on router based federation security as part of HDFS-13532. We kept > it compatible with the way namenode works. However with HADOOP-16314 and > HDFS-16354 in trunk, auth filters seems to have been changed causing tests to > fail. > Changes are needed appropriately in RBF, mainly fixing broken tests. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14844) Make buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream configurable
[ https://issues.apache.org/jira/browse/HDFS-14844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lisheng Sun updated HDFS-14844: --- Attachment: HDFS-14844.002.patch > Make buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream > configurable > -- > > Key: HDFS-14844 > URL: https://issues.apache.org/jira/browse/HDFS-14844 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Minor > Attachments: HDFS-14844.001.patch, HDFS-14844.002.patch > > > details for HDFS-14820 -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14832) RBF : Add Icon for ReadOnly False
[ https://issues.apache.org/jira/browse/HDFS-14832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928189#comment-16928189 ] hemanthboyina commented on HDFS-14832: -- yes [~tasanuma] we should have like _federationhealth-mounttable-legend_ whats your opinion on having Red colour labelled lock for ReadOnly State False , as same like green for read only state true. > RBF : Add Icon for ReadOnly False > - > > Key: HDFS-14832 > URL: https://issues.apache.org/jira/browse/HDFS-14832 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Minor > > In Router Web UI for Mount Table information , add icon for read only state > false -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2019) Handle Set DtService of token in S3Gateway for OM HA
[ https://issues.apache.org/jira/browse/HDDS-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDDS-2019: --- Priority: Critical (was: Major) > Handle Set DtService of token in S3Gateway for OM HA > > > Key: HDDS-2019 > URL: https://issues.apache.org/jira/browse/HDDS-2019 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Critical > > When OM HA is enabled, when tokens are generated, the service name should be > set with address of all OM's. > > Current without HA, it is set with Om RpcAddress string. This Jira is to > handle: > # Set dtService with all OM address. Right now in OMClientProducer, UGI is > created with S3 token, and serviceName of token is set with OMAddress, for HA > case, this should be set with all OM RPC addresses. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
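As a rough illustration of the HA case described above, the service name could be built from every OM RPC address instead of a single one. This is a minimal sketch, not the S3Gateway code: `DtServiceSketch`, `buildServiceName`, the host names, and the comma delimiter are all assumptions made for the example; the issue text does not specify the format.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that derives one service-name string from all OM RPC addresses. */
class DtServiceSketch {

  /** Joins the addresses in the given order; the comma delimiter is an assumption. */
  static String buildServiceName(List<String> omRpcAddresses) {
    if (omRpcAddresses.isEmpty()) {
      throw new IllegalArgumentException("At least one OM RPC address is required");
    }
    return String.join(",", omRpcAddresses);
  }

  public static void main(String[] args) {
    // Hypothetical host names and ports, for illustration only.
    List<String> addresses = Arrays.asList(
        "om1.example.com:9862", "om2.example.com:9862", "om3.example.com:9862");
    System.out.println(buildServiceName(addresses));
  }
}
```

With a single address the result is just that address, so the non-HA behaviour described in the issue is the one-element case of the same helper.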
[jira] [Updated] (HDFS-14795) Add Throttler for writing block
[ https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lisheng Sun updated HDFS-14795: --- Attachment: HDFS-14795.011.patch > Add Throttler for writing block > --- > > Key: HDFS-14795 > URL: https://issues.apache.org/jira/browse/HDFS-14795 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Minor > Attachments: HDFS-14795.001.patch, HDFS-14795.002.patch, > HDFS-14795.003.patch, HDFS-14795.004.patch, HDFS-14795.005.patch, > HDFS-14795.006.patch, HDFS-14795.007.patch, HDFS-14795.008.patch, > HDFS-14795.009.patch, HDFS-14795.010.patch, HDFS-14795.011.patch > > > DataXceiver#writeBlock > {code:java} > blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut, > mirrorAddr, null, targets, false); > {code} > As above code, DataXceiver#writeBlock doesn't throttler. > I think it is necessary to throttle for writing block, while add throttler > in stage of PIPELINE_SETUP_APPEND_RECOVERY or > PIPELINE_SETUP_STREAMING_RECOVERY. > Default throttler value is still null. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928181#comment-16928181 ]

Hadoop QA commented on HDFS-14768:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s{color} | {color:red} HDFS-14768 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14768 |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/27849/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> In some cases, erasure blocks are corruption when they are reconstruct.
> -----------------------------------------------------------------------
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, erasure-coding, hdfs, namenode
> Affects Versions: 3.0.2
> Reporter: guojh
> Assignee: guojh
> Priority: Major
> Labels: patch
> Fix For: 3.3.0
>
> Attachments: HDFS-14768.000.patch, guojh_UT_after_deomission.txt, guojh_UT_before_deomission.txt, zhaoyiming_UT_after_deomission.txt, zhaoyiming_UT_beofre_deomission.txt
>
> The policy is RS-6-3-1024K and the version is Hadoop 3.0.2.
> Suppose a file's block indices are [0,1,2,3,4,5,6,7,8]. We decommission indices [3,4] and increase pendingReplicationWithoutTargets on the datanode holding index 6 so that it is larger than replicationStreamsHardLimit (we set 14). Then, after the method chooseSourceDatanodes of BlockManager, liveBlockIndices is [0,1,2,3,4,5,7,8] and the block counter is Live: 7, Decommission: 2.
> In the method scheduleReconstruction of BlockManager, additionalReplRequired is 9 - 7 = 2. After the Namenode chooses two target datanodes, it assigns an erasure coding task to the target datanodes.
> When a datanode gets the task, it builds targetIndices from liveBlockIndices and the target length. The code is below.
> {code:java}
> // code placeholder
> // allocated elsewhere, one slot per target:
> targetIndices = new short[targets.length];
>
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short)i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0] = 6, and targetIndices[1] is always 0, its initial value.
> The StripedReader always creates readers from the first 6 block indices, which are [0,1,2,3,4,5].
> Using indices [0,1,2,3,4,5] to build target indices [6,0] triggers the ISA-L bug: the data of block index 6 is corrupted (all zeros).
> I wrote a unit test that reproduces it reliably.
> {code:java}
> // code placeholder
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   LocatedBlocks locatedBlocks =
>       StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>       .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>       (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // raise pendingReplicationWithoutTargets on the datanode holding index 6
>   DatanodeDescriptor datanodeDescriptor =
>       cluster.getNameNode().getNamesystem()
>           .getBlockManager().getDatanodeManager()
>           .getDatanode(storageInfos[6].getDatanodeUuid());
>   for (int i = 0; i < 100; i++) {
>     datanodeDescriptor.incrementPendingReplicationWithoutTargets();
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List<DatanodeInfo> decommisionNodes = new ArrayList<DatanodeInfo>();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>   //assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs));
>   // Ensure decommissio
[jira] [Updated] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhao Yi Ming updated HDFS-14768:
--------------------------------
    Attachment: zhaoyiming_UT_after_deomission.txt
                zhaoyiming_UT_beofre_deomission.txt

> In some cases, erasure blocks are corruption when they are reconstruct.
> -----------------------------------------------------------------------
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, erasure-coding, hdfs, namenode
> Affects Versions: 3.0.2
> Reporter: guojh
> Assignee: guojh
> Priority: Major
> Labels: patch
> Fix For: 3.3.0
>
> Attachments: HDFS-14768.000.patch, guojh_UT_after_deomission.txt, guojh_UT_before_deomission.txt, zhaoyiming_UT_after_deomission.txt, zhaoyiming_UT_beofre_deomission.txt
>
> The policy is RS-6-3-1024K and the version is Hadoop 3.0.2.
> Suppose a file's block indices are [0,1,2,3,4,5,6,7,8]. We decommission indices [3,4] and increase pendingReplicationWithoutTargets on the datanode holding index 6 so that it is larger than replicationStreamsHardLimit (we set 14). Then, after the method chooseSourceDatanodes of BlockManager, liveBlockIndices is [0,1,2,3,4,5,7,8] and the block counter is Live: 7, Decommission: 2.
> In the method scheduleReconstruction of BlockManager, additionalReplRequired is 9 - 7 = 2. After the Namenode chooses two target datanodes, it assigns an erasure coding task to the target datanodes.
> When a datanode gets the task, it builds targetIndices from liveBlockIndices and the target length. The code is below.
> {code:java}
> // code placeholder
> // allocated elsewhere, one slot per target:
> targetIndices = new short[targets.length];
>
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short)i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0] = 6, and targetIndices[1] is always 0, its initial value.
> The StripedReader always creates readers from the first 6 block indices, which are [0,1,2,3,4,5].
> Using indices [0,1,2,3,4,5] to build target indices [6,0] triggers the ISA-L bug: the data of block index 6 is corrupted (all zeros).
> I wrote a unit test that reproduces it reliably.
> {code:java}
> // code placeholder
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   LocatedBlocks locatedBlocks =
>       StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>       .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>       (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // raise pendingReplicationWithoutTargets on the datanode holding index 6
>   DatanodeDescriptor datanodeDescriptor =
>       cluster.getNameNode().getNamesystem()
>           .getBlockManager().getDatanodeManager()
>           .getDatanode(storageInfos[6].getDatanodeUuid());
>   for (int i = 0; i < 100; i++) {
>     datanodeDescriptor.incrementPendingReplicationWithoutTargets();
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List<DatanodeInfo> decommisionNodes = new ArrayList<DatanodeInfo>();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>   //assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs));
>   // Ensure decommissioned datanode is not automatically shutdown
>   DFSClient client = getDfsClient(cluster.getNameNode(0), conf);
>   assertEquals("All datanodes must be alive", numDNs,
>       client.datanodeReport(DatanodeReportType.LIVE).length);
>   FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes);
>   Assert.assertTrue("Checksum mismatches!",
>       fileChecksum1.equals(fileChecksum2));
>   StripedFileTestUtil.checkData(dfs, ecFile, writeBytes, decommisionNodes,
>       null, blockGroupSize);
> }
> {code}
>
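The index arithmetic in the report can be checked in isolation. The sketch below is a standalone model, not the datanode code: `TargetIndexSketch` and its methods are hypothetical names, the block-length check is left out, and the trimmed variant is shown only to make the stray slot visible; it is not claimed to be the fix applied in the patch.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Standalone model of the target-index computation described in the report. */
class TargetIndexSketch {

  /** One slot per requested target; slots that are never filled keep the default 0. */
  static short[] initTargetIndices(BitSet live, int totalBlocks, int numTargets) {
    short[] targetIndices = new short[numTargets];
    int m = 0;
    for (int i = 0; i < totalBlocks; i++) {
      if (!live.get(i) && m < numTargets) {
        targetIndices[m++] = (short) i;
      }
    }
    return targetIndices;
  }

  /** Hypothetical variant that returns only the slots that were actually filled. */
  static short[] initTargetIndicesTrimmed(BitSet live, int totalBlocks, int numTargets) {
    short[] targetIndices = new short[numTargets];
    int m = 0;
    for (int i = 0; i < totalBlocks && m < numTargets; i++) {
      if (!live.get(i)) {
        targetIndices[m++] = (short) i;
      }
    }
    return Arrays.copyOf(targetIndices, m);
  }

  /** The live set from the report: every index of RS-6-3 except 6. */
  static BitSet liveSetFromReport() {
    BitSet live = new BitSet(9);
    for (int i : new int[] {0, 1, 2, 3, 4, 5, 7, 8}) {
      live.set(i);
    }
    return live;
  }

  public static void main(String[] args) {
    BitSet live = liveSetFromReport();
    // Two targets were requested, but only index 6 is missing from the live set.
    System.out.println(Arrays.toString(initTargetIndices(live, 9, 2)));        // [6, 0]
    System.out.println(Arrays.toString(initTargetIndicesTrimmed(live, 9, 2))); // [6]
  }
}
```

The second slot stays at 0 because only one index is absent from the live set while two targets were allocated, which matches the `[6,0]` target list in the report.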
[jira] [Updated] (HDFS-14840) Use Java Conccurent Instead of Synchronization in BlockPoolTokenSecretManager
[ https://issues.apache.org/jira/browse/HDFS-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated HDFS-14840: - Fix Version/s: 3.3.0 Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) Committed this to trunk. Thanks [~belugabehr] for the contribution. > Use Java Conccurent Instead of Synchronization in BlockPoolTokenSecretManager > - > > Key: HDFS-14840 > URL: https://issues.apache.org/jira/browse/HDFS-14840 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Fix For: 3.3.0 > > Attachments: HDFS-14840.1.patch > > > https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockPoolTokenSecretManager.java#L40 > Instead of synchronizing the entire class, just synchronize the collection. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
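The idea in the description above ("just synchronize the collection") can be sketched as follows. This is a minimal illustration, not the committed HDFS-14840 patch: the class `PoolRegistry`, its methods, and the use of plain strings as values are hypothetical stand-ins, and the only real API used is `java.util.concurrent.ConcurrentHashMap`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a per-block-pool registry. The methods are not
// declared synchronized; thread safety comes from the concurrent map itself.
public class PoolRegistry {
  private final Map<String, String> managers = new ConcurrentHashMap<>();

  // Register (or replace) the manager for a block pool id.
  public void register(String bpid, String manager) {
    managers.put(bpid, manager);
  }

  // Look up a manager; fail loudly if the block pool is unknown.
  public String lookup(String bpid) {
    String manager = managers.get(bpid);
    if (manager == null) {
      throw new IllegalArgumentException("Block pool " + bpid + " is not found");
    }
    return manager;
  }

  public boolean isRegistered(String bpid) {
    return managers.containsKey(bpid);
  }
}
```

Each call touches the map once, so no additional lock is needed for these operations; a compound check-then-act sequence would still need an atomic map method such as `computeIfAbsent`.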
[jira] [Work logged] (HDDS-1879) Support multiple excluded scopes when choosing datanodes in NetworkTopology
[ https://issues.apache.org/jira/browse/HDDS-1879?focusedWorklogId=311147&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-311147 ]

ASF GitHub Bot logged work on HDDS-1879:

Author: ASF GitHub Bot
Created on: 12/Sep/19 03:50
Start Date: 12/Sep/19 03:50
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on issue #1194: HDDS-1879. Support multiple excluded scopes when choosing datanodes in NetworkTopology
URL: https://github.com/apache/hadoop/pull/1194#issuecomment-530653069

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| 0 | reexec | 105 | Docker mode activated. |
||| _ Prechecks _ |
| +1 | dupname | 0 | No case conflicting files found. |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| +1 | test4tests | 0 | The patch appears to include 2 new or modified test files. |
||| _ trunk Compile Tests _ |
| 0 | mvndep | 28 | Maven dependency ordering for branch |
| -1 | mvninstall | 33 | hadoop-ozone in trunk failed. |
| -1 | compile | 22 | hadoop-ozone in trunk failed. |
| +1 | checkstyle | 68 | trunk passed |
| +1 | mvnsite | 0 | trunk passed |
| +1 | shadedclient | 866 | branch has no errors when building and testing our client artifacts. |
| -1 | javadoc | 18 | hadoop-hdds in trunk failed. |
| -1 | javadoc | 18 | hadoop-ozone in trunk failed. |
| 0 | spotbugs | 149 | Used deprecated FindBugs config; considering switching to SpotBugs. |
| -1 | findbugs | 26 | hadoop-ozone in trunk failed. |
||| _ Patch Compile Tests _ |
| 0 | mvndep | 18 | Maven dependency ordering for patch |
| -1 | mvninstall | 33 | hadoop-ozone in the patch failed. |
| -1 | compile | 26 | hadoop-ozone in the patch failed. |
| -1 | javac | 53 | hadoop-hdds generated 2 new + 25 unchanged - 2 fixed = 27 total (was 27) |
| -1 | javac | 26 | hadoop-ozone in the patch failed. |
| -0 | checkstyle | 35 | hadoop-hdds: The patch generated 46 new + 259 unchanged - 55 fixed = 305 total (was 314) |
| +1 | mvnsite | 0 | the patch passed |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | shadedclient | 667 | patch has no errors when building and testing our client artifacts. |
| -1 | javadoc | 17 | hadoop-hdds in the patch failed. |
| -1 | javadoc | 16 | hadoop-ozone in the patch failed. |
| -1 | findbugs | 25 | hadoop-ozone in the patch failed. |
||| _ Other Tests _ |
| -1 | unit | 212 | hadoop-hdds in the patch failed. |
| -1 | unit | 27 | hadoop-ozone in the patch failed. |
| +1 | asflicense | 31 | The patch does not generate ASF License warnings. |
| | | 3041 | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdds.scm.container.placement.algorithms.TestSCMContainerPlacementRackAware |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1194/15/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/1194 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 84fd38c81e89 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 3b06f0b |
| Default Java | 1.8.0_222 |
| mvninstall | https://builds.apache.org/job/hadoop-multibranch/job/PR-1194/15/artifact/out/branch-mvninstall-hadoop-ozone.txt |
| compile | https://builds.apache.org/job/hadoop-multibranch/job/PR-1194/15/artifact/out/branch-compile-hadoop-ozone.txt |
| javadoc | https://builds.apache.org/job/hadoop-multibranch/job/PR-1194/15/artifact/out/branch-javadoc-hadoop-hdds.txt |
| javadoc | https://builds.apache.org/job/hadoop-multibranch/job/PR-1194/15/artifact/out/branch-javadoc-hadoop-ozone.txt |
| findbugs | https://builds.apache.org/job/hadoop-multibranch/job/PR-1194/15/artifact/out/branch-findbugs-hadoop-ozone.txt |
| mvninstall | https://builds.apache.org/job/hadoop-multibranch/job/PR-1194/15/artifact/out/patch-mvninstall-hadoop-ozone.txt |
| compile | https://builds.apache.org/job/hadoop-multibranch/job/PR-1194/15/artifact/out/patch-compile-hadoop-ozone.txt |
| javac | https://builds.apache.org/job/hadoop-multibranch/job/PR-1194/15/artifact/out/diff-compile-javac-hadoop-hdds.txt |
| javac | https://builds.apache.org/job/hadoop-multibranch/job/PR-1194/15/artifact/out/patch-compile-hadoop-ozone.txt |
| checkstyle | https://builds.apache.org/job/hadoop-multibranch/job/PR-1194/15/artifact/out/diff-checkstyle-hadoop-hdds.txt |
| javadoc | htt
[jira] [Commented] (HDFS-14840) Use Java Conccurent Instead of Synchronization in BlockPoolTokenSecretManager
[ https://issues.apache.org/jira/browse/HDFS-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928177#comment-16928177 ] Hudson commented on HDFS-14840: --- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17281 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17281/]) HDFS-14840. Use Java Conccurent Instead of Synchronization in (aajisaka: rev 68612a0410066af745e80d3b6d732a6151a635e3) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockPoolTokenSecretManager.java > Use Java Conccurent Instead of Synchronization in BlockPoolTokenSecretManager > - > > Key: HDFS-14840 > URL: https://issues.apache.org/jira/browse/HDFS-14840 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Fix For: 3.3.0 > > Attachments: HDFS-14840.1.patch > > > https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockPoolTokenSecretManager.java#L40 > Instead of synchronizing the entire class, just synchronize the collection. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928172#comment-16928172 ] Zhao Yi Ming edited comment on HDFS-14768 at 9/12/19 3:45 AM: -- One more piece of information that I hope helps. Although [~gjhkael]'s UT makes the decommission hang, I checked the local data folder and the reconstruction seems to have worked well. With my changes to the UT, the reconstruction also worked well. I have attached the folder file lists. [^guojh_UT_before_deomission.txt] [^guojh_UT_after_deomission.txt] [^zhaoyiming_UT_beofre_deomission.txt] [^zhaoyiming_UT_after_deomission.txt] was (Author: zhaoyim): One more piece of information that I hope helps. Although [~gjhkael]'s UT makes the decommission hang, I checked the local data folder and the reconstruction seems to have worked well. With my changes to the UT, the reconstruction also worked well. I have attached the folder file lists. > In some cases, erasure blocks are corruption when they are reconstruct. > > > Key: HDFS-14768 > URL: https://issues.apache.org/jira/browse/HDFS-14768 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding, hdfs, namenode >Affects Versions: 3.0.2 >Reporter: guojh >Assignee: guojh >Priority: Major > Labels: patch > Fix For: 3.3.0 > > Attachments: HDFS-14768.000.patch, guojh_UT_after_deomission.txt, > guojh_UT_before_deomission.txt, zhaoyiming_UT_after_deomission.txt, > zhaoyiming_UT_beofre_deomission.txt > > > The policy is RS-6-3-1024K and the version is Hadoop 3.0.2. > Suppose a file's block indices are [0,1,2,3,4,5,6,7,8]. We decommission indices [3,4] and increase the pendingReplicationWithoutTargets of the datanode holding index 6 so that it is larger than replicationStreamsHardLimit (we set 14). Then, after the method chooseSourceDatanodes of BlockManager, the liveBlockIndices are [0,1,2,3,4,5,7,8] and the block counters are Live: 7, Decommission: 2. > In the method scheduleReconstruction of BlockManager, additionalReplRequired is 9 - 7 = 2. 
After Namenode choose two target Datanode, will assign a > erasureCode task to target datanode. > When datanode get the task will build targetIndices from liveBlockIndices > and target length. the code is blow. > {code:java} > // code placeholder > targetIndices = new short[targets.length]; > private void initTargetIndices() { > BitSet bitset = reconstructor.getLiveBitSet(); > int m = 0; hasValidTargets = false; > for (int i = 0; i < dataBlkNum + parityBlkNum; i++) { > if (!bitset.get) { > if (reconstructor.getBlockLen > 0) { > if (m < targets.length) { > targetIndices[m++] = (short)i; > hasValidTargets = true; > } > } > } > } > {code} > targetIndices[0]=6, and targetIndices[1] is aways 0 from initial value. > The StripedReader is aways create reader from first 6 index block, and is > [0,1,2,3,4,5] > Use the index [0,1,2,3,4,5] to build target index[6,0] will trigger the isal > bug. the block index6's data is corruption(all data is zero). > I write a unit test can stabilize repreduce. > {code:java} > // code placeholder > public void testFileDecommission() throws Exception { > LOG.info("Starting test testFileDecommission"); > final Path ecFile = new Path(ecDir, "testFileDecommission"); > int writeBytes = cellSize * dataBlocks; > writeStripedFile(dfs, ecFile, writeBytes); > Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks()); > FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes); > LocatedBlocks locatedBlocks = > StripedFileTestUtil.getLocatedBlocks(ecFile, dfs); > LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0) > .get(0); > DatanodeInfo[] dnLocs = lb.getLocations(); > LocatedStripedBlock lastBlock = > (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock(); > DatanodeInfo[] storageInfos = lastBlock.getLocations(); > // > DatanodeDescriptor datanodeDescriptor = > cluster.getNameNode().getNamesystem() > > .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid()); > for (int i = 0; i < 100; 
i++) { > datanodeDescriptor.incrementPendingReplicationWithoutTargets(); > } > assertEquals(dataBlocks + parityBlocks, dnLocs.length); > int[] decommNodeIndex = {3, 4}; > final List decommisionNodes = new ArrayList(); > // add the node which will be decommissioning > decommisionNodes.add(dnLocs[decommNodeIndex[0]]); > decommisionNodes.add(dnLocs[decommNodeIndex[1]]); > decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED); > assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes()); > //assertNull(checkFile(dfs, ecFile, 9,
[jira] [Updated] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhao Yi Ming updated HDFS-14768: Attachment: guojh_UT_after_deomission.txt guojh_UT_before_deomission.txt > In some cases, erasure blocks are corruption when they are reconstruct. > > > Key: HDFS-14768 > URL: https://issues.apache.org/jira/browse/HDFS-14768 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding, hdfs, namenode >Affects Versions: 3.0.2 >Reporter: guojh >Assignee: guojh >Priority: Major > Labels: patch > Fix For: 3.3.0 > > Attachments: HDFS-14768.000.patch, guojh_UT_after_deomission.txt, > guojh_UT_before_deomission.txt, zhaoyiming_UT_after_deomission.txt, > zhaoyiming_UT_beofre_deomission.txt > > > Policy is RS-6-3-1024K, version is hadoop 3.0.2; > We suppose a file's block Index is [0,1,2,3,4,5,6,7,8], And decommission > index[3,4], increase the index 6 datanode's > pendingReplicationWithoutTargets that make it large than > replicationStreamsHardLimit(we set 14). Then, After the method > chooseSourceDatanodes of BlockMananger, the liveBlockIndices is > [0,1,2,3,4,5,7,8], Block Counter is, Live:7, Decommission:2. > In method scheduleReconstruction of BlockManager, the additionalReplRequired > is 9 - 7 = 2. After Namenode choose two target Datanode, will assign a > erasureCode task to target datanode. > When datanode get the task will build targetIndices from liveBlockIndices > and target length. the code is blow. 
> {code:java} > // code placeholder > targetIndices = new short[targets.length]; > private void initTargetIndices() { > BitSet bitset = reconstructor.getLiveBitSet(); > int m = 0; hasValidTargets = false; > for (int i = 0; i < dataBlkNum + parityBlkNum; i++) { > if (!bitset.get) { > if (reconstructor.getBlockLen > 0) { > if (m < targets.length) { > targetIndices[m++] = (short)i; > hasValidTargets = true; > } > } > } > } > {code} > targetIndices[0]=6, and targetIndices[1] is aways 0 from initial value. > The StripedReader is aways create reader from first 6 index block, and is > [0,1,2,3,4,5] > Use the index [0,1,2,3,4,5] to build target index[6,0] will trigger the isal > bug. the block index6's data is corruption(all data is zero). > I write a unit test can stabilize repreduce. > {code:java} > // code placeholder > public void testFileDecommission() throws Exception { > LOG.info("Starting test testFileDecommission"); > final Path ecFile = new Path(ecDir, "testFileDecommission"); > int writeBytes = cellSize * dataBlocks; > writeStripedFile(dfs, ecFile, writeBytes); > Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks()); > FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes); > LocatedBlocks locatedBlocks = > StripedFileTestUtil.getLocatedBlocks(ecFile, dfs); > LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0) > .get(0); > DatanodeInfo[] dnLocs = lb.getLocations(); > LocatedStripedBlock lastBlock = > (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock(); > DatanodeInfo[] storageInfos = lastBlock.getLocations(); > // > DatanodeDescriptor datanodeDescriptor = > cluster.getNameNode().getNamesystem() > > .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid()); > for (int i = 0; i < 100; i++) { > datanodeDescriptor.incrementPendingReplicationWithoutTargets(); > } > assertEquals(dataBlocks + parityBlocks, dnLocs.length); > int[] decommNodeIndex = {3, 4}; > final List decommisionNodes = new 
ArrayList(); > // add the node which will be decommissioning > decommisionNodes.add(dnLocs[decommNodeIndex[0]]); > decommisionNodes.add(dnLocs[decommNodeIndex[1]]); > decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED); > assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes()); > //assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs)); > // Ensure decommissioned datanode is not automatically shutdown > DFSClient client = getDfsClient(cluster.getNameNode(0), conf); > assertEquals("All datanodes must be alive", numDNs, > client.datanodeReport(DatanodeReportType.LIVE).length); > FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes); > Assert.assertTrue("Checksum mismatches!", > fileChecksum1.equals(fileChecksum2)); > StripedFileTestUtil.checkData(dfs, ecFile, writeBytes, decommisionNodes, > null, blockGroupSize); > } > {code} > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail:
[jira] [Commented] (HDFS-14836) FileIoProvider should not increase FileIoErrors metric in datanode volume metric
[ https://issues.apache.org/jira/browse/HDFS-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928173#comment-16928173 ]

Aiphago commented on HDFS-14836:

Thanks [~jojochuang] for the comment.
{quote}it would be the best if we can avoid string-matching exception messages. "Broken pipe" and "Connection reset" are usually thrown as a SocketException. Would it make sense to check exception class name instead? Better, catch SocketException and do not call onFailure().
{quote}
If we only check the exception class name, the range may be too broad: there may be other SocketExceptions that do not match "Broken pipe" or "Connection reset". So why not stay consistent with HDFS-2054?
{quote}for socket related exceptions, you don't want to call onFailure(); however, those exceptions should be re-thrown too. They should not be ignored silently.
{quote}
That is a good suggestion; I will change it later.

> FileIoProvider should not increase FileIoErrors metric in datanode volume metric
>
> Key: HDFS-14836
> URL: https://issues.apache.org/jira/browse/HDFS-14836
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Aiphago
> Assignee: Aiphago
> Priority: Minor
> Attachments: HDFS-14836.patch
>
> I found that the FileIoErrors metric increases in BlockSender.sendPacket() when fileIoProvider.transferToSocketFully() is used. But in https://issues.apache.org/jira/browse/HDFS-2054, exceptions such as "Broken pipe" and "Connection reset" are ignored.
> So should fileIoProvider filter these exceptions when it increases the FileIoErrors count?

--
This message was sent by Atlassian Jira (v8.3.2#803003)

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
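The behaviour discussed in this comment (do not count peer disconnects as file I/O errors, but still re-throw them) might look roughly like the sketch below. It is an illustration under stated assumptions, not the HDFS-14836 patch: `TransferErrorFilter`, `IoAction`, and the counter are hypothetical names, and the message-prefix check only mirrors the "Broken pipe" / "Connection reset" strings mentioned in the thread.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical wrapper around a socket transfer; not Hadoop's FileIoProvider.
public class TransferErrorFilter {
  // Stand-in for the FileIoErrors volume metric.
  private final AtomicLong fileIoErrors = new AtomicLong();

  // Hypothetical callback type for the I/O operation being wrapped.
  public interface IoAction {
    void run() throws IOException;
  }

  // Assumption: a peer disconnect is recognised by its message prefix,
  // as in the string matching the thread attributes to HDFS-2054.
  public static boolean isPeerDisconnect(IOException e) {
    String msg = e.getMessage();
    return msg != null
        && (msg.startsWith("Broken pipe") || msg.startsWith("Connection reset"));
  }

  public void transfer(IoAction action) throws IOException {
    try {
      action.run();
    } catch (IOException e) {
      if (!isPeerDisconnect(e)) {
        fileIoErrors.incrementAndGet(); // count only genuine disk/file errors
      }
      throw e; // never swallow the exception, whatever its cause
    }
  }

  public long getFileIoErrors() {
    return fileIoErrors.get();
  }
}
```

A class check such as `e instanceof SocketException` could be combined with the message check; whether that narrows or widens the filter too much is exactly the open question in the comment above.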
[jira] [Updated] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhao Yi Ming updated HDFS-14768:
Attachment: (was: zhaoyiming_changes_beofre_deomission.txt)

> In some cases, erasure blocks are corruption when they are reconstruct.
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, erasure-coding, hdfs, namenode
> Affects Versions: 3.0.2
> Reporter: guojh
> Assignee: guojh
> Priority: Major
> Labels: patch
> Fix For: 3.3.0
> Attachments: HDFS-14768.000.patch
>
> The policy is RS-6-3-1024K and the version is Hadoop 3.0.2.
> Suppose a file's block indices are [0,1,2,3,4,5,6,7,8]. We decommission indices [3,4] and increase the pendingReplicationWithoutTargets of the datanode holding index 6 so that it is larger than replicationStreamsHardLimit (we set 14). Then, after the method chooseSourceDatanodes of BlockManager, the liveBlockIndices are [0,1,2,3,4,5,7,8] and the block counters are Live: 7, Decommission: 2.
> In the method scheduleReconstruction of BlockManager, additionalReplRequired is 9 - 7 = 2. After the Namenode chooses two target Datanodes, it assigns an erasure coding task to the target datanode.
> When the datanode gets the task, it builds targetIndices from liveBlockIndices and the target length. The code is below.
> {code:java}
> // code placeholder
> targetIndices = new short[targets.length];
>
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     // only indices that are not live can become reconstruction targets
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short)i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0]=6, and targetIndices[1] is always 0 from its initial value.
> The StripedReader always creates readers from the first 6 index blocks, which are [0,1,2,3,4,5].
> Using the indices [0,1,2,3,4,5] to build target indices [6,0] triggers the ISA-L bug: the data of block index 6 is corrupted (all data is zero).
> I wrote a unit test that can reproduce it reliably.
> {code:java}
> // code placeholder
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   LocatedBlocks locatedBlocks =
>       StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>       .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>       (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // push the pending count of the datanode holding index 6 past the hard limit
>   DatanodeDescriptor datanodeDescriptor =
>       cluster.getNameNode().getNamesystem()
>           .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   for (int i = 0; i < 100; i++) {
>     datanodeDescriptor.incrementPendingReplicationWithoutTargets();
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List<DatanodeInfo> decommisionNodes = new ArrayList<>();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>   //assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs));
>   // Ensure decommissioned datanode is not automatically shutdown
>   DFSClient client = getDfsClient(cluster.getNameNode(0), conf);
>   assertEquals("All datanodes must be alive", numDNs,
>       client.datanodeReport(DatanodeReportType.LIVE).length);
>   FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes);
>   Assert.assertTrue("Checksum mismatches!",
>       fileChecksum1.equals(fileChecksum2));
>   StripedFileTestUtil.checkData(dfs, ecFile, writeBytes, decommisionNodes,
>       null, blockGroupSize);
> }
> {code}
>

--
This message was sent by Atlassian Jira (v8.3.2#803003)

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
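The HDFS-14768 description states that with live indices [0,1,2,3,4,5,7,8] and two targets, targetIndices ends up as [6, 0]. The standalone sketch below reproduces only that arithmetic. It is a simplified model, not the Hadoop source: `TargetIndicesDemo` and `buildTargetIndices` are hypothetical names, and the block-length check from the quoted code is left out.

```java
import java.util.Arrays;
import java.util.BitSet;

// Simplified model of the index-filling loop quoted in HDFS-14768.
public class TargetIndicesDemo {

  // Fill one slot per non-live block index, up to numTargets slots.
  // Slots that are never filled keep the Java default value 0.
  static short[] buildTargetIndices(BitSet live, int totalBlocks, int numTargets) {
    short[] targetIndices = new short[numTargets];
    int m = 0;
    for (int i = 0; i < totalBlocks; i++) {
      if (!live.get(i) && m < numTargets) {
        targetIndices[m++] = (short) i;
      }
    }
    return targetIndices;
  }

  public static void main(String[] args) {
    // RS-6-3: 9 block indices; 3 and 4 are decommissioning but still counted
    // as live sources, while index 6 was skipped as a source.
    BitSet live = new BitSet(9);
    for (int i : new int[] {0, 1, 2, 3, 4, 5, 7, 8}) {
      live.set(i);
    }
    // The Namenode asked for 2 targets, but only index 6 is actually missing.
    short[] targets = buildTargetIndices(live, 9, 2);
    System.out.println(Arrays.toString(targets)); // prints [6, 0]
  }
}
```

The second slot is indistinguishable from a request to rebuild index 0, which is the mismatch the description blames for the zeroed block.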
[jira] [Updated] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhao Yi Ming updated HDFS-14768: Attachment: (was: zhaoyiming_changes_after_deomission.txt) > In some cases, erasure blocks are corruption when they are reconstruct. > > > Key: HDFS-14768 > URL: https://issues.apache.org/jira/browse/HDFS-14768 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding, hdfs, namenode >Affects Versions: 3.0.2 >Reporter: guojh >Assignee: guojh >Priority: Major > Labels: patch > Fix For: 3.3.0 > > Attachments: HDFS-14768.000.patch -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhao Yi Ming updated HDFS-14768: Attachment: (was: guojh_changes_before_deomission.txt) > In some cases, erasure blocks are corruption when they are reconstruct. > > > Key: HDFS-14768 > URL: https://issues.apache.org/jira/browse/HDFS-14768 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding, hdfs, namenode >Affects Versions: 3.0.2 >Reporter: guojh >Assignee: guojh >Priority: Major > Labels: patch > Fix For: 3.3.0 > > Attachments: HDFS-14768.000.patch -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhao Yi Ming updated HDFS-14768:
--------------------------------
    Attachment:     (was: guojh_changes_after_deomission.txt)

> In some cases, erasure blocks are corruption when they are reconstruct.
> -----------------------------------------------------------------------
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, erasure-coding, hdfs, namenode
> Affects Versions: 3.0.2
> Reporter: guojh
> Assignee: guojh
> Priority: Major
> Labels: patch
> Fix For: 3.3.0
> Attachments: HDFS-14768.000.patch
>
> The policy is RS-6-3-1024K and the version is Hadoop 3.0.2.
> Suppose a file's block indices are [0,1,2,3,4,5,6,7,8]. We decommission
> indices [3,4] and increase pendingReplicationWithoutTargets on the datanode
> holding index 6 until it is larger than replicationStreamsHardLimit (we set 14).
> Then, after the method chooseSourceDatanodes of BlockManager, liveBlockIndices is
> [0,1,2,3,4,5,7,8], and the block counters are Live: 7, Decommission: 2.
> In the method scheduleReconstruction of BlockManager, additionalReplRequired
> is 9 - 7 = 2. After the NameNode chooses two target datanodes, it assigns an
> erasure coding task to the target datanode.
> When the datanode gets the task, it builds targetIndices from liveBlockIndices
> and the target length. The code is below.
> {code:java}
> // allocated elsewhere, before initTargetIndices() runs
> targetIndices = new short[targets.length];
>
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short) i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0] = 6, and targetIndices[1] is always 0, its initial value.
> The StripedReader always creates readers from the first 6 block indices, which are
> [0,1,2,3,4,5].
> Using indices [0,1,2,3,4,5] to build target indices [6,0] triggers the ISA-L
> bug: the data of block index 6 is corrupted (all zeros).
> I wrote a unit test that reproduces it reliably.
> {code:java}
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   LocatedBlocks locatedBlocks =
>       StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>       .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>       (LocatedStripedBlock) locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // Raise pendingReplicationWithoutTargets on the datanode holding block index 6.
>   DatanodeDescriptor datanodeDescriptor = cluster.getNameNode().getNamesystem()
>       .getBlockManager().getDatanodeManager()
>       .getDatanode(storageInfos[6].getDatanodeUuid());
>   for (int i = 0; i < 100; i++) {
>     datanodeDescriptor.incrementPendingReplicationWithoutTargets();
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List<DatanodeInfo> decommisionNodes = new ArrayList<>();
>   // add the nodes which will be decommissioned
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>   //assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs));
>   // Ensure decommissioned datanode is not automatically shutdown
>   DFSClient client = getDfsClient(cluster.getNameNode(0), conf);
>   assertEquals("All datanodes must be alive", numDNs,
>       client.datanodeReport(DatanodeReportType.LIVE).length);
>   FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes);
>   Assert.assertTrue("Checksum mismatches!",
>       fileChecksum1.equals(fileChecksum2));
>   StripedFileTestUtil.checkData(dfs, ecFile, writeBytes, decommisionNodes,
>       null, blockGroupSize);
> }
> {code}
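To make the targetIndices behaviour described in HDFS-14768 concrete, here is a minimal standalone sketch. It is not Hadoop code: the class TargetIndicesSketch and the parameters blockLens and numTargets are hypothetical stand-ins for the reconstructor state, and the loop only mirrors the one quoted in the issue. It assumes the scenario from the description, where index 6 is the only index missing from the live set but two targets were allocated.

```java
import java.util.Arrays;
import java.util.BitSet;

public class TargetIndicesSketch {
  /**
   * Mirrors the loop quoted in the issue. blockLens and numTargets are
   * hypothetical stand-ins for reconstructor.getBlockLen(i) and targets.length.
   */
  public static short[] initTargetIndices(BitSet live, long[] blockLens, int numTargets) {
    short[] targetIndices = new short[numTargets]; // Java zero-fills new arrays
    int m = 0;
    for (int i = 0; i < blockLens.length; i++) {
      if (!live.get(i) && blockLens[i] > 0 && m < numTargets) {
        targetIndices[m++] = (short) i;
      }
    }
    // Slots m..numTargets-1 were never written and still hold 0.
    return targetIndices;
  }

  public static void main(String[] args) {
    // liveBlockIndices from the issue: [0,1,2,3,4,5,7,8]
    BitSet live = new BitSet(9);
    for (int i : new int[] {0, 1, 2, 3, 4, 5, 7, 8}) {
      live.set(i);
    }
    long[] blockLens = new long[9];
    Arrays.fill(blockLens, 1024L * 1024L); // every block is non-empty
    short[] result = initTargetIndices(live, blockLens, 2);
    System.out.println(Arrays.toString(result)); // prints [6, 0]
  }
}
```

The second slot is indistinguishable from a real request to rebuild block index 0, which is the ambiguity the description points at.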
[jira] [Updated] (HDFS-14840) Use Java Conccurent Instead of Synchronization in BlockPoolTokenSecretManager
[ https://issues.apache.org/jira/browse/HDFS-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka updated HDFS-14840:
---------------------------------
    Summary: Use Java Conccurent Instead of Synchronization in BlockPoolTokenSecretManager  (was: Use Java Conccurent Instead of Synchrnoization in BlockPoolTokenSecretManager)

> Use Java Conccurent Instead of Synchronization in BlockPoolTokenSecretManager
> -----------------------------------------------------------------------------
>
> Key: HDFS-14840
> URL: https://issues.apache.org/jira/browse/HDFS-14840
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Affects Versions: 3.2.0
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Minor
> Attachments: HDFS-14840.1.patch
>
> https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockPoolTokenSecretManager.java#L40
> Instead of synchronizing the entire class, just synchronize the collection.
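As a rough illustration of the HDFS-14840 suggestion (make the collection thread-safe instead of synchronizing every method of the class), here is a minimal sketch. It is not the attached patch and not the real BlockPoolTokenSecretManager: PoolRegistrySketch and its methods are hypothetical, and it assumes that each operation touches only a single key, so per-key atomicity from ConcurrentHashMap is enough.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-block-pool registry; V stands in for the per-pool manager type. */
public class PoolRegistrySketch<V> {
  // The map itself is thread-safe, so the methods below need no synchronized keyword.
  private final Map<String, V> byPool = new ConcurrentHashMap<>();

  public void add(String poolId, V manager) {
    byPool.put(poolId, manager);
  }

  public V get(String poolId) {
    V manager = byPool.get(poolId);
    if (manager == null) {
      throw new IllegalArgumentException("Block pool " + poolId + " is not found");
    }
    return manager;
  }

  public boolean isRegistered(String poolId) {
    return byPool.containsKey(poolId);
  }

  public static void main(String[] args) {
    PoolRegistrySketch<String> registry = new PoolRegistrySketch<>();
    registry.add("BP-1", "manager-1");
    System.out.println(registry.isRegistered("BP-1"));
    System.out.println(registry.get("BP-1"));
  }
}
```

If a caller ever needs a check-then-act sequence across several keys, a concurrent map alone would not cover it and some coarser lock would still be required.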
[jira] [Commented] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928172#comment-16928172 ]

Zhao Yi Ming commented on HDFS-14768:
-------------------------------------

One more piece of information that I hope helps. Although [~gjhkael]'s UT makes the decommission hang, I checked the local data folder and the reconstruction seems to have worked well. With my changes to the UT, the reconstruction worked well too. I have attached the folder file lists.

> In some cases, erasure blocks are corruption when they are reconstruct.
> -----------------------------------------------------------------------
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, erasure-coding, hdfs, namenode
> Affects Versions: 3.0.2
> Reporter: guojh
> Assignee: guojh
> Priority: Major
> Labels: patch
> Fix For: 3.3.0
> Attachments: HDFS-14768.000.patch,
> guojh_changes_after_deomission.txt, guojh_changes_before_deomission.txt,
> zhaoyiming_changes_after_deomission.txt,
> zhaoyiming_changes_beofre_deomission.txt
[jira] [Updated] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhao Yi Ming updated HDFS-14768:
--------------------------------
    Attachment: guojh_changes_after_deomission.txt
                guojh_changes_before_deomission.txt
                zhaoyiming_changes_after_deomission.txt
                zhaoyiming_changes_beofre_deomission.txt

> In some cases, erasure blocks are corruption when they are reconstruct.
> -----------------------------------------------------------------------
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, erasure-coding, hdfs, namenode
> Affects Versions: 3.0.2
> Reporter: guojh
> Assignee: guojh
> Priority: Major
> Labels: patch
> Fix For: 3.3.0
> Attachments: HDFS-14768.000.patch,
> guojh_changes_after_deomission.txt, guojh_changes_before_deomission.txt,
> zhaoyiming_changes_after_deomission.txt,
> zhaoyiming_changes_beofre_deomission.txt
[jira] [Commented] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928167#comment-16928167 ]

Zhao Yi Ming commented on HDFS-14768:
-------------------------------------

[~gjhkael] Thanks for your update! I have one question: in your UT, if the decommisionNodes code is moved before getLastLocatedBlock(), will it affect the UT result? I ask because when I made this change, the UT ran normally.

Move the following code
{code:java}
int[] decommNodeIndex = {3, 4};
final List<DatanodeInfo> decommisionNodes = new ArrayList<>();
// add the nodes which will be decommissioned
decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
{code}
to
{code:java}
LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
    .get(0);
DatanodeInfo[] dnLocs = lb.getLocations();
// => Here
LocatedStripedBlock lastBlock =
    (LocatedStripedBlock) locatedBlocks.getLastLocatedBlock();
DatanodeInfo[] storageInfos = lastBlock.getLocations();
{code}

> In some cases, erasure blocks are corruption when they are reconstruct.
> -----------------------------------------------------------------------
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, erasure-coding, hdfs, namenode
> Affects Versions: 3.0.2
> Reporter: guojh
> Assignee: guojh
> Priority: Major
> Labels: patch
> Fix For: 3.3.0
> Attachments: HDFS-14768.000.patch
[jira] [Comment Edited] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928167#comment-16928167 ]

Zhao Yi Ming edited comment on HDFS-14768 at 9/12/19 3:34 AM:
--------------------------------------------------------------

[~gjhkael] Thanks for your update! I have one question: in your UT, if the decommisionNodes code is moved before getLastLocatedBlock(), will it affect the UT result? I ask because when I made this change, the UT ran normally.

Move the following code
{code:java}
int[] decommNodeIndex = {3, 4};
final List<DatanodeInfo> decommisionNodes = new ArrayList<>();
// add the nodes which will be decommissioned
decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
{code}
to
{code:java}
LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
    .get(0);
DatanodeInfo[] dnLocs = lb.getLocations();
// => Here
LocatedStripedBlock lastBlock =
    (LocatedStripedBlock) locatedBlocks.getLastLocatedBlock();
DatanodeInfo[] storageInfos = lastBlock.getLocations();
{code}

> In some cases, erasure blocks are corruption when they are reconstruct.
> -----------------------------------------------------------------------
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, erasure-coding, hdfs, namenode
> Affects Versions: 3.0.2
> Reporter: guojh
> Assignee: guojh
> Priority: Major
> Labels: patch
> Fix For: 3.3.0
> Attachments: HDFS-14768.000.patch
[jira] [Commented] (HDFS-14609) RBF: Security should use common AuthenticationFilter
[ https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928163#comment-16928163 ]

Hadoop QA commented on HDFS-14609:
----------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 22s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 18m 28s | trunk passed |
| +1 | compile | 0m 27s | trunk passed |
| +1 | checkstyle | 0m 20s | trunk passed |
| +1 | mvnsite | 0m 30s | trunk passed |
| +1 | shadedclient | 12m 7s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 52s | trunk passed |
| +1 | javadoc | 0m 34s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 26s | the patch passed |
| +1 | compile | 0m 23s | the patch passed |
| +1 | javac | 0m 23s | the patch passed |
| +1 | checkstyle | 0m 15s | the patch passed |
| +1 | mvnsite | 0m 26s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 40s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 57s | the patch passed |
| +1 | javadoc | 0m 31s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 21m 29s | hadoop-hdfs-rbf in the patch passed. |
| +1 | asflicense | 0m 25s | The patch does not generate ASF License warnings. |
| | | 71m 54s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-14609 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12980140/HDFS-14609.006.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux d8bb824d1b38 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f537410 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/27848/testReport/ |
| Max. process+thread count | 1585 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/27848/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> RBF: Security should use common AuthenticationFilter
> ----------------------------------------------------
>
> Key: HDFS-14609
>
[jira] [Commented] (HDFS-14832) RBF : Add Icon for ReadOnly False
[ https://issues.apache.org/jira/browse/HDFS-14832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928161#comment-16928161 ]

Takanobu Asanuma commented on HDFS-14832:
-----------------------------------------

Thanks for working on this, [~hemanthboyina], and thanks for pinging me, [~elgoiri].

If we have both read-only and read-write icons, we may need a {{federationhealth-mounttable-legend}} section like the other pages have.

> RBF : Add Icon for ReadOnly False
> ---------------------------------
>
> Key: HDFS-14832
> URL: https://issues.apache.org/jira/browse/HDFS-14832
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: hemanthboyina
> Assignee: hemanthboyina
> Priority: Minor
>
> In Router Web UI for Mount Table information , add icon for read only state
> false
[jira] [Commented] (HDFS-14564) Add libhdfs APIs for readFully; add readFully to ByteBufferPositionedReadable
[ https://issues.apache.org/jira/browse/HDFS-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928160#comment-16928160 ]

Hadoop QA commented on HDFS-14564:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 39s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 1s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 6s | Maven dependency ordering for branch |
| +1 | mvninstall | 17m 5s | trunk passed |
| +1 | compile | 16m 27s | trunk passed |
| +1 | checkstyle | 3m 8s | trunk passed |
| +1 | mvnsite | 4m 24s | trunk passed |
| +1 | shadedclient | 19m 47s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 3m 41s | trunk passed |
| 0 | spotbugs | 0m 30s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| 0 | findbugs | 0m 30s | branch/hadoop-hdfs-project/hadoop-hdfs-native-client no findbugs output file (findbugsXml.xml) |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 24s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 57s | the patch passed |
| +1 | compile | 15m 54s | the patch passed |
| +1 | cc | 15m 54s | the patch passed |
| +1 | javac | 15m 54s | the patch passed |
| +1 | checkstyle | 2m 25s | root: The patch generated 0 new + 110 unchanged - 1 fixed = 110 total (was 111) |
| +1 | mvnsite | 4m 18s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 26s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 3m 41s | the patch passed |
| 0 | findbugs | 0m 35s | hadoop-hdfs-project/hadoop-hdfs-native-client has no data from findbugs |
|| || || || Other Tests ||
| +1 | unit | 8m 55s | hadoop-common in the patch passed. |
| +1 | unit | 2m 12s | hadoop-hdfs-client in the patch passed. |
| -1 | unit | 85m 51s | hadoop-hdfs in the patch failed. |
| -1 | unit | 3m 11s | hadoop-hdfs-native-client in the patch failed. |
| +1 | asflicense | 0m 57s | The patch does not generate ASF License warnings. |
| | |
[jira] [Work logged] (HDDS-1879) Support multiple excluded scopes when choosing datanodes in NetworkTopology
[ https://issues.apache.org/jira/browse/HDDS-1879?focusedWorklogId=311132&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-311132 ]

ASF GitHub Bot logged work on HDDS-1879:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 12/Sep/19 03:02
            Start Date: 12/Sep/19 03:02
    Worklog Time Spent: 10m
      Work Description: ChenSammi commented on issue #1194: HDDS-1879. Support multiple excluded scopes when choosing datanodes in NetworkTopology
URL: https://github.com/apache/hadoop/pull/1194#issuecomment-530644666

   Thanks @xiaoyuyao for the review. A new patch to fix the check style issue. The failed UTs are not related. Will commit after the Yetus build.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 311132)
    Time Spent: 4h  (was: 3h 50m)

> Support multiple excluded scopes when choosing datanodes in NetworkTopology
> ---------------------------------------------------------------------------
>
>                 Key: HDDS-1879
>                 URL: https://issues.apache.org/jira/browse/HDDS-1879
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>            Reporter: Sammi Chen
>            Assignee: Sammi Chen
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 4h
>  Remaining Estimate: 0h
>
[jira] [Commented] (HDFS-14843) Double Synchronization in BlockReportLeaseManager
[ https://issues.apache.org/jira/browse/HDFS-14843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928155#comment-16928155 ]

Supratim Deka commented on HDFS-14843:
--------------------------------------

+1

Thanks for the patch [~belugabehr], looks good to me.

> Double Synchronization in BlockReportLeaseManager
> -------------------------------------------------
>
>                 Key: HDFS-14843
>                 URL: https://issues.apache.org/jira/browse/HDFS-14843
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>         Attachments: HDFS-14843.1.patch
>
>
> {code:java|title=BlockReportLeaseManager.java}
>   private synchronized long getNextId() {
>     long id;
>     do {
>       id = nextId++;
>     } while (id == 0);
>     return id;
>   }
> {code}
> This method is private and synchronized, but it is only accessed from an already-synchronized method, so there is no need to double-synchronize.
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java#L183-L189
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java#L227
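The point of the issue can be illustrated with a small stand-alone sketch. It is not the Hadoop class: LeaseIdSketch and requestLease() are hypothetical names, and the sketch assumes, as the description states, that the private helper is reached only through a method that already holds the object's monitor.

```java
/** Hypothetical stand-in for the lease manager; not the Hadoop class. */
class LeaseIdSketch {
  private long nextId;

  LeaseIdSketch(long firstId) {
    this.nextId = firstId;
  }

  /** The only entry point (an assumption of this sketch); it holds the monitor on this. */
  public synchronized long requestLease() {
    return getNextId();
  }

  /** No synchronized keyword here: every caller already holds the lock. */
  private long getNextId() {
    long id;
    do {
      id = nextId++;
    } while (id == 0); // 0 is skipped, as in the quoted snippet
    return id;
  }

  public static void main(String[] args) {
    LeaseIdSketch ids = new LeaseIdSketch(-1);
    System.out.println(ids.requestLease()); // prints -1
    System.out.println(ids.requestLease()); // prints 1 (0 is skipped)
  }
}
```

Java monitors are reentrant, so the extra synchronized on the helper was harmless for correctness; removing it only drops a redundant lock acquisition.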
[jira] [Commented] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928146#comment-16928146 ]

guojh commented on HDFS-14768:
------------------------------

[~zhaoyim] My UT is not good enough: you currently need to check the block manually. I will upload a new UT that checks for block corruption automatically.

> In some cases, erasure blocks are corruption when they are reconstruct.
> -----------------------------------------------------------------------
>
>                 Key: HDFS-14768
>                 URL: https://issues.apache.org/jira/browse/HDFS-14768
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, erasure-coding, hdfs, namenode
>    Affects Versions: 3.0.2
>            Reporter: guojh
>            Assignee: guojh
>            Priority: Major
>              Labels: patch
>             Fix For: 3.3.0
>
>         Attachments: HDFS-14768.000.patch
>
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2.
> Suppose a file's block indices are [0,1,2,3,4,5,6,7,8]. We decommission indices [3,4] and increase pendingReplicationWithoutTargets on the datanode holding index 6 so that it becomes larger than replicationStreamsHardLimit (we set 14). Then, after the method chooseSourceDatanodes of BlockManager, liveBlockIndices is [0,1,2,3,4,5,7,8] and the block counters are Live:7, Decommission:2.
> In the method scheduleReconstruction of BlockManager, additionalReplRequired is 9 - 7 = 2. After the NameNode chooses two target datanodes, it assigns an erasure coding task to the target datanodes.
> When a datanode gets the task, it builds targetIndices from liveBlockIndices and the target length. The code is below.
> {code:java}
> targetIndices = new short[targets.length]; // every entry starts as 0
>
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short)i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0]=6, and targetIndices[1] always keeps its initial value of 0.
> The StripedReader always creates readers from the first 6 block indices, that is [0,1,2,3,4,5].
> Using the indices [0,1,2,3,4,5] to build the target indices [6,0] triggers the ISA-L bug: the data of block index 6 is corrupted (all data is zero).
> I wrote a unit test that reproduces this reliably.
> {code:java}
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   LocatedBlocks locatedBlocks =
>       StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>       .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>       (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // make the datanode holding block index 6 look too busy to be a source
>   DatanodeDescriptor datanodeDescriptor =
>       cluster.getNameNode().getNamesystem()
>           .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   for (int i = 0; i < 100; i++) {
>     datanodeDescriptor.incrementPendingReplicationWithoutTargets();
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List<DatanodeInfo> decommisionNodes = new ArrayList<DatanodeInfo>();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>   //assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs));
>   // Ensure decommissioned datanode is not automatically shutdown
>   DFSClient client = getDfsClient(cluster.getNameNode(0), conf);
>   assertEquals("All datanodes must be alive", numDNs,
>       client.datanodeReport(DatanodeReportType.LIVE).length);
>   FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes);
>   Assert.assertTrue("Checksum mismatches!",
>       fileChecksum1.equals(fileChecksum2));
>   StripedFileTestUtil.checkData(dfs, ecFile, writeBytes, decommisionNodes,
>       null, blockGroupSize);
> }
> {code}
>
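The index arithmetic in the description can be replayed outside Hadoop. The following is a minimal sketch under stated assumptions: TargetIndicesSketch and buildTargetIndices are hypothetical names, the block-length check is left out, and the inputs are the ones given in the report (9 blocks, live indices [0,1,2,3,4,5,7,8], two targets).

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical replay of the loop from the report; not the Hadoop StripedWriter. */
class TargetIndicesSketch {
  /** Collects the first numTargets non-live indices; unused slots keep the default 0. */
  static short[] buildTargetIndices(BitSet live, int totalBlocks, int numTargets) {
    short[] targetIndices = new short[numTargets]; // zero-initialised by Java
    int m = 0;
    for (int i = 0; i < totalBlocks; i++) {
      if (!live.get(i) && m < numTargets) {
        targetIndices[m++] = (short) i;
      }
    }
    return targetIndices;
  }

  public static void main(String[] args) {
    BitSet live = new BitSet(9);
    for (int i : new int[] {0, 1, 2, 3, 4, 5, 7, 8}) {
      live.set(i);
    }
    // Two targets were requested, but only index 6 is missing from the live set.
    short[] targets = buildTargetIndices(live, 9, 2);
    System.out.println(Arrays.toString(targets)); // prints [6, 0]
  }
}
```

The second slot is never written, so it still holds 0, which is also a valid (and live) block index. That is the [6,0] target list the report describes.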
[jira] [Commented] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928145#comment-16928145 ]

guojh commented on HDFS-14768:
------------------------------

[~surendrasingh] Sorry about my UT. I will update a new UT to check the block corruption.

> In some cases, erasure blocks are corruption when they are reconstruct.
> -----------------------------------------------------------------------
>
>                 Key: HDFS-14768
>                 URL: https://issues.apache.org/jira/browse/HDFS-14768
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, erasure-coding, hdfs, namenode
>    Affects Versions: 3.0.2
>            Reporter: guojh
>            Assignee: guojh
>            Priority: Major
>              Labels: patch
>             Fix For: 3.3.0
>
>         Attachments: HDFS-14768.000.patch
>
[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928141#comment-16928141 ]

angerszhu commented on HDFS-14437:
----------------------------------

[~daryn] The current Hadoop trunk branch still has this bug.

> Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-14437
>                 URL: https://issues.apache.org/jira/browse/HDFS-14437
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, namenode, qjm
>            Reporter: angerszhu
>            Priority: Major
>
> For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943, I went through the process of writing and flushing the EditLog and some of the important functions. I found that in the FSEditLog class, the close() function calls the following sequence:
>
> {code:java}
> waitForSyncToFinish();
> endCurrentLogSegment(true);{code}
> Since close() holds the object lock, when waitForSyncToFinish() returns, all logSync jobs are done and all data in bufReady has been flushed out. Because the current thread holds the lock of this object, no other thread can acquire the lock while endCurrentLogSegment() runs, so no thread can write new edit log entries into bufCurrent.
> But when we don't call waitForSyncToFinish() before endCurrentLogSegment(), the flush step of an auto-scheduled logSync() may still be in progress, because that step does not need synchronization, as mentioned in the comment of the logSync() method:
>
> {code:java}
> /**
>  * Sync all modifications done by this thread.
>  *
>  * The internal concurrency design of this class is as follows:
>  *   - Log items are written synchronized into an in-memory buffer,
>  *     and each assigned a transaction ID.
>  *   - When a thread (client) would like to sync all of its edits, logSync()
>  *     uses a ThreadLocal transaction ID to determine what edit number must
>  *     be synced to.
>  *   - The isSyncRunning volatile boolean tracks whether a sync is currently
>  *     under progress.
>  *
>  * The data is double-buffered within each edit log implementation so that
>  * in-memory writing can occur in parallel with the on-disk writing.
>  *
>  * Each sync occurs in three steps:
>  *   1. synchronized, it swaps the double buffer and sets the isSyncRunning
>  *      flag.
>  *   2. unsynchronized, it flushes the data to storage
>  *   3. synchronized, it resets the flag and notifies anyone waiting on the
>  *      sync.
>  *
>  * The lack of synchronization on step 2 allows other threads to continue
>  * to write into the memory buffer while the sync is in progress.
>  * Because this step is unsynchronized, actions that need to avoid
>  * concurrency with sync() should be synchronized and also call
>  * waitForSyncToFinish() before assuming they are running alone.
>  */
> public void logSync() {
>   long syncStart = 0;
>   // Fetch the transactionId of this thread.
>   long mytxid = myTransactionId.get().txid;
>
>   boolean sync = false;
>   try {
>     EditLogOutputStream logStream = null;
>     synchronized (this) {
>       try {
>         printStatistics(false);
>         // if somebody is already syncing, then wait
>         while (mytxid > synctxid && isSyncRunning) {
>           try {
>             wait(1000);
>           } catch (InterruptedException ie) {
>           }
>         }
>         //
>         // If this transaction was already flushed, then nothing to do
>         //
>         if (mytxid <= synctxid) {
>           numTransactionsBatchedInSync++;
>           if (metrics != null) {
>             // Metrics is non-null only when used inside name node
>             metrics.incrTransactionsBatchedInSync();
>           }
>           return;
>         }
>
>         // now, this thread will do the sync
>         syncStart = txid;
>         isSyncRunning = true;
>         sync = true;
>         // swap buffers
>         try {
>           if (journalSet.isEmpty()) {
>             throw new IOException("No journals available to flush");
>           }
>           editLogStream.setReadyToFlush();
>         } catch (IOException e) {
>           final String msg =
>               "Could not sync enough journals to persistent storage " +
>               "due to " + e.getMessage() + ". " +
>               "Unsynced transactions: " + (txid - synctxid);
>           LOG.fatal(msg, new Exception());
>           synchronized(journalSetLock) {
>             IOUtils.cleanup(LOG, journalSet);
>           }
>           terminate(1, msg);
>         }
>       } finally {
>         // Prevent RuntimeException from blocking other log edit write
>         doneWithAutoSyncScheduling();
>       }
>       //editLogStream may become null,
>       //
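The three-step design from the logSync() comment, and the reason close() waits before ending the segment, can be sketched as a small model. This is only an illustration under assumptions: EditLogSketch and all of its members are hypothetical, edits are plain strings, and an in-memory list stands in for storage. It is not FSEditLog.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of a double-buffered edit log. */
class EditLogSketch {
  private List<String> bufCurrent = new ArrayList<>();    // receives new edits
  private List<String> bufReady = new ArrayList<>();      // handed to the flusher
  private final List<String> storage = new ArrayList<>(); // stands in for disk
  private boolean isSyncRunning = false;

  /** Writers append under the object lock. */
  public synchronized void logEdit(String op) {
    bufCurrent.add(op);
  }

  public void logSync() {
    List<String> toFlush;
    synchronized (this) {
      waitForSyncToFinish();
      // Step 1 (synchronized): swap the buffers and set the flag.
      List<String> tmp = bufReady;
      bufReady = bufCurrent;
      bufCurrent = tmp;
      toFlush = bufReady;
      isSyncRunning = true;
    }
    try {
      // Step 2 (unsynchronized): flush; writers may keep filling bufCurrent.
      storage.addAll(toFlush);
      toFlush.clear();
    } finally {
      // Step 3 (synchronized): reset the flag and notify waiters.
      synchronized (this) {
        isSyncRunning = false;
        notifyAll();
      }
    }
  }

  /** Blocks until no flush is in flight; wait() releases the lock meanwhile. */
  public synchronized void waitForSyncToFinish() {
    boolean interrupted = false;
    while (isSyncRunning) {
      try {
        wait(1000);
      } catch (InterruptedException ie) {
        interrupted = true; // keep waiting, restore the flag afterwards
      }
    }
    if (interrupted) {
      Thread.currentThread().interrupt();
    }
  }

  /** Ends the segment only after any running flush has finished. */
  public synchronized void close() {
    waitForSyncToFinish();
    storage.addAll(bufCurrent);
    bufCurrent.clear();
  }

  public synchronized List<String> flushed() {
    waitForSyncToFinish();
    return new ArrayList<>(storage);
  }

  public static void main(String[] args) {
    EditLogSketch log = new EditLogSketch();
    log.logEdit("mkdir /a");
    log.logSync();
    log.logEdit("mkdir /b");
    log.close();
    System.out.println(log.flushed()); // prints [mkdir /a, mkdir /b]
  }
}
```

In this model, a variant of close() that skipped waitForSyncToFinish() could run while step 2 of another thread's logSync() is still flushing. That mirrors the situation the description is concerned about.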
[jira] [Updated] (HDFS-14609) RBF: Security should use common AuthenticationFilter
[ https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chen Zhang updated HDFS-14609:
------------------------------
    Attachment: HDFS-14609.006.patch

> RBF: Security should use common AuthenticationFilter
> ----------------------------------------------------
>
>                 Key: HDFS-14609
>                 URL: https://issues.apache.org/jira/browse/HDFS-14609
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: CR Hota
>            Assignee: Chen Zhang
>            Priority: Major
>         Attachments: HDFS-14609.001.patch, HDFS-14609.002.patch, HDFS-14609.003.patch, HDFS-14609.004.patch, HDFS-14609.005.patch, HDFS-14609.006.patch
>
>
> We worked on router based federation security as part of HDFS-13532. We kept it compatible with the way the namenode works. However, with HADOOP-16314 and HDFS-16354 in trunk, the auth filters seem to have been changed, causing tests to fail.
> Corresponding changes are needed in RBF, mainly to fix the broken tests.
[jira] [Commented] (HDFS-14609) RBF: Security should use common AuthenticationFilter
[ https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928138#comment-16928138 ]

Chen Zhang commented on HDFS-14609:
-----------------------------------

Thanks [~eyang] for your response. I uploaded patch v6 to fix the checkstyle error.

> RBF: Security should use common AuthenticationFilter
> ----------------------------------------------------
>
>                 Key: HDFS-14609
>                 URL: https://issues.apache.org/jira/browse/HDFS-14609
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: CR Hota
>            Assignee: Chen Zhang
>            Priority: Major
>         Attachments: HDFS-14609.001.patch, HDFS-14609.002.patch, HDFS-14609.003.patch, HDFS-14609.004.patch, HDFS-14609.005.patch, HDFS-14609.006.patch
>
>
> We worked on router based federation security as part of HDFS-13532. We kept it compatible with the way the namenode works. However, with HADOOP-16314 and HDFS-16354 in trunk, the auth filters seem to have been changed, causing tests to fail.
> Corresponding changes are needed in RBF, mainly to fix the broken tests.
[jira] [Commented] (HDFS-14811) RBF: TestRouterRpc#testErasureCoding is flaky
[ https://issues.apache.org/jira/browse/HDFS-14811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928132#comment-16928132 ]

Chen Zhang commented on HDFS-14811:
-----------------------------------

Actually, I'm considering another improvement: adding a configurable threshold for counting a node as overloaded (e.g. 50). If the xceiverCount is lower than that threshold, the DN will not be counted as an overloaded node. In any case, treating a DN with an xceiverCount of 2 as overloaded is not reasonable.

Do you think this is a better solution? [~elgoiri] [~ayushtkn]

> RBF: TestRouterRpc#testErasureCoding is flaky
> ---------------------------------------------
>
>                 Key: HDFS-14811
>                 URL: https://issues.apache.org/jira/browse/HDFS-14811
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Chen Zhang
>            Assignee: Chen Zhang
>            Priority: Major
>         Attachments: HDFS-14811.001.patch, HDFS-14811.002.patch
>
>
> The failure reason:
> {code:java}
> 2019-09-01 18:19:20,940 [IPC Server handler 5 on default port 53140] INFO blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseRandom(838)) - [
> Node /default-rack/127.0.0.1:53148 [
> ]
> Node /default-rack/127.0.0.1:53161 [
> ]
> Node /default-rack/127.0.0.1:53157 [
> Datanode 127.0.0.1:53157 is not chosen since the node is too busy (load: 3 > 2.6665).
> Node /default-rack/127.0.0.1:53143 [
> ]
> Node /default-rack/127.0.0.1:53165 [
> ]
> 2019-09-01 18:19:20,940 [IPC Server handler 5 on default port 53140] INFO blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseRandom(846)) - Not enough replicas was chosen. Reason: {NODE_TOO_BUSY=1}
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseTarget(449)) - Failed to place enough replicas, still in need of 1 to reach 6 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true)
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN protocol.BlockStoragePolicy (BlockStoragePolicy.java:chooseStorageTypes(161)) - Failed to place enough replicas: expected size is 1 but only 0 storage types can be selected (replication=6, selected=[], unavailable=[DISK], removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseTarget(449)) - Failed to place enough replicas, still in need of 1 to reach 6 (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All required storage types are unavailable: unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] INFO ipc.Server (Server.java:logException(2982)) - IPC Server handler 5 on default port 53140, call Call#1270 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 127.0.0.1:53202
> java.io.IOException: File /testec/testfile2 could only be written to 5 of the 6 required nodes for RS-6-3-1024k. There are 6 datanode(s) running and 6 node(s) are excluded in this operation.
>   at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:)
>   at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2815)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:893)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:574)
>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1001)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:929)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserG
[jira] [Comment Edited] (HDFS-14811) RBF: TestRouterRpc#testErasureCoding is flaky
[ https://issues.apache.org/jira/browse/HDFS-14811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928128#comment-16928128 ]

Chen Zhang edited comment on HDFS-14811 at 9/12/19 1:39 AM:
------------------------------------------------------------

{quote}Let's see if we can fix the count of active threads.
{quote}
[~elgoiri], HDFS-12288 won't help in this case. When a client writes data to 1 DN, the DN starts 2 threads ({{DatanodeXceiver}} and {{PacketResponder}}). After the patch of HDFS-12288, the {{DatanodeXceiverServer}} thread will no longer be included in the {{xceiverCount}} of the heartbeat. In this case, there will be 5 DNs with {{xceiverCount}} equal to 0 and 1 DN with {{xceiverCount}} equal to 2 (which counts as overloaded).


was (Author: zhangchen):
{quote}Let's see if we can fix the count of active threads.
{quote}
[~elgoiri],HDFS-12288 won't help in this case. when some client writing data on 1 DN, it will start 2 threads(\{{DatanodeXceiver}} and {{PacketResponder}}), after the patch of HDFS-12288, {{DatanodeXceiverServer}} will not included in {{xceiverCount}} of heartbeat. In this case, there will be 5 DN with {{xceiverCount}} equals to 0 and 1 DN equals to 2(which is overloaded).

> RBF: TestRouterRpc#testErasureCoding is flaky
> ---------------------------------------------
>
>                 Key: HDFS-14811
>                 URL: https://issues.apache.org/jira/browse/HDFS-14811
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Chen Zhang
>            Assignee: Chen Zhang
>            Priority: Major
>         Attachments: HDFS-14811.001.patch, HDFS-14811.002.patch
>
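The numbers in this thread (5 DNs with an xceiverCount of 0 and 1 DN with 2) can be run through a small model of the load check, together with the minimum threshold proposed earlier in the thread. This is a sketch under assumptions, not BlockPlacementPolicyDefault: the class and method names are hypothetical, a node is taken to be "too busy" when its load exceeds a factor times the cluster average (consistent with the "load: 3 > 2.6665" log line), and the factor of 2.0 and the threshold of 50 are illustrative values.

```java
/** Hypothetical model of the "too busy" check; not the Hadoop placement policy. */
class OverloadCheckSketch {
  /** Simplified existing rule: busy if the load exceeds factor * average load. */
  static boolean isTooBusy(int xceiverCount, double avgLoad, double factor) {
    return xceiverCount > factor * avgLoad;
  }

  /** Proposed variant: nodes below minOverloadedCount are never treated as busy. */
  static boolean isTooBusy(int xceiverCount, double avgLoad, double factor,
      int minOverloadedCount) {
    return xceiverCount >= minOverloadedCount
        && isTooBusy(xceiverCount, avgLoad, factor);
  }

  static double average(int[] loads) {
    double sum = 0;
    for (int load : loads) {
      sum += load;
    }
    return sum / loads.length;
  }

  public static void main(String[] args) {
    int[] loads = {0, 0, 0, 0, 0, 2}; // the scenario described in this thread
    double avg = average(loads);      // 2 / 6, roughly 0.33
    System.out.println(isTooBusy(2, avg, 2.0));     // prints true
    System.out.println(isTooBusy(2, avg, 2.0, 50)); // prints false
  }
}
```

With a purely relative rule, a nearly idle cluster makes a DN with two threads look overloaded. An absolute floor keeps small clusters and test clusters from excluding such nodes.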
[jira] [Commented] (HDFS-14811) RBF: TestRouterRpc#testErasureCoding is flaky
[ https://issues.apache.org/jira/browse/HDFS-14811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928128#comment-16928128 ]

Chen Zhang commented on HDFS-14811:
-----------------------------------

{quote}Let's see if we can fix the count of active threads.
{quote}
[~elgoiri], HDFS-12288 won't help in this case. When a client writes data to 1 DN, the DN starts 2 threads ({{DatanodeXceiver}} and {{PacketResponder}}). After the patch of HDFS-12288, {{DatanodeXceiverServer}} will not be included in the {{xceiverCount}} of the heartbeat. In this case, there will be 5 DNs with {{xceiverCount}} equal to 0 and 1 DN with {{xceiverCount}} equal to 2 (which counts as overloaded).

> RBF: TestRouterRpc#testErasureCoding is flaky
> ---------------------------------------------
>
>                 Key: HDFS-14811
>                 URL: https://issues.apache.org/jira/browse/HDFS-14811
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Chen Zhang
>            Assignee: Chen Zhang
>            Priority: Major
>         Attachments: HDFS-14811.001.patch, HDFS-14811.002.patch
>
[jira] [Commented] (HDFS-12288) Fix DataNode's xceiver count calculation
[ https://issues.apache.org/jira/browse/HDFS-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928125#comment-16928125 ] Chen Zhang commented on HDFS-12288: --- Thanks [~elgoiri] for your response, I'll add more tests to make the patch complete. > Fix DataNode's xceiver count calculation > > > Key: HDFS-12288 > URL: https://issues.apache.org/jira/browse/HDFS-12288 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs >Reporter: Lukas Majercak >Assignee: Chen Zhang >Priority: Major > Attachments: HDFS-12288.001.patch, HDFS-12288.002.patch, > HDFS-12288.003.patch > > > The problem with the ThreadGroup.activeCount() method is that the method is > only a very rough estimate, and in reality returns the total number of > threads in the thread group as opposed to the threads actually running. > In some DNs, we saw this to return 50~ for a long time, even though the > actual number of DataXceiver threads was next to none. > This is a big issue as we use the xceiverCount to make decisions on the NN > for choosing replication source DN or returning DNs to clients for R/W. > The plan is to reuse the DataNodeMetrics.dataNodeActiveXceiversCount value > which only accounts for actual number of DataXcevier threads currently > running and thus represents the load on the DN much better. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
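Editorial note: the way xceiverCount feeds the NameNode's placement decision, and why the HDFS-14811 log reports "too busy (load: 3 > 2.6665)", can be sketched with a little arithmetic. This is a simplified model and not the BlockPlacementPolicyDefault code. It assumes a node is rejected when its load exceeds a factor times the cluster average, and the factor of 2.0 is inferred from the log line rather than read from any configuration. The per-node counts (five DNs reporting 1 and one reporting 3 before the patch, five reporting 0 and one reporting 2 after it) are likewise inferred from the discussion. The class and method names are hypothetical.

```java
import java.util.Arrays;

class LoadCheckSketch {
    // Assumed multiplier, inferred from "load: 3 > 2.6665" with six DataNodes.
    static final double LOAD_FACTOR = 2.0;

    // Threshold above which a node is treated as too busy: factor * average load.
    static double threshold(int[] xceiverCounts) {
        double average = Arrays.stream(xceiverCounts).average().orElse(0.0);
        return LOAD_FACTOR * average;
    }

    // True when the given node load exceeds the cluster-wide threshold.
    static boolean isTooBusy(int nodeLoad, int[] xceiverCounts) {
        return nodeLoad > threshold(xceiverCounts);
    }

    public static void main(String[] args) {
        // Before HDFS-12288 (assumed): every DN counts one extra server thread.
        int[] before = {1, 1, 1, 1, 1, 3};
        // After HDFS-12288 (assumed): only the two writer threads are counted.
        int[] after = {0, 0, 0, 0, 0, 2};

        System.out.println("before: busy=" + isTooBusy(3, before)
            + " threshold=" + threshold(before));
        System.out.println("after:  busy=" + isTooBusy(2, after)
            + " threshold=" + threshold(after));
    }
}
```

Under these assumptions the writing DN is rejected in both cases, because lowering every count by one also lowers the average, which is the point made in the HDFS-14811 comment.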
[jira] [Updated] (HDDS-2106) Avoid usage of hadoop projects as parent of hdds/ozone
[ https://issues.apache.org/jira/browse/HDDS-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton updated HDDS-2106: --- Resolution: Fixed Status: Resolved (was: Patch Available) > Avoid usage of hadoop projects as parent of hdds/ozone > -- > > Key: HDDS-2106 > URL: https://issues.apache.org/jira/browse/HDDS-2106 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Blocker > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Ozone uses hadoop as a dependency. The dependency defined on multiple level: > 1. the hadoop artifacts are defined in the sections > 2. both hadoop-ozone and hadoop-hdds projects uses "hadoop-project" as the > parent > As we already have a slightly different assembly process it could be more > resilient to use a dedicated parent project instead of the hadoop one. With > this approach it will be easier to upgrade the versions as we don't need to > be careful about the pom contents only about the used dependencies. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2089) Add CLI createPipeline
[ https://issues.apache.org/jira/browse/HDDS-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDDS-2089: - Status: Patch Available (was: Open) > Add CLI createPipeline > -- > > Key: HDDS-2089 > URL: https://issues.apache.org/jira/browse/HDDS-2089 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone CLI >Affects Versions: 0.5.0 >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Add a SCMCLI to create pipeline for ozone. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2106) Avoid usage of hadoop projects as parent of hdds/ozone
[ https://issues.apache.org/jira/browse/HDDS-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928122#comment-16928122 ] Hudson commented on HDDS-2106: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17279 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17279/]) HDDS-2106. Avoid usage of hadoop projects as parent of hdds/ozone (elek: rev f537410563e3966581442acc77f7d9f7fd95e3e5) * (edit) pom.ozone.xml * (edit) hadoop-ozone/pom.xml * (edit) hadoop-hdds/pom.xml > Avoid usage of hadoop projects as parent of hdds/ozone > -- > > Key: HDDS-2106 > URL: https://issues.apache.org/jira/browse/HDDS-2106 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Blocker > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Ozone uses hadoop as a dependency. The dependency defined on multiple level: > 1. the hadoop artifacts are defined in the sections > 2. both hadoop-ozone and hadoop-hdds projects uses "hadoop-project" as the > parent > As we already have a slightly different assembly process it could be more > resilient to use a dedicated parent project instead of the hadoop one. With > this approach it will be easier to upgrade the versions as we don't need to > be careful about the pom contents only about the used dependencies. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2106) Avoid usage of hadoop projects as parent of hdds/ozone
[ https://issues.apache.org/jira/browse/HDDS-2106?focusedWorklogId=311108&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-311108 ] ASF GitHub Bot logged work on HDDS-2106: Author: ASF GitHub Bot Created on: 12/Sep/19 01:23 Start Date: 12/Sep/19 01:23 Worklog Time Spent: 10m Work Description: elek commented on pull request #1423: HDDS-2106. Avoid usage of hadoop projects as parent of hdds/ozone URL: https://github.com/apache/hadoop/pull/1423 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 311108) Time Spent: 0.5h (was: 20m) > Avoid usage of hadoop projects as parent of hdds/ozone > -- > > Key: HDDS-2106 > URL: https://issues.apache.org/jira/browse/HDDS-2106 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Blocker > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Ozone uses hadoop as a dependency. The dependency defined on multiple level: > 1. the hadoop artifacts are defined in the sections > 2. both hadoop-ozone and hadoop-hdds projects uses "hadoop-project" as the > parent > As we already have a slightly different assembly process it could be more > resilient to use a dedicated parent project instead of the hadoop one. With > this approach it will be easier to upgrade the versions as we don't need to > be careful about the pom contents only about the used dependencies. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14845) Request is a replay (34) error in httpfs
[ https://issues.apache.org/jira/browse/HDFS-14845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928114#comment-16928114 ] Akira Ajisaka commented on HDFS-14845: -- Thanks [~Prabhu Joseph]! 1. "org.apache.hadoop.security.AuthenticationFilterInitializer,org.apache.hadoop.http.lib.StaticUserWebFilter,org.apache.hadoop.security.HttpCrossOriginFilterInitializer" 2. https://gist.github.com/aajisaka/82d08ec02aa73bdabfbdad1e9723898d > Request is a replay (34) error in httpfs > > > Key: HDFS-14845 > URL: https://issues.apache.org/jira/browse/HDFS-14845 > Project: Hadoop HDFS > Issue Type: Bug > Components: httpfs >Affects Versions: 3.3.0 > Environment: Kerberos and ZKDelgationTokenSecretManager enabled in > HttpFS >Reporter: Akira Ajisaka >Priority: Critical > > We are facing "Request is a replay (34)" error when accessing to HDFS via > httpfs on trunk. > {noformat} > % curl -i --negotiate -u : "https://:4443/webhdfs/v1/?op=liststatus" > HTTP/1.1 401 Authentication required > Date: Mon, 09 Sep 2019 06:00:04 GMT > Date: Mon, 09 Sep 2019 06:00:04 GMT > Pragma: no-cache > X-Content-Type-Options: nosniff > X-XSS-Protection: 1; mode=block > WWW-Authenticate: Negotiate > Set-Cookie: hadoop.auth=; Path=/; Secure; HttpOnly > Cache-Control: must-revalidate,no-cache,no-store > Content-Type: text/html;charset=iso-8859-1 > Content-Length: 271 > HTTP/1.1 403 GSSException: Failure unspecified at GSS-API level (Mechanism > level: Request is a replay (34)) > Date: Mon, 09 Sep 2019 06:00:04 GMT > Date: Mon, 09 Sep 2019 06:00:04 GMT > Pragma: no-cache > X-Content-Type-Options: nosniff > X-XSS-Protection: 1; mode=block > (snip) > Set-Cookie: hadoop.auth=; Path=/; Secure; HttpOnly > Cache-Control: must-revalidate,no-cache,no-store > Content-Type: text/html;charset=iso-8859-1 > Content-Length: 413 > > > > Error 403 GSSException: Failure unspecified at GSS-API level > (Mechanism level: Request is a replay (34)) > > HTTP ERROR 403 > Problem accessing 
/webhdfs/v1/. Reason: > GSSException: Failure unspecified at GSS-API level (Mechanism level: > Request is a replay (34)) > > > {noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14778) BlockManager findAndMarkBlockAsCorrupt adds block to the map if the Storage state is failed
[ https://issues.apache.org/jira/browse/HDFS-14778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928109#comment-16928109 ] Wei-Chiu Chuang commented on HDFS-14778: It would be really nice if you can update the test (TestFileCorruption#testCorruptionWithDiskFailure) added by HDFS-9958 to verify the fix. How about using storage.areBlocksOnFailedStorage() to check? > BlockManager findAndMarkBlockAsCorrupt adds block to the map if the Storage > state is failed > --- > > Key: HDFS-14778 > URL: https://issues.apache.org/jira/browse/HDFS-14778 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Major > Attachments: HDFS-14778.001.patch > > > Should not mark the block as corrupt if the storage state is failed -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2106) Avoid usage of hadoop projects as parent of hdds/ozone
[ https://issues.apache.org/jira/browse/HDDS-2106?focusedWorklogId=311098&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-311098 ] ASF GitHub Bot logged work on HDDS-2106: Author: ASF GitHub Bot Created on: 12/Sep/19 00:40 Start Date: 12/Sep/19 00:40 Worklog Time Spent: 10m Work Description: elek commented on issue #1423: HDDS-2106. Avoid usage of hadoop projects as parent of hdds/ozone URL: https://github.com/apache/hadoop/pull/1423#issuecomment-530617911 Thanks @arp7 and @adoroszlai for the review. There are no more warnings (missing parts are also migrated from pom.xml) and the integration test failures are not related. Will merge it soon. For the record: this is just the first step. As we have this brand new parent pom, later we can simplify hadoop-ozone/pom.xml and hadoop-hdds/pom.xml, as many common parts can be moved to pom.ozone.xml. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 311098) Time Spent: 20m (was: 10m) > Avoid usage of hadoop projects as parent of hdds/ozone > -- > > Key: HDDS-2106 > URL: https://issues.apache.org/jira/browse/HDDS-2106 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Blocker > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Ozone uses hadoop as a dependency. The dependency defined on multiple level: > 1. the hadoop artifacts are defined in the sections > 2. both hadoop-ozone and hadoop-hdds projects uses "hadoop-project" as the > parent > As we already have a slightly different assembly process it could be more > resilient to use a dedicated parent project instead of the hadoop one. 
With > this approach it will be easier to upgrade the versions as we don't need to > be careful about the pom contents only about the used dependencies. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2111) DOM XSS
[ https://issues.apache.org/jira/browse/HDDS-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Chitlangia updated HDDS-2111: Component/s: S3 > DOM XSS > --- > > Key: HDDS-2111 > URL: https://issues.apache.org/jira/browse/HDDS-2111 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: S3 >Reporter: Aayush >Priority: Major > > VULNERABILITY DETAILS > There is a way to bypass anti-XSS filter for DOM XSS exploiting a > "window.location.href". > Considering a typical URL: > scheme://domain:port/path?query_string#fragment_id > Browsers encode correctly both "path" and "query_string", but not the > "fragment_id". > So if used "fragment_id" the vector is also not logged on Web Server. > VERSION > Chrome Version: 10.0.648.134 (Official Build 77917) beta > REPRODUCTION CASE > This is an index.html page: > {code:java} > aws s3api --endpoint > document.write(window.location.href.replace("static/", "")) > create-bucket --bucket=wordcount > {code} > The attack vector is: > index.html?#alert('XSS'); > * PoC: > For your convenience, a minimalist PoC is located on: > http://security.onofri.org/xss_location.html?#alert('XSS'); > * References > - DOM Based Cross-Site Scripting or XSS of the Third Kind - > http://www.webappsec.org/projects/articles/071105.shtml > reference:- > https://bugs.chromium.org/p/chromium/issues/detail?id=76796 -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14811) RBF: TestRouterRpc#testErasureCoding is flaky
[ https://issues.apache.org/jira/browse/HDFS-14811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928083#comment-16928083 ] Íñigo Goiri commented on HDFS-14811: Let's see if we can fix the count of active threads. > RBF: TestRouterRpc#testErasureCoding is flaky > - > > Key: HDFS-14811 > URL: https://issues.apache.org/jira/browse/HDFS-14811 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Chen Zhang >Assignee: Chen Zhang >Priority: Major > Attachments: HDFS-14811.001.patch, HDFS-14811.002.patch > > > The Failed reason: > {code:java} > 2019-09-01 18:19:20,940 [IPC Server handler 5 on default port 53140] INFO > blockmanagement.BlockPlacementPolicy > (BlockPlacementPolicyDefault.java:chooseRandom(838)) - [ > Node /default-rack/127.0.0.1:53148 [ > ] > Node /default-rack/127.0.0.1:53161 [ > ] > Node /default-rack/127.0.0.1:53157 [ > Datanode 127.0.0.1:53157 is not chosen since the node is too busy (load: 3 > > 2.6665). > Node /default-rack/127.0.0.1:53143 [ > ] > Node /default-rack/127.0.0.1:53165 [ > ] > 2019-09-01 18:19:20,940 [IPC Server handler 5 on default port 53140] INFO > blockmanagement.BlockPlacementPolicy > (BlockPlacementPolicyDefault.java:chooseRandom(846)) - Not enough replicas > was chosen. 
Reason: {NODE_TOO_BUSY=1} > 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN > blockmanagement.BlockPlacementPolicy > (BlockPlacementPolicyDefault.java:chooseTarget(449)) - Failed to place enough > replicas, still in need of 1 to reach 6 (unavailableStorages=[], > storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], > creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) > 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN > protocol.BlockStoragePolicy (BlockStoragePolicy.java:chooseStorageTypes(161)) > - Failed to place enough replicas: expected size is 1 but only 0 storage > types can be selected (replication=6, selected=[], unavailable=[DISK], > removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], > creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN > blockmanagement.BlockPlacementPolicy > (BlockPlacementPolicyDefault.java:chooseTarget(449)) - Failed to place enough > replicas, still in need of 1 to reach 6 (unavailableStorages=[DISK], > storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], > creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All > required storage types are unavailable: unavailableStorages=[DISK], > storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], > creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] INFO > ipc.Server (Server.java:logException(2982)) - IPC Server handler 5 on default > port 53140, call Call#1270 Retry#0 > org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 127.0.0.1:53202 > java.io.IOException: File /testec/testfile2 could only be written to 5 of the > 6 required nodes for RS-6-3-1024k. There are 6 datanode(s) running and 6 > node(s) are excluded in this operation. 
> at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:) > at > org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2815) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:893) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:574) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1001) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:929) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2921) > 2019-09-01 18:19:20,942 [IPC Server handler 6 on default port 53197] INFO > ipc.Server (Server.java:logException(2975)) - IPC Server handler 6 on default > port 53197, call Call#1
[jira] [Commented] (HDFS-14795) Add Throttler for writing block
[ https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928082#comment-16928082 ] Íñigo Goiri commented on HDFS-14795: Now I see where the documentation description comes from. I would try to extend both descriptions to make them a little more specific about what a transfer and a write mean. > Add Throttler for writing block > --- > > Key: HDFS-14795 > URL: https://issues.apache.org/jira/browse/HDFS-14795 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Minor > Attachments: HDFS-14795.001.patch, HDFS-14795.002.patch, > HDFS-14795.003.patch, HDFS-14795.004.patch, HDFS-14795.005.patch, > HDFS-14795.006.patch, HDFS-14795.007.patch, HDFS-14795.008.patch, > HDFS-14795.009.patch, HDFS-14795.010.patch > > > DataXceiver#writeBlock > {code:java} > blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut, > mirrorAddr, null, targets, false); > {code} > As above code, DataXceiver#writeBlock doesn't throttler. > I think it is necessary to throttle for writing block, while add throttler > in stage of PIPELINE_SETUP_APPEND_RECOVERY or > PIPELINE_SETUP_STREAMING_RECOVERY. > Default throttler value is still null.
[jira] [Commented] (HDFS-12288) Fix DataNode's xceiver count calculation
[ https://issues.apache.org/jira/browse/HDFS-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928080#comment-16928080 ] Íñigo Goiri commented on HDFS-12288: Changing the variable within HeartbeatRequestProto might be too dangerous. Let's keep the old naming for backwards compatibility. Regarding the new variables for tracking the threads, I think it makes sense. > Fix DataNode's xceiver count calculation > > > Key: HDFS-12288 > URL: https://issues.apache.org/jira/browse/HDFS-12288 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs >Reporter: Lukas Majercak >Assignee: Chen Zhang >Priority: Major > Attachments: HDFS-12288.001.patch, HDFS-12288.002.patch, > HDFS-12288.003.patch > > > The problem with the ThreadGroup.activeCount() method is that the method is > only a very rough estimate, and in reality returns the total number of > threads in the thread group as opposed to the threads actually running. > In some DNs, we saw this to return 50~ for a long time, even though the > actual number of DataXceiver threads was next to none. > This is a big issue as we use the xceiverCount to make decisions on the NN > for choosing replication source DN or returning DNs to clients for R/W. > The plan is to reuse the DataNodeMetrics.dataNodeActiveXceiversCount value > which only accounts for actual number of DataXcevier threads currently > running and thus represents the load on the DN much better. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
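Editorial note: the idea of tracking running workers explicitly, rather than relying on ThreadGroup.activeCount() (which the JDK documents as an estimate), can be sketched as follows. This is not the HDFS-12288 patch and does not use DataNodeMetrics. ActiveWorkerCounter and its methods are hypothetical names for illustration only.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class ActiveWorkerCounter {
    private final AtomicInteger active = new AtomicInteger();

    // Wraps a task so the counter reflects only tasks that are actually running.
    Runnable track(Runnable work) {
        return () -> {
            active.incrementAndGet();
            try {
                work.run();
            } finally {
                // Always decrement, even if the task throws.
                active.decrementAndGet();
            }
        };
    }

    int activeCount() {
        return active.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ActiveWorkerCounter counter = new ActiveWorkerCounter();
        CountDownLatch started = new CountDownLatch(3);
        CountDownLatch release = new CountDownLatch(1);

        Thread[] workers = new Thread[3];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(counter.track(() -> {
                started.countDown();
                try {
                    release.await(); // simulate a long-running transfer
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
            workers[i].start();
        }

        started.await(); // all three workers have incremented the counter
        System.out.println("running: " + counter.activeCount());

        release.countDown();
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("after join: " + counter.activeCount());
    }
}
```

A counter like this can be reported under the existing field name, which fits the suggestion above to keep the old naming in HeartbeatRequestProto for backwards compatibility.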
[jira] [Updated] (HDDS-2114) Rename does not preserve non-explicitly created interim directories
[ https://issues.apache.org/jira/browse/HDDS-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Istvan Fajth updated HDDS-2114: --- Attachment: demonstrative_test.patch > Rename does not preserve non-explicitly created interim directories > --- > > Key: HDDS-2114 > URL: https://issues.apache.org/jira/browse/HDDS-2114 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Istvan Fajth >Priority: Critical > Attachments: demonstrative_test.patch > > > I am attaching a patch that adds a test that demonstrates the problem. > The scenario is coming from the way how Hive implements acid transactions > with the ORC table format, but the test is redacted to the simplest possible > code that reproduces the issue. > The scenario: > * Given a 3 level directory structure, where the top level directory was > explicitly created, and the interim directory is implicitly created (for > example either by creating a file with create("/top/interim/file") or by > creating a directory with mkdirs("top/interim/dir")) > * When the leaf is moved out from the implicitly created directory making > this directory an empty directory > * Then a FileNotFoundException is thrown when getFileStatus or listStatus is > called on the interim directory. > The expected behaviour: > after the directory is becoming empty, the directory should still be part of > the file system, moreover an empty FileStatus array should be returned when > listStatus is called on it, and also a valid FileStatus object should be > returned when getFileStatus is called on it. > > > As this issue is present with Hive, and as this is how a FileSystem is > expected to work this seems to be an at least critical issue as I see, please > feel free to change the priority if needed. > Also please note that, if the interim directory is explicitly created with > mkdirs("top/interim") before creating the leaf, then the issue does not > appear. 
[jira] [Created] (HDDS-2114) Rename does not preserve non-explicitly created interim directories
Istvan Fajth created HDDS-2114: -- Summary: Rename does not preserve non-explicitly created interim directories Key: HDDS-2114 URL: https://issues.apache.org/jira/browse/HDDS-2114 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Istvan Fajth Attachments: demonstrative_test.patch I am attaching a patch that adds a test that demonstrates the problem. The scenario comes from the way Hive implements ACID transactions with the ORC table format, but the test is reduced to the simplest possible code that reproduces the issue. The scenario: * Given a 3-level directory structure, where the top-level directory was explicitly created and the interim directory is implicitly created (for example either by creating a file with create("/top/interim/file") or by creating a directory with mkdirs("top/interim/dir")) * When the leaf is moved out of the implicitly created directory, leaving that directory empty * Then a FileNotFoundException is thrown when getFileStatus or listStatus is called on the interim directory. The expected behaviour: after the directory becomes empty, it should still be part of the file system; an empty FileStatus array should be returned when listStatus is called on it, and a valid FileStatus object should be returned when getFileStatus is called on it. As this issue is present with Hive, and as this is how a FileSystem is expected to work, this seems to be at least a critical issue as I see it; please feel free to change the priority if needed. Also please note that if the interim directory is explicitly created with mkdirs("top/interim") before creating the leaf, the issue does not appear.
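Editorial note: the expected behaviour described above can be illustrated on a local file system with java.nio.file. This is only an analogy: it does not use the Hadoop FileSystem API or the attached demonstrative_test.patch, and a local file system has no notion of implicitly created directories, so createDirectories stands in for the implicit creation. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class InterimDirSketch {
    // Builds top/interim/file, moves the leaf out, and returns how many
    // entries remain in the interim directory.
    static long childrenLeftAfterMove() {
        try {
            Path root = Files.createTempDirectory("interim-sketch");
            Path top = Files.createDirectory(root.resolve("top"));
            Path interim = top.resolve("interim");
            Path leaf = interim.resolve("file");

            // Stand-in for create("/top/interim/file") creating the parent implicitly.
            Files.createDirectories(interim);
            Files.createFile(leaf);

            // Move the only child out, leaving the interim directory empty.
            Files.move(leaf, top.resolve("file"));

            // Expected: the directory still exists and lists as empty.
            if (!Files.isDirectory(interim)) {
                throw new IllegalStateException("interim directory disappeared");
            }
            try (Stream<Path> entries = Files.list(interim)) {
                return entries.count();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("entries left in interim: " + childrenLeftAfterMove());
    }
}
```

The report asks for the same outcome from the file system in question: listing the emptied interim directory should succeed with no entries rather than fail with FileNotFoundException.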
[jira] [Commented] (HDFS-14832) RBF : Add Icon for ReadOnly False
[ https://issues.apache.org/jira/browse/HDFS-14832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928071#comment-16928071 ] Íñigo Goiri commented on HDFS-14832: I would let [~tasanuma] chime in. > RBF : Add Icon for ReadOnly False > - > > Key: HDFS-14832 > URL: https://issues.apache.org/jira/browse/HDFS-14832 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Minor > > In Router Web UI for Mount Table information , add icon for read only state > false
[jira] [Commented] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down
[ https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928036#comment-16928036 ] Hadoop QA commented on HDFS-14655: --

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 43s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 18m 34s | trunk passed |
| +1 | compile | 0m 55s | trunk passed |
| +1 | checkstyle | 0m 49s | trunk passed |
| +1 | mvnsite | 1m 3s | trunk passed |
| +1 | shadedclient | 13m 21s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 1s | trunk passed |
| +1 | javadoc | 0m 51s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 59s | the patch passed |
| +1 | compile | 0m 52s | the patch passed |
| +1 | javac | 0m 52s | the patch passed |
| +1 | checkstyle | 0m 43s | the patch passed |
| +1 | mvnsite | 0m 58s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 2s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 12m 43s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 15s | the patch passed |
| +1 | javadoc | 0m 51s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 105m 38s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 30s | The patch does not generate ASF License warnings. |
| | | 163m 44s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS |
| | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.0 Server=19.03.0 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-14655 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12980112/HDFS-14655-04.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 5e1a891a2b03 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 64ed6b1 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/27847/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results
[jira] [Commented] (HDDS-2112) Ozone rename is behaving different compared with HDFS
[ https://issues.apache.org/jira/browse/HDDS-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928007#comment-16928007 ] Istvan Fajth commented on HDDS-2112: Great, [~ste...@apache.org]. I was unsure as well, and I did not check the contract tests, but I completely agree that it is tedious; it would also have spared me a couple of hours if rename threw an exception in these situations. Thank you for the clarification and for closing the ticket! > Ozone rename is behaving different compared with HDFS > - > > Key: HDDS-2112 > URL: https://issues.apache.org/jira/browse/HDDS-2112 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Reporter: Istvan Fajth >Priority: Major > Attachments: demonstrative_test.patch > > > I am attaching a patch file, that introduces two new tests for the > OzoneFileSystem implementation which demonstrates the expected behaviour. > Case 1: > Given a directory a file "/source/subdir/file", and a directory /target > When fs.rename("/source/subdir/file", "/target/subdir/file") is called > Then DistributedFileSystem (HDFS), is returning false from the method, while > OzoneFileSystem throws a FileNotFoundException as "/target/subdir" is not > existing. > The expected behaviour would be to return false in this case instead of > throwing an exception with that behave the same as DistributedFileSystem does. > > Case 2: > Given a directory "/source" and a file "/targetFile" > When fs.rename("/source", "/targetFile") is called > Then DistributedFileSystem (HDFS), is returning false from the method, while > OzoneFileSystem throws a FileAlreadyExistsException as "/targetFile" does > exist. > The expected behaviour would be to return false in this case instead of > throwing an exception with that behave the same as DistributedFileSystem does. 
> > It may be considered as well a bug in HDFS, however it is not clear from the > FileSystem interface's documentation on the two rename methods that it > defines in which cases an exception should be thrown and in which cases a > return false is the expected behaviour. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2096) Ozone ACL document missing AddAcl API
[ https://issues.apache.org/jira/browse/HDDS-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDDS-2096: - Status: Patch Available (was: Open) > Ozone ACL document missing AddAcl API > - > > Key: HDDS-2096 > URL: https://issues.apache.org/jira/browse/HDDS-2096 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Current Ozone Native ACL APIs document looks like below, the AddAcl is > missing. > > h3. Ozone Native ACL APIs > The ACLs can be manipulated by a set of APIs supported by Ozone. The APIs > supported are: > # *SetAcl* – This API will take user principal, the name, type of the ozone > object and a list of ACLs. > # *GetAcl* – This API will take the name and type of the ozone object and > will return a list of ACLs. > # *RemoveAcl* - This API will take the name, type of the ozone object and > the ACL that has to be removed.
[jira] [Work logged] (HDDS-2076) Read fails because the block cannot be located in the container
[ https://issues.apache.org/jira/browse/HDDS-2076?focusedWorklogId=310988&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310988 ] ASF GitHub Bot logged work on HDDS-2076: Author: ASF GitHub Bot Created on: 11/Sep/19 20:41 Start Date: 11/Sep/19 20:41 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on issue #1410: HDDS-2076. Read fails because the block cannot be located in the container URL: https://github.com/apache/hadoop/pull/1410#issuecomment-530556754 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 82 | Docker mode activated. | ||| _ Prechecks _ | | +1 | dupname | 0 | No case conflicting files found. | | +1 | @author | 0 | The patch does not contain any @author tags. | | +1 | test4tests | 0 | The patch appears to include 1 new or modified test files. | ||| _ trunk Compile Tests _ | | 0 | mvndep | 32 | Maven dependency ordering for branch | | +1 | mvninstall | 660 | trunk passed | | +1 | compile | 387 | trunk passed | | +1 | checkstyle | 79 | trunk passed | | +1 | mvnsite | 0 | trunk passed | | +1 | shadedclient | 967 | branch has no errors when building and testing our client artifacts. | | +1 | javadoc | 175 | trunk passed | | 0 | spotbugs | 494 | Used deprecated FindBugs config; considering switching to SpotBugs. | | +1 | findbugs | 715 | trunk passed | | -0 | patch | 562 | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. | ||| _ Patch Compile Tests _ | | 0 | mvndep | 45 | Maven dependency ordering for patch | | +1 | mvninstall | 715 | the patch passed | | +1 | compile | 475 | the patch passed | | +1 | javac | 475 | the patch passed | | +1 | checkstyle | 103 | the patch passed | | +1 | mvnsite | 0 | the patch passed | | +1 | whitespace | 0 | The patch has no whitespace issues. | | +1 | shadedclient | 885 | patch has no errors when building and testing our client artifacts. 
| | +1 | javadoc | 221 | the patch passed | | +1 | findbugs | 846 | the patch passed | ||| _ Other Tests _ | | -1 | unit | 250 | hadoop-hdds in the patch failed. | | -1 | unit | 2796 | hadoop-ozone in the patch failed. | | +1 | asflicense | 70 | The patch does not generate ASF License warnings. | | | | 9721 | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.ozone.container.keyvalue.TestKeyValueContainer | | | hadoop.ozone.container.ozoneimpl.TestOzoneContainer | | | hadoop.ozone.client.rpc.TestReadRetries | | | hadoop.ozone.client.rpc.Test2WayCommitInRatis | | | hadoop.ozone.client.rpc.TestDeleteWithSlowFollower | | | hadoop.ozone.container.common.statemachine.commandhandler.TestCloseContainerByPipeline | | | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption | | | hadoop.ozone.container.common.statemachine.commandhandler.TestDeleteContainerHandler | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=19.03.2 Server=19.03.2 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1410/4/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/1410 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 98e5c8d82f67 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / 9221704 | | Default Java | 1.8.0_212 | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1410/4/artifact/out/patch-unit-hadoop-hdds.txt | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1410/4/artifact/out/patch-unit-hadoop-ozone.txt | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1410/4/testReport/ | | Max. process+thread count | 4356 (vs. ulimit of 5500) | | modules | C: hadoop-hdds/container-service hadoop-ozone/integration-test U: . 
| | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1410/4/console | | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 | | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org | This message was automatically generated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking ---
[jira] [Work logged] (HDDS-1982) Extend SCMNodeManager to support decommission and maintenance states
[ https://issues.apache.org/jira/browse/HDDS-1982?focusedWorklogId=310984&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310984 ] ASF GitHub Bot logged work on HDDS-1982: Author: ASF GitHub Bot Created on: 11/Sep/19 20:38 Start Date: 11/Sep/19 20:38 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on issue #1344: HDDS-1982 Extend SCMNodeManager to support decommission and maintenance states URL: https://github.com/apache/hadoop/pull/1344#issuecomment-530555681 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 82 | Docker mode activated. | ||| _ Prechecks _ | | +1 | dupname | 1 | No case conflicting files found. | | +1 | @author | 0 | The patch does not contain any @author tags. | | +1 | test4tests | 0 | The patch appears to include 15 new or modified test files. | ||| _ trunk Compile Tests _ | | 0 | mvndep | 80 | Maven dependency ordering for branch | | +1 | mvninstall | 639 | trunk passed | | +1 | compile | 397 | trunk passed | | +1 | checkstyle | 76 | trunk passed | | +1 | mvnsite | 0 | trunk passed | | +1 | shadedclient | 981 | branch has no errors when building and testing our client artifacts. | | +1 | javadoc | 234 | trunk passed | | 0 | spotbugs | 531 | Used deprecated FindBugs config; considering switching to SpotBugs. | | +1 | findbugs | 786 | trunk passed | ||| _ Patch Compile Tests _ | | 0 | mvndep | 44 | Maven dependency ordering for patch | | +1 | mvninstall | 723 | the patch passed | | +1 | compile | 467 | the patch passed | | +1 | cc | 467 | the patch passed | | +1 | javac | 467 | the patch passed | | +1 | checkstyle | 99 | the patch passed | | +1 | mvnsite | 0 | the patch passed | | +1 | whitespace | 0 | The patch has no whitespace issues. | | +1 | shadedclient | 876 | patch has no errors when building and testing our client artifacts. 
| | -1 | javadoc | 95 | hadoop-hdds generated 1 new + 16 unchanged - 0 fixed = 17 total (was 16) | | +1 | findbugs | 788 | the patch passed | ||| _ Other Tests _ | | +1 | unit | 418 | hadoop-hdds in the patch passed. | | -1 | unit | 334 | hadoop-ozone in the patch failed. | | +1 | asflicense | 60 | The patch does not generate ASF License warnings. | | | | 7525 | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=19.03.2 Server=19.03.2 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/6/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/1344 | | Optional Tests | dupname asflicense compile cc mvnsite javac unit javadoc mvninstall shadedclient findbugs checkstyle | | uname | Linux 425bc211517e 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / 9221704 | | Default Java | 1.8.0_222 | | javadoc | https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/6/artifact/out/diff-javadoc-javadoc-hadoop-hdds.txt | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/6/artifact/out/patch-unit-hadoop-ozone.txt | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/6/testReport/ | | Max. process+thread count | 426 (vs. ulimit of 5500) | | modules | C: hadoop-hdds/common hadoop-hdds/server-scm hadoop-hdds/tools hadoop-ozone/integration-test U: . | | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/6/console | | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 | | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org | This message was automatically generated. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 310984) Time Spent: 5h 50m (was: 5h 40m) > Extend SCMNodeManager to support decommission and maintenance states > > > Key: HDDS-1982 > URL: https://issues.apache.org/jira/browse/HDDS-1982 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Stephen O'Donnell >Assignee: Ste
[jira] [Work logged] (HDDS-1786) Datanodes takeSnapshot should delete previously created snapshots
[ https://issues.apache.org/jira/browse/HDDS-1786?focusedWorklogId=310974&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310974 ] ASF GitHub Bot logged work on HDDS-1786: Author: ASF GitHub Bot Created on: 11/Sep/19 20:21 Start Date: 11/Sep/19 20:21 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on issue #1163: HDDS-1786 : Datanodes takeSnapshot should delete previously created s… URL: https://github.com/apache/hadoop/pull/1163#issuecomment-530549591 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 40 | Docker mode activated. | ||| _ Prechecks _ | | +1 | dupname | 0 | No case conflicting files found. | | +1 | @author | 0 | The patch does not contain any @author tags. | | +1 | test4tests | 0 | The patch appears to include 1 new or modified test files. | ||| _ trunk Compile Tests _ | | 0 | mvndep | 23 | Maven dependency ordering for branch | | +1 | mvninstall | 591 | trunk passed | | +1 | compile | 393 | trunk passed | | +1 | checkstyle | 77 | trunk passed | | +1 | mvnsite | 0 | trunk passed | | +1 | shadedclient | 843 | branch has no errors when building and testing our client artifacts. | | +1 | javadoc | 177 | trunk passed | | 0 | spotbugs | 431 | Used deprecated FindBugs config; considering switching to SpotBugs. | | +1 | findbugs | 634 | trunk passed | | -0 | patch | 482 | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. | ||| _ Patch Compile Tests _ | | 0 | mvndep | 38 | Maven dependency ordering for patch | | +1 | mvninstall | 547 | the patch passed | | +1 | compile | 392 | the patch passed | | +1 | javac | 392 | the patch passed | | +1 | checkstyle | 85 | the patch passed | | +1 | mvnsite | 0 | the patch passed | | +1 | whitespace | 0 | The patch has no whitespace issues. | | +1 | shadedclient | 659 | patch has no errors when building and testing our client artifacts. 
| | +1 | javadoc | 173 | the patch passed | | +1 | findbugs | 699 | the patch passed | ||| _ Other Tests _ | | +1 | unit | 298 | hadoop-hdds in the patch passed. | | -1 | unit | 2644 | hadoop-ozone in the patch failed. | | +1 | asflicense | 51 | The patch does not generate ASF License warnings. | | | | 8546 | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion | | | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures | | | hadoop.ozone.om.TestOMRatisSnapshots | | | hadoop.ozone.TestSecureOzoneCluster | | | hadoop.ozone.scm.TestContainerSmallFile | | | hadoop.ozone.client.rpc.TestBlockOutputStream | | | hadoop.ozone.om.TestOzoneManagerHA | | | hadoop.ozone.om.TestOmAcls | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/11/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/1163 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux d7a4f1cd6f52 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / 9221704 | | Default Java | 1.8.0_222 | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/11/artifact/out/patch-unit-hadoop-ozone.txt | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/11/testReport/ | | Max. process+thread count | 5118 (vs. ulimit of 5500) | | modules | C: hadoop-hdds/container-service hadoop-ozone/integration-test U: . | | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/11/console | | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 | | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org | This message was automatically generated. 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 310974) Time Spent: 4.5h (was: 4h 20m) > Datanodes takeSnapshot should delete previously created snapshots > - > > Key:
[jira] [Work logged] (HDDS-2096) Ozone ACL document missing AddAcl API
[ https://issues.apache.org/jira/browse/HDDS-2096?focusedWorklogId=310967&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310967 ] ASF GitHub Bot logged work on HDDS-2096: Author: ASF GitHub Bot Created on: 11/Sep/19 20:12 Start Date: 11/Sep/19 20:12 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #1427: HDDS-2096. Ozone ACL document missing AddAcl API. Contributed by Xiao… URL: https://github.com/apache/hadoop/pull/1427#discussion_r323436876 ## File path: hadoop-hdds/docs/content/security/SecurityAcls.md ## @@ -78,5 +78,7 @@ supported are: of the ozone object and a list of ACLs. 2. **GetAcl** – This API will take the name and type of the ozone object and will return a list of ACLs. -3. **RemoveAcl** - This API will take the name, type of the +3. **AddAcl** - This API will take the name, type of the ozone object, the +ACL, and add it to existing ACL entries of the ozone object. Review comment: whitespace:end of line This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 310967) Time Spent: 20m (was: 10m) > Ozone ACL document missing AddAcl API > - > > Key: HDDS-2096 > URL: https://issues.apache.org/jira/browse/HDDS-2096 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Current Ozone Native ACL APIs document looks like below, the AddAcl is > missing. > > h3. Ozone Native ACL APIs > The ACLs can be manipulated by a set of APIs supported by Ozone. The APIs > supported are: > # *SetAcl* – This API will take user principal, the name, type of the ozone > object and a list of ACLs. 
> # *GetAcl* – This API will take the name and type of the ozone object and > will return a list of ACLs. > # *RemoveAcl* - This API will take the name, type of the ozone object and > the ACL that has to be removed.
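To make the difference between the four operations named in this issue concrete, here is a minimal in-memory sketch. It is a toy model only: `ToyAclStore`, its method signatures, the object path, and the ACL strings are all hypothetical and are not the Ozone client API. It also simplifies the description above by leaving out the object type and the user principal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory ACL store. Hypothetical names; NOT the Ozone API. */
final class ToyAclStore {
  private final Map<String, List<String>> acls = new HashMap<>();

  /** SetAcl: replace whatever ACLs the object had with the given list. */
  void setAcl(String object, List<String> newAcls) {
    acls.put(object, new ArrayList<>(newAcls));
  }

  /** GetAcl: return a copy of the object's ACL list (empty if it has none). */
  List<String> getAcl(String object) {
    return new ArrayList<>(acls.getOrDefault(object, new ArrayList<>()));
  }

  /** AddAcl: add one ACL to the existing entries; false if already present. */
  boolean addAcl(String object, String acl) {
    List<String> list = acls.computeIfAbsent(object, k -> new ArrayList<>());
    if (list.contains(acl)) {
      return false;
    }
    return list.add(acl);
  }

  /** RemoveAcl: remove one ACL; false if the object did not have it. */
  boolean removeAcl(String object, String acl) {
    List<String> list = acls.get(object);
    return list != null && list.remove(acl);
  }
}

final class AclDemo {
  public static void main(String[] args) {
    ToyAclStore store = new ToyAclStore();
    // The path and ACL strings below are made-up illustration values.
    store.setAcl("/vol/bucket", List.of("user:alice:rw"));
    store.addAcl("/vol/bucket", "group:dev:r");      // extends the list
    System.out.println(store.getAcl("/vol/bucket"));
    store.setAcl("/vol/bucket", List.of("user:bob:r")); // replaces the list
    System.out.println(store.getAcl("/vol/bucket"));
  }
}
```

The point of the sketch is the contrast the documentation patch needs to convey: a set call overwrites the whole list, while an add call only extends it.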
[jira] [Work logged] (HDDS-2096) Ozone ACL document missing AddAcl API
[ https://issues.apache.org/jira/browse/HDDS-2096?focusedWorklogId=310968&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310968 ]

ASF GitHub Bot logged work on HDDS-2096:

Author: ASF GitHub Bot
Created on: 11/Sep/19 20:12
Start Date: 11/Sep/19 20:12
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on issue #1427: HDDS-2096. Ozone ACL document missing AddAcl API. Contributed by Xiao…
URL: https://github.com/apache/hadoop/pull/1427#issuecomment-530546164

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| 0 | reexec | 42 | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 | dupname | 0 | No case conflicting files found. |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| | | | _ trunk Compile Tests _ |
| +1 | mvninstall | 644 | trunk passed |
| +1 | mvnsite | 0 | trunk passed |
| +1 | shadedclient | 1469 | branch has no errors when building and testing our client artifacts. |
| | | | _ Patch Compile Tests _ |
| +1 | mvninstall | 568 | the patch passed |
| +1 | mvnsite | 0 | the patch passed |
| -1 | whitespace | 0 | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | shadedclient | 725 | patch has no errors when building and testing our client artifacts. |
| | | | _ Other Tests _ |
| +1 | asflicense | 60 | The patch does not generate ASF License warnings. |
| | | 3038 | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1427/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/1427 |
| Optional Tests | dupname asflicense mvnsite |
| uname | Linux 2ed3846aeacb 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 64ed6b1 |
| whitespace | https://builds.apache.org/job/hadoop-multibranch/job/PR-1427/1/artifact/out/whitespace-eol.txt |
| Max. process+thread count | 412 (vs. ulimit of 5500) |
| modules | C: hadoop-hdds/docs U: hadoop-hdds/docs |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1427/1/console |
| versions | git=2.7.4 maven=3.3.9 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

This message was automatically generated.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 310968)
Time Spent: 0.5h (was: 20m)

> Ozone ACL document missing AddAcl API
> -
>
> Key: HDDS-2096
> URL: https://issues.apache.org/jira/browse/HDDS-2096
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Xiaoyu Yao
> Assignee: Xiaoyu Yao
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Current Ozone Native ACL APIs document looks like below, the AddAcl is
> missing.
>
> h3. Ozone Native ACL APIs
> The ACLs can be manipulated by a set of APIs supported by Ozone. The APIs
> supported are:
> # *SetAcl* – This API will take user principal, the name, type of the ozone
> object and a list of ACLs.
> # *GetAcl* – This API will take the name and type of the ozone object and > will return a list of ACLs. > # *RemoveAcl* - This API will take the name, type of the ozone object and > the ACL that has to be removed.
[jira] [Updated] (HDDS-2112) Ozone rename is behaving different compared with HDFS
[ https://issues.apache.org/jira/browse/HDDS-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HDDS-2112: - Summary: Ozone rename is behaving different compared with HDFS (was: rename is behaving different compared with HDFS) > Ozone rename is behaving different compared with HDFS > - > > Key: HDDS-2112 > URL: https://issues.apache.org/jira/browse/HDDS-2112 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Reporter: Istvan Fajth >Priority: Major > Attachments: demonstrative_test.patch > > > I am attaching a patch file, that introduces two new tests for the > OzoneFileSystem implementation which demonstrates the expected behaviour. > Case 1: > Given a directory a file "/source/subdir/file", and a directory /target > When fs.rename("/source/subdir/file", "/target/subdir/file") is called > Then DistributedFileSystem (HDFS), is returning false from the method, while > OzoneFileSystem throws a FileNotFoundException as "/target/subdir" is not > existing. > The expected behaviour would be to return false in this case instead of > throwing an exception with that behave the same as DistributedFileSystem does. > > Case 2: > Given a directory "/source" and a file "/targetFile" > When fs.rename("/source", "/targetFile") is called > Then DistributedFileSystem (HDFS), is returning false from the method, while > OzoneFileSystem throws a FileAlreadyExistsException as "/targetFile" does > exist. > The expected behaviour would be to return false in this case instead of > throwing an exception with that behave the same as DistributedFileSystem does. > > It may be considered as well a bug in HDFS, however it is not clear from the > FileSystem interface's documentation on the two rename methods that it > defines in which cases an exception should be thrown and in which cases a > return false is the expected behaviour. 
[jira] [Commented] (HDDS-2112) rename is behaving different compared with HDFS
[ https://issues.apache.org/jira/browse/HDDS-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927955#comment-16927955 ]

Steve Loughran commented on HDDS-2112:
--

This is covered in https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md#boolean-renamepath-src-path-d

Basically:
* rename is a confusing mess and the whole return true/false world complicates life
* everything other than HDFS (including local) raises an FNFE when the parent isn't there
* having rename() return false is generally useless for applications, which are invariably full of code like
{code}
if (!rename(src, dest)) {
  throw new IOException("rename failed and we don't know why!");
}
{code}

We have tests for what rename does in org.apache.hadoop.fs.contract.AbstractContractRenameTest; if you want to see the core quirks which rename() can get up to, look at org.apache.hadoop.fs.contract.ContractOptions.

Ultimately we need to get HADOOP-11452 in and make the rename/3 call public. That's the one called via FileContext and which, for HDFS, does raise exceptions in the two situations described.

Closing as WONTFIX, since it is the HDFS behaviour through the FileSystem.rename(src, dest) API call that is considered incorrect.

Well done for finding this though - good bit of research!

> rename is behaving different compared with HDFS > --- > > Key: HDDS-2112 > URL: https://issues.apache.org/jira/browse/HDDS-2112 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Reporter: Istvan Fajth >Priority: Major > Attachments: demonstrative_test.patch > > > I am attaching a patch file, that introduces two new tests for the > OzoneFileSystem implementation which demonstrates the expected behaviour. 
> Case 1: > Given a directory a file "/source/subdir/file", and a directory /target > When fs.rename("/source/subdir/file", "/target/subdir/file") is called > Then DistributedFileSystem (HDFS), is returning false from the method, while > OzoneFileSystem throws a FileNotFoundException as "/target/subdir" is not > existing. > The expected behaviour would be to return false in this case instead of > throwing an exception with that behave the same as DistributedFileSystem does. > > Case 2: > Given a directory "/source" and a file "/targetFile" > When fs.rename("/source", "/targetFile") is called > Then DistributedFileSystem (HDFS), is returning false from the method, while > OzoneFileSystem throws a FileAlreadyExistsException as "/targetFile" does > exist. > The expected behaviour would be to return false in this case instead of > throwing an exception with that behave the same as DistributedFileSystem does. > > It may be considered as well a bug in HDFS, however it is not clear from the > FileSystem interface's documentation on the two rename methods that it > defines in which cases an exception should be thrown and in which cases a > return false is the expected behaviour.
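The two cases in this issue, and the point that a bare `false` tells the caller nothing, can be illustrated with a small self-contained sketch. Everything here is an assumption made for illustration: `ToyFs` is a hypothetical in-memory stand-in rather than Hadoop's `FileSystem`, `StrictRename` is a made-up helper, and `java.nio.file.FileAlreadyExistsException` substitutes for Hadoop's own exception class of the same simple name. The toy rename returns false in both cases, and the wrapper probes the preconditions first so that each failure surfaces as a specific exception.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory filesystem whose rename() only answers true/false. */
final class ToyFs {
  private final Set<String> dirs = new HashSet<>();
  private final Set<String> files = new HashSet<>();

  ToyFs() { dirs.add("/"); }

  ToyFs mkdir(String path) { dirs.add(path); return this; }
  ToyFs touch(String path) { files.add(path); return this; }

  boolean isDirectory(String path) { return dirs.contains(path); }
  boolean exists(String path) { return dirs.contains(path) || files.contains(path); }

  static String parent(String path) {
    int slash = path.lastIndexOf('/');
    return slash <= 0 ? "/" : path.substring(0, slash);
  }

  /** Boolean-style rename: every kind of failure collapses into "false". */
  boolean rename(String src, String dst) {
    if (!exists(src) || !isDirectory(parent(dst)) || exists(dst)) {
      return false;
    }
    // Toy move: relabels the single entry; children of a directory are not moved.
    if (files.remove(src)) {
      files.add(dst);
    } else {
      dirs.remove(src);
      dirs.add(dst);
    }
    return true;
  }
}

/** Made-up wrapper that checks preconditions so callers learn why a rename failed. */
final class StrictRename {
  private StrictRename() {}

  static void rename(ToyFs fs, String src, String dst) throws IOException {
    if (!fs.exists(src)) {
      throw new FileNotFoundException("source does not exist: " + src);
    }
    if (!fs.isDirectory(ToyFs.parent(dst))) {
      throw new FileNotFoundException(
          "destination parent does not exist: " + ToyFs.parent(dst));
    }
    if (fs.exists(dst)) {
      throw new FileAlreadyExistsException(dst);
    }
    if (!fs.rename(src, dst)) {
      throw new IOException("rename " + src + " -> " + dst + " failed");
    }
  }
}

final class RenameDemo {
  public static void main(String[] args) {
    ToyFs fs = new ToyFs()
        .mkdir("/source").mkdir("/source/subdir").touch("/source/subdir/file")
        .mkdir("/target").touch("/targetFile");
    String[][] cases = {
        {"/source/subdir/file", "/target/subdir/file"}, // Case 1: parent missing
        {"/source", "/targetFile"},                     // Case 2: target exists
    };
    for (String[] c : cases) {
      System.out.println("boolean rename returned " + fs.rename(c[0], c[1]));
      try {
        StrictRename.rename(fs, c[0], c[1]);
      } catch (IOException e) {
        System.out.println(e.getClass().getSimpleName() + ": " + e.getMessage());
      }
    }
  }
}
```

Note that probing before renaming is not atomic: on a real shared filesystem another client could change the tree between the checks and the rename, so a wrapper like this can only improve error messages and cannot replace a rename call that raises exceptions itself.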
[jira] [Resolved] (HDDS-2112) rename is behaving different compared with HDFS
[ https://issues.apache.org/jira/browse/HDDS-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HDDS-2112. -- Resolution: Won't Fix > rename is behaving different compared with HDFS > --- > > Key: HDDS-2112 > URL: https://issues.apache.org/jira/browse/HDDS-2112 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Reporter: Istvan Fajth >Priority: Major > Attachments: demonstrative_test.patch > > > I am attaching a patch file, that introduces two new tests for the > OzoneFileSystem implementation which demonstrates the expected behaviour. > Case 1: > Given a directory a file "/source/subdir/file", and a directory /target > When fs.rename("/source/subdir/file", "/target/subdir/file") is called > Then DistributedFileSystem (HDFS), is returning false from the method, while > OzoneFileSystem throws a FileNotFoundException as "/target/subdir" is not > existing. > The expected behaviour would be to return false in this case instead of > throwing an exception with that behave the same as DistributedFileSystem does. > > Case 2: > Given a directory "/source" and a file "/targetFile" > When fs.rename("/source", "/targetFile") is called > Then DistributedFileSystem (HDFS), is returning false from the method, while > OzoneFileSystem throws a FileAlreadyExistsException as "/targetFile" does > exist. > The expected behaviour would be to return false in this case instead of > throwing an exception with that behave the same as DistributedFileSystem does. > > It may be considered as well a bug in HDFS, however it is not clear from the > FileSystem interface's documentation on the two rename methods that it > defines in which cases an exception should be thrown and in which cases a > return false is the expected behaviour. 
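The two cases in the HDDS-2112 description above can be sketched with a small in-memory model. This is only an illustration and not HDFS or Ozone code: `RenameContractSketch`, `renameReturningFalse`, and `renameThrowing` are hypothetical names, the namespace is just two sets of paths, and `java.nio.file.FileAlreadyExistsException` stands in for Hadoop's own exception class.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.util.HashSet;
import java.util.Set;

/** Toy in-memory namespace (hypothetical; not Hadoop or Ozone code). */
final class RenameContractSketch {
    private final Set<String> dirs = new HashSet<>();
    private final Set<String> files = new HashSet<>();

    RenameContractSketch() {
        dirs.add("/");
    }

    void mkdir(String path) { dirs.add(path); }

    void createFile(String path) { files.add(path); }

    private static String parentOf(String path) {
        int idx = path.lastIndexOf('/');
        return idx <= 0 ? "/" : path.substring(0, idx);
    }

    /** Returns the reason the rename is invalid, or null if it may proceed. */
    private IOException validate(String src, String dst) {
        if (!dirs.contains(src) && !files.contains(src)) {
            return new FileNotFoundException(src);
        }
        if (!dirs.contains(parentOf(dst))) {
            // Case 1: the destination's parent directory is missing.
            return new FileNotFoundException(parentOf(dst));
        }
        if (files.contains(dst)) {
            // Case 2: the destination is an existing file.
            return new FileAlreadyExistsException(dst);
        }
        return null;
    }

    private void move(String src, String dst) {
        // Moves a single entry only; children of a directory are not moved.
        if (files.remove(src)) {
            files.add(dst);
        } else if (dirs.remove(src)) {
            dirs.add(dst);
        }
    }

    /** Style reported for DistributedFileSystem: signal failure with false. */
    boolean renameReturningFalse(String src, String dst) {
        if (validate(src, dst) != null) {
            return false;
        }
        move(src, dst);
        return true;
    }

    /** Style reported for OzoneFileSystem: signal failure with an exception. */
    boolean renameThrowing(String src, String dst) throws IOException {
        IOException problem = validate(src, dst);
        if (problem != null) {
            throw problem;
        }
        move(src, dst);
        return true;
    }

    public static void main(String[] args) {
        RenameContractSketch fs = new RenameContractSketch();
        fs.mkdir("/source");
        fs.mkdir("/source/subdir");
        fs.createFile("/source/subdir/file");
        fs.mkdir("/target");
        fs.createFile("/targetFile");

        System.out.println(fs.renameReturningFalse("/source/subdir/file", "/target/subdir/file"));
        try {
            fs.renameThrowing("/source", "/targetFile");
        } catch (IOException e) {
            System.out.println(e.getClass().getSimpleName());
        }
    }
}
```

Both methods share the same validation, so the only difference a caller sees is how the failure is reported, which is the ambiguity the issue attributes to the FileSystem interface documentation.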
[jira] [Commented] (HDFS-14795) Add Throttler for writing block
[ https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927940#comment-16927940 ] Hadoop QA commented on HDFS-14795: --
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 23s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 21m 17s | trunk passed |
| +1 | compile | 1m 19s | trunk passed |
| +1 | checkstyle | 1m 5s | trunk passed |
| +1 | mvnsite | 1m 28s | trunk passed |
| +1 | shadedclient | 16m 7s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 38s | trunk passed |
| +1 | javadoc | 1m 10s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 15s | the patch passed |
| +1 | compile | 1m 7s | the patch passed |
| +1 | javac | 1m 7s | the patch passed |
| +1 | checkstyle | 0m 49s | the patch passed |
| +1 | mvnsite | 1m 16s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 3s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 14m 21s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 5s | the patch passed |
| +1 | javadoc | 1m 5s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 95m 59s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 31s | The patch does not generate ASF License warnings. |
| | | 164m 56s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.tools.TestDFSAdminWithHA |
| | hadoop.hdfs.TestReconstructStripedFile |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-14795 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12980106/HDFS-14795.010.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 54d46fe40a9b 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9221704 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/27846/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit
[jira] [Commented] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent
[ https://issues.apache.org/jira/browse/HDDS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927937#comment-16927937 ] Hudson commented on HDDS-2075: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17277 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17277/]) HDDS-2075. Tracing in OzoneManager call is propagated with wrong parent (xyao: rev 64ed6b177d6b00b22d45576a8517432dc6c03348) * (edit) hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java * (edit) hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/OzoneManagerProtocolClientSideTranslatorPB.java > Tracing in OzoneManager call is propagated with wrong parent > > > Key: HDDS-2075 > URL: https://issues.apache.org/jira/browse/HDDS-2075 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Elek, Marton >Assignee: Doroszlai, Attila >Priority: Major > Labels: pull-request-available > Attachments: create_bucket-new.png, create_bucket.png > > Time Spent: 1h > Remaining Estimate: 0h > > As you can see in the attached screenshot the OzoneManager.createBucket > (server side) tracing information is the children of the freon.createBucket > instead of the freon OzoneManagerProtocolPB.submitRequest. > To avoid confusion the hierarchy should be fixed (Most probably we generate > the child span AFTER we already serialized the parent one to the message) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
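The cause suspected in the HDDS-2075 description (the child span being generated after the parent context was already serialized into the message) can be illustrated with a toy model. The sketch below does not use the real tracing library or any Ozone class: `SpanOrderingSketch`, `Span`, `serializeContext`, and `serverSpan` are hypothetical stand-ins, and the "message" is simply the name of whichever span was active when the context was serialized.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy tracer (hypothetical; not the tracing library Ozone uses). */
final class SpanOrderingSketch {
    /** A span is reduced to its name and the name of its parent. */
    static final class Span {
        final String name;
        final String parent;

        Span(String name, String parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    // Client-side stack of active spans; the top is the current span.
    private final Deque<Span> active = new ArrayDeque<>();

    Span start(String name) {
        String parent = active.isEmpty() ? null : active.peek().name;
        Span span = new Span(name, parent);
        active.push(span);
        return span;
    }

    /** "Serializes" the trace context: just the current span's name. */
    String serializeContext() {
        return active.peek().name;
    }

    /** The server parents its span on whatever context the message carried. */
    static Span serverSpan(String carriedContext) {
        return new Span("OzoneManager.createBucket", carriedContext);
    }

    /** Suspected order: context serialized before the child span exists. */
    static Span contextSerializedTooEarly() {
        SpanOrderingSketch client = new SpanOrderingSketch();
        client.start("freon.createBucket");
        String message = client.serializeContext();
        client.start("OzoneManagerProtocolPB.submitRequest");
        return serverSpan(message);
    }

    /** Expected order: child span started first, then context serialized. */
    static Span contextSerializedAfterChild() {
        SpanOrderingSketch client = new SpanOrderingSketch();
        client.start("freon.createBucket");
        client.start("OzoneManagerProtocolPB.submitRequest");
        String message = client.serializeContext();
        return serverSpan(message);
    }

    public static void main(String[] args) {
        System.out.println(contextSerializedTooEarly().parent);
        System.out.println(contextSerializedAfterChild().parent);
    }
}
```

In the first ordering the server-side span ends up as a child of `freon.createBucket`, which matches the hierarchy the issue reports; in the second it is a child of the submitRequest span, which is the hierarchy the issue asks for.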
[jira] [Updated] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down
[ https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HDFS-14655: Status: Patch Available (was: Open) > [SBN Read] Namenode crashes if one of The JN is down > > > Key: HDFS-14655 > URL: https://issues.apache.org/jira/browse/HDFS-14655 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Harshakiran Reddy >Assignee: Ayush Saxena >Priority: Critical > Attachments: HDFS-14655-01.patch, HDFS-14655-02.patch, > HDFS-14655-03.patch, HDFS-14655-04.patch, HDFS-14655.poc.patch > > > {noformat} > 2019-07-04 17:35:54,064 | INFO | Logger channel (from parallel executor) to > XXX/XXX | Retrying connect to server: XXX/XXX. Already tried > 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, > sleepTime=1000 MILLISECONDS) | Client.java:975 > 2019-07-04 17:35:54,087 | FATAL | Edit log tailer | Unknown error encountered > while tailing edits. Shutting down standby NN. | EditLogTailer.java:474 > java.lang.OutOfMemoryError: unable to create new native thread > at java.lang.Thread.start0(Native Method) > at java.lang.Thread.start(Thread.java:717) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378) > at > com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:440) > at > com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:56) > at > org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.getJournaledEdits(IPCLoggerChannel.java:565) > at > org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.getJournaledEdits(AsyncLoggerSet.java:272) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectRpcInputStreams(QuorumJournalManager.java:533) > at > 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:508) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:275) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1681) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1714) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:307) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:483) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423) > 2019-07-04 17:35:54,112 | INFO | Edit log tailer | Exiting with status 1: > java.lang.OutOfMemoryError: unable to create new native thread | > ExitUtil.java:210 > {noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down
[ https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927923#comment-16927923 ] Ayush Saxena commented on HDFS-14655: - No issues, We can definitely wait. It would be good if we can get feedback from them too. :) > [SBN Read] Namenode crashes if one of The JN is down > > > Key: HDFS-14655 > URL: https://issues.apache.org/jira/browse/HDFS-14655 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Harshakiran Reddy >Assignee: Ayush Saxena >Priority: Critical > Attachments: HDFS-14655-01.patch, HDFS-14655-02.patch, > HDFS-14655-03.patch, HDFS-14655-04.patch, HDFS-14655.poc.patch
[jira] [Commented] (HDFS-14844) Make buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream configurable
[ https://issues.apache.org/jira/browse/HDFS-14844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927919#comment-16927919 ] Hadoop QA commented on HDFS-14844: --
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 41s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 20s | Maven dependency ordering for branch |
| +1 | mvninstall | 18m 45s | trunk passed |
| +1 | compile | 3m 8s | trunk passed |
| +1 | checkstyle | 0m 50s | trunk passed |
| +1 | mvnsite | 1m 54s | trunk passed |
| +1 | shadedclient | 14m 17s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 46s | trunk passed |
| +1 | javadoc | 1m 20s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 11s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 41s | the patch passed |
| +1 | compile | 2m 56s | the patch passed |
| +1 | javac | 2m 56s | the patch passed |
| -0 | checkstyle | 0m 50s | hadoop-hdfs-project: The patch generated 1 new + 69 unchanged - 0 fixed = 70 total (was 69) |
| +1 | mvnsite | 1m 51s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 2s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 12m 21s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 4m 0s | the patch passed |
| +1 | javadoc | 1m 18s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 1m 55s | hadoop-hdfs-client in the patch passed. |
| +1 | unit | 101m 40s | hadoop-hdfs in the patch passed. |
| +1 | asflicense | 0m 34s | The patch does not generate ASF License warnings. |
| | | 173m 42s | |

|| Subsystem || Report/Notes ||
| Docker | Client=18.09.7 Server=18.09.7 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-14844 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12980104/HDFS-14844.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 6d6a3fa9050e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
[jira] [Work logged] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent
[ https://issues.apache.org/jira/browse/HDDS-2075?focusedWorklogId=310942&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310942 ] ASF GitHub Bot logged work on HDDS-2075: Author: ASF GitHub Bot Created on: 11/Sep/19 19:06 Start Date: 11/Sep/19 19:06 Worklog Time Spent: 10m Work Description: adoroszlai commented on issue #1415: HDDS-2075. Tracing in OzoneManager call is propagated with wrong parent URL: https://github.com/apache/hadoop/pull/1415#issuecomment-530521643 Thanks @xiaoyuyao for reviewing and merging it. Issue Time Tracking --- Worklog Id: (was: 310942) Time Spent: 1h (was: 50m) > Tracing in OzoneManager call is propagated with wrong parent > > > Key: HDDS-2075 > URL: https://issues.apache.org/jira/browse/HDDS-2075 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Elek, Marton >Assignee: Doroszlai, Attila >Priority: Major > Labels: pull-request-available > Attachments: create_bucket-new.png, create_bucket.png > > Time Spent: 1h > Remaining Estimate: 0h
[jira] [Commented] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down
[ https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927910#comment-16927910 ] Erik Krogen commented on HDFS-14655: Agreed, I think we are saying the same thing. I'm +1 on v004 patch. [~shv] and [~vagarychen] are both at a conference until Friday, would you mind if we wait until then to see if they want to provide any input before we commit? > [SBN Read] Namenode crashes if one of The JN is down > > > Key: HDFS-14655 > URL: https://issues.apache.org/jira/browse/HDFS-14655 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Harshakiran Reddy >Assignee: Ayush Saxena >Priority: Critical > Attachments: HDFS-14655-01.patch, HDFS-14655-02.patch, > HDFS-14655-03.patch, HDFS-14655-04.patch, HDFS-14655.poc.patch
[jira] [Comment Edited] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down
[ https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927900#comment-16927900 ] Ayush Saxena edited comment on HDFS-14655 at 9/11/19 7:00 PM: -- Yes, as I said above too, it interrupts but doesn’t kill. Then in Client.java it checks whether the thread is interrupted or not; if interrupted, it stops retrying, IIRC. So the thread can be reused. was (Author: ayushtkn): Yes, as I said above too, it interrupts but doesn’t kill. Then in Client.java it checks whether the thread is interrupted or not; if interrupted, the thread is killed, IIRC > [SBN Read] Namenode crashes if one of The JN is down > > > Key: HDFS-14655 > URL: https://issues.apache.org/jira/browse/HDFS-14655 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Harshakiran Reddy >Assignee: Ayush Saxena >Priority: Critical > Attachments: HDFS-14655-01.patch, HDFS-14655-02.patch, > HDFS-14655-03.patch, HDFS-14655-04.patch, HDFS-14655.poc.patch
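The comment above distinguishes interrupting a pooled thread from killing it. A minimal, self-contained sketch of that idea follows. It does not reproduce `Client.java` or any of the HDFS-14655 patches: `InterruptReuseSketch` and `connectWithRetries` are hypothetical names, and the retry loop is only a stand-in for code that notices the interrupt and stops retrying.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration; not code from Hadoop. */
final class InterruptReuseSketch {
    /** Stand-in retry loop; returns the number of attempts made. */
    static int connectWithRetries(int maxRetries, CountDownLatch started) {
        int attempts = 0;
        while (attempts < maxRetries) {
            attempts++;
            started.countDown(); // signal that the loop is running
            try {
                Thread.sleep(1000); // stand-in for the sleep between retries
            } catch (InterruptedException e) {
                break; // interrupted: give up retrying and let the task return
            }
        }
        return attempts;
    }

    /** Returns {worker that ran the cancelled task, worker that ran the next task}. */
    static String[] runOnce() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CountDownLatch started = new CountDownLatch(1);
            String[] names = new String[2];
            Future<?> first = pool.submit(() -> {
                names[0] = Thread.currentThread().getName();
                connectWithRetries(10, started);
            });
            started.await();
            first.cancel(true); // interrupts the worker thread; does not kill it
            Callable<String> whoAmI = () -> Thread.currentThread().getName();
            names[1] = pool.submit(whoAmI).get(5, TimeUnit.SECONDS);
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        String[] names = runOnce();
        System.out.println(names[0].equals(names[1]));
    }
}
```

Because the interrupted task returns normally, the executor keeps its worker and hands it the next task, so no additional native thread has to be created.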
[jira] [Commented] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent
[ https://issues.apache.org/jira/browse/HDDS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927905#comment-16927905 ] Xiaoyu Yao commented on HDDS-2075: -- cc: [~nandakumar131] we should port this for 0.4.1 if possible. > Tracing in OzoneManager call is propagated with wrong parent > > > Key: HDDS-2075 > URL: https://issues.apache.org/jira/browse/HDDS-2075 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Elek, Marton >Assignee: Doroszlai, Attila >Priority: Major > Labels: pull-request-available > Attachments: create_bucket-new.png, create_bucket.png > > Time Spent: 50m > Remaining Estimate: 0h
[jira] [Comment Edited] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down
[ https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927900#comment-16927900 ] Ayush Saxena edited comment on HDFS-14655 at 9/11/19 6:59 PM: -- Yes, As I said above too, It interrupts But doesn’t kill, Then is Client.java it checks whether the thread is interrupted or not, if interrupted then the thread is killed, IIRC was (Author: ayushtkn): Yes, As I said above too, It interrupts But doesn’t kill,m > [SBN Read] Namenode crashes if one of The JN is down > > > Key: HDFS-14655 > URL: https://issues.apache.org/jira/browse/HDFS-14655 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Harshakiran Reddy >Assignee: Ayush Saxena >Priority: Critical > Attachments: HDFS-14655-01.patch, HDFS-14655-02.patch, > HDFS-14655-03.patch, HDFS-14655-04.patch, HDFS-14655.poc.patch > > > {noformat} > 2019-07-04 17:35:54,064 | INFO | Logger channel (from parallel executor) to > XXX/XXX | Retrying connect to server: XXX/XXX. Already tried > 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, > sleepTime=1000 MILLISECONDS) | Client.java:975 > 2019-07-04 17:35:54,087 | FATAL | Edit log tailer | Unknown error encountered > while tailing edits. Shutting down standby NN. 
| EditLogTailer.java:474 > java.lang.OutOfMemoryError: unable to create new native thread > at java.lang.Thread.start0(Native Method) > at java.lang.Thread.start(Thread.java:717) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378) > at > com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:440) > at > com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:56) > at > org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.getJournaledEdits(IPCLoggerChannel.java:565) > at > org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.getJournaledEdits(AsyncLoggerSet.java:272) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectRpcInputStreams(QuorumJournalManager.java:533) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:508) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:275) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1681) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1714) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:307) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) > at > 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:483) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423) > 2019-07-04 17:35:54,112 | INFO | Edit log tailer | Exiting with status 1: > java.lang.OutOfMemoryError: unable to create new native thread | > ExitUtil.java:210 > {noformat}
[jira] [Updated] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent
[ https://issues.apache.org/jira/browse/HDDS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDDS-2075: - Resolution: Fixed Status: Resolved (was: Patch Available) Thanks [~adoroszlai] for the contribution. I've merged the PR to trunk. > Tracing in OzoneManager call is propagated with wrong parent > > > Key: HDDS-2075 > URL: https://issues.apache.org/jira/browse/HDDS-2075 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Elek, Marton >Assignee: Doroszlai, Attila >Priority: Major > Labels: pull-request-available > Attachments: create_bucket-new.png, create_bucket.png > > Time Spent: 50m > Remaining Estimate: 0h > > As you can see in the attached screenshot, the OzoneManager.createBucket > (server side) tracing information is a child of freon.createBucket > instead of the freon OzoneManagerProtocolPB.submitRequest. > To avoid confusion, the hierarchy should be fixed (most probably we generate > the child span AFTER we have already serialized the parent one to the message).
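The suspected cause in the description above (the child span is created only after the parent context has been serialized into the message) can be illustrated with a toy model. This is a minimal sketch, not the Ozone or tracing-library code: the `Span` and `Call` classes and the two helper methods are hypothetical names invented for the illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of trace-context propagation. All types here are hypothetical.
final class SpanOrderingSketch {
    static final class Span {
        private static final AtomicLong NEXT_ID = new AtomicLong(1);
        final long id = NEXT_ID.getAndIncrement();
        final String name;
        final long parentId; // 0 means "no parent"

        Span(String name, long parentId) {
            this.name = name;
            this.parentId = parentId;
        }

        Span child(String childName) {
            return new Span(childName, id);
        }
    }

    // What the client ends up with: its RPC span and the parent id it put on the wire.
    static final class Call {
        final Span rpcSpan;
        final long serializedParentId;

        Call(Span rpcSpan, long serializedParentId) {
            this.rpcSpan = rpcSpan;
            this.serializedParentId = serializedParentId;
        }
    }

    // Suspected buggy order: the context is serialized first, the RPC span is created later.
    static Call serializeThenSpan(Span caller) {
        long onTheWire = caller.id;
        Span rpc = caller.child("OzoneManagerProtocolPB.submitRequest");
        return new Call(rpc, onTheWire);
    }

    // Intended order: create the RPC span first, then serialize its id.
    static Call spanThenSerialize(Span caller) {
        Span rpc = caller.child("OzoneManagerProtocolPB.submitRequest");
        return new Call(rpc, rpc.id);
    }

    // The server attaches its span to whatever parent id arrived in the message.
    static Span serverSide(Call call) {
        return new Span("OzoneManager.createBucket", call.serializedParentId);
    }

    public static void main(String[] args) {
        Span freon = new Span("freon.createBucket", 0);
        Call buggy = serializeThenSpan(freon);
        Call fixed = spanThenSerialize(freon);
        System.out.println("buggy: server span hangs off caller span = "
            + (serverSide(buggy).parentId == freon.id));
        System.out.println("fixed: server span hangs off RPC span = "
            + (serverSide(fixed).parentId == fixed.rpcSpan.id));
    }
}
```

In the first ordering the server span becomes a sibling of the client RPC span, which matches the hierarchy reported in the screenshot; whether this is what the merged PR changed is not stated in this message.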
[jira] [Work logged] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent
[ https://issues.apache.org/jira/browse/HDDS-2075?focusedWorklogId=310935&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310935 ] ASF GitHub Bot logged work on HDDS-2075: Author: ASF GitHub Bot Created on: 11/Sep/19 18:59 Start Date: 11/Sep/19 18:59 Worklog Time Spent: 10m Work Description: xiaoyuyao commented on pull request #1415: HDDS-2075. Tracing in OzoneManager call is propagated with wrong parent URL: https://github.com/apache/hadoop/pull/1415 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 310935) Time Spent: 50m (was: 40m) > Tracing in OzoneManager call is propagated with wrong parent > > > Key: HDDS-2075 > URL: https://issues.apache.org/jira/browse/HDDS-2075 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Elek, Marton >Assignee: Doroszlai, Attila >Priority: Major > Labels: pull-request-available > Attachments: create_bucket-new.png, create_bucket.png > > Time Spent: 50m > Remaining Estimate: 0h > > As you can see in the attached screenshot, the OzoneManager.createBucket > (server side) tracing information is a child of freon.createBucket > instead of the freon OzoneManagerProtocolPB.submitRequest. > To avoid confusion, the hierarchy should be fixed (most probably we generate > the child span AFTER we have already serialized the parent one to the message).
[jira] [Commented] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down
[ https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927900#comment-16927900 ] Ayush Saxena commented on HDFS-14655: - Yes, as I said above too, it interrupts but doesn’t kill. > [SBN Read] Namenode crashes if one of The JN is down > > > Key: HDFS-14655 > URL: https://issues.apache.org/jira/browse/HDFS-14655 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Harshakiran Reddy >Assignee: Ayush Saxena >Priority: Critical > Attachments: HDFS-14655-01.patch, HDFS-14655-02.patch, > HDFS-14655-03.patch, HDFS-14655-04.patch, HDFS-14655.poc.patch > > > {noformat} > 2019-07-04 17:35:54,064 | INFO | Logger channel (from parallel executor) to > XXX/XXX | Retrying connect to server: XXX/XXX. Already tried > 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, > sleepTime=1000 MILLISECONDS) | Client.java:975 > 2019-07-04 17:35:54,087 | FATAL | Edit log tailer | Unknown error encountered > while tailing edits. Shutting down standby NN. 
| EditLogTailer.java:474 > java.lang.OutOfMemoryError: unable to create new native thread > at java.lang.Thread.start0(Native Method) > at java.lang.Thread.start(Thread.java:717) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378) > at > com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:440) > at > com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:56) > at > org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.getJournaledEdits(IPCLoggerChannel.java:565) > at > org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.getJournaledEdits(AsyncLoggerSet.java:272) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectRpcInputStreams(QuorumJournalManager.java:533) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:508) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:275) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1681) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1714) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:307) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) > at > 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:483) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423) > 2019-07-04 17:35:54,112 | INFO | Edit log tailer | Exiting with status 1: > java.lang.OutOfMemoryError: unable to create new native thread | > ExitUtil.java:210 > {noformat}
[jira] [Updated] (HDDS-2096) Ozone ACL document missing AddAcl API
[ https://issues.apache.org/jira/browse/HDDS-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2096: - Labels: pull-request-available (was: ) > Ozone ACL document missing AddAcl API > - > > Key: HDDS-2096 > URL: https://issues.apache.org/jira/browse/HDDS-2096 > Project: Hadoop Distributed Data Store > Issue Type: Test >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Labels: pull-request-available > > Current Ozone Native ACL APIs document looks like below, the AddAcl is > missing. > > h3. Ozone Native ACL APIs > The ACLs can be manipulated by a set of APIs supported by Ozone. The APIs > supported are: > # *SetAcl* – This API will take user principal, the name, type of the ozone > object and a list of ACLs. > # *GetAcl* – This API will take the name and type of the ozone object and > will return a list of ACLs. > # *RemoveAcl* - This API will take the name, type of the ozone object and > the ACL that has to be removed.
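To make the gap in the quoted document concrete, here is a minimal in-memory sketch of the three listed operations plus the missing AddAcl. It is not the Ozone API: the class name, the use of plain strings for object type, object name and ACL entries, and the shape of `addAcl` (assumed by analogy with the RemoveAcl description) are all hypothetical, and the user principal that SetAcl is said to take is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory ACL store; ACL entries are treated as opaque strings.
final class AclStoreSketch {
    private final Map<String, List<String>> acls = new HashMap<>();

    private static String key(String type, String name) {
        return type + ":" + name;
    }

    // SetAcl: replace the whole ACL list of the object.
    void setAcl(String type, String name, List<String> newAcls) {
        acls.put(key(type, name), new ArrayList<>(newAcls));
    }

    // GetAcl: return a copy of the object's ACL list (empty if none is set).
    List<String> getAcl(String type, String name) {
        return new ArrayList<>(acls.getOrDefault(key(type, name), Collections.emptyList()));
    }

    // AddAcl (assumed shape): append one ACL, leaving existing entries untouched.
    // Returns false if the entry was already present.
    boolean addAcl(String type, String name, String acl) {
        List<String> current = acls.computeIfAbsent(key(type, name), k -> new ArrayList<>());
        if (current.contains(acl)) {
            return false;
        }
        current.add(acl);
        return true;
    }

    // RemoveAcl: remove one ACL. Returns false if it was not present.
    boolean removeAcl(String type, String name, String acl) {
        List<String> current = acls.get(key(type, name));
        return current != null && current.remove(acl);
    }

    public static void main(String[] args) {
        AclStoreSketch store = new AclStoreSketch();
        store.setAcl("bucket", "vol1/bucket1", Arrays.asList("user:alice:rw"));
        store.addAcl("bucket", "vol1/bucket1", "user:bob:r");
        store.removeAcl("bucket", "vol1/bucket1", "user:alice:rw");
        System.out.println(store.getAcl("bucket", "vol1/bucket1"));
    }
}
```

The point of the sketch is the difference in semantics: SetAcl overwrites the list, while AddAcl and RemoveAcl change a single entry.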
[jira] [Updated] (HDDS-2096) Ozone ACL document missing AddAcl API
[ https://issues.apache.org/jira/browse/HDDS-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDDS-2096: - Issue Type: Bug (was: Test) > Ozone ACL document missing AddAcl API > - > > Key: HDDS-2096 > URL: https://issues.apache.org/jira/browse/HDDS-2096 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Current Ozone Native ACL APIs document looks like below, the AddAcl is > missing. > > h3. Ozone Native ACL APIs > The ACLs can be manipulated by a set of APIs supported by Ozone. The APIs > supported are: > # *SetAcl* – This API will take user principal, the name, type of the ozone > object and a list of ACLs. > # *GetAcl* – This API will take the name and type of the ozone object and > will return a list of ACLs. > # *RemoveAcl* - This API will take the name, type of the ozone object and > the ACL that has to be removed.
[jira] [Work logged] (HDDS-2096) Ozone ACL document missing AddAcl API
[ https://issues.apache.org/jira/browse/HDDS-2096?focusedWorklogId=310934&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310934 ] ASF GitHub Bot logged work on HDDS-2096: Author: ASF GitHub Bot Created on: 11/Sep/19 18:57 Start Date: 11/Sep/19 18:57 Worklog Time Spent: 10m Work Description: xiaoyuyao commented on pull request #1427: HDDS-2096. Ozone ACL document missing AddAcl API. Contributed by Xiao… URL: https://github.com/apache/hadoop/pull/1427 Issue Time Tracking --- Worklog Id: (was: 310934) Remaining Estimate: 0h Time Spent: 10m > Ozone ACL document missing AddAcl API > - > > Key: HDDS-2096 > URL: https://issues.apache.org/jira/browse/HDDS-2096 > Project: Hadoop Distributed Data Store > Issue Type: Test >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Current Ozone Native ACL APIs document looks like below, the AddAcl is > missing. > > h3. Ozone Native ACL APIs > The ACLs can be manipulated by a set of APIs supported by Ozone. The APIs > supported are: > # *SetAcl* – This API will take user principal, the name, type of the ozone > object and a list of ACLs. > # *GetAcl* – This API will take the name and type of the ozone object and > will return a list of ACLs. > # *RemoveAcl* - This API will take the name, type of the ozone object and > the ACL that has to be removed.
[jira] [Assigned] (HDDS-2096) Ozone ACL document missing AddAcl API
[ https://issues.apache.org/jira/browse/HDDS-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao reassigned HDDS-2096: Assignee: Xiaoyu Yao > Ozone ACL document missing AddAcl API > - > > Key: HDDS-2096 > URL: https://issues.apache.org/jira/browse/HDDS-2096 > Project: Hadoop Distributed Data Store > Issue Type: Test >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > > Current Ozone Native ACL APIs document looks like below, the AddAcl is > missing. > > h3. Ozone Native ACL APIs > The ACLs can be manipulated by a set of APIs supported by Ozone. The APIs > supported are: > # *SetAcl* – This API will take user principal, the name, type of the ozone > object and a list of ACLs. > # *GetAcl* – This API will take the name and type of the ozone object and > will return a list of ACLs. > # *RemoveAcl* - This API will take the name, type of the ozone object and > the ACL that has to be removed.
[jira] [Commented] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down
[ https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927896#comment-16927896 ] Erik Krogen commented on HDFS-14655: Hm. I believe that when you call {{cancel()}} on the {{ListenableFuture}}, it does not kill the thread, it just interrupts it so that it can go on to fetch another task. But it would be good to confirm this. > [SBN Read] Namenode crashes if one of The JN is down > > > Key: HDFS-14655 > URL: https://issues.apache.org/jira/browse/HDFS-14655 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Harshakiran Reddy >Assignee: Ayush Saxena >Priority: Critical > Attachments: HDFS-14655-01.patch, HDFS-14655-02.patch, > HDFS-14655-03.patch, HDFS-14655-04.patch, HDFS-14655.poc.patch > > > {noformat} > 2019-07-04 17:35:54,064 | INFO | Logger channel (from parallel executor) to > XXX/XXX | Retrying connect to server: XXX/XXX. Already tried > 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, > sleepTime=1000 MILLISECONDS) | Client.java:975 > 2019-07-04 17:35:54,087 | FATAL | Edit log tailer | Unknown error encountered > while tailing edits. Shutting down standby NN. 
| EditLogTailer.java:474 > java.lang.OutOfMemoryError: unable to create new native thread > at java.lang.Thread.start0(Native Method) > at java.lang.Thread.start(Thread.java:717) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378) > at > com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:440) > at > com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:56) > at > org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.getJournaledEdits(IPCLoggerChannel.java:565) > at > org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.getJournaledEdits(AsyncLoggerSet.java:272) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectRpcInputStreams(QuorumJournalManager.java:533) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:508) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:275) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1681) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1714) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:307) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) > at > 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:483) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423) > 2019-07-04 17:35:54,112 | INFO | Edit log tailer | Exiting with status 1: > java.lang.OutOfMemoryError: unable to create new native thread | > ExitUtil.java:210 > {noformat}
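The behaviour Erik Krogen describes for {{cancel()}} can be checked in isolation. The sketch below is an assumption-laden stand-in: it uses a plain `java.util.concurrent` single-thread pool and `Future` rather than Guava's `ListenableFuture` or the executor in `IPCLoggerChannel`, and a `Thread.sleep` stands in for the blocking RPC. It only shows that `cancel(true)` interrupts the running task and that the pool's worker thread stays alive to run the next task.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class CancelInterruptSketch {
    /**
     * Returns {name of thread that ran the cancelled task,
     *          "true" if that task saw an interrupt,
     *          name of thread that ran the next task}.
     */
    static String[] run() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CountDownLatch started = new CountDownLatch(1);
            CountDownLatch finished = new CountDownLatch(1);
            String[] result = new String[3];

            Future<?> blocked = pool.submit(() -> {
                result[0] = Thread.currentThread().getName();
                started.countDown();
                try {
                    Thread.sleep(60_000); // stands in for a blocking call
                    result[1] = "false";
                } catch (InterruptedException e) {
                    result[1] = "true"; // cancel(true) interrupted us
                } finally {
                    finished.countDown();
                }
            });

            started.await();
            blocked.cancel(true);                // interrupt the running task
            finished.await(5, TimeUnit.SECONDS); // wait until the task has unwound

            // The same worker thread is still available for the next task.
            Future<String> next = pool.submit(() -> Thread.currentThread().getName());
            result[2] = next.get(5, TimeUnit.SECONDS);
            return result;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        String[] r = run();
        System.out.println("cancelled task ran on: " + r[0]);
        System.out.println("cancelled task was interrupted: " + r[1]);
        System.out.println("next task ran on: " + r[2]);
    }
}
```

Whether the Hadoop IPC client then terminates the thread on its own after seeing the interrupt, as suggested elsewhere in this thread, is not something this sketch exercises.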
[jira] [Work logged] (HDDS-2107) Datanodes should retry forever to connect to SCM in an unsecure environment
[ https://issues.apache.org/jira/browse/HDDS-2107?focusedWorklogId=310907&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310907 ] ASF GitHub Bot logged work on HDDS-2107: Author: ASF GitHub Bot Created on: 11/Sep/19 18:20 Start Date: 11/Sep/19 18:20 Worklog Time Spent: 10m Work Description: vivekratnavel commented on issue #1424: HDDS-2107. Datanodes should retry forever to connect to SCM in an… URL: https://github.com/apache/hadoop/pull/1424#issuecomment-530503260 @adoroszlai You are right. With this change, we don't get the error from `EndPointStateMachine` and the result now looks like this: ``` datanode_1 | 2019-09-11 18:16:55 INFO InitDatanodeState:140 - DatanodeDetails is persisted to /data/datanode.id datanode_1 | 2019-09-11 18:16:57 INFO Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS) datanode_1 | 2019-09-11 18:16:58 INFO Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS) datanode_1 | 2019-09-11 18:16:59 INFO Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS) datanode_1 | 2019-09-11 18:17:00 INFO Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS) datanode_1 | 2019-09-11 18:17:01 INFO Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. 
Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS) datanode_1 | 2019-09-11 18:17:02 INFO Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS) datanode_1 | 2019-09-11 18:17:03 INFO Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS) datanode_1 | 2019-09-11 18:17:04 INFO Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS) datanode_1 | 2019-09-11 18:17:05 INFO Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS) datanode_1 | 2019-09-11 18:17:06 INFO Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS) datanode_1 | 2019-09-11 18:17:07 INFO Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 10 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS) datanode_1 | 2019-09-11 18:17:08 INFO Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 11 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS) datanode_1 | 2019-09-11 18:17:09 INFO Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. 
Already tried 12 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS) datanode_1 | 2019-09-11 18:17:10 INFO Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 13 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS) datanode_1 | 2019-09-11 18:17:11 INFO Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 14 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS) datanode_1 | 2019-09-11 18:17:12 INFO Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 15 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS) datanode_1 | 2019-09-11 18:17:13 INFO Client:948 - Retrying connect to server: datanode/172.19.0.2:9861. Already tried 16 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 MILLISECONDS) data
[jira] [Comment Edited] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down
[ https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927854#comment-16927854 ] Ayush Saxena edited comment on HDFS-14655 at 9/11/19 5:44 PM: -- Thanx [~xkrogen], Well we can change it to 60 seconds. As it was previously, I guess that should be enough since we are cancelling the threads too? was (Author: ayushtkn): Thanx [~xkrogen], Well we can change it to 60 seconds. I guess that should be enough? > [SBN Read] Namenode crashes if one of The JN is down > > > Key: HDFS-14655 > URL: https://issues.apache.org/jira/browse/HDFS-14655 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Harshakiran Reddy >Assignee: Ayush Saxena >Priority: Critical > Attachments: HDFS-14655-01.patch, HDFS-14655-02.patch, > HDFS-14655-03.patch, HDFS-14655-04.patch, HDFS-14655.poc.patch > > > {noformat} > 2019-07-04 17:35:54,064 | INFO | Logger channel (from parallel executor) to > XXX/XXX | Retrying connect to server: XXX/XXX. Already tried > 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, > sleepTime=1000 MILLISECONDS) | Client.java:975 > 2019-07-04 17:35:54,087 | FATAL | Edit log tailer | Unknown error encountered > while tailing edits. Shutting down standby NN. 
| EditLogTailer.java:474 > java.lang.OutOfMemoryError: unable to create new native thread > at java.lang.Thread.start0(Native Method) > at java.lang.Thread.start(Thread.java:717) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378) > at > com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:440) > at > com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:56) > at > org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.getJournaledEdits(IPCLoggerChannel.java:565) > at > org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.getJournaledEdits(AsyncLoggerSet.java:272) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectRpcInputStreams(QuorumJournalManager.java:533) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:508) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:275) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1681) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1714) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:307) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) > at > 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:483) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423) > 2019-07-04 17:35:54,112 | INFO | Edit log tailer | Exiting with status 1: > java.lang.OutOfMemoryError: unable to create new native thread | > ExitUtil.java:210 > {noformat}
[jira] [Work logged] (HDDS-1786) Datanodes takeSnapshot should delete previously created snapshots
[ https://issues.apache.org/jira/browse/HDDS-1786?focusedWorklogId=310887&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310887 ] ASF GitHub Bot logged work on HDDS-1786: Author: ASF GitHub Bot Created on: 11/Sep/19 17:44 Start Date: 11/Sep/19 17:44 Worklog Time Spent: 10m Work Description: avijayanhwx commented on issue #1163: HDDS-1786 : Datanodes takeSnapshot should delete previously created s… URL: https://github.com/apache/hadoop/pull/1163#issuecomment-530488768 > Thanks @avijayanhwx for working on the patch. The patch looks good to me. Can u check the compilation issue? @bshashikant I have fixed an edge case in the unit test. The rest of the failures are unrelated. Issue Time Tracking --- Worklog Id: (was: 310887) Time Spent: 4h 20m (was: 4h 10m) > Datanodes takeSnapshot should delete previously created snapshots > - > > Key: HDDS-1786 > URL: https://issues.apache.org/jira/browse/HDDS-1786 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Aravindan Vijayan >Priority: Major > Labels: pull-request-available > Time Spent: 4h 20m > Remaining Estimate: 0h > > Right now, after taking a new snapshot, the previous snapshot file is > left in the raft log directory. When a new snapshot is taken, the previous > snapshots should be deleted.
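The requested behaviour (keep only the newest snapshot file after a snapshot is taken) can be sketched with plain `java.nio.file` calls. This is an illustration under stated assumptions, not the patch in PR #1163: the `snapshot.` file-name prefix, the class name and the method name are hypothetical, and the real datanode code may identify snapshot files differently.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class SnapshotCleanupSketch {
    /**
     * Deletes every file in dir whose name matches "snapshot.*" except the
     * given latest snapshot. Returns the number of files deleted.
     * The "snapshot." prefix is an assumption made for this example.
     */
    static int deleteOlderSnapshots(Path dir, Path latest) throws IOException {
        List<Path> stale = new ArrayList<>();
        // Collect first, delete afterwards, so the directory is not modified mid-iteration.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "snapshot.*")) {
            for (Path candidate : stream) {
                if (!candidate.getFileName().equals(latest.getFileName())) {
                    stale.add(candidate);
                }
            }
        }
        for (Path path : stale) {
            Files.delete(path);
        }
        return stale.size();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("raft-log");
        Files.createFile(dir.resolve("snapshot.1_10"));
        Path latest = Files.createFile(dir.resolve("snapshot.1_20"));
        System.out.println("deleted " + deleteOlderSnapshots(dir, latest) + " older snapshot(s)");
    }
}
```

Calling the cleanup only after the new snapshot has been written completely means a failed snapshot attempt never leaves the directory without a usable snapshot.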
[jira] [Work logged] (HDDS-1786) Datanodes takeSnapshot should delete previously created snapshots
[ https://issues.apache.org/jira/browse/HDDS-1786?focusedWorklogId=310886&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310886 ] ASF GitHub Bot logged work on HDDS-1786: Author: ASF GitHub Bot Created on: 11/Sep/19 17:44 Start Date: 11/Sep/19 17:44 Worklog Time Spent: 10m Work Description: avijayanhwx commented on issue #1163: HDDS-1786 : Datanodes takeSnapshot should delete previously created s… URL: https://github.com/apache/hadoop/pull/1163#issuecomment-530488768 > Thanks @avijayanhwx for working on the patch. The patch looks good to me. Can u check the compilation issue? @bshashikant I have fixed an edge case in the unit test. Issue Time Tracking --- Worklog Id: (was: 310886) Time Spent: 4h 10m (was: 4h) > Datanodes takeSnapshot should delete previously created snapshots > - > > Key: HDDS-1786 > URL: https://issues.apache.org/jira/browse/HDDS-1786 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Aravindan Vijayan >Priority: Major > Labels: pull-request-available > Time Spent: 4h 10m > Remaining Estimate: 0h > > Right now, after taking a new snapshot, the previous snapshot file is > left in the raft log directory. When a new snapshot is taken, the previous > snapshots should be deleted.
[jira] [Updated] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down
[ https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HDFS-14655: Attachment: HDFS-14655-04.patch > [SBN Read] Namenode crashes if one of The JN is down > > > Key: HDFS-14655 > URL: https://issues.apache.org/jira/browse/HDFS-14655 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Harshakiran Reddy >Assignee: Ayush Saxena >Priority: Critical > Attachments: HDFS-14655-01.patch, HDFS-14655-02.patch, > HDFS-14655-03.patch, HDFS-14655-04.patch, HDFS-14655.poc.patch > > > {noformat} > 2019-07-04 17:35:54,064 | INFO | Logger channel (from parallel executor) to > XXX/XXX | Retrying connect to server: XXX/XXX. Already tried > 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, > sleepTime=1000 MILLISECONDS) | Client.java:975 > 2019-07-04 17:35:54,087 | FATAL | Edit log tailer | Unknown error encountered > while tailing edits. Shutting down standby NN. | EditLogTailer.java:474 > java.lang.OutOfMemoryError: unable to create new native thread > at java.lang.Thread.start0(Native Method) > at java.lang.Thread.start(Thread.java:717) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378) > at > com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:440) > at > com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:56) > at > org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.getJournaledEdits(IPCLoggerChannel.java:565) > at > org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.getJournaledEdits(AsyncLoggerSet.java:272) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectRpcInputStreams(QuorumJournalManager.java:533) > at > 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:508) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:275) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1681) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1714) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:307) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:483) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423) > 2019-07-04 17:35:54,112 | INFO | Edit log tailer | Exiting with status 1: > java.lang.OutOfMemoryError: unable to create new native thread | > ExitUtil.java:210 > {noformat}
[jira] [Commented] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down
[ https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927854#comment-16927854 ]

Ayush Saxena commented on HDFS-14655:

Thanx [~xkrogen], Well we can change it to 60 seconds. I guess that should be enough?
[jira] [Work logged] (HDDS-1982) Extend SCMNodeManager to support decommission and maintenance states
[ https://issues.apache.org/jira/browse/HDDS-1982?focusedWorklogId=310884&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310884 ]

ASF GitHub Bot logged work on HDDS-1982:

    Author: ASF GitHub Bot
    Created on: 11/Sep/19 17:38
    Start Date: 11/Sep/19 17:38
    Worklog Time Spent: 10m
    Work Description: sodonnel commented on pull request #1344: HDDS-1982 Extend SCMNodeManager to support decommission and maintenance states
    URL: https://github.com/apache/hadoop/pull/1344#discussion_r323370846

## File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/SCMNodeManager.java ##
@@ -417,9 +451,12 @@ private SCMNodeStat getNodeStatInternal(DatanodeDetails datanodeDetails) {
   @Override
   public Map getNodeCount() {
+    // TODO - This does not consider decom, maint etc.
     Map nodeCountMap = new HashMap();

Review comment:
The existing code had Map, but I agree it would be better with or . I plan to leave this as is for now, as this method is used only for JMX right now, and I plan to split that out into a separate change via HDDS-2113 as there are some open questions there.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
    Worklog Id: (was: 310884)
    Time Spent: 5h 40m (was: 5.5h)

> Extend SCMNodeManager to support decommission and maintenance states
>
> Key: HDDS-1982
> URL: https://issues.apache.org/jira/browse/HDDS-1982
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: SCM
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Labels: pull-request-available
> Time Spent: 5h 40m
> Remaining Estimate: 0h
>
> Currently, within SCM a node can have the following states:
> HEALTHY
> STALE
> DEAD
> DECOMMISSIONING
> DECOMMISSIONED
> The last 2 are not currently used.
> In order to support decommissioning and maintenance mode, we need to extend the set of states a node can have to include decommission and maintenance states.
> It is also important to note that a node decommissioning or entering maintenance can also be HEALTHY, STALE or go DEAD.
> Therefore in this Jira I propose we should model a node state with two different sets of values. The first is effectively the liveness of the node, with the following states. This is largely what is in place now:
> HEALTHY
> STALE
> DEAD
> The second is the node operational state:
> IN_SERVICE
> DECOMMISSIONING
> DECOMMISSIONED
> ENTERING_MAINTENANCE
> IN_MAINTENANCE
> That means the overall total number of states for a node is the cross-product of the two above lists; however, it probably makes sense to keep the two states separate internally.
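The two-axis model proposed in this issue can be illustrated with a minimal sketch. The class and enum names below (`NodeStateSketch`, `Health`, `OperationalState`, `NodeStatus`) are hypothetical and chosen only for illustration; they are not taken from the SCM code base. The sketch keeps the two values separate inside one status object and enumerates the cross-product the description mentions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: type names are illustrative, not the actual SCM classes.
final class NodeStateSketch {

  /** First axis: health of the node as observed by the cluster. */
  enum Health { HEALTHY, STALE, DEAD }

  /** Second axis: operational state of the node. */
  enum OperationalState {
    IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED, ENTERING_MAINTENANCE, IN_MAINTENANCE
  }

  /** Combined status: one value from each axis, kept as separate fields. */
  static final class NodeStatus {
    final OperationalState opState;
    final Health health;

    NodeStatus(OperationalState opState, Health health) {
      this.opState = opState;
      this.health = health;
    }

    @Override
    public String toString() {
      return opState + "/" + health;
    }
  }

  /** Enumerates the cross-product of the two state lists. */
  static List<NodeStatus> allStatuses() {
    List<NodeStatus> result = new ArrayList<>();
    for (OperationalState op : OperationalState.values()) {
      for (Health h : Health.values()) {
        result.add(new NodeStatus(op, h));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<NodeStatus> all = allStatuses();
    System.out.println(all.size() + " combined states, first: " + all.get(0));
  }
}
```

Keeping two small enums rather than one flattened enum means a health transition (for example HEALTHY to STALE) does not need to know about the operational state, which appears to be the motivation for keeping them separate internally.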
[jira] [Work logged] (HDDS-1982) Extend SCMNodeManager to support decommission and maintenance states
[ https://issues.apache.org/jira/browse/HDDS-1982?focusedWorklogId=310882&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310882 ]

ASF GitHub Bot logged work on HDDS-1982:

    Author: ASF GitHub Bot
    Created on: 11/Sep/19 17:36
    Start Date: 11/Sep/19 17:36
    Worklog Time Spent: 10m
    Work Description: sodonnel commented on pull request #1344: HDDS-1982 Extend SCMNodeManager to support decommission and maintenance states
    URL: https://github.com/apache/hadoop/pull/1344#discussion_r323369810

## File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/states/NodeStateMap.java ##
@@ -309,4 +381,61 @@ private void checkIfNodeExist(UUID uuid) throws NodeNotFoundException {
       throw new NodeNotFoundException("Node UUID: " + uuid);
     }
   }
+
+  /**
+   * Create a list of datanodeInfo for all nodes matching the passed states.
+   * Passing null for one of the states acts like a wildcard for that state.
+   *
+   * @param opState
+   * @param health
+   * @return List of DatanodeInfo objects matching the passed state
+   */
+  private List filterNodes(
+      NodeOperationalState opState, NodeState health) {
+    if (opState != null && health != null) {

Review comment:
I had not really looked into the Streams API before, but I changed the code to use streams and it does make it easier to follow, so I have made this change. I still kept the if statements at the start of the method: if both params are null we can just return the entire list with no searching, and if both are non-null we can search using the NodeStatus, which should be slightly more efficient.

Issue Time Tracking
    Worklog Id: (was: 310882)
    Time Spent: 5.5h (was: 5h 20m)
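The wildcard filtering described in this review comment can be sketched as follows. This is not the code from the pull request: `FilterNodesSketch`, its `Node` class, the reduced `OpState` enum and the `sample()` data are all assumptions made for illustration, and the sketch does not model the NodeStatus-keyed lookup the comment mentions for the case where both parameters are non-null. It only shows the null-as-wildcard behaviour with the Streams API and the early return when both parameters are null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: not the NodeStateMap implementation from the pull request.
final class FilterNodesSketch {

  enum Health { HEALTHY, STALE, DEAD }

  enum OpState { IN_SERVICE, DECOMMISSIONING, IN_MAINTENANCE }

  /** Minimal stand-in for a datanode record. */
  static final class Node {
    final String name;
    final OpState opState;
    final Health health;

    Node(String name, OpState opState, Health health) {
      this.name = name;
      this.opState = opState;
      this.health = health;
    }
  }

  /** Returns the nodes matching both states; a null state matches anything. */
  static List<Node> filterNodes(List<Node> nodes, OpState opState, Health health) {
    if (opState == null && health == null) {
      // Both wildcards: return a copy of everything without searching.
      return new ArrayList<>(nodes);
    }
    return nodes.stream()
        .filter(n -> opState == null || n.opState == opState)
        .filter(n -> health == null || n.health == health)
        .collect(Collectors.toList());
  }

  /** Small fixed data set used by main and for experimentation. */
  static List<Node> sample() {
    return Arrays.asList(
        new Node("dn1", OpState.IN_SERVICE, Health.HEALTHY),
        new Node("dn2", OpState.IN_SERVICE, Health.STALE),
        new Node("dn3", OpState.DECOMMISSIONING, Health.HEALTHY),
        new Node("dn4", OpState.IN_MAINTENANCE, Health.DEAD));
  }

  public static void main(String[] args) {
    List<Node> healthy = filterNodes(sample(), null, Health.HEALTHY);
    for (Node n : healthy) {
      System.out.println(n.name + " " + n.opState + " " + n.health);
    }
  }
}
```

Writing each wildcard check as its own `filter` step keeps the two conditions independent, so adding a third criterion later would be one more `filter` call rather than another branch.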
[jira] [Work logged] (HDDS-1786) Datanodes takeSnapshot should delete previously created snapshots
[ https://issues.apache.org/jira/browse/HDDS-1786?focusedWorklogId=310881&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310881 ]

ASF GitHub Bot logged work on HDDS-1786:

    Author: ASF GitHub Bot
    Created on: 11/Sep/19 17:35
    Start Date: 11/Sep/19 17:35
    Worklog Time Spent: 10m
    Work Description: bshashikant commented on issue #1163: HDDS-1786 : Datanodes takeSnapshot should delete previously created s…
    URL: https://github.com/apache/hadoop/pull/1163#issuecomment-530485067

Thanks @avijayanhwx for working on the patch. The patch looks good to me. Can you check the compilation issue?

Issue Time Tracking
    Worklog Id: (was: 310881)
    Time Spent: 4h (was: 3h 50m)

> Datanodes takeSnapshot should delete previously created snapshots
>
> Key: HDDS-1786
> URL: https://issues.apache.org/jira/browse/HDDS-1786
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Datanode
> Affects Versions: 0.4.0
> Reporter: Mukul Kumar Singh
> Assignee: Aravindan Vijayan
> Priority: Major
> Labels: pull-request-available
> Time Spent: 4h
> Remaining Estimate: 0h
>
> Right now, after taking a new snapshot, the previous snapshot file is left in the raft log directory. When a new snapshot is taken, the previous snapshots should be deleted.
[jira] [Commented] (HDDS-2113) Update JMX metrics in SCMNodeMetrics for Decommission and Maintenance
[ https://issues.apache.org/jira/browse/HDDS-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927840#comment-16927840 ]

Stephen O'Donnell commented on HDDS-2113:

[~nandakumar131] [~anu] [~arpaga] I would appreciate your thoughts on this.

> Update JMX metrics in SCMNodeMetrics for Decommission and Maintenance
>
> Key: HDDS-2113
> URL: https://issues.apache.org/jira/browse/HDDS-2113
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: SCM
> Affects Versions: 0.5.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
>
> Currently the class SCMNodeMetrics exposes JMX metrics for the number of HEALTHY, STALE and DEAD nodes.
> It also exposes the disk capacity of the cluster and the amount of space used and available.
> We need to decide how we want to display things in JMX when nodes are in and entering maintenance, decommissioning and decommissioned.
> We now have 15 states rather than the previous 3, as we can have nodes in:
> * IN_SERVICE
> * ENTERING_MAINTENANCE
> * IN_MAINTENANCE
> * DECOMMISSIONING
> * DECOMMISSIONED
> And in each of these states, nodes can be:
> * HEALTHY
> * STALE
> * DEAD
> The simplest case would be to expose these 15 states directly in JMX, as it gives the complete picture, but I wonder if we need any summary JMX metrics too?
>
> We also need to consider how to count disk capacity and usage. For example:
> # Do we count capacity and usage on a DECOMMISSIONING node? This is not a clear cut answer, as a decommissioning node does not provide any capacity for writers in the cluster, but it does use capacity.
> # For a DECOMMISSIONED node, we probably should not count capacity or usage
> # For an ENTERING_MAINTENANCE node, do we count capacity and usage? I suspect we should include the capacity and usage in the totals, however a node in this state will not be available for writes.
> # For an IN_MAINTENANCE node that is healthy?
> # For an IN_MAINTENANCE node that is dead?
> I would welcome any thoughts on this before changing the code.
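As a rough illustration of the "simplest case" raised in this issue (one counter per combined state), the sketch below tallies nodes into 15 buckets. Everything here is an assumption made for illustration: `NodeCountSketch`, the `Node` class, the key format `OPSTATE_HEALTH` and the sample data are hypothetical, and a plain map stands in for whatever metrics mechanism SCMNodeMetrics actually uses. The question of how capacity and usage should be counted is left open, as it is in the issue.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a plain map stands in for the real JMX metrics.
final class NodeCountSketch {

  enum Health { HEALTHY, STALE, DEAD }

  enum OpState {
    IN_SERVICE, ENTERING_MAINTENANCE, IN_MAINTENANCE, DECOMMISSIONING, DECOMMISSIONED
  }

  /** Minimal stand-in for a datanode with its two state values. */
  static final class Node {
    final OpState opState;
    final Health health;

    Node(OpState opState, Health health) {
      this.opState = opState;
      this.health = health;
    }
  }

  /** Illustrative counter name for one combined state. */
  static String key(OpState op, Health h) {
    return op + "_" + h;
  }

  /** Counts nodes per combined state; all 15 keys are present, even at zero. */
  static Map<String, Integer> countByState(List<Node> nodes) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (OpState op : OpState.values()) {
      for (Health h : Health.values()) {
        counts.put(key(op, h), 0);
      }
    }
    for (Node n : nodes) {
      counts.merge(key(n.opState, n.health), 1, Integer::sum);
    }
    return counts;
  }

  /** Small fixed data set used by main and for experimentation. */
  static List<Node> sample() {
    return Arrays.asList(
        new Node(OpState.IN_SERVICE, Health.HEALTHY),
        new Node(OpState.IN_SERVICE, Health.HEALTHY),
        new Node(OpState.DECOMMISSIONING, Health.HEALTHY),
        new Node(OpState.IN_MAINTENANCE, Health.DEAD));
  }

  public static void main(String[] args) {
    countByState(sample()).forEach((k, v) -> System.out.println(k + " = " + v));
  }
}
```

Pre-populating every key with zero means a consumer always sees the full set of 15 counters; any summary metrics could then be derived by summing over one axis.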