date:20190911

[jira] [Commented] (HDFS-14832) RBF : Add Icon for ReadOnly False

2019-09-11 Thread hemanthboyina (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928287#comment-16928287
 ] 

hemanthboyina commented on HDFS-14832:
--

icon is fancier way to show 
texts is easier way to understand .

I feel , if we want to have icon , we should be having both for Read Only False 
and ReadOnly true

If not , texts is more clearer with no confusion 

> RBF : Add Icon for ReadOnly False
> -
>
> Key: HDFS-14832
> URL: https://issues.apache.org/jira/browse/HDFS-14832
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Minor
>
> In Router Web UI for Mount Table information , add icon for read only state 
> false 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14832) RBF : Add Icon for ReadOnly False

2019-09-11 Thread Takanobu Asanuma (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928275#comment-16928275
 ] 

Takanobu Asanuma commented on HDFS-14832:
-

Thanks for your comment, [~hemanthboyina].

* I feel lock icon means read-only. And judging based on the icon color may 
confuse admins.
* If Read-only false (which means read-write) status shows there, I prefer to 
rename the column name from "Read Only" to "Mount Option".
* I also listened to my colleagues' opinions. In summary, using icon for 
read-only may be a bit much. String texts may be clearer.

[~hemanthboyina] [~elgoiri] How does look that?

> RBF : Add Icon for ReadOnly False
> -
>
> Key: HDFS-14832
> URL: https://issues.apache.org/jira/browse/HDFS-14832
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Minor
>
> In Router Web UI for Mount Table information , add icon for read only state 
> false 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Issue Comment Deleted] (HDFS-14833) RBF: Router Update Doesn't Sync Quota

2019-09-11 Thread Surendra Singh Lilhore (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-14833:
--
Comment: was deleted

(was: Thanks [~ayushtkn], 
{quote}The sync only can success when the cache hasn't been refreshed before 
the comparison
{quote}
Good point, this is possible. You can handle this in HDFS-14833.)

> RBF: Router Update Doesn't Sync Quota
> -
>
> Key: HDFS-14833
> URL: https://issues.apache.org/jira/browse/HDFS-14833
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>
> HDFS-14777 Added a check to prevent RPC call, It checks whether in the 
> present state whether quota is changing. 
> But ignores the part that if the locations are changed. if the location is 
> changed the new destination should be synchronized with the mount entry 
> quota. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14777) RBF: Set ReadOnly is failing for mount Table but actually readonly succeed to set

2019-09-11 Thread Surendra Singh Lilhore (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928265#comment-16928265
 ] 

Surendra Singh Lilhore commented on HDFS-14777:
---

Thanks [~ayushtkn], 
{quote}The sync only can success when the cache hasn't been refreshed before 
the comparison
{quote}
Good point, this is possible. You can handle this in HDFS-14833.

> RBF: Set ReadOnly is failing for mount Table but actually readonly succeed to 
> set
> -
>
> Key: HDFS-14777
> URL: https://issues.apache.org/jira/browse/HDFS-14777
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ranith Sardar
>Assignee: Ranith Sardar
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14777.001.patch, HDFS-14777.002.patch, 
> HDFS-14777.003.patch, HDFS-14777.004.patch
>
>
> # hdfs dfsrouteradmin -update /test hacluster /test -readonly /opt/client # 
> hdfs dfsrouteradmin -update /test hacluster /test -readonly update: /test is 
> in a read only mount 
> pointorg.apache.hadoop.ipc.RemoteException(java.io.IOException): /test is in 
> a read only mount point at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:1419)
>  at 
> org.apache.hadoop.hdfs.server.federation.router.Quota.getQuotaRemoteLocations(Quota.java:217)
>  at 
> org.apache.hadoop.hdfs.server.federation.router.Quota.setQuota(Quota.java:75) 
> at 
> org.apache.hadoop.hdfs.server.federation.router.RouterAdminServer.synchronizeQuota(RouterAdminServer.java:288)
>  at 
> org.apache.hadoop.hdfs.server.federation.router.RouterAdminServer.updateMountTableEntry(RouterAdminServer.java:267)



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14833) RBF: Router Update Doesn't Sync Quota

2019-09-11 Thread Surendra Singh Lilhore (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928264#comment-16928264
 ] 

Surendra Singh Lilhore commented on HDFS-14833:
---

Thanks [~ayushtkn], 
{quote}The sync only can success when the cache hasn't been refreshed before 
the comparison
{quote}
Good point, this is possible. You can handle this in HDFS-14833.

> RBF: Router Update Doesn't Sync Quota
> -
>
> Key: HDFS-14833
> URL: https://issues.apache.org/jira/browse/HDFS-14833
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>
> HDFS-14777 Added a check to prevent RPC call, It checks whether in the 
> present state whether quota is changing. 
> But ignores the part that if the locations are changed. if the location is 
> changed the new destination should be synchronized with the mount entry 
> quota. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14777) RBF: Set ReadOnly is failing for mount Table but actually readonly succeed to set

2019-09-11 Thread Ayush Saxena (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928215#comment-16928215
 ] 

Ayush Saxena commented on HDFS-14777:
-

Thanx Everyone for the work here. Was trying to fix HDFS-14833, Considering 
that quota just doesn't sync, when locations are updated, but realized while 
testing there is a problem after this in synchronization even on Quota update 
in most cases. As the comparison is between the value in the update request and 
the mount entry(*After the update command has been executed on the entry*). So 
the entry is the updated entry wrt the update request so the values shall be 
equal. The sync only can success when the cache hasn't been refreshed before 
the comparison, which happens very randomly and is beyond control.
I guess the entry taken should have been taken before calling update, So as to 
have a comparison. 
Give a check [~elgoiri] [~surendrasingh] if it is so I will update it at 
HDFS-14833. Prior apologies if I have messed up. :) 

> RBF: Set ReadOnly is failing for mount Table but actually readonly succeed to 
> set
> -
>
> Key: HDFS-14777
> URL: https://issues.apache.org/jira/browse/HDFS-14777
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ranith Sardar
>Assignee: Ranith Sardar
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14777.001.patch, HDFS-14777.002.patch, 
> HDFS-14777.003.patch, HDFS-14777.004.patch
>
>
> # hdfs dfsrouteradmin -update /test hacluster /test -readonly /opt/client # 
> hdfs dfsrouteradmin -update /test hacluster /test -readonly update: /test is 
> in a read only mount 
> pointorg.apache.hadoop.ipc.RemoteException(java.io.IOException): /test is in 
> a read only mount point at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:1419)
>  at 
> org.apache.hadoop.hdfs.server.federation.router.Quota.getQuotaRemoteLocations(Quota.java:217)
>  at 
> org.apache.hadoop.hdfs.server.federation.router.Quota.setQuota(Quota.java:75) 
> at 
> org.apache.hadoop.hdfs.server.federation.router.RouterAdminServer.synchronizeQuota(RouterAdminServer.java:288)
>  at 
> org.apache.hadoop.hdfs.server.federation.router.RouterAdminServer.updateMountTableEntry(RouterAdminServer.java:267)



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Comment Edited] (HDFS-14818) Check native pmdk lib by 'hadoop checknative' command

2019-09-11 Thread Rakesh R (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928209#comment-16928209
 ] 

Rakesh R edited comment on HDFS-14818 at 9/12/19 5:22 AM:
--

Thanks [~PhiloHe] for the patch. Overall looks good to me, just few comments.
 # {{SupportState.PMDK_LIB_NOT_FOUND}} - its unused now, can you remove it.
{code:java}
SupportState.PMDK_LIB_NOT_FOUND(1),
{code}
{code:java}
 case 1:
 msg = "The native code is built with PMDK support, but PMDK libs " +
 "are NOT found in execution environment or failed to be loaded.";
 break;
{code}
# Any reason to change 'NAME' to 'REALPATH'.


was (Author: rakeshr):
Thanks [~PhiloHe] for the patch. Overall looks good to me, just a comment.

# {{SupportState.PMDK_LIB_NOT_FOUND}} - its unused now, can you remove it.
 {code}
SupportState.PMDK_LIB_NOT_FOUND(1),
{code}
{code}
 case 1:
 msg = "The native code is built with PMDK support, but PMDK libs " +
 "are NOT found in execution environment or failed to be loaded.";
 break;
{code}

> Check native pmdk lib by 'hadoop checknative' command
> -
>
> Key: HDFS-14818
> URL: https://issues.apache.org/jira/browse/HDFS-14818
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: native
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14818.000.patch
>
>
> Currently, 'hadoop checknative' command supports checking native libs, such 
> as zlib, snappy, openssl and ISA-L etc. It's necessary to include pmdk lib in 
> the checking.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14818) Check native pmdk lib by 'hadoop checknative' command

2019-09-11 Thread Rakesh R (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928209#comment-16928209
 ] 

Rakesh R commented on HDFS-14818:
-

Thanks [~PhiloHe] for the patch. Overall looks good to me, just a comment.

# {{SupportState.PMDK_LIB_NOT_FOUND}} - its unused now, can you remove it.
 {code}
SupportState.PMDK_LIB_NOT_FOUND(1),
{code}
{code}
 case 1:
 msg = "The native code is built with PMDK support, but PMDK libs " +
 "are NOT found in execution environment or failed to be loaded.";
 break;
{code}

> Check native pmdk lib by 'hadoop checknative' command
> -
>
> Key: HDFS-14818
> URL: https://issues.apache.org/jira/browse/HDFS-14818
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: native
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14818.000.patch
>
>
> Currently, 'hadoop checknative' command supports checking native libs, such 
> as zlib, snappy, openssl and ISA-L etc. It's necessary to include pmdk lib in 
> the checking.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14754) Erasure Coding : The number of Under-Replicated Blocks never reduced

2019-09-11 Thread Surendra Singh Lilhore (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928202#comment-16928202
 ] 

Surendra Singh Lilhore commented on HDFS-14754:
---

[~hemanthboyina], pls attach the patch again with proper comment in test case.

> Erasure Coding :  The number of Under-Replicated Blocks never reduced
> -
>
> Key: HDFS-14754
> URL: https://issues.apache.org/jira/browse/HDFS-14754
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Critical
> Attachments: HDFS-14754.001.patch, HDFS-14754.002.patch, 
> HDFS-14754.003.patch, HDFS-14754.004.patch, HDFS-14754.005.patch, 
> HDFS-14754.006.patch, HDFS-14754.007.patch
>
>
> Using EC RS-3-2, 6 DN 
> We came accross a scenario where in the EC 5 blocks , same block is 
> replicated thrice and two blocks got missing
> Replicated block was not deleting and missing block is not able to ReConstruct



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Comment Edited] (HDFS-14754) Erasure Coding : The number of Under-Replicated Blocks never reduced

2019-09-11 Thread Surendra Singh Lilhore (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928202#comment-16928202
 ] 

Surendra Singh Lilhore edited comment on HDFS-14754 at 9/12/19 5:06 AM:


[~hemanthboyina], pls attach the patch again with proper comment in test case.
{code:java}
// update blocksMap
cluster.triggerBlockReports();
// add to invalidates
cluster.triggerHeartbeats();
// datanode delete block
cluster.triggerHeartbeats();
// update blocksMap
cluster.triggerBlockReports();  {code}


was (Author: surendrasingh):
[~hemanthboyina], pls attach the patch again with proper comment in test case.

> Erasure Coding :  The number of Under-Replicated Blocks never reduced
> -
>
> Key: HDFS-14754
> URL: https://issues.apache.org/jira/browse/HDFS-14754
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Critical
> Attachments: HDFS-14754.001.patch, HDFS-14754.002.patch, 
> HDFS-14754.003.patch, HDFS-14754.004.patch, HDFS-14754.005.patch, 
> HDFS-14754.006.patch, HDFS-14754.007.patch
>
>
> Using EC RS-3-2, 6 DN 
> We came accross a scenario where in the EC 5 blocks , same block is 
> replicated thrice and two blocks got missing
> Replicated block was not deleting and missing block is not able to ReConstruct



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14699) Erasure Coding: Storage not considered in live replica when replication streams hard limit reached to threshold

2019-09-11 Thread Surendra Singh Lilhore (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928199#comment-16928199
 ] 

Surendra Singh Lilhore commented on HDFS-14699:
---

{quote}[~surendrasingh] I saw +1 for this patch, can you merge the patch? 
Thanks!
{quote}
Sure, I will commit this today...

> Erasure Coding: Storage not considered in live replica when replication 
> streams hard limit reached to threshold
> ---
>
> Key: HDFS-14699
> URL: https://issues.apache.org/jira/browse/HDFS-14699
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.2.0, 3.1.1, 3.3.0
>Reporter: Zhao Yi Ming
>Assignee: Zhao Yi Ming
>Priority: Critical
>  Labels: patch
> Attachments: HDFS-14699.00.patch, HDFS-14699.01.patch, 
> HDFS-14699.02.patch, HDFS-14699.03.patch, HDFS-14699.04.patch, 
> HDFS-14699.05.patch, image-2019-08-20-19-58-51-872.png, 
> image-2019-09-02-17-51-46-742.png
>
>
> We are tried the EC function on 80 node cluster with hadoop 3.1.1, we hit the 
> same scenario as you said https://issues.apache.org/jira/browse/HDFS-8881. 
> Following are our testing steps, hope it can helpful.(following DNs have the 
> testing internal blocks)
>  # we customized a new 10-2-1024k policy and use it on a path, now we have 12 
> internal block(12 live block)
>  # decommission one DN, after the decommission complete. now we have 13 
> internal block(12 live block and 1 decommission block)
>  # then shutdown one DN which did not have the same block id as 1 
> decommission block, now we have 12 internal block(11 live block and 1 
> decommission block)
>  # after wait for about 600s (before the heart beat come) commission the 
> decommissioned DN again, now we have 12 internal block(11 live block and 1 
> duplicate block)
>  # Then the EC is not reconstruct the missed block
> We think this is a critical issue for using the EC function in a production 
> env. Could you help? Thanks a lot!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14609) RBF: Security should use common AuthenticationFilter

2019-09-11 Thread CR Hota (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928195#comment-16928195
 ] 

CR Hota commented on HDFS-14609:


[~zhangchen] Thanks for the clarification. I think we are good. Anyways 
InvalidToken is captured in other tests as well.

+1 for 006.patch.

> RBF: Security should use common AuthenticationFilter
> 
>
> Key: HDFS-14609
> URL: https://issues.apache.org/jira/browse/HDFS-14609
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: CR Hota
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14609.001.patch, HDFS-14609.002.patch, 
> HDFS-14609.003.patch, HDFS-14609.004.patch, HDFS-14609.005.patch, 
> HDFS-14609.006.patch
>
>
> We worked on router based federation security as part of HDFS-13532. We kept 
> it compatible with the way namenode works. However with HADOOP-16314 and 
> HDFS-16354 in trunk, auth filters seems to have been changed causing tests to 
> fail.
> Changes are needed appropriately in RBF, mainly fixing broken tests.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDFS-14844) Make buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream configurable

2019-09-11 Thread Lisheng Sun (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14844:
---
Attachment: HDFS-14844.002.patch

> Make buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream  
> configurable
> --
>
> Key: HDFS-14844
> URL: https://issues.apache.org/jira/browse/HDFS-14844
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14844.001.patch, HDFS-14844.002.patch
>
>
> details for HDFS-14820



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14832) RBF : Add Icon for ReadOnly False

2019-09-11 Thread hemanthboyina (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928189#comment-16928189
 ] 

hemanthboyina commented on HDFS-14832:
--

yes [~tasanuma] we should have like _federationhealth-mounttable-legend_

whats your opinion on having Red colour labelled lock for ReadOnly State False 
, as same like green for read only state true. 

> RBF : Add Icon for ReadOnly False
> -
>
> Key: HDFS-14832
> URL: https://issues.apache.org/jira/browse/HDFS-14832
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Minor
>
> In Router Web UI for Mount Table information , add icon for read only state 
> false 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDDS-2019) Handle Set DtService of token in S3Gateway for OM HA

2019-09-11 Thread Jitendra Nath Pandey (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDDS-2019:
---
Priority: Critical  (was: Major)

> Handle Set DtService of token in S3Gateway for OM HA
> 
>
> Key: HDDS-2019
> URL: https://issues.apache.org/jira/browse/HDDS-2019
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Critical
>
> When OM HA is enabled, when tokens are generated, the service name should be 
> set with address of all OM's.
>  
> Current without HA, it is set with Om RpcAddress string. This Jira is to 
> handle:
>  # Set dtService with all OM address. Right now in OMClientProducer, UGI is 
> created with S3 token, and serviceName of token is set with OMAddress, for HA 
> case, this should be set with all OM RPC addresses.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDFS-14795) Add Throttler for writing block

2019-09-11 Thread Lisheng Sun (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14795:
---
Attachment: HDFS-14795.011.patch

> Add Throttler for writing block
> ---
>
> Key: HDFS-14795
> URL: https://issues.apache.org/jira/browse/HDFS-14795
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14795.001.patch, HDFS-14795.002.patch, 
> HDFS-14795.003.patch, HDFS-14795.004.patch, HDFS-14795.005.patch, 
> HDFS-14795.006.patch, HDFS-14795.007.patch, HDFS-14795.008.patch, 
> HDFS-14795.009.patch, HDFS-14795.010.patch, HDFS-14795.011.patch
>
>
> DataXceiver#writeBlock
> {code:java}
> blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut,
> mirrorAddr, null, targets, false);
> {code}
> As above code, DataXceiver#writeBlock doesn't throttler.
>  I think it is necessary to throttle for writing block, while add throttler 
> in stage of PIPELINE_SETUP_APPEND_RECOVERY or 
> PIPELINE_SETUP_STREAMING_RECOVERY.
> Default throttler value is still null.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.

2019-09-11 Thread Hadoop QA (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928181#comment-16928181
 ] 

Hadoop QA commented on HDFS-14768:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  7s{color} 
| {color:red} HDFS-14768 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14768 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27849/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> In some cases, erasure blocks are corruption  when they are reconstruct.
> 
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: HDFS-14768.000.patch, guojh_UT_after_deomission.txt, 
> guojh_UT_before_deomission.txt, zhaoyiming_UT_after_deomission.txt, 
> zhaoyiming_UT_beofre_deomission.txt
>
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2;
> We suppose a file's block Index is [0,1,2,3,4,5,6,7,8], And decommission 
> index[3,4], increase the index 6 datanode's
> pendingReplicationWithoutTargets  that make it large than 
> replicationStreamsHardLimit(we set 14). Then, After the method 
> chooseSourceDatanodes of BlockMananger, the liveBlockIndices is 
> [0,1,2,3,4,5,7,8], Block Counter is, Live:7, Decommission:2. 
> In method scheduleReconstruction of BlockManager, the additionalReplRequired 
> is 9 - 7 = 2. After Namenode choose two target Datanode, will assign a 
> erasureCode task to target datanode.
> When datanode get the task will build  targetIndices from liveBlockIndices 
> and target length. the code is blow.
> {code:java}
> // code placeholder
> targetIndices = new short[targets.length];
> private void initTargetIndices() { 
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0; hasValidTargets = false; 
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {  
> if (!bitset.get) {    
>   if (reconstructor.getBlockLen > 0) {
>        if (m < targets.length) {
>          targetIndices[m++] = (short)i;
>          hasValidTargets = true;
>         }
>       }
>     }
>  }
> {code}
> targetIndices[0]=6, and targetIndices[1] is aways 0 from initial value.
> The StripedReader is  aways create reader from first 6 index block, and is 
> [0,1,2,3,4,5]
> Use the index [0,1,2,3,4,5] to build target index[6,0] will trigger the isal 
> bug. the block index6's data is corruption(all data is zero).
> I write a unit test can stabilize repreduce.
> {code:java}
> // code placeholder
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // 
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   for (int i = 0; i < 100; i++) {
> datanodeDescriptor.incrementPendingReplicationWithoutTargets();
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List decommisionNodes = new ArrayList();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>   //assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs));
>   // Ensure decommissio

[jira] [Updated] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.

2019-09-11 Thread Zhao Yi Ming (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yi Ming updated HDFS-14768:

Attachment: zhaoyiming_UT_after_deomission.txt
zhaoyiming_UT_beofre_deomission.txt

> In some cases, erasure blocks are corruption  when they are reconstruct.
> 
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: HDFS-14768.000.patch, guojh_UT_after_deomission.txt, 
> guojh_UT_before_deomission.txt, zhaoyiming_UT_after_deomission.txt, 
> zhaoyiming_UT_beofre_deomission.txt
>
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2;
> We suppose a file's block Index is [0,1,2,3,4,5,6,7,8], And decommission 
> index[3,4], increase the index 6 datanode's
> pendingReplicationWithoutTargets  that make it large than 
> replicationStreamsHardLimit(we set 14). Then, After the method 
> chooseSourceDatanodes of BlockMananger, the liveBlockIndices is 
> [0,1,2,3,4,5,7,8], Block Counter is, Live:7, Decommission:2. 
> In method scheduleReconstruction of BlockManager, the additionalReplRequired 
> is 9 - 7 = 2. After Namenode choose two target Datanode, will assign a 
> erasureCode task to target datanode.
> When datanode get the task will build  targetIndices from liveBlockIndices 
> and target length. the code is blow.
> {code:java}
> // code placeholder
> targetIndices = new short[targets.length];
> private void initTargetIndices() { 
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0; hasValidTargets = false; 
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {  
> if (!bitset.get) {    
>   if (reconstructor.getBlockLen > 0) {
>        if (m < targets.length) {
>          targetIndices[m++] = (short)i;
>          hasValidTargets = true;
>         }
>       }
>     }
>  }
> {code}
> targetIndices[0]=6, and targetIndices[1] is aways 0 from initial value.
> The StripedReader is  aways create reader from first 6 index block, and is 
> [0,1,2,3,4,5]
> Use the index [0,1,2,3,4,5] to build target index[6,0] will trigger the isal 
> bug. the block index6's data is corruption(all data is zero).
> I write a unit test can stabilize repreduce.
> {code:java}
> // code placeholder
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // 
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   for (int i = 0; i < 100; i++) {
> datanodeDescriptor.incrementPendingReplicationWithoutTargets();
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List decommisionNodes = new ArrayList();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>   //assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs));
>   // Ensure decommissioned datanode is not automatically shutdown
>   DFSClient client = getDfsClient(cluster.getNameNode(0), conf);
>   assertEquals("All datanodes must be alive", numDNs,
>   client.datanodeReport(DatanodeReportType.LIVE).length);
>   FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes);
>   Assert.assertTrue("Checksum mismatches!",
>   fileChecksum1.equals(fileChecksum2));
>   StripedFileTestUtil.checkData(dfs, ecFile, writeBytes, decommisionNodes,
>   null, blockGroupSize);
> }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscrib

[jira] [Updated] (HDFS-14840) Use Java Conccurent Instead of Synchronization in BlockPoolTokenSecretManager

2019-09-11 Thread Akira Ajisaka (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-14840:
-
Fix Version/s: 3.3.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Committed this to trunk. Thanks [~belugabehr] for the contribution.

> Use Java Conccurent Instead of Synchronization in BlockPoolTokenSecretManager
> -
>
> Key: HDFS-14840
> URL: https://issues.apache.org/jira/browse/HDFS-14840
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: HDFS-14840.1.patch
>
>
> https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockPoolTokenSecretManager.java#L40
> Instead of synchronizing the entire class, just synchronize the collection.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDDS-1879) Support multiple excluded scopes when choosing datanodes in NetworkTopology

2019-09-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-1879?focusedWorklogId=311147&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-311147
 ]

ASF GitHub Bot logged work on HDDS-1879:


Author: ASF GitHub Bot
Created on: 12/Sep/19 03:50
Start Date: 12/Sep/19 03:50
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1194: HDDS-1879.  
Support multiple excluded scopes when choosing datanodes in NetworkTopology
URL: https://github.com/apache/hadoop/pull/1194#issuecomment-530653069
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 105 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 2 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 28 | Maven dependency ordering for branch |
   | -1 | mvninstall | 33 | hadoop-ozone in trunk failed. |
   | -1 | compile | 22 | hadoop-ozone in trunk failed. |
   | +1 | checkstyle | 68 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 866 | branch has no errors when building and testing 
our client artifacts. |
   | -1 | javadoc | 18 | hadoop-hdds in trunk failed. |
   | -1 | javadoc | 18 | hadoop-ozone in trunk failed. |
   | 0 | spotbugs | 149 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | -1 | findbugs | 26 | hadoop-ozone in trunk failed. |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 18 | Maven dependency ordering for patch |
   | -1 | mvninstall | 33 | hadoop-ozone in the patch failed. |
   | -1 | compile | 26 | hadoop-ozone in the patch failed. |
   | -1 | javac | 53 | hadoop-hdds generated 2 new + 25 unchanged - 2 fixed = 
27 total (was 27) |
   | -1 | javac | 26 | hadoop-ozone in the patch failed. |
   | -0 | checkstyle | 35 | hadoop-hdds: The patch generated 46 new + 259 
unchanged - 55 fixed = 305 total (was 314) |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 667 | patch has no errors when building and testing 
our client artifacts. |
   | -1 | javadoc | 17 | hadoop-hdds in the patch failed. |
   | -1 | javadoc | 16 | hadoop-ozone in the patch failed. |
   | -1 | findbugs | 25 | hadoop-ozone in the patch failed. |
   ||| _ Other Tests _ |
   | -1 | unit | 212 | hadoop-hdds in the patch failed. |
   | -1 | unit | 27 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 31 | The patch does not generate ASF License warnings. |
   | | | 3041 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdds.scm.container.placement.algorithms.TestSCMContainerPlacementRackAware
 |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1194/15/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1194 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux 84fd38c81e89 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 3b06f0b |
   | Default Java | 1.8.0_222 |
   | mvninstall | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1194/15/artifact/out/branch-mvninstall-hadoop-ozone.txt
 |
   | compile | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1194/15/artifact/out/branch-compile-hadoop-ozone.txt
 |
   | javadoc | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1194/15/artifact/out/branch-javadoc-hadoop-hdds.txt
 |
   | javadoc | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1194/15/artifact/out/branch-javadoc-hadoop-ozone.txt
 |
   | findbugs | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1194/15/artifact/out/branch-findbugs-hadoop-ozone.txt
 |
   | mvninstall | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1194/15/artifact/out/patch-mvninstall-hadoop-ozone.txt
 |
   | compile | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1194/15/artifact/out/patch-compile-hadoop-ozone.txt
 |
   | javac | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1194/15/artifact/out/diff-compile-javac-hadoop-hdds.txt
 |
   | javac | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1194/15/artifact/out/patch-compile-hadoop-ozone.txt
 |
   | checkstyle | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1194/15/artifact/out/diff-checkstyle-hadoop-hdds.txt
 |
   | javadoc | 
htt

[jira] [Commented] (HDFS-14840) Use Java Conccurent Instead of Synchronization in BlockPoolTokenSecretManager

2019-09-11 Thread Hudson (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928177#comment-16928177
 ] 

Hudson commented on HDFS-14840:
---

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17281 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17281/])
HDFS-14840. Use Java Conccurent Instead of Synchronization in (aajisaka: rev 
68612a0410066af745e80d3b6d732a6151a635e3)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockPoolTokenSecretManager.java


> Use Java Conccurent Instead of Synchronization in BlockPoolTokenSecretManager
> -
>
> Key: HDFS-14840
> URL: https://issues.apache.org/jira/browse/HDFS-14840
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: HDFS-14840.1.patch
>
>
> https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockPoolTokenSecretManager.java#L40
> Instead of synchronizing the entire class, just synchronize the collection.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Comment Edited] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.

2019-09-11 Thread Zhao Yi Ming (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928172#comment-16928172
 ] 

Zhao Yi Ming edited comment on HDFS-14768 at 9/12/19 3:45 AM:
--

One more information, hope it can give help. Through [~gjhkael]'s UT make the 
decommission hang, but I check the local data folder, seems the reconstruction 
worked well. Also with my changed on the UT, the reconstruction worked well 
too. Attached the folder file list.

[^guojh_UT_before_deomission.txt]

[^guojh_UT_after_deomission.txt]

[^zhaoyiming_UT_beofre_deomission.txt]

[^zhaoyiming_UT_after_deomission.txt]


was (Author: zhaoyim):
One more information, hope it can give help. Through [~gjhkael]'s UT make the 
decommission hang, but I check the local data folder, seems the reconstruction 
worked well. Also with my changed on the UT, the reconstruction worked well 
too. Attached the folder file list.

> In some cases, erasure blocks are corruption  when they are reconstruct.
> 
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: HDFS-14768.000.patch, guojh_UT_after_deomission.txt, 
> guojh_UT_before_deomission.txt, zhaoyiming_UT_after_deomission.txt, 
> zhaoyiming_UT_beofre_deomission.txt
>
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2;
> We suppose a file's block Index is [0,1,2,3,4,5,6,7,8], And decommission 
> index[3,4], increase the index 6 datanode's
> pendingReplicationWithoutTargets  that make it large than 
> replicationStreamsHardLimit(we set 14). Then, After the method 
> chooseSourceDatanodes of BlockMananger, the liveBlockIndices is 
> [0,1,2,3,4,5,7,8], Block Counter is, Live:7, Decommission:2. 
> In method scheduleReconstruction of BlockManager, the additionalReplRequired 
> is 9 - 7 = 2. After Namenode choose two target Datanode, will assign a 
> erasureCode task to target datanode.
> When datanode get the task will build  targetIndices from liveBlockIndices 
> and target length. the code is blow.
> {code:java}
> // code placeholder
> targetIndices = new short[targets.length];
> private void initTargetIndices() { 
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0; hasValidTargets = false; 
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {  
> if (!bitset.get) {    
>   if (reconstructor.getBlockLen > 0) {
>        if (m < targets.length) {
>          targetIndices[m++] = (short)i;
>          hasValidTargets = true;
>         }
>       }
>     }
>  }
> {code}
> targetIndices[0]=6, and targetIndices[1] is aways 0 from initial value.
> The StripedReader is  aways create reader from first 6 index block, and is 
> [0,1,2,3,4,5]
> Use the index [0,1,2,3,4,5] to build target index[6,0] will trigger the isal 
> bug. the block index6's data is corruption(all data is zero).
> I write a unit test can stabilize repreduce.
> {code:java}
> // code placeholder
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // 
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   for (int i = 0; i < 100; i++) {
> datanodeDescriptor.incrementPendingReplicationWithoutTargets();
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List decommisionNodes = new ArrayList();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>   //assertNull(checkFile(dfs, ecFile, 9,

[jira] [Updated] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.

2019-09-11 Thread Zhao Yi Ming (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yi Ming updated HDFS-14768:

Attachment: guojh_UT_after_deomission.txt
guojh_UT_before_deomission.txt

> In some cases, erasure blocks are corruption  when they are reconstruct.
> 
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: HDFS-14768.000.patch, guojh_UT_after_deomission.txt, 
> guojh_UT_before_deomission.txt, zhaoyiming_UT_after_deomission.txt, 
> zhaoyiming_UT_beofre_deomission.txt
>
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2;
> We suppose a file's block Index is [0,1,2,3,4,5,6,7,8], And decommission 
> index[3,4], increase the index 6 datanode's
> pendingReplicationWithoutTargets  that make it large than 
> replicationStreamsHardLimit(we set 14). Then, After the method 
> chooseSourceDatanodes of BlockMananger, the liveBlockIndices is 
> [0,1,2,3,4,5,7,8], Block Counter is, Live:7, Decommission:2. 
> In method scheduleReconstruction of BlockManager, the additionalReplRequired 
> is 9 - 7 = 2. After Namenode choose two target Datanode, will assign a 
> erasureCode task to target datanode.
> When datanode get the task will build  targetIndices from liveBlockIndices 
> and target length. the code is blow.
> {code:java}
> // code placeholder
> targetIndices = new short[targets.length];
> private void initTargetIndices() { 
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0; hasValidTargets = false; 
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {  
> if (!bitset.get) {    
>   if (reconstructor.getBlockLen > 0) {
>        if (m < targets.length) {
>          targetIndices[m++] = (short)i;
>          hasValidTargets = true;
>         }
>       }
>     }
>  }
> {code}
> targetIndices[0]=6, and targetIndices[1] is aways 0 from initial value.
> The StripedReader is  aways create reader from first 6 index block, and is 
> [0,1,2,3,4,5]
> Use the index [0,1,2,3,4,5] to build target index[6,0] will trigger the isal 
> bug. the block index6's data is corruption(all data is zero).
> I write a unit test can stabilize repreduce.
> {code:java}
> // code placeholder
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // 
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   for (int i = 0; i < 100; i++) {
> datanodeDescriptor.incrementPendingReplicationWithoutTargets();
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List decommisionNodes = new ArrayList();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>   //assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs));
>   // Ensure decommissioned datanode is not automatically shutdown
>   DFSClient client = getDfsClient(cluster.getNameNode(0), conf);
>   assertEquals("All datanodes must be alive", numDNs,
>   client.datanodeReport(DatanodeReportType.LIVE).length);
>   FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes);
>   Assert.assertTrue("Checksum mismatches!",
>   fileChecksum1.equals(fileChecksum2));
>   StripedFileTestUtil.checkData(dfs, ecFile, writeBytes, decommisionNodes,
>   null, blockGroupSize);
> }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail:

[jira] [Commented] (HDFS-14836) FileIoProvider should not increase FileIoErrors metric in datanode volume metric

2019-09-11 Thread Aiphago (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928173#comment-16928173
 ] 

Aiphago commented on HDFS-14836:


Thanks [~jojochuang] for the comment.
{quote}it would be the best if we can avoid string-matching exception messages. 
"Broken pipe" and "Connection reset" are usually thrown as a SocketException. 
Would it make sense to check exception class name instead? Better, catch 
SocketException and do not call onFailure().
{quote}
if we just check exception class name instead, the range maybe too big.I means 
maybe there are some other SocketException and not match "Broken pipe" and 
"Connection reset" .So why we not keep consistency to  -HDFS-2054 .-
{quote}for socket related exceptions, you don't want to call onFailure(); 
however, those exceptions should be re-thrown too. They should not be ignored 
silently.
{quote}
it's a good suggestion,I will change later.

> FileIoProvider should not increase FileIoErrors metric in datanode volume 
> metric
> 
>
> Key: HDFS-14836
> URL: https://issues.apache.org/jira/browse/HDFS-14836
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Minor
> Attachments: HDFS-14836.patch
>
>
> I found that  FileIoErrors metric will increase in 
> BlockSender.sendPacket(),when use fileIoProvider.transferToSocketFully().But 
> in https://issues.apache.org/jira/browse/HDFS-2054 the Exception has been 
> ignore like "Broken pipe" and "Connection reset" .
> So should do a filter when fileIoProvider increase FileIoErrors count ?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.

2019-09-11 Thread Zhao Yi Ming (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yi Ming updated HDFS-14768:

Attachment: (was: zhaoyiming_changes_beofre_deomission.txt)

> In some cases, erasure blocks are corruption  when they are reconstruct.
> 
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: HDFS-14768.000.patch
>
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2;
> We suppose a file's block Index is [0,1,2,3,4,5,6,7,8], And decommission 
> index[3,4], increase the index 6 datanode's
> pendingReplicationWithoutTargets  that make it large than 
> replicationStreamsHardLimit(we set 14). Then, After the method 
> chooseSourceDatanodes of BlockMananger, the liveBlockIndices is 
> [0,1,2,3,4,5,7,8], Block Counter is, Live:7, Decommission:2. 
> In method scheduleReconstruction of BlockManager, the additionalReplRequired 
> is 9 - 7 = 2. After Namenode choose two target Datanode, will assign a 
> erasureCode task to target datanode.
> When datanode get the task will build  targetIndices from liveBlockIndices 
> and target length. the code is blow.
> {code:java}
> // code placeholder
> targetIndices = new short[targets.length];
> private void initTargetIndices() { 
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0; hasValidTargets = false; 
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {  
> if (!bitset.get) {    
>   if (reconstructor.getBlockLen > 0) {
>        if (m < targets.length) {
>          targetIndices[m++] = (short)i;
>          hasValidTargets = true;
>         }
>       }
>     }
>  }
> {code}
> targetIndices[0]=6, and targetIndices[1] is aways 0 from initial value.
> The StripedReader is  aways create reader from first 6 index block, and is 
> [0,1,2,3,4,5]
> Use the index [0,1,2,3,4,5] to build target index[6,0] will trigger the isal 
> bug. the block index6's data is corruption(all data is zero).
> I write a unit test can stabilize repreduce.
> {code:java}
> // code placeholder
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // 
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   for (int i = 0; i < 100; i++) {
> datanodeDescriptor.incrementPendingReplicationWithoutTargets();
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List decommisionNodes = new ArrayList();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>   //assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs));
>   // Ensure decommissioned datanode is not automatically shutdown
>   DFSClient client = getDfsClient(cluster.getNameNode(0), conf);
>   assertEquals("All datanodes must be alive", numDNs,
>   client.datanodeReport(DatanodeReportType.LIVE).length);
>   FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes);
>   Assert.assertTrue("Checksum mismatches!",
>   fileChecksum1.equals(fileChecksum2));
>   StripedFileTestUtil.checkData(dfs, ecFile, writeBytes, decommisionNodes,
>   null, blockGroupSize);
> }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.

2019-09-11 Thread Zhao Yi Ming (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yi Ming updated HDFS-14768:

Attachment: (was: zhaoyiming_changes_after_deomission.txt)

> In some cases, erasure blocks are corruption  when they are reconstruct.
> 
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: HDFS-14768.000.patch
>
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2;
> We suppose a file's block Index is [0,1,2,3,4,5,6,7,8], And decommission 
> index[3,4], increase the index 6 datanode's
> pendingReplicationWithoutTargets  that make it large than 
> replicationStreamsHardLimit(we set 14). Then, After the method 
> chooseSourceDatanodes of BlockMananger, the liveBlockIndices is 
> [0,1,2,3,4,5,7,8], Block Counter is, Live:7, Decommission:2. 
> In method scheduleReconstruction of BlockManager, the additionalReplRequired 
> is 9 - 7 = 2. After Namenode choose two target Datanode, will assign a 
> erasureCode task to target datanode.
> When datanode get the task will build  targetIndices from liveBlockIndices 
> and target length. the code is blow.
> {code:java}
> // code placeholder
> targetIndices = new short[targets.length];
> private void initTargetIndices() { 
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0; hasValidTargets = false; 
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {  
> if (!bitset.get) {    
>   if (reconstructor.getBlockLen > 0) {
>        if (m < targets.length) {
>          targetIndices[m++] = (short)i;
>          hasValidTargets = true;
>         }
>       }
>     }
>  }
> {code}
> targetIndices[0]=6, and targetIndices[1] is aways 0 from initial value.
> The StripedReader is  aways create reader from first 6 index block, and is 
> [0,1,2,3,4,5]
> Use the index [0,1,2,3,4,5] to build target index[6,0] will trigger the isal 
> bug. the block index6's data is corruption(all data is zero).
> I write a unit test can stabilize repreduce.
> {code:java}
> // code placeholder
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // 
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   for (int i = 0; i < 100; i++) {
> datanodeDescriptor.incrementPendingReplicationWithoutTargets();
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List decommisionNodes = new ArrayList();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>   //assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs));
>   // Ensure decommissioned datanode is not automatically shutdown
>   DFSClient client = getDfsClient(cluster.getNameNode(0), conf);
>   assertEquals("All datanodes must be alive", numDNs,
>   client.datanodeReport(DatanodeReportType.LIVE).length);
>   FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes);
>   Assert.assertTrue("Checksum mismatches!",
>   fileChecksum1.equals(fileChecksum2));
>   StripedFileTestUtil.checkData(dfs, ecFile, writeBytes, decommisionNodes,
>   null, blockGroupSize);
> }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.

2019-09-11 Thread Zhao Yi Ming (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yi Ming updated HDFS-14768:

Attachment: (was: guojh_changes_before_deomission.txt)

> In some cases, erasure blocks are corruption  when they are reconstruct.
> 
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: HDFS-14768.000.patch
>
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2;
> We suppose a file's block Index is [0,1,2,3,4,5,6,7,8], And decommission 
> index[3,4], increase the index 6 datanode's
> pendingReplicationWithoutTargets  that make it large than 
> replicationStreamsHardLimit(we set 14). Then, After the method 
> chooseSourceDatanodes of BlockMananger, the liveBlockIndices is 
> [0,1,2,3,4,5,7,8], Block Counter is, Live:7, Decommission:2. 
> In method scheduleReconstruction of BlockManager, the additionalReplRequired 
> is 9 - 7 = 2. After Namenode choose two target Datanode, will assign a 
> erasureCode task to target datanode.
> When datanode get the task will build  targetIndices from liveBlockIndices 
> and target length. the code is blow.
> {code:java}
> // code placeholder
> targetIndices = new short[targets.length];
> private void initTargetIndices() { 
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0; hasValidTargets = false; 
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {  
> if (!bitset.get) {    
>   if (reconstructor.getBlockLen > 0) {
>        if (m < targets.length) {
>          targetIndices[m++] = (short)i;
>          hasValidTargets = true;
>         }
>       }
>     }
>  }
> {code}
> targetIndices[0]=6, and targetIndices[1] is aways 0 from initial value.
> The StripedReader is  aways create reader from first 6 index block, and is 
> [0,1,2,3,4,5]
> Use the index [0,1,2,3,4,5] to build target index[6,0] will trigger the isal 
> bug. the block index6's data is corruption(all data is zero).
> I write a unit test can stabilize repreduce.
> {code:java}
> // code placeholder
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // 
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   for (int i = 0; i < 100; i++) {
> datanodeDescriptor.incrementPendingReplicationWithoutTargets();
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List decommisionNodes = new ArrayList();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>   //assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs));
>   // Ensure decommissioned datanode is not automatically shutdown
>   DFSClient client = getDfsClient(cluster.getNameNode(0), conf);
>   assertEquals("All datanodes must be alive", numDNs,
>   client.datanodeReport(DatanodeReportType.LIVE).length);
>   FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes);
>   Assert.assertTrue("Checksum mismatches!",
>   fileChecksum1.equals(fileChecksum2));
>   StripedFileTestUtil.checkData(dfs, ecFile, writeBytes, decommisionNodes,
>   null, blockGroupSize);
> }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.

2019-09-11 Thread Zhao Yi Ming (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yi Ming updated HDFS-14768:

Attachment: (was: guojh_changes_after_deomission.txt)

> In some cases, erasure blocks are corruption  when they are reconstruct.
> 
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: HDFS-14768.000.patch
>
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2;
> We suppose a file's block Index is [0,1,2,3,4,5,6,7,8], And decommission 
> index[3,4], increase the index 6 datanode's
> pendingReplicationWithoutTargets  that make it large than 
> replicationStreamsHardLimit(we set 14). Then, After the method 
> chooseSourceDatanodes of BlockMananger, the liveBlockIndices is 
> [0,1,2,3,4,5,7,8], Block Counter is, Live:7, Decommission:2. 
> In method scheduleReconstruction of BlockManager, the additionalReplRequired 
> is 9 - 7 = 2. After Namenode choose two target Datanode, will assign a 
> erasureCode task to target datanode.
> When datanode get the task will build  targetIndices from liveBlockIndices 
> and target length. the code is blow.
> {code:java}
> // code placeholder
> targetIndices = new short[targets.length];
> private void initTargetIndices() { 
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0; hasValidTargets = false; 
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {  
> if (!bitset.get) {    
>   if (reconstructor.getBlockLen > 0) {
>        if (m < targets.length) {
>          targetIndices[m++] = (short)i;
>          hasValidTargets = true;
>         }
>       }
>     }
>  }
> {code}
> targetIndices[0]=6, and targetIndices[1] is aways 0 from initial value.
> The StripedReader is  aways create reader from first 6 index block, and is 
> [0,1,2,3,4,5]
> Use the index [0,1,2,3,4,5] to build target index[6,0] will trigger the isal 
> bug. the block index6's data is corruption(all data is zero).
> I write a unit test can stabilize repreduce.
> {code:java}
> // code placeholder
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // 
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   for (int i = 0; i < 100; i++) {
> datanodeDescriptor.incrementPendingReplicationWithoutTargets();
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List decommisionNodes = new ArrayList();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>   //assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs));
>   // Ensure decommissioned datanode is not automatically shutdown
>   DFSClient client = getDfsClient(cluster.getNameNode(0), conf);
>   assertEquals("All datanodes must be alive", numDNs,
>   client.datanodeReport(DatanodeReportType.LIVE).length);
>   FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes);
>   Assert.assertTrue("Checksum mismatches!",
>   fileChecksum1.equals(fileChecksum2));
>   StripedFileTestUtil.checkData(dfs, ecFile, writeBytes, decommisionNodes,
>   null, blockGroupSize);
> }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDFS-14840) Use Java Conccurent Instead of Synchronization in BlockPoolTokenSecretManager

2019-09-11 Thread Akira Ajisaka (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-14840:
-
Summary: Use Java Conccurent Instead of Synchronization in 
BlockPoolTokenSecretManager  (was: Use Java Conccurent Instead of 
Synchrnoization in BlockPoolTokenSecretManager)

> Use Java Conccurent Instead of Synchronization in BlockPoolTokenSecretManager
> -
>
> Key: HDFS-14840
> URL: https://issues.apache.org/jira/browse/HDFS-14840
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14840.1.patch
>
>
> https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockPoolTokenSecretManager.java#L40
> Instead of synchronizing the entire class, just synchronize the collection.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.

2019-09-11 Thread Zhao Yi Ming (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928172#comment-16928172
 ] 

Zhao Yi Ming commented on HDFS-14768:
-

One more information, hope it can give help. Through [~gjhkael]'s UT make the 
decommission hang, but I check the local data folder, seems the reconstruction 
worked well. Also with my changed on the UT, the reconstruction worked well 
too. Attached the folder file list.

> In some cases, erasure blocks are corruption  when they are reconstruct.
> 
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: HDFS-14768.000.patch, 
> guojh_changes_after_deomission.txt, guojh_changes_before_deomission.txt, 
> zhaoyiming_changes_after_deomission.txt, 
> zhaoyiming_changes_beofre_deomission.txt
>
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2;
> We suppose a file's block Index is [0,1,2,3,4,5,6,7,8], And decommission 
> index[3,4], increase the index 6 datanode's
> pendingReplicationWithoutTargets  that make it large than 
> replicationStreamsHardLimit(we set 14). Then, After the method 
> chooseSourceDatanodes of BlockMananger, the liveBlockIndices is 
> [0,1,2,3,4,5,7,8], Block Counter is, Live:7, Decommission:2. 
> In method scheduleReconstruction of BlockManager, the additionalReplRequired 
> is 9 - 7 = 2. After Namenode choose two target Datanode, will assign a 
> erasureCode task to target datanode.
> When datanode get the task will build  targetIndices from liveBlockIndices 
> and target length. the code is blow.
> {code:java}
> // code placeholder
> targetIndices = new short[targets.length];
> private void initTargetIndices() { 
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0; hasValidTargets = false; 
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {  
> if (!bitset.get) {    
>   if (reconstructor.getBlockLen > 0) {
>        if (m < targets.length) {
>          targetIndices[m++] = (short)i;
>          hasValidTargets = true;
>         }
>       }
>     }
>  }
> {code}
> targetIndices[0]=6, and targetIndices[1] is aways 0 from initial value.
> The StripedReader is  aways create reader from first 6 index block, and is 
> [0,1,2,3,4,5]
> Use the index [0,1,2,3,4,5] to build target index[6,0] will trigger the isal 
> bug. the block index6's data is corruption(all data is zero).
> I write a unit test can stabilize repreduce.
> {code:java}
> // code placeholder
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // 
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   for (int i = 0; i < 100; i++) {
> datanodeDescriptor.incrementPendingReplicationWithoutTargets();
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List decommisionNodes = new ArrayList();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>   //assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs));
>   // Ensure decommissioned datanode is not automatically shutdown
>   DFSClient client = getDfsClient(cluster.getNameNode(0), conf);
>   assertEquals("All datanodes must be alive", numDNs,
>   client.datanodeReport(DatanodeReportType.LIVE).length);
>   FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes);
>   Assert.assertTrue("Checksum mismatches!",
>   fileChecksum1.equals(fileChecksum2));
>   StripedFileTestUt

[jira] [Updated] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.

2019-09-11 Thread Zhao Yi Ming (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhao Yi Ming updated HDFS-14768:

Attachment: guojh_changes_after_deomission.txt
guojh_changes_before_deomission.txt
zhaoyiming_changes_after_deomission.txt
zhaoyiming_changes_beofre_deomission.txt

> In some cases, erasure blocks are corruption  when they are reconstruct.
> 
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: HDFS-14768.000.patch, 
> guojh_changes_after_deomission.txt, guojh_changes_before_deomission.txt, 
> zhaoyiming_changes_after_deomission.txt, 
> zhaoyiming_changes_beofre_deomission.txt
>
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2;
> We suppose a file's block Index is [0,1,2,3,4,5,6,7,8], And decommission 
> index[3,4], increase the index 6 datanode's
> pendingReplicationWithoutTargets  that make it large than 
> replicationStreamsHardLimit(we set 14). Then, After the method 
> chooseSourceDatanodes of BlockMananger, the liveBlockIndices is 
> [0,1,2,3,4,5,7,8], Block Counter is, Live:7, Decommission:2. 
> In method scheduleReconstruction of BlockManager, the additionalReplRequired 
> is 9 - 7 = 2. After Namenode choose two target Datanode, will assign a 
> erasureCode task to target datanode.
> When datanode get the task will build  targetIndices from liveBlockIndices 
> and target length. the code is blow.
> {code:java}
> // code placeholder
> targetIndices = new short[targets.length];
> private void initTargetIndices() { 
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0; hasValidTargets = false; 
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {  
> if (!bitset.get) {    
>   if (reconstructor.getBlockLen > 0) {
>        if (m < targets.length) {
>          targetIndices[m++] = (short)i;
>          hasValidTargets = true;
>         }
>       }
>     }
>  }
> {code}
> targetIndices[0]=6, and targetIndices[1] is aways 0 from initial value.
> The StripedReader is  aways create reader from first 6 index block, and is 
> [0,1,2,3,4,5]
> Use the index [0,1,2,3,4,5] to build target index[6,0] will trigger the isal 
> bug. the block index6's data is corruption(all data is zero).
> I write a unit test can stabilize repreduce.
> {code:java}
> // code placeholder
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // 
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   for (int i = 0; i < 100; i++) {
> datanodeDescriptor.incrementPendingReplicationWithoutTargets();
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List decommisionNodes = new ArrayList();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>   //assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs));
>   // Ensure decommissioned datanode is not automatically shutdown
>   DFSClient client = getDfsClient(cluster.getNameNode(0), conf);
>   assertEquals("All datanodes must be alive", numDNs,
>   client.datanodeReport(DatanodeReportType.LIVE).length);
>   FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes);
>   Assert.assertTrue("Checksum mismatches!",
>   fileChecksum1.equals(fileChecksum2));
>   StripedFileTestUtil.checkData(dfs, ecFile, writeBytes, decommisionNodes,
>   null, blockGroupSize);
> }
> {code}
>  



--
This

[jira] [Commented] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.

2019-09-11 Thread Zhao Yi Ming (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928167#comment-16928167
 ] 

Zhao Yi Ming commented on HDFS-14768:
-

[~gjhkael] Thanks for your update!  Here is one question, in your UT if move 
the decommisionNodes() before getLastLocatedBlock(), will it effect the UT 
result? Because I did this changes, the UT can run as normal.

move following code
{code:java}
// code placeholder
  int[] decommNodeIndex = {3, 4};
  final List decommisionNodes = new 
ArrayList();
  // add the node which will be decommissioning
  decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
  decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
  decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
  assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
{code}
to
{code:java}
// code placeholder
  LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 
0)
      .get(0);
  DatanodeInfo[] dnLocs = lb.getLocations();

=> Here

  LocatedStripedBlock lastBlock =
      (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
  DatanodeInfo[] storageInfos = lastBlock.getLocations();
{code}
 

 

 

> In some cases, erasure blocks are corruption  when they are reconstruct.
> 
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: HDFS-14768.000.patch
>
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2;
> We suppose a file's block Index is [0,1,2,3,4,5,6,7,8], And decommission 
> index[3,4], increase the index 6 datanode's
> pendingReplicationWithoutTargets  that make it large than 
> replicationStreamsHardLimit(we set 14). Then, After the method 
> chooseSourceDatanodes of BlockMananger, the liveBlockIndices is 
> [0,1,2,3,4,5,7,8], Block Counter is, Live:7, Decommission:2. 
> In method scheduleReconstruction of BlockManager, the additionalReplRequired 
> is 9 - 7 = 2. After Namenode choose two target Datanode, will assign a 
> erasureCode task to target datanode.
> When datanode get the task will build  targetIndices from liveBlockIndices 
> and target length. the code is blow.
> {code:java}
> // code placeholder
> targetIndices = new short[targets.length];
> private void initTargetIndices() { 
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0; hasValidTargets = false; 
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {  
> if (!bitset.get) {    
>   if (reconstructor.getBlockLen > 0) {
>        if (m < targets.length) {
>          targetIndices[m++] = (short)i;
>          hasValidTargets = true;
>         }
>       }
>     }
>  }
> {code}
> targetIndices[0]=6, and targetIndices[1] is aways 0 from initial value.
> The StripedReader is  aways create reader from first 6 index block, and is 
> [0,1,2,3,4,5]
> Use the index [0,1,2,3,4,5] to build target index[6,0] will trigger the isal 
> bug. the block index6's data is corruption(all data is zero).
> I write a unit test can stabilize repreduce.
> {code:java}
> // code placeholder
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // 
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   for (int i = 0; i < 100; i++) {
> datanodeDescriptor.incrementPendingReplicationWithoutTargets();
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List decommisionNodes = new ArrayList();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[deco

[jira] [Comment Edited] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.

2019-09-11 Thread Zhao Yi Ming (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928167#comment-16928167
 ] 

Zhao Yi Ming edited comment on HDFS-14768 at 9/12/19 3:34 AM:
--

[~gjhkael] Thanks for your update!  Here is one question, in your UT if move 
the decommisionNodes() before getLastLocatedBlock(), will it effect the UT 
result? Because I did this changes, the UT can run as normal.

move following code
{code:java}
// code placeholder
  int[] decommNodeIndex = {3, 4};
  final List decommisionNodes = new 
ArrayList();
  // add the node which will be decommissioning
  decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
  decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
  decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
  assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
{code}
to
{code:java}
// code placeholder
  LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 
0)
      .get(0);
  DatanodeInfo[] dnLocs = lb.getLocations();

=> Here

  LocatedStripedBlock lastBlock =
      (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
  DatanodeInfo[] storageInfos = lastBlock.getLocations();
{code}


was (Author: zhaoyim):
[~gjhkael] Thanks for your update!  Here is one question, in your UT if move 
the decommisionNodes() before getLastLocatedBlock(), will it effect the UT 
result? Because I did this changes, the UT can run as normal.

move following code
{code:java}
// code placeholder
  int[] decommNodeIndex = {3, 4};
  final List decommisionNodes = new 
ArrayList();
  // add the node which will be decommissioning
  decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
  decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
  decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
  assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
{code}
to
{code:java}
// code placeholder
  LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 
0)
      .get(0);
  DatanodeInfo[] dnLocs = lb.getLocations();

=> Here

  LocatedStripedBlock lastBlock =
      (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
  DatanodeInfo[] storageInfos = lastBlock.getLocations();
{code}
 

 

 

> In some cases, erasure blocks are corruption  when they are reconstruct.
> 
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: HDFS-14768.000.patch
>
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2;
> We suppose a file's block Index is [0,1,2,3,4,5,6,7,8], And decommission 
> index[3,4], increase the index 6 datanode's
> pendingReplicationWithoutTargets  that make it large than 
> replicationStreamsHardLimit(we set 14). Then, After the method 
> chooseSourceDatanodes of BlockMananger, the liveBlockIndices is 
> [0,1,2,3,4,5,7,8], Block Counter is, Live:7, Decommission:2. 
> In method scheduleReconstruction of BlockManager, the additionalReplRequired 
> is 9 - 7 = 2. After Namenode choose two target Datanode, will assign a 
> erasureCode task to target datanode.
> When datanode get the task will build  targetIndices from liveBlockIndices 
> and target length. the code is blow.
> {code:java}
> // code placeholder
> targetIndices = new short[targets.length];
> private void initTargetIndices() { 
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0; hasValidTargets = false; 
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {  
> if (!bitset.get) {    
>   if (reconstructor.getBlockLen > 0) {
>        if (m < targets.length) {
>          targetIndices[m++] = (short)i;
>          hasValidTargets = true;
>         }
>       }
>     }
>  }
> {code}
> targetIndices[0]=6, and targetIndices[1] is aways 0 from initial value.
> The StripedReader is  aways create reader from first 6 index block, and is 
> [0,1,2,3,4,5]
> Use the index [0,1,2,3,4,5] to build target index[6,0] will trigger the isal 
> bug. the block index6's data is corruption(all data is zero).
> I write a unit test can stabilize repreduce.
> {code:java}
> // code placeholder
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * d

[jira] [Commented] (HDFS-14609) RBF: Security should use common AuthenticationFilter

2019-09-11 Thread Hadoop QA (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928163#comment-16928163
 ] 

Hadoop QA commented on HDFS-14609:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  7s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 40s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 
29s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 71m 54s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-14609 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12980140/HDFS-14609.006.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d8bb824d1b38 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f537410 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27848/testReport/ |
| Max. process+thread count | 1585 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: 
hadoop-hdfs-project/hadoop-hdfs-rbf |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27848/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> RBF: Security should use common AuthenticationFilter
> 
>
> Key: HDFS-14609
>

[jira] [Commented] (HDFS-14832) RBF : Add Icon for ReadOnly False

2019-09-11 Thread Takanobu Asanuma (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928161#comment-16928161
 ] 

Takanobu Asanuma commented on HDFS-14832:
-

Thanks for working on this, [~hemanthboyina], and thanks for pinging me, 
[~elgoiri].

If we have both of read-only and read-write icons, we may need 
{{federationhealth-mounttable-legend}} section like other pages.

> RBF : Add Icon for ReadOnly False
> -
>
> Key: HDFS-14832
> URL: https://issues.apache.org/jira/browse/HDFS-14832
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Minor
>
> In Router Web UI for Mount Table information , add icon for read only state 
> false 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14564) Add libhdfs APIs for readFully; add readFully to ByteBufferPositionedReadable

2019-09-11 Thread Hadoop QA (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928160#comment-16928160
 ] 

Hadoop QA commented on HDFS-14564:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
1s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
6s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m 47s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
41s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  0m 
30s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
30s{color} | {color:blue} branch/hadoop-hdfs-project/hadoop-hdfs-native-client 
no findbugs output file (findbugsXml.xml) {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
24s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 15m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
25s{color} | {color:green} root: The patch generated 0 new + 110 unchanged - 1 
fixed = 110 total (was 111) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 26s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
41s{color} | {color:green} the patch passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
35s{color} | {color:blue} hadoop-hdfs-project/hadoop-hdfs-native-client has no 
data from findbugs {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
55s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
12s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 85m 51s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  3m 11s{color} 
| {color:red} hadoop-hdfs-native-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
57s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} |

[jira] [Work logged] (HDDS-1879) Support multiple excluded scopes when choosing datanodes in NetworkTopology

2019-09-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-1879?focusedWorklogId=311132&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-311132
 ]

ASF GitHub Bot logged work on HDDS-1879:


Author: ASF GitHub Bot
Created on: 12/Sep/19 03:02
Start Date: 12/Sep/19 03:02
Worklog Time Spent: 10m 
  Work Description: ChenSammi commented on issue #1194: HDDS-1879.  Support 
multiple excluded scopes when choosing datanodes in NetworkTopology
URL: https://github.com/apache/hadoop/pull/1194#issuecomment-530644666
 
 
   Thanks @xiaoyuyao  for the review.   A new patch to fix the check style 
issue.  The failed UTs are not related.  Will commit after the Yetus build. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 311132)
Time Spent: 4h  (was: 3h 50m)

> Support multiple excluded scopes when choosing datanodes in NetworkTopology
> ---
>
> Key: HDDS-1879
> URL: https://issues.apache.org/jira/browse/HDDS-1879
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14843) Double Synchronization in BlockReportLeaseManager

2019-09-11 Thread Supratim Deka (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928155#comment-16928155
 ] 

Supratim Deka commented on HDFS-14843:
--

+1
Thanks for the patch [~belugabehr], looks good to me.

> Double Synchronization in BlockReportLeaseManager
> -
>
> Key: HDFS-14843
> URL: https://issues.apache.org/jira/browse/HDFS-14843
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14843.1.patch
>
>
> {code:java|title=BlockReportLeaseManager.java}
>   private synchronized long getNextId() {
> long id;
> do {
>   id = nextId++;
> } while (id == 0);
> return id;
>   }
> {code}
> This is a private method and is synchronized, however, it is only be accessed 
> from an already-synchronized method.  No need to double-synchronize.
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java#L183-L189
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java#L227



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.

2019-09-11 Thread guojh (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928146#comment-16928146
 ] 

guojh commented on HDFS-14768:
--

[~zhaoyim] My UT is not good, you need to check the block manually, I will 
update a new UT to check the block corruption  automatic.

> In some cases, erasure blocks are corruption  when they are reconstruct.
> 
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: HDFS-14768.000.patch
>
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2;
> We suppose a file's block Index is [0,1,2,3,4,5,6,7,8], And decommission 
> index[3,4], increase the index 6 datanode's
> pendingReplicationWithoutTargets  that make it large than 
> replicationStreamsHardLimit(we set 14). Then, After the method 
> chooseSourceDatanodes of BlockMananger, the liveBlockIndices is 
> [0,1,2,3,4,5,7,8], Block Counter is, Live:7, Decommission:2. 
> In method scheduleReconstruction of BlockManager, the additionalReplRequired 
> is 9 - 7 = 2. After Namenode choose two target Datanode, will assign a 
> erasureCode task to target datanode.
> When datanode get the task will build  targetIndices from liveBlockIndices 
> and target length. the code is blow.
> {code:java}
> // code placeholder
> targetIndices = new short[targets.length];
> private void initTargetIndices() { 
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0; hasValidTargets = false; 
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {  
> if (!bitset.get) {    
>   if (reconstructor.getBlockLen > 0) {
>        if (m < targets.length) {
>          targetIndices[m++] = (short)i;
>          hasValidTargets = true;
>         }
>       }
>     }
>  }
> {code}
> targetIndices[0]=6, and targetIndices[1] is aways 0 from initial value.
> The StripedReader is  aways create reader from first 6 index block, and is 
> [0,1,2,3,4,5]
> Use the index [0,1,2,3,4,5] to build target index[6,0] will trigger the isal 
> bug. the block index6's data is corruption(all data is zero).
> I write a unit test can stabilize repreduce.
> {code:java}
> // code placeholder
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // 
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   for (int i = 0; i < 100; i++) {
> datanodeDescriptor.incrementPendingReplicationWithoutTargets();
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List decommisionNodes = new ArrayList();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>   //assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs));
>   // Ensure decommissioned datanode is not automatically shutdown
>   DFSClient client = getDfsClient(cluster.getNameNode(0), conf);
>   assertEquals("All datanodes must be alive", numDNs,
>   client.datanodeReport(DatanodeReportType.LIVE).length);
>   FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes);
>   Assert.assertTrue("Checksum mismatches!",
>   fileChecksum1.equals(fileChecksum2));
>   StripedFileTestUtil.checkData(dfs, ecFile, writeBytes, decommisionNodes,
>   null, blockGroupSize);
> }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional

[jira] [Commented] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.

2019-09-11 Thread guojh (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928145#comment-16928145
 ] 

guojh commented on HDFS-14768:
--

[~surendrasingh] Sorry about my UT. I will update a new UT to check the block 
corruption.

> In some cases, erasure blocks are corruption  when they are reconstruct.
> 
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: HDFS-14768.000.patch
>
>
> Policy is RS-6-3-1024K, version is hadoop 3.0.2;
> We suppose a file's block Index is [0,1,2,3,4,5,6,7,8], And decommission 
> index[3,4], increase the index 6 datanode's
> pendingReplicationWithoutTargets  that make it large than 
> replicationStreamsHardLimit(we set 14). Then, After the method 
> chooseSourceDatanodes of BlockMananger, the liveBlockIndices is 
> [0,1,2,3,4,5,7,8], Block Counter is, Live:7, Decommission:2. 
> In method scheduleReconstruction of BlockManager, the additionalReplRequired 
> is 9 - 7 = 2. After Namenode choose two target Datanode, will assign a 
> erasureCode task to target datanode.
> When datanode get the task will build  targetIndices from liveBlockIndices 
> and target length. the code is blow.
> {code:java}
> // code placeholder
> targetIndices = new short[targets.length];
> private void initTargetIndices() { 
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0; hasValidTargets = false; 
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {  
> if (!bitset.get) {    
>   if (reconstructor.getBlockLen > 0) {
>        if (m < targets.length) {
>          targetIndices[m++] = (short)i;
>          hasValidTargets = true;
>         }
>       }
>     }
>  }
> {code}
> targetIndices[0]=6, and targetIndices[1] is aways 0 from initial value.
> The StripedReader is  aways create reader from first 6 index block, and is 
> [0,1,2,3,4,5]
> Use the index [0,1,2,3,4,5] to build target index[6,0] will trigger the isal 
> bug. the block index6's data is corruption(all data is zero).
> I write a unit test can stabilize repreduce.
> {code:java}
> // code placeholder
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // 
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   for (int i = 0; i < 100; i++) {
> datanodeDescriptor.incrementPendingReplicationWithoutTargets();
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List decommisionNodes = new ArrayList();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>   //assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs));
>   // Ensure decommissioned datanode is not automatically shutdown
>   DFSClient client = getDfsClient(cluster.getNameNode(0), conf);
>   assertEquals("All datanodes must be alive", numDNs,
>   client.datanodeReport(DatanodeReportType.LIVE).length);
>   FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes);
>   Assert.assertTrue("Checksum mismatches!",
>   fileChecksum1.equals(fileChecksum2));
>   StripedFileTestUtil.checkData(dfs, ecFile, writeBytes, decommisionNodes,
>   null, blockGroupSize);
> }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.

[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-09-11 Thread angerszhu (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928141#comment-16928141
 ] 

angerszhu commented on HDFS-14437:
--

[~daryn] 

In current hadoop truck branch, still have this bug.

> Exception happened when   rollEditLog expects empty 
> EditsDoubleBuffer.bufCurrent  but not
> -
>
> Key: HDFS-14437
> URL: https://issues.apache.org/jira/browse/HDFS-14437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode, qjm
>Reporter: angerszhu
>Priority: Major
>
> For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 
> , I have sort the process of write and flush EditLog and some important 
> function, I found the in the class  FSEditLog class, the close() function 
> will call such process like below:
>  
> {code:java}
> waitForSyncToFinish();
> endCurrentLogSegment(true);{code}
> since we have gain the object lock in the function close(), so when  
> waitForSyncToFish() method return, it mean all logSync job has done and all 
> data in bufReady has been flushed out, and since current thread has the lock 
> of this object, when call endCurrentLogSegment(), no other thread will gain 
> the lock so they can't write new editlog into currentBuf.
> But when we don't call waitForSyncToFish() before endCurrentLogSegment(), 
> there may be some autoScheduled logSync()'s flush process is doing, since 
> this process don't need
> synchronization since it has mention in the comment of logSync() method :
>  
> {code:java}
> /**
>  * Sync all modifications done by this thread.
>  *
>  * The internal concurrency design of this class is as follows:
>  *   - Log items are written synchronized into an in-memory buffer,
>  * and each assigned a transaction ID.
>  *   - When a thread (client) would like to sync all of its edits, logSync()
>  * uses a ThreadLocal transaction ID to determine what edit number must
>  * be synced to.
>  *   - The isSyncRunning volatile boolean tracks whether a sync is currently
>  * under progress.
>  *
>  * The data is double-buffered within each edit log implementation so that
>  * in-memory writing can occur in parallel with the on-disk writing.
>  *
>  * Each sync occurs in three steps:
>  *   1. synchronized, it swaps the double buffer and sets the isSyncRunning
>  *  flag.
>  *   2. unsynchronized, it flushes the data to storage
>  *   3. synchronized, it resets the flag and notifies anyone waiting on the
>  *  sync.
>  *
>  * The lack of synchronization on step 2 allows other threads to continue
>  * to write into the memory buffer while the sync is in progress.
>  * Because this step is unsynchronized, actions that need to avoid
>  * concurrency with sync() should be synchronized and also call
>  * waitForSyncToFinish() before assuming they are running alone.
>  */
> public void logSync() {
>   long syncStart = 0;
>   // Fetch the transactionId of this thread. 
>   long mytxid = myTransactionId.get().txid;
>   
>   boolean sync = false;
>   try {
> EditLogOutputStream logStream = null;
> synchronized (this) {
>   try {
> printStatistics(false);
> // if somebody is already syncing, then wait
> while (mytxid > synctxid && isSyncRunning) {
>   try {
> wait(1000);
>   } catch (InterruptedException ie) {
>   }
> }
> //
> // If this transaction was already flushed, then nothing to do
> //
> if (mytxid <= synctxid) {
>   numTransactionsBatchedInSync++;
>   if (metrics != null) {
> // Metrics is non-null only when used inside name node
> metrics.incrTransactionsBatchedInSync();
>   }
>   return;
> }
>
> // now, this thread will do the sync
> syncStart = txid;
> isSyncRunning = true;
> sync = true;
> // swap buffers
> try {
>   if (journalSet.isEmpty()) {
> throw new IOException("No journals available to flush");
>   }
>   editLogStream.setReadyToFlush();
> } catch (IOException e) {
>   final String msg =
>   "Could not sync enough journals to persistent storage " +
>   "due to " + e.getMessage() + ". " +
>   "Unsynced transactions: " + (txid - synctxid);
>   LOG.fatal(msg, new Exception());
>   synchronized(journalSetLock) {
> IOUtils.cleanup(LOG, journalSet);
>   }
>   terminate(1, msg);
> }
>   } finally {
> // Prevent RuntimeException from blocking other log edit write 
> doneWithAutoSyncScheduling();
>   }
>   //editLogStream may become null,
>   //

[jira] [Updated] (HDFS-14609) RBF: Security should use common AuthenticationFilter

2019-09-11 Thread Chen Zhang (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Zhang updated HDFS-14609:
--
Attachment: HDFS-14609.006.patch

> RBF: Security should use common AuthenticationFilter
> 
>
> Key: HDFS-14609
> URL: https://issues.apache.org/jira/browse/HDFS-14609
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: CR Hota
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14609.001.patch, HDFS-14609.002.patch, 
> HDFS-14609.003.patch, HDFS-14609.004.patch, HDFS-14609.005.patch, 
> HDFS-14609.006.patch
>
>
> We worked on router based federation security as part of HDFS-13532. We kept 
> it compatible with the way namenode works. However with HADOOP-16314 and 
> HDFS-16354 in trunk, auth filters seems to have been changed causing tests to 
> fail.
> Changes are needed appropriately in RBF, mainly fixing broken tests.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14609) RBF: Security should use common AuthenticationFilter

2019-09-11 Thread Chen Zhang (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928138#comment-16928138
 ] 

Chen Zhang commented on HDFS-14609:
---

Thanks [~eyang] for your response, upload patch v6 to fix checkstyle error.

> RBF: Security should use common AuthenticationFilter
> 
>
> Key: HDFS-14609
> URL: https://issues.apache.org/jira/browse/HDFS-14609
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: CR Hota
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14609.001.patch, HDFS-14609.002.patch, 
> HDFS-14609.003.patch, HDFS-14609.004.patch, HDFS-14609.005.patch, 
> HDFS-14609.006.patch
>
>
> We worked on router based federation security as part of HDFS-13532. We kept 
> it compatible with the way namenode works. However with HADOOP-16314 and 
> HDFS-16354 in trunk, auth filters seems to have been changed causing tests to 
> fail.
> Changes are needed appropriately in RBF, mainly fixing broken tests.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14811) RBF: TestRouterRpc#testErasureCoding is flaky

2019-09-11 Thread Chen Zhang (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928132#comment-16928132
 ] 

Chen Zhang commented on HDFS-14811:
---

Actually, I'm considering another improvement: Adding a configurable threshold 
of counting overloaded(e.g. 50), if the xceiverCount is lower than that 
threshold, then that DN will not count as an overloaded node. Anyway, a DN with 
xceiverCount 2 is treated as an overloaded node is not reasonable.

Do you think it's a better solution? [~elgoiri] [~ayushtkn]

> RBF: TestRouterRpc#testErasureCoding is flaky
> -
>
> Key: HDFS-14811
> URL: https://issues.apache.org/jira/browse/HDFS-14811
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14811.001.patch, HDFS-14811.002.patch
>
>
> The Failed reason:
> {code:java}
> 2019-09-01 18:19:20,940 [IPC Server handler 5 on default port 53140] INFO  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseRandom(838)) - [
> Node /default-rack/127.0.0.1:53148 [
> ]
> Node /default-rack/127.0.0.1:53161 [
> ]
> Node /default-rack/127.0.0.1:53157 [
>   Datanode 127.0.0.1:53157 is not chosen since the node is too busy (load: 3 
> > 2.6665).
> Node /default-rack/127.0.0.1:53143 [
> ]
> Node /default-rack/127.0.0.1:53165 [
> ]
> 2019-09-01 18:19:20,940 [IPC Server handler 5 on default port 53140] INFO  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseRandom(846)) - Not enough replicas 
> was chosen. Reason: {NODE_TOO_BUSY=1}
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseTarget(449)) - Failed to place enough 
> replicas, still in need of 1 to reach 6 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) 
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN  
> protocol.BlockStoragePolicy (BlockStoragePolicy.java:chooseStorageTypes(161)) 
> - Failed to place enough replicas: expected size is 1 but only 0 storage 
> types can be selected (replication=6, selected=[], unavailable=[DISK], 
> removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseTarget(449)) - Failed to place enough 
> replicas, still in need of 1 to reach 6 (unavailableStorages=[DISK], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All 
> required storage types are unavailable:  unavailableStorages=[DISK], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] INFO  
> ipc.Server (Server.java:logException(2982)) - IPC Server handler 5 on default 
> port 53140, call Call#1270 Retry#0 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 127.0.0.1:53202
> java.io.IOException: File /testec/testfile2 could only be written to 5 of the 
> 6 required nodes for RS-6-3-1024k. There are 6 datanode(s) running and 6 
> node(s) are excluded in this operation.
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2815)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:893)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:574)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1001)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:929)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserG

[jira] [Comment Edited] (HDFS-14811) RBF: TestRouterRpc#testErasureCoding is flaky

2019-09-11 Thread Chen Zhang (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928128#comment-16928128
 ] 

Chen Zhang edited comment on HDFS-14811 at 9/12/19 1:39 AM:


{quote}Let's see if we can fix the count of active threads.
{quote}
[~elgoiri]，HDFS-12288 won't help in this case. when some client writing data on 
1 DN, it will start 2 threads({{DatanodeXceiver}} and {{PacketResponder}}), 
after the patch of HDFS-12288, {{DatanodeXceiverServer}} thread will not be 
included in {{xceiverCount}} of heartbeat. In this case, there will be 5 DN 
with {{xceiverCount}} equals to 0 and 1 DN equals to 2(which is overloaded).


was (Author: zhangchen):
{quote}Let's see if we can fix the count of active threads.
{quote}
[~elgoiri]，HDFS-12288 won't help in this case. when some client writing data on 
1 DN, it will start 2 threads(\{{DatanodeXceiver}} and {{PacketResponder}}), 
after the patch of HDFS-12288, {{DatanodeXceiverServer}} will not included in 
{{xceiverCount}} of heartbeat. In this case, there will be 5 DN with 
{{xceiverCount}} equals to 0 and 1 DN equals to 2(which is overloaded).

> RBF: TestRouterRpc#testErasureCoding is flaky
> -
>
> Key: HDFS-14811
> URL: https://issues.apache.org/jira/browse/HDFS-14811
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14811.001.patch, HDFS-14811.002.patch
>
>
> The Failed reason:
> {code:java}
> 2019-09-01 18:19:20,940 [IPC Server handler 5 on default port 53140] INFO  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseRandom(838)) - [
> Node /default-rack/127.0.0.1:53148 [
> ]
> Node /default-rack/127.0.0.1:53161 [
> ]
> Node /default-rack/127.0.0.1:53157 [
>   Datanode 127.0.0.1:53157 is not chosen since the node is too busy (load: 3 
> > 2.6665).
> Node /default-rack/127.0.0.1:53143 [
> ]
> Node /default-rack/127.0.0.1:53165 [
> ]
> 2019-09-01 18:19:20,940 [IPC Server handler 5 on default port 53140] INFO  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseRandom(846)) - Not enough replicas 
> was chosen. Reason: {NODE_TOO_BUSY=1}
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseTarget(449)) - Failed to place enough 
> replicas, still in need of 1 to reach 6 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) 
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN  
> protocol.BlockStoragePolicy (BlockStoragePolicy.java:chooseStorageTypes(161)) 
> - Failed to place enough replicas: expected size is 1 but only 0 storage 
> types can be selected (replication=6, selected=[], unavailable=[DISK], 
> removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseTarget(449)) - Failed to place enough 
> replicas, still in need of 1 to reach 6 (unavailableStorages=[DISK], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All 
> required storage types are unavailable:  unavailableStorages=[DISK], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] INFO  
> ipc.Server (Server.java:logException(2982)) - IPC Server handler 5 on default 
> port 53140, call Call#1270 Retry#0 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 127.0.0.1:53202
> java.io.IOException: File /testec/testfile2 could only be written to 5 of the 
> 6 required nodes for RS-6-3-1024k. There are 6 datanode(s) running and 6 
> node(s) are excluded in this operation.
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2815)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:893)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:574)
>   at 
> org.apache.hadoop.hdf

[jira] [Commented] (HDFS-14811) RBF: TestRouterRpc#testErasureCoding is flaky

2019-09-11 Thread Chen Zhang (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928128#comment-16928128
 ] 

Chen Zhang commented on HDFS-14811:
---

{quote}Let's see if we can fix the count of active threads.
{quote}
[~elgoiri]，HDFS-12288 won't help in this case. when some client writing data on 
1 DN, it will start 2 threads(\{{DatanodeXceiver}} and {{PacketResponder}}), 
after the patch of HDFS-12288, {{DatanodeXceiverServer}} will not included in 
{{xceiverCount}} of heartbeat. In this case, there will be 5 DN with 
{{xceiverCount}} equals to 0 and 1 DN equals to 2(which is overloaded).

> RBF: TestRouterRpc#testErasureCoding is flaky
> -
>
> Key: HDFS-14811
> URL: https://issues.apache.org/jira/browse/HDFS-14811
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14811.001.patch, HDFS-14811.002.patch
>
>
> The Failed reason:
> {code:java}
> 2019-09-01 18:19:20,940 [IPC Server handler 5 on default port 53140] INFO  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseRandom(838)) - [
> Node /default-rack/127.0.0.1:53148 [
> ]
> Node /default-rack/127.0.0.1:53161 [
> ]
> Node /default-rack/127.0.0.1:53157 [
>   Datanode 127.0.0.1:53157 is not chosen since the node is too busy (load: 3 
> > 2.6665).
> Node /default-rack/127.0.0.1:53143 [
> ]
> Node /default-rack/127.0.0.1:53165 [
> ]
> 2019-09-01 18:19:20,940 [IPC Server handler 5 on default port 53140] INFO  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseRandom(846)) - Not enough replicas 
> was chosen. Reason: {NODE_TOO_BUSY=1}
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseTarget(449)) - Failed to place enough 
> replicas, still in need of 1 to reach 6 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) 
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN  
> protocol.BlockStoragePolicy (BlockStoragePolicy.java:chooseStorageTypes(161)) 
> - Failed to place enough replicas: expected size is 1 but only 0 storage 
> types can be selected (replication=6, selected=[], unavailable=[DISK], 
> removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseTarget(449)) - Failed to place enough 
> replicas, still in need of 1 to reach 6 (unavailableStorages=[DISK], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All 
> required storage types are unavailable:  unavailableStorages=[DISK], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] INFO  
> ipc.Server (Server.java:logException(2982)) - IPC Server handler 5 on default 
> port 53140, call Call#1270 Retry#0 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 127.0.0.1:53202
> java.io.IOException: File /testec/testfile2 could only be written to 5 of the 
> 6 required nodes for RS-6-3-1024k. There are 6 datanode(s) running and 6 
> node(s) are excluded in this operation.
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2815)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:893)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:574)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1001)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:929)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.secur

[jira] [Commented] (HDFS-12288) Fix DataNode's xceiver count calculation

2019-09-11 Thread Chen Zhang (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928125#comment-16928125
 ] 

Chen Zhang commented on HDFS-12288:
---

Thanks [~elgoiri] for your response, I'll add more tests to make the patch 
complete.

> Fix DataNode's xceiver count calculation
> 
>
> Key: HDFS-12288
> URL: https://issues.apache.org/jira/browse/HDFS-12288
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Reporter: Lukas Majercak
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-12288.001.patch, HDFS-12288.002.patch, 
> HDFS-12288.003.patch
>
>
> The problem with the ThreadGroup.activeCount() method is that the method is 
> only a very rough estimate, and in reality returns the total number of 
> threads in the thread group as opposed to the threads actually running.
> In some DNs, we saw this to return 50~ for a long time, even though the 
> actual number of DataXceiver threads was next to none.
> This is a big issue as we use the xceiverCount to make decisions on the NN 
> for choosing replication source DN or returning DNs to clients for R/W.
> The plan is to reuse the DataNodeMetrics.dataNodeActiveXceiversCount value 
> which only accounts for actual number of DataXcevier threads currently 
> running and thus represents the load on the DN much better.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDDS-2106) Avoid usage of hadoop projects as parent of hdds/ozone

2019-09-11 Thread Elek, Marton (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDDS-2106:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Avoid usage of hadoop projects as parent of hdds/ozone
> --
>
> Key: HDDS-2106
> URL: https://issues.apache.org/jira/browse/HDDS-2106
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Ozone uses hadoop as a dependency. The dependency defined on multiple level:
>  1. the hadoop artifacts are defined in the  sections
>  2. both hadoop-ozone and hadoop-hdds projects uses "hadoop-project" as the 
> parent
> As we already have a slightly different assembly process it could be more 
> resilient to use a dedicated parent project instead of the hadoop one. With 
> this approach it will be easier to upgrade the versions as we don't need to 
> be careful about the pom contents only about the used dependencies.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDDS-2089) Add CLI createPipeline

2019-09-11 Thread Xiaoyu Yao (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDDS-2089:
-
Status: Patch Available  (was: Open)

> Add CLI createPipeline
> --
>
> Key: HDDS-2089
> URL: https://issues.apache.org/jira/browse/HDDS-2089
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone CLI
>Affects Versions: 0.5.0
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add a SCMCLI to create pipeline for ozone.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDDS-2106) Avoid usage of hadoop projects as parent of hdds/ozone

2019-09-11 Thread Hudson (Jira)



[ 
https://issues.apache.org/jira/browse/HDDS-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928122#comment-16928122
 ] 

Hudson commented on HDDS-2106:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17279 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17279/])
HDDS-2106. Avoid usage of hadoop projects as parent of hdds/ozone (elek: rev 
f537410563e3966581442acc77f7d9f7fd95e3e5)
* (edit) pom.ozone.xml
* (edit) hadoop-ozone/pom.xml
* (edit) hadoop-hdds/pom.xml


> Avoid usage of hadoop projects as parent of hdds/ozone
> --
>
> Key: HDDS-2106
> URL: https://issues.apache.org/jira/browse/HDDS-2106
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Ozone uses hadoop as a dependency. The dependency defined on multiple level:
>  1. the hadoop artifacts are defined in the  sections
>  2. both hadoop-ozone and hadoop-hdds projects uses "hadoop-project" as the 
> parent
> As we already have a slightly different assembly process it could be more 
> resilient to use a dedicated parent project instead of the hadoop one. With 
> this approach it will be easier to upgrade the versions as we don't need to 
> be careful about the pom contents only about the used dependencies.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDDS-2106) Avoid usage of hadoop projects as parent of hdds/ozone

2019-09-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2106?focusedWorklogId=311108&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-311108
 ]

ASF GitHub Bot logged work on HDDS-2106:


Author: ASF GitHub Bot
Created on: 12/Sep/19 01:23
Start Date: 12/Sep/19 01:23
Worklog Time Spent: 10m 
  Work Description: elek commented on pull request #1423: HDDS-2106. Avoid 
usage of hadoop projects as parent of hdds/ozone
URL: https://github.com/apache/hadoop/pull/1423
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 311108)
Time Spent: 0.5h  (was: 20m)

> Avoid usage of hadoop projects as parent of hdds/ozone
> --
>
> Key: HDDS-2106
> URL: https://issues.apache.org/jira/browse/HDDS-2106
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Ozone uses hadoop as a dependency. The dependency defined on multiple level:
>  1. the hadoop artifacts are defined in the  sections
>  2. both hadoop-ozone and hadoop-hdds projects uses "hadoop-project" as the 
> parent
> As we already have a slightly different assembly process it could be more 
> resilient to use a dedicated parent project instead of the hadoop one. With 
> this approach it will be easier to upgrade the versions as we don't need to 
> be careful about the pom contents only about the used dependencies.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14845) Request is a replay (34) error in httpfs

2019-09-11 Thread Akira Ajisaka (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928114#comment-16928114
 ] 

Akira Ajisaka commented on HDFS-14845:
--

Thanks [~Prabhu Joseph]!

1. 
"org.apache.hadoop.security.AuthenticationFilterInitializer,org.apache.hadoop.http.lib.StaticUserWebFilter,org.apache.hadoop.security.HttpCrossOriginFilterInitializer"
2. https://gist.github.com/aajisaka/82d08ec02aa73bdabfbdad1e9723898d

 

> Request is a replay (34) error in httpfs
> 
>
> Key: HDFS-14845
> URL: https://issues.apache.org/jira/browse/HDFS-14845
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: httpfs
>Affects Versions: 3.3.0
> Environment: Kerberos and ZKDelgationTokenSecretManager enabled in 
> HttpFS
>Reporter: Akira Ajisaka
>Priority: Critical
>
> We are facing "Request is a replay (34)" error when accessing to HDFS via 
> httpfs on trunk.
> {noformat}
> % curl -i --negotiate -u : "https://:4443/webhdfs/v1/?op=liststatus"
> HTTP/1.1 401 Authentication required
> Date: Mon, 09 Sep 2019 06:00:04 GMT
> Date: Mon, 09 Sep 2019 06:00:04 GMT
> Pragma: no-cache
> X-Content-Type-Options: nosniff
> X-XSS-Protection: 1; mode=block
> WWW-Authenticate: Negotiate
> Set-Cookie: hadoop.auth=; Path=/; Secure; HttpOnly
> Cache-Control: must-revalidate,no-cache,no-store
> Content-Type: text/html;charset=iso-8859-1
> Content-Length: 271
> HTTP/1.1 403 GSSException: Failure unspecified at GSS-API level (Mechanism 
> level: Request is a replay (34))
> Date: Mon, 09 Sep 2019 06:00:04 GMT
> Date: Mon, 09 Sep 2019 06:00:04 GMT
> Pragma: no-cache
> X-Content-Type-Options: nosniff
> X-XSS-Protection: 1; mode=block
> (snip)
> Set-Cookie: hadoop.auth=; Path=/; Secure; HttpOnly
> Cache-Control: must-revalidate,no-cache,no-store
> Content-Type: text/html;charset=iso-8859-1
> Content-Length: 413
> 
> 
> 
> Error 403 GSSException: Failure unspecified at GSS-API level 
> (Mechanism level: Request is a replay (34))
> 
> HTTP ERROR 403
> Problem accessing /webhdfs/v1/. Reason:
> GSSException: Failure unspecified at GSS-API level (Mechanism level: 
> Request is a replay (34))
> 
> 
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14778) BlockManager findAndMarkBlockAsCorrupt adds block to the map if the Storage state is failed

2019-09-11 Thread Wei-Chiu Chuang (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928109#comment-16928109
 ] 

Wei-Chiu Chuang commented on HDFS-14778:


It would be really nice if you can update the test 
(TestFileCorruption#testCorruptionWithDiskFailure) added by HDFS-9958 to verify 
the fix.

How about using storage.areBlocksOnFailedStorage() to check?

> BlockManager findAndMarkBlockAsCorrupt adds block to the map if the Storage 
> state is failed
> ---
>
> Key: HDFS-14778
> URL: https://issues.apache.org/jira/browse/HDFS-14778
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14778.001.patch
>
>
> Should not mark the block as corrupt if the storage state is failed



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDDS-2106) Avoid usage of hadoop projects as parent of hdds/ozone

2019-09-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2106?focusedWorklogId=311098&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-311098
 ]

ASF GitHub Bot logged work on HDDS-2106:


Author: ASF GitHub Bot
Created on: 12/Sep/19 00:40
Start Date: 12/Sep/19 00:40
Worklog Time Spent: 10m 
  Work Description: elek commented on issue #1423: HDDS-2106. Avoid usage 
of hadoop projects as parent of hdds/ozone
URL: https://github.com/apache/hadoop/pull/1423#issuecomment-530617911
 
 
   Thanks @arp7 and @adoroszlai the review. There are no more warnings (missing 
parts are also migrated from pom.xml) and the integration test failures are not 
related.
   
   Will merge it soon.
   
   For the records: this is just the first step. As we have this brand new 
parent pom, later we can simplify the hadoop-ozone/pom.xml and 
hadoop-hdds/pom.xml as many common parts can be moved to pom.ozone.xml
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 311098)
Time Spent: 20m  (was: 10m)

> Avoid usage of hadoop projects as parent of hdds/ozone
> --
>
> Key: HDDS-2106
> URL: https://issues.apache.org/jira/browse/HDDS-2106
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Ozone uses hadoop as a dependency. The dependency defined on multiple level:
>  1. the hadoop artifacts are defined in the  sections
>  2. both hadoop-ozone and hadoop-hdds projects uses "hadoop-project" as the 
> parent
> As we already have a slightly different assembly process it could be more 
> resilient to use a dedicated parent project instead of the hadoop one. With 
> this approach it will be easier to upgrade the versions as we don't need to 
> be careful about the pom contents only about the used dependencies.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDDS-2111) DOM XSS

2019-09-11 Thread Dinesh Chitlangia (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Chitlangia updated HDDS-2111:

Component/s: S3

> DOM XSS
> ---
>
> Key: HDDS-2111
> URL: https://issues.apache.org/jira/browse/HDDS-2111
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: S3
>Reporter: Aayush
>Priority: Major
>
> VULNERABILITY DETAILS
> There is a way to bypass anti-XSS filter for DOM XSS exploiting a 
> "window.location.href".
> Considering a typical URL:
> scheme://domain:port/path?query_string#fragment_id
> Browsers encode correctly both "path" and "query_string", but not the 
> "fragment_id". 
> So if used "fragment_id" the vector is also not logged on Web Server.
> VERSION
> Chrome Version: 10.0.648.134 (Official Build 77917) beta
> REPRODUCTION CASE
> This is an index.html page:
> {code:java}
> aws s3api --endpoint 
> document.write(window.location.href.replace("static/", "")) 
> create-bucket --bucket=wordcount
> {code}
> The attack vector is:
> index.html?#alert('XSS');
> * PoC:
> For your convenience, a minimalist PoC is located on:
> http://security.onofri.org/xss_location.html?#alert('XSS');
> * References
> - DOM Based Cross-Site Scripting or XSS of the Third Kind - 
> http://www.webappsec.org/projects/articles/071105.shtml
> reference:- 
> https://bugs.chromium.org/p/chromium/issues/detail?id=76796



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14811) RBF: TestRouterRpc#testErasureCoding is flaky

2019-09-11 Thread Jira



[ 
https://issues.apache.org/jira/browse/HDFS-14811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928083#comment-16928083
 ] 

Íñigo Goiri commented on HDFS-14811:


Let's see if we can fix the count of active threads.

> RBF: TestRouterRpc#testErasureCoding is flaky
> -
>
> Key: HDFS-14811
> URL: https://issues.apache.org/jira/browse/HDFS-14811
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14811.001.patch, HDFS-14811.002.patch
>
>
> The Failed reason:
> {code:java}
> 2019-09-01 18:19:20,940 [IPC Server handler 5 on default port 53140] INFO  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseRandom(838)) - [
> Node /default-rack/127.0.0.1:53148 [
> ]
> Node /default-rack/127.0.0.1:53161 [
> ]
> Node /default-rack/127.0.0.1:53157 [
>   Datanode 127.0.0.1:53157 is not chosen since the node is too busy (load: 3 
> > 2.6665).
> Node /default-rack/127.0.0.1:53143 [
> ]
> Node /default-rack/127.0.0.1:53165 [
> ]
> 2019-09-01 18:19:20,940 [IPC Server handler 5 on default port 53140] INFO  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseRandom(846)) - Not enough replicas 
> was chosen. Reason: {NODE_TOO_BUSY=1}
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseTarget(449)) - Failed to place enough 
> replicas, still in need of 1 to reach 6 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) 
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN  
> protocol.BlockStoragePolicy (BlockStoragePolicy.java:chooseStorageTypes(161)) 
> - Failed to place enough replicas: expected size is 1 but only 0 storage 
> types can be selected (replication=6, selected=[], unavailable=[DISK], 
> removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseTarget(449)) - Failed to place enough 
> replicas, still in need of 1 to reach 6 (unavailableStorages=[DISK], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All 
> required storage types are unavailable:  unavailableStorages=[DISK], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] INFO  
> ipc.Server (Server.java:logException(2982)) - IPC Server handler 5 on default 
> port 53140, call Call#1270 Retry#0 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 127.0.0.1:53202
> java.io.IOException: File /testec/testfile2 could only be written to 5 of the 
> 6 required nodes for RS-6-3-1024k. There are 6 datanode(s) running and 6 
> node(s) are excluded in this operation.
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2815)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:893)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:574)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1001)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:929)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2921)
> 2019-09-01 18:19:20,942 [IPC Server handler 6 on default port 53197] INFO  
> ipc.Server (Server.java:logException(2975)) - IPC Server handler 6 on default 
> port 53197, call Call#1

[jira] [Commented] (HDFS-14795) Add Throttler for writing block

2019-09-11 Thread Jira



[ 
https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928082#comment-16928082
 ] 

Íñigo Goiri commented on HDFS-14795:


Now I see where the docu description comes form. I would try to extend both 
descriptions to make them a little more specific on what a transfer and what a 
write means.

> Add Throttler for writing block
> ---
>
> Key: HDFS-14795
> URL: https://issues.apache.org/jira/browse/HDFS-14795
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14795.001.patch, HDFS-14795.002.patch, 
> HDFS-14795.003.patch, HDFS-14795.004.patch, HDFS-14795.005.patch, 
> HDFS-14795.006.patch, HDFS-14795.007.patch, HDFS-14795.008.patch, 
> HDFS-14795.009.patch, HDFS-14795.010.patch
>
>
> DataXceiver#writeBlock
> {code:java}
> blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut,
> mirrorAddr, null, targets, false);
> {code}
> As above code, DataXceiver#writeBlock doesn't throttler.
>  I think it is necessary to throttle for writing block, while add throttler 
> in stage of PIPELINE_SETUP_APPEND_RECOVERY or 
> PIPELINE_SETUP_STREAMING_RECOVERY.
> Default throttler value is still null.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-12288) Fix DataNode's xceiver count calculation

2019-09-11 Thread Jira



[ 
https://issues.apache.org/jira/browse/HDFS-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928080#comment-16928080
 ] 

Íñigo Goiri commented on HDFS-12288:


Changing the variable within HeartbeatRequestProto might be too dangerous.
Let's keep the old naming for backwards compatibility.
Regarding the new variables for tracking the threads, I think it makes sense.

> Fix DataNode's xceiver count calculation
> 
>
> Key: HDFS-12288
> URL: https://issues.apache.org/jira/browse/HDFS-12288
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Reporter: Lukas Majercak
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-12288.001.patch, HDFS-12288.002.patch, 
> HDFS-12288.003.patch
>
>
> The problem with the ThreadGroup.activeCount() method is that the method is 
> only a very rough estimate, and in reality returns the total number of 
> threads in the thread group as opposed to the threads actually running.
> In some DNs, we saw this to return 50~ for a long time, even though the 
> actual number of DataXceiver threads was next to none.
> This is a big issue as we use the xceiverCount to make decisions on the NN 
> for choosing replication source DN or returning DNs to clients for R/W.
> The plan is to reuse the DataNodeMetrics.dataNodeActiveXceiversCount value 
> which only accounts for actual number of DataXcevier threads currently 
> running and thus represents the load on the DN much better.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDDS-2114) Rename does not preserve non-explicitly created interim directories

2019-09-11 Thread Istvan Fajth (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Fajth updated HDDS-2114:
---
Attachment: demonstrative_test.patch

> Rename does not preserve non-explicitly created interim directories
> ---
>
> Key: HDDS-2114
> URL: https://issues.apache.org/jira/browse/HDDS-2114
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Istvan Fajth
>Priority: Critical
> Attachments: demonstrative_test.patch
>
>
> I am attaching a patch that adds a test that demonstrates the problem.
> The scenario is coming from the way how Hive implements acid transactions 
> with the ORC table format, but the test is redacted to the simplest possible 
> code that reproduces the issue.
> The scenario:
>  * Given a 3 level directory structure, where the top level directory was 
> explicitly created, and the interim directory is implicitly created (for 
> example either by creating a file with create("/top/interim/file") or by 
> creating a directory with mkdirs("top/interim/dir"))
>  * When the leaf is moved out from the implicitly created directory making 
> this directory an empty directory
>  * Then a FileNotFoundException is thrown when getFileStatus or listStatus is 
> called on the interim directory.
> The expected behaviour:
> after the directory is becoming empty, the directory should still be part of 
> the file system, moreover an empty FileStatus array should be returned when 
> listStatus is called on it, and also a valid FileStatus object should be 
> returned when getFileStatus is called on it.
>  
>  
> As this issue is present with Hive, and as this is how a FileSystem is 
> expected to work this seems to be an at least critical issue as I see, please 
> feel free to change the priority if needed.
> Also please note that, if the interim directory is explicitly created with 
> mkdirs("top/interim") before creating the leaf, then the issue does not 
> appear.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Created] (HDDS-2114) Rename does not preserve non-explicitly created interim directories

2019-09-11 Thread Istvan Fajth (Jira)

Istvan Fajth created HDDS-2114:
--

 Summary: Rename does not preserve non-explicitly created interim 
directories
 Key: HDDS-2114
 URL: https://issues.apache.org/jira/browse/HDDS-2114
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Istvan Fajth
 Attachments: demonstrative_test.patch

I am attaching a patch that adds a test that demonstrates the problem.

The scenario is coming from the way how Hive implements acid transactions with 
the ORC table format, but the test is redacted to the simplest possible code 
that reproduces the issue.

The scenario:
 * Given a 3 level directory structure, where the top level directory was 
explicitly created, and the interim directory is implicitly created (for 
example either by creating a file with create("/top/interim/file") or by 
creating a directory with mkdirs("top/interim/dir"))
 * When the leaf is moved out from the implicitly created directory making this 
directory an empty directory
 * Then a FileNotFoundException is thrown when getFileStatus or listStatus is 
called on the interim directory.

The expected behaviour:

after the directory is becoming empty, the directory should still be part of 
the file system, moreover an empty FileStatus array should be returned when 
listStatus is called on it, and also a valid FileStatus object should be 
returned when getFileStatus is called on it.

 

 

As this issue is present with Hive, and as this is how a FileSystem is expected 
to work this seems to be an at least critical issue as I see, please feel free 
to change the priority if needed.

Also please note that, if the interim directory is explicitly created with 
mkdirs("top/interim") before creating the leaf, then the issue does not appear.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14832) RBF : Add Icon for ReadOnly False

2019-09-11 Thread Jira



[ 
https://issues.apache.org/jira/browse/HDFS-14832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928071#comment-16928071
 ] 

Íñigo Goiri commented on HDFS-14832:


I would let [~tasanuma] to chime in.

> RBF : Add Icon for ReadOnly False
> -
>
> Key: HDFS-14832
> URL: https://issues.apache.org/jira/browse/HDFS-14832
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Minor
>
> In Router Web UI for Mount Table information , add icon for read only state 
> false 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down

2019-09-11 Thread Hadoop QA (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928036#comment-16928036
 ] 

Hadoop QA commented on HDFS-14655:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
43s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 21s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 43s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}105m 38s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}163m 44s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS 
|
|   | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.0 Server=19.03.0 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-14655 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12980112/HDFS-14655-04.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 5e1a891a2b03 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 64ed6b1 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27847/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results

[jira] [Commented] (HDDS-2112) Ozone rename is behaving different compared with HDFS

2019-09-11 Thread Istvan Fajth (Jira)



[ 
https://issues.apache.org/jira/browse/HDDS-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928007#comment-16928007
 ] 

Istvan Fajth commented on HDDS-2112:


Great [~ste...@apache.org], I was as well unsure, and I did not checked the 
contract tests to see, but I completely agree that it is tedious, also it would 
spare me a couple of hours if rename threw an exception in these situations.

Thank you for the clarification, and closure of the ticket!

> Ozone rename is behaving different compared with HDFS
> -
>
> Key: HDDS-2112
> URL: https://issues.apache.org/jira/browse/HDDS-2112
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Reporter: Istvan Fajth
>Priority: Major
> Attachments: demonstrative_test.patch
>
>
> I am attaching a patch file, that introduces two new tests for the 
> OzoneFileSystem implementation which demonstrates the expected behaviour.
> Case 1:
> Given a directory a file "/source/subdir/file", and a directory /target
> When fs.rename("/source/subdir/file", "/target/subdir/file") is called
> Then DistributedFileSystem (HDFS), is returning false from the method, while 
> OzoneFileSystem throws a FileNotFoundException as "/target/subdir" is not 
> existing.
> The expected behaviour would be to return false in this case instead of 
> throwing an exception with that behave the same as DistributedFileSystem does.
>  
> Case 2:
> Given a directory "/source" and a file "/targetFile"
> When fs.rename("/source", "/targetFile") is called
> Then DistributedFileSystem (HDFS), is returning false from the method, while 
> OzoneFileSystem throws a FileAlreadyExistsException as "/targetFile" does 
> exist.
> The expected behaviour would be to return false in this case instead of 
> throwing an exception with that behave the same as DistributedFileSystem does.
>  
> It may be considered as well a bug in HDFS, however it is not clear from the 
> FileSystem interface's documentation on the two rename methods that it 
> defines in which cases an exception should be thrown and in which cases a 
> return false is the expected behaviour.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDDS-2096) Ozone ACL document missing AddAcl API

2019-09-11 Thread Xiaoyu Yao (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDDS-2096:
-
Status: Patch Available  (was: Open)

> Ozone ACL document missing AddAcl API
> -
>
> Key: HDDS-2096
> URL: https://issues.apache.org/jira/browse/HDDS-2096
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Current Ozone Native ACL APIs document looks like below, the AddAcl is 
> missing.
>  
> h3. Ozone Native ACL APIs
> The ACLs can be manipulated by a set of APIs supported by Ozone. The APIs 
> supported are:
>  # *SetAcl* – This API will take user principal, the name, type of the ozone 
> object and a list of ACLs.
>  # *GetAcl* – This API will take the name and type of the ozone object and 
> will return a list of ACLs.
>  # *RemoveAcl* - This API will take the name, type of the ozone object and 
> the ACL that has to be removed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDDS-2076) Read fails because the block cannot be located in the container

2019-09-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2076?focusedWorklogId=310988&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310988
 ]

ASF GitHub Bot logged work on HDDS-2076:


Author: ASF GitHub Bot
Created on: 11/Sep/19 20:41
Start Date: 11/Sep/19 20:41
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1410: HDDS-2076. Read 
fails because the block cannot be located in the container
URL: https://github.com/apache/hadoop/pull/1410#issuecomment-530556754
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 82 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 1 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 32 | Maven dependency ordering for branch |
   | +1 | mvninstall | 660 | trunk passed |
   | +1 | compile | 387 | trunk passed |
   | +1 | checkstyle | 79 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 967 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 175 | trunk passed |
   | 0 | spotbugs | 494 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 715 | trunk passed |
   | -0 | patch | 562 | Used diff version of patch file. Binary files and 
potentially other changes not applied. Please rebase and squash commits if 
necessary. |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 45 | Maven dependency ordering for patch |
   | +1 | mvninstall | 715 | the patch passed |
   | +1 | compile | 475 | the patch passed |
   | +1 | javac | 475 | the patch passed |
   | +1 | checkstyle | 103 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 885 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 221 | the patch passed |
   | +1 | findbugs | 846 | the patch passed |
   ||| _ Other Tests _ |
   | -1 | unit | 250 | hadoop-hdds in the patch failed. |
   | -1 | unit | 2796 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 70 | The patch does not generate ASF License warnings. |
   | | | 9721 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.ozone.container.keyvalue.TestKeyValueContainer 
|
   |   | hadoop.ozone.container.ozoneimpl.TestOzoneContainer |
   |   | hadoop.ozone.client.rpc.TestReadRetries |
   |   | hadoop.ozone.client.rpc.Test2WayCommitInRatis |
   |   | hadoop.ozone.client.rpc.TestDeleteWithSlowFollower |
   |   | 
hadoop.ozone.container.common.statemachine.commandhandler.TestCloseContainerByPipeline
 |
   |   | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
   |   | 
hadoop.ozone.container.common.statemachine.commandhandler.TestDeleteContainerHandler
 |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.2 Server=19.03.2 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1410/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1410 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux 98e5c8d82f67 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 9221704 |
   | Default Java | 1.8.0_212 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1410/4/artifact/out/patch-unit-hadoop-hdds.txt
 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1410/4/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1410/4/testReport/ |
   | Max. process+thread count | 4356 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/container-service hadoop-ozone/integration-test 
U: . |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1410/4/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

[jira] [Work logged] (HDDS-1982) Extend SCMNodeManager to support decommission and maintenance states

2019-09-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-1982?focusedWorklogId=310984&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310984
 ]

ASF GitHub Bot logged work on HDDS-1982:


Author: ASF GitHub Bot
Created on: 11/Sep/19 20:38
Start Date: 11/Sep/19 20:38
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1344: HDDS-1982 Extend 
SCMNodeManager to support decommission and maintenance states
URL: https://github.com/apache/hadoop/pull/1344#issuecomment-530555681
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 82 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 1 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 15 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 80 | Maven dependency ordering for branch |
   | +1 | mvninstall | 639 | trunk passed |
   | +1 | compile | 397 | trunk passed |
   | +1 | checkstyle | 76 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 981 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 234 | trunk passed |
   | 0 | spotbugs | 531 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 786 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 44 | Maven dependency ordering for patch |
   | +1 | mvninstall | 723 | the patch passed |
   | +1 | compile | 467 | the patch passed |
   | +1 | cc | 467 | the patch passed |
   | +1 | javac | 467 | the patch passed |
   | +1 | checkstyle | 99 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 876 | patch has no errors when building and testing 
our client artifacts. |
   | -1 | javadoc | 95 | hadoop-hdds generated 1 new + 16 unchanged - 0 fixed = 
17 total (was 16) |
   | +1 | findbugs | 788 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 418 | hadoop-hdds in the patch passed. |
   | -1 | unit | 334 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 60 | The patch does not generate ASF License warnings. |
   | | | 7525 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.2 Server=19.03.2 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/6/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1344 |
   | Optional Tests | dupname asflicense compile cc mvnsite javac unit javadoc 
mvninstall shadedclient findbugs checkstyle |
   | uname | Linux 425bc211517e 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 9221704 |
   | Default Java | 1.8.0_222 |
   | javadoc | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/6/artifact/out/diff-javadoc-javadoc-hadoop-hdds.txt
 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/6/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/6/testReport/ |
   | Max. process+thread count | 426 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/common hadoop-hdds/server-scm hadoop-hdds/tools 
hadoop-ozone/integration-test U: . |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/6/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310984)
Time Spent: 5h 50m  (was: 5h 40m)

> Extend SCMNodeManager to support decommission and maintenance states
> 
>
> Key: HDDS-1982
> URL: https://issues.apache.org/jira/browse/HDDS-1982
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Stephen O'Donnell
>Assignee: Ste

[jira] [Work logged] (HDDS-1786) Datanodes takeSnapshot should delete previously created snapshots

2019-09-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-1786?focusedWorklogId=310974&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310974
 ]

ASF GitHub Bot logged work on HDDS-1786:


Author: ASF GitHub Bot
Created on: 11/Sep/19 20:21
Start Date: 11/Sep/19 20:21
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1163: HDDS-1786 : 
Datanodes takeSnapshot should delete previously created s…
URL: https://github.com/apache/hadoop/pull/1163#issuecomment-530549591
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 40 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 1 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 23 | Maven dependency ordering for branch |
   | +1 | mvninstall | 591 | trunk passed |
   | +1 | compile | 393 | trunk passed |
   | +1 | checkstyle | 77 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 843 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 177 | trunk passed |
   | 0 | spotbugs | 431 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 634 | trunk passed |
   | -0 | patch | 482 | Used diff version of patch file. Binary files and 
potentially other changes not applied. Please rebase and squash commits if 
necessary. |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 38 | Maven dependency ordering for patch |
   | +1 | mvninstall | 547 | the patch passed |
   | +1 | compile | 392 | the patch passed |
   | +1 | javac | 392 | the patch passed |
   | +1 | checkstyle | 85 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 659 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 173 | the patch passed |
   | +1 | findbugs | 699 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 298 | hadoop-hdds in the patch passed. |
   | -1 | unit | 2644 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 51 | The patch does not generate ASF License warnings. |
   | | | 8546 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion |
   |   | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures |
   |   | hadoop.ozone.om.TestOMRatisSnapshots |
   |   | hadoop.ozone.TestSecureOzoneCluster |
   |   | hadoop.ozone.scm.TestContainerSmallFile |
   |   | hadoop.ozone.client.rpc.TestBlockOutputStream |
   |   | hadoop.ozone.om.TestOzoneManagerHA |
   |   | hadoop.ozone.om.TestOmAcls |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/11/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1163 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux d7a4f1cd6f52 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 9221704 |
   | Default Java | 1.8.0_222 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/11/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/11/testReport/ |
   | Max. process+thread count | 5118 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/container-service hadoop-ozone/integration-test 
U: . |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/11/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310974)
Time Spent: 4.5h  (was: 4h 20m)

> Datanodes takeSnapshot should delete previously created snapshots
> -
>
> Key:

[jira] [Work logged] (HDDS-2096) Ozone ACL document missing AddAcl API

2019-09-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2096?focusedWorklogId=310967&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310967
 ]

ASF GitHub Bot logged work on HDDS-2096:


Author: ASF GitHub Bot
Created on: 11/Sep/19 20:12
Start Date: 11/Sep/19 20:12
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #1427: 
HDDS-2096. Ozone ACL document missing AddAcl API. Contributed by Xiao…
URL: https://github.com/apache/hadoop/pull/1427#discussion_r323436876
 
 

 ##
 File path: hadoop-hdds/docs/content/security/SecurityAcls.md
 ##
 @@ -78,5 +78,7 @@ supported are:
 of the ozone object and a list of ACLs.
 2. **GetAcl** – This API will take the name and type of the ozone object
 and will return a list of ACLs.
-3. **RemoveAcl** - This API will take the name, type of the
+3. **AddAcl** - This API will take the name, type of the ozone object, the
+ACL, and add it to existing ACL entries of the ozone object. 
 
 Review comment:
   whitespace:end of line
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310967)
Time Spent: 20m  (was: 10m)

> Ozone ACL document missing AddAcl API
> -
>
> Key: HDDS-2096
> URL: https://issues.apache.org/jira/browse/HDDS-2096
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Current Ozone Native ACL APIs document looks like below, the AddAcl is 
> missing.
>  
> h3. Ozone Native ACL APIs
> The ACLs can be manipulated by a set of APIs supported by Ozone. The APIs 
> supported are:
>  # *SetAcl* – This API will take user principal, the name, type of the ozone 
> object and a list of ACLs.
>  # *GetAcl* – This API will take the name and type of the ozone object and 
> will return a list of ACLs.
>  # *RemoveAcl* - This API will take the name, type of the ozone object and 
> the ACL that has to be removed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDDS-2096) Ozone ACL document missing AddAcl API

2019-09-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2096?focusedWorklogId=310968&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310968
 ]

ASF GitHub Bot logged work on HDDS-2096:


Author: ASF GitHub Bot
Created on: 11/Sep/19 20:12
Start Date: 11/Sep/19 20:12
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1427: HDDS-2096. Ozone 
ACL document missing AddAcl API. Contributed by Xiao…
URL: https://github.com/apache/hadoop/pull/1427#issuecomment-530546164
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 42 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   ||| _ trunk Compile Tests _ |
   | +1 | mvninstall | 644 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 1469 | branch has no errors when building and testing 
our client artifacts. |
   ||| _ Patch Compile Tests _ |
   | +1 | mvninstall | 568 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | -1 | whitespace | 0 | The patch has 1 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply |
   | +1 | shadedclient | 725 | patch has no errors when building and testing 
our client artifacts. |
   ||| _ Other Tests _ |
   | +1 | asflicense | 60 | The patch does not generate ASF License warnings. |
   | | | 3038 | |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1427/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1427 |
   | Optional Tests | dupname asflicense mvnsite |
   | uname | Linux 2ed3846aeacb 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 64ed6b1 |
   | whitespace | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1427/1/artifact/out/whitespace-eol.txt
 |
   | Max. process+thread count | 412 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/docs U: hadoop-hdds/docs |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1427/1/console |
   | versions | git=2.7.4 maven=3.3.9 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310968)
Time Spent: 0.5h  (was: 20m)

> Ozone ACL document missing AddAcl API
> -
>
> Key: HDDS-2096
> URL: https://issues.apache.org/jira/browse/HDDS-2096
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Current Ozone Native ACL APIs document looks like below, the AddAcl is 
> missing.
>  
> h3. Ozone Native ACL APIs
> The ACLs can be manipulated by a set of APIs supported by Ozone. The APIs 
> supported are:
>  # *SetAcl* – This API will take user principal, the name, type of the ozone 
> object and a list of ACLs.
>  # *GetAcl* – This API will take the name and type of the ozone object and 
> will return a list of ACLs.
>  # *RemoveAcl* - This API will take the name, type of the ozone object and 
> the ACL that has to be removed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDDS-2112) Ozone rename is behaving different compared with HDFS

2019-09-11 Thread Steve Loughran (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDDS-2112:
-
Summary: Ozone rename is behaving different compared with HDFS  (was: 
rename is behaving different compared with HDFS)

> Ozone rename is behaving different compared with HDFS
> -
>
> Key: HDDS-2112
> URL: https://issues.apache.org/jira/browse/HDDS-2112
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Reporter: Istvan Fajth
>Priority: Major
> Attachments: demonstrative_test.patch
>
>
> I am attaching a patch file, that introduces two new tests for the 
> OzoneFileSystem implementation which demonstrates the expected behaviour.
> Case 1:
> Given a directory a file "/source/subdir/file", and a directory /target
> When fs.rename("/source/subdir/file", "/target/subdir/file") is called
> Then DistributedFileSystem (HDFS), is returning false from the method, while 
> OzoneFileSystem throws a FileNotFoundException as "/target/subdir" is not 
> existing.
> The expected behaviour would be to return false in this case instead of 
> throwing an exception with that behave the same as DistributedFileSystem does.
>  
> Case 2:
> Given a directory "/source" and a file "/targetFile"
> When fs.rename("/source", "/targetFile") is called
> Then DistributedFileSystem (HDFS), is returning false from the method, while 
> OzoneFileSystem throws a FileAlreadyExistsException as "/targetFile" does 
> exist.
> The expected behaviour would be to return false in this case instead of 
> throwing an exception with that behave the same as DistributedFileSystem does.
>  
> It may be considered as well a bug in HDFS, however it is not clear from the 
> FileSystem interface's documentation on the two rename methods that it 
> defines in which cases an exception should be thrown and in which cases a 
> return false is the expected behaviour.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDDS-2112) rename is behaving different compared with HDFS

2019-09-11 Thread Steve Loughran (Jira)



[ 
https://issues.apache.org/jira/browse/HDDS-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927955#comment-16927955
 ] 

Steve Loughran commented on HDDS-2112:
--

This is covered inL 
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md#boolean-renamepath-src-path-d
 

basically
* rename is a confusing mess and the whole return true/false world complicates 
life
* everything other than HDFS (including local) raises an FNFE when the parent 
isn't there
* having rename() return false is generally useless for applications, which are 
invariably full of code like
{code}
 if (!rename(src, dest)) throw new IOE("rename failed and we don't know why!")
{code}
We have tests for what rename does in 
inorg.apache.hadoop.fs.contract.AbstractContractRenameTest; if you want to see 
the core quirks which rename() can get up to, look at 
org.apache.hadoop.fs.contract.ContractOptions

Ultimately we need to get HADOOP-11452 in and make the rename/3 call public. 
That's the one called via FileContext and which, for HDFS, does raise 
exceptions in the two situations created.

closing as a WONTFIX as it is HDFS through the FileSystem.rename(src, dest) API 
call which is considered the incorrect behaviour

Well done for finding this though -good bit of research!

> rename is behaving different compared with HDFS
> ---
>
> Key: HDDS-2112
> URL: https://issues.apache.org/jira/browse/HDDS-2112
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Reporter: Istvan Fajth
>Priority: Major
> Attachments: demonstrative_test.patch
>
>
> I am attaching a patch file, that introduces two new tests for the 
> OzoneFileSystem implementation which demonstrates the expected behaviour.
> Case 1:
> Given a directory a file "/source/subdir/file", and a directory /target
> When fs.rename("/source/subdir/file", "/target/subdir/file") is called
> Then DistributedFileSystem (HDFS), is returning false from the method, while 
> OzoneFileSystem throws a FileNotFoundException as "/target/subdir" is not 
> existing.
> The expected behaviour would be to return false in this case instead of 
> throwing an exception with that behave the same as DistributedFileSystem does.
>  
> Case 2:
> Given a directory "/source" and a file "/targetFile"
> When fs.rename("/source", "/targetFile") is called
> Then DistributedFileSystem (HDFS), is returning false from the method, while 
> OzoneFileSystem throws a FileAlreadyExistsException as "/targetFile" does 
> exist.
> The expected behaviour would be to return false in this case instead of 
> throwing an exception with that behave the same as DistributedFileSystem does.
>  
> It may be considered as well a bug in HDFS, however it is not clear from the 
> FileSystem interface's documentation on the two rename methods that it 
> defines in which cases an exception should be thrown and in which cases a 
> return false is the expected behaviour.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Resolved] (HDDS-2112) rename is behaving different compared with HDFS

2019-09-11 Thread Steve Loughran (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HDDS-2112.
--
Resolution: Won't Fix

> rename is behaving different compared with HDFS
> ---
>
> Key: HDDS-2112
> URL: https://issues.apache.org/jira/browse/HDDS-2112
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Reporter: Istvan Fajth
>Priority: Major
> Attachments: demonstrative_test.patch
>
>
> I am attaching a patch file, that introduces two new tests for the 
> OzoneFileSystem implementation which demonstrates the expected behaviour.
> Case 1:
> Given a directory a file "/source/subdir/file", and a directory /target
> When fs.rename("/source/subdir/file", "/target/subdir/file") is called
> Then DistributedFileSystem (HDFS), is returning false from the method, while 
> OzoneFileSystem throws a FileNotFoundException as "/target/subdir" is not 
> existing.
> The expected behaviour would be to return false in this case instead of 
> throwing an exception with that behave the same as DistributedFileSystem does.
>  
> Case 2:
> Given a directory "/source" and a file "/targetFile"
> When fs.rename("/source", "/targetFile") is called
> Then DistributedFileSystem (HDFS), is returning false from the method, while 
> OzoneFileSystem throws a FileAlreadyExistsException as "/targetFile" does 
> exist.
> The expected behaviour would be to return false in this case instead of 
> throwing an exception with that behave the same as DistributedFileSystem does.
>  
> It may be considered as well a bug in HDFS, however it is not clear from the 
> FileSystem interface's documentation on the two rename methods that it 
> defines in which cases an exception should be thrown and in which cases a 
> return false is the expected behaviour.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14795) Add Throttler for writing block

2019-09-11 Thread Hadoop QA (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927940#comment-16927940
 ] 

Hadoop QA commented on HDFS-14795:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m  7s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
3s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 21s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 95m 59s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}164m 56s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.tools.TestDFSAdminWithHA |
|   | hadoop.hdfs.TestReconstructStripedFile |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-14795 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12980106/HDFS-14795.010.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 54d46fe40a9b 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 
10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9221704 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27846/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit

[jira] [Commented] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent

2019-09-11 Thread Hudson (Jira)



[ 
https://issues.apache.org/jira/browse/HDDS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927937#comment-16927937
 ] 

Hudson commented on HDDS-2075:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17277 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17277/])
HDDS-2075. Tracing in OzoneManager call is propagated with wrong parent (xyao: 
rev 64ed6b177d6b00b22d45576a8517432dc6c03348)
* (edit) 
hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java
* (edit) 
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/OzoneManagerProtocolClientSideTranslatorPB.java


> Tracing in OzoneManager call is propagated with wrong parent
> 
>
> Key: HDDS-2075
> URL: https://issues.apache.org/jira/browse/HDDS-2075
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Elek, Marton
>Assignee: Doroszlai, Attila
>Priority: Major
>  Labels: pull-request-available
> Attachments: create_bucket-new.png, create_bucket.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> As you can see in the attached screenshot the OzoneManager.createBucket 
> (server side) tracing information is the children of the freon.createBucket 
> instead of the freon OzoneManagerProtocolPB.submitRequest.
> To avoid confusion the hierarchy should be fixed (Most probably we generate 
> the child span AFTER we already serialized the parent one to the message) 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down

2019-09-11 Thread Ayush Saxena (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-14655:

Status: Patch Available  (was: Open)

> [SBN Read] Namenode crashes if one of The JN is down
> 
>
> Key: HDFS-14655
> URL: https://issues.apache.org/jira/browse/HDFS-14655
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Harshakiran Reddy
>Assignee: Ayush Saxena
>Priority: Critical
> Attachments: HDFS-14655-01.patch, HDFS-14655-02.patch, 
> HDFS-14655-03.patch, HDFS-14655-04.patch, HDFS-14655.poc.patch
>
>
> {noformat}
> 2019-07-04 17:35:54,064 | INFO  | Logger channel (from parallel executor) to 
> XXX/XXX | Retrying connect to server: XXX/XXX. Already tried 
> 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, 
> sleepTime=1000 MILLISECONDS) | Client.java:975
> 2019-07-04 17:35:54,087 | FATAL | Edit log tailer | Unknown error encountered 
> while tailing edits. Shutting down standby NN. | EditLogTailer.java:474
> java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:717)
>   at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
>   at 
> com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:440)
>   at 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:56)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.getJournaledEdits(IPCLoggerChannel.java:565)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.getJournaledEdits(AsyncLoggerSet.java:272)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectRpcInputStreams(QuorumJournalManager.java:533)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:508)
>   at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:275)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1681)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1714)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:307)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:483)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
> 2019-07-04 17:35:54,112 | INFO  | Edit log tailer | Exiting with status 1: 
> java.lang.OutOfMemoryError: unable to create new native thread | 
> ExitUtil.java:210
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down

2019-09-11 Thread Ayush Saxena (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927923#comment-16927923
 ] 

Ayush Saxena commented on HDFS-14655:
-

No issues, We can definitely wait. It would be good if we can get feedback from 
them too. :)

> [SBN Read] Namenode crashes if one of The JN is down
> 
>
> Key: HDFS-14655
> URL: https://issues.apache.org/jira/browse/HDFS-14655
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Harshakiran Reddy
>Assignee: Ayush Saxena
>Priority: Critical
> Attachments: HDFS-14655-01.patch, HDFS-14655-02.patch, 
> HDFS-14655-03.patch, HDFS-14655-04.patch, HDFS-14655.poc.patch
>
>
> {noformat}
> 2019-07-04 17:35:54,064 | INFO  | Logger channel (from parallel executor) to 
> XXX/XXX | Retrying connect to server: XXX/XXX. Already tried 
> 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, 
> sleepTime=1000 MILLISECONDS) | Client.java:975
> 2019-07-04 17:35:54,087 | FATAL | Edit log tailer | Unknown error encountered 
> while tailing edits. Shutting down standby NN. | EditLogTailer.java:474
> java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:717)
>   at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
>   at 
> com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:440)
>   at 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:56)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.getJournaledEdits(IPCLoggerChannel.java:565)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.getJournaledEdits(AsyncLoggerSet.java:272)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectRpcInputStreams(QuorumJournalManager.java:533)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:508)
>   at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:275)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1681)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1714)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:307)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:483)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
> 2019-07-04 17:35:54,112 | INFO  | Edit log tailer | Exiting with status 1: 
> java.lang.OutOfMemoryError: unable to create new native thread | 
> ExitUtil.java:210
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14844) Make buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream configurable

2019-09-11 Thread Hadoop QA (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927919#comment-16927919
 ] 

Hadoop QA commented on HDFS-14844:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
41s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 17s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
56s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 50s{color} | {color:orange} hadoop-hdfs-project: The patch generated 1 new + 
69 unchanged - 0 fixed = 70 total (was 69) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 21s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
55s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}101m 
40s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}173m 42s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=18.09.7 Server=18.09.7 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-14844 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12980104/HDFS-14844.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 6d6a3fa9050e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |

[jira] [Work logged] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent

2019-09-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2075?focusedWorklogId=310942&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310942
 ]

ASF GitHub Bot logged work on HDDS-2075:


Author: ASF GitHub Bot
Created on: 11/Sep/19 19:06
Start Date: 11/Sep/19 19:06
Worklog Time Spent: 10m 
  Work Description: adoroszlai commented on issue #1415: HDDS-2075. Tracing 
in OzoneManager call is propagated with wrong parent
URL: https://github.com/apache/hadoop/pull/1415#issuecomment-530521643
 
 
   Thanks @xiaoyuyao for reviewing and merging it.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310942)
Time Spent: 1h  (was: 50m)

> Tracing in OzoneManager call is propagated with wrong parent
> 
>
> Key: HDDS-2075
> URL: https://issues.apache.org/jira/browse/HDDS-2075
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Elek, Marton
>Assignee: Doroszlai, Attila
>Priority: Major
>  Labels: pull-request-available
> Attachments: create_bucket-new.png, create_bucket.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> As you can see in the attached screenshot the OzoneManager.createBucket 
> (server side) tracing information is the children of the freon.createBucket 
> instead of the freon OzoneManagerProtocolPB.submitRequest.
> To avoid confusion the hierarchy should be fixed (Most probably we generate 
> the child span AFTER we already serialized the parent one to the message) 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down

2019-09-11 Thread Erik Krogen (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927910#comment-16927910
 ] 

Erik Krogen commented on HDFS-14655:


Agreed, I think we are saying the same thing. I'm +1 on v004 patch.

[~shv] and [~vagarychen] are both at a conference until Friday, would you mind 
if we wait until then to see if they want to provide any input before we commit?

> [SBN Read] Namenode crashes if one of The JN is down
> 
>
> Key: HDFS-14655
> URL: https://issues.apache.org/jira/browse/HDFS-14655
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Harshakiran Reddy
>Assignee: Ayush Saxena
>Priority: Critical
> Attachments: HDFS-14655-01.patch, HDFS-14655-02.patch, 
> HDFS-14655-03.patch, HDFS-14655-04.patch, HDFS-14655.poc.patch
>
>
> {noformat}
> 2019-07-04 17:35:54,064 | INFO  | Logger channel (from parallel executor) to 
> XXX/XXX | Retrying connect to server: XXX/XXX. Already tried 
> 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, 
> sleepTime=1000 MILLISECONDS) | Client.java:975
> 2019-07-04 17:35:54,087 | FATAL | Edit log tailer | Unknown error encountered 
> while tailing edits. Shutting down standby NN. | EditLogTailer.java:474
> java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:717)
>   at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
>   at 
> com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:440)
>   at 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:56)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.getJournaledEdits(IPCLoggerChannel.java:565)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.getJournaledEdits(AsyncLoggerSet.java:272)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectRpcInputStreams(QuorumJournalManager.java:533)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:508)
>   at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:275)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1681)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1714)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:307)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:483)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
> 2019-07-04 17:35:54,112 | INFO  | Edit log tailer | Exiting with status 1: 
> java.lang.OutOfMemoryError: unable to create new native thread | 
> ExitUtil.java:210
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Comment Edited] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down

2019-09-11 Thread Ayush Saxena (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927900#comment-16927900
 ] 

Ayush Saxena edited comment on HDFS-14655 at 9/11/19 7:00 PM:
--

Yes, As I said above too, It interrupts But doesn’t kill, Then is Client.java 
it checks whether the thread is interrupted or not, if interrupted then it 
stops retrying, IIRC
So thread can be reused..


was (Author: ayushtkn):
Yes, As I said above too, It interrupts But doesn’t kill, Then is Client.java 
it checks whether the thread is interrupted or not, if interrupted then the 
thread is killed, IIRC

> [SBN Read] Namenode crashes if one of The JN is down
> 
>
> Key: HDFS-14655
> URL: https://issues.apache.org/jira/browse/HDFS-14655
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Harshakiran Reddy
>Assignee: Ayush Saxena
>Priority: Critical
> Attachments: HDFS-14655-01.patch, HDFS-14655-02.patch, 
> HDFS-14655-03.patch, HDFS-14655-04.patch, HDFS-14655.poc.patch
>
>
> {noformat}
> 2019-07-04 17:35:54,064 | INFO  | Logger channel (from parallel executor) to 
> XXX/XXX | Retrying connect to server: XXX/XXX. Already tried 
> 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, 
> sleepTime=1000 MILLISECONDS) | Client.java:975
> 2019-07-04 17:35:54,087 | FATAL | Edit log tailer | Unknown error encountered 
> while tailing edits. Shutting down standby NN. | EditLogTailer.java:474
> java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:717)
>   at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
>   at 
> com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:440)
>   at 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:56)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.getJournaledEdits(IPCLoggerChannel.java:565)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.getJournaledEdits(AsyncLoggerSet.java:272)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectRpcInputStreams(QuorumJournalManager.java:533)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:508)
>   at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:275)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1681)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1714)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:307)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:483)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
> 2019-07-04 17:35:54,112 | INFO  | Edit log tailer | Exiting with status 1: 
> java.lang.OutOfMemoryError: unable to create new native thread | 
> ExitUtil.java:210
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent

2019-09-11 Thread Xiaoyu Yao (Jira)



[ 
https://issues.apache.org/jira/browse/HDDS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927905#comment-16927905
 ] 

Xiaoyu Yao commented on HDDS-2075:
--

cc: [~nandakumar131] we should port this for 0.4.1 if possible. 

> Tracing in OzoneManager call is propagated with wrong parent
> 
>
> Key: HDDS-2075
> URL: https://issues.apache.org/jira/browse/HDDS-2075
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Elek, Marton
>Assignee: Doroszlai, Attila
>Priority: Major
>  Labels: pull-request-available
> Attachments: create_bucket-new.png, create_bucket.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> As you can see in the attached screenshot the OzoneManager.createBucket 
> (server side) tracing information is the children of the freon.createBucket 
> instead of the freon OzoneManagerProtocolPB.submitRequest.
> To avoid confusion the hierarchy should be fixed (Most probably we generate 
> the child span AFTER we already serialized the parent one to the message) 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Comment Edited] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down

2019-09-11 Thread Ayush Saxena (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927900#comment-16927900
 ] 

Ayush Saxena edited comment on HDFS-14655 at 9/11/19 6:59 PM:
--

Yes, As I said above too, It interrupts But doesn’t kill, Then is Client.java 
it checks whether the thread is interrupted or not, if interrupted then the 
thread is killed, IIRC


was (Author: ayushtkn):
Yes, As I said above too, It interrupts But doesn’t kill,m

> [SBN Read] Namenode crashes if one of The JN is down
> 
>
> Key: HDFS-14655
> URL: https://issues.apache.org/jira/browse/HDFS-14655
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Harshakiran Reddy
>Assignee: Ayush Saxena
>Priority: Critical
> Attachments: HDFS-14655-01.patch, HDFS-14655-02.patch, 
> HDFS-14655-03.patch, HDFS-14655-04.patch, HDFS-14655.poc.patch
>
>
> {noformat}
> 2019-07-04 17:35:54,064 | INFO  | Logger channel (from parallel executor) to 
> XXX/XXX | Retrying connect to server: XXX/XXX. Already tried 
> 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, 
> sleepTime=1000 MILLISECONDS) | Client.java:975
> 2019-07-04 17:35:54,087 | FATAL | Edit log tailer | Unknown error encountered 
> while tailing edits. Shutting down standby NN. | EditLogTailer.java:474
> java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:717)
>   at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
>   at 
> com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:440)
>   at 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:56)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.getJournaledEdits(IPCLoggerChannel.java:565)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.getJournaledEdits(AsyncLoggerSet.java:272)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectRpcInputStreams(QuorumJournalManager.java:533)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:508)
>   at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:275)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1681)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1714)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:307)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:483)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
> 2019-07-04 17:35:54,112 | INFO  | Edit log tailer | Exiting with status 1: 
> java.lang.OutOfMemoryError: unable to create new native thread | 
> ExitUtil.java:210
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent

2019-09-11 Thread Xiaoyu Yao (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDDS-2075:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks [~adoroszlai] for the contribution. I've merged the PR to trunk.

> Tracing in OzoneManager call is propagated with wrong parent
> 
>
> Key: HDDS-2075
> URL: https://issues.apache.org/jira/browse/HDDS-2075
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Elek, Marton
>Assignee: Doroszlai, Attila
>Priority: Major
>  Labels: pull-request-available
> Attachments: create_bucket-new.png, create_bucket.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> As you can see in the attached screenshot the OzoneManager.createBucket 
> (server side) tracing information is the children of the freon.createBucket 
> instead of the freon OzoneManagerProtocolPB.submitRequest.
> To avoid confusion the hierarchy should be fixed (Most probably we generate 
> the child span AFTER we already serialized the parent one to the message) 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent

2019-09-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2075?focusedWorklogId=310935&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310935
 ]

ASF GitHub Bot logged work on HDDS-2075:


Author: ASF GitHub Bot
Created on: 11/Sep/19 18:59
Start Date: 11/Sep/19 18:59
Worklog Time Spent: 10m 
  Work Description: xiaoyuyao commented on pull request #1415: HDDS-2075. 
Tracing in OzoneManager call is propagated with wrong parent
URL: https://github.com/apache/hadoop/pull/1415
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310935)
Time Spent: 50m  (was: 40m)

> Tracing in OzoneManager call is propagated with wrong parent
> 
>
> Key: HDDS-2075
> URL: https://issues.apache.org/jira/browse/HDDS-2075
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Elek, Marton
>Assignee: Doroszlai, Attila
>Priority: Major
>  Labels: pull-request-available
> Attachments: create_bucket-new.png, create_bucket.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> As you can see in the attached screenshot the OzoneManager.createBucket 
> (server side) tracing information is the children of the freon.createBucket 
> instead of the freon OzoneManagerProtocolPB.submitRequest.
> To avoid confusion the hierarchy should be fixed (Most probably we generate 
> the child span AFTER we already serialized the parent one to the message) 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down

2019-09-11 Thread Ayush Saxena (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927900#comment-16927900
 ] 

Ayush Saxena commented on HDFS-14655:
-

Yes, As I said above too, It interrupts But doesn’t kill,m

> [SBN Read] Namenode crashes if one of The JN is down
> 
>
> Key: HDFS-14655
> URL: https://issues.apache.org/jira/browse/HDFS-14655
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Harshakiran Reddy
>Assignee: Ayush Saxena
>Priority: Critical
> Attachments: HDFS-14655-01.patch, HDFS-14655-02.patch, 
> HDFS-14655-03.patch, HDFS-14655-04.patch, HDFS-14655.poc.patch
>
>
> {noformat}
> 2019-07-04 17:35:54,064 | INFO  | Logger channel (from parallel executor) to 
> XXX/XXX | Retrying connect to server: XXX/XXX. Already tried 
> 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, 
> sleepTime=1000 MILLISECONDS) | Client.java:975
> 2019-07-04 17:35:54,087 | FATAL | Edit log tailer | Unknown error encountered 
> while tailing edits. Shutting down standby NN. | EditLogTailer.java:474
> java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:717)
>   at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
>   at 
> com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:440)
>   at 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:56)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.getJournaledEdits(IPCLoggerChannel.java:565)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.getJournaledEdits(AsyncLoggerSet.java:272)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectRpcInputStreams(QuorumJournalManager.java:533)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:508)
>   at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:275)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1681)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1714)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:307)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:483)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
> 2019-07-04 17:35:54,112 | INFO  | Edit log tailer | Exiting with status 1: 
> java.lang.OutOfMemoryError: unable to create new native thread | 
> ExitUtil.java:210
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDDS-2096) Ozone ACL document missing AddAcl API

2019-09-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2096:
-
Labels: pull-request-available  (was: )

> Ozone ACL document missing AddAcl API
> -
>
> Key: HDDS-2096
> URL: https://issues.apache.org/jira/browse/HDDS-2096
> Project: Hadoop Distributed Data Store
>  Issue Type: Test
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: pull-request-available
>
> Current Ozone Native ACL APIs document looks like below, the AddAcl is 
> missing.
>  
> h3. Ozone Native ACL APIs
> The ACLs can be manipulated by a set of APIs supported by Ozone. The APIs 
> supported are:
>  # *SetAcl* – This API will take user principal, the name, type of the ozone 
> object and a list of ACLs.
>  # *GetAcl* – This API will take the name and type of the ozone object and 
> will return a list of ACLs.
>  # *RemoveAcl* - This API will take the name, type of the ozone object and 
> the ACL that has to be removed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDDS-2096) Ozone ACL document missing AddAcl API

2019-09-11 Thread Xiaoyu Yao (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDDS-2096:
-
Issue Type: Bug  (was: Test)

> Ozone ACL document missing AddAcl API
> -
>
> Key: HDDS-2096
> URL: https://issues.apache.org/jira/browse/HDDS-2096
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Current Ozone Native ACL APIs document looks like below, the AddAcl is 
> missing.
>  
> h3. Ozone Native ACL APIs
> The ACLs can be manipulated by a set of APIs supported by Ozone. The APIs 
> supported are:
>  # *SetAcl* – This API will take user principal, the name, type of the ozone 
> object and a list of ACLs.
>  # *GetAcl* – This API will take the name and type of the ozone object and 
> will return a list of ACLs.
>  # *RemoveAcl* - This API will take the name, type of the ozone object and 
> the ACL that has to be removed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDDS-2096) Ozone ACL document missing AddAcl API

2019-09-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2096?focusedWorklogId=310934&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310934
 ]

ASF GitHub Bot logged work on HDDS-2096:


Author: ASF GitHub Bot
Created on: 11/Sep/19 18:57
Start Date: 11/Sep/19 18:57
Worklog Time Spent: 10m 
  Work Description: xiaoyuyao commented on pull request #1427: HDDS-2096. 
Ozone ACL document missing AddAcl API. Contributed by Xiao…
URL: https://github.com/apache/hadoop/pull/1427
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310934)
Remaining Estimate: 0h
Time Spent: 10m

> Ozone ACL document missing AddAcl API
> -
>
> Key: HDDS-2096
> URL: https://issues.apache.org/jira/browse/HDDS-2096
> Project: Hadoop Distributed Data Store
>  Issue Type: Test
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Current Ozone Native ACL APIs document looks like below, the AddAcl is 
> missing.
>  
> h3. Ozone Native ACL APIs
> The ACLs can be manipulated by a set of APIs supported by Ozone. The APIs 
> supported are:
>  # *SetAcl* – This API will take user principal, the name, type of the ozone 
> object and a list of ACLs.
>  # *GetAcl* – This API will take the name and type of the ozone object and 
> will return a list of ACLs.
>  # *RemoveAcl* - This API will take the name, type of the ozone object and 
> the ACL that has to be removed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Assigned] (HDDS-2096) Ozone ACL document missing AddAcl API

2019-09-11 Thread Xiaoyu Yao (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao reassigned HDDS-2096:


Assignee: Xiaoyu Yao

> Ozone ACL document missing AddAcl API
> -
>
> Key: HDDS-2096
> URL: https://issues.apache.org/jira/browse/HDDS-2096
> Project: Hadoop Distributed Data Store
>  Issue Type: Test
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
>
> Current Ozone Native ACL APIs document looks like below, the AddAcl is 
> missing.
>  
> h3. Ozone Native ACL APIs
> The ACLs can be manipulated by a set of APIs supported by Ozone. The APIs 
> supported are:
>  # *SetAcl* – This API will take user principal, the name, type of the ozone 
> object and a list of ACLs.
>  # *GetAcl* – This API will take the name and type of the ozone object and 
> will return a list of ACLs.
>  # *RemoveAcl* - This API will take the name, type of the ozone object and 
> the ACL that has to be removed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down

2019-09-11 Thread Erik Krogen (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927896#comment-16927896
 ] 

Erik Krogen commented on HDFS-14655:


Hm. I believe that when you call {{cancel()}} on the {{ListenableFuture}}, it 
does not kill the thread, it just interrupts it so that it can go on to fetch 
another task. But it would be good to confirm this.

> [SBN Read] Namenode crashes if one of The JN is down
> 
>
> Key: HDFS-14655
> URL: https://issues.apache.org/jira/browse/HDFS-14655
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Harshakiran Reddy
>Assignee: Ayush Saxena
>Priority: Critical
> Attachments: HDFS-14655-01.patch, HDFS-14655-02.patch, 
> HDFS-14655-03.patch, HDFS-14655-04.patch, HDFS-14655.poc.patch
>
>
> {noformat}
> 2019-07-04 17:35:54,064 | INFO  | Logger channel (from parallel executor) to 
> XXX/XXX | Retrying connect to server: XXX/XXX. Already tried 
> 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, 
> sleepTime=1000 MILLISECONDS) | Client.java:975
> 2019-07-04 17:35:54,087 | FATAL | Edit log tailer | Unknown error encountered 
> while tailing edits. Shutting down standby NN. | EditLogTailer.java:474
> java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:717)
>   at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
>   at 
> com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:440)
>   at 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:56)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.getJournaledEdits(IPCLoggerChannel.java:565)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.getJournaledEdits(AsyncLoggerSet.java:272)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectRpcInputStreams(QuorumJournalManager.java:533)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:508)
>   at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:275)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1681)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1714)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:307)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:483)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
> 2019-07-04 17:35:54,112 | INFO  | Edit log tailer | Exiting with status 1: 
> java.lang.OutOfMemoryError: unable to create new native thread | 
> ExitUtil.java:210
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDDS-2107) Datanodes should retry forever to connect to SCM in an unsecure environment

2019-09-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-2107?focusedWorklogId=310907&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310907
 ]

ASF GitHub Bot logged work on HDDS-2107:


Author: ASF GitHub Bot
Created on: 11/Sep/19 18:20
Start Date: 11/Sep/19 18:20
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on issue #1424: HDDS-2107. 
Datanodes should retry forever to connect to SCM in an…
URL: https://github.com/apache/hadoop/pull/1424#issuecomment-530503260
 
 
   @adoroszlai You are right. With this change, we don't get the error from 
`EndPointStateMachine`  and the result now looks like this:
   ```
   datanode_1  | 2019-09-11 18:16:55 INFO  InitDatanodeState:140 - 
DatanodeDetails is persisted to /data/datanode.id
   datanode_1  | 2019-09-11 18:16:57 INFO  Client:948 - Retrying connect to 
server: datanode/172.19.0.2:9861. Already tried 0 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
MILLISECONDS)
   datanode_1  | 2019-09-11 18:16:58 INFO  Client:948 - Retrying connect to 
server: datanode/172.19.0.2:9861. Already tried 1 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
MILLISECONDS)
   datanode_1  | 2019-09-11 18:16:59 INFO  Client:948 - Retrying connect to 
server: datanode/172.19.0.2:9861. Already tried 2 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
MILLISECONDS)
   datanode_1  | 2019-09-11 18:17:00 INFO  Client:948 - Retrying connect to 
server: datanode/172.19.0.2:9861. Already tried 3 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
MILLISECONDS)
   datanode_1  | 2019-09-11 18:17:01 INFO  Client:948 - Retrying connect to 
server: datanode/172.19.0.2:9861. Already tried 4 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
MILLISECONDS)
   datanode_1  | 2019-09-11 18:17:02 INFO  Client:948 - Retrying connect to 
server: datanode/172.19.0.2:9861. Already tried 5 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
MILLISECONDS)
   datanode_1  | 2019-09-11 18:17:03 INFO  Client:948 - Retrying connect to 
server: datanode/172.19.0.2:9861. Already tried 6 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
MILLISECONDS)
   datanode_1  | 2019-09-11 18:17:04 INFO  Client:948 - Retrying connect to 
server: datanode/172.19.0.2:9861. Already tried 7 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
MILLISECONDS)
   datanode_1  | 2019-09-11 18:17:05 INFO  Client:948 - Retrying connect to 
server: datanode/172.19.0.2:9861. Already tried 8 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
MILLISECONDS)
   datanode_1  | 2019-09-11 18:17:06 INFO  Client:948 - Retrying connect to 
server: datanode/172.19.0.2:9861. Already tried 9 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
MILLISECONDS)
   datanode_1  | 2019-09-11 18:17:07 INFO  Client:948 - Retrying connect to 
server: datanode/172.19.0.2:9861. Already tried 10 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
MILLISECONDS)
   datanode_1  | 2019-09-11 18:17:08 INFO  Client:948 - Retrying connect to 
server: datanode/172.19.0.2:9861. Already tried 11 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
MILLISECONDS)
   datanode_1  | 2019-09-11 18:17:09 INFO  Client:948 - Retrying connect to 
server: datanode/172.19.0.2:9861. Already tried 12 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
MILLISECONDS)
   datanode_1  | 2019-09-11 18:17:10 INFO  Client:948 - Retrying connect to 
server: datanode/172.19.0.2:9861. Already tried 13 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
MILLISECONDS)
   datanode_1  | 2019-09-11 18:17:11 INFO  Client:948 - Retrying connect to 
server: datanode/172.19.0.2:9861. Already tried 14 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
MILLISECONDS)
   datanode_1  | 2019-09-11 18:17:12 INFO  Client:948 - Retrying connect to 
server: datanode/172.19.0.2:9861. Already tried 15 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
MILLISECONDS)
   datanode_1  | 2019-09-11 18:17:13 INFO  Client:948 - Retrying connect to 
server: datanode/172.19.0.2:9861. Already tried 16 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
MILLISECONDS)
   data

[jira] [Comment Edited] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down

2019-09-11 Thread Ayush Saxena (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927854#comment-16927854
 ] 

Ayush Saxena edited comment on HDFS-14655 at 9/11/19 5:44 PM:
--

Thanx [~xkrogen], Well we can change it to 60 seconds. As it was previously, I 
guess that should be enough since we are cancelling the threads too?



was (Author: ayushtkn):
Thanx [~xkrogen], Well we can change it to 60 seconds. I guess that should be 
enough?


> [SBN Read] Namenode crashes if one of The JN is down
> 
>
> Key: HDFS-14655
> URL: https://issues.apache.org/jira/browse/HDFS-14655
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Harshakiran Reddy
>Assignee: Ayush Saxena
>Priority: Critical
> Attachments: HDFS-14655-01.patch, HDFS-14655-02.patch, 
> HDFS-14655-03.patch, HDFS-14655-04.patch, HDFS-14655.poc.patch
>
>
> {noformat}
> 2019-07-04 17:35:54,064 | INFO  | Logger channel (from parallel executor) to 
> XXX/XXX | Retrying connect to server: XXX/XXX. Already tried 
> 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, 
> sleepTime=1000 MILLISECONDS) | Client.java:975
> 2019-07-04 17:35:54,087 | FATAL | Edit log tailer | Unknown error encountered 
> while tailing edits. Shutting down standby NN. | EditLogTailer.java:474
> java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:717)
>   at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
>   at 
> com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:440)
>   at 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:56)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.getJournaledEdits(IPCLoggerChannel.java:565)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.getJournaledEdits(AsyncLoggerSet.java:272)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectRpcInputStreams(QuorumJournalManager.java:533)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:508)
>   at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:275)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1681)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1714)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:307)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:483)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
> 2019-07-04 17:35:54,112 | INFO  | Edit log tailer | Exiting with status 1: 
> java.lang.OutOfMemoryError: unable to create new native thread | 
> ExitUtil.java:210
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDDS-1786) Datanodes takeSnapshot should delete previously created snapshots

2019-09-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-1786?focusedWorklogId=310887&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310887
 ]

ASF GitHub Bot logged work on HDDS-1786:


Author: ASF GitHub Bot
Created on: 11/Sep/19 17:44
Start Date: 11/Sep/19 17:44
Worklog Time Spent: 10m 
  Work Description: avijayanhwx commented on issue #1163: HDDS-1786 : 
Datanodes takeSnapshot should delete previously created s…
URL: https://github.com/apache/hadoop/pull/1163#issuecomment-530488768
 
 
   > Thanks @avijayanhwx for working on the patch. The patch looks good to me. 
Can u check the compilation issue?
   
   @bshashikant I have fixed an edge case in the unit test. The rest of the 
failures are unrelated. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310887)
Time Spent: 4h 20m  (was: 4h 10m)

> Datanodes takeSnapshot should delete previously created snapshots
> -
>
> Key: HDDS-1786
> URL: https://issues.apache.org/jira/browse/HDDS-1786
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Right now, after after taking a new snapshot, the previous snapshot file is 
> left in the raft log directory. When a new snapshot is taken, the previous 
> snapshots should be deleted.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDDS-1786) Datanodes takeSnapshot should delete previously created snapshots

2019-09-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-1786?focusedWorklogId=310886&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310886
 ]

ASF GitHub Bot logged work on HDDS-1786:


Author: ASF GitHub Bot
Created on: 11/Sep/19 17:44
Start Date: 11/Sep/19 17:44
Worklog Time Spent: 10m 
  Work Description: avijayanhwx commented on issue #1163: HDDS-1786 : 
Datanodes takeSnapshot should delete previously created s…
URL: https://github.com/apache/hadoop/pull/1163#issuecomment-530488768
 
 
   > Thanks @avijayanhwx for working on the patch. The patch looks good to me. 
Can u check the compilation issue?
   
   @bshashikant I have fixed an edge case in the unit test.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310886)
Time Spent: 4h 10m  (was: 4h)

> Datanodes takeSnapshot should delete previously created snapshots
> -
>
> Key: HDDS-1786
> URL: https://issues.apache.org/jira/browse/HDDS-1786
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Right now, after after taking a new snapshot, the previous snapshot file is 
> left in the raft log directory. When a new snapshot is taken, the previous 
> snapshots should be deleted.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down

2019-09-11 Thread Ayush Saxena (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-14655:

Attachment: HDFS-14655-04.patch

> [SBN Read] Namenode crashes if one of The JN is down
> 
>
> Key: HDFS-14655
> URL: https://issues.apache.org/jira/browse/HDFS-14655
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Harshakiran Reddy
>Assignee: Ayush Saxena
>Priority: Critical
> Attachments: HDFS-14655-01.patch, HDFS-14655-02.patch, 
> HDFS-14655-03.patch, HDFS-14655-04.patch, HDFS-14655.poc.patch
>
>
> {noformat}
> 2019-07-04 17:35:54,064 | INFO  | Logger channel (from parallel executor) to 
> XXX/XXX | Retrying connect to server: XXX/XXX. Already tried 
> 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, 
> sleepTime=1000 MILLISECONDS) | Client.java:975
> 2019-07-04 17:35:54,087 | FATAL | Edit log tailer | Unknown error encountered 
> while tailing edits. Shutting down standby NN. | EditLogTailer.java:474
> java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:717)
>   at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
>   at 
> com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:440)
>   at 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:56)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.getJournaledEdits(IPCLoggerChannel.java:565)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.getJournaledEdits(AsyncLoggerSet.java:272)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectRpcInputStreams(QuorumJournalManager.java:533)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:508)
>   at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:275)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1681)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1714)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:307)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:483)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
> 2019-07-04 17:35:54,112 | INFO  | Edit log tailer | Exiting with status 1: 
> java.lang.OutOfMemoryError: unable to create new native thread | 
> ExitUtil.java:210
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down

2019-09-11 Thread Ayush Saxena (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927854#comment-16927854
 ] 

Ayush Saxena commented on HDFS-14655:
-

Thanx [~xkrogen], Well we can change it to 60 seconds. I guess that should be 
enough?


> [SBN Read] Namenode crashes if one of The JN is down
> 
>
> Key: HDFS-14655
> URL: https://issues.apache.org/jira/browse/HDFS-14655
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Harshakiran Reddy
>Assignee: Ayush Saxena
>Priority: Critical
> Attachments: HDFS-14655-01.patch, HDFS-14655-02.patch, 
> HDFS-14655-03.patch, HDFS-14655.poc.patch
>
>
> {noformat}
> 2019-07-04 17:35:54,064 | INFO  | Logger channel (from parallel executor) to 
> XXX/XXX | Retrying connect to server: XXX/XXX. Already tried 
> 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, 
> sleepTime=1000 MILLISECONDS) | Client.java:975
> 2019-07-04 17:35:54,087 | FATAL | Edit log tailer | Unknown error encountered 
> while tailing edits. Shutting down standby NN. | EditLogTailer.java:474
> java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:717)
>   at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
>   at 
> com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:440)
>   at 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:56)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.getJournaledEdits(IPCLoggerChannel.java:565)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.getJournaledEdits(AsyncLoggerSet.java:272)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectRpcInputStreams(QuorumJournalManager.java:533)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:508)
>   at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:275)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1681)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1714)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:307)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:483)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
> 2019-07-04 17:35:54,112 | INFO  | Edit log tailer | Exiting with status 1: 
> java.lang.OutOfMemoryError: unable to create new native thread | 
> ExitUtil.java:210
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDDS-1982) Extend SCMNodeManager to support decommission and maintenance states

2019-09-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-1982?focusedWorklogId=310884&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310884
 ]

ASF GitHub Bot logged work on HDDS-1982:


Author: ASF GitHub Bot
Created on: 11/Sep/19 17:38
Start Date: 11/Sep/19 17:38
Worklog Time Spent: 10m 
  Work Description: sodonnel commented on pull request #1344: HDDS-1982 
Extend SCMNodeManager to support decommission and maintenance states
URL: https://github.com/apache/hadoop/pull/1344#discussion_r323370846
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/SCMNodeManager.java
 ##
 @@ -417,9 +451,12 @@ private SCMNodeStat getNodeStatInternal(DatanodeDetails 
datanodeDetails) {
 
   @Override
   public Map getNodeCount() {
+// TODO - This does not consider decom, maint etc.
 Map nodeCountMap = new HashMap();
 
 Review comment:
   The existing code had Map, but I agree it would be better 
with  or . I plan to leave this as is 
for now, as this method is used only for JMX right now, and I plan to split 
that out into a separate change via HDDS-2113 as there are some open questions 
there.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310884)
Time Spent: 5h 40m  (was: 5.5h)

> Extend SCMNodeManager to support decommission and maintenance states
> 
>
> Key: HDDS-1982
> URL: https://issues.apache.org/jira/browse/HDDS-1982
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Currently, within SCM a node can have the following states:
> HEALTHY
> STALE
> DEAD
> DECOMMISSIONING
> DECOMMISSIONED
> The last 2 are not currently used.
> In order to support decommissioning and maintenance mode, we need to extend 
> the set of states a node can have to include decommission and maintenance 
> states.
> It is also important to note that a node decommissioning or entering 
> maintenance can also be HEALTHY, STALE or go DEAD.
> Therefore in this Jira I propose we should model a node state with two 
> different sets of values. The first, is effectively the liveliness of the 
> node, with the following states. This is largely what is in place now:
> HEALTHY
> STALE
> DEAD
> The second is the node operational state:
> IN_SERVICE
> DECOMMISSIONING
> DECOMMISSIONED
> ENTERING_MAINTENANCE
> IN_MAINTENANCE
> That means the overall total number of states for a node is the cross-product 
> of the two above lists, however it probably makes sense to keep the two 
> states seperate internally.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDDS-1982) Extend SCMNodeManager to support decommission and maintenance states

2019-09-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-1982?focusedWorklogId=310882&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310882
 ]

ASF GitHub Bot logged work on HDDS-1982:


Author: ASF GitHub Bot
Created on: 11/Sep/19 17:36
Start Date: 11/Sep/19 17:36
Worklog Time Spent: 10m 
  Work Description: sodonnel commented on pull request #1344: HDDS-1982 
Extend SCMNodeManager to support decommission and maintenance states
URL: https://github.com/apache/hadoop/pull/1344#discussion_r323369810
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/states/NodeStateMap.java
 ##
 @@ -309,4 +381,61 @@ private void checkIfNodeExist(UUID uuid) throws 
NodeNotFoundException {
   throw new NodeNotFoundException("Node UUID: " + uuid);
 }
   }
+
+  /**
+   * Create a list of datanodeInfo for all nodes matching the passed states.
+   * Passing null for one of the states acts like a wildcard for that state.
+   *
+   * @param opState
+   * @param health
+   * @return List of DatanodeInfo objects matching the passed state
+   */
+  private List filterNodes(
+  NodeOperationalState opState, NodeState health) {
+if (opState != null && health != null) {
 
 Review comment:
   I had not really looked into the Streams API before, but I change the code 
to use streams and it does make it easier to follow, so I have made this 
change. I still kept the IF statements at the start of the method as if both 
params are null we can just return the entire list with no searching and if 
both are non-null we can search using the NodeStatus which should be slightly 
more efficient.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310882)
Time Spent: 5.5h  (was: 5h 20m)

> Extend SCMNodeManager to support decommission and maintenance states
> 
>
> Key: HDDS-1982
> URL: https://issues.apache.org/jira/browse/HDDS-1982
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Currently, within SCM a node can have the following states:
> HEALTHY
> STALE
> DEAD
> DECOMMISSIONING
> DECOMMISSIONED
> The last 2 are not currently used.
> In order to support decommissioning and maintenance mode, we need to extend 
> the set of states a node can have to include decommission and maintenance 
> states.
> It is also important to note that a node decommissioning or entering 
> maintenance can also be HEALTHY, STALE or go DEAD.
> Therefore in this Jira I propose we should model a node state with two 
> different sets of values. The first, is effectively the liveliness of the 
> node, with the following states. This is largely what is in place now:
> HEALTHY
> STALE
> DEAD
> The second is the node operational state:
> IN_SERVICE
> DECOMMISSIONING
> DECOMMISSIONED
> ENTERING_MAINTENANCE
> IN_MAINTENANCE
> That means the overall total number of states for a node is the cross-product 
> of the two above lists, however it probably makes sense to keep the two 
> states seperate internally.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDDS-1786) Datanodes takeSnapshot should delete previously created snapshots

2019-09-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDDS-1786?focusedWorklogId=310881&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310881
 ]

ASF GitHub Bot logged work on HDDS-1786:


Author: ASF GitHub Bot
Created on: 11/Sep/19 17:35
Start Date: 11/Sep/19 17:35
Worklog Time Spent: 10m 
  Work Description: bshashikant commented on issue #1163: HDDS-1786 : 
Datanodes takeSnapshot should delete previously created s…
URL: https://github.com/apache/hadoop/pull/1163#issuecomment-530485067
 
 
   Thanks @avijayanhwx for working on the patch. The patch looks good to me. 
Can u check the compilation issue?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310881)
Time Spent: 4h  (was: 3h 50m)

> Datanodes takeSnapshot should delete previously created snapshots
> -
>
> Key: HDDS-1786
> URL: https://issues.apache.org/jira/browse/HDDS-1786
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Right now, after after taking a new snapshot, the previous snapshot file is 
> left in the raft log directory. When a new snapshot is taken, the previous 
> snapshots should be deleted.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDDS-2113) Update JMX metrics in SCMNodeMetrics for Decommission and Maintenance

2019-09-11 Thread Stephen O'Donnell (Jira)



[ 
https://issues.apache.org/jira/browse/HDDS-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927840#comment-16927840
 ] 

Stephen O'Donnell commented on HDDS-2113:
-

[~nandakumar131] [~anu] [~arpaga] I would appreciate your thoughts on this.

> Update JMX metrics in SCMNodeMetrics for Decommission and Maintenance
> -
>
> Key: HDDS-2113
> URL: https://issues.apache.org/jira/browse/HDDS-2113
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>
> Currently the class SCMNodeMetrics exposes JMX metrics for the number of 
> HEALTHY, STALE and DEAD nodes.
> It also exposes the disk capacity of the cluster and the amount of space used 
> and available.
> We need to decide how we want to display things in JMX when nodes are in and 
> entering maintenance, decommissioning and decommissioned.
> We now have 15 states rather than the previous 3, as we can have nodes in:
>  * IN_SERVICE
>  * ENTERING_MAINTENANCE
>  * IN_MAINTENANCE
>  * DECOMMISSIONING
>  * DECOMMISSIONED
> And in each of these states, nodes can be:
>  * HEALTHY
>  * STALE
>  * DEAD
> The simplest case would be to expose these 15 states directly in JMX, as it 
> gives the complete picture, but I wonder if we need any summary JMX metrics 
> too?
>  
> We also need to consider how to count disk capacity and usage. For example:
>  # Do we count capacity and usage on a DECOMMISSIONING node? This is not a 
> clear cut answer, as a decommissioning node does not provide any capacity for 
> writers in the cluster, but it does use capacity.
>  # For a DECOMMISSIONED node, we probably should not count capacity or usage
>  # For an ENTERING_MAINTENANCE node, do we count capacity and usage? I 
> suspect we should include the capacity and usage in the totals, however a 
> node in this state will not be available for writes.
>  # For an IN_MAINTENANCE node that is healthy?
>  # For an IN_MAINTENANCE node that is dead?
> I would welcome any thoughts on this before changing the code.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

1 2 >

1 - 100 of 174 matches

Mail list logo