[jira] [Commented] (HDFS-14074) DataNode runs async disk checks maybe throws NullPointerException, and DataNode failed to register to NameSpace.

2019-09-09 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926346#comment-16926346
 ] 

Zhankun Tang commented on HDFS-14074:
-

[~jojochuang], cool. Thanks!

> DataNode runs async disk checks  maybe  throws NullPointerException, and 
> DataNode failed to register to NameSpace.
> --
>
> Key: HDFS-14074
> URL: https://issues.apache.org/jira/browse/HDFS-14074
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.8.0, 3.0.0
> Environment: hadoop-2.7.3, hadoop-2.8.0
>Reporter: guangyi lu
>Assignee: guangyi lu
>Priority: Major
>  Labels: HDFS, HDFS-4
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-14074-latest.patch, HDFS-14074.patch, 
> WechatIMG83.jpeg
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> In the ThrottledAsyncChecker class, the completedChecks member is a 
> WeakHashMap, defined as follows:
>       this.completedChecks = new WeakHashMap<>();
> One of its uses, in the schedule method, is as follows:
>      if (completedChecks.containsKey(target)) {
>        // A garbage collection may happen here, so result may be null.
>        final LastCheckResult result = completedChecks.get(target);
>        final long msSinceLastCheck = timer.monotonicNow() - result.completedAt;
>      }
> After completedChecks.containsKey(target) returns true, a garbage collection 
> may clear the weakly referenced entry, so completedChecks.get(target) can 
> return null and dereferencing it throws a NullPointerException.
> The solution is:
>      this.completedChecks = new ReferenceMap(1, 1);
> or
>      this.completedChecks = new HashMap<>();
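
Independent of which map type is chosen, the underlying race is the classic 
containsKey()/get() pair on a map whose entries can vanish between the two 
calls. A minimal, self-contained sketch of the null-safe pattern (simplified 
names, not the actual ThrottledAsyncChecker code):
{code:java}
import java.util.Map;
import java.util.WeakHashMap;

class ThrottleSketch<K> {
  static final long MIN_MS_BETWEEN_CHECKS = 10 * 60 * 1000;

  private final Map<K, Long> completedChecks = new WeakHashMap<>();

  // True if the target has never been checked (or its weak entry was
  // collected), or if the last check is old enough.
  synchronized boolean isCheckDue(K target, long nowMs) {
    // Read once and null-check; with a WeakHashMap the entry can be
    // collected between containsKey() and get(), so never pair those calls.
    final Long completedAt = completedChecks.get(target);
    if (completedAt == null) {
      return true;
    }
    return nowMs - completedAt >= MIN_MS_BETWEEN_CHECKS;
  }

  synchronized void markChecked(K target, long nowMs) {
    completedChecks.put(target, nowMs);
  }
}
{code}
With this shape, a collected entry is simply treated as "never checked", so no 
NullPointerException is possible.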






[jira] [Commented] (HDFS-14074) DataNode runs async disk checks maybe throws NullPointerException, and DataNode failed to register to NameSpace.

2019-09-09 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926344#comment-16926344
 ] 

Wei-Chiu Chuang commented on HDFS-14074:


Removed the incompatible change flag. This is a harmless fix. Thanks

> DataNode runs async disk checks  maybe  throws NullPointerException, and 
> DataNode failed to register to NameSpace.
> --
>
> Key: HDFS-14074
> URL: https://issues.apache.org/jira/browse/HDFS-14074
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.8.0, 3.0.0
> Environment: hadoop-2.7.3, hadoop-2.8.0
>Reporter: guangyi lu
>Assignee: guangyi lu
>Priority: Major
>  Labels: HDFS, HDFS-4
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-14074-latest.patch, HDFS-14074.patch, 
> WechatIMG83.jpeg
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> In the ThrottledAsyncChecker class, the completedChecks member is a 
> WeakHashMap, defined as follows:
>       this.completedChecks = new WeakHashMap<>();
> One of its uses, in the schedule method, is as follows:
>      if (completedChecks.containsKey(target)) {
>        // A garbage collection may happen here, so result may be null.
>        final LastCheckResult result = completedChecks.get(target);
>        final long msSinceLastCheck = timer.monotonicNow() - result.completedAt;
>      }
> After completedChecks.containsKey(target) returns true, a garbage collection 
> may clear the weakly referenced entry, so completedChecks.get(target) can 
> return null and dereferencing it throws a NullPointerException.
> The solution is:
>      this.completedChecks = new ReferenceMap(1, 1);
> or
>      this.completedChecks = new HashMap<>();






[jira] [Updated] (HDFS-14074) DataNode runs async disk checks maybe throws NullPointerException, and DataNode failed to register to NameSpace.

2019-09-09 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14074:
---
Hadoop Flags: Reviewed  (was: Incompatible change,Reviewed)

> DataNode runs async disk checks  maybe  throws NullPointerException, and 
> DataNode failed to register to NameSpace.
> --
>
> Key: HDFS-14074
> URL: https://issues.apache.org/jira/browse/HDFS-14074
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.8.0, 3.0.0
> Environment: hadoop-2.7.3, hadoop-2.8.0
>Reporter: guangyi lu
>Assignee: guangyi lu
>Priority: Major
>  Labels: HDFS, HDFS-4
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-14074-latest.patch, HDFS-14074.patch, 
> WechatIMG83.jpeg
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> In the ThrottledAsyncChecker class, the completedChecks member is a 
> WeakHashMap, defined as follows:
>       this.completedChecks = new WeakHashMap<>();
> One of its uses, in the schedule method, is as follows:
>      if (completedChecks.containsKey(target)) {
>        // A garbage collection may happen here, so result may be null.
>        final LastCheckResult result = completedChecks.get(target);
>        final long msSinceLastCheck = timer.monotonicNow() - result.completedAt;
>      }
> After completedChecks.containsKey(target) returns true, a garbage collection 
> may clear the weakly referenced entry, so completedChecks.get(target) can 
> return null and dereferencing it throws a NullPointerException.
> The solution is:
>      this.completedChecks = new ReferenceMap(1, 1);
> or
>      this.completedChecks = new HashMap<>();






[jira] [Commented] (HDFS-14074) DataNode runs async disk checks maybe throws NullPointerException, and DataNode failed to register to NameSpace.

2019-09-09 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926343#comment-16926343
 ] 

Zhankun Tang commented on HDFS-14074:
-

[~jojochuang], [~luguangyi], [~arp], could you please update the release note? 
This is a blocker for the 3.1.3 release too. Thanks a lot.

> DataNode runs async disk checks  maybe  throws NullPointerException, and 
> DataNode failed to register to NameSpace.
> --
>
> Key: HDFS-14074
> URL: https://issues.apache.org/jira/browse/HDFS-14074
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.8.0, 3.0.0
> Environment: hadoop-2.7.3, hadoop-2.8.0
>Reporter: guangyi lu
>Assignee: guangyi lu
>Priority: Major
>  Labels: HDFS, HDFS-4
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-14074-latest.patch, HDFS-14074.patch, 
> WechatIMG83.jpeg
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> In the ThrottledAsyncChecker class, the completedChecks member is a 
> WeakHashMap, defined as follows:
>       this.completedChecks = new WeakHashMap<>();
> One of its uses, in the schedule method, is as follows:
>      if (completedChecks.containsKey(target)) {
>        // A garbage collection may happen here, so result may be null.
>        final LastCheckResult result = completedChecks.get(target);
>        final long msSinceLastCheck = timer.monotonicNow() - result.completedAt;
>      }
> After completedChecks.containsKey(target) returns true, a garbage collection 
> may clear the weakly referenced entry, so completedChecks.get(target) can 
> return null and dereferencing it throws a NullPointerException.
> The solution is:
>      this.completedChecks = new ReferenceMap(1, 1);
> or
>      this.completedChecks = new HashMap<>();






[jira] [Comment Edited] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete

2019-09-09 Thread Siddharth Wagle (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926174#comment-16926174
 ] 

Siddharth Wagle edited comment on HDDS-1868 at 9/10/19 4:34 AM:


[~ljain] you are very correct, my UT did catch it. Could you please review 
version 02? Thanks.
The UT does not check the negative scenario where no leader means no report, 
so I will change the name if you think the code changes look good.


was (Author: swagle):
[~ljain] you are very correct, my UT did catch it. Could you please review 
version 02? Thanks.
The UT does not check the negative scenario where no leader means no report, 
so I will change the name if you think the change looks good.

> Ozone pipelines should be marked as ready only after the leader election is 
> complete
> 
>
> Key: HDDS-1868
> URL: https://issues.apache.org/jira/browse/HDDS-1868
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: HDDS-1868.01.patch, HDDS-1868.02.patch
>
>
> Ozone pipelines start in the ALLOCATED state on restart; they are moved into 
> the OPEN state after all the datanodes in the pipeline have reported to it. 
> However, this can potentially lead to an issue where the pipeline is still 
> not ready to accept any incoming IO operations.
> The pipelines should be marked as ready only after the leader election is 
> complete and the leader is ready to accept incoming IO.






[jira] [Comment Edited] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete

2019-09-09 Thread Siddharth Wagle (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926174#comment-16926174
 ] 

Siddharth Wagle edited comment on HDDS-1868 at 9/10/19 4:33 AM:


[~ljain] you are very correct, my UT did catch it. Could you please review 
version 02? Thanks.
The UT does not check the negative scenario where no leader means no report, 
so I will change the name if you think the change looks good.


was (Author: swagle):
[~ljain] you are very correct, my UT did catch it. Could you please review 
version 02? Thanks.

> Ozone pipelines should be marked as ready only after the leader election is 
> complete
> 
>
> Key: HDDS-1868
> URL: https://issues.apache.org/jira/browse/HDDS-1868
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: HDDS-1868.01.patch, HDDS-1868.02.patch
>
>
> Ozone pipelines start in the ALLOCATED state on restart; they are moved into 
> the OPEN state after all the datanodes in the pipeline have reported to it. 
> However, this can potentially lead to an issue where the pipeline is still 
> not ready to accept any incoming IO operations.
> The pipelines should be marked as ready only after the leader election is 
> complete and the leader is ready to accept incoming IO.






[jira] [Commented] (HDFS-14836) FileIoProvider should not increase FileIoErrors metric in datanode volume metric

2019-09-09 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926308#comment-16926308
 ] 

Wei-Chiu Chuang commented on HDFS-14836:


Got it. Thanks. That makes sense to me.

> FileIoProvider should not increase FileIoErrors metric in datanode volume 
> metric
> 
>
> Key: HDFS-14836
> URL: https://issues.apache.org/jira/browse/HDFS-14836
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.1
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Minor
>
> I found that the FileIoErrors metric increases in BlockSender.sendPacket() 
> when fileIoProvider.transferToSocketFully() is used. But in 
> https://issues.apache.org/jira/browse/HDFS-2054 exceptions like "Broken pipe" 
> and "Connection reset" are ignored.
> So should we filter these out when fileIoProvider increments the FileIoErrors 
> count?






[jira] [Work logged] (HDDS-2089) Add CLI createPipeline

2019-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2089?focusedWorklogId=309512=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309512
 ]

ASF GitHub Bot logged work on HDDS-2089:


Author: ASF GitHub Bot
Created on: 10/Sep/19 03:43
Start Date: 10/Sep/19 03:43
Worklog Time Spent: 10m 
  Work Description: timmylicheng commented on pull request #1418: 
HDDS-2089: Add createPipeline CLI.
URL: https://github.com/apache/hadoop/pull/1418
 
 
   #HDDS-2089 Add createPipeline for ozone scmcli
   
 



Issue Time Tracking
---

Worklog Id: (was: 309512)
Remaining Estimate: 0h
Time Spent: 10m

> Add CLI createPipeline
> --
>
> Key: HDDS-2089
> URL: https://issues.apache.org/jira/browse/HDDS-2089
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone CLI
>Affects Versions: 0.5.0
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add an SCMCLI command to create pipelines for Ozone.






[jira] [Updated] (HDDS-2089) Add CLI createPipeline

2019-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2089:
-
Labels: pull-request-available  (was: )

> Add CLI createPipeline
> --
>
> Key: HDDS-2089
> URL: https://issues.apache.org/jira/browse/HDDS-2089
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone CLI
>Affects Versions: 0.5.0
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>  Labels: pull-request-available
>
> Add an SCMCLI command to create pipelines for Ozone.






[jira] [Comment Edited] (HDFS-14820) The default 8KB buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream is too big

2019-09-09 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926304#comment-16926304
 ] 

Lisheng Sun edited comment on HDFS-14820 at 9/10/19 3:38 AM:
-

hi [~elgoiri]
{quote}What is the current default value? 8KB?
{quote}
As the following code shows, the current default value is 8KB.
{code:java}
final DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
  peer.getOutputStream()));

public BufferedOutputStream(OutputStream out) {
    this(out, 8192);
}
{code}
I have updated the buffer to 512B and run a lot of tests, and the result is 
OK. I can do a pressure test and use the new buffer in our production 
environment later.
I agree with your suggestion: we can first make it configurable and keep the 
old value as the default.

Users can then adjust the buffer according to their needs.


was (Author: leosun08):
hi [~elgoiri]
{quote}What is the current default value? 8KB?
{quote}
As the following code shows, the current default value is 8KB.
{code:java}
final DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
  peer.getOutputStream()));

public BufferedOutputStream(OutputStream out) {
    this(out, 8192);
}
{code}
I have updated the buffer to 512B and run a test, and the result is OK.
I agree with your suggestion: make it configurable and keep the old value as 
the default.

>  The default 8KB buffer of 
> BlockReaderRemote#newBlockReader#BufferedOutputStream is too big
> ---
>
> Key: HDFS-14820
> URL: https://issues.apache.org/jira/browse/HDFS-14820
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14820.001.patch
>
>
> This issue is similar to HDFS-14535.
> {code:java}
> public static BlockReader newBlockReader(String file,
> ExtendedBlock block,
> Token blockToken,
> long startOffset, long len,
> boolean verifyChecksum,
> String clientName,
> Peer peer, DatanodeID datanodeID,
> PeerCache peerCache,
> CachingStrategy cachingStrategy,
> int networkDistance) throws IOException {
>   // in and out will be closed when sock is closed (by the caller)
>   final DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
>   peer.getOutputStream()));
>   new Sender(out).readBlock(block, blockToken, clientName, startOffset, len,
>   verifyChecksum, cachingStrategy);
> }
> public BufferedOutputStream(OutputStream out) {
> this(out, 8192);
> }
> {code}
> The Sender#readBlock parameters (block, blockToken, clientName, startOffset, 
> len, verifyChecksum, cachingStrategy) do not need such a big buffer.
> So I think the BufferedOutputStream buffer should be reduced.






[jira] [Commented] (HDFS-14820) The default 8KB buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream is too big

2019-09-09 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926304#comment-16926304
 ] 

Lisheng Sun commented on HDFS-14820:


hi [~elgoiri]
{quote}What is the current default value? 8KB?
{quote}
As the following code shows, the current default value is 8KB.
{code:java}
final DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
  peer.getOutputStream()));

public BufferedOutputStream(OutputStream out) {
    this(out, 8192);
}
{code}
I have updated the buffer to 512B and run a test, and the result is OK.
I agree with your suggestion: make it configurable and keep the old value as 
the default.
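
A minimal sketch of the configurable-default approach discussed above; the 
config key and helper class below are illustrative assumptions, not the actual 
HDFS-14820 patch:
{code:java}
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;

final class BufferSizing {
  // Hypothetical key and default; the real patch may name these differently.
  static final String KEY = "dfs.client.block.reader.remote.buffer.size";
  static final int DEFAULT = 8192; // keep the old 8KB behavior by default

  // Wrap the peer's output stream with a buffer sized from configuration.
  static DataOutputStream wrap(OutputStream out, Configuration conf) {
    final int size = conf.getInt(KEY, DEFAULT);
    return new DataOutputStream(new BufferedOutputStream(out, size));
  }
}
{code}
This keeps existing deployments unchanged while letting users lower the buffer 
(e.g., to 512B) where the smaller request payloads make the 8KB default wasteful.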

>  The default 8KB buffer of 
> BlockReaderRemote#newBlockReader#BufferedOutputStream is too big
> ---
>
> Key: HDFS-14820
> URL: https://issues.apache.org/jira/browse/HDFS-14820
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14820.001.patch
>
>
> This issue is similar to HDFS-14535.
> {code:java}
> public static BlockReader newBlockReader(String file,
> ExtendedBlock block,
> Token blockToken,
> long startOffset, long len,
> boolean verifyChecksum,
> String clientName,
> Peer peer, DatanodeID datanodeID,
> PeerCache peerCache,
> CachingStrategy cachingStrategy,
> int networkDistance) throws IOException {
>   // in and out will be closed when sock is closed (by the caller)
>   final DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
>   peer.getOutputStream()));
>   new Sender(out).readBlock(block, blockToken, clientName, startOffset, len,
>   verifyChecksum, cachingStrategy);
> }
> public BufferedOutputStream(OutputStream out) {
> this(out, 8192);
> }
> {code}
> The Sender#readBlock parameters (block, blockToken, clientName, startOffset, 
> len, verifyChecksum, cachingStrategy) do not need such a big buffer.
> So I think the BufferedOutputStream buffer should be reduced.






[jira] [Commented] (HDFS-14837) Review of Block.java

2019-09-09 Thread stack (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926301#comment-16926301
 ] 

stack commented on HDFS-14837:
--

One question: is Long.hashCode the same as (int)(blockId^(blockId>>>32))? 
(I've not looked..)
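
For reference, the java.lang.Long Javadoc specifies Long.hashCode(long) as 
exactly that expression, which a one-line check confirms:
{code:java}
public class HashCheck {
  public static void main(String[] args) {
    long blockId = 0x1234_5678_9ABC_DEF0L;
    // java.lang.Long documents hashCode(long) as (int)(value ^ (value >>> 32)).
    System.out.println(
        Long.hashCode(blockId) == (int) (blockId ^ (blockId >>> 32))); // true
  }
}
{code}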

> Review of Block.java
> 
>
> Key: HDFS-14837
> URL: https://issues.apache.org/jira/browse/HDFS-14837
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14837.1.patch
>
>
> The {{Block}} class is such a core class in the project that I just wanted 
> to make sure it was super clean and its documentation was correct.






[jira] [Commented] (HDFS-14836) FileIoProvider should not increase FileIoErrors metric in datanode volume metric

2019-09-09 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926300#comment-16926300
 ] 

Aiphago commented on HDFS-14836:


Hi [~jojochuang], thanks for your attention.

As in HDFS-2054, "Broken pipe" and "Connection reset" are caused by the client 
rather than the datanode, and the datanode may increment the FileIoErrors 
counter many times because of these exceptions. So I think it's better to 
filter them out.
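
A minimal sketch of the kind of filter being proposed, assuming 
(hypothetically) that the counter is bumped in a catch block around the socket 
transfer; the message test mirrors the strings HDFS-2054 ignores, and the 
class is illustrative rather than the actual patch:
{code:java}
import java.io.IOException;

final class FileIoErrorFilter {
  private FileIoErrorFilter() {}

  // Heuristic from HDFS-2054: these messages indicate the *client* closed
  // the connection, so they should not count as local disk/IO errors.
  static boolean isClientCaused(IOException e) {
    final String msg = e.getMessage();
    return msg != null
        && (msg.contains("Broken pipe") || msg.contains("Connection reset"));
  }
}
{code}
The caller would then skip the FileIoErrors increment when isClientCaused(e) 
returns true, and increment the metric otherwise.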

 

> FileIoProvider should not increase FileIoErrors metric in datanode volume 
> metric
> 
>
> Key: HDFS-14836
> URL: https://issues.apache.org/jira/browse/HDFS-14836
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.1
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Minor
>
> I found that the FileIoErrors metric increases in BlockSender.sendPacket() 
> when fileIoProvider.transferToSocketFully() is used. But in 
> https://issues.apache.org/jira/browse/HDFS-2054 exceptions like "Broken pipe" 
> and "Connection reset" are ignored.
> So should we filter these out when fileIoProvider increments the FileIoErrors 
> count?






[jira] [Commented] (HDFS-14283) DFSInputStream to prefer cached replica

2019-09-09 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926296#comment-16926296
 ] 

Lisheng Sun commented on HDFS-14283:


[~smeng] I am working on this JIRA and will upload the patch later. Thank you.

> DFSInputStream to prefer cached replica
> ---
>
> Key: HDFS-14283
> URL: https://issues.apache.org/jira/browse/HDFS-14283
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.6.0
> Environment: HDFS Caching
>Reporter: Wei-Chiu Chuang
>Assignee: Lisheng Sun
>Priority: Major
>
> HDFS Caching offers performance benefits. However, the NameNode currently 
> does not treat cached replicas with higher priority, so HDFS caching is only 
> useful when cache replication = 3, that is to say, all replicas are cached in 
> memory, so that a client doesn't randomly pick an uncached replica.
> HDFS-6846 proposed to let the NameNode give higher priority to cached 
> replicas. Changing logic in the NameNode is always tricky, so that didn't get 
> much traction. Here I propose a different approach: let the client 
> (DFSInputStream) prefer cached replicas.
> A {{LocatedBlock}} object already contains the cached replica locations, so a 
> client has the needed information. I think we can change 
> {{DFSInputStream#getBestNodeDNAddrPair()}} for this purpose.
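
A minimal, self-contained sketch of that client-side preference; it is 
illustrative rather than the actual patch, and assumes only the public 
LocatedBlock accessors:
{code:java}
import java.util.Arrays;

import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;

final class CachedReplicaPreference {
  // Prefer a location whose replica is cached in memory, falling back to
  // the first location otherwise.
  static DatanodeInfo choose(LocatedBlock block) {
    final DatanodeInfo[] locations = block.getLocations();
    final DatanodeInfo[] cached = block.getCachedLocations();
    if (cached != null) {
      for (DatanodeInfo node : locations) {
        if (Arrays.asList(cached).contains(node)) {
          return node; // a cached replica exists: read from it
        }
      }
    }
    return locations.length > 0 ? locations[0] : null;
  }
}
{code}
The real change would also have to respect the existing dead-node and 
storage-type filtering inside getBestNodeDNAddrPair().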






[jira] [Updated] (HDFS-14795) Add Throttler for writing block

2019-09-09 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14795:
---
Attachment: HDFS-14795.005.patch

> Add Throttler for writing block
> ---
>
> Key: HDFS-14795
> URL: https://issues.apache.org/jira/browse/HDFS-14795
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14795.001.patch, HDFS-14795.002.patch, 
> HDFS-14795.003.patch, HDFS-14795.004.patch, HDFS-14795.005.patch
>
>
> DataXceiver#writeBlock
> {code:java}
> blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut,
> mirrorAddr, null, targets, false);
> {code}
> As the code above shows, DataXceiver#writeBlock does not throttle.
>  I think it is necessary to throttle block writes by adding a throttler in 
> the PIPELINE_SETUP_APPEND_RECOVERY or PIPELINE_SETUP_STREAMING_RECOVERY stage.
> The default throttler value is still null.
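
HDFS already ships org.apache.hadoop.hdfs.util.DataTransferThrottler (used by 
the balancer and image transfers). A minimal sketch of wiring an optional 
throttler into the write path, with a hypothetical config key:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.util.DataTransferThrottler;

final class WriteThrottling {
  // Hypothetical key; 0 keeps today's behavior (no throttling, null throttler).
  static final String BANDWIDTH_KEY = "dfs.datanode.data.write.bandwidthPerSec";

  static DataTransferThrottler create(Configuration conf) {
    final long bandwidth = conf.getLong(BANDWIDTH_KEY, 0);
    return bandwidth > 0 ? new DataTransferThrottler(bandwidth) : null;
  }
}
{code}
The created throttler (or null) would then be passed to receiveBlock() in 
place of the null argument shown above.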






[jira] [Commented] (HDFS-14568) setStoragePolicy should check quota and update consume on storage type quota.

2019-09-09 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926292#comment-16926292
 ] 

Jinglun commented on HDFS-14568:


Hi [~surendrasingh], sorry for my late response. Do you mean setting the SSD 
storage quota to 10 bytes on a directory that already consumes 10GB of DISK 
space? I think we shouldn't allow this, because setStoragePolicy() would cause 
the quota to be exceeded. And I think any RPC that causes a quota to be 
exceeded should end with a quota-exceeded exception. In patch 004 a 
RemoteException wrapping QuotaByStorageTypeExceededException is thrown.

+1, the change would be incompatible, because the method previously only threw 
IOException but now it can also throw a QuotaExceedException.

Maybe add a switch to enable the quota check and consume update?
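
A minimal, self-contained sketch of the check under discussion: recompute the 
per-storage-type consumption the new policy implies and fail before applying 
it. The types and quota lookup below are simplified stand-ins for the 
NameNode's real structures, not the actual patch:
{code:java}
import java.util.EnumMap;
import java.util.Map;

enum StorageType { DISK, SSD, ARCHIVE }

final class StoragePolicyQuotaCheck {
  static class QuotaExceededException extends Exception {
    QuotaExceededException(String msg) { super(msg); }
  }

  // Fail if switching the file to the new policy would exceed any remaining
  // per-storage-type quota on an ancestor directory. newPolicy is treated
  // (simplistically) as the storage type chosen for each replica.
  static void check(long fileLength, short replication,
      StorageType[] newPolicy, Map<StorageType, Long> remainingQuota)
      throws QuotaExceededException {
    Map<StorageType, Long> needed = new EnumMap<>(StorageType.class);
    for (int i = 0; i < replication && i < newPolicy.length; i++) {
      needed.merge(newPolicy[i], fileLength, Long::sum);
    }
    for (Map.Entry<StorageType, Long> e : needed.entrySet()) {
      Long remaining = remainingQuota.get(e.getKey());
      if (remaining != null && e.getValue() > remaining) {
        throw new QuotaExceededException(
            e.getKey() + " quota exceeded by " + (e.getValue() - remaining));
      }
    }
  }
}
{code}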

 

> setStoragePolicy should check quota and update consume on storage type quota.
> -
>
> Key: HDFS-14568
> URL: https://issues.apache.org/jira/browse/HDFS-14568
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.1.0
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-14568-001.patch, HDFS-14568-unit-test.patch, 
> HDFS-14568.002.patch, HDFS-14568.003.patch, HDFS-14568.004.patch
>
>
> The quota and consume of the file's ancestors are not handled when the 
> storage policy of the file is changed. For example:
>  1. Set an SSD storage-type quota of fileSpace-1 on the parent dir;
>  2. Create a file of size fileSpace with storage policy \{DISK,DISK,DISK} 
> under it;
>  3. Change the storage policy of the file to ALLSSD_STORAGE_POLICY_NAME and 
> expect a QuotaByStorageTypeExceededException.
> Because the quota and consume are not handled, the expected exception is not 
> thrown.
>  
> There are 3 reasons why we should handle the consume and the quota.
> 1. Replication uses the new storage policy. Consider a file with BlockType 
> CONTIGUOUS: its replication factor is 3 and its storage policy is "HOT". 
> Now we change the policy to "ONE_SSD". If a DN goes down and the file needs 
> replication, the NN will choose storages according to "ONE_SSD" and replicate 
> the block to an SSD storage.
> 2. We actually have a cluster storing both HOT and COLD data. We have a 
> background process searching all the files to find those that have not been 
> accessed for a period of time. Then we set them to COLD and start a mover to 
> move the replicas. After moving, all the replicas are consistent with the 
> storage policy.
> 3. The NameNode manages the global state of the cluster. If there is any 
> inconsistency, such as replicas that don't match the storage policy of the 
> file, we should take the NameNode as the standard and make the cluster match 
> it. Block replication is a good example of this rule: when we count the 
> consume of a file (CONTIGUOUS), we multiply the replication factor by the 
> file's length, no matter whether the file is under-replicated or 
> over-replicated. The same should hold for the storage-type quota and consume.






[jira] [Commented] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete

2019-09-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926246#comment-16926246
 ] 

Hadoop QA commented on HDDS-1868:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
1s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  7m 
55s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 42s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
26s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  7m  
1s{color} | {color:blue} Used deprecated FindBugs config; considering switching 
to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 10m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
27s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 46s{color} | {color:orange} hadoop-ozone: The patch generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 10m 
32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  5m  
2s{color} | {color:green} hadoop-hdds in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 41m 31s{color} 
| {color:red} hadoop-ozone in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
53s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}152m 16s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ozone.container.server.TestSecureContainerServer |
|   | hadoop.ozone.client.rpc.TestBlockOutputStream |
|   | 
hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion |
|   | hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient |
|   | hadoop.ozone.container.TestContainerReplication |
|   | hadoop.hdds.scm.safemode.TestSCMSafeModeWithPipelineRules |
|   | hadoop.ozone.container.ozoneimpl.TestOzoneContainer |
|   | 

[jira] [Commented] (HDFS-14802) The feature of protect directories should be used in RenameOp

2019-09-09 Thread Fei Hui (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926225#comment-16926225
 ] 

Fei Hui commented on HDFS-14802:


[~jojochuang] [~arp] Does it make sense?

> The feature of protect directories should be used in RenameOp
> -
>
> Key: HDFS-14802
> URL: https://issues.apache.org/jira/browse/HDFS-14802
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.0.4, 3.3.0, 3.2.1, 3.1.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Attachments: HDFS-14802.001.patch, HDFS-14802.002.patch, 
> HDFS-14802.003.patch
>
>
> Now we can set fs.protected.directories to prevent users from deleting 
> important directories. But users can work around the limitation and still 
> delete them:
> 1. Rename the directories and then delete them.
> 2. Move the directories to trash, where the namenode will delete them.
> So I think we should use the protected-directories feature in RenameOp as 
> well.
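
A minimal, self-contained sketch of the rule to apply in RenameOp: reject a 
rename whose source is a protected directory or an ancestor of one. The real 
change would operate on NameNode inode structures rather than plain path 
strings, so this is illustrative only:
{code:java}
import java.util.Collection;
import java.util.TreeSet;

final class ProtectedDirs {
  // Mirror of the delete-path rule: an operation that would move or remove
  // a protected directory (or a tree containing one) must be rejected.
  static boolean renameBlocked(Collection<String> protectedDirs, String src) {
    final String prefix = src.endsWith("/") ? src : src + "/";
    for (String dir : new TreeSet<>(protectedDirs)) {
      if (dir.equals(src) || dir.startsWith(prefix)) {
        return true; // src is a protected dir, or an ancestor of one
      }
    }
    return false;
  }
}
{code}
This closes the rename loophole the same way the existing check closes the 
delete one, including the trash case, since moving to trash is itself a rename.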






[jira] [Comment Edited] (HDFS-14831) Downgrade Failed from 3.2.0 to 2.7 because of incompatible stringtable

2019-09-09 Thread Fei Hui (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926223#comment-16926223
 ] 

Fei Hui edited comment on HDFS-14831 at 9/10/19 1:08 AM:
-

[~jojochuang] Got it.
Users can upgrade to 3.2.0 from 2.7.0~2.7.7 and downgrade to 2.7.8 from 3.2.0. 
We can work around the incompatible stringtable problem, is that right?


was (Author: ferhui):
[~jojochuang] Get it
Users can upgrade to 3.2.0 from 2.7.0~2.7.7 and downgrade to 2.7.8 from 3.2.0. 
We can work around in compatible stringtable problem, is it right?

> Downgrade Failed from 3.2.0 to 2.7 because of incompatible stringtable 
> ---
>
> Key: HDFS-14831
> URL: https://issues.apache.org/jira/browse/HDFS-14831
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.0, 3.3.0, 3.1.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
>
> Mentioned in HDFS-13596.
> Incompatible StringTable changes cause a downgrade from 3.2.0 to 2.7.2 to fail.
> The commit message is as follows, but the corresponding issue was not found:
> {quote}
> commit 8a41edb089fbdedc5e7d9a2aeec63d126afea49f
> Author: Vinayakumar B 
> Date:   Mon Oct 15 15:48:26 2018 +0530
> Fix potential FSImage corruption. Contributed by Daryn Sharp.
> {quote} 






[jira] [Comment Edited] (HDFS-14831) Downgrade Failed from 3.2.0 to 2.7 because of incompatible stringtable

2019-09-09 Thread Fei Hui (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926223#comment-16926223
 ] 

Fei Hui edited comment on HDFS-14831 at 9/10/19 1:07 AM:
-

[~jojochuang] Got it.
Users can upgrade to 3.2.0 from 2.7.0~2.7.7 and downgrade to 2.7.8 from 3.2.0. 
We can work around the incompatible stringtable problem, is that right?


was (Author: ferhui):
[~jojochuang] Get it
Users can upgrade to 3.2.0 from 2.7.0~2.7.7 and downgrade to 2.7.8 from 3.2.0. 
We can work around in compatible stringtable problem, is it?

> Downgrade Failed from 3.2.0 to 2.7 because of incompatible stringtable 
> ---
>
> Key: HDFS-14831
> URL: https://issues.apache.org/jira/browse/HDFS-14831
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.0, 3.3.0, 3.1.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
>
> Mentioned in HDFS-13596.
> Incompatible StringTable changes cause a downgrade from 3.2.0 to 2.7.2 to fail.
> The commit message is as follows, but the corresponding issue was not found:
> {quote}
> commit 8a41edb089fbdedc5e7d9a2aeec63d126afea49f
> Author: Vinayakumar B 
> Date:   Mon Oct 15 15:48:26 2018 +0530
> Fix potential FSImage corruption. Contributed by Daryn Sharp.
> {quote} 






[jira] [Commented] (HDFS-14831) Downgrade Failed from 3.2.0 to 2.7 because of incompatible stringtable

2019-09-09 Thread Fei Hui (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926223#comment-16926223
 ] 

Fei Hui commented on HDFS-14831:


[~jojochuang] Got it.
Users can upgrade to 3.2.0 from 2.7.0~2.7.7 and downgrade to 2.7.8 from 3.2.0. 
We can work around the incompatible stringtable problem, is that right?

> Downgrade Failed from 3.2.0 to 2.7 because of incompatible stringtable 
> ---
>
> Key: HDFS-14831
> URL: https://issues.apache.org/jira/browse/HDFS-14831
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.0, 3.3.0, 3.1.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
>
> Mentioned in HDFS-13596.
> Incompatible StringTable changes cause a downgrade from 3.2.0 to 2.7.2 to fail.
> The commit message is as follows, but the corresponding issue was not found:
> {quote}
> commit 8a41edb089fbdedc5e7d9a2aeec63d126afea49f
> Author: Vinayakumar B 
> Date:   Mon Oct 15 15:48:26 2018 +0530
> Fix potential FSImage corruption. Contributed by Daryn Sharp.
> {quote} 






[jira] [Commented] (HDFS-14837) Review of Block.java

2019-09-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926218#comment-16926218
 ] 

Hadoop QA commented on HDFS-14837:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 42s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs-client: The patch 
generated 0 new + 9 unchanged - 1 fixed = 9 total (was 10) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 22s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
51s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 62m 15s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-14837 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12979905/HDFS-14837.1.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 63d30f32bdb3 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 650c4ce |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27826/testReport/ |
| Max. process+thread count | 341 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-client U: 
hadoop-hdfs-project/hadoop-hdfs-client |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27826/console |
| Powered by | Apache Yetus 0.8.0   

[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-09-09 Thread Fei Hui (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926217#comment-16926217
 ] 

Fei Hui commented on HDFS-14509:


[~John Smith] During the rolling upgrade, the NN is 3.x and the DN is 2.x. 
What is your client version?

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
> Attachments: HDFS-14509-001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we 
> need to upgrade the NN first. So there will be an intermediate state in which 
> the NN is 3.x and the DN is 2.x. At that moment, if a client reads (or 
> writes) a block, it will get a block token from the NN and then deliver the 
> token to the DN, which verifies it. But the verification in the code now is:
> {code:title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
>     ...
>     id.readFields(new DataInputStream(new 
>         ByteArrayInputStream(token.getIdentifier())));
>     ...
>     if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>       throw new InvalidToken("Block token with " + id.toString()
>           + " doesn't have the correct token password");
>     }
> }
> {code}
> And {{retrievePassword(id)}} is:
> {code}
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
>     ...
>     return createPassword(identifier.getBytes(), key.getKey());
> }
> {code}
> So, if the NN's identifier adds new fields, the DN will lose those fields 
> when it re-serializes the identifier, and compute the wrong password.
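
One compatible direction implied by the description is to compute the password 
over the raw identifier bytes received on the wire, so that fields unknown to 
the older DN still participate in the MAC. A minimal sketch, assuming the 
Hadoop-style HMAC-SHA1 password scheme:
{code:java}
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class TokenPasswordSketch {
  // Compute the token password over the *raw* identifier bytes from the
  // wire, not over a re-serialized identifier that may have dropped fields
  // this (older) node does not know about.
  static byte[] password(byte[] rawIdentifierBytes, byte[] secret)
      throws Exception {
    Mac mac = Mac.getInstance("HmacSHA1");
    mac.init(new SecretKeySpec(secret, "HmacSHA1"));
    return mac.doFinal(rawIdentifierBytes);
  }
}
{code}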






[jira] [Work logged] (HDDS-1786) Datanodes takeSnapshot should delete previously created snapshots

2019-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1786?focusedWorklogId=309423=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309423
 ]

ASF GitHub Bot logged work on HDDS-1786:


Author: ASF GitHub Bot
Created on: 10/Sep/19 00:59
Start Date: 10/Sep/19 00:59
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1163: HDDS-1786 : 
Datanodes takeSnapshot should delete previously created s…
URL: https://github.com/apache/hadoop/pull/1163#issuecomment-529723367
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 82 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | -1 | test4tests | 0 | The patch doesn't appear to include any new or 
modified tests.  Please justify why no new tests are needed for this patch. 
Also please list what manual steps were performed to verify this patch. |
   ||| _ trunk Compile Tests _ |
   | +1 | mvninstall | 637 | trunk passed |
   | +1 | compile | 372 | trunk passed |
   | +1 | checkstyle | 76 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 954 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 170 | trunk passed |
   | 0 | spotbugs | 436 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 638 | trunk passed |
   | -0 | patch | 479 | Used diff version of patch file. Binary files and 
potentially other changes not applied. Please rebase and squash commits if 
necessary. |
   ||| _ Patch Compile Tests _ |
   | +1 | mvninstall | 547 | the patch passed |
   | +1 | compile | 377 | the patch passed |
   | +1 | javac | 377 | the patch passed |
   | +1 | checkstyle | 79 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 756 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 191 | the patch passed |
   | +1 | findbugs | 657 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 314 | hadoop-hdds in the patch passed. |
   | -1 | unit | 2269 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 46 | The patch does not generate ASF License warnings. |
   | | | 8333 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.ozone.client.rpc.TestBlockOutputStream |
   |   | hadoop.ozone.client.rpc.TestCommitWatcher |
   |   | hadoop.ozone.container.TestContainerReplication |
   |   | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures |
   |   | 
hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion |
   |   | hadoop.ozone.TestSecureOzoneCluster |
   |   | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.2 Server=19.03.2 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/7/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1163 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux 06da123aa662 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 650c4ce |
   | Default Java | 1.8.0_222 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/7/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/7/testReport/ |
   | Max. process+thread count | 4692 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/container-service U: 
hadoop-hdds/container-service |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/7/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 309423)
Time Spent: 3h  (was: 2h 50m)

> Datanodes takeSnapshot should delete previously created snapshots
> -
>
> Key: 

[jira] [Commented] (HDFS-14837) Review of Block.java

2019-09-09 Thread stack (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926214#comment-16926214
 ] 

stack commented on HDFS-14837:
--

+1
nice cleanup

> Review of Block.java
> 
>
> Key: HDFS-14837
> URL: https://issues.apache.org/jira/browse/HDFS-14837
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14837.1.patch
>
>
> The {{Block}} class is such a core class in the project that I just wanted 
> to make sure it was super clean and its documentation was correct.






[jira] [Work logged] (HDDS-1786) Datanodes takeSnapshot should delete previously created snapshots

2019-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1786?focusedWorklogId=309419=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309419
 ]

ASF GitHub Bot logged work on HDDS-1786:


Author: ASF GitHub Bot
Created on: 10/Sep/19 00:49
Start Date: 10/Sep/19 00:49
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1163: HDDS-1786 : 
Datanodes takeSnapshot should delete previously created s…
URL: https://github.com/apache/hadoop/pull/1163#issuecomment-529721393
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 46 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | -1 | test4tests | 0 | The patch doesn't appear to include any new or 
modified tests.  Please justify why no new tests are needed for this patch. 
Also please list what manual steps were performed to verify this patch. |
   ||| _ trunk Compile Tests _ |
   | +1 | mvninstall | 623 | trunk passed |
   | +1 | compile | 392 | trunk passed |
   | +1 | checkstyle | 80 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 854 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 174 | trunk passed |
   | 0 | spotbugs | 443 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 649 | trunk passed |
   | -0 | patch | 491 | Used diff version of patch file. Binary files and 
potentially other changes not applied. Please rebase and squash commits if 
necessary. |
   ||| _ Patch Compile Tests _ |
   | +1 | mvninstall | 552 | the patch passed |
   | +1 | compile | 396 | the patch passed |
   | +1 | javac | 396 | the patch passed |
   | +1 | checkstyle | 85 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 654 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 175 | the patch passed |
   | +1 | findbugs | 653 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 295 | hadoop-hdds in the patch passed. |
   | -1 | unit | 1996 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 52 | The patch does not generate ASF License warnings. |
   | | | 7864 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdds.scm.pipeline.TestRatisPipelineProvider |
   |   | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures |
   |   | hadoop.ozone.TestSecureOzoneCluster |
   |   | hadoop.ozone.client.rpc.TestContainerStateMachineFailures |
   |   | hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient |
   |   | hadoop.ozone.scm.TestContainerSmallFile |
   |   | hadoop.ozone.container.TestContainerReplication |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/6/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1163 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux d74699a82149 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 650c4ce |
   | Default Java | 1.8.0_222 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/6/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/6/testReport/ |
   | Max. process+thread count | 5268 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/container-service U: 
hadoop-hdds/container-service |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/6/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 309419)
Time Spent: 2h 50m  (was: 2h 40m)

> Datanodes takeSnapshot should delete previously created snapshots
> -
>
> Key: HDDS-1786
>  

[jira] [Updated] (HDDS-2106) Avoid usage of hadoop projects as parent of hdds/ozone

2019-09-09 Thread Elek, Marton (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDDS-2106:
---
Priority: Blocker  (was: Major)

> Avoid usage of hadoop projects as parent of hdds/ozone
> --
>
> Key: HDDS-2106
> URL: https://issues.apache.org/jira/browse/HDDS-2106
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Elek, Marton
>Priority: Blocker
>
> Ozone uses hadoop as a dependency. The dependency is defined on multiple levels:
>  1. the hadoop artifacts are defined in the {{dependencyManagement}} sections
>  2. both hadoop-ozone and hadoop-hdds projects use "hadoop-project" as the 
> parent
> As we already have a slightly different assembly process, it could be more 
> resilient to use a dedicated parent project instead of the hadoop one. With 
> this approach it will be easier to upgrade the versions, as we only need to 
> be careful about the used dependencies, not about the pom contents.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2106) Avoid usage of hadoop projects as parent of hdds/ozone

2019-09-09 Thread Elek, Marton (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton reassigned HDDS-2106:
--

Assignee: Elek, Marton

> Avoid usage of hadoop projects as parent of hdds/ozone
> --
>
> Key: HDDS-2106
> URL: https://issues.apache.org/jira/browse/HDDS-2106
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Blocker
>
> Ozone uses hadoop as a dependency. The dependency is defined on multiple levels:
>  1. the hadoop artifacts are defined in the {{dependencyManagement}} sections
>  2. both hadoop-ozone and hadoop-hdds projects use "hadoop-project" as the 
> parent
> As we already have a slightly different assembly process, it could be more 
> resilient to use a dedicated parent project instead of the hadoop one. With 
> this approach it will be easier to upgrade the versions, as we only need to 
> be careful about the used dependencies, not about the pom contents.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14837) Review of Block.java

2019-09-09 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14837:
--
Status: Patch Available  (was: Open)

> Review of Block.java
> 
>
> Key: HDFS-14837
> URL: https://issues.apache.org/jira/browse/HDFS-14837
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14837.1.patch
>
>
> The {{Block}} class is such a core class in the project that I just wanted to 
> make sure it was super clean and its documentation was correct.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14837) Review of Block.java

2019-09-09 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14837:
--
Attachment: HDFS-14837.1.patch

> Review of Block.java
> 
>
> Key: HDFS-14837
> URL: https://issues.apache.org/jira/browse/HDFS-14837
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14837.1.patch
>
>
> The {{Block}} class is such a core class in the project that I just wanted to 
> make sure it was super clean and its documentation was correct.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14837) Review of Block.java

2019-09-09 Thread David Mollitor (Jira)
David Mollitor created HDFS-14837:
-

 Summary: Review of Block.java
 Key: HDFS-14837
 URL: https://issues.apache.org/jira/browse/HDFS-14837
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor


The {{Block}} class is such a core class in the project that I just wanted to 
make sure it was super clean and its documentation was correct.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete

2019-09-09 Thread Siddharth Wagle (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Wagle updated HDDS-1868:
--
Attachment: HDDS-1868.02.patch

> Ozone pipelines should be marked as ready only after the leader election is 
> complete
> 
>
> Key: HDDS-1868
> URL: https://issues.apache.org/jira/browse/HDDS-1868
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: HDDS-1868.01.patch, HDDS-1868.02.patch
>
>
> Ozone pipelines on restart start in the allocated state; they are moved into 
> the open state after all the pipeline members have reported. However, this 
> can potentially lead to an issue where the pipeline is still not ready to 
> accept any incoming IO operations.
> The pipelines should be marked as ready only after the leader election is 
> complete and the leader is ready to accept incoming IO.
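
For context, a minimal, self-contained sketch of the gating described above; the 
types below are simplified stand-ins, not the actual SCM pipeline classes, and 
this illustrates the idea rather than the patch itself:

{code:java}
import java.util.HashSet;
import java.util.Set;

// Sketch: leave ALLOCATED only when every member has reported AND the
// Ratis leader election has completed.
class PipelineReadinessSketch {
  enum State { ALLOCATED, OPEN }

  private final Set<String> members;               // datanodes in the pipeline
  private final Set<String> reported = new HashSet<>();
  private boolean leaderReady = false;             // set by an election callback
  private State state = State.ALLOCATED;

  PipelineReadinessSketch(Set<String> members) {
    this.members = members;
  }

  synchronized void onPipelineReport(String datanodeId) {
    reported.add(datanodeId);
    maybeOpen();
  }

  synchronized void onLeaderElected() {
    leaderReady = true;
    maybeOpen();
  }

  // open only when BOTH conditions hold, so IO is never accepted early
  private void maybeOpen() {
    if (leaderReady && reported.containsAll(members)) {
      state = State.OPEN;
    }
  }

  synchronized State getState() { return state; }
}
{code}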



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete

2019-09-09 Thread Siddharth Wagle (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Wagle updated HDDS-1868:
--
Status: Patch Available  (was: In Progress)

[~ljain] you are very correct, my UT did catch it. Could you please review 
version 02? Thanks.

> Ozone pipelines should be marked as ready only after the leader election is 
> complete
> 
>
> Key: HDDS-1868
> URL: https://issues.apache.org/jira/browse/HDDS-1868
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: HDDS-1868.01.patch, HDDS-1868.02.patch
>
>
> Ozone pipelines on restart start in the allocated state; they are moved into 
> the open state after all the pipeline members have reported. However, this 
> can potentially lead to an issue where the pipeline is still not ready to 
> accept any incoming IO operations.
> The pipelines should be marked as ready only after the leader election is 
> complete and the leader is ready to accept incoming IO.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-14774) RBF: Improve RouterWebhdfsMethods#chooseDatanode() error handling

2019-09-09 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-14774.

Resolution: Not A Problem

Thanks CR. I'm resolving it.

> RBF: Improve RouterWebhdfsMethods#chooseDatanode() error handling
> -
>
> Key: HDFS-14774
> URL: https://issues.apache.org/jira/browse/HDFS-14774
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wei-Chiu Chuang
>Assignee: CR Hota
>Priority: Minor
>
>  HDFS-13972 added the following code:
> {code}
> try {
>   dns = rpcServer.getDatanodeReport(DatanodeReportType.LIVE);
> } catch (IOException e) {
>   LOG.error("Cannot get the datanodes from the RPC server", e);
> } finally {
>   // Reset ugi to remote user for remaining operations.
>   RouterRpcServer.resetCurrentUser();
> }
> HashSet<DatanodeInfo> excludes = new HashSet<DatanodeInfo>();
> if (excludeDatanodes != null) {
>   Collection<String> collection =
>       getTrimmedStringCollection(excludeDatanodes);
>   for (DatanodeInfo dn : dns) {
>     if (collection.contains(dn.getName())) {
>       excludes.add(dn);
>     }
>   }
> }
> {code}
> If {{rpcServer.getDatanodeReport()}} throws an exception, {{dns}} will become 
> null. This doesn't look like the best way to handle the exception. Should the 
> router retry upon exception? Does it perform retry automatically under the 
> hood?
> [~crh] [~brahmareddy]
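
To make the hazard concrete: if the catch block only logs, the later loop 
dereferences a null {{dns}}. A hedged, self-contained sketch of the fail-fast 
alternative follows; {{DatanodeLister}} is a hypothetical stand-in for 
{{rpcServer.getDatanodeReport(DatanodeReportType.LIVE)}}, and a retry policy 
could equally wrap the call:

{code:java}
import java.io.IOException;
import java.util.List;

class ChooseDatanodeSketch {
  interface DatanodeLister {
    List<String> liveDatanodes() throws IOException;
  }

  static List<String> liveDatanodes(DatanodeLister lister) throws IOException {
    try {
      return lister.liveDatanodes();
    } catch (IOException e) {
      // Propagate (or retry) instead of logging and falling through with a
      // null result that the exclude-list loop would then dereference.
      throw new IOException("Cannot get the datanodes from the RPC server", e);
    }
  }
}
{code}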



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2053) Fix TestOzoneManagerRatisServer failure

2019-09-09 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926168#comment-16926168
 ] 

Hudson commented on HDDS-2053:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17265 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17265/])
HDDS-2053. Fix TestOzoneManagerRatisServer failure. Contributed by (github: rev 
650c4cead5d5465921a8bbd4d6294f515f958169)
* (edit) 
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/ratis/TestOzoneManagerRatisServer.java


> Fix TestOzoneManagerRatisServer failure
> ---
>
> Key: HDDS-2053
> URL: https://issues.apache.org/jira/browse/HDDS-2053
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> {{TestOzoneManagerRatisServer}} is failing on trunk with the following error
> {noformat}
> [ERROR] 
> verifyRaftGroupIdGenerationWithCustomOmServiceId(org.apache.hadoop.ozone.om.ratis.TestOzoneManagerRatisServer)
>   Time elapsed: 0.418 s  <<< ERROR!
> org.apache.hadoop.metrics2.MetricsException: Metrics source 
> OzoneManagerDoubleBufferMetrics already exists!
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>   at 
> org.apache.hadoop.ozone.om.ratis.metrics.OzoneManagerDoubleBufferMetrics.create(OzoneManagerDoubleBufferMetrics.java:50)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.<init>(OzoneManagerDoubleBuffer.java:110)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.<init>(OzoneManagerDoubleBuffer.java:88)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.<init>(OzoneManagerStateMachine.java:87)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.getStateMachine(OzoneManagerRatisServer.java:314)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.<init>(OzoneManagerRatisServer.java:244)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.newOMRatisServer(OzoneManagerRatisServer.java:302)
>   at 
> org.apache.hadoop.ozone.om.ratis.TestOzoneManagerRatisServer.verifyRaftGroupIdGenerationWithCustomOmServiceId(TestOzoneManagerRatisServer.java:209)
> ...
> {noformat}
> (Thanks [~nandakumar131] for the stack trace.)
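
As a side note, one common remedy for "Metrics source ... already exists" 
collisions between tests is to unregister the source during teardown. A small 
sketch using the DefaultMetricsSystem API follows; this is a generic pattern, 
not necessarily the fix that was merged for this issue:

{code:java}
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;

class MetricsSourceCleanup {
  // Remove the source left behind by the previous test so the next
  // OzoneManagerRatisServer setup can register it again.
  static void unregisterDoubleBufferMetrics() {
    DefaultMetricsSystem.instance()
        .unregisterSource("OzoneManagerDoubleBufferMetrics");
  }
}
{code}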



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2053) Fix TestOzoneManagerRatisServer failure

2019-09-09 Thread Xiaoyu Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDDS-2053:
-
Fix Version/s: 0.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks all for the reviews. I've merged the change to trunk. 

> Fix TestOzoneManagerRatisServer failure
> ---
>
> Key: HDDS-2053
> URL: https://issues.apache.org/jira/browse/HDDS-2053
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> {{TestOzoneManagerRatisServer}} is failing on trunk with the following error
> {noformat}
> [ERROR] 
> verifyRaftGroupIdGenerationWithCustomOmServiceId(org.apache.hadoop.ozone.om.ratis.TestOzoneManagerRatisServer)
>   Time elapsed: 0.418 s  <<< ERROR!
> org.apache.hadoop.metrics2.MetricsException: Metrics source 
> OzoneManagerDoubleBufferMetrics already exists!
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>   at 
> org.apache.hadoop.ozone.om.ratis.metrics.OzoneManagerDoubleBufferMetrics.create(OzoneManagerDoubleBufferMetrics.java:50)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.<init>(OzoneManagerDoubleBuffer.java:110)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.<init>(OzoneManagerDoubleBuffer.java:88)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.<init>(OzoneManagerStateMachine.java:87)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.getStateMachine(OzoneManagerRatisServer.java:314)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.<init>(OzoneManagerRatisServer.java:244)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.newOMRatisServer(OzoneManagerRatisServer.java:302)
>   at 
> org.apache.hadoop.ozone.om.ratis.TestOzoneManagerRatisServer.verifyRaftGroupIdGenerationWithCustomOmServiceId(TestOzoneManagerRatisServer.java:209)
> ...
> {noformat}
> (Thanks [~nandakumar131] for the stack trace.)



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2053) Fix TestOzoneManagerRatisServer failure

2019-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2053?focusedWorklogId=309330&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309330
 ]

ASF GitHub Bot logged work on HDDS-2053:


Author: ASF GitHub Bot
Created on: 09/Sep/19 22:38
Start Date: 09/Sep/19 22:38
Worklog Time Spent: 10m 
  Work Description: xiaoyuyao commented on pull request #1373: HDDS-2053. 
Fix TestOzoneManagerRatisServer failure. Contributed by Xi…
URL: https://github.com/apache/hadoop/pull/1373
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 309330)
Time Spent: 2h 50m  (was: 2h 40m)

> Fix TestOzoneManagerRatisServer failure
> ---
>
> Key: HDDS-2053
> URL: https://issues.apache.org/jira/browse/HDDS-2053
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> {{TestOzoneManagerRatisServer}} is failing on trunk with the following error
> {noformat}
> [ERROR] 
> verifyRaftGroupIdGenerationWithCustomOmServiceId(org.apache.hadoop.ozone.om.ratis.TestOzoneManagerRatisServer)
>   Time elapsed: 0.418 s  <<< ERROR!
> org.apache.hadoop.metrics2.MetricsException: Metrics source 
> OzoneManagerDoubleBufferMetrics already exists!
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>   at 
> org.apache.hadoop.ozone.om.ratis.metrics.OzoneManagerDoubleBufferMetrics.create(OzoneManagerDoubleBufferMetrics.java:50)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.<init>(OzoneManagerDoubleBuffer.java:110)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.<init>(OzoneManagerDoubleBuffer.java:88)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.<init>(OzoneManagerStateMachine.java:87)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.getStateMachine(OzoneManagerRatisServer.java:314)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.<init>(OzoneManagerRatisServer.java:244)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.newOMRatisServer(OzoneManagerRatisServer.java:302)
>   at 
> org.apache.hadoop.ozone.om.ratis.TestOzoneManagerRatisServer.verifyRaftGroupIdGenerationWithCustomOmServiceId(TestOzoneManagerRatisServer.java:209)
> ...
> {noformat}
> (Thanks [~nandakumar131] for the stack trace.)



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2106) Avoid usage of hadoop projects as parent of hdds/ozone

2019-09-09 Thread Elek, Marton (Jira)
Elek, Marton created HDDS-2106:
--

 Summary: Avoid usage of hadoop projects as parent of hdds/ozone
 Key: HDDS-2106
 URL: https://issues.apache.org/jira/browse/HDDS-2106
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Elek, Marton


Ozone uses hadoop as a dependency. The dependency is defined on multiple levels:

 1. the hadoop artifacts are defined in the {{dependencyManagement}} sections
 2. both hadoop-ozone and hadoop-hdds projects use "hadoop-project" as the 
parent

As we already have a slightly different assembly process, it could be more 
resilient to use a dedicated parent project instead of the hadoop one. With 
this approach it will be easier to upgrade the versions, as we only need to be 
careful about the used dependencies, not about the pom contents.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2102) HddsVolumeChecker should use java optional in place of Guava optional

2019-09-09 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926148#comment-16926148
 ] 

Hudson commented on HDDS-2102:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17264 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17264/])
HDDS-2102. HddsVolumeChecker should use java optional in place of Guava 
(bharat: rev d69b811ddd8bf2632faabf1e069883b8aa08f5a0)
* (edit) 
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/volume/TestHddsVolumeChecker.java
* (add) 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/volume/AsyncChecker.java
* (edit) 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/volume/HddsVolumeChecker.java
* (edit) 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/volume/ThrottledAsyncChecker.java


> HddsVolumeChecker should use java optional in place of Guava optional
> -
>
> Key: HDDS-2102
> URL: https://issues.apache.org/jira/browse/HDDS-2102
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> HddsVolumeChecker should use java optional in place of Guava optional, as the 
> Guava dependency is marked unstable.
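
The mechanical part of such a migration is small; an illustrative mapping (not 
the actual HDDS-2102 diff) is:

{code:java}
import java.util.Optional;

// Guava -> java.util.Optional equivalents typically touched in a migration:
//   Optional.absent()        -> Optional.empty()
//   Optional.fromNullable(v) -> Optional.ofNullable(v)
//   opt.or(defaultValue)     -> opt.orElse(defaultValue)
//   opt.orNull()             -> opt.orElse(null)
class OptionalMigrationSketch {
  static Optional<Throwable> asOptional(Throwable failure) {
    return Optional.ofNullable(failure);  // was Optional.fromNullable(failure)
  }
}
{code}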



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2053) Fix TestOzoneManagerRatisServer failure

2019-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2053?focusedWorklogId=309313&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309313
 ]

ASF GitHub Bot logged work on HDDS-2053:


Author: ASF GitHub Bot
Created on: 09/Sep/19 22:15
Start Date: 09/Sep/19 22:15
Worklog Time Spent: 10m 
  Work Description: hanishakoneru commented on issue #1373: HDDS-2053. Fix 
TestOzoneManagerRatisServer failure. Contributed by Xi…
URL: https://github.com/apache/hadoop/pull/1373#issuecomment-529688386
 
 
   Change LGTM. +1.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 309313)
Time Spent: 2h 40m  (was: 2.5h)

> Fix TestOzoneManagerRatisServer failure
> ---
>
> Key: HDDS-2053
> URL: https://issues.apache.org/jira/browse/HDDS-2053
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> {{TestOzoneManagerRatisServer}} is failing on trunk with the following error
> {noformat}
> [ERROR] 
> verifyRaftGroupIdGenerationWithCustomOmServiceId(org.apache.hadoop.ozone.om.ratis.TestOzoneManagerRatisServer)
>   Time elapsed: 0.418 s  <<< ERROR!
> org.apache.hadoop.metrics2.MetricsException: Metrics source 
> OzoneManagerDoubleBufferMetrics already exists!
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>   at 
> org.apache.hadoop.ozone.om.ratis.metrics.OzoneManagerDoubleBufferMetrics.create(OzoneManagerDoubleBufferMetrics.java:50)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.<init>(OzoneManagerDoubleBuffer.java:110)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.<init>(OzoneManagerDoubleBuffer.java:88)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.<init>(OzoneManagerStateMachine.java:87)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.getStateMachine(OzoneManagerRatisServer.java:314)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.<init>(OzoneManagerRatisServer.java:244)
>   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.newOMRatisServer(OzoneManagerRatisServer.java:302)
>   at 
> org.apache.hadoop.ozone.om.ratis.TestOzoneManagerRatisServer.verifyRaftGroupIdGenerationWithCustomOmServiceId(TestOzoneManagerRatisServer.java:209)
> ...
> {noformat}
> (Thanks [~nandakumar131] for the stack trace.)



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14283) DFSInputStream to prefer cached replica

2019-09-09 Thread Siyao Meng (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926127#comment-16926127
 ] 

Siyao Meng commented on HDFS-14283:
---

[~leosun08] Any work done on your side yet? If not I can take over this one.

[~jojochuang] I'm a bit worried that enabling this by default could cause 
hot-spot issues on those DataNodes with cached replicas.

> DFSInputStream to prefer cached replica
> ---
>
> Key: HDFS-14283
> URL: https://issues.apache.org/jira/browse/HDFS-14283
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.6.0
> Environment: HDFS Caching
>Reporter: Wei-Chiu Chuang
>Assignee: Lisheng Sun
>Priority: Major
>
> HDFS Caching offers performance benefits. However, the NameNode currently does 
> not treat cached replicas with higher priority, so HDFS caching is only useful 
> when cache replication = 3, that is to say, when all replicas are cached in 
> memory, so that a client can't randomly pick an uncached replica.
> HDFS-6846 proposed to let the NameNode give higher priority to cached replicas. 
> Changing logic in the NameNode is always tricky, so that didn't get much 
> traction. Here I propose a different approach: let the client (DFSInputStream) 
> prefer cached replicas.
> A {{LocatedBlock}} object already contains cached replica location so a 
> client has the needed information. I think we can change 
> {{DFSInputStream#getBestNodeDNAddrPair()}} for this purpose.
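
A minimal sketch of that client-side preference, with simplified types; the 
cachedLocs parameter stands in for {{LocatedBlock#getCachedLocations()}}:

{code:java}
import java.util.List;
import java.util.Set;

class CachedReplicaChooser {
  // Prefer a node that holds a cached replica; otherwise fall back to the
  // first candidate (in DFSInputStream that would be the node picked by the
  // existing getBestNodeDNAddrPair() logic).
  static String chooseNode(List<String> candidates, Set<String> cachedLocs) {
    for (String node : candidates) {
      if (cachedLocs.contains(node)) {
        return node;            // likely served from memory
      }
    }
    return candidates.isEmpty() ? null : candidates.get(0);
  }
}
{code}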



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2105) Merge OzoneClientFactory#getRpcClient functions

2019-09-09 Thread Siyao Meng (Jira)
Siyao Meng created HDDS-2105:


 Summary: Merge OzoneClientFactory#getRpcClient functions
 Key: HDDS-2105
 URL: https://issues.apache.org/jira/browse/HDDS-2105
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Siyao Meng
Assignee: Siyao Meng


Ref: https://github.com/apache/hadoop/pull/1360#discussion_r321585214

There will be 5 overloaded OzoneClientFactory#getRpcClient functions (when 
HDDS-2007 is committed). They contain some redundant logic and unnecessarily 
increase the number of code paths.

Goal: Merge those functions into one or two.

Work will begin after HDDS-2007 is committed.
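
One possible shape for the merge, with illustrative stand-in types rather than 
the real OzoneClientFactory signatures: keep a single canonical method and make 
the remaining overloads thin delegates.

{code:java}
class RpcClientFactorySketch {
  static class OzoneConf { int defaultOmPort() { return 9862; } }
  static class RpcClient {
    RpcClient(String host, int port, OzoneConf conf) { /* connect ... */ }
  }

  // the one place that resolves defaults and constructs the client
  static RpcClient getRpcClient(String omHost, Integer omPort, OzoneConf conf) {
    int port = (omPort != null) ? omPort : conf.defaultOmPort();
    return new RpcClient(omHost, port, conf);
  }

  // convenience overloads delegate instead of duplicating construction logic
  static RpcClient getRpcClient(String omHost, OzoneConf conf) {
    return getRpcClient(omHost, null, conf);
  }
}
{code}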



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14509) DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 3.x

2019-09-09 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926109#comment-16926109
 ] 

Wei-Chiu Chuang commented on HDFS-14509:


[~ferhui] can you tell if this fix is still required after HDFS-13596?

> DN throws InvalidToken due to inequality of password when upgrade NN 2.x to 
> 3.x
> ---
>
> Key: HDFS-14509
> URL: https://issues.apache.org/jira/browse/HDFS-14509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuxuan Wang
>Priority: Blocker
> Attachments: HDFS-14509-001.patch
>
>
> According to the doc, if we want to upgrade a cluster from 2.x to 3.x, we need 
> to upgrade the NN first. So there will be an intermediate state where the NN 
> is 3.x and the DN is 2.x. At that moment, if a client reads (or writes) a 
> block, it will get a block token from the NN and then deliver the token to 
> the DN, which verifies the token. But the verification in the code now is:
> {code:title=BlockTokenSecretManager.java|borderStyle=solid}
> public void checkAccess(...)
> {
> ...
> id.readFields(new DataInputStream(new 
> ByteArrayInputStream(token.getIdentifier())));
> ...
> if (!Arrays.equals(retrievePassword(id), token.getPassword())) {
>   throw new InvalidToken("Block token with " + id.toString()
>   + " doesn't have the correct token password");
> }
> }
> {code} 
> And {{retrievePassword(id)}} is:
> {code} 
> public byte[] retrievePassword(BlockTokenIdentifier identifier)
> {
> ...
> return createPassword(identifier.getBytes(), key.getKey());
> }
> {code} 
> So if the NN's identifier adds new fields, the DN will lose those fields and 
> compute the wrong password.
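
To make the failure mode concrete, a self-contained sketch follows (HmacSHA1 as 
in Hadoop's SecretManager; this illustrates the incompatibility, it is not the 
actual patch): computing the expected password over the raw identifier bytes 
from the token keeps fields the DN cannot parse inside the MAC input, whereas 
re-serializing through an older identifier class silently drops them.

{code:java}
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class BlockTokenCheckSketch {
  static byte[] password(byte[] identifierBytes, byte[] secret)
      throws Exception {
    Mac mac = Mac.getInstance("HmacSHA1");
    mac.init(new SecretKeySpec(secret, "HmacSHA1"));
    return mac.doFinal(identifierBytes);
  }

  static void checkAccess(byte[] rawIdentifier, byte[] tokenPassword,
      byte[] secret) throws Exception {
    // rawIdentifier is token.getIdentifier() exactly as the NN produced it;
    // MACing those bytes directly tolerates fields added by a newer NN.
    if (!Arrays.equals(password(rawIdentifier, secret), tokenPassword)) {
      throw new SecurityException("Block token password mismatch");
    }
  }
}
{code}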



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent

2019-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2075?focusedWorklogId=309262&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309262
 ]

ASF GitHub Bot logged work on HDDS-2075:


Author: ASF GitHub Bot
Created on: 09/Sep/19 21:25
Start Date: 09/Sep/19 21:25
Worklog Time Spent: 10m 
  Work Description: xiaoyuyao commented on issue #1415: HDDS-2075. Tracing 
in OzoneManager call is propagated with wrong parent
URL: https://github.com/apache/hadoop/pull/1415#issuecomment-529673327
 
 
   LGTM, +1. Thanks @adoroszlai  for fixing this. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 309262)
Time Spent: 40m  (was: 0.5h)

> Tracing in OzoneManager call is propagated with wrong parent
> 
>
> Key: HDDS-2075
> URL: https://issues.apache.org/jira/browse/HDDS-2075
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Elek, Marton
>Assignee: Doroszlai, Attila
>Priority: Major
>  Labels: pull-request-available
> Attachments: create_bucket-new.png, create_bucket.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> As you can see in the attached screenshot, the OzoneManager.createBucket 
> (server side) tracing information is a child of freon.createBucket 
> instead of freon's OzoneManagerProtocolPB.submitRequest.
> To avoid confusion, the hierarchy should be fixed (most probably we generate 
> the child span AFTER we have already serialized the parent one into the message).



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2102) HddsVolumeChecker should use java optional in place of Guava optional

2019-09-09 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-2102:
-
Fix Version/s: 0.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> HddsVolumeChecker should use java optional in place of Guava optional
> -
>
> Key: HDDS-2102
> URL: https://issues.apache.org/jira/browse/HDDS-2102
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> HddsVolumeChecker should use java optional in place of Guava optional, as the 
> Guava dependency is marked unstable.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2104) Refactor OMFailoverProxyProvider#loadOMClientConfigs

2019-09-09 Thread Siyao Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyao Meng updated HDDS-2104:
-
Description: 
Ref: https://github.com/apache/hadoop/pull/1360#discussion_r321586979

Now that we have decided to use client-side configuration for OM HA, some logic 
in OMFailoverProxyProvider#loadOMClientConfigs becomes redundant.

The work will begin after HDDS-2007 is committed.

  was:
Now that we have decided to use client-side configuration for OM HA, some logic 
in OMFailoverProxyProvider#loadOMClientConfigs becomes redundant.

The work will begin after HDDS-2007 is committed.


> Refactor OMFailoverProxyProvider#loadOMClientConfigs
> 
>
> Key: HDDS-2104
> URL: https://issues.apache.org/jira/browse/HDDS-2104
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>
> Ref: https://github.com/apache/hadoop/pull/1360#discussion_r321586979
> Now that we have decided to use client-side configuration for OM HA, some logic 
> in OMFailoverProxyProvider#loadOMClientConfigs becomes redundant.
> The work will begin after HDDS-2007 is committed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2102) HddsVolumeChecker should use java optional in place of Guava optional

2019-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2102?focusedWorklogId=309252&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309252
 ]

ASF GitHub Bot logged work on HDDS-2102:


Author: ASF GitHub Bot
Created on: 09/Sep/19 21:17
Start Date: 09/Sep/19 21:17
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on issue #1416: HDDS-2102. 
HddsVolumeChecker should use java optional in place of Guava optional. 
Contributed by Mukul Kumar Singh.
URL: https://github.com/apache/hadoop/pull/1416#issuecomment-529670900
 
 
   Thank You @mukul1987 for the contribution.
   I have committed this to the trunk.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 309252)
Time Spent: 40m  (was: 0.5h)

> HddsVolumeChecker should use java optional in place of Guava optional
> -
>
> Key: HDDS-2102
> URL: https://issues.apache.org/jira/browse/HDDS-2102
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> HddsVolumeChecker should use java optional in place of Guava optional, as the 
> Guava dependency is marked unstable.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2104) Refactor OMFailoverProxyProvider#loadOMClientConfigs

2019-09-09 Thread Siyao Meng (Jira)
Siyao Meng created HDDS-2104:


 Summary: Refactor OMFailoverProxyProvider#loadOMClientConfigs
 Key: HDDS-2104
 URL: https://issues.apache.org/jira/browse/HDDS-2104
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Siyao Meng
Assignee: Siyao Meng


Now that we have decided to use client-side configuration for OM HA, some logic 
in OMFailoverProxyProvider#loadOMClientConfigs becomes redundant.

The work will begin after HDDS-2007 is committed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2102) HddsVolumeChecker should use java optional in place of Guava optional

2019-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2102?focusedWorklogId=309251&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309251
 ]

ASF GitHub Bot logged work on HDDS-2102:


Author: ASF GitHub Bot
Created on: 09/Sep/19 21:17
Start Date: 09/Sep/19 21:17
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #1416: 
HDDS-2102. HddsVolumeChecker should use java optional in place of Guava 
optional. Contributed by Mukul Kumar Singh.
URL: https://github.com/apache/hadoop/pull/1416
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 309251)
Time Spent: 0.5h  (was: 20m)

> HddsVolumeChecker should use java optional in place of Guava optional
> -
>
> Key: HDDS-2102
> URL: https://issues.apache.org/jira/browse/HDDS-2102
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> HddsVolumeChecker should use java optional in place of Guava optional, as the 
> Guava dependency is marked unstable.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1505) Remove "ozone.enabled" parameter from ozone configs

2019-09-09 Thread Vivek Ratnavel Subramanian (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vivek Ratnavel Subramanian reassigned HDDS-1505:


Assignee: Vivek Ratnavel Subramanian

> Remove "ozone.enabled" parameter from ozone configs
> ---
>
> Key: HDDS-1505
> URL: https://issues.apache.org/jira/browse/HDDS-1505
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Minor
>
> Remove "ozone.enabled" config as it is no longer needed



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2102) HddsVolumeChecker should use java optional in place of Guava optional

2019-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2102?focusedWorklogId=309245&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309245
 ]

ASF GitHub Bot logged work on HDDS-2102:


Author: ASF GitHub Bot
Created on: 09/Sep/19 21:04
Start Date: 09/Sep/19 21:04
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1416: HDDS-2102. 
HddsVolumeChecker should use java optional in place of Guava optional. 
Contributed by Mukul Kumar Singh.
URL: https://github.com/apache/hadoop/pull/1416#issuecomment-529666543
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|:--------|:--------|
   | 0 | reexec | 100 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 1 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | +1 | mvninstall | 691 | trunk passed |
   | +1 | compile | 388 | trunk passed |
   | +1 | checkstyle | 74 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 979 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 185 | trunk passed |
   | 0 | spotbugs | 453 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 688 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | +1 | mvninstall | 555 | the patch passed |
   | +1 | compile | 409 | the patch passed |
   | +1 | javac | 409 | the patch passed |
   | +1 | checkstyle | 80 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 725 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 183 | the patch passed |
   | +1 | findbugs | 768 | the patch passed |
   ||| _ Other Tests _ |
   | -1 | unit | 313 | hadoop-hdds in the patch failed. |
   | -1 | unit | 289 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 42 | The patch does not generate ASF License warnings. |
   | | | 6642 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdds.scm.block.TestBlockManager |
   |   | hadoop.ozone.om.ratis.TestOzoneManagerRatisServer |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=18.09.7 Server=18.09.7 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1416/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1416 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux 626c82f6d9c2 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 469165e |
   | Default Java | 1.8.0_222 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1416/1/artifact/out/patch-unit-hadoop-hdds.txt
 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1416/1/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1416/1/testReport/ |
   | Max. process+thread count | 1325 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/container-service U: 
hadoop-hdds/container-service |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1416/1/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 309245)
Time Spent: 20m  (was: 10m)

> HddsVolumeChecker should use java optional in place of Guava optional
> -
>
> Key: HDDS-2102
> URL: https://issues.apache.org/jira/browse/HDDS-2102
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HddsVolumeChecker 

[jira] [Commented] (HDDS-2097) Add TeraSort to acceptance test

2019-09-09 Thread Xiaoyu Yao (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926094#comment-16926094
 ] 

Xiaoyu Yao commented on HDDS-2097:
--

Thanks [~ste...@apache.org] for the heads up. I will play with it and see if 
the existing one for s3a fits the requirements for ozone. Also, ozone as a 
submodule depends on Hadoop 3.2.0; is this available in Hadoop 3.2.0?

> Add TeraSort to acceptance test
> ---
>
> Key: HDDS-2097
> URL: https://issues.apache.org/jira/browse/HDDS-2097
> Project: Hadoop Distributed Data Store
>  Issue Type: Test
>Reporter: Xiaoyu Yao
>Priority: Major
>
> We may begin with 1GB teragen/terasort/teravalidate.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-2086) ReconServer throws SQLException but path present for ozone.recon.db.dir in ozone-site

2019-09-09 Thread Aravindan Vijayan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aravindan Vijayan resolved HDDS-2086.
-
Resolution: Cannot Reproduce

Unable to repro on latest trunk when configuring ozone.recon.db.dir to an 
existing directory. Possibly an environment issue. 

> ReconServer throws SQLException but path present for ozone.recon.db.dir in 
> ozone-site
> -
>
> Key: HDDS-2086
> URL: https://issues.apache.org/jira/browse/HDDS-2086
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Shweta
>Priority: Major
>
> java.sql.SQLException: path to 
> '/${ozone.recon.db.dir}/ozone_recon_sqlite.db': '/${ozone.recon.db.dir}' does 
> not exist
> But the property is present in ozone-site.xml:
> <property>
>   <name>ozone.recon.db.dir</name>
>   <value>/tmp/metadata</value>
> </property>



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDDS-2103) TestContainerReplication fails due to unhealthy container

2019-09-09 Thread Doroszlai, Attila (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-2103 started by Doroszlai, Attila.
---
> TestContainerReplication fails due to unhealthy container
> -
>
> Key: HDDS-2103
> URL: https://issues.apache.org/jira/browse/HDDS-2103
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.5.0
>Reporter: Doroszlai, Attila
>Assignee: Doroszlai, Attila
>Priority: Major
>
> {code:title=https://github.com/elek/ozone-ci/blob/master/trunk/trunk-nightly-20190907-l8mkd/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.container.TestContainerReplication.txt}
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.771 s <<< 
> FAILURE! - in org.apache.hadoop.ozone.container.TestContainerReplication
> testContainerReplication(org.apache.hadoop.ozone.container.TestContainerReplication)
>   Time elapsed: 12.702 s  <<< FAILURE!
> java.lang.AssertionError: Container is not replicated to the destination 
> datanode
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertNotNull(Assert.java:621)
>   at 
> org.apache.hadoop.ozone.container.TestContainerReplication.testContainerReplication(TestContainerReplication.java:153)
> {code}
> caused by:
> {code:title=https://github.com/elek/ozone-ci/blob/master/trunk/trunk-nightly-20190907-l8mkd/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.container.TestContainerReplication-output.txt}
> java.lang.IllegalStateException: Only closed containers could be exported: 
> ContainerId=1
>   at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.exportContainerData(KeyValueContainer.java:525)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.exportContainer(KeyValueHandler.java:875)
>   at 
> org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.exportContainer(ContainerController.java:134)
>   at 
> org.apache.hadoop.ozone.container.replication.OnDemandContainerReplicationSource.copyData(OnDemandContainerReplicationSource.java:64)
>   at 
> org.apache.hadoop.ozone.container.replication.GrpcReplicationService.download(GrpcReplicationService.java:63)
> {code}
> Container is in unhealthy state because pipeline is not found for it in 
> {{CloseContainerCommandHandler}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2103) TestContainerReplication fails due to unhealthy container

2019-09-09 Thread Doroszlai, Attila (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doroszlai, Attila updated HDDS-2103:

Target Version/s: 0.5.0

> TestContainerReplication fails due to unhealthy container
> -
>
> Key: HDDS-2103
> URL: https://issues.apache.org/jira/browse/HDDS-2103
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.5.0
>Reporter: Doroszlai, Attila
>Assignee: Doroszlai, Attila
>Priority: Major
>
> {code:title=https://github.com/elek/ozone-ci/blob/master/trunk/trunk-nightly-20190907-l8mkd/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.container.TestContainerReplication.txt}
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.771 s <<< 
> FAILURE! - in org.apache.hadoop.ozone.container.TestContainerReplication
> testContainerReplication(org.apache.hadoop.ozone.container.TestContainerReplication)
>   Time elapsed: 12.702 s  <<< FAILURE!
> java.lang.AssertionError: Container is not replicated to the destination 
> datanode
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertNotNull(Assert.java:621)
>   at 
> org.apache.hadoop.ozone.container.TestContainerReplication.testContainerReplication(TestContainerReplication.java:153)
> {code}
> caused by:
> {code:title=https://github.com/elek/ozone-ci/blob/master/trunk/trunk-nightly-20190907-l8mkd/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.container.TestContainerReplication-output.txt}
> java.lang.IllegalStateException: Only closed containers could be exported: 
> ContainerId=1
>   at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.exportContainerData(KeyValueContainer.java:525)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.exportContainer(KeyValueHandler.java:875)
>   at 
> org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.exportContainer(ContainerController.java:134)
>   at 
> org.apache.hadoop.ozone.container.replication.OnDemandContainerReplicationSource.copyData(OnDemandContainerReplicationSource.java:64)
>   at 
> org.apache.hadoop.ozone.container.replication.GrpcReplicationService.download(GrpcReplicationService.java:63)
> {code}
> Container is in unhealthy state because pipeline is not found for it in 
> {{CloseContainerCommandHandler}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2103) TestContainerReplication fails due to unhealthy container

2019-09-09 Thread Doroszlai, Attila (Jira)
Doroszlai, Attila created HDDS-2103:
---

 Summary: TestContainerReplication fails due to unhealthy container
 Key: HDDS-2103
 URL: https://issues.apache.org/jira/browse/HDDS-2103
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Affects Versions: 0.5.0
Reporter: Doroszlai, Attila
Assignee: Doroszlai, Attila


{code:title=https://github.com/elek/ozone-ci/blob/master/trunk/trunk-nightly-20190907-l8mkd/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.container.TestContainerReplication.txt}
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.771 s <<< 
FAILURE! - in org.apache.hadoop.ozone.container.TestContainerReplication
testContainerReplication(org.apache.hadoop.ozone.container.TestContainerReplication)
  Time elapsed: 12.702 s  <<< FAILURE!
java.lang.AssertionError: Container is not replicated to the destination 
datanode
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertNotNull(Assert.java:621)
at 
org.apache.hadoop.ozone.container.TestContainerReplication.testContainerReplication(TestContainerReplication.java:153)
{code}

caused by:

{code:title=https://github.com/elek/ozone-ci/blob/master/trunk/trunk-nightly-20190907-l8mkd/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.container.TestContainerReplication-output.txt}
java.lang.IllegalStateException: Only closed containers could be exported: 
ContainerId=1
at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.exportContainerData(KeyValueContainer.java:525)
at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.exportContainer(KeyValueHandler.java:875)
at 
org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.exportContainer(ContainerController.java:134)
at 
org.apache.hadoop.ozone.container.replication.OnDemandContainerReplicationSource.copyData(OnDemandContainerReplicationSource.java:64)
at 
org.apache.hadoop.ozone.container.replication.GrpcReplicationService.download(GrpcReplicationService.java:63)
{code}

Container is in unhealthy state because pipeline is not found for it in 
{{CloseContainerCommandHandler}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2098) Ozone shell command prints out ERROR when the log4j file is not present.

2019-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2098?focusedWorklogId=309169&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309169
 ]

ASF GitHub Bot logged work on HDDS-2098:


Author: ASF GitHub Bot
Created on: 09/Sep/19 19:55
Start Date: 09/Sep/19 19:55
Worklog Time Spent: 10m 
  Work Description: avijayanhwx commented on issue #1411: HDDS-2098 : Ozone 
shell command prints out ERROR when the log4j file …
URL: https://github.com/apache/hadoop/pull/1411#issuecomment-529639718
 
 
   > I have a question
   > During the ozone tarball build, we do copy ozone-shell-log4j.properties to 
etc/hadoop (like we copy log4j.properties), so why do we see this error, or does 
something need to be fixed in the copying done by this script?
   > 
   > 
https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/dist/dev-support/bin/dist-layout-stitching#L95
   
   Yes, while starting ozone from the snapshot tarball, it works perfectly. 
However, when Ozone is deployed through a management product like Cloudera 
Manager, the log4j properties may not be individually configurable. We may have 
to rely on a default log4j.properties. In that case, printing a 
FileNotFoundException for ozone shell commands is something we can avoid. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 309169)
Time Spent: 1h 20m  (was: 1h 10m)

> Ozone shell command prints out ERROR when the log4j file is not present.
> 
>
> Key: HDDS-2098
> URL: https://issues.apache.org/jira/browse/HDDS-2098
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone CLI
>Affects Versions: 0.5.0
>Reporter: Aravindan Vijayan
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> *Exception Trace*
> {code}
> log4j:ERROR Could not read configuration file from URL 
> [file:/etc/ozone/conf/ozone-shell-log4j.properties].
> java.io.FileNotFoundException: /etc/ozone/conf/ozone-shell-log4j.properties 
> (No such file or directory)
>   at java.io.FileInputStream.open0(Native Method)
>   at java.io.FileInputStream.open(FileInputStream.java:195)
>   at java.io.FileInputStream.(FileInputStream.java:138)
>   at java.io.FileInputStream.(FileInputStream.java:93)
>   at 
> sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
>   at 
> sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
>   at 
> org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:557)
>   at 
> org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
>   at org.apache.log4j.LogManager.(LogManager.java:127)
>   at org.slf4j.impl.Log4jLoggerFactory.(Log4jLoggerFactory.java:66)
>   at org.slf4j.impl.StaticLoggerBinder.(StaticLoggerBinder.java:72)
>   at 
> org.slf4j.impl.StaticLoggerBinder.(StaticLoggerBinder.java:45)
>   at org.slf4j.LoggerFactory.bind(LoggerFactory.java:150)
>   at org.slf4j.LoggerFactory.performInitialization(LoggerFactory.java:124)
>   at org.slf4j.LoggerFactory.getILoggerFactory(LoggerFactory.java:412)
>   at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:357)
>   at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:383)
>   at org.apache.hadoop.ozone.web.ozShell.Shell.(Shell.java:35)
> log4j:ERROR Ignoring configuration file 
> [file:/etc/ozone/conf/ozone-shell-log4j.properties].
> log4j:WARN No appenders could be found for logger 
> (io.jaegertracing.thrift.internal.senders.ThriftSenderFactory).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
> info.
> {
>   "metadata" : { },
>   "name" : "vol-test-putfile-1567740142",
>   "admin" : "root",
>   "owner" : "root",
>   "creationTime" : 1567740146501,
>   "acls" : [ {
> "type" : "USER",
> "name" : "root",
> "aclScope" : "ACCESS",
> "aclList" : [ "ALL" ]
>   }, {
> "type" : "GROUP",
> "name" : "root",
> "aclScope" : "ACCESS",
> "aclList" : [ "ALL" ]
>   } ],
>   "quota" : 1152921504606846976
> }
> {code}
> *Fix*
> When a log4j file is not present, the default should be console.
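
A minimal sketch of such a fallback (assuming log4j 1.x and a hypothetical 
OZONE_CONF_DIR lookup; this is not the committed fix):
{code:java}
import java.io.File;
import org.apache.log4j.BasicConfigurator;
import org.apache.log4j.PropertyConfigurator;

// Hedged sketch: configure from the shell log4j file if it exists,
// otherwise fall back to a plain console appender instead of printing
// a FileNotFoundException.
public final class ShellLogInit {
  public static void init() {
    File conf = new File(
        System.getenv().getOrDefault("OZONE_CONF_DIR", "/etc/ozone/conf"),
        "ozone-shell-log4j.properties");
    if (conf.isFile()) {
      PropertyConfigurator.configure(conf.getAbsolutePath());
    } else {
      BasicConfigurator.configure(); // console by default
    }
  }
}
{code}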



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (HDFS-14774) RBF: Improve RouterWebhdfsMethods#chooseDatanode() error handling

2019-09-09 Thread CR Hota (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926050#comment-16926050
 ] 

CR Hota commented on HDFS-14774:


Hey [~jojochuang], 

Do you have any follow-up questions, or shall we close this?

> RBF: Improve RouterWebhdfsMethods#chooseDatanode() error handling
> -
>
> Key: HDFS-14774
> URL: https://issues.apache.org/jira/browse/HDFS-14774
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wei-Chiu Chuang
>Assignee: CR Hota
>Priority: Minor
>
>  HDFS-13972 added the following code:
> {code}
> try {
>   dns = rpcServer.getDatanodeReport(DatanodeReportType.LIVE);
> } catch (IOException e) {
>   LOG.error("Cannot get the datanodes from the RPC server", e);
> } finally {
>   // Reset ugi to remote user for remaining operations.
>   RouterRpcServer.resetCurrentUser();
> }
> HashSet<DatanodeInfo> excludes = new HashSet<>();
> if (excludeDatanodes != null) {
>   Collection<String> collection =
>   getTrimmedStringCollection(excludeDatanodes);
>   for (DatanodeInfo dn : dns) {
> if (collection.contains(dn.getName())) {
>   excludes.add(dn);
> }
>   }
> }
> {code}
> If {{rpcServer.getDatanodeReport()}} throws an exception, {{dns}} will become 
> null. This doesn't look like the best way to handle the exception. Should the 
> router retry upon exception? Does it perform retries automatically under the 
> hood?
> [~crh] [~brahmareddy]
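
For illustration, a stricter variant (a sketch, not the committed fix) would 
surface the failure instead of letting {{dns}} stay null:
{code:java}
// Hedged sketch: fail fast when the datanode report is unavailable,
// rather than iterating over a null array below.
DatanodeInfo[] dns = null;
try {
  dns = rpcServer.getDatanodeReport(DatanodeReportType.LIVE);
} catch (IOException e) {
  LOG.error("Cannot get the datanodes from the RPC server", e);
} finally {
  // Reset ugi to remote user for remaining operations.
  RouterRpcServer.resetCurrentUser();
}
if (dns == null) {
  throw new IOException("Could not get the live datanodes from the Router");
}
{code}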



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1843) Undetectable corruption after restart of a datanode

2019-09-09 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926017#comment-16926017
 ] 

Hudson commented on HDDS-1843:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17262 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17262/])
HDDS-1843. Undetectable corruption after restart of a datanode. (shashikant: 
rev 469165e6f29a6e7788f218bdbbc3f7bacf26628b)
* (edit) hadoop-hdds/common/src/main/proto/DatanodeContainerProtocol.proto
* (edit) 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/interfaces/ContainerDispatcher.java
* (edit) 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java
* (edit) 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/HddsDispatcher.java
* (edit) 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java
* (edit) 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext.java
* (edit) 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/impl/BlockManagerImpl.java
* (edit) 
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/server/TestSecureContainerServer.java
* (edit) 
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/server/TestContainerServer.java
* (edit) 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java
* (edit) 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/interfaces/Container.java
* (edit) 
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/TestCSMMetrics.java
* (edit) 
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java


> Undetectable corruption after restart of a datanode
> ---
>
> Key: HDDS-1843
> URL: https://issues.apache.org/jira/browse/HDDS-1843
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.5.0
>
> Attachments: HDDS-1843.000.patch
>
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> Right now, all chunk writes use buffered IO, i.e. the sync flag is disabled 
> by default. Also, RocksDB metadata updates are applied to the RocksDB cache 
> first on the datanode. If both the buffered chunk data and the corresponding 
> metadata update are lost as part of a datanode restart, it may lead to a 
> situation where corruption of this nature cannot be detected (not even by 
> the container scanner) in a reasonable time frame, unless there is a client 
> IO failure or the Recon server detects it over time. To at least detect the 
> problem, the Ratis snapshot on the datanode should sync the RocksDB file; 
> that way, the ContainerScanner will be able to detect this. We can also add 
> a metric around sync to measure how much of a throughput loss it incurs.
> Thanks [~msingh] for suggesting this.
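
A minimal sketch of the sync step (RocksDB Java API; where exactly it hooks 
into the Ratis snapshot path is an assumption):
{code:java}
import org.rocksdb.FlushOptions;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

// Hedged sketch: flush the RocksDB memtable to disk and wait for it,
// so container metadata survives a datanode restart.
static void syncOnSnapshot(RocksDB db) throws RocksDBException {
  try (FlushOptions options = new FlushOptions().setWaitForFlush(true)) {
    db.flush(options);
  }
}
{code}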



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13541) NameNode Port based selective encryption

2019-09-09 Thread Chen Liang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-13541:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Although this is an umbrella Jira, given that it is marked as a release 
blocker, I am closing this ticket to unblock the release.

> NameNode Port based selective encryption
> 
>
> Key: HDFS-13541
> URL: https://issues.apache.org/jira/browse/HDFS-13541
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode, security
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>  Labels: release-blocker
> Attachments: HDFS-13541-branch-2.001.patch, 
> HDFS-13541-branch-2.002.patch, HDFS-13541-branch-2.003.patch, 
> HDFS-13541-branch-3.1.001.patch, HDFS-13541-branch-3.1.002.patch, 
> HDFS-13541-branch-3.2.001.patch, HDFS-13541-branch-3.2.002.patch, NameNode 
> Port based selective encryption-v1.pdf
>
>
> Here at LinkedIn, one issue we face is that we need to enforce different 
> security requirements based on the locations of the client and the cluster. 
> Specifically, for clients from outside of the data center, regulation 
> requires that all traffic be encrypted. But for clients within the same data 
> center, unencrypted connections are preferred, to avoid the high encryption 
> overhead. 
> HADOOP-10221 introduced a pluggable SASL resolver, on top of which 
> HADOOP-10335 introduced WhitelistBasedResolver, which solves the same 
> problem. However, we found it difficult to fit into our environment for 
> several reasons. In this JIRA, on top of the pluggable SASL resolver, *we 
> propose a different approach of running RPC on two ports on the NameNode, 
> where the two ports enforce encrypted and unencrypted connections 
> respectively, and subsequent DataNode access simply follows the same 
> encrypted/unencrypted behaviour*. Then, by blocking the unencrypted port on 
> the datacenter firewall, we can completely block unencrypted external access.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent

2019-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2075?focusedWorklogId=309142=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309142
 ]

ASF GitHub Bot logged work on HDDS-2075:


Author: ASF GitHub Bot
Created on: 09/Sep/19 18:25
Start Date: 09/Sep/19 18:25
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1415: HDDS-2075. 
Tracing in OzoneManager call is propagated with wrong parent
URL: https://github.com/apache/hadoop/pull/1415#issuecomment-529606399
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 1333 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | -1 | test4tests | 0 | The patch doesn't appear to include any new or 
modified tests.  Please justify why no new tests are needed for this patch. 
Also please list what manual steps were performed to verify this patch. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 45 | Maven dependency ordering for branch |
   | +1 | mvninstall | 647 | trunk passed |
   | +1 | compile | 391 | trunk passed |
   | +1 | checkstyle | 75 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 947 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 172 | trunk passed |
   | 0 | spotbugs | 479 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 702 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 23 | Maven dependency ordering for patch |
   | +1 | mvninstall | 562 | the patch passed |
   | +1 | compile | 374 | the patch passed |
   | +1 | javac | 374 | the patch passed |
   | +1 | checkstyle | 79 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 744 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 167 | the patch passed |
   | +1 | findbugs | 653 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 315 | hadoop-hdds in the patch passed. |
   | -1 | unit | 236 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 40 | The patch does not generate ASF License warnings. |
   | | | 7674 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.ozone.om.ratis.TestOzoneManagerRatisServer |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.0 Server=19.03.0 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1415/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1415 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux fe5b73ebf793 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 147f986 |
   | Default Java | 1.8.0_222 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1415/1/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1415/1/testReport/ |
   | Max. process+thread count | 1298 (vs. ulimit of 5500) |
   | modules | C: hadoop-ozone/common hadoop-ozone/client U: hadoop-ozone |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1415/1/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 309142)
Time Spent: 0.5h  (was: 20m)

> Tracing in OzoneManager call is propagated with wrong parent
> 
>
> Key: HDDS-2075
> URL: https://issues.apache.org/jira/browse/HDDS-2075
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Elek, Marton
>Assignee: Doroszlai, Attila
>Priority: Major
>  Labels: pull-request-available
> Attachments: create_bucket-new.png, create_bucket.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> 

[jira] [Commented] (HDFS-12288) Fix DataNode's xceiver count calculation

2019-09-09 Thread Lukas Majercak (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925976#comment-16925976
 ] 

Lukas Majercak commented on HDFS-12288:
---

[~zhangchen] not working on this right now, feel free to pick it up.

> Fix DataNode's xceiver count calculation
> 
>
> Key: HDFS-12288
> URL: https://issues.apache.org/jira/browse/HDFS-12288
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Major
> Attachments: HDFS-12288.001.patch, HDFS-12288.002.patch
>
>
> The problem with the ThreadGroup.activeCount() method is that it is only a 
> very rough estimate, and in reality returns the total number of threads in 
> the thread group as opposed to the threads actually running.
> In some DNs, we saw this return ~50 for a long time, even though the actual 
> number of DataXceiver threads was next to none.
> This is a big issue, as we use the xceiverCount to make decisions on the NN 
> for choosing a replication source DN or returning DNs to clients for R/W.
> The plan is to reuse the DataNodeMetrics.dataNodeActiveXceiversCount value, 
> which only counts the DataXceiver threads currently running and thus 
> represents the load on the DN much better.
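
A minimal sketch of the proposed direction (the getter name is an assumption 
based on the metric field named above, not the actual patch):
{code:java}
// Hedged sketch: report the tracked DataXceiver count instead of
// ThreadGroup.activeCount(), which counts every thread in the group.
public int getXceiverCount() {
  return metrics == null ? 0 : metrics.getDataNodeActiveXceiversCount();
}
{code}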



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2076) Read fails because the block cannot be located in the container

2019-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2076?focusedWorklogId=309112=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309112
 ]

ASF GitHub Bot logged work on HDDS-2076:


Author: ASF GitHub Bot
Created on: 09/Sep/19 17:38
Start Date: 09/Sep/19 17:38
Worklog Time Spent: 10m 
  Work Description: nandakumar131 commented on issue #1410: HDDS-2076. Read 
fails because the block cannot be located in the container
URL: https://github.com/apache/hadoop/pull/1410#issuecomment-529588126
 
 
   /retest
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 309112)
Time Spent: 1h 40m  (was: 1.5h)

> Read fails because the block cannot be located in the container
> ---
>
> Key: HDDS-2076
> URL: https://issues.apache.org/jira/browse/HDDS-2076
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client, Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Blocker
>  Labels: MiniOzoneChaosCluster, pull-request-available
> Attachments: log.zip
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Read fails as the client is not able to read the block from the container.
> {code}
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  Unable to find the block with bcsID 2515 .Container 7 bcsId is 0.
> at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:536)
> at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$getValidatorList$0(ContainerProtocolCalls.java:569)
> {code}
> The client eventually exits here
> {code}
> 2019-08-30 12:51:20,081 [pool-224-thread-6] ERROR 
> ozone.MiniOzoneLoadGenerator (MiniOzoneLoadGenerator.java:readData(176)) - 
> LOADGEN: Read key:pool-224-thread-6_330651 failed with ex
> ception
> ERROR ozone.MiniOzoneLoadGenerator (MiniOzoneLoadGenerator.java:load(121)) - 
> LOADGEN: Exiting due to exception
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14820) The default 8KB buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream is too big

2019-09-09 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925944#comment-16925944
 ] 

Íñigo Goiri commented on HDFS-14820:


What is the current default value? 8KB?
I think this is too sensitive to change like this.
We should make it configurable and make the default the old value.

>  The default 8KB buffer of 
> BlockReaderRemote#newBlockReader#BufferedOutputStream is too big
> ---
>
> Key: HDFS-14820
> URL: https://issues.apache.org/jira/browse/HDFS-14820
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14820.001.patch
>
>
> This issue is similar to HDFS-14535.
> {code:java}
> public static BlockReader newBlockReader(String file,
> ExtendedBlock block,
> Token<BlockTokenIdentifier> blockToken,
> long startOffset, long len,
> boolean verifyChecksum,
> String clientName,
> Peer peer, DatanodeID datanodeID,
> PeerCache peerCache,
> CachingStrategy cachingStrategy,
> int networkDistance) throws IOException {
>   // in and out will be closed when sock is closed (by the caller)
>   final DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
>   peer.getOutputStream()));
>   new Sender(out).readBlock(block, blockToken, clientName, startOffset, len,
>   verifyChecksum, cachingStrategy);
> }
> public BufferedOutputStream(OutputStream out) {
> this(out, 8192);
> }
> {code}
> The Sender#readBlock parameters (block, blockToken, clientName, startOffset, 
> len, verifyChecksum, cachingStrategy) do not need such a big buffer.
> So I think the BufferedOutputStream buffer should be reduced.
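
A minimal sketch of the proposed change (the 512-byte figure is an assumption 
for illustration, not the patch's value):
{code:java}
// Hedged sketch: a readBlock request is tiny, so a small explicit buffer
// avoids allocating BufferedOutputStream's 8 KB default per connection.
final int smallBufferSize = 512;
final DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
    peer.getOutputStream(), smallBufferSize));
{code}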



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14833) RBF: Router Update Doesn't Sync Quota

2019-09-09 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-14833:

Parent: HDFS-14603
Issue Type: Sub-task  (was: Bug)

> RBF: Router Update Doesn't Sync Quota
> -
>
> Key: HDFS-14833
> URL: https://issues.apache.org/jira/browse/HDFS-14833
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>
> HDFS-14777 added a check to prevent an RPC call: it checks whether the quota 
> is changing in the present state, but it ignores whether the locations have 
> changed. If a location is changed, the new destination should be 
> synchronized with the mount entry quota. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14704) RBF: ServiceAddress and webAddress should not be null in NamenodeHeartbeatService

2019-09-09 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925942#comment-16925942
 ] 

Íñigo Goiri commented on HDFS-14704:


Keep in mind there are setups where the serviceAddress is not used; do we 
support those with this change?

> RBF: ServiceAddress and webAddress should not be null in 
> NamenodeHeartbeatService
> -
>
> Key: HDFS-14704
> URL: https://issues.apache.org/jira/browse/HDFS-14704
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Reporter: xuzq
>Assignee: xuzq
>Priority: Major
> Attachments: HDFS-14704-trunk-001.patch, HDFS-14704-trunk-002.patch, 
> HDFS-14704-trunk-003.patch
>
>
> NnId should not be null in NamenodeHeartbeatService.
> If NnId is null, it also prints an error message like:
> {code:java}
> 2019-08-06 10:38:07,455 ERROR router.NamenodeHeartbeatService 
> (NamenodeHeartbeatService.java:updateState(229)) - Unhandled exception 
> updating NN registration for ns1:null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.federation.protocol.proto.HdfsServerFederationProtos$NamenodeMembershipRecordProto$Builder.setServiceAddress(HdfsServerFederationProtos.java:3831)
> at 
> org.apache.hadoop.hdfs.server.federation.store.records.impl.pb.MembershipStatePBImpl.setServiceAddress(MembershipStatePBImpl.java:119)
> at 
> org.apache.hadoop.hdfs.server.federation.store.records.MembershipState.newInstance(MembershipState.java:108)
> at 
> org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:267)
> at 
> org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:223)
> at 
> org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159)
> at 
> org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748){code}
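
A minimal sketch of the kind of guard the fix is after (builder and variable 
names are assumptions, not the actual patch):
{code:java}
// Hedged sketch: only set optional addresses when present; the generated
// protobuf builder throws NullPointerException on null arguments.
if (serviceAddress != null) {
  record.setServiceAddress(serviceAddress);
}
if (webAddress != null) {
  record.setWebAddress(webAddress);
}
{code}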



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2102) HddsVolumeChecker should use java optional in place of Guava optional

2019-09-09 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh reassigned HDDS-2102:
---

Assignee: Mukul Kumar Singh

> HddsVolumeChecker should use java optional in place of Guava optional
> -
>
> Key: HDDS-2102
> URL: https://issues.apache.org/jira/browse/HDDS-2102
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HddsVolumeChecker should use Java's Optional in place of Guava's Optional, 
> as the Guava dependency is marked unstable.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2102) HddsVolumeChecker should use java optional in place of Guava optional

2019-09-09 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDDS-2102:

Status: Patch Available  (was: Open)

> HddsVolumeChecker should use java optional in place of Guava optional
> -
>
> Key: HDDS-2102
> URL: https://issues.apache.org/jira/browse/HDDS-2102
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HddsVolumeChecker should use Java's Optional in place of Guava's Optional, 
> as the Guava dependency is marked unstable.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2102) HddsVolumeChecker should use java optional in place of Guava optional

2019-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2102?focusedWorklogId=309085=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309085
 ]

ASF GitHub Bot logged work on HDDS-2102:


Author: ASF GitHub Bot
Created on: 09/Sep/19 17:18
Start Date: 09/Sep/19 17:18
Worklog Time Spent: 10m 
  Work Description: mukul1987 commented on pull request #1416: HDDS-2102. 
HddsVolumeChecker should use java optional in place of Guava optional. 
Contributed by Mukul Kumar Singh.
URL: https://github.com/apache/hadoop/pull/1416
 
 
   HddsVolumeChecker should use Java's Optional in place of Guava's Optional, 
as Guava's Optional is marked unstable.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 309085)
Remaining Estimate: 0h
Time Spent: 10m

> HddsVolumeChecker should use java optional in place of Guava optional
> -
>
> Key: HDDS-2102
> URL: https://issues.apache.org/jira/browse/HDDS-2102
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HddsVolumeChecker should use Java's Optional in place of Guava's Optional, 
> as the Guava dependency is marked unstable.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2102) HddsVolumeChecker should use java optional in place of Guava optional

2019-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2102:
-
Labels: pull-request-available  (was: )

> HddsVolumeChecker should use java optional in place of Guava optional
> -
>
> Key: HDDS-2102
> URL: https://issues.apache.org/jira/browse/HDDS-2102
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Priority: Major
>  Labels: pull-request-available
>
> HddsVolumeChecker should use Java's Optional in place of Guava's Optional, 
> as the Guava dependency is marked unstable.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1843) Undetectable corruption after restart of a datanode

2019-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1843?focusedWorklogId=309083=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309083
 ]

ASF GitHub Bot logged work on HDDS-1843:


Author: ASF GitHub Bot
Created on: 09/Sep/19 17:16
Start Date: 09/Sep/19 17:16
Worklog Time Spent: 10m 
  Work Description: bshashikant commented on pull request #1364: HDDS-1843. 
Undetectable corruption after restart of a datanode.
URL: https://github.com/apache/hadoop/pull/1364
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 309083)
Time Spent: 9h 50m  (was: 9h 40m)

> Undetectable corruption after restart of a datanode
> ---
>
> Key: HDDS-1843
> URL: https://issues.apache.org/jira/browse/HDDS-1843
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.5.0
>
> Attachments: HDDS-1843.000.patch
>
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> Right now, all chunk writes use buffered IO, i.e. the sync flag is disabled 
> by default. Also, RocksDB metadata updates are applied to the RocksDB cache 
> first on the datanode. If both the buffered chunk data and the corresponding 
> metadata update are lost as part of a datanode restart, it may lead to a 
> situation where corruption of this nature cannot be detected (not even by 
> the container scanner) in a reasonable time frame, unless there is a client 
> IO failure or the Recon server detects it over time. To at least detect the 
> problem, the Ratis snapshot on the datanode should sync the RocksDB file; 
> that way, the ContainerScanner will be able to detect this. We can also add 
> a metric around sync to measure how much of a throughput loss it incurs.
> Thanks [~msingh] for suggesting this.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1843) Undetectable corruption after restart of a datanode

2019-09-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1843:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Undetectable corruption after restart of a datanode
> ---
>
> Key: HDDS-1843
> URL: https://issues.apache.org/jira/browse/HDDS-1843
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.5.0
>
> Attachments: HDDS-1843.000.patch
>
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> Right now, all chunk writes use buffered IO, i.e. the sync flag is disabled 
> by default. Also, RocksDB metadata updates are applied to the RocksDB cache 
> first on the datanode. If both the buffered chunk data and the corresponding 
> metadata update are lost as part of a datanode restart, it may lead to a 
> situation where corruption of this nature cannot be detected (not even by 
> the container scanner) in a reasonable time frame, unless there is a client 
> IO failure or the Recon server detects it over time. To at least detect the 
> problem, the Ratis snapshot on the datanode should sync the RocksDB file; 
> that way, the ContainerScanner will be able to detect this. We can also add 
> a metric around sync to measure how much of a throughput loss it incurs.
> Thanks [~msingh] for suggesting this.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1843) Undetectable corruption after restart of a datanode

2019-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1843?focusedWorklogId=309082=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309082
 ]

ASF GitHub Bot logged work on HDDS-1843:


Author: ASF GitHub Bot
Created on: 09/Sep/19 17:15
Start Date: 09/Sep/19 17:15
Worklog Time Spent: 10m 
  Work Description: bshashikant commented on issue #1364: HDDS-1843. 
Undetectable corruption after restart of a datanode.
URL: https://github.com/apache/hadoop/pull/1364#issuecomment-529579125
 
 
   Thanks @nandakumar131  @mukul1987 @supratimdeka for the reviews. I have 
committed this change to trunk.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 309082)
Time Spent: 9h 40m  (was: 9.5h)

> Undetectable corruption after restart of a datanode
> ---
>
> Key: HDDS-1843
> URL: https://issues.apache.org/jira/browse/HDDS-1843
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.5.0
>
> Attachments: HDDS-1843.000.patch
>
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> Right now, all chunk writes use buffered IO, i.e. the sync flag is disabled 
> by default. Also, RocksDB metadata updates are applied to the RocksDB cache 
> first on the datanode. If both the buffered chunk data and the corresponding 
> metadata update are lost as part of a datanode restart, it may lead to a 
> situation where corruption of this nature cannot be detected (not even by 
> the container scanner) in a reasonable time frame, unless there is a client 
> IO failure or the Recon server detects it over time. To at least detect the 
> problem, the Ratis snapshot on the datanode should sync the RocksDB file; 
> that way, the ContainerScanner will be able to detect this. We can also add 
> a metric around sync to measure how much of a throughput loss it incurs.
> Thanks [~msingh] for suggesting this.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14795) Add Throttler for writing block

2019-09-09 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925927#comment-16925927
 ] 

Íñigo Goiri commented on HDFS-14795:


Thanks [~leosun08], I think this is more readable now.
For "isWrite()" I would use ifs instead of switch:
{code}
if (stage == PIPELINE_SETUP_STREAMING_RECOVERY) {
  return true;
} else if (stage == PIPELINE_SETUP_APPEND_RECOVERY) {
  return true;
} else {
  return false;
}
{code}
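An equivalent one-liner (just a sketch, not a quote from any patch) would be:
{code:java}
return stage == PIPELINE_SETUP_STREAMING_RECOVERY
    || stage == PIPELINE_SETUP_APPEND_RECOVERY;
{code}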
A minor nit: the indentation at DFSConfigKeys#123 doesn't seem consistent.

> Add Throttler for writing block
> ---
>
> Key: HDFS-14795
> URL: https://issues.apache.org/jira/browse/HDFS-14795
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14795.001.patch, HDFS-14795.002.patch, 
> HDFS-14795.003.patch, HDFS-14795.004.patch
>
>
> DataXceiver#writeBlock
> {code:java}
> blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut,
> mirrorAddr, null, targets, false);
> {code}
> As the above code shows, DataXceiver#writeBlock does not throttle.
>  I think it is necessary to throttle block writes by adding a throttler for 
> the PIPELINE_SETUP_APPEND_RECOVERY and 
> PIPELINE_SETUP_STREAMING_RECOVERY stages.
> The default throttler value is still null.
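
For illustration, the fifth argument of {{receiveBlock}} above appears to be 
the throttler slot; a minimal sketch of the wiring (the {{writeThrottler}} 
field and {{isWrite}} helper are assumptions, not the actual patch):
{code:java}
// Hedged sketch: pass a DataTransferThrottler only for the recovery stages,
// keeping the default behaviour (null, i.e. unthrottled) otherwise.
DataTransferThrottler throttler = isWrite(stage) ? writeThrottler : null;
blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut,
    mirrorAddr, throttler, targets, false);
{code}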



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2102) HddsVolumeChecker should use java optional in place of Guava optional

2019-09-09 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDDS-2102:
---

 Summary: HddsVolumeChecker should use java optional in place of 
Guava optional
 Key: HDDS-2102
 URL: https://issues.apache.org/jira/browse/HDDS-2102
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh


HddsVolumeChecker should use Java's Optional in place of Guava's Optional, as 
the Guava dependency is marked unstable.
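
A minimal, self-contained sketch of the swap (not the HddsVolumeChecker code 
itself):
{code:java}
import java.util.Optional;

// Hedged sketch: the java.util equivalents of the Guava calls.
//   Guava Optional.absent() -> Optional.empty()
//   Guava Optional.of(x)    -> Optional.of(x)
//   Guava optional.orNull() -> optional.orElse(null)
public final class OptionalSwapExample {
  static Optional<String> check(boolean healthy) {
    return healthy ? Optional.of("volume-ok") : Optional.empty();
  }

  public static void main(String[] args) {
    System.out.println(check(false).orElse("no-result"));
  }
}
{code}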



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2101) Ozone filesystem provider doesn't exist

2019-09-09 Thread Elek, Marton (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925893#comment-16925893
 ] 

Elek, Marton commented on HDDS-2101:


The problem is that the exact implementation depends on the current 
environment. For a legacy Hadoop it should be BasicOzoneFileSystem; for 
Hadoop 3.2 it should be OzoneFileSystem...
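
For reference, such a provider file just lists the implementation class, one 
per line; a sketch (which class is right depends on the environment, as noted 
above):
{code}
# hadoop-ozone/ozonefs/src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem
org.apache.hadoop.fs.ozone.OzoneFileSystem
{code}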

> Ozone filesystem provider doesn't exist
> ---
>
> Key: HDDS-2101
> URL: https://issues.apache.org/jira/browse/HDDS-2101
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Reporter: Jitendra Nath Pandey
>Assignee: Vivek Ratnavel Subramanian
>Priority: Critical
>
> We don't have a filesystem provider in META-INF. 
> i.e. following file doesn't exist.
> {{hadoop-ozone/ozonefs/src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem}}
> See for example
> {{hadoop-tools/hadoop-aws/src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem}}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent

2019-09-09 Thread Doroszlai, Attila (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doroszlai, Attila updated HDDS-2075:

Status: Patch Available  (was: In Progress)

> Tracing in OzoneManager call is propagated with wrong parent
> 
>
> Key: HDDS-2075
> URL: https://issues.apache.org/jira/browse/HDDS-2075
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Elek, Marton
>Assignee: Doroszlai, Attila
>Priority: Major
>  Labels: pull-request-available
> Attachments: create_bucket-new.png, create_bucket.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As you can see in the attached screenshot, the OzoneManager.createBucket 
> (server side) tracing information is a child of freon.createBucket 
> instead of the freon OzoneManagerProtocolPB.submitRequest.
> To avoid confusion the hierarchy should be fixed (most probably we generate 
> the child span AFTER we have already serialized the parent one into the message).



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent

2019-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2075?focusedWorklogId=309017=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309017
 ]

ASF GitHub Bot logged work on HDDS-2075:


Author: ASF GitHub Bot
Created on: 09/Sep/19 16:17
Start Date: 09/Sep/19 16:17
Worklog Time Spent: 10m 
  Work Description: adoroszlai commented on issue #1415: HDDS-2075. Tracing 
in OzoneManager call is propagated with wrong parent
URL: https://github.com/apache/hadoop/pull/1415#issuecomment-529555336
 
 
   /label ozone
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 309017)
Time Spent: 20m  (was: 10m)

> Tracing in OzoneManager call is propagated with wrong parent
> 
>
> Key: HDDS-2075
> URL: https://issues.apache.org/jira/browse/HDDS-2075
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Elek, Marton
>Assignee: Doroszlai, Attila
>Priority: Major
>  Labels: pull-request-available
> Attachments: create_bucket-new.png, create_bucket.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As you can see in the attached screenshot, the OzoneManager.createBucket 
> (server side) tracing information is a child of freon.createBucket 
> instead of the freon OzoneManagerProtocolPB.submitRequest.
> To avoid confusion the hierarchy should be fixed (most probably we generate 
> the child span AFTER we have already serialized the parent one into the message).



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent

2019-09-09 Thread Doroszlai, Attila (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doroszlai, Attila updated HDDS-2075:

Attachment: create_bucket-new.png

> Tracing in OzoneManager call is propagated with wrong parent
> 
>
> Key: HDDS-2075
> URL: https://issues.apache.org/jira/browse/HDDS-2075
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Elek, Marton
>Assignee: Doroszlai, Attila
>Priority: Major
>  Labels: pull-request-available
> Attachments: create_bucket-new.png, create_bucket.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As you can see in the attached screenshot, the OzoneManager.createBucket 
> (server side) tracing information is a child of freon.createBucket 
> instead of the freon OzoneManagerProtocolPB.submitRequest.
> To avoid confusion the hierarchy should be fixed (most probably we generate 
> the child span AFTER we have already serialized the parent one into the message).



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent

2019-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2075?focusedWorklogId=309015=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309015
 ]

ASF GitHub Bot logged work on HDDS-2075:


Author: ASF GitHub Bot
Created on: 09/Sep/19 16:16
Start Date: 09/Sep/19 16:16
Worklog Time Spent: 10m 
  Work Description: adoroszlai commented on pull request #1415: HDDS-2075. 
Tracing in OzoneManager call is propagated with wrong parent
URL: https://github.com/apache/hadoop/pull/1415
 
 
   ## What changes were proposed in this pull request?
   
   Apply tracing to `OzoneManagerProtocol` instead of `OzoneManagerProtocolPB`. 
 The latter only has a single public method, and no other `*ProtocolPB` 
interface is traced.
   
   https://issues.apache.org/jira/browse/HDDS-2075
   
   ## How was this patch tested?
   
   Verified operation hierarchy in Jaeger UI.
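
A minimal sketch of the intended ordering with plain OpenTracing (not the 
actual Ozone tracing helper; {{tracer}}, {{request}}, and {{submitRequest}} 
are assumptions): start and activate the child span before serializing and 
submitting the request.
{code:java}
// Hedged sketch: the span is active while the request is built and sent,
// so the server-side span becomes its child, not a sibling.
// (io.opentracing.Span / io.opentracing.Scope)
Span span = tracer.buildSpan("OzoneManagerProtocol.createBucket").start();
try (Scope scope = tracer.activateSpan(span)) {
  return submitRequest(request); // serialize + send inside the active span
} finally {
  span.finish();
}
{code}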
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 309015)
Remaining Estimate: 0h
Time Spent: 10m

> Tracing in OzoneManager call is propagated with wrong parent
> 
>
> Key: HDDS-2075
> URL: https://issues.apache.org/jira/browse/HDDS-2075
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Elek, Marton
>Assignee: Doroszlai, Attila
>Priority: Major
>  Labels: pull-request-available
> Attachments: create_bucket.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As you can see in the attached screenshot, the OzoneManager.createBucket 
> (server side) tracing information is a child of freon.createBucket 
> instead of the freon OzoneManagerProtocolPB.submitRequest.
> To avoid confusion the hierarchy should be fixed (most probably we generate 
> the child span AFTER we have already serialized the parent one into the message).



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent

2019-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2075:
-
Labels: pull-request-available  (was: )

> Tracing in OzoneManager call is propagated with wrong parent
> 
>
> Key: HDDS-2075
> URL: https://issues.apache.org/jira/browse/HDDS-2075
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Elek, Marton
>Assignee: Doroszlai, Attila
>Priority: Major
>  Labels: pull-request-available
> Attachments: create_bucket.png
>
>
> As you can see in the attached screenshot, the OzoneManager.createBucket 
> (server side) tracing information is a child of freon.createBucket 
> instead of the freon OzoneManagerProtocolPB.submitRequest.
> To avoid confusion the hierarchy should be fixed (most probably we generate 
> the child span AFTER we have already serialized the parent one into the message).



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent

2019-09-09 Thread Doroszlai, Attila (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doroszlai, Attila updated HDDS-2075:

Attachment: create_bucket.png

> Tracing in OzoneManager call is propagated with wrong parent
> 
>
> Key: HDDS-2075
> URL: https://issues.apache.org/jira/browse/HDDS-2075
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Elek, Marton
>Assignee: Doroszlai, Attila
>Priority: Major
> Attachments: create_bucket.png
>
>
> As you can see in the attached screenshot, the OzoneManager.createBucket 
> (server side) tracing information is a child of freon.createBucket 
> instead of the freon OzoneManagerProtocolPB.submitRequest.
> To avoid confusion the hierarchy should be fixed (most probably we generate 
> the child span AFTER we have already serialized the parent one into the message).



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent

2019-09-09 Thread Doroszlai, Attila (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doroszlai, Attila updated HDDS-2075:

Target Version/s: 0.5.0

> Tracing in OzoneManager call is propagated with wrong parent
> 
>
> Key: HDDS-2075
> URL: https://issues.apache.org/jira/browse/HDDS-2075
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Elek, Marton
>Assignee: Doroszlai, Attila
>Priority: Major
>
> As you can see in the attached screenshot, the OzoneManager.createBucket 
> (server side) tracing information is a child of freon.createBucket 
> instead of the freon OzoneManagerProtocolPB.submitRequest.
> To avoid confusion the hierarchy should be fixed (most probably we generate 
> the child span AFTER we have already serialized the parent one into the message).



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent

2019-09-09 Thread Doroszlai, Attila (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-2075 started by Doroszlai, Attila.
---
> Tracing in OzoneManager call is propagated with wrong parent
> 
>
> Key: HDDS-2075
> URL: https://issues.apache.org/jira/browse/HDDS-2075
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Elek, Marton
>Assignee: Doroszlai, Attila
>Priority: Major
>
> As you can see in the attached screenshot, the OzoneManager.createBucket 
> (server side) tracing information is a child of freon.createBucket 
> instead of the freon OzoneManagerProtocolPB.submitRequest.
> To avoid confusion the hierarchy should be fixed (most probably we generate 
> the child span AFTER we have already serialized the parent one into the message).



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent

2019-09-09 Thread Doroszlai, Attila (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doroszlai, Attila reassigned HDDS-2075:
---

Assignee: Doroszlai, Attila

> Tracing in OzoneManager call is propagated with wrong parent
> 
>
> Key: HDDS-2075
> URL: https://issues.apache.org/jira/browse/HDDS-2075
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Elek, Marton
>Assignee: Doroszlai, Attila
>Priority: Major
>
> As you can see in the attached screenshot, the OzoneManager.createBucket 
> (server side) tracing information is a child of freon.createBucket 
> instead of the freon OzoneManagerProtocolPB.submitRequest.
> To avoid confusion the hierarchy should be fixed (most probably we generate 
> the child span AFTER we have already serialized the parent one into the message).



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down

2019-09-09 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925848#comment-16925848
 ] 

Erik Krogen commented on HDFS-14655:


Looks better to me, thanks [~ayushtkn]! I do think we should rename the config 
and update the description to represent that this config is a _maximum_ thread 
count; the way it reads now, I would assume that there are always this many 
threads being used.

One thing I noticed, you used a keepalive time of 0:
{code}
return new HadoopThreadPoolExecutor(1, numThreads, 0L,
    TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>(),
{code}
I feel a longer time would probably be better; if more than 1 thread is needed, 
it will probably be needed again soon (might represent a slow JN?), so it seems 
some keepalive would be helpful to avoid the thread creation overhead. Also you 
can use 
[diamond-typing|https://docs.oracle.com/javase/tutorial/java/generics/types.html#diamond]
 here for the {{LinkedBlockingQueue}} instantiation.
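
A minimal sketch combining both suggestions (the keepalive value and the 
trailing {{threadFactory}} argument are assumptions):
{code:java}
// Hedged sketch: a nonzero keepalive lets an extra thread linger briefly
// (a slow JN will likely need it again soon), and the diamond avoids
// repeating the type argument.
return new HadoopThreadPoolExecutor(1, numThreads,
    60L, TimeUnit.SECONDS,
    new LinkedBlockingQueue<>(),
    threadFactory);
{code}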

> [SBN Read] Namenode crashes if one of The JN is down
> 
>
> Key: HDFS-14655
> URL: https://issues.apache.org/jira/browse/HDFS-14655
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Harshakiran Reddy
>Assignee: Ayush Saxena
>Priority: Critical
> Attachments: HDFS-14655-01.patch, HDFS-14655-02.patch, 
> HDFS-14655-03.patch, HDFS-14655.poc.patch
>
>
> {noformat}
> 2019-07-04 17:35:54,064 | INFO  | Logger channel (from parallel executor) to 
> XXX/XXX | Retrying connect to server: XXX/XXX. Already tried 
> 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, 
> sleepTime=1000 MILLISECONDS) | Client.java:975
> 2019-07-04 17:35:54,087 | FATAL | Edit log tailer | Unknown error encountered 
> while tailing edits. Shutting down standby NN. | EditLogTailer.java:474
> java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:717)
>   at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
>   at 
> com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:440)
>   at 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:56)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.getJournaledEdits(IPCLoggerChannel.java:565)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.getJournaledEdits(AsyncLoggerSet.java:272)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectRpcInputStreams(QuorumJournalManager.java:533)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:508)
>   at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:275)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1681)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1714)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:307)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:483)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
> 2019-07-04 17:35:54,112 | INFO  | Edit log tailer | Exiting with status 1: 
> java.lang.OutOfMemoryError: unable to create new native thread | 
> ExitUtil.java:210
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14655) [SBN Read] Namenode crashes if one of The JN is down

2019-09-09 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925848#comment-16925848
 ] 

Erik Krogen edited comment on HDFS-14655 at 9/9/19 3:43 PM:


Looks better to me, thanks [~ayushtkn]! I do think we should rename the config 
and update the description to reflect that this config is a _maximum_ thread 
count; the way it reads now, I would assume that there are always this many 
threads being used.

One thing I noticed, you used a keepalive time of 0:
{code}
return new HadoopThreadPoolExecutor(1, numThreads, 0L,
TimeUnit.MILLISECONDS, new LinkedBlockingQueue(),
{code}
I feel a longer time would probably be better; if more than 1 thread is needed, 
it will probably be needed again soon (might represent a slow JN?), so it seems 
some keepalive would be helpful to avoid the thread creation overhead. Also you 
can use 
[diamond-typing|https://docs.oracle.com/javase/tutorial/java/generics/types.html#diamond]
 here for the {{LinkedBlockingQueue}} instantiation.

[~shv], does the current approach address your previous concerns?


was (Author: xkrogen):
Looks better to me, thanks [~ayushtkn]! I do think we should rename the config 
and update the description to reflect that this config is a _maximum_ thread 
count; the way it reads now, I would assume that there are always this many 
threads being used.

One thing I noticed, you used a keepalive time of 0:
{code}
return new HadoopThreadPoolExecutor(1, numThreads, 0L,
TimeUnit.MILLISECONDS, new LinkedBlockingQueue(),
{code}
I feel a longer time would probably be better; if more than 1 thread is needed, 
it will probably be needed again soon (might represent a slow JN?), so it seems 
some keepalive would be helpful to avoid the thread creation overhead. Also you 
can use 
[diamond-typing|https://docs.oracle.com/javase/tutorial/java/generics/types.html#diamond]
 here for the {{LinkedBlockingQueue}} instantiation.

> [SBN Read] Namenode crashes if one of The JN is down
> 
>
> Key: HDFS-14655
> URL: https://issues.apache.org/jira/browse/HDFS-14655
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Harshakiran Reddy
>Assignee: Ayush Saxena
>Priority: Critical
> Attachments: HDFS-14655-01.patch, HDFS-14655-02.patch, 
> HDFS-14655-03.patch, HDFS-14655.poc.patch
>
>
> {noformat}
> 2019-07-04 17:35:54,064 | INFO  | Logger channel (from parallel executor) to 
> XXX/XXX | Retrying connect to server: XXX/XXX. Already tried 
> 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, 
> sleepTime=1000 MILLISECONDS) | Client.java:975
> 2019-07-04 17:35:54,087 | FATAL | Edit log tailer | Unknown error encountered 
> while tailing edits. Shutting down standby NN. | EditLogTailer.java:474
> java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:717)
>   at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
>   at 
> com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:440)
>   at 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:56)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.getJournaledEdits(IPCLoggerChannel.java:565)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.getJournaledEdits(AsyncLoggerSet.java:272)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectRpcInputStreams(QuorumJournalManager.java:533)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:508)
>   at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:275)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1681)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1714)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:307)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:410)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 

[jira] [Commented] (HDFS-14699) Erasure Coding: Storage not considered in live replica when replication streams hard limit reached to threshold

2019-09-09 Thread Zhao Yi Ming (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925775#comment-16925775
 ] 

Zhao Yi Ming commented on HDFS-14699:
-

[~surendrasingh]  Good point! You are right: only the srcNodes need to be 
under the replicationStreamsHardLimit control; liveBlockIndices is used only 
for the reconstruction work, so it can be moved before the threshold check. I 
will make the changes and test them in our testing env. If everything goes 
well (the testing needs some time; I hope to get hold of the testing env as 
soon as possible), I will submit a new patch addressing your comments. Thanks 
again!
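
For illustration, a minimal, self-contained sketch of that reordering (all 
names here are invented; this is not the actual 
BlockManager#chooseSourceDatanodes code):

{code:java}
import java.util.List;

class SourceSelectionSketch {
  // Hypothetical replica record; the real code works on DatanodeStorageInfo.
  static class Replica {
    int blockIndex;
    boolean live;
    int pendingReplicationStreams;
  }

  // liveBlockIndices is filled BEFORE the hard-limit check, so a busy node's
  // replica still counts as live; only source selection is throttled.
  static void choose(List<Replica> replicas, int hardLimit,
      List<Integer> liveBlockIndices, List<Replica> srcNodes) {
    for (Replica r : replicas) {
      if (r.live) {
        liveBlockIndices.add(r.blockIndex); // moved above the threshold check
      }
      if (r.pendingReplicationStreams >= hardLimit) {
        continue; // too busy to serve as a reconstruction source
      }
      srcNodes.add(r);
    }
  }
}
{code}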

> Erasure Coding: Storage not considered in live replica when replication 
> streams hard limit reached to threshold
> ---
>
> Key: HDFS-14699
> URL: https://issues.apache.org/jira/browse/HDFS-14699
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.2.0, 3.1.1, 3.3.0
>Reporter: Zhao Yi Ming
>Assignee: Zhao Yi Ming
>Priority: Critical
>  Labels: patch
> Attachments: HDFS-14699.00.patch, HDFS-14699.01.patch, 
> HDFS-14699.02.patch, HDFS-14699.03.patch, HDFS-14699.04.patch, 
> HDFS-14699.05.patch, image-2019-08-20-19-58-51-872.png, 
> image-2019-09-02-17-51-46-742.png
>
>
> We tried the EC function on an 80-node cluster with hadoop 3.1.1 and hit the 
> same scenario as described in https://issues.apache.org/jira/browse/HDFS-8881. 
> Following are our testing steps; hope they are helpful. (The following DNs 
> hold the testing internal blocks.)
>  # We customized a new 10-2-1024k policy and used it on a path; now we have 
> 12 internal blocks (12 live blocks).
>  # Decommission one DN; after the decommission completes, we have 13 
> internal blocks (12 live blocks and 1 decommissioned block).
>  # Then shut down one DN that does not hold the same block id as the 
> decommissioned block; now we have 12 internal blocks (11 live blocks and 1 
> decommissioned block).
>  # After waiting about 600s (before the heartbeat comes), recommission the 
> decommissioned DN; now we have 12 internal blocks (11 live blocks and 1 
> duplicate block).
>  # Then EC does not reconstruct the missing block.
> We think this is a critical issue for using the EC function in a production 
> env. Could you help? Thanks a lot!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2076) Read fails because the block cannot be located in the container

2019-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2076?focusedWorklogId=308962=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-308962
 ]

ASF GitHub Bot logged work on HDDS-2076:


Author: ASF GitHub Bot
Created on: 09/Sep/19 15:04
Start Date: 09/Sep/19 15:04
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1410: HDDS-2076. Read 
fails because the block cannot be located in the container
URL: https://github.com/apache/hadoop/pull/1410#issuecomment-529520412
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 41 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 1 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 25 | Maven dependency ordering for branch |
   | +1 | mvninstall | 581 | trunk passed |
   | +1 | compile | 383 | trunk passed |
   | +1 | checkstyle | 81 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 881 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 178 | trunk passed |
   | 0 | spotbugs | 418 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 612 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 38 | Maven dependency ordering for patch |
   | +1 | mvninstall | 544 | the patch passed |
   | +1 | compile | 394 | the patch passed |
   | +1 | javac | 394 | the patch passed |
   | +1 | checkstyle | 88 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 713 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 177 | the patch passed |
   | +1 | findbugs | 704 | the patch passed |
   ||| _ Other Tests _ |
   | -1 | unit | 196 | hadoop-hdds in the patch failed. |
   | -1 | unit | 195 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 46 | The patch does not generate ASF License warnings. |
   | | | 6058 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.ozone.container.keyvalue.TestKeyValueContainer 
|
   |   | hadoop.ozone.container.ozoneimpl.TestOzoneContainer |
   |   | hadoop.ozone.om.ratis.TestOzoneManagerRatisServer |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1410/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1410 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux cdb643d21b64 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 60af879 |
   | Default Java | 1.8.0_222 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1410/3/artifact/out/patch-unit-hadoop-hdds.txt
 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1410/3/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1410/3/testReport/ |
   | Max. process+thread count | 1263 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/container-service hadoop-ozone/integration-test 
U: . |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1410/3/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 308962)
Time Spent: 1.5h  (was: 1h 20m)

> Read fails because the block cannot be located in the container
> ---
>
> Key: HDDS-2076
> URL: https://issues.apache.org/jira/browse/HDDS-2076
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client, Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: 

[jira] [Commented] (HDFS-14303) check block directory logic not correct when there is only meta file, print no meaning warn log

2019-09-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925758#comment-16925758
 ] 

Hadoop QA commented on HDFS-14303:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-3.2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
23s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
18s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
56s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
31s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 45s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
15s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
9s{color} | {color:green} branch-3.2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 54s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}117m 35s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}195m 28s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestLeaseRecovery2 |
|   | hadoop.hdfs.server.diskbalancer.TestDiskBalancer |
|   | hadoop.hdfs.TestDecommission |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | hadoop.hdfs.server.namenode.ha.TestHAAppend |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.2 Server=19.03.2 Image:yetus/hadoop:63396beab41 |
| JIRA Issue | HDFS-14303 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12979846/HDFS-14303-branch-3.2.addendum.03.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 7b28d52a7e2e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-3.2 / f6cc887 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27824/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27824/testReport/ |
| Max. process+thread count | 2848 (vs. ulimit of 5500) |

[jira] [Assigned] (HDFS-14836) FileIoProvider should not increase FileIoErrors metric in datanode volume metric

2019-09-09 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reassigned HDFS-14836:
--

Assignee: Aiphago

> FileIoProvider should not increase FileIoErrors metric in datanode volume 
> metric
> 
>
> Key: HDFS-14836
> URL: https://issues.apache.org/jira/browse/HDFS-14836
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.1
>Reporter: Aiphago
>Assignee: Aiphago
>Priority: Minor
>
> I found that the FileIoErrors metric is incremented in 
> BlockSender.sendPacket() when fileIoProvider.transferToSocketFully() is 
> used. But in https://issues.apache.org/jira/browse/HDFS-2054 such 
> exceptions, like "Broken pipe" and "Connection reset", are ignored.
> So should we filter these exceptions when fileIoProvider increments the 
> FileIoErrors count?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2082) Fix flaky TestContainerStateMachineFailures#testApplyTransactionFailure

2019-09-09 Thread Doroszlai, Attila (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925690#comment-16925690
 ] 

Doroszlai, Attila commented on HDDS-2082:
-

[~shashikant], more often, 
{{TestContainerStateMachineFailures#testApplyTransactionFailure}} fails (with 
an error) due to an [exception type mismatch in response to the close container 
request|https://github.com/apache/hadoop/blob/60af8793b45b4057101a22e4248d7ca022b52d79/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java#L328-L334].
The root cause of the {{IOException}} is a {{StateMachineException}}, which 
{{checkForException}} does not expect, so the {{IOException}} is re-thrown.

{code}
StateMachineException: 
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: 
Error while creating/ updating .container file. ContainerID: 5
{code}
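
For illustration, a hedged sketch of the kind of cause-chain inspection that 
would treat such a case as expected (invented names; this is not the actual 
{{checkForException}} implementation):

{code:java}
class CauseChainSketch {
  // Illustrative only: walk the cause chain and match any expected type,
  // instead of inspecting just the top-level IOException.
  @SafeVarargs
  static Throwable findExpectedCause(Throwable t,
      Class<? extends Throwable>... expected) {
    for (Throwable cur = t; cur != null; cur = cur.getCause()) {
      for (Class<? extends Throwable> cls : expected) {
        if (cls.isInstance(cur)) {
          return cur; // an expected exception type is in the chain
        }
      }
    }
    return null; // nothing matched; the caller would re-throw the original
  }
}
{code}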

https://github.com/elek/ozone-ci/blob/master/trunk/trunk-nightly-zfkm8/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures.txt
https://github.com/elek/ozone-ci/blob/master/pr/pr-hdds-1569-5th2c/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures.txt
https://github.com/elek/ozone-ci/blob/master/pr/pr-hdds-2060-hng4s/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures.txt
https://github.com/elek/ozone-ci/blob/master/pr/pr-hdds-1094-hnp8f/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures.txt
https://github.com/elek/ozone-ci/blob/master/pr/pr-hdds-2002-fbg9h/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures.txt
https://github.com/elek/ozone-ci/blob/master/pr/pr-hdds-1094-85qxc/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures.txt
https://github.com/elek/ozone-ci/blob/master/pr/pr-hdds-1571-bx9p4/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures.txt
https://github.com/elek/ozone-ci/blob/master/pr/pr-hdds-2064-v25ns/integration/hadoop-ozone/integration-test/org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures.txt



> Fix flaky TestContainerStateMachineFailures#testApplyTransactionFailure
> ---
>
> Key: HDDS-2082
> URL: https://issues.apache.org/jira/browse/HDDS-2082
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Dinesh Chitlangia
>Priority: Major
>
> {code:java}
> ---
> Test set: org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures
> ---
> Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 102.615 s <<< 
> FAILURE! - in 
> org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures
> testApplyTransactionFailure(org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures)
>   Time elapsed: 15.677 s  <<< FAILURE!
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures.testApplyTransactionFailure(TestContainerStateMachineFailures.java:349)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>  

[jira] [Commented] (HDFS-14836) FileIoProvider should not increase FileIoErrors metric in datanode volume metric

2019-09-09 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925670#comment-16925670
 ] 

Wei-Chiu Chuang commented on HDFS-14836:


I don't think I understand the proposal here.

Upon an Exception in BlockSender.sendPacket(), FileIoErrors is incremented. But 
you don't want "Broken pipe" and "Connection reset" to increment FileIoErrors, 
am I right? Is that because those exceptions are network issues, not local 
disk issues?
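
For illustration, a minimal sketch of the proposed filter (invented names; not 
the actual FileIoProvider code, and similar in spirit to the message check 
HDFS-2054 added in BlockSender):

{code:java}
import java.io.IOException;

class FileIoErrorFilterSketch {
  // Treat socket-level errors as network issues, not disk errors.
  static boolean isNetworkRelated(IOException e) {
    String msg = e.getMessage();
    return msg != null
        && (msg.startsWith("Broken pipe")
            || msg.startsWith("Connection reset"));
  }

  static void onFileIoError(IOException e, Runnable incrementFileIoErrors) {
    if (!isNetworkRelated(e)) {
      incrementFileIoErrors.run(); // count only genuine local-disk failures
    }
  }
}
{code}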

> FileIoProvider should not increase FileIoErrors metric in datanode volume 
> metric
> 
>
> Key: HDFS-14836
> URL: https://issues.apache.org/jira/browse/HDFS-14836
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.1
>Reporter: Aiphago
>Priority: Minor
>
> I found that the FileIoErrors metric is incremented in 
> BlockSender.sendPacket() when fileIoProvider.transferToSocketFully() is 
> used. But in https://issues.apache.org/jira/browse/HDFS-2054 such 
> exceptions, like "Broken pipe" and "Connection reset", are ignored.
> So should we filter these exceptions when fileIoProvider increments the 
> FileIoErrors count?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14836) FileIoProvider should not increase FileIoErrors metric in datanode volume metric

2019-09-09 Thread He Xiaoqiao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925661#comment-16925661
 ] 

He Xiaoqiao commented on HDFS-14836:


Thanks [~Aiphag0] for the report. I agree that we should not count FileIoError 
when we hit certain explicit exceptions, as HDFS-2054 does; otherwise this 
counter gets polluted and is no longer a valuable reference.
[~jojochuang] [~ayushtkn], any thoughts? Could you help add [~Aiphag0] as a 
contributor and assign this JIRA to him?

> FileIoProvider should not increase FileIoErrors metric in datanode volume 
> metric
> 
>
> Key: HDFS-14836
> URL: https://issues.apache.org/jira/browse/HDFS-14836
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.1
>Reporter: Aiphago
>Priority: Minor
>
> I found that the FileIoErrors metric is incremented in 
> BlockSender.sendPacket() when fileIoProvider.transferToSocketFully() is 
> used. But in https://issues.apache.org/jira/browse/HDFS-2054 such 
> exceptions, like "Broken pipe" and "Connection reset", are ignored.
> So should we filter these exceptions when fileIoProvider increments the 
> FileIoErrors count?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14836) FileIoProvider should not increase FileIoErrors metric in datanode volume metric

2019-09-09 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-14836:
---
Summary: FileIoProvider should not increase FileIoErrors metric in datanode 
volume metric  (was: FileIoProvider will increase FileIoErrors metric in 
datanode volume metric)

> FileIoProvider should not increase FileIoErrors metric in datanode volume 
> metric
> 
>
> Key: HDFS-14836
> URL: https://issues.apache.org/jira/browse/HDFS-14836
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.1
>Reporter: Aiphago
>Priority: Minor
>
> I found that the FileIoErrors metric is incremented in 
> BlockSender.sendPacket() when fileIoProvider.transferToSocketFully() is 
> used. But in https://issues.apache.org/jira/browse/HDFS-2054 such 
> exceptions, like "Broken pipe" and "Connection reset", are ignored.
> So should we filter these exceptions when fileIoProvider increments the 
> FileIoErrors count?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint

2019-09-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925642#comment-16925642
 ] 

Hadoop QA commented on HDFS-14378:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  7s{color} 
| {color:red} HDFS-14378 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14378 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12966476/HDFS-14378-trunk.006.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27825/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Simplify the design of multiple NN and both logic of edit log roll and 
> checkpoint
> -
>
> Key: HDFS-14378
> URL: https://issues.apache.org/jira/browse/HDFS-14378
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, namenode
>Affects Versions: 3.1.2
>Reporter: star
>Assignee: star
>Priority: Major
> Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch, 
> HDFS-14378-trunk.003.patch, HDFS-14378-trunk.004.patch, 
> HDFS-14378-trunk.005.patch, HDFS-14378-trunk.006.patch
>
>
>       HDFS-6440 introduced a mechanism to support more than two NNs. It 
> implements a first-writer-wins policy to avoid duplicated fsimage downloads. 
> The variable 'isPrimaryCheckPointer' holds the first-writer state, with 
> which the SNN will provide the fsimage to the ANN next time. So we have 
> three roles in the NN cluster: the ANN, one primary SNN, and one or more 
> normal SNNs.
>       Since HDFS-12248, there may be more than two primary SNNs shortly 
> after an exception occurs. That change handles a scenario in which the SNN 
> will not upload the fsimage on IOExceptions and InterruptedExceptions. 
> Though this does not cause any further functional issues, it is 
> inconsistent.
>       Furthermore, the edit log may be rolled more frequently than necessary 
> with multiple standby NameNodes, HDFS-14349. (I'm not so sure about this; I 
> will verify it with unit tests, or anyone could point it out.)
>       Above all, I'm wondering if we could make it simple with the following 
> changes:
>  * There are only two roles: ANN and SNN.
>  * The ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
>  * The ANN will select an SNN to download the checkpoint from.
> The SNN will just do log tailing and checkpointing, and provide a servlet 
> for fsimage downloading as normal. The SNN will not try to roll the edit log 
> or send checkpoint requests to the ANN.
> In a word, the ANN will be more active. Suggestions are welcome.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14836) FileIoProvider will increase FileIoErrors metric in datanode volume metric

2019-09-09 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-14836:
---
Description: 
I found that the FileIoErrors metric is incremented in 
BlockSender.sendPacket() when fileIoProvider.transferToSocketFully() is used. 
But in https://issues.apache.org/jira/browse/HDFS-2054 such exceptions, like 
"Broken pipe" and "Connection reset", are ignored.

So should we filter these exceptions when fileIoProvider increments the 
FileIoErrors count?

  was:
I found that the FileIoErrors metric is incremented in 
BlockSender.sendPacket() when 

fileIoProvider.transferToSocketFully() is used. But in 
https://issues.apache.org/jira/browse/HDFS-2054 such exceptions, like 
"Broken pipe" and "Connection reset", are ignored. So should we filter these 
exceptions when fileIoProvider increments the FileIoErrors count?


> FileIoProvider will increase FileIoErrors metric in datanode volume metric
> --
>
> Key: HDFS-14836
> URL: https://issues.apache.org/jira/browse/HDFS-14836
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.1
>Reporter: Aiphago
>Priority: Minor
>
> I found that the FileIoErrors metric is incremented in 
> BlockSender.sendPacket() when fileIoProvider.transferToSocketFully() is 
> used. But in https://issues.apache.org/jira/browse/HDFS-2054 such 
> exceptions, like "Broken pipe" and "Connection reset", are ignored.
> So should we filter these exceptions when fileIoProvider increments the 
> FileIoErrors count?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14836) FileIoProvider will increase FileIoErrors metric in datanode volume metric

2019-09-09 Thread Aiphago (Jira)
Aiphago created HDFS-14836:
--

 Summary: FileIoProvider will increase FileIoErrors metric in 
datanode volume metric
 Key: HDFS-14836
 URL: https://issues.apache.org/jira/browse/HDFS-14836
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.9.1
Reporter: Aiphago


I found that the FileIoErrors metric is incremented in 
BlockSender.sendPacket() when 

fileIoProvider.transferToSocketFully() is used. But in 
https://issues.apache.org/jira/browse/HDFS-2054 such exceptions, like 
"Broken pipe" and "Connection reset", are ignored. So should we filter these 
exceptions when fileIoProvider increments the FileIoErrors count?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint

2019-09-09 Thread star (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925622#comment-16925622
 ] 

star edited comment on HDFS-14378 at 9/9/19 11:51 AM:
--

Thanks [~jojochuang] for the review and advice.

Section 7.6 of HDFS-1073 emphasizes the problem of multiple NNs. HDFS-6440 
didn't take care of edits rolling, and it avoids multiple fsimage uploads via 
the 'primary checkpointer' status for multiple SNNs.

I'd like to file two sub-JIRAs, for edits rolling and for fsimage downloading. 
The ANN will roll its own edit logs. As for the fsimage, there are two options 
as far as I'm concerned:
 # The SNN does its own checkpoint, and the ANN downloads the fsimage from a 
randomly selected SNN.
 # The ANN issues a checkpoint command to the SNNs via a special edit log op 
(like "OP_ROLLING_UPGRADE_START"), then the ANN downloads the fsimage from a 
randomly selected SNN.

[~jojochuang], [~tlipcon], what's your opinion?
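
For illustration, a rough, self-contained sketch of the "more active ANN" 
shape being proposed (all names invented for the sketch; this is not NameNode 
code):

{code:java}
import java.util.List;
import java.util.Random;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class ActiveNameNodeSketch {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final Random random = new Random();

  void start(long logRollPeriodMs, long checkpointPeriodMs,
      List<String> standbyAddresses) {
    // The ANN rolls its own edit log on a fixed period
    // (DFS_HA_LOGROLL_PERIOD_KEY in the proposal above).
    scheduler.scheduleAtFixedRate(this::rollEditLog,
        logRollPeriodMs, logRollPeriodMs, TimeUnit.MILLISECONDS);
    // The ANN periodically picks one SNN and downloads its fsimage.
    scheduler.scheduleAtFixedRate(() -> {
      String snn =
          standbyAddresses.get(random.nextInt(standbyAddresses.size()));
      downloadFsImageFrom(snn);
    }, checkpointPeriodMs, checkpointPeriodMs, TimeUnit.MILLISECONDS);
  }

  private void rollEditLog() { /* roll edits; stubbed for the sketch */ }

  private void downloadFsImageFrom(String snn) { /* HTTP fetch; stubbed */ }
}
{code}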


was (Author: starphin):
Thanks [~jojochuang] for the review and advice.

Section 7.6 of HDFS-1073 emphasizes the problem of multiple NNs. HDFS-6440 
didn't take care of edits rolling, and it avoids multiple fsimage uploads via 
the 'primary checkpointer' status for multiple SNNs.

I'd like to file two sub-JIRAs, for edits rolling and for fsimage downloading. 
The ANN will roll its own edit logs. As for the fsimage, there are two options 
as far as I'm concerned:
 # The SNN does its own checkpoint, and the ANN downloads the fsimage from a 
randomly selected SNN.
 # 2. The ANN issues a checkpoint command to the SNNs via a special edit log 
op (like "OP_ROLLING_UPGRADE_START"), then the ANN downloads the fsimage from 
a randomly selected SNN.

[~jojochuang], [~tlipcon], what's your opinion?

> Simplify the design of multiple NN and both logic of edit log roll and 
> checkpoint
> -
>
> Key: HDFS-14378
> URL: https://issues.apache.org/jira/browse/HDFS-14378
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, namenode
>Affects Versions: 3.1.2
>Reporter: star
>Assignee: star
>Priority: Major
> Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch, 
> HDFS-14378-trunk.003.patch, HDFS-14378-trunk.004.patch, 
> HDFS-14378-trunk.005.patch, HDFS-14378-trunk.006.patch
>
>
>       HDFS-6440 introduced a mechanism to support more than two NNs. It 
> implements a first-writer-wins policy to avoid duplicated fsimage downloads. 
> The variable 'isPrimaryCheckPointer' holds the first-writer state, with 
> which the SNN will provide the fsimage to the ANN next time. So we have 
> three roles in the NN cluster: the ANN, one primary SNN, and one or more 
> normal SNNs.
>       Since HDFS-12248, there may be more than two primary SNNs shortly 
> after an exception occurs. That change handles a scenario in which the SNN 
> will not upload the fsimage on IOExceptions and InterruptedExceptions. 
> Though this does not cause any further functional issues, it is 
> inconsistent.
>       Furthermore, the edit log may be rolled more frequently than necessary 
> with multiple standby NameNodes, HDFS-14349. (I'm not so sure about this; I 
> will verify it with unit tests, or anyone could point it out.)
>       Above all, I'm wondering if we could make it simple with the following 
> changes:
>  * There are only two roles: ANN and SNN.
>  * The ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
>  * The ANN will select an SNN to download the checkpoint from.
> The SNN will just do log tailing and checkpointing, and provide a servlet 
> for fsimage downloading as normal. The SNN will not try to roll the edit log 
> or send checkpoint requests to the ANN.
> In a word, the ANN will be more active. Suggestions are welcome.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint

2019-09-09 Thread star (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925622#comment-16925622
 ] 

star edited comment on HDFS-14378 at 9/9/19 11:51 AM:
--

Thanks [~jojochuang] for the review and advice.

Section 7.6 of HDFS-1073 emphasizes the problem of multiple NNs. HDFS-6440 
didn't take care of edits rolling, and it avoids multiple fsimage uploads via 
the 'primary checkpointer' status for multiple SNNs.

I'd like to file two sub-JIRAs, for edits rolling and for fsimage downloading. 
The ANN will roll its own edit logs. As for the fsimage, there are two options 
as far as I'm concerned:
 # The SNN does its own checkpoint, and the ANN downloads the fsimage from a 
randomly selected SNN.
 # 2. The ANN issues a checkpoint command to the SNNs via a special edit log 
op (like "OP_ROLLING_UPGRADE_START"), then the ANN downloads the fsimage from 
a randomly selected SNN.

[~jojochuang], [~tlipcon], what's your opinion?


was (Author: starphin):
Thanks [~jojochuang] for the review and advice.

Section 7.6 of HDFS-1073 emphasizes the problem of multiple NNs. HDFS-6440 
didn't take care of edits rolling, and it avoids multiple fsimage uploads via 
the 'primary checkpointer' status for multiple SNNs.

I'd like to file two sub-JIRAs, for edits rolling and for fsimage downloading. 
The ANN will roll its own edit logs. As for the fsimage, there are two options 
as far as I'm concerned: 1. The SNN does its own checkpoint, and the ANN 
downloads the fsimage from a randomly selected SNN. 2. The ANN issues a 
checkpoint command to the SNNs via a special edit log op (like 
"OP_ROLLING_UPGRADE_START"), then the ANN downloads the fsimage from a 
randomly selected SNN.

[~jojochuang], [~tlipcon], what's your opinion?

> Simplify the design of multiple NN and both logic of edit log roll and 
> checkpoint
> -
>
> Key: HDFS-14378
> URL: https://issues.apache.org/jira/browse/HDFS-14378
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, namenode
>Affects Versions: 3.1.2
>Reporter: star
>Assignee: star
>Priority: Major
> Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch, 
> HDFS-14378-trunk.003.patch, HDFS-14378-trunk.004.patch, 
> HDFS-14378-trunk.005.patch, HDFS-14378-trunk.006.patch
>
>
>       HDFS-6440 introduced a mechanism to support more than two NNs. It 
> implements a first-writer-wins policy to avoid duplicated fsimage downloads. 
> The variable 'isPrimaryCheckPointer' holds the first-writer state, with 
> which the SNN will provide the fsimage to the ANN next time. So we have 
> three roles in the NN cluster: the ANN, one primary SNN, and one or more 
> normal SNNs.
>       Since HDFS-12248, there may be more than two primary SNNs shortly 
> after an exception occurs. That change handles a scenario in which the SNN 
> will not upload the fsimage on IOExceptions and InterruptedExceptions. 
> Though this does not cause any further functional issues, it is 
> inconsistent.
>       Furthermore, the edit log may be rolled more frequently than necessary 
> with multiple standby NameNodes, HDFS-14349. (I'm not so sure about this; I 
> will verify it with unit tests, or anyone could point it out.)
>       Above all, I'm wondering if we could make it simple with the following 
> changes:
>  * There are only two roles: ANN and SNN.
>  * The ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
>  * The ANN will select an SNN to download the checkpoint from.
> The SNN will just do log tailing and checkpointing, and provide a servlet 
> for fsimage downloading as normal. The SNN will not try to roll the edit log 
> or send checkpoint requests to the ANN.
> In a word, the ANN will be more active. Suggestions are welcome.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


