[jira] [Commented] (HDFS-16875) Erasure Coding: data access proxy to allow old clients to read EC data
[ https://issues.apache.org/jira/browse/HDFS-16875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654209#comment-17654209 ]

Jing Zhao commented on HDFS-16875:
----------------------------------

Posted the design doc for the EC access proxy.

> Erasure Coding: data access proxy to allow old clients to read EC data
> ----------------------------------------------------------------------
>
>          Key: HDFS-16875
>          URL: https://issues.apache.org/jira/browse/HDFS-16875
>      Project: Hadoop HDFS
>   Issue Type: New Feature
>   Components: ec, erasure-coding
>     Reporter: Jing Zhao
>     Assignee: Jing Zhao
>     Priority: Major
>  Attachments: Erasure Coding Access Proxy.pdf
>
> Erasure Coding is only supported by Hadoop 3, while many production deployments still depend on Hadoop 2. Upgrading the whole data tech stack to Hadoop 3 may involve significant migration effort and even reliability risks, considering the incompatibilities between the two major Hadoop releases as well as potential undiscovered issues hidden in newer releases. Therefore, we need a solution, with the least migration effort and risk, that adopts Erasure Coding for cost efficiency while still allowing HDFS clients on old versions (Hadoop 2.x) to access EC data transparently.
>
> Internally we have developed an EC access proxy which translates EC data for old clients. We also extend the NameNode RPC so it can recognize HDFS clients with/without EC support and redirect the old clients to the proxy. With the proxy we set up separate Erasure Coding clusters storing hundreds of PB of data, while leaving other production clusters and all upper-layer applications untouched.
>
> Considering that some changes are made to fundamental components of HDFS (e.g., the client-NN RPC header), we do not aim to merge the change to trunk. We will use this ticket to share the design and implementation details (including the code) and collect feedback. We may use a separate GitHub repo to open-source the implementation later.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
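The redirect mechanism described above can be sketched as follows. This is an illustrative, hedged sketch, not the actual internal implementation: the class, method, and field names (`EcRedirectSketch`, `resolveReadEndpoint`, `MIN_EC_CLIENT_VERSION`) are hypothetical, standing in for the real client-NN RPC header check.

```java
import java.util.List;

// Hypothetical sketch: the NameNode inspects the client's advertised version
// (carried in the extended RPC header) and, for clients without EC support
// reading an erasure-coded file, substitutes the EC access proxy's address.
public class EcRedirectSketch {
    // Assumption for this sketch: Hadoop 3.x clients understand EC striping.
    static final int MIN_EC_CLIENT_VERSION = 3;

    public static String resolveReadEndpoint(int clientMajorVersion,
                                             boolean fileIsErasureCoded,
                                             List<String> dataNodes,
                                             String proxyAddress) {
        // Old (Hadoop 2.x) clients cannot decode striped blocks, so they are
        // redirected to the proxy, which reads and translates the EC data.
        if (fileIsErasureCoded && clientMajorVersion < MIN_EC_CLIENT_VERSION) {
            return proxyAddress;
        }
        // EC-aware clients (or replicated files) read from DataNodes directly.
        return dataNodes.isEmpty() ? proxyAddress : dataNodes.get(0);
    }
}
```

The point of the design is that the decision lives entirely on the server side, so upper-layer applications using old clients need no change at all.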
[jira] [Updated] (HDFS-16875) Erasure Coding: data access proxy to allow old clients to read EC data
[ https://issues.apache.org/jira/browse/HDFS-16875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-16875:
-----------------------------
    Attachment: Erasure Coding Access Proxy.pdf

> Erasure Coding: data access proxy to allow old clients to read EC data
> ----------------------------------------------------------------------
>
>          Key: HDFS-16875
>          URL: https://issues.apache.org/jira/browse/HDFS-16875
>      Project: Hadoop HDFS
>   Issue Type: New Feature
>   Components: ec, erasure-coding
>     Reporter: Jing Zhao
>     Assignee: Jing Zhao
>     Priority: Major
>  Attachments: Erasure Coding Access Proxy.pdf
>
> Erasure Coding is only supported by Hadoop 3, while many production deployments still depend on Hadoop 2. Upgrading the whole data tech stack to Hadoop 3 may involve significant migration effort and even reliability risks, considering the incompatibilities between the two major Hadoop releases as well as potential undiscovered issues hidden in newer releases. Therefore, we need a solution, with the least migration effort and risk, that adopts Erasure Coding for cost efficiency while still allowing HDFS clients on old versions (Hadoop 2.x) to access EC data transparently.
>
> Internally we have developed an EC access proxy which translates EC data for old clients. We also extend the NameNode RPC so it can recognize HDFS clients with/without EC support and redirect the old clients to the proxy. With the proxy we set up separate Erasure Coding clusters storing hundreds of PB of data, while leaving other production clusters and all upper-layer applications untouched.
>
> Considering that some changes are made to fundamental components of HDFS (e.g., the client-NN RPC header), we do not aim to merge the change to trunk. We will use this ticket to share the design and implementation details (including the code) and collect feedback. We may use a separate GitHub repo to open-source the implementation later.
[jira] [Created] (HDFS-16875) Erasure Coding: data access proxy to allow old clients to read EC data
Jing Zhao created HDFS-16875:
-----------------------------

         Summary: Erasure Coding: data access proxy to allow old clients to read EC data
             Key: HDFS-16875
             URL: https://issues.apache.org/jira/browse/HDFS-16875
         Project: Hadoop HDFS
      Issue Type: New Feature
      Components: ec, erasure-coding
        Reporter: Jing Zhao
        Assignee: Jing Zhao

Erasure Coding is only supported by Hadoop 3, while many production deployments still depend on Hadoop 2. Upgrading the whole data tech stack to Hadoop 3 may involve significant migration effort and even reliability risks, considering the incompatibilities between the two major Hadoop releases as well as potential undiscovered issues hidden in newer releases. Therefore, we need a solution, with the least migration effort and risk, that adopts Erasure Coding for cost efficiency while still allowing HDFS clients on old versions (Hadoop 2.x) to access EC data transparently.

Internally we have developed an EC access proxy which translates EC data for old clients. We also extend the NameNode RPC so it can recognize HDFS clients with/without EC support and redirect the old clients to the proxy. With the proxy we set up separate Erasure Coding clusters storing hundreds of PB of data, while leaving other production clusters and all upper-layer applications untouched.

Considering that some changes are made to fundamental components of HDFS (e.g., the client-NN RPC header), we do not aim to merge the change to trunk. We will use this ticket to share the design and implementation details (including the code) and collect feedback. We may use a separate GitHub repo to open-source the implementation later.
[jira] [Updated] (HDFS-16874) Improve DataNode decommission for Erasure Coding
[ https://issues.apache.org/jira/browse/HDFS-16874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-16874:
-----------------------------
    Description:
There are a couple of issues with the current DataNode decommission implementation when large amounts of Erasure Coding data are involved in the data re-replication/reconstruction process:
# Slowness. In HDFS-8786 we made a decision to use re-replication for DataNode decommission if the internal EC block is still available. While this strategy reduces the CPU cost caused by EC reconstruction, it greatly limits the overall data recovery bandwidth, since there is only a single DataNode as the source. As high-density HDD hosts become more widely used by HDFS, especially along with Erasure Coding for the warm-data use case, this becomes a big pain for cluster management. In our production, decommissioning a DataNode storing several hundred TB of EC data can take several days. HDFS-16613 provides an optimization based on the existing mechanism, but more fundamentally we may want to allow EC reconstruction for DataNode decommission so as to achieve much larger recovery bandwidth.
# The semantics of the existing EC reconstruction command (the BlockECReconstructionInfoProto msg sent from NN to DN) are not clear. The existing reconstruction command depends on the holes in the srcNodes/liveBlockIndices arrays to indicate the target internal blocks for recovery, while the holes can also be caused by the corresponding DataNode being too busy to serve as a reconstruction source. As a result, the later DataNode-side reconstruction may not be consistent with the original intention. E.g., if the index of the missing block is 6, and the DataNode storing block 0 is busy, the src nodes in the reconstruction command only cover blocks [1, 2, 3, 4, 5, 7, 8]. The target DataNode may reconstruct internal block 0 instead of 6.

HDFS-16566 addresses this issue by indicating an excluded-index list. More fundamentally, we can follow the same path but go a step further by adding an optional field that explicitly indicates the target block indices in the command protobuf msg. With that extension the DataNode no longer uses the holes in the src node array to "guess" the reconstruction targets.

Internally we have developed and applied fixes following the above directions. We have seen significant improvement (100+ times speedup) in DataNode decommission speed for EC data. The clearer semantics of the reconstruction command protobuf msg also help prevent potential data corruption during EC reconstruction. We will use this ticket to track similar fixes for the Apache releases.
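The ambiguity described in item 2 can be made concrete with a small sketch. This is illustrative code, not Hadoop's actual `BlockECReconstructionInfoProto` handling; the class and method names are hypothetical. It assumes an RS(6,3) layout with internal block indices 0 through 8.

```java
// Hypothetical sketch of the legacy DataNode-side inference: the targets are
// "guessed" as every index absent from liveBlockIndices, so an index that is
// merely busy is indistinguishable from one that is actually missing.
public class EcTargetSketch {
    static final int TOTAL_BLOCKS = 9; // RS(6,3): 6 data + 3 parity

    public static int[] guessTargets(int[] liveBlockIndices) {
        boolean[] live = new boolean[TOTAL_BLOCKS];
        for (int i : liveBlockIndices) {
            live[i] = true;
        }
        // Every hole in the live-index array is treated as a recovery target.
        return java.util.stream.IntStream.range(0, TOTAL_BLOCKS)
                .filter(i -> !live[i])
                .toArray();
    }
}
```

With the example from the description (block 6 missing, block 0 busy, so liveBlockIndices covers [1, 2, 3, 4, 5, 7, 8]), this guess yields both 0 and 6 as candidate targets, and the DataNode may reconstruct 0 instead of 6. An explicit optional `targetIndices` field in the command would carry `[6]` and remove the guesswork entirely.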
[jira] [Created] (HDFS-16874) Improve DataNode decommission for Erasure Coding
Jing Zhao created HDFS-16874:
-----------------------------

         Summary: Improve DataNode decommission for Erasure Coding
             Key: HDFS-16874
             URL: https://issues.apache.org/jira/browse/HDFS-16874
         Project: Hadoop HDFS
      Issue Type: Improvement
      Components: ec, erasure-coding
        Reporter: Jing Zhao
        Assignee: Jing Zhao

There are a couple of issues with the current DataNode decommission implementation when large amounts of Erasure Coding data are involved in the data re-replication/reconstruction process:
# Slowness. In HDFS-8786 we made a decision to use re-replication for DataNode decommission if the internal EC block is still available. While this strategy reduces the CPU cost caused by EC reconstruction, it greatly limits the overall data recovery bandwidth, since there is only a single DataNode as the source. As high-density HDD hosts become more widely used by HDFS, especially along with Erasure Coding for the warm-data use case, this becomes a big pain for cluster management. In our production, decommissioning a DataNode storing several hundred TB of EC data can take several days. HDFS-16613 provides an optimization based on the existing mechanism, but more fundamentally we may want to allow EC reconstruction for DataNode decommission so as to achieve much larger recovery bandwidth.
# The semantics of the existing EC reconstruction command (the BlockECReconstructionInfoProto msg sent from NN to DN) are not clear. The existing reconstruction command depends on the holes in the srcNodes/liveBlockIndices arrays to indicate the target internal blocks for recovery, while the holes can also be caused by the corresponding DataNode being too busy to serve as a reconstruction source. As a result, the later DataNode-side reconstruction may not be consistent with the original intention. E.g., if the index of the missing block is 6, and the DataNode storing block 0 is busy, the src nodes in the reconstruction command only cover blocks [1, 2, 3, 4, 5, 7, 8]. The target DataNode may reconstruct internal block 0 instead of 6.

HDFS-16566 addresses this issue by indicating an excluded-index list. More fundamentally, we can follow the same path but go a step further by adding an optional field that explicitly indicates the target block indices in the command protobuf msg. With that extension the DataNode no longer uses the holes in the src node array to "guess" the reconstruction targets.

Internally we have developed and applied fixes following the above directions. We have seen significant improvement (100+ times speedup) in DataNode decommission speed for EC data. The clearer semantics of the reconstruction command protobuf msg also help prevent potential data corruption during EC reconstruction. We will use this ticket to track similar fixes for the Apache releases.
[jira] [Commented] (HDFS-16422) Fix thread safety of EC decoding during concurrent preads
[ https://issues.apache.org/jira/browse/HDFS-16422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503858#comment-17503858 ]

Jing Zhao commented on HDFS-16422:
----------------------------------

It looks like we already have r/w lock protection in AbstractNativeRawDecoder and its subclasses (NativeRSRawDecoder and NativeXORRawDecoder). Does that mean the extra protection is only necessary for the other decoder implementations (such as RSRawDecoder)?

HADOOP-15499 used a r/w lock to replace the original object monitor (i.e. synchronized) so as to improve performance. Now it looks like we're adding "synchronized" back to the APIs defined in the parent class. I guess instead of updating the decode APIs in RawErasureDecoder, we may want to fix only the subclasses without lock protection. What do you think, [~weichiu] [~cndaimin]?

> Fix thread safety of EC decoding during concurrent preads
> ---------------------------------------------------------
>
>              Key: HDFS-16422
>              URL: https://issues.apache.org/jira/browse/HDFS-16422
>          Project: Hadoop HDFS
>       Issue Type: Bug
>       Components: dfsclient, ec, erasure-coding
> Affects Versions: 3.3.0, 3.3.1
>         Reporter: daimin
>         Assignee: daimin
>         Priority: Critical
>           Labels: pull-request-available
>          Fix For: 3.4.0, 3.2.3, 3.3.3
>
>       Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> Reading data on an erasure-coded file with missing replicas (internal blocks of a block group) causes online reconstruction: the data units are read and decoded into the missing target data. Each DFSStripedInputStream object has a RawErasureDecoder object, and when we do preads concurrently, RawErasureDecoder.decode is invoked concurrently too. RawErasureDecoder.decode is not thread safe, and as a result we occasionally get wrong data from pread.
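The locking trade-off discussed in the comment can be sketched in miniature. This is illustrative only, not the actual RawErasureDecoder API: the class name and the shared `scratch` buffer are hypothetical stand-ins for the mutable decoder state that makes concurrent `decode` calls unsafe.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: put the lock in the subclass that owns the shared
// mutable state, rather than synchronizing the API in the parent class and
// penalizing subclasses (like the native decoders) that already protect
// themselves with a r/w lock.
public class LockedDecoderSketch {
    private final ReentrantLock decodeLock = new ReentrantLock();
    // Shared mutable scratch state: the reason decode() is not thread safe.
    private final int[] scratch = new int[16];

    public int decode(int input) {
        decodeLock.lock();
        try {
            scratch[0] = input;    // concurrent callers would trample this buffer
            return scratch[0] * 2; // stand-in for the real decode computation
        } finally {
            decodeLock.unlock();
        }
    }
}
```

The alternative (a decoder instance per stream, or per call) avoids locking entirely at the cost of extra allocation; either way the invariant is the same: one thread in `decode`'s scratch state at a time.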
[jira] [Commented] (HDFS-16283) RBF: improve renewLease() to call only a specific NameNode rather than make fan-out calls
[ https://issues.apache.org/jira/browse/HDFS-16283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17436168#comment-17436168 ]

Jing Zhao commented on HDFS-16283:
----------------------------------

Looking into the code, it seems that besides the router-side performance issue, the whole lease mechanism may need to be updated to support the HDFS router. Currently the DFSClient uses a map (INodeID, DFSOutputStream) to track all the being-written files. The assumption is that all the being-written files are in the same nameservice, thus there is no INodeID conflict. Now with router support, we may have two files belonging to two different nameservices sharing the same INodeID (though the possibility is very low in production). So theoretically we should update the being-written-file map to ((nameservice, INodeID) -> DFSOutputStream). I understand the concern that with the router we do not want the client to know about individual nameservices; it would be better if we can still hide them. We can discuss different solutions in this ticket. In summary, I guess we need a new mechanism to align the current INode-ID-based lease renewal approach with the new router architecture.

> RBF: improve renewLease() to call only a specific NameNode rather than make fan-out calls
> -----------------------------------------------------------------------------------------
>
>              Key: HDFS-16283
>              URL: https://issues.apache.org/jira/browse/HDFS-16283
>          Project: Hadoop HDFS
>       Issue Type: Sub-task
>       Components: rbf
>         Reporter: Aihua Xu
>         Assignee: Aihua Xu
>         Priority: Major
>           Labels: pull-request-available
>       Time Spent: 1.5h
> Remaining Estimate: 0h
>
> Currently renewLease() against a router fans out to all the NameNodes. Since the renewLease() call is so frequent, if one of the NameNodes is slow, the router queues are eventually blocked by renewLease() calls, degrading the router.
> We will make a change on the client side to keep track of the NameNode ID in addition to the current fileId, so routers understand which NameNode the client is renewing its lease against.
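The composite-key idea from the comment above can be sketched as follows. This is an illustrative sketch, not the DFSClient's actual data structure; `FileKey`, `track`, and the use of a String to stand in for DFSOutputStream are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: key the being-written-file map by (nameservice, inodeId)
// instead of inodeId alone, so two files from different nameservices that
// happen to share an INode ID no longer collide in the lease-renewal tracking.
public class LeaseKeySketch {
    record FileKey(String nameservice, long inodeId) {}

    private final Map<FileKey, String> filesBeingWritten = new HashMap<>();

    public void track(String nameservice, long inodeId, String stream) {
        filesBeingWritten.put(new FileKey(nameservice, inodeId), stream);
    }

    public int size() {
        return filesBeingWritten.size();
    }
}
```

With a plain `Map<Long, …>` the second `track` call for the same INode ID would silently overwrite the first; the composite key keeps both entries, at the cost of the client now knowing the nameservice, which is exactly the tension the comment raises.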
[jira] [Commented] (HDFS-16268) Balancer stuck when moving striped blocks due to NPE
[ https://issues.apache.org/jira/browse/HDFS-16268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428552#comment-17428552 ]

Jing Zhao commented on HDFS-16268:
----------------------------------

I've committed the fix. Thanks for the contribution, [~LeonG]!

> Balancer stuck when moving striped blocks due to NPE
> ----------------------------------------------------
>
>              Key: HDFS-16268
>              URL: https://issues.apache.org/jira/browse/HDFS-16268
>          Project: Hadoop HDFS
>       Issue Type: Bug
>       Components: balancer mover, erasure-coding
> Affects Versions: 3.2.2
>         Reporter: Leon Gao
>         Assignee: Leon Gao
>         Priority: Major
>           Labels: pull-request-available
>       Time Spent: 40m
> Remaining Estimate: 0h
>
> {code:java}
> 21/10/11 06:11:26 WARN balancer.Dispatcher: Dispatcher thread failed
> java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.markMovedIfGoodBlock(Dispatcher.java:289)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.chooseBlockAndProxy(Dispatcher.java:272)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:236)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher$Source.chooseNextMove(Dispatcher.java:899)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher$Source.dispatchBlocks(Dispatcher.java:958)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher$Source.access$3300(Dispatcher.java:757)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher$2.run(Dispatcher.java:1226)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
> Due to the NPE in the middle, pending moves are left in the queue, so the balancer will be stuck forever.
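The failure mode in the trace above (an unchecked null killing the dispatcher thread and stranding pending moves) suggests the general shape of the fix. This is only a minimal hedged sketch of that pattern, not the actual Dispatcher code; the class and method names are hypothetical.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: when choosing the next block to move, skip and discard
// null entries (e.g. a striped block whose group info could not be resolved)
// instead of dereferencing them and letting a NullPointerException kill the
// dispatcher thread, which leaves pending moves queued forever.
public class SafeChooseSketch {
    public static String chooseNext(List<String> candidates) {
        for (Iterator<String> it = candidates.iterator(); it.hasNext(); ) {
            String block = it.next();
            if (block == null) {
                it.remove(); // drop the bad entry so it cannot stall the queue
                continue;
            }
            return block;
        }
        return null; // nothing movable right now
    }
}
```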
[jira] [Resolved] (HDFS-16268) Balancer stuck when moving striped blocks due to NPE
[ https://issues.apache.org/jira/browse/HDFS-16268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao resolved HDFS-16268.
------------------------------
    Fix Version/s: 3.4.0
     Hadoop Flags: Reviewed
       Resolution: Fixed

> Balancer stuck when moving striped blocks due to NPE
> ----------------------------------------------------
>
>              Key: HDFS-16268
>              URL: https://issues.apache.org/jira/browse/HDFS-16268
>          Project: Hadoop HDFS
>       Issue Type: Bug
>       Components: balancer mover, erasure-coding
> Affects Versions: 3.2.2
>         Reporter: Leon Gao
>         Assignee: Leon Gao
>         Priority: Major
>           Labels: pull-request-available
>          Fix For: 3.4.0
>
>       Time Spent: 40m
> Remaining Estimate: 0h
>
> {code:java}
> 21/10/11 06:11:26 WARN balancer.Dispatcher: Dispatcher thread failed
> java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.markMovedIfGoodBlock(Dispatcher.java:289)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.chooseBlockAndProxy(Dispatcher.java:272)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:236)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher$Source.chooseNextMove(Dispatcher.java:899)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher$Source.dispatchBlocks(Dispatcher.java:958)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher$Source.access$3300(Dispatcher.java:757)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher$2.run(Dispatcher.java:1226)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
> Due to the NPE in the middle, pending moves are left in the queue, so the balancer will be stuck forever.
[jira] [Resolved] (HDFS-10648) Expose Balancer metrics through Metrics2
[ https://issues.apache.org/jira/browse/HDFS-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao resolved HDFS-10648.
------------------------------
    Fix Version/s: 3.4.0
     Hadoop Flags: Reviewed
       Resolution: Resolved

Thanks for the work, [~LeonG]! I've committed the diff.

> Expose Balancer metrics through Metrics2
> ----------------------------------------
>
>              Key: HDFS-10648
>              URL: https://issues.apache.org/jira/browse/HDFS-10648
>          Project: Hadoop HDFS
>       Issue Type: New Feature
>       Components: balancer mover, metrics
>         Reporter: Mark Wagner
>         Assignee: Leon Gao
>         Priority: Major
>           Labels: metrics, pull-request-available
>          Fix For: 3.4.0
>
>       Time Spent: 1.5h
> Remaining Estimate: 0h
>
> The Balancer currently prints progress information to the console. For deployments that run the balancer frequently, it would be helpful to collect those metrics for publishing to the available sinks.
[jira] [Resolved] (HDFS-16224) testBalancerWithObserverWithFailedNode times out
[ https://issues.apache.org/jira/browse/HDFS-16224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao resolved HDFS-16224.
------------------------------
    Fix Version/s: 3.4.0
     Hadoop Flags: Reviewed
       Resolution: Fixed

I've committed the fix. Thanks [~LeonG]!

> testBalancerWithObserverWithFailedNode times out
> ------------------------------------------------
>
>              Key: HDFS-16224
>              URL: https://issues.apache.org/jira/browse/HDFS-16224
>          Project: Hadoop HDFS
>       Issue Type: Test
>       Components: test
>         Reporter: Leon Gao
>         Assignee: Leon Gao
>         Priority: Trivial
>           Labels: pull-request-available
>          Fix For: 3.4.0
>
>       Time Spent: 50m
> Remaining Estimate: 0h
>
> testBalancerWithObserverWithFailedNode fails intermittently.
>
> It seems this is because the datanodes cannot shut down while they are still retrying against the failed observer.
>
> Jenkins report:
>
> [ERROR] testBalancerWithObserverWithFailedNode(org.apache.hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes)  Time elapsed: 180.144 s  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 180000 milliseconds
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Thread.join(Thread.java:1252)
>         at java.lang.Thread.join(Thread.java:1326)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.join(BPServiceActor.java:632)
>         at org.apache.hadoop.hdfs.server.datanode.BPOfferService.join(BPOfferService.java:360)
>         at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.shutDownAll(BlockPoolManager.java:119)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:2169)
>         at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNode(MiniDFSCluster.java:2166)
>         at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:2156)
>         at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:2135)
>         at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:2109)
>         at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:2102)
>         at org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster.shutdown(MiniQJMHACluster.java:189)
>         at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes.testBalancerWithObserver(TestBalancerWithHANameNodes.java:240)
>         at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes.testBalancerWithObserverWithFailedNode(TestBalancerWithHANameNodes.java:197)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>         at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>         at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>         at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>         at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>         at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.lang.Thread.run(Thread.java:748)
[jira] [Commented] (HDFS-16133) Support refresh of IP addresses behind DNS for clients
[ https://issues.apache.org/jira/browse/HDFS-16133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384672#comment-17384672 ]

Jing Zhao commented on HDFS-16133:
----------------------------------

Sure, done

> Support refresh of IP addresses behind DNS for clients
> ------------------------------------------------------
>
> Key: HDFS-16133
> URL: https://issues.apache.org/jira/browse/HDFS-16133
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Srinidhi V K
> Priority: Minor
>
> Support for using a single DNS name for clients was added as part of
> HDFS-14118. The Java client does the resolution once and caches it, which
> causes a problem whenever a node is added or removed behind the DNS name.
> The idea of this task is to handle this scenario and refresh the IP
> addresses automatically in the Java client.
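The refresh described above can be sketched with plain `java.net` (a hypothetical helper for illustration, not the actual HDFS client code): re-resolve the DNS name on demand and report whether the set of addresses behind it changed, instead of trusting the first cached lookup forever.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper illustrating the idea behind HDFS-16133: re-resolve a
// DNS name and detect membership changes behind it.
public class DnsRefresher {
    private final String host;
    private Set<String> lastResolved = new HashSet<>();

    public DnsRefresher(String host) {
        this.host = host;
    }

    // Re-resolves the name; returns true if the address set changed since the
    // previous call (always true on the first call, since nothing was cached).
    public boolean refresh() throws UnknownHostException {
        Set<String> current = new HashSet<>();
        for (InetAddress addr : InetAddress.getAllByName(host)) {
            current.add(addr.getHostAddress());
        }
        boolean changed = !current.equals(lastResolved);
        lastResolved = current;
        return changed;
    }

    public Set<String> addresses() {
        return lastResolved;
    }
}
```

A client could call `refresh()` periodically, or on connection failure, and rebuild its proxy list only when a change is reported.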
[jira] [Resolved] (HDFS-15842) HDFS mover to emit metrics
[ https://issues.apache.org/jira/browse/HDFS-15842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao resolved HDFS-15842.
------------------------------
Fix Version/s: 3.4.0
Hadoop Flags: Reviewed
Resolution: Fixed

+1. Thank you for the contribution, [~LeonG]!

> HDFS mover to emit metrics
> --------------------------
>
> Key: HDFS-15842
> URL: https://issues.apache.org/jira/browse/HDFS-15842
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: balancer & mover
> Reporter: Leon Gao
> Assignee: Leon Gao
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> We can emit metrics through metrics2 when running the HDFS mover, which can
> help to monitor the progress and tune mover parameters.
[jira] [Resolved] (HDFS-15781) Add metrics for how blocks are moved in replaceBlock
[ https://issues.apache.org/jira/browse/HDFS-15781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao resolved HDFS-15781.
------------------------------
Fix Version/s: 3.4.0
Hadoop Flags: Reviewed
Resolution: Fixed

I've committed the patch. Thank you for the contribution, [~LeonG]!

> Add metrics for how blocks are moved in replaceBlock
> ----------------------------------------------------
>
> Key: HDFS-15781
> URL: https://issues.apache.org/jira/browse/HDFS-15781
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode
> Reporter: Leon Gao
> Assignee: Leon Gao
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> We can add some metrics to track how the blocks are being moved, to get a
> sense of the locality of movements:
> * How many blocks are copied to the local host?
> * How many blocks are moved to a local disk through a hardlink?
> * How many blocks are copied out of the host?
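The three cases listed in the description boil down to per-kind counters. A toy, stdlib-only sketch (the class, enum, and method names are made up; the real patch uses Hadoop's metrics2 framework on the DataNode):

```java
import java.util.EnumMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative counters for how a replica arrived during replaceBlock,
// mirroring the three locality cases above. Not the actual DataNode code.
public class BlockMoveMetrics {
    public enum MoveKind { COPY_FROM_LOCAL_HOST, HARDLINK_SAME_HOST, COPY_FROM_REMOTE_HOST }

    private final EnumMap<MoveKind, AtomicLong> counters = new EnumMap<>(MoveKind.class);

    public BlockMoveMetrics() {
        for (MoveKind k : MoveKind.values()) {
            counters.put(k, new AtomicLong());
        }
    }

    // Called once per completed block move with the kind that was used.
    public void record(MoveKind kind) {
        counters.get(kind).incrementAndGet();
    }

    public long count(MoveKind kind) {
        return counters.get(kind).get();
    }
}
```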
[jira] [Updated] (HDFS-15683) Allow configuring DISK/ARCHIVE capacity for individual volumes
[ https://issues.apache.org/jira/browse/HDFS-15683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-15683:
-----------------------------
Release Note: Add a new configuration "dfs.datanode.same-disk-tiering.capacity-ratio.percentage" to allow admins to configure capacity for individual volumes on the same mount.

> Allow configuring DISK/ARCHIVE capacity for individual volumes
> --------------------------------------------------------------
>
> Key: HDFS-15683
> URL: https://issues.apache.org/jira/browse/HDFS-15683
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode
> Reporter: Leon Gao
> Assignee: Leon Gao
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 4h
> Remaining Estimate: 0h
>
> This is a follow-up task for https://issues.apache.org/jira/browse/HDFS-15548
> In case the datanode disks are not uniform, we should allow admins to
> configure the capacity for individual volumes on top of the default one.
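The capacity-ratio setting above is, at its core, a split of one mount's capacity between the DISK and ARCHIVE volumes that share it. A toy sketch of that arithmetic (class and method names are made up; the real logic lives in the DataNode volume code and honors the named configuration key):

```java
// Illustrative arithmetic only, not HDFS code: divide a single mount's
// capacity between a DISK volume and an ARCHIVE volume by a configured ratio.
public class CapacitySplit {
    // diskRatio = fraction of the mount reserved for the DISK volume, in (0, 1).
    // Returns {diskCapacityBytes, archiveCapacityBytes}.
    public static long[] split(long mountCapacityBytes, double diskRatio) {
        if (diskRatio <= 0 || diskRatio >= 1) {
            throw new IllegalArgumentException("ratio must be in (0, 1)");
        }
        long disk = (long) (mountCapacityBytes * diskRatio);
        // The remainder goes to ARCHIVE, so the two volumes never double-count.
        return new long[] { disk, mountCapacityBytes - disk };
    }
}
```

The key property the sketch preserves is that the two volume capacities always sum to the mount capacity, which is what lets the datanode usage report stay consistent.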
[jira] [Resolved] (HDFS-15683) Allow configuring DISK/ARCHIVE capacity for individual volumes
[ https://issues.apache.org/jira/browse/HDFS-15683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao resolved HDFS-15683.
------------------------------
Fix Version/s: 3.4.0
Hadoop Flags: Reviewed
Resolution: Fixed

> Allow configuring DISK/ARCHIVE capacity for individual volumes
> --------------------------------------------------------------
>
> Key: HDFS-15683
> URL: https://issues.apache.org/jira/browse/HDFS-15683
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode
> Reporter: Leon Gao
> Assignee: Leon Gao
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 4h
> Remaining Estimate: 0h
>
> This is a follow-up task for https://issues.apache.org/jira/browse/HDFS-15548
> In case the datanode disks are not uniform, we should allow admins to
> configure the capacity for individual volumes on top of the default one.
[jira] [Commented] (HDFS-15683) Allow configuring DISK/ARCHIVE capacity for individual volumes
[ https://issues.apache.org/jira/browse/HDFS-15683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281326#comment-17281326 ]

Jing Zhao commented on HDFS-15683:
----------------------------------

+1. I've committed the patch. Thank you for the contribution, [~LeonG]!

BTW, the javac warnings in the builds look unrelated. It reported "generated 14 new + 580 unchanged - 14 fixed = 594 total (was 594)", and the warnings are not caused by this PR. But we can use this chance to fix these warnings. Please file a new jira to do that, @LeonGao91

> Allow configuring DISK/ARCHIVE capacity for individual volumes
> --------------------------------------------------------------
>
> Key: HDFS-15683
> URL: https://issues.apache.org/jira/browse/HDFS-15683
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode
> Reporter: Leon Gao
> Assignee: Leon Gao
> Priority: Major
> Labels: pull-request-available
> Time Spent: 4h
> Remaining Estimate: 0h
>
> This is a follow-up task for https://issues.apache.org/jira/browse/HDFS-15548
> In case that the datanode disks are not unified, we should allow admins to
> configure capacity for individual volumes on top of the default one.
[jira] [Comment Edited] (HDFS-15683) Allow configuring DISK/ARCHIVE capacity for individual volumes
[ https://issues.apache.org/jira/browse/HDFS-15683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281326#comment-17281326 ]

Jing Zhao edited comment on HDFS-15683 at 2/8/21, 7:35 PM:
-----------------------------------------------------------

+1. I've committed the patch. Thank you for the contribution, [~LeonG]!

BTW, the javac warnings in the builds look unrelated. It reported "generated 14 new + 580 unchanged - 14 fixed = 594 total (was 594)", and the warnings are not caused by this PR. But we can use this chance to fix these warnings. Please file a new jira to do that, [~LeonG]

was (Author: jingzhao):
+1. I've committed the patch. Thank you for the contribution, [~LeonG]!

BTW, the javac warnings in the builds look unrelated. It reported "generated 14 new + 580 unchanged - 14 fixed = 594 total (was 594)", and the warnings are not caused by this PR. But we can use this chance to fix these warnings. Please file a new jira to do that, @LeonGao91

> Allow configuring DISK/ARCHIVE capacity for individual volumes
> --------------------------------------------------------------
>
> Key: HDFS-15683
> URL: https://issues.apache.org/jira/browse/HDFS-15683
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode
> Reporter: Leon Gao
> Assignee: Leon Gao
> Priority: Major
> Labels: pull-request-available
> Time Spent: 4h
> Remaining Estimate: 0h
>
> This is a follow-up task for https://issues.apache.org/jira/browse/HDFS-15548
> In case that the datanode disks are not unified, we should allow admins to
> configure capacity for individual volumes on top of the default one.
[jira] [Resolved] (HDFS-15549) Use Hardlink to move replica between DISK and ARCHIVE storage if on same filesystem mount
[ https://issues.apache.org/jira/browse/HDFS-15549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao resolved HDFS-15549.
------------------------------
Fix Version/s: 3.4.0
Hadoop Flags: Reviewed
Resolution: Fixed

+1. Thank you for the contribution, [~LeonG]!

> Use Hardlink to move replica between DISK and ARCHIVE storage if on same
> filesystem mount
> -------------------------------------------------------------------------
>
> Key: HDFS-15549
> URL: https://issues.apache.org/jira/browse/HDFS-15549
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode
> Reporter: Leon Gao
> Assignee: Leon Gao
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 3h
> Remaining Estimate: 0h
>
> When moving blocks between DISK/ARCHIVE, we should prefer the volume on the
> same underlying filesystem and use "rename" instead of "copy" to save IO.
[jira] [Updated] (HDFS-15549) Use Hardlink to move replica between DISK and ARCHIVE storage if on same filesystem mount
[ https://issues.apache.org/jira/browse/HDFS-15549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-15549:
-----------------------------
Summary: Use Hardlink to move replica between DISK and ARCHIVE storage if on same filesystem mount (was: Improve DISK/ARCHIVE movement if they are on same filesystem)

> Use Hardlink to move replica between DISK and ARCHIVE storage if on same
> filesystem mount
> -------------------------------------------------------------------------
>
> Key: HDFS-15549
> URL: https://issues.apache.org/jira/browse/HDFS-15549
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode
> Reporter: Leon Gao
> Assignee: Leon Gao
> Priority: Major
> Labels: pull-request-available
> Time Spent: 3h
> Remaining Estimate: 0h
>
> When moving blocks between DISK/ARCHIVE, we should prefer the volume on the
> same underlying filesystem and use "rename" instead of "copy" to save IO.
[jira] [Commented] (HDFS-15549) Improve DISK/ARCHIVE movement if they are on same filesystem
[ https://issues.apache.org/jira/browse/HDFS-15549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17261690#comment-17261690 ]

Jing Zhao commented on HDFS-15549:
----------------------------------

Thank you for working on this, [~LeonG]! I went through the current patch and it looks good to me in general. Please feel free to upload the complete version when it's ready.

> Improve DISK/ARCHIVE movement if they are on same filesystem
> ------------------------------------------------------------
>
> Key: HDFS-15549
> URL: https://issues.apache.org/jira/browse/HDFS-15549
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode
> Reporter: Leon Gao
> Assignee: Leon Gao
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> When moving blocks between DISK/ARCHIVE, we should prefer the volume on the
> same underlying filesystem and use "rename" instead of "copy" to save IO.
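The link-vs-copy decision in this issue can be sketched with plain `java.nio.file` (illustrative only; the real DataNode code operates on replica files and volume references): try a hard link first, which moves the replica with no data IO when source and destination are on the same filesystem, and fall back to a byte copy otherwise.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper illustrating the HDFS-15549 optimization.
public class ReplicaMover {
    public static void move(Path src, Path dst) throws IOException {
        try {
            // Same filesystem: the link shares the data blocks, so no bytes
            // are rewritten. Deleting the source completes the "move".
            Files.createLink(dst, src);
            Files.delete(src);
        } catch (IOException | UnsupportedOperationException e) {
            // Different filesystem (or hard links unsupported): full copy.
            Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
            Files.delete(src);
        }
    }
}
```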
[jira] [Commented] (HDFS-14904) Add Option to let Balancer prefer highly utilized nodes in each iteration
[ https://issues.apache.org/jira/browse/HDFS-14904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242816#comment-17242816 ]

Jing Zhao commented on HDFS-14904:
----------------------------------

+1. I've committed the change. Thank you for the contribution, [~LeonG]!

> Add Option to let Balancer prefer highly utilized nodes in each iteration
> -------------------------------------------------------------------------
>
> Key: HDFS-14904
> URL: https://issues.apache.org/jira/browse/HDFS-14904
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: balancer & mover
> Reporter: Leon Gao
> Assignee: Leon Gao
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> Normally the most important purpose of the HDFS balancer is to reduce the
> top used nodes, to prevent datanode usage from being too high.
> Currently, the balancer picks source nodes almost randomly regardless of
> usage, which makes it slow to bring down the top used datanodes in the
> cluster when there are fewer underutilized nodes in the cluster (consider
> expansion).
> We can add an option to prefer the top used nodes first in each iteration,
> as suggested in HDFS-14894.
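The preference described above amounts to ordering candidate sources by utilization instead of picking them randomly. A toy sketch (names are made up; the real Balancer works with `DatanodeStorageReport`s and dispatch logic):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative source ordering for the HDFS-14904 option: highest-used
// datanodes come first, so each balancer iteration drains them first.
public class SourcePicker {
    // usedRatio maps datanode name -> used/capacity in [0, 1].
    public static List<String> orderByUtilization(Map<String, Double> usedRatio) {
        return usedRatio.entrySet().stream()
            .sorted((a, b) -> Double.compare(b.getValue(), a.getValue()))
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }
}
```

With the option off, the balancer keeps its old near-random pick; with it on, an ordering like this brings the hottest nodes down faster when few underutilized targets exist.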
[jira] [Resolved] (HDFS-14904) Add Option to let Balancer prefer highly utilized nodes in each iteration
[ https://issues.apache.org/jira/browse/HDFS-14904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao resolved HDFS-14904.
------------------------------
Fix Version/s: 3.4.0
Hadoop Flags: Reviewed
Resolution: Fixed

> Add Option to let Balancer prefer highly utilized nodes in each iteration
> -------------------------------------------------------------------------
>
> Key: HDFS-14904
> URL: https://issues.apache.org/jira/browse/HDFS-14904
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: balancer & mover
> Reporter: Leon Gao
> Assignee: Leon Gao
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> Normally the most important purpose of the HDFS balancer is to reduce the
> top used nodes, to prevent datanode usage from being too high.
> Currently, the balancer picks source nodes almost randomly regardless of
> usage, which makes it slow to bring down the top used datanodes in the
> cluster when there are fewer underutilized nodes in the cluster (consider
> expansion).
> We can add an option to prefer the top used nodes first in each iteration,
> as suggested in HDFS-14894.
[jira] [Updated] (HDFS-14904) Add Option to let Balancer prefer highly utilized nodes in each iteration
[ https://issues.apache.org/jira/browse/HDFS-14904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-14904:
-----------------------------
Summary: Add Option to let Balancer prefer highly utilized nodes in each iteration (was: Option to let Balancer prefer top used nodes in each iteration)

> Add Option to let Balancer prefer highly utilized nodes in each iteration
> -------------------------------------------------------------------------
>
> Key: HDFS-14904
> URL: https://issues.apache.org/jira/browse/HDFS-14904
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: balancer & mover
> Reporter: Leon Gao
> Assignee: Leon Gao
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> Normally the most important purpose of the HDFS balancer is to reduce the
> top used nodes, to prevent datanode usage from being too high.
> Currently, the balancer picks source nodes almost randomly regardless of
> usage, which makes it slow to bring down the top used datanodes in the
> cluster when there are fewer underutilized nodes in the cluster (consider
> expansion).
> We can add an option to prefer the top used nodes first in each iteration,
> as suggested in HDFS-14894.
[jira] [Resolved] (HDFS-15548) Allow configuring DISK/ARCHIVE storage types on same device mount
[ https://issues.apache.org/jira/browse/HDFS-15548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao resolved HDFS-15548.
------------------------------
Fix Version/s: 3.4.0
Hadoop Flags: Reviewed
Resolution: Fixed

I've committed the change. Thank you for the great work, [~LeonG]! Thanks for the review, [~hexiaoqiao]!

> Allow configuring DISK/ARCHIVE storage types on same device mount
> -----------------------------------------------------------------
>
> Key: HDFS-15548
> URL: https://issues.apache.org/jira/browse/HDFS-15548
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode
> Reporter: Leon Gao
> Assignee: Leon Gao
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 9.5h
> Remaining Estimate: 0h
>
> We can allow configuring DISK/ARCHIVE storage types on the same device
> mount, on two separate directories.
> Users should be able to configure the capacity for each. Also, the datanode
> usage report should report stats correctly.
[jira] [Commented] (HDFS-15548) Allow configuring DISK/ARCHIVE storage types on same device mount
[ https://issues.apache.org/jira/browse/HDFS-15548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227065#comment-17227065 ]

Jing Zhao commented on HDFS-15548:
----------------------------------

The current PR looks good to me. Do you also want to take another look, [~hexiaoqiao]?

> Allow configuring DISK/ARCHIVE storage types on same device mount
> -----------------------------------------------------------------
>
> Key: HDFS-15548
> URL: https://issues.apache.org/jira/browse/HDFS-15548
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode
> Reporter: Leon Gao
> Assignee: Leon Gao
> Priority: Major
> Labels: pull-request-available
> Time Spent: 8.5h
> Remaining Estimate: 0h
>
> We can allow configuring DISK/ARCHIVE storage types on the same device
> mount, on two separate directories.
> Users should be able to configure the capacity for each. Also, the datanode
> usage report should report stats correctly.
[jira] [Comment Edited] (HDFS-11797) BlockManager#createLocatedBlocks() can throw ArrayIndexOutofBoundsException when corrupt replicas are inconsistent
[ https://issues.apache.org/jira/browse/HDFS-11797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16042052#comment-16042052 ]

Jing Zhao edited comment on HDFS-11797 at 6/8/17 1:29 AM:
----------------------------------------------------------

I have not checked the details, but is it related to HDFS-11445 (more specifically, this [comment|https://issues.apache.org/jira/browse/HDFS-11445?focusedCommentId=15898236&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15898236])?

was (Author: jingzhao):
I have not checked the details, but is it related to HDFS-11445 (more specifically, this [comment|https://issues.apache.org/jira/browse/HDFS-11445?focusedCommentId=15898236&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15898236]).

> BlockManager#createLocatedBlocks() can throw ArrayIndexOutofBoundsException
> when corrupt replicas are inconsistent
> ---------------------------------------------------------------------------
>
> Key: HDFS-11797
> URL: https://issues.apache.org/jira/browse/HDFS-11797
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Priority: Critical
> Attachments: HDFS-11797.001.patch
>
> The calculation of {{numMachines}} can be too small (causing an
> ArrayIndexOutOfBoundsException) or too large (causing an NPE, HDFS-9958) if
> the data structures hold an inconsistent number of corrupt replicas. This
> was earlier found to be related to failed storages. This JIRA tracks a
> change that works for all possible cases of inconsistency.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-11797) BlockManager#createLocatedBlocks() can throw ArrayIndexOutofBoundsException when corrupt replicas are inconsistent
[ https://issues.apache.org/jira/browse/HDFS-11797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16042052#comment-16042052 ]

Jing Zhao edited comment on HDFS-11797 at 6/8/17 1:29 AM:
----------------------------------------------------------

I have not checked the details, but is it related to HDFS-11445 (more specifically, this [comment|https://issues.apache.org/jira/browse/HDFS-11445?focusedCommentId=15898236&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15898236]).

was (Author: jingzhao):
I have not checked the details, but is it related to HDFS-11445?

> BlockManager#createLocatedBlocks() can throw ArrayIndexOutofBoundsException
> when corrupt replicas are inconsistent
> ---------------------------------------------------------------------------
>
> Key: HDFS-11797
> URL: https://issues.apache.org/jira/browse/HDFS-11797
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Priority: Critical
> Attachments: HDFS-11797.001.patch
>
> The calculation of {{numMachines}} can be too small (causing an
> ArrayIndexOutOfBoundsException) or too large (causing an NPE, HDFS-9958) if
> the data structures hold an inconsistent number of corrupt replicas. This
> was earlier found to be related to failed storages. This JIRA tracks a
> change that works for all possible cases of inconsistency.
[jira] [Commented] (HDFS-11797) BlockManager#createLocatedBlocks() can throw ArrayIndexOutofBoundsException when corrupt replicas are inconsistent
[ https://issues.apache.org/jira/browse/HDFS-11797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16042052#comment-16042052 ]

Jing Zhao commented on HDFS-11797:
----------------------------------

I have not checked the details, but is it related to HDFS-11445?

> BlockManager#createLocatedBlocks() can throw ArrayIndexOutofBoundsException
> when corrupt replicas are inconsistent
> ---------------------------------------------------------------------------
>
> Key: HDFS-11797
> URL: https://issues.apache.org/jira/browse/HDFS-11797
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Priority: Critical
> Attachments: HDFS-11797.001.patch
>
> The calculation of {{numMachines}} can be too small (causing an
> ArrayIndexOutOfBoundsException) or too large (causing an NPE, HDFS-9958) if
> the data structures hold an inconsistent number of corrupt replicas. This
> was earlier found to be related to failed storages. This JIRA tracks a
> change that works for all possible cases of inconsistency.
[jira] [Updated] (HDFS-11823) Extend TestDFSStripedIutputStream/TestDFSStripedOutputStream with a random EC policy
[ https://issues.apache.org/jira/browse/HDFS-11823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-11823:
-----------------------------
Resolution: Fixed
Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-alpha3
Status: Resolved (was: Patch Available)

I've committed the patch.

> Extend TestDFSStripedIutputStream/TestDFSStripedOutputStream with a random EC
> policy
> -----------------------------------------------------------------------------
>
> Key: HDFS-11823
> URL: https://issues.apache.org/jira/browse/HDFS-11823
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: erasure-coding, test
> Reporter: Takanobu Asanuma
> Assignee: Takanobu Asanuma
> Labels: hdfs-ec-3.0-nice-to-have
> Fix For: 3.0.0-alpha3
>
> Attachments: HDFS-11823.1.patch
>
> From the discussion in HDFS-7866 and HDFS-9962, in addition to the default
> EC policy, it would be good if we add a random EC policy to each test.
[jira] [Commented] (HDFS-11823) Extend TestDFSStripedIutputStream/TestDFSStripedOutputStream with a random EC policy
[ https://issues.apache.org/jira/browse/HDFS-11823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16023376#comment-16023376 ]

Jing Zhao commented on HDFS-11823:
----------------------------------

Thanks for working on this, [~tasanuma0829]! The patch looks good to me. The failed tests are unrelated. +1

> Extend TestDFSStripedIutputStream/TestDFSStripedOutputStream with a random EC
> policy
> -----------------------------------------------------------------------------
>
> Key: HDFS-11823
> URL: https://issues.apache.org/jira/browse/HDFS-11823
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: erasure-coding, test
> Reporter: Takanobu Asanuma
> Assignee: Takanobu Asanuma
> Labels: hdfs-ec-3.0-nice-to-have
> Attachments: HDFS-11823.1.patch
>
> From the discussion in HDFS-7866 and HDFS-9962, in addition to the default
> EC policy, it would be good if we add a random EC policy to each test.
[jira] [Commented] (HDFS-11445) FSCK shows overall health stauts as corrupt even one replica is corrupt
[ https://issues.apache.org/jira/browse/HDFS-11445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016362#comment-16016362 ]

Jing Zhao commented on HDFS-11445:
----------------------------------

Thanks for updating the patch, [~brahmareddy]. The patch looks good to me. +1 after fixing the new checkstyle warnings.

> FSCK shows overall health stauts as corrupt even one replica is corrupt
> -----------------------------------------------------------------------
>
> Key: HDFS-11445
> URL: https://issues.apache.org/jira/browse/HDFS-11445
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Brahma Reddy Battula
> Assignee: Brahma Reddy Battula
> Attachments: HDFS-11445-002.patch, HDFS-11445-003.patch, HDFS-11445.patch
>
> In the following scenario, FSCK shows the overall health status as corrupt
> even though the file has one good replica:
> 1. Create a file with RF=2.
> 2. Shut down one DN.
> 3. Append to the file again.
> 4. Restart the DN.
> 5. After the block report, check fsck.
[jira] [Commented] (HDFS-11445) FSCK shows overall health stauts as corrupt even one replica is corrupt
[ https://issues.apache.org/jira/browse/HDFS-11445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013006#comment-16013006 ]

Jing Zhao commented on HDFS-11445:
----------------------------------

Thanks for the patch, [~brahmareddy]. For the current patch, instead of passing the BlockManager to {{commitBlock}} and {{setGenerationStampAndVerifyReplicas}}, why not let these methods return the list of stale replicas and then remove the stored blockInfo objects in the BlockManager?

> FSCK shows overall health stauts as corrupt even one replica is corrupt
> -----------------------------------------------------------------------
>
> Key: HDFS-11445
> URL: https://issues.apache.org/jira/browse/HDFS-11445
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Brahma Reddy Battula
> Assignee: Brahma Reddy Battula
> Attachments: HDFS-11445-002.patch, HDFS-11445.patch
>
> In the following scenario, FSCK shows the overall health status as corrupt
> even though the file has one good replica:
> 1. Create a file with RF=2.
> 2. Shut down one DN.
> 3. Append to the file again.
> 4. Restart the DN.
> 5. After the block report, check fsck.
[jira] [Commented] (HDFS-11448) JN log segment syncing should support HA upgrade
[ https://issues.apache.org/jira/browse/HDFS-11448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997365#comment-15997365 ]

Jing Zhao commented on HDFS-11448:
----------------------------------

+1 for the 03 patch. Thank you for the work, [~hanishakoneru]! Thank you for the review, [~arpitagarwal]!

> JN log segment syncing should support HA upgrade
> ------------------------------------------------
>
> Key: HDFS-11448
> URL: https://issues.apache.org/jira/browse/HDFS-11448
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Reporter: Hanisha Koneru
> Assignee: Hanisha Koneru
> Attachments: HDFS-11448.001.patch, HDFS-11448.002.patch, HDFS-11448.003.patch
>
> HDFS-4025 adds support for synchronizing past log segments to JNs that
> missed them. But, as pointed out by [~jingzhao], if the segment download
> happens when an admin tries to roll back, it might fail ([see comment|https://issues.apache.org/jira/browse/HDFS-4025?focusedCommentId=15850633&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15850633]).
[jira] [Commented] (HDFS-11448) JN log segment syncing should support HA upgrade
[ https://issues.apache.org/jira/browse/HDFS-11448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15995387#comment-15995387 ]

Jing Zhao commented on HDFS-11448:
----------------------------------

Thanks for continuing to work on this, [~hanishakoneru]. And thanks for the review, [~arpitagarwal]. I have 2 further minor comments:
# I'm not very sure if we still need to check whether the current directory exists when starting a sync iteration. We have already checked that the current directory exists while initializing the journal. Then during the upgrade/rollback, both {{moveTmpSegmentToCurrent}} and {{doRollback}} hold the Journal object's monitor, and {{moveTmpSegmentToCurrent}} also checks the {{committedTxnId}} before moving. Thus to me it is not necessary to have this current-directory check in {{canJournalSync}}:
{code}
  public boolean canJournalSync() {
    // JN should not sync if there is no current directory (during upgrade or
    // rollback).
    return storage.getCurrentDir().exists();
  }
{code}
# About the name of the temporary directory: maybe we can have a more specific name like "edits.tmp" or "edits.sync"?

Otherwise the patch looks good to me.

> JN log segment syncing should support HA upgrade
> ------------------------------------------------
>
> Key: HDFS-11448
> URL: https://issues.apache.org/jira/browse/HDFS-11448
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Reporter: Hanisha Koneru
> Assignee: Hanisha Koneru
> Attachments: HDFS-11448.001.patch
>
> HDFS-4025 adds support for synchronizing past log segments to JNs that
> missed them. But, as pointed out by [~jingzhao], if the segment download
> happens when an admin tries to roll back, it might fail ([see comment|https://issues.apache.org/jira/browse/HDFS-4025?focusedCommentId=15850633&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15850633]).
[jira] [Updated] (HDFS-11395) RequestHedgingProxyProvider#RequestHedgingInvocationHandler hides the Exception thrown from NameNode
[ https://issues.apache.org/jira/browse/HDFS-11395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-11395:
-----------------------------
Resolution: Fixed
Hadoop Flags: Reviewed
Fix Version/s: 2.9.0
Status: Resolved (was: Patch Available)

I've committed the patch. Thanks for the contribution, [~nandakumar131]! Thanks for the review, [~arpitagarwal]!

> RequestHedgingProxyProvider#RequestHedgingInvocationHandler hides the
> Exception thrown from NameNode
> ---------------------------------------------------------------------
>
> Key: HDFS-11395
> URL: https://issues.apache.org/jira/browse/HDFS-11395
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha
> Reporter: Nandakumar
> Assignee: Nandakumar
> Fix For: 2.9.0
>
> Attachments: HDFS-11395.000.patch, HDFS-11395.001.patch,
> HDFS-11395.002.patch, HDFS-11395.003.patch, HDFS-11395.004.patch,
> HDFS-11395.005.patch
>
> When using RequestHedgingProxyProvider, in case of an Exception (like
> FileNotFoundException) from the active NameNode,
> {{RequestHedgingProxyProvider#RequestHedgingInvocationHandler.invoke}}
> receives an {{ExecutionException}}, since we use a {{CompletionService}} for
> the call. The ExecutionException is put into a map and wrapped with a
> {{MultiException}}.
> So for a FileNotFoundException the client receives
> {{MultiException(Map(ExecutionException(InvocationTargetException(RemoteException(FileNotFoundException)))))}}
> It will cause problems in clients which are handling RemoteExceptions.
[jira] [Commented] (HDFS-11395) RequestHedgingProxyProvider#RequestHedgingInvocationHandler hides the Exception thrown from NameNode
[ https://issues.apache.org/jira/browse/HDFS-11395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15922957#comment-15922957 ] Jing Zhao commented on HDFS-11395: -- +1. I will commit the patch shortly. > RequestHedgingProxyProvider#RequestHedgingInvocationHandler hides the > Exception thrown from NameNode > > > Key: HDFS-11395 > URL: https://issues.apache.org/jira/browse/HDFS-11395 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Reporter: Nandakumar >Assignee: Nandakumar > Attachments: HDFS-11395.000.patch, HDFS-11395.001.patch, > HDFS-11395.002.patch, HDFS-11395.003.patch, HDFS-11395.004.patch, > HDFS-11395.005.patch > > > When using RequestHedgingProxyProvider, in case of Exception (like > FileNotFoundException) from ActiveNameNode, > {{RequestHedgingProxyProvider#RequestHedgingInvocationHandler.invoke}} > receives {{ExecutionException}} since we use {{CompletionService}} for the > call. The ExecutionException is put into a map and wrapped with > {{MultiException}}. > So for a FileNotFoundException the client receives > {{MultiException(Map(ExecutionException(InvocationTargetException(RemoteException(FileNotFoundException)}} > It will cause problem in clients which are handling RemoteExceptions. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11395) RequestHedgingProxyProvider#RequestHedgingInvocationHandler hides the Exception thrown from NameNode
[ https://issues.apache.org/jira/browse/HDFS-11395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15899906#comment-15899906 ] Jing Zhao commented on HDFS-11395: -- Hmm, looks like the test node has not been fixed yet... > RequestHedgingProxyProvider#RequestHedgingInvocationHandler hides the > Exception thrown from NameNode > > > Key: HDFS-11395 > URL: https://issues.apache.org/jira/browse/HDFS-11395 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Reporter: Nandakumar >Assignee: Nandakumar > Attachments: HDFS-11395.000.patch, HDFS-11395.001.patch, > HDFS-11395.002.patch, HDFS-11395.003.patch, HDFS-11395.004.patch, > HDFS-11395.005.patch > > > When using RequestHedgingProxyProvider, in case of Exception (like > FileNotFoundException) from ActiveNameNode, > {{RequestHedgingProxyProvider#RequestHedgingInvocationHandler.invoke}} > receives {{ExecutionException}} since we use {{CompletionService}} for the > call. The ExecutionException is put into a map and wrapped with > {{MultiException}}. > So for a FileNotFoundException the client receives > {{MultiException(Map(ExecutionException(InvocationTargetException(RemoteException(FileNotFoundException)}} > It will cause problem in clients which are handling RemoteExceptions. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11395) RequestHedgingProxyProvider#RequestHedgingInvocationHandler hides the Exception thrown from NameNode
[ https://issues.apache.org/jira/browse/HDFS-11395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15899900#comment-15899900 ] Jing Zhao commented on HDFS-11395: -- The 005 patch looks good to me. Let me trigger the Jenkins. > RequestHedgingProxyProvider#RequestHedgingInvocationHandler hides the > Exception thrown from NameNode > > > Key: HDFS-11395 > URL: https://issues.apache.org/jira/browse/HDFS-11395 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Reporter: Nandakumar >Assignee: Nandakumar > Attachments: HDFS-11395.000.patch, HDFS-11395.001.patch, > HDFS-11395.002.patch, HDFS-11395.003.patch, HDFS-11395.004.patch, > HDFS-11395.005.patch > > > When using RequestHedgingProxyProvider, in case of Exception (like > FileNotFoundException) from ActiveNameNode, > {{RequestHedgingProxyProvider#RequestHedgingInvocationHandler.invoke}} > receives {{ExecutionException}} since we use {{CompletionService}} for the > call. The ExecutionException is put into a map and wrapped with > {{MultiException}}. > So for a FileNotFoundException the client receives > {{MultiException(Map(ExecutionException(InvocationTargetException(RemoteException(FileNotFoundException)}} > It will cause problem in clients which are handling RemoteExceptions. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11445) FSCK shows overall health status as corrupt even if one replica is corrupt
[ https://issues.apache.org/jira/browse/HDFS-11445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898236#comment-15898236 ] Jing Zhao commented on HDFS-11445: -- Thanks for working on this, [~brahmareddy]! I think you just found a scenario where the following inconsistency happens: {code:title=BlockManager#createLocatedBlock} NumberReplicas numReplicas = countNodes(blk); final int numCorruptNodes = numReplicas.corruptReplicas(); final int numCorruptReplicas = corruptReplicas.numCorruptReplicas(blk); if (numCorruptNodes != numCorruptReplicas) { LOG.warn("Inconsistent number of corrupt replicas for " + blk + " blockMap has " + numCorruptNodes + " but corrupt replicas map has " + numCorruptReplicas); } {code} I also did some debugging using your unit test. Looks like the root cause for this inconsistency is: {{BlockInfo#setGenerationStampAndVerifyReplicas}} may remove a datanode storage from the block's storage list, but still leave the storage in the CorruptReplicasMap. This inconsistency can later be fixed automatically, e.g., by a full block report. But maybe we should consider using {{BlockManager#removeStoredBlock(BlockInfo, DatanodeDescriptor)}} to remove all the records related to the block-dn pair. > FSCK shows overall health status as corrupt even if one replica is corrupt > --- > > Key: HDFS-11445 > URL: https://issues.apache.org/jira/browse/HDFS-11445 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Attachments: HDFS-11445.patch > > > In the following scenario, FSCK shows overall health status as corrupt even > if it has one good replica. > 1. Create file with 2 RF. > 2. Shut down one DN > 3. Append to file again. > 4. Restart the DN > 5. After block report, check Fsck -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
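The bug pattern described above — removing a replica from the block's storage list while leaving it in the corrupt-replicas map — can be sketched with a self-contained toy model (hypothetical names, not the Hadoop data structures): a single removal helper that touches both views keeps them consistent.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy stand-in for the two NN-side views of a block's replicas.
public class ReplicaMapsSketch {
    // blockId -> storages holding a replica (the block's storage list)
    private final Map<Long, Set<String>> blockStorages = new HashMap<>();
    // blockId -> storages holding a corrupt replica (the corrupt-replicas map)
    private final Map<Long, Set<String>> corruptReplicas = new HashMap<>();

    public void addReplica(long blockId, String storage, boolean corrupt) {
        blockStorages.computeIfAbsent(blockId, k -> new HashSet<>()).add(storage);
        if (corrupt) {
            corruptReplicas.computeIfAbsent(blockId, k -> new HashSet<>()).add(storage);
        }
    }

    // The bug pattern: remove from only one of the two views.
    public void removeFromStorageListOnly(long blockId, String storage) {
        Set<String> s = blockStorages.get(blockId);
        if (s != null) s.remove(storage);
    }

    // Analogous to removeStoredBlock(block, dn): drop every record of the
    // block-storage pair so the two views can never disagree.
    public void removeStoredReplica(long blockId, String storage) {
        removeFromStorageListOnly(blockId, storage);
        Set<String> c = corruptReplicas.get(blockId);
        if (c != null) c.remove(storage);
    }

    // Consistent iff every recorded corrupt replica is still a known replica.
    public boolean isConsistent(long blockId) {
        Set<String> corrupt = corruptReplicas.getOrDefault(blockId, Collections.emptySet());
        return blockStorages.getOrDefault(blockId, Collections.emptySet()).containsAll(corrupt);
    }

    public static void main(String[] args) {
        ReplicaMapsSketch maps = new ReplicaMapsSketch();
        maps.addReplica(1L, "storage-A", true);
        maps.removeFromStorageListOnly(1L, "storage-A"); // the bug pattern
        System.out.println(maps.isConsistent(1L)); // false: the views disagree
        maps.removeStoredReplica(1L, "storage-A");
        System.out.println(maps.isConsistent(1L)); // true again
    }
}
```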
[jira] [Commented] (HDFS-11395) RequestHedgingProxyProvider#RequestHedgingInvocationHandler hides the Exception thrown from NameNode
[ https://issues.apache.org/jira/browse/HDFS-11395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897830#comment-15897830 ] Jing Zhao commented on HDFS-11395: -- The 004 patch looks good to me. Just a few minor comments: # Here, since we're sure e is a MultiException, we can directly catch "MultiException e". {code} } catch (Exception e) { for (Exception ex : ((MultiException)e).getExceptions().values()) { {code} # The following code can be simplified as "Assert.assertTrue("..", rEx instanceof StandbyException)" {code} if (rEx instanceof StandbyException) { continue; } else { Assert.fail("Unexpected RemoteException: " + rEx.getMessage()); } {code} # "@param ex" can be removed. {code} * @param ex * @return unwrapped exception */ private Exception unwrapException(Exception ex) { {code} > RequestHedgingProxyProvider#RequestHedgingInvocationHandler hides the > Exception thrown from NameNode > > > Key: HDFS-11395 > URL: https://issues.apache.org/jira/browse/HDFS-11395 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Reporter: Nandakumar >Assignee: Nandakumar > Attachments: HDFS-11395.000.patch, HDFS-11395.001.patch, > HDFS-11395.002.patch, HDFS-11395.003.patch, HDFS-11395.004.patch > > > When using RequestHedgingProxyProvider, in case of Exception (like > FileNotFoundException) from ActiveNameNode, > {{RequestHedgingProxyProvider#RequestHedgingInvocationHandler.invoke}} > receives {{ExecutionException}} since we use {{CompletionService}} for the > call. The ExecutionException is put into a map and wrapped with > {{MultiException}}. > So for a FileNotFoundException the client receives > {{MultiException(Map(ExecutionException(InvocationTargetException(RemoteException(FileNotFoundException)}} > It will cause problem in clients which are handling RemoteExceptions. 
[jira] [Updated] (HDFS-11476) Fix NPE in FsDatasetImpl#checkAndUpdate
[ https://issues.apache.org/jira/browse/HDFS-11476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-11476: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.9.0 Target Version/s: (was: 2.8.0) Status: Resolved (was: Patch Available) I've committed this to trunk and branch-2. Thanks [~xiaobingo] for the fix! Thanks [~arpitagarwal] and [~liuml07] for the review. > Fix NPE in FsDatasetImpl#checkAndUpdate > --- > > Key: HDFS-11476 > URL: https://issues.apache.org/jira/browse/HDFS-11476 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Xiaobing Zhou >Assignee: Xiaobing Zhou > Fix For: 2.9.0 > > Attachments: HDFS-11476.000.patch, HDFS-11476.001.patch, > HDFS-11476.002.patch, HDFS-11476.003.patch > > > diskMetaFile can be null and passed to compareTo which dereferences it, > causing NPE > {code} > // Compare generation stamp > if (memBlockInfo.getGenerationStamp() != diskGS) { > File memMetaFile = FsDatasetUtil.getMetaFile(diskFile, > memBlockInfo.getGenerationStamp()); > if (memMetaFile.exists()) { > if (memMetaFile.compareTo(diskMetaFile) != 0) { > LOG.warn("Metadata file in memory " > + memMetaFile.getAbsolutePath() > + " does not match file found by scan " > + (diskMetaFile == null? null: > diskMetaFile.getAbsolutePath())); > } > } else { > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
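The NPE quoted above comes from dereferencing {{diskMetaFile}} before checking it for null. A minimal sketch of the null-safe guard (a hypothetical helper, not the committed patch):

```java
import java.io.File;

// Sketch only: check diskMetaFile for null before calling compareTo on it,
// instead of dereferencing it and only handling null in the log message.
public class MetaFileCompareSketch {
    static boolean metaFilesDiffer(File memMetaFile, File diskMetaFile) {
        if (diskMetaFile == null) {
            // the scan found no meta file at all: definitely a mismatch
            return true;
        }
        // File.compareTo compares pathnames lexicographically; 0 means equal
        return memMetaFile.compareTo(diskMetaFile) != 0;
    }

    public static void main(String[] args) {
        System.out.println(metaFilesDiffer(new File("blk_1.meta"), null)); // true
    }
}
```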
[jira] [Commented] (HDFS-11395) RequestHedgingProxyProvider#RequestHedgingInvocationHandler hides the Exception thrown from NameNode
[ https://issues.apache.org/jira/browse/HDFS-11395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15894925#comment-15894925 ] Jing Zhao commented on HDFS-11395: -- Thanks for updating the patch, [~nandakumar131]. Some further comments on the latest patch: # In RetryAction, the new member field {{exception}} is only used in the FAIL case. So maybe we can: #* rename "exception" to failException #* assign a value to this field only when the action is FAIL: {code} + Exception ex = null; {code} # Looks like we should not only get {{RemoteException}} out of {{ex}}, but more generally get the cause of other types of exceptions. Note exceptions like ConnectException and EOFException should also be exposed to retry policies. {code} RemoteException rEx = getRemoteException(ex); if (rEx != null) { badResults.put(tProxyInfo.proxyInfo, rEx); } else { badResults.put(tProxyInfo.proxyInfo, ex); } {code} # It would be helpful if we could also have a unit test for the above ConnectException/EOFException case. # Need to fix the indentation, line length, etc., for {{testHedgingWhenFileNotFoundException}}. > RequestHedgingProxyProvider#RequestHedgingInvocationHandler hides the > Exception thrown from NameNode > > > Key: HDFS-11395 > URL: https://issues.apache.org/jira/browse/HDFS-11395 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Reporter: Nandakumar >Assignee: Nandakumar > Attachments: HDFS-11395.000.patch, HDFS-11395.001.patch, > HDFS-11395.002.patch > > > When using RequestHedgingProxyProvider, in case of Exception (like > FileNotFoundException) from ActiveNameNode, > {{RequestHedgingProxyProvider#RequestHedgingInvocationHandler.invoke}} > receives {{ExecutionException}} since we use {{CompletionService}} for the > call. The ExecutionException is put into a map and wrapped with > {{MultiException}}. 
> So for a FileNotFoundException the client receives > {{MultiException(Map(ExecutionException(InvocationTargetException(RemoteException(FileNotFoundException)}} > It will cause problem in clients which are handling RemoteExceptions. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
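The unwrapping discussed above can be sketched with a small self-contained example (not the actual RequestHedgingProxyProvider code): peel off the {{ExecutionException}} / {{InvocationTargetException}} wrappers so that retry policies see the real cause, whether it is a RemoteException or a network error like ConnectException.

```java
import java.io.FileNotFoundException;
import java.lang.reflect.InvocationTargetException;
import java.util.concurrent.ExecutionException;

// Sketch of unwrapping the layered exception described in the issue:
// MultiException(Map(ExecutionException(InvocationTargetException(cause)))).
public class UnwrapSketch {
    static Throwable unwrap(Throwable t) {
        // keep peeling while the exception is just a wrapper with a cause
        while ((t instanceof ExecutionException
                || t instanceof InvocationTargetException)
                && t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }

    public static void main(String[] args) {
        Throwable root = new FileNotFoundException("/missing/path");
        Throwable wrapped =
            new ExecutionException(new InvocationTargetException(root));
        System.out.println(unwrap(wrapped).getClass().getSimpleName());
        // prints FileNotFoundException
    }
}
```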
[jira] [Commented] (HDFS-11395) RequestHedgingProxyProvider#RequestHedgingInvocationHandler hides the Exception thrown from NameNode
[ https://issues.apache.org/jira/browse/HDFS-11395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893227#comment-15893227 ] Jing Zhao commented on HDFS-11395: -- bq. in case of a non-RemoteException from ExecutionException, what should be done That should be OK. The current retry policies are supposed to handle all exceptions, including non-RemoteExceptions. E.g., {{FailoverOnNetworkExceptionRetry}} handles {{ConnectException}}, {{EOFException}}, etc. bq. In that case, is it ok to add an additional field to RetryInvocationHandler#RetryInfo for holding Exception Yes, sounds good to me. > RequestHedgingProxyProvider#RequestHedgingInvocationHandler hides the > Exception thrown from NameNode > > > Key: HDFS-11395 > URL: https://issues.apache.org/jira/browse/HDFS-11395 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Reporter: Nandakumar >Assignee: Nandakumar > Attachments: HDFS-11395.000.patch, HDFS-11395.001.patch > > > When using RequestHedgingProxyProvider, in case of Exception (like > FileNotFoundException) from ActiveNameNode, > {{RequestHedgingProxyProvider#RequestHedgingInvocationHandler.invoke}} > receives {{ExecutionException}} since we use {{CompletionService}} for the > call. The ExecutionException is put into a map and wrapped with > {{MultiException}}. > So for a FileNotFoundException the client receives > {{MultiException(Map(ExecutionException(InvocationTargetException(RemoteException(FileNotFoundException)}} > It will cause problem in clients which are handling RemoteExceptions. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-11395) RequestHedgingProxyProvider#RequestHedgingInvocationHandler hides the Exception thrown from NameNode
[ https://issues.apache.org/jira/browse/HDFS-11395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890867#comment-15890867 ] Jing Zhao edited comment on HDFS-11395 at 3/1/17 7:27 PM: -- Thanks for working on this, [~nandakumar131]. I agree we should not directly throw a MultiException. But I have a similar concern as Arpit, i.e., we should not simply throw the first exception. I think we should # Not mix detailed exception handling logic into {{RequestHedgingProxyProvider}}. In {{RequestHedgingProxyProvider}}, we only need to get the RemoteException from {{ExecutionException}}, and put all the exceptions into {{badResults}}. No need for special handling for StandbyException, etc., there. These should be handled by {{RetryInvocationHandler#newRetryInfo}}. # Then in {{RetryInvocationHandler#newRetryInfo}}, we should let this method return both the RetryInfo and the exception to throw from the MultiException. These two pieces of information should come from the same internal exception inside the MultiException. was (Author: jingzhao): Thanks for working on this, [~nandakumar131]. I agree we should not directly throw a MultiException. But I have a similar concern as Arpit, i.e., we should not simply throw the first exception. I think we should # Not mix detailed exception handling logic into {{RequestHedgingProxyProvider}}. In {{RequestHedgingProxyProvider}}, we only need to get the RemoteException from {{ExecutionException}}, and put all the exceptions into {{badResults}}. No need for special handling for StandbyException, etc., there. These should be handled by {{RetryInvocationHandler#newRetryInfo}}. # Then in {{RetryInvocationHandler#newRetryInfo}}, we should let this method return both the RetryInfo and the exception to throw from the MultiException. These two pieces of information should come from the same internal exception inside the MultiException. 
> RequestHedgingProxyProvider#RequestHedgingInvocationHandler hides the > Exception thrown from NameNode > > > Key: HDFS-11395 > URL: https://issues.apache.org/jira/browse/HDFS-11395 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Reporter: Nandakumar >Assignee: Nandakumar > Attachments: HDFS-11395.000.patch, HDFS-11395.001.patch > > > When using RequestHedgingProxyProvider, in case of Exception (like > FileNotFoundException) from ActiveNameNode, > {{RequestHedgingProxyProvider#RequestHedgingInvocationHandler.invoke}} > receives {{ExecutionException}} since we use {{CompletionService}} for the > call. The ExecutionException is put into a map and wrapped with > {{MultiException}}. > So for a FileNotFoundException the client receives > {{MultiException(Map(ExecutionException(InvocationTargetException(RemoteException(FileNotFoundException)}} > It will cause problem in clients which are handling RemoteExceptions. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11395) RequestHedgingProxyProvider#RequestHedgingInvocationHandler hides the Exception thrown from NameNode
[ https://issues.apache.org/jira/browse/HDFS-11395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890867#comment-15890867 ] Jing Zhao commented on HDFS-11395: -- Thanks for working on this, [~nandakumar131]. I agree we should not directly throw a MultiException. But I have a similar concern as Arpit, i.e., we should not simply throw the first exception. I think we should # Not mix detailed exception handling logic into {{RequestHedgingProxyProvider}}. In {{RequestHedgingProxyProvider}}, we only need to get the RemoteException from {{ExecutionException}}, and put all the exceptions into {{badResults}}. No need for special handling for StandbyException, etc., there. These should be handled by {{RetryInvocationHandler#newRetryInfo}}. # Then in {{RetryInvocationHandler#newRetryInfo}}, we should let this method return both the RetryInfo and the exception to throw from the MultiException. These two pieces of information should come from the same internal exception inside the MultiException. > RequestHedgingProxyProvider#RequestHedgingInvocationHandler hides the > Exception thrown from NameNode > > > Key: HDFS-11395 > URL: https://issues.apache.org/jira/browse/HDFS-11395 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Reporter: Nandakumar >Assignee: Nandakumar > Attachments: HDFS-11395.000.patch, HDFS-11395.001.patch > > > When using RequestHedgingProxyProvider, in case of Exception (like > FileNotFoundException) from ActiveNameNode, > {{RequestHedgingProxyProvider#RequestHedgingInvocationHandler.invoke}} > receives {{ExecutionException}} since we use {{CompletionService}} for the > call. The ExecutionException is put into a map and wrapped with > {{MultiException}}. > So for a FileNotFoundException the client receives > {{MultiException(Map(ExecutionException(InvocationTargetException(RemoteException(FileNotFoundException)}} > It will cause problem in clients which are handling RemoteExceptions. 
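The design point above — deriving both the retry decision and the surfaced exception from the same underlying exception in the failure map, rather than throwing the whole MultiException — can be sketched as follows (hypothetical names, not the Hadoop implementation):

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model: when a hedged call fails on every proxy, pick one exception
// from the per-proxy failure map and base both the retry action and the
// exception thrown to the caller on that same exception.
public class RetryDecisionSketch {
    enum Action { FAILOVER_AND_RETRY, FAIL }

    static final class Decision {
        final Action action;
        final Exception toThrow;
        Decision(Action action, Exception toThrow) {
            this.action = action;
            this.toThrow = toThrow;
        }
    }

    static Decision decide(Map<String, Exception> badResults) {
        // network errors are worth a failover and retry
        for (Exception e : badResults.values()) {
            if (e instanceof ConnectException) {
                return new Decision(Action.FAILOVER_AND_RETRY, e);
            }
        }
        // otherwise surface the first application-level failure as-is,
        // so the caller sees e.g. a FileNotFoundException, not a wrapper
        Exception first = badResults.values().iterator().next();
        return new Decision(Action.FAIL, first);
    }

    public static void main(String[] args) {
        Map<String, Exception> bad = new LinkedHashMap<>();
        bad.put("nn1", new IOException("FileNotFound"));
        bad.put("nn2", new ConnectException("refused"));
        Decision d = decide(bad);
        System.out.println(d.action + " " + d.toThrow.getMessage());
        // prints FAILOVER_AND_RETRY refused
    }
}
```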
[jira] [Updated] (HDFS-8498) Blocks can be committed with wrong size
[ https://issues.apache.org/jira/browse/HDFS-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8498: Attachment: HDFS-8498.branch-2.001.patch Thanks for the review, [~jojochuang]. Updated the branch-2 patch to address your comments. > Blocks can be committed with wrong size > --- > > Key: HDFS-8498 > URL: https://issues.apache.org/jira/browse/HDFS-8498 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.5.0 >Reporter: Daryn Sharp >Assignee: Jing Zhao >Priority: Critical > Fix For: 3.0.0-alpha3 > > Attachments: HDFS-8498.000.patch, HDFS-8498.001.patch, > HDFS-8498.branch-2.001.patch, HDFS-8498.branch-2.patch > > > When an IBR for a UC block arrives, the NN updates the expected location's > block and replica state _only_ if it's on an unexpected storage for an > expected DN. If it's for an expected storage, only the genstamp is updated. > When the block is committed, and the expected locations are verified, only > the genstamp is checked. The size is not checked but it wasn't updated in > the expected locations anyway. > A faulty client may misreport the size when committing the block. The block > is effectively corrupted. If the NN issues replications, the received IBR is > considered corrupt, the NN invalidates the block, immediately issues another > replication. The NN eventually realizes all the original replicas are > corrupt after full BRs are received from the original DNs. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-4025) QJM: Synchronize past log segments to JNs that missed them
[ https://issues.apache.org/jira/browse/HDFS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4025: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0-alpha3 Status: Resolved (was: Patch Available) I've committed the patch to trunk. Thanks for the contribution, Hanisha! > QJM: Synchronize past log segments to JNs that missed them > - > > Key: HDFS-4025 > URL: https://issues.apache.org/jira/browse/HDFS-4025 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha >Affects Versions: QuorumJournalManager (HDFS-3077) >Reporter: Todd Lipcon >Assignee: Hanisha Koneru > Fix For: 3.0.0-alpha3, QuorumJournalManager (HDFS-3077) > > Attachments: HDFS-4025.000.patch, HDFS-4025.001.patch, > HDFS-4025.002.patch, HDFS-4025.003.patch, HDFS-4025.004.patch, > HDFS-4025.005.patch, HDFS-4025.006.patch, HDFS-4025.007.patch, > HDFS-4025.008.patch, HDFS-4025.009.patch, HDFS-4025.010.patch, > HDFS-4025.011.patch > > > Currently, if a JournalManager crashes and misses some segment of logs, and > then comes back, it will be re-added as a valid part of the quorum on the > next log roll. However, it will not have a complete history of log segments > (i.e., any individual JN may have gaps in its transaction history). This > mirrors the behavior of the NameNode when there are multiple local > directories specified. > However, it would be better if a background thread noticed these gaps and > "filled them in" by grabbing the segments from other JournalNodes. This > increases the resilience of the system when JournalNodes get reformatted or > otherwise lose their local disk. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-4025) QJM: Synchronize past log segments to JNs that missed them
[ https://issues.apache.org/jira/browse/HDFS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879572#comment-15879572 ] Jing Zhao commented on HDFS-4025: - The latest patch looks good to me. +1. I will commit the patch shortly. [~hanishakoneru], please create another JIRA to address the remaining issues as we discussed. > QJM: Synchronize past log segments to JNs that missed them > - > > Key: HDFS-4025 > URL: https://issues.apache.org/jira/browse/HDFS-4025 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha >Affects Versions: QuorumJournalManager (HDFS-3077) >Reporter: Todd Lipcon >Assignee: Hanisha Koneru > Fix For: QuorumJournalManager (HDFS-3077) > > Attachments: HDFS-4025.000.patch, HDFS-4025.001.patch, > HDFS-4025.002.patch, HDFS-4025.003.patch, HDFS-4025.004.patch, > HDFS-4025.005.patch, HDFS-4025.006.patch, HDFS-4025.007.patch, > HDFS-4025.008.patch, HDFS-4025.009.patch, HDFS-4025.010.patch, > HDFS-4025.011.patch > > > Currently, if a JournalManager crashes and misses some segment of logs, and > then comes back, it will be re-added as a valid part of the quorum on the > next log roll. However, it will not have a complete history of log segments > (i.e., any individual JN may have gaps in its transaction history). This > mirrors the behavior of the NameNode when there are multiple local > directories specified. > However, it would be better if a background thread noticed these gaps and > "filled them in" by grabbing the segments from other JournalNodes. This > increases the resilience of the system when JournalNodes get reformatted or > otherwise lose their local disk. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11402) HDFS Snapshots should capture point-in-time copies of OPEN files
[ https://issues.apache.org/jira/browse/HDFS-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872794#comment-15872794 ] Jing Zhao commented on HDFS-11402: -- bq. We have the same problem with parallel readers when there is an ongoing write. This is not true. The current DFSInputStream can talk to the datanodes and learn the current length. > HDFS Snapshots should capture point-in-time copies of OPEN files > > > Key: HDFS-11402 > URL: https://issues.apache.org/jira/browse/HDFS-11402 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 2.6.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-11402.01.patch, HDFS-11402.02.patch > > > *Problem:* > 1. When there are files being written and when HDFS Snapshots are taken in > parallel, Snapshots do capture all these files, but these being written files > in Snapshots do not have the point-in-time file length captured. That is, > these open files are not frozen in HDFS Snapshots. These open files > grow/shrink in length, just like the original file, even after the snapshot > time. > 2. At the time of File close or any other meta data modification operation on > these files, HDFS reconciles the file length and records the modification in > the last taken Snapshot. All the previously taken Snapshots continue to have > those open Files with no modification recorded. So, all those previous > snapshots end up using the final modification record in the last snapshot. > Thus after the file close, file lengths in all those snapshots will end up > same. > Assume File1 is opened for write and a total of 1MB written to it. While the > writes are happening, snapshots are taken in parallel. 
> {noformat} > |---Time---T1---T2-T3T4--> > |---Snap1--Snap2-Snap3---> > |---File1.open---write-write---close-> > {noformat} > Then at time, > T2: > Snap1.File1.length = 0 > T3: > Snap1.File1.length = 0 > Snap2.File1.length = 0 > > T4: > Snap1.File1.length = 1MB > Snap2.File1.length = 1MB > Snap3.File1.length = 1MB > *Proposal* > 1. At the time of taking Snapshot, {{SnapshotManager#createSnapshot}} can > optionally request {{DirectorySnapshottableFeature#addSnapshot}} to freeze > open files. > 2. {{DirectorySnapshottableFeature#addSnapshot}} can consult with > {{LeaseManager}} and get a list INodesInPath for all open files under the > snapshot dir. > 3. {{DirectorySnapshottableFeature#addSnapshot}} after the Snapshot creation, > Diff creation and updating modification time, can invoke > {{INodeFile#recordModification}} for each of the open files. This way, the > Snapshot just taken will have a {{FileDiff}} with {{fileSize}} captured for > each of the open files. > 4. Above model follows the current Snapshot and Diff protocols and doesn't > introduce any new disk formats. So, I don't think we will be needing any new > FSImage Loader/Saver changes for Snapshots. > 5. One of the design goals of HDFS Snapshot was ability to take any number of > snapshots in O(1) time. LeaseManager though has all the open files with > leases in-memory map, an iteration is still needed to prune the needed open > files and then run recordModification on each of them. So, it will not be a > strict O(1) with the above proposal. But, it's going to be a marginal increase > only as the new order will be of O(open_files_under_snap_dir). In order to > avoid HDFS Snapshots change in behavior for open files and avoid change in > time complexity, this improvement can be made under a new config > {{"dfs.namenode.snapshot.freeze.openfiles"}} which by default can be > {{false}}. 
[jira] [Commented] (HDFS-11402) HDFS Snapshots should capture point-in-time copies of OPEN files
[ https://issues.apache.org/jira/browse/HDFS-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872739#comment-15872739 ] Jing Zhao commented on HDFS-11402: -- Thanks for working on this, [~manojg]. Yes, the length of open files is a known issue for the current snapshot implementation. As you mentioned in the description, the current semantics are to capture a length in the snapshot which is >= the real length when the snapshot was created. This behavior kind of breaks the read-only semantics. I think the key challenge here is how to let the NN know the lengths of open files. Currently the length of an open file is updated on the NN only when 1) hflush is called for the first time, or 2) hflush/hsync is called along with the UPDATE_LENGTH flag. Thus if a file is being written, the file length on the NN side (let's call it {{l_n}}) is usually a lot less than the length seen by the DN/client. If we choose to record {{l_n}} in the snapshot, then we may risk losing data later (from the client's point of view). E.g., a user wrote 100MB of data and took a snapshot. The {{l_n}} at that time might be only 1MB or even 0. Later, if the user deletes the file, she will expect ~100MB still kept in the snapshotted file, instead of 1MB or an empty file. At this time, from the safety point of view, maybe the semantics of the current snapshot implementation are better. So before we update the NN-side logic about capturing the file length in snapshots, I think we first need to solve the problem of how to report the length of open files to the NN (e.g., maybe utilizing the DN heartbeats or some other mechanism). > HDFS Snapshots should capture point-in-time copies of OPEN files > > > Key: HDFS-11402 > URL: https://issues.apache.org/jira/browse/HDFS-11402 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 2.6.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Attachments: HDFS-11402.01.patch, HDFS-11402.02.patch > > > *Problem:* > 1. 
When there are files being written and when HDFS Snapshots are taken in > parallel, Snapshots do capture all these files, but these being written files > in Snapshots do not have the point-in-time file length captured. That is, > these open files are not frozen in HDFS Snapshots. These open files > grow/shrink in length, just like the original file, even after the snapshot > time. > 2. At the time of File close or any other meta data modification operation on > these files, HDFS reconciles the file length and records the modification in > the last taken Snapshot. All the previously taken Snapshots continue to have > those open Files with no modification recorded. So, all those previous > snapshots end up using the final modification record in the last snapshot. > Thus after the file close, file lengths in all those snapshots will end up > same. > Assume File1 is opened for write and a total of 1MB written to it. While the > writes are happening, snapshots are taken in parallel. > {noformat} > |---Time---T1---T2-T3T4--> > |---Snap1--Snap2-Snap3---> > |---File1.open---write-write---close-> > {noformat} > Then at time, > T2: > Snap1.File1.length = 0 > T3: > Snap1.File1.length = 0 > Snap2.File1.length = 0 > > T4: > Snap1.File1.length = 1MB > Snap2.File1.length = 1MB > Snap3.File1.length = 1MB > *Proposal* > 1. At the time of taking Snapshot, {{SnapshotManager#createSnapshot}} can > optionally request {{DirectorySnapshottableFeature#addSnapshot}} to freeze > open files. > 2. {{DirectorySnapshottableFeature#addSnapshot}} can consult with > {{LeaseManager}} and get a list INodesInPath for all open files under the > snapshot dir. > 3. {{DirectorySnapshottableFeature#addSnapshot}} after the Snapshot creation, > Diff creation and updating modification time, can invoke > {{INodeFile#recordModification}} for each of the open files. This way, the > Snapshot just taken will have a {{FileDiff}} with {{fileSize}} captured for > each of the open files. > 4. 
Above model follows the current Snapshot and Diff protocols and doesn't > introduce any new disk formats. So, I don't think we will be needing any new > FSImage Loader/Saver changes for Snapshots. > 5. One of the design goals of HDFS Snapshot was the ability to take any number of > snapshots in O(1) time. Though LeaseManager has all the open files with > leases in an in-memory map, an iteration is still needed to prune the needed open > files and then run recordModification on each of them. So, it will not be a > strict O(1) with the above proposal. But, it's going to be only a marginal increase > as the new order will be of O(open_files_under_snap_dir). In order to
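The reconciliation behavior described in the problem statement — every snapshot taken while the file was open ends up reporting the final length after close — can be sketched as a toy model. This is plain Java, not Hadoop's snapshot code; all class and method names here are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the current open-file snapshot semantics (illustrative
// only): snapshots of an open file share the file's single live record,
// so after close they all resolve to the final length instead of their
// point-in-time lengths.
public class OpenFileSnapshots {
    private long liveLength = 0;       // length of the file being written
    private Long closedLength = null;  // set once the file is closed
    private final List<String> snapshots = new ArrayList<>();

    void write(long bytes) { liveLength += bytes; }
    void takeSnapshot(String name) { snapshots.add(name); }
    void close() { closedLength = liveLength; }

    /** The snapshot name is ignored on purpose: every snapshot resolves
     *  to the same shared record — this is exactly the reported bug. */
    long lengthInSnapshot(String name) {
        return closedLength != null ? closedLength : liveLength;
    }

    public static void main(String[] args) {
        OpenFileSnapshots f = new OpenFileSnapshots();
        f.takeSnapshot("Snap1");          // T1: file open, 0 bytes written
        f.write(512 * 1024);
        f.takeSnapshot("Snap2");          // T2: 512 KB written
        f.write(512 * 1024);
        f.close();                        // T4: 1 MB total
        // Both snapshots report 1 MB, not 0 and 512 KB respectively.
        System.out.println(f.lengthInSnapshot("Snap1"));
        System.out.println(f.lengthInSnapshot("Snap2"));
    }
}
```

The proposal in the description amounts to replacing the shared record with a per-snapshot {{FileDiff}} that freezes {{fileSize}} at snapshot-creation time.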
[jira] [Reopened] (HDFS-8498) Blocks can be committed with wrong size
[ https://issues.apache.org/jira/browse/HDFS-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao reopened HDFS-8498: - Reopen for the branch-2 patch. > Blocks can be committed with wrong size > --- > > Key: HDFS-8498 > URL: https://issues.apache.org/jira/browse/HDFS-8498 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.5.0 >Reporter: Daryn Sharp >Assignee: Jing Zhao >Priority: Critical > Fix For: 3.0.0-alpha3 > > Attachments: HDFS-8498.000.patch, HDFS-8498.001.patch, > HDFS-8498.branch-2.patch > > > When an IBR for a UC block arrives, the NN updates the expected location's > block and replica state _only_ if it's on an unexpected storage for an > expected DN. If it's for an expected storage, only the genstamp is updated. > When the block is committed, and the expected locations are verified, only > the genstamp is checked. The size is not checked but it wasn't updated in > the expected locations anyway. > A faulty client may misreport the size when committing the block. The block > is effectively corrupted. If the NN issues replications, the received IBR is > considered corrupt, the NN invalidates the block, immediately issues another > replication. The NN eventually realizes all the original replicas are > corrupt after full BRs are received from the original DNs. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
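The commit-time gap described above can be sketched as follows. This is a simplified stand-in for the NameNode's commit check, not the actual BlockManager code; all names ({{commitGenstampOnly}}, {{commitWithLengthCheck}}, {{ibrReplicaLengths}}) are illustrative:

```java
import java.util.List;

// Simplified model of commit-time validation (illustrative only).
// Pre-fix, only the generation stamp is compared, so a faulty client
// can commit a block with a misreported size; the fix also checks the
// reported length against replica lengths learned from IBRs.
public class CommitCheck {

    /** Pre-fix behavior: a misreported length is accepted silently. */
    static boolean commitGenstampOnly(long expectedGs, long reportedGs) {
        return expectedGs == reportedGs;
    }

    /** Post-fix behavior: reject a commit whose reported length
     *  disagrees with the replica lengths the NN has recorded. */
    static boolean commitWithLengthCheck(long expectedGs, long reportedGs,
                                         long reportedLen,
                                         List<Long> ibrReplicaLengths) {
        if (expectedGs != reportedGs) {
            return false;
        }
        for (long replicaLen : ibrReplicaLengths) {
            if (replicaLen != reportedLen) {
                return false;  // client's claimed size contradicts replicas
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Long> replicas = List.of(1048576L, 1048576L, 1048576L);
        // Faulty client misreports 512 KB for a 1 MB block: the old
        // check passes, the new one rejects the commit.
        System.out.println(commitGenstampOnly(5, 5));                        // true
        System.out.println(commitWithLengthCheck(5, 5, 524288L, replicas));  // false
        System.out.println(commitWithLengthCheck(5, 5, 1048576L, replicas)); // true
    }
}
```

Without the length check, the misreported size makes every genuine replica look corrupt, triggering the invalidate/re-replicate loop described in the issue.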
[jira] [Updated] (HDFS-8498) Blocks can be committed with wrong size
[ https://issues.apache.org/jira/browse/HDFS-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8498: Attachment: HDFS-8498.branch-2.patch Upload the patch for branch-2 > Blocks can be committed with wrong size > --- > > Key: HDFS-8498 > URL: https://issues.apache.org/jira/browse/HDFS-8498 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.5.0 >Reporter: Daryn Sharp >Assignee: Jing Zhao >Priority: Critical > Fix For: 3.0.0-alpha3 > > Attachments: HDFS-8498.000.patch, HDFS-8498.001.patch, > HDFS-8498.branch-2.patch > > > When an IBR for a UC block arrives, the NN updates the expected location's > block and replica state _only_ if it's on an unexpected storage for an > expected DN. If it's for an expected storage, only the genstamp is updated. > When the block is committed, and the expected locations are verified, only > the genstamp is checked. The size is not checked but it wasn't updated in > the expected locations anyway. > A faulty client may misreport the size when committing the block. The block > is effectively corrupted. If the NN issues replications, the received IBR is > considered corrupt, the NN invalidates the block, immediately issues another > replication. The NN eventually realizes all the original replicas are > corrupt after full BRs are received from the original DNs. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-8498) Blocks can be committed with wrong size
[ https://issues.apache.org/jira/browse/HDFS-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868412#comment-15868412 ] Jing Zhao commented on HDFS-8498: - [~jojochuang], currently I do not plan to backport this change to branch 2.x. But please feel free to do it if you think it's necessary and I will be happy to review. > Blocks can be committed with wrong size > --- > > Key: HDFS-8498 > URL: https://issues.apache.org/jira/browse/HDFS-8498 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.5.0 >Reporter: Daryn Sharp >Assignee: Jing Zhao >Priority: Critical > Fix For: 3.0.0-alpha3 > > Attachments: HDFS-8498.000.patch, HDFS-8498.001.patch > > > When an IBR for a UC block arrives, the NN updates the expected location's > block and replica state _only_ if it's on an unexpected storage for an > expected DN. If it's for an expected storage, only the genstamp is updated. > When the block is committed, and the expected locations are verified, only > the genstamp is checked. The size is not checked but it wasn't updated in > the expected locations anyway. > A faulty client may misreport the size when committing the block. The block > is effectively corrupted. If the NN issues replications, the received IBR is > considered corrupt, the NN invalidates the block, immediately issues another > replication. The NN eventually realizes all the original replicas are > corrupt after full BRs are received from the original DNs. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-8498) Blocks can be committed with wrong size
[ https://issues.apache.org/jira/browse/HDFS-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868333#comment-15868333 ] Jing Zhao commented on HDFS-8498: - I've committed the patch into trunk. > Blocks can be committed with wrong size > --- > > Key: HDFS-8498 > URL: https://issues.apache.org/jira/browse/HDFS-8498 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.5.0 >Reporter: Daryn Sharp >Assignee: Jing Zhao >Priority: Critical > Fix For: 3.0.0-alpha3 > > Attachments: HDFS-8498.000.patch, HDFS-8498.001.patch > > > When an IBR for a UC block arrives, the NN updates the expected location's > block and replica state _only_ if it's on an unexpected storage for an > expected DN. If it's for an expected storage, only the genstamp is updated. > When the block is committed, and the expected locations are verified, only > the genstamp is checked. The size is not checked but it wasn't updated in > the expected locations anyway. > A faulty client may misreport the size when committing the block. The block > is effectively corrupted. If the NN issues replications, the received IBR is > considered corrupt, the NN invalidates the block, immediately issues another > replication. The NN eventually realizes all the original replicas are > corrupt after full BRs are received from the original DNs. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-8498) Blocks can be committed with wrong size
[ https://issues.apache.org/jira/browse/HDFS-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8498: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0-alpha3 Status: Resolved (was: Patch Available) Thanks for the review, [~jnp]! I will commit the patch shortly. > Blocks can be committed with wrong size > --- > > Key: HDFS-8498 > URL: https://issues.apache.org/jira/browse/HDFS-8498 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.5.0 >Reporter: Daryn Sharp >Assignee: Jing Zhao >Priority: Critical > Fix For: 3.0.0-alpha3 > > Attachments: HDFS-8498.000.patch, HDFS-8498.001.patch > > > When an IBR for a UC block arrives, the NN updates the expected location's > block and replica state _only_ if it's on an unexpected storage for an > expected DN. If it's for an expected storage, only the genstamp is updated. > When the block is committed, and the expected locations are verified, only > the genstamp is checked. The size is not checked but it wasn't updated in > the expected locations anyway. > A faulty client may misreport the size when committing the block. The block > is effectively corrupted. If the NN issues replications, the received IBR is > considered corrupt, the NN invalidates the block, immediately issues another > replication. The NN eventually realizes all the original replicas are > corrupt after full BRs are received from the original DNs. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-8498) Blocks can be committed with wrong size
[ https://issues.apache.org/jira/browse/HDFS-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858371#comment-15858371 ] Jing Zhao commented on HDFS-8498: - Do you also want to take a look at the patch, [~vinayrpet]? > Blocks can be committed with wrong size > --- > > Key: HDFS-8498 > URL: https://issues.apache.org/jira/browse/HDFS-8498 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.5.0 >Reporter: Daryn Sharp >Assignee: Jing Zhao >Priority: Critical > Attachments: HDFS-8498.000.patch, HDFS-8498.001.patch > > > When an IBR for a UC block arrives, the NN updates the expected location's > block and replica state _only_ if it's on an unexpected storage for an > expected DN. If it's for an expected storage, only the genstamp is updated. > When the block is committed, and the expected locations are verified, only > the genstamp is checked. The size is not checked but it wasn't updated in > the expected locations anyway. > A faulty client may misreport the size when committing the block. The block > is effectively corrupted. If the NN issues replications, the received IBR is > considered corrupt, the NN invalidates the block, immediately issues another > replication. The NN eventually realizes all the original replicas are > corrupt after full BRs are received from the original DNs. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-4025) QJM: Sychronize past log segments to JNs that missed them
[ https://issues.apache.org/jira/browse/HDFS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850633#comment-15850633 ] Jing Zhao commented on HDFS-4025: - The failed unit test should be unrelated and has been reported in HDFS-10644. In the meanwhile, the current patch may still hit an issue while HA upgrade is going on. If the segment downloading is happening while the admin tries to rollback, the deletion of the {{current}} directory may fail on Windows. As a fix we can disable the sync while there is {{prev}} directory on JN (which means the upgrade is still going on). Or we can download the segment first into another directory. Currently I'm thinking maybe we can disable this feature in the configuration by default, then use separate jiras to track remaining issues. This also allows us to do more testing. Thoughts? > QJM: Sychronize past log segments to JNs that missed them > - > > Key: HDFS-4025 > URL: https://issues.apache.org/jira/browse/HDFS-4025 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha >Affects Versions: QuorumJournalManager (HDFS-3077) >Reporter: Todd Lipcon >Assignee: Hanisha Koneru > Fix For: QuorumJournalManager (HDFS-3077) > > Attachments: HDFS-4025.000.patch, HDFS-4025.001.patch, > HDFS-4025.002.patch, HDFS-4025.003.patch, HDFS-4025.004.patch, > HDFS-4025.005.patch, HDFS-4025.006.patch, HDFS-4025.007.patch, > HDFS-4025.008.patch, HDFS-4025.009.patch, HDFS-4025.010.patch > > > Currently, if a JournalManager crashes and misses some segment of logs, and > then comes back, it will be re-added as a valid part of the quorum on the > next log roll. However, it will not have a complete history of log segments > (i.e any individual JN may have gaps in its transaction history). This > mirrors the behavior of the NameNode when there are multiple local > directories specified. 
> However, it would be better if a background thread noticed these gaps and > "filled them in" by grabbing the segments from other JournalNodes. This > increases the resilience of the system when JournalNodes get reformatted or > otherwise lose their local disk. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-4025) QJM: Sychronize past log segments to JNs that missed them
[ https://issues.apache.org/jira/browse/HDFS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849052#comment-15849052 ] Jing Zhao commented on HDFS-4025: - Thanks for updating the patch, [~hanishakoneru]. The latest patch looks pretty good to me. Some minor comments: # In hdfs-default.xml, "i" --> "if" {code} + dfs.journalnode.enable.sync + true + +If true, the journal nodes wil sync with each other. The journal nodes +will periodically gossip with other journal nodes to compare edit log +manifests and i they detect any missing log segment, they will download +it from the other journal nodes. + + {code} # In JournalNodeSyncer.java, the following code will generate an {{UnsupportedOperationException}} since thisJournalEditLogs is an immutable list. In fact this add op can be skipped. {code} if (success) { thisJournalEditLogs.add(missingLog); } {code} # Maybe "Transferring" can be changed to "Downloading"? {code} LOG.info("Transferring Missing Edit Log from " + url + " to " + jnStorage .getRoot()); {code} # {{finalEditsFile}} should be {{tmpEditsFile}}. {code} LOG.info("Downloaded file " + tmpEditsFile.getName() + " size " + finalEditsFile.length() + " bytes."); {code} # In {{TestJournalNodeSync}}, {{jid}} can be declared as final, and {{editLogExists}} can be private. # For {{deleteEditLog}}, we can either change the while loop to an if, or refresh the logFile instance within the while loop. {code} + while (logFile.isInProgress()) { + dfsCluster.getNameNode(0).getRpcServer().rollEditLog(); {code} # The following code can be simplified as "Assert.assertTrue("Couldn't delete edit log file", deleteFile.delete());" {code} +if (!deleteFile.delete()) { + assert false: "Couldn't delete edit log file"; + return null; +} {code} # In {{generateEditLog}}, let's also check the result of {{doAnEdit}}.
I.e., we do "Assert.assertTrue(doAnEdit());" > QJM: Sychronize past log segments to JNs that missed them > - > > Key: HDFS-4025 > URL: https://issues.apache.org/jira/browse/HDFS-4025 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha >Affects Versions: QuorumJournalManager (HDFS-3077) >Reporter: Todd Lipcon >Assignee: Hanisha Koneru > Fix For: QuorumJournalManager (HDFS-3077) > > Attachments: HDFS-4025.000.patch, HDFS-4025.001.patch, > HDFS-4025.002.patch, HDFS-4025.003.patch, HDFS-4025.004.patch, > HDFS-4025.005.patch, HDFS-4025.006.patch, HDFS-4025.007.patch, > HDFS-4025.008.patch, HDFS-4025.009.patch > > > Currently, if a JournalManager crashes and misses some segment of logs, and > then comes back, it will be re-added as a valid part of the quorum on the > next log roll. However, it will not have a complete history of log segments > (i.e any individual JN may have gaps in its transaction history). This > mirrors the behavior of the NameNode when there are multiple local > directories specified. > However, it would be better if a background thread noticed these gaps and > "filled them in" by grabbing the segments from other JournalNodes. This > increases the resilience of the system when JournalNodes get reformatted or > otherwise lose their local disk. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
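Review comment 2 above — adding to an immutable list throws {{UnsupportedOperationException}} — can be reproduced in isolation. This is plain Java independent of the patch, with made-up names:

```java
import java.util.Collections;
import java.util.List;

// Demonstrates the failure mode flagged in the review: mutating an
// unmodifiable list fails only at runtime, not at compile time, so the
// add() in the patch would blow up the first time the sync path runs.
// As the review notes, the add can simply be skipped.
public class ImmutableAddDemo {

    /** Returns true if adding to the list was rejected at runtime. */
    static boolean addRejected(List<String> editLogs, String missingLog) {
        try {
            editLogs.add(missingLog);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Analogous to thisJournalEditLogs in the patch under review.
        List<String> thisJournalEditLogs =
            Collections.unmodifiableList(List.of("edits_1-100", "edits_101-200"));
        System.out.println(addRejected(thisJournalEditLogs, "edits_201-300")); // true
    }
}
```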
[jira] [Commented] (HDFS-11370) Optimize NamenodeFsck#getReplicaInfo
[ https://issues.apache.org/jira/browse/HDFS-11370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848812#comment-15848812 ] Jing Zhao commented on HDFS-11370: -- The latest patch looks good to me. And the failed test should be unrelated. +1 I will commit the patch shortly. Thanks for the work, [~tasanuma0829]! Thanks for the review, [~manojg]! > Optimize NamenodeFsck#getReplicaInfo > > > Key: HDFS-11370 > URL: https://issues.apache.org/jira/browse/HDFS-11370 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Minor > Attachments: HDFS-11370.1.patch, HDFS-11370.2.patch, > HDFS-11370.3.patch, HDFS-11370.4.patch, HDFS-11370.5.patch, HDFS-11370.6.patch > > > We can optimize the logic of calculating the number of storages in > {{NamenodeFsck#getReplicaInfo}}. This is a follow-on task of HDFS-11124. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11370) Optimize NamenodeFsck#getReplicaInfo
[ https://issues.apache.org/jira/browse/HDFS-11370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-11370: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0-alpha3 Status: Resolved (was: Patch Available) I've committed this patch to trunk. > Optimize NamenodeFsck#getReplicaInfo > > > Key: HDFS-11370 > URL: https://issues.apache.org/jira/browse/HDFS-11370 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Minor > Fix For: 3.0.0-alpha3 > > Attachments: HDFS-11370.1.patch, HDFS-11370.2.patch, > HDFS-11370.3.patch, HDFS-11370.4.patch, HDFS-11370.5.patch, HDFS-11370.6.patch > > > We can optimize the logic of calculating the number of storages in > {{NamenodeFsck#getReplicaInfo}}. This is a follow-on task of HDFS-11124. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11370) Optimize NamenodeFsck#getReplicaInfo
[ https://issues.apache.org/jira/browse/HDFS-11370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15847847#comment-15847847 ] Jing Zhao commented on HDFS-11370: -- Thanks for the discussion, [~manojg] and [~tasanuma0829]. I agree it's better to achieve thread safety for the new {{getExpectedStorageLocationsIterator}}. But currently almost all block related classes, from Block to BlockInfo to BlockUnderConstructionFeature, do not provide thread-safety guarantees and depend on external mechanisms such as the FSNamesystem lock for protection. So I do not think we need to address this issue here, but maybe we can add a javadoc for {{getExpectedStorageLocationsIterator}} explaining that the method is not thread-safe by itself. > Optimize NamenodeFsck#getReplicaInfo > > > Key: HDFS-11370 > URL: https://issues.apache.org/jira/browse/HDFS-11370 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Minor > Attachments: HDFS-11370.1.patch, HDFS-11370.2.patch, > HDFS-11370.3.patch, HDFS-11370.4.patch, HDFS-11370.5.patch > > > We can optimize the logic of calculating the number of storages in > {{NamenodeFsck#getReplicaInfo}}. This is a follow-on task of HDFS-11124. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11370) Optimize NamenodeFsck#getReplicaInfo
[ https://issues.apache.org/jira/browse/HDFS-11370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840282#comment-15840282 ] Jing Zhao commented on HDFS-11370: -- Thanks for working on this, [~tasanuma0829]. The current patch looks good to me. Some further thoughts: # In {{getReplicaInfo}} what we need is actually an iterator/iterable of storages (used by the for loop). However, currently we're using a storage[], and for completed blockInfo we always need to 1) allocate a storage[], 2) get an iterator of the storages, and 3) copy all the storages into the array. This is unnecessary. # So how about we provide an iterator/iterable in the UC feature to get all the expected locations? Then for completed blocks we can avoid the unnecessary copy. What do you think? > Optimize NamenodeFsck#getReplicaInfo > > > Key: HDFS-11370 > URL: https://issues.apache.org/jira/browse/HDFS-11370 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Minor > Attachments: HDFS-11370.1.patch > > > We can optimize the logic of calculating the number of storages in > {{NamenodeFsck#getReplicaInfo}}. This is a follow-on task of HDFS-11124. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
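The iterator/iterable idea above can be sketched as follows. This is plain Java with placeholder types ({{StorageInfo}} stands in for DatanodeStorageInfo), not the real HDFS classes:

```java
import java.util.Iterator;
import java.util.List;

// Sketch of the proposal: expose the expected locations as an Iterator
// so callers like getReplicaInfo can loop directly, instead of first
// allocating and filling a StorageInfo[] for completed blocks.
public class ExpectedLocations {

    static final class StorageInfo {   // placeholder for DatanodeStorageInfo
        final String id;
        StorageInfo(String id) { this.id = id; }
    }

    private final List<StorageInfo> expected;

    ExpectedLocations(List<StorageInfo> expected) {
        this.expected = expected;
    }

    /** Old shape: forces an array allocation plus a full copy. */
    StorageInfo[] getExpectedStorageLocations() {
        return expected.toArray(new StorageInfo[0]);
    }

    /** Proposed shape: callers iterate without any copy. Note: not
     *  thread-safe by itself; like the other block classes it relies on
     *  external locking (e.g., the FSNamesystem lock). */
    Iterator<StorageInfo> getExpectedStorageLocationsIterator() {
        return expected.iterator();
    }

    public static void main(String[] args) {
        ExpectedLocations locs = new ExpectedLocations(
            List.of(new StorageInfo("DS-1"), new StorageInfo("DS-2")));
        locs.getExpectedStorageLocationsIterator()
            .forEachRemaining(s -> System.out.println(s.id));
    }
}
```

The win is exactly the one described in the comment: for completed blocks the three-step allocate/iterate/copy sequence collapses into handing the caller the iterator directly.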
[jira] [Updated] (HDFS-11124) Report blockIds of internal blocks for EC files in Fsck
[ https://issues.apache.org/jira/browse/HDFS-11124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-11124: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: (was: 3.0.0-alpha2) 3.0.0-alpha3 Target Version/s: (was: 3.0.0-alpha3) Status: Resolved (was: Patch Available) The failed tests also passed in my local machine. The patch looks good to me. +1. I've committed it to trunk. Thanks a lot for the contribution, [~tasanuma0829]! > Report blockIds of internal blocks for EC files in Fsck > --- > > Key: HDFS-11124 > URL: https://issues.apache.org/jira/browse/HDFS-11124 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Affects Versions: 3.0.0-alpha1 >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma > Labels: hdfs-ec-3.0-nice-to-have > Fix For: 3.0.0-alpha3 > > Attachments: HDFS-11124.1.patch, HDFS-11124.2.patch, > HDFS-11124.3.patch > > > At the moment, when we do fsck for an EC file which has corrupt blocks and > missing blocks, the result of fsck is like this: > {quote} > /data/striped 393216 bytes, erasure-coded: policy=RS-DEFAULT-6-3-64k, 1 > block(s): > /data/striped: CORRUPT blockpool BP-1204772930-172.16.165.209-1478761131832 > block blk_-9223372036854775792 > CORRUPT 1 blocks of total size 393216 B > 0. 
BP-1204772930-172.16.165.209-1478761131832:blk_-9223372036854775792_1001 > len=393216 Live_repl=4 > [DatanodeInfoWithStorage[127.0.0.1:61617,DS-bcfebe1f-ff54-4d57-9258-ff5bdfde01b5,DISK](CORRUPT), > > DatanodeInfoWithStorage[127.0.0.1:61601,DS-9abf64d0-bb6b-434c-8c5e-de8e3b278f91,DISK](CORRUPT), > > DatanodeInfoWithStorage[127.0.0.1:61596,DS-62698e61-c13f-44f2-9da5-614945960221,DISK](CORRUPT), > > DatanodeInfoWithStorage[127.0.0.1:61605,DS-bbce6708-16fe-44ca-9f1c-506cf00f7e0d,DISK](LIVE), > > DatanodeInfoWithStorage[127.0.0.1:61592,DS-9cdd4afd-2dc8-40da-8805-09712e2afcc4,DISK](LIVE), > > DatanodeInfoWithStorage[127.0.0.1:61621,DS-f2a72d28-c880-4ffe-a70f-0f403e374504,DISK](LIVE), > > DatanodeInfoWithStorage[127.0.0.1:61629,DS-fa6ac558-2c38-41fe-9ef8-222b3f6b2b3c,DISK](LIVE)] > {quote} > It would be useful for admins if it reports the blockIds of the internal > blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11124) Report blockIds of internal blocks for EC files in Fsck
[ https://issues.apache.org/jira/browse/HDFS-11124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836616#comment-15836616 ] Jing Zhao commented on HDFS-11124: -- Thanks for updating the patch, [~tasanuma0829]! The patch looks pretty good to me. Some nits: # The new code in {{getReplicaInfo}} can be further simplified by changing the map from storage->index to storage->internal_id. Something like this: {code} final boolean isStriped = storedBlock.isStriped(); Map<DatanodeStorageInfo, Long> storage2Id = new HashMap<>(); if (isStriped && isComplete) { long blockId = storedBlock.getBlockId(); Iterable<StorageAndBlockIndex> sis = ((BlockInfoStriped)storedBlock).getStorageAndIndexInfos(); for (StorageAndBlockIndex si: sis){ storage2Id.put(si.getStorage(), blockId + si.getBlockIndex()); } } {code} # I just noticed {{testFsckOpenECFiles}} is writing a very large file to generate 2 blocks. Let's use this chance to avoid writing too much data by changing the block size in the configuration (maybe 2 stripes per block). # In the test we can also check if the output for the last open block is correct. # {{getReplicaInfo}} may be further optimized: if we change the "for" loop to go through an {{Iterable}}, we can avoid scanning the storages multiple times in {{blockManager#getStorages}} and {{BlockInfoStriped#getStorageAndIndexInfos}}. We can do this in a separate jira.
> Report blockIds of internal blocks for EC files in Fsck > --- > > Key: HDFS-11124 > URL: https://issues.apache.org/jira/browse/HDFS-11124 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Affects Versions: 3.0.0-alpha1 >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma > Labels: hdfs-ec-3.0-nice-to-have > Fix For: 3.0.0-alpha2 > > Attachments: HDFS-11124.1.patch, HDFS-11124.2.patch > > > At the moment, when we do fsck for an EC file which has corrupt blocks and > missing blocks, the result of fsck is like this: > {quote} > /data/striped 393216 bytes, erasure-coded: policy=RS-DEFAULT-6-3-64k, 1 > block(s): > /data/striped: CORRUPT blockpool BP-1204772930-172.16.165.209-1478761131832 > block blk_-9223372036854775792 > CORRUPT 1 blocks of total size 393216 B > 0. BP-1204772930-172.16.165.209-1478761131832:blk_-9223372036854775792_1001 > len=393216 Live_repl=4 > [DatanodeInfoWithStorage[127.0.0.1:61617,DS-bcfebe1f-ff54-4d57-9258-ff5bdfde01b5,DISK](CORRUPT), > > DatanodeInfoWithStorage[127.0.0.1:61601,DS-9abf64d0-bb6b-434c-8c5e-de8e3b278f91,DISK](CORRUPT), > > DatanodeInfoWithStorage[127.0.0.1:61596,DS-62698e61-c13f-44f2-9da5-614945960221,DISK](CORRUPT), > > DatanodeInfoWithStorage[127.0.0.1:61605,DS-bbce6708-16fe-44ca-9f1c-506cf00f7e0d,DISK](LIVE), > > DatanodeInfoWithStorage[127.0.0.1:61592,DS-9cdd4afd-2dc8-40da-8805-09712e2afcc4,DISK](LIVE), > > DatanodeInfoWithStorage[127.0.0.1:61621,DS-f2a72d28-c880-4ffe-a70f-0f403e374504,DISK](LIVE), > > DatanodeInfoWithStorage[127.0.0.1:61629,DS-fa6ac558-2c38-41fe-9ef8-222b3f6b2b3c,DISK](LIVE)] > {quote} > It would be useful for admins if it reports the blockIds of the internal > blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
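The storage->internal_id suggestion in nit 1 relies on the striped block id layout, where an internal block's id is the block group id plus its block index. A standalone sketch — plain Java with string keys standing in for DatanodeStorageInfo; the real code iterates BlockInfoStriped's StorageAndBlockIndex entries:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of building storage -> internal block id for a striped block
// group, following the review suggestion (simplified; not the actual
// NamenodeFsck code).
public class StripedBlockIds {

    /** Internal block id = block group id + block index (EC id layout,
     *  where the group id's low bits are reserved for the index). */
    static long internalBlockId(long blockGroupId, int blockIndex) {
        return blockGroupId + blockIndex;
    }

    static Map<String, Long> buildStorageToId(long blockGroupId,
                                              Map<String, Integer> storageToIndex) {
        Map<String, Long> storageToId = new HashMap<>();
        storageToIndex.forEach((storage, index) ->
            storageToId.put(storage, internalBlockId(blockGroupId, index)));
        return storageToId;
    }

    public static void main(String[] args) {
        // Block group id taken from the fsck output quoted in this issue.
        long groupId = -9223372036854775792L;
        System.out.println(internalBlockId(groupId, 0)); // data block 0
        System.out.println(internalBlockId(groupId, 6)); // first parity block (RS-6-3)
    }
}
```

This is what lets fsck report a concrete blk_* id per internal block instead of only the group id.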
[jira] [Updated] (HDFS-8498) Blocks can be committed with wrong size
[ https://issues.apache.org/jira/browse/HDFS-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8498: Attachment: HDFS-8498.001.patch Update the patch to fix bug when block is initialized as null. Also slightly changed TestDFSOutputStream.java to trigger tests in hadoop-hdfs. > Blocks can be committed with wrong size > --- > > Key: HDFS-8498 > URL: https://issues.apache.org/jira/browse/HDFS-8498 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.5.0 >Reporter: Daryn Sharp >Assignee: Jing Zhao >Priority: Critical > Attachments: HDFS-8498.000.patch, HDFS-8498.001.patch > > > When an IBR for a UC block arrives, the NN updates the expected location's > block and replica state _only_ if it's on an unexpected storage for an > expected DN. If it's for an expected storage, only the genstamp is updated. > When the block is committed, and the expected locations are verified, only > the genstamp is checked. The size is not checked but it wasn't updated in > the expected locations anyway. > A faulty client may misreport the size when committing the block. The block > is effectively corrupted. If the NN issues replications, the received IBR is > considered corrupt, the NN invalidates the block, immediately issues another > replication. The NN eventually realizes all the original replicas are > corrupt after full BRs are received from the original DNs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-8498) Blocks can be committed with wrong size
[ https://issues.apache.org/jira/browse/HDFS-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8498: Assignee: Jing Zhao (was: Daryn Sharp) Status: Patch Available (was: Reopened) > Blocks can be committed with wrong size > --- > > Key: HDFS-8498 > URL: https://issues.apache.org/jira/browse/HDFS-8498 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.5.0 >Reporter: Daryn Sharp >Assignee: Jing Zhao >Priority: Critical > Attachments: HDFS-8498.000.patch > > > When an IBR for a UC block arrives, the NN updates the expected location's > block and replica state _only_ if it's on an unexpected storage for an > expected DN. If it's for an expected storage, only the genstamp is updated. > When the block is committed, and the expected locations are verified, only > the genstamp is checked. The size is not checked but it wasn't updated in > the expected locations anyway. > A faulty client may misreport the size when committing the block. The block > is effectively corrupted. If the NN issues replications, the received IBR is > considered corrupt, the NN invalidates the block, immediately issues another > replication. The NN eventually realizes all the original replicas are > corrupt after full BRs are received from the original DNs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-4025) QJM: Sychronize past log segments to JNs that missed them
[ https://issues.apache.org/jira/browse/HDFS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828903#comment-15828903 ] Jing Zhao commented on HDFS-4025: - Thanks for the update, [~hkoneru]. The current patch looks better. Further comments: # Journal segment transfer timeout should not share the same configuration with image transfer timeout, since a log segment is usually smaller than the fsimage. Let's create a new configuration property for it. # Accordingly we do not need a public method Util#getImageTransferTimeout. # In Storage.java, the following is a good change. But the code needs a reformat so that code like "sd.getStorageDirType" does not break into two lines. Besides, I think the patch will not use DirIterator anymore, so this change can also be done in a separate jira. {code} private boolean shouldReturnNextDir() { StorageDirectory sd = getStorageDir(nextIndex); - return (dirType == null || sd.getStorageDirType().isOfType(dirType)) && - (includeShared || !sd.isShared()); + return (dirType == null || (sd.getStorageDirType() != null && sd +.getStorageDirType().isOfType(dirType))) && (includeShared || !sd +.isShared()); } {code} # No need to define EDITS/EDITS_INPROGRESS etc. again in JNStorage. Actually currently JournalNode shares the same storage layout as NameNode, and directly uses FileJournalManager which is defined in the namenode package. So it's ok to use EDITS/EDITS_INPROGRESS defined in NNStorage. We can do further code cleanup as a follow-on task. # Similarly please see if we still need JNStorage#getTemporaryEditsFile and JNStorage#getFinalizedEditsFile. # getNamespaceInfo can be defined in Storage and let NNStorage override it. JNStorage can directly use the base version. # Journal#renameTemporarySegments can be renamed to renameTmpSegment since we're renaming a single segment here. Also no need to call Util#deleteTmpFiles. Just simply call File#delete and check its result. 
# In JournalNodeSyncer, some fields (e.g., journal, jn, jnStorage, conf) can be declared as final. "NULL_CONTROLLER" can be skipped.
# Maybe we do not need two lists: otherJournalNodeAddrs and journalNodeProxiesList. We can create a wrapper class to wrap both InetSocketAddress and QJournalProtocolPB inside. In this way we only need one list.
# "syncJournalDaemon.setDaemon(true);" is unnecessary since syncJournalDaemon is already a Daemon.
# getMissingLogList cannot guarantee the returned ArrayList is sorted according to the transaction id, since the ArrayList is created based on a HashSet. Therefore 1) we cannot guarantee we're downloading older segments first, and 2) the getNextContinuousTxId logic can be wrong.
# The whole getMissingLogSegments flow may need to be redesigned:
#* getMissingLogList can utilize merge-sort-like logic to generate the missing list.
#* Each time we download a missing segment successfully, we should update lastSyncedTxId accordingly.
#* Once we hit any exception while downloading from the remote JN, we can stop the current syncing and continue downloading in the next sync session from another JN.
#* Once lastSyncedTxId has reached the last finalized segment, normally the current JN has caught up. We can reset lastSyncedTxId back.
# Some further optimization can also be done on getMissingLogList:
#* The remote JN http URLs can be stored in JNSyncer.
#* If we know some segments are missing but did not download them in the previous sync, we can directly download them from a new JN without calling the getEditLogManifest RPC.
These can be done separately as follow-ons.
# We also need to add a DataTransferThrottler for the downloading to avoid occupying too much network bandwidth. See TransferFsImage for an example.
# downloadEditLogFromJournalHttpServer can have a shorter name, maybe downloadSegment?
# In downloadEditLogFromJournalHttpServer, no need to call jnStorage.getFiles since we do not require any special storage dir type here.
You can directly check if finalEditsFile exists.
{code}
File finalEditsFile = jnStorage.getFinalizedEditsFile(log.getStartTxId(),
    log.getEndTxId());
List finalFiles = jnStorage.getFiles(null, finalEditsFile.getName());
assert !(finalFiles.isEmpty()) : "No checkpoint targets.";
{code}
# Similarly, before calling doGetUrl, no need to generate the tmpFiles list. Instead use ImmutableList.of(tmpEditsFile). We also need to handle Exceptions other than IOException. See Journal#syncLog as an example.
# renameTemporarySegments can be renamed to renameTmpSegment. We can let this method return a boolean: in this way, if the rename fails, the tmp file deletion can be done outside the lock.
# For the test we can set a smaller sync interval so that the test can be faster.
# We need to add more tests to cover different scenarios:
#* multiple segments are missing
#* discontinuous segments are missing
#* more than
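The dedicated journal segment transfer timeout suggested above (separate from the image transfer timeout) could be sketched roughly as below. This is illustrative only: the key name, default value, and class are assumptions for this sketch, not the configuration actually added by the patch.

```java
import java.util.Map;

// Illustrative sketch: a dedicated timeout key for edit log segment transfer,
// separate from the image transfer timeout, since a log segment is usually
// much smaller than an fsimage. Key name and default are assumptions.
public final class JournalSyncConfig {
  public static final String EDIT_LOG_TRANSFER_TIMEOUT_KEY =
      "dfs.edit.log.transfer.timeout";              // hypothetical key
  public static final int EDIT_LOG_TRANSFER_TIMEOUT_DEFAULT = 30_000; // ms

  private JournalSyncConfig() {}

  /** Reads the segment-transfer timeout, falling back to the default. */
  public static int getTransferTimeout(Map<String, String> conf) {
    String v = conf.get(EDIT_LOG_TRANSFER_TIMEOUT_KEY);
    return v == null ? EDIT_LOG_TRANSFER_TIMEOUT_DEFAULT : Integer.parseInt(v);
  }
}
```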
[jira] [Updated] (HDFS-8498) Blocks can be committed with wrong size
[ https://issues.apache.org/jira/browse/HDFS-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8498: Attachment: HDFS-8498.000.patch Vinay's proposed solution looks good to me. For the implementation, instead of directly changing ExtendedBlock which is used everywhere, maybe we can create a Block-similar structure which is thread safe and only used by DFSOutputStream/DataStreamer internally. Upload a patch to demo the idea. Please comment. > Blocks can be committed with wrong size > --- > > Key: HDFS-8498 > URL: https://issues.apache.org/jira/browse/HDFS-8498 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.5.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Critical > Attachments: HDFS-8498.000.patch > > > When an IBR for a UC block arrives, the NN updates the expected location's > block and replica state _only_ if it's on an unexpected storage for an > expected DN. If it's for an expected storage, only the genstamp is updated. > When the block is committed, and the expected locations are verified, only > the genstamp is checked. The size is not checked but it wasn't updated in > the expected locations anyway. > A faulty client may misreport the size when committing the block. The block > is effectively corrupted. If the NN issues replications, the received IBR is > considered corrupt, the NN invalidates the block, immediately issues another > replication. The NN eventually realizes all the original replicas are > corrupt after full BRs are received from the original DNs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
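The thread-safe, Block-similar structure floated in the HDFS-8498 comment above might look like the sketch below. The class and method names are hypothetical; this is not the attached patch, only an illustration of the idea that the writer path mutates a private synchronized holder instead of a shared ExtendedBlock.

```java
// Hypothetical sketch: a small, internally synchronized block holder used
// only inside the writer path (DFSOutputStream/DataStreamer). The commit
// path takes an atomic snapshot, avoiding the race where the streamer
// updates the byte count while the client commits the block.
public class SyncedBlock {
  private final long blockId;
  private long genStamp;
  private long numBytes;

  public SyncedBlock(long blockId, long genStamp, long numBytes) {
    this.blockId = blockId;
    this.genStamp = genStamp;
    this.numBytes = numBytes;
  }

  public synchronized void update(long genStamp, long numBytes) {
    this.genStamp = genStamp;
    this.numBytes = numBytes;
  }

  /** Atomically snapshots (id, genstamp, bytes) so commit sees one consistent view. */
  public synchronized long[] snapshot() {
    return new long[] { blockId, genStamp, numBytes };
  }
}
```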
[jira] [Commented] (HDFS-4025) QJM: Synchronize past log segments to JNs that missed them
[ https://issues.apache.org/jira/browse/HDFS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819779#comment-15819779 ] Jing Zhao commented on HDFS-4025: - Thanks for updating the patch, [~hkoneru]. Some further comments:
# We do not need to move getAddressesList() to DatanodeUtil.
# getAddressList() and getOtherJournalNodeAddrs can be combined into one util method: getLoggerAddresses(URI uri, Set toExclude).
# Need to clean up the unused imports and unused variables in JournalNodeSyncer.java.
# sync_journals_timeout should not be retrieved from a newly created configuration in a static code block. It should be initialized based on the configuration passed to the JournalNodeSyncer constructor.
# We need to make sure syncJournalDaemon is always running while the JN is alive. So syncJournals should be in a try-catch block which catches Throwables. Please see BlockManager.RedundancyMonitor#run as an example.
# Need to stop the syncers when stopping the JN.
# The temp log segment files should always be downloaded into the current directory. Thus downloadEditLogFromJournalHttpServer can be further simplified.
# The current code may hit a race during the rolling-upgrade rollback. If the rollback happens, some log segments may be deleted while a syncer may download them from a remote JN which gets delayed in the rollback. Thus renaming temp journal files needs to be protected by the Journal's monitor, and we need to make sure the segment's end txid is smaller than the current committedTxnId.
# We can consider adding a configuration flag to turn off this feature.
# We do not need to get the local log manifest for each syncing round. The local log segment manifest can be reused.
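The "keep the sync daemon alive" point above (wrap each sync round in a catch of Throwable, and stop the syncer cleanly on JN shutdown) is the pattern sketched here. The names and the stop flag are assumptions for illustration, not the patch's actual code.

```java
// Illustrative sketch: each sync round is wrapped in a catch of Throwable so
// an unexpected error never kills the thread, and a volatile flag lets the
// JournalNode stop the syncer on shutdown.
public class SyncLoop {
  private volatile boolean shouldStop = false;
  private final Runnable syncOnce;   // one journal sync round
  private final long intervalMs;

  public SyncLoop(Runnable syncOnce, long intervalMs) {
    this.syncOnce = syncOnce;
    this.intervalMs = intervalMs;
  }

  public Thread start() {
    Thread daemon = new Thread(() -> {
      while (!shouldStop) {
        try {
          syncOnce.run();
        } catch (Throwable t) {
          // Log and continue: the daemon must outlive individual failures.
        }
        try {
          Thread.sleep(intervalMs);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          return;
        }
      }
    });
    daemon.setDaemon(true);  // already a daemon; no extra setDaemon needed elsewhere
    daemon.start();
    return daemon;
  }

  public void stop() { shouldStop = true; }
}
```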
> QJM: Sychronize past log segments to JNs that missed them > - > > Key: HDFS-4025 > URL: https://issues.apache.org/jira/browse/HDFS-4025 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha >Affects Versions: QuorumJournalManager (HDFS-3077) >Reporter: Todd Lipcon >Assignee: Hanisha Koneru > Fix For: QuorumJournalManager (HDFS-3077) > > Attachments: HDFS-4025.000.patch, HDFS-4025.001.patch, > HDFS-4025.002.patch, HDFS-4025.003.patch, HDFS-4025.004.patch, > HDFS-4025.005.patch, HDFS-4025.006.patch > > > Currently, if a JournalManager crashes and misses some segment of logs, and > then comes back, it will be re-added as a valid part of the quorum on the > next log roll. However, it will not have a complete history of log segments > (i.e any individual JN may have gaps in its transaction history). This > mirrors the behavior of the NameNode when there are multiple local > directories specified. > However, it would be better if a background thread noticed these gaps and > "filled them in" by grabbing the segments from other JournalNodes. This > increases the resilience of the system when JournalNodes get reformatted or > otherwise lose their local disk.
[jira] [Updated] (HDFS-8498) Blocks can be committed with wrong size
[ https://issues.apache.org/jira/browse/HDFS-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8498: Target Version/s: 2.9.0, 3.0.0-alpha2 (was: 2.7.3)
[jira] [Reopened] (HDFS-8498) Blocks can be committed with wrong size
[ https://issues.apache.org/jira/browse/HDFS-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao reopened HDFS-8498: -
[jira] [Commented] (HDFS-8498) Blocks can be committed with wrong size
[ https://issues.apache.org/jira/browse/HDFS-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816262#comment-15816262 ] Jing Zhao commented on HDFS-8498: - We also saw the same scenario as described by [~vinayrpet]. Maybe we can reopen this jira and explore the solution proposed by [~vinayrpet].
[jira] [Updated] (HDFS-11273) Move TransferFsImage#doGetUrl function to a Util class
[ https://issues.apache.org/jira/browse/HDFS-11273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-11273: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0-alpha2 Status: Resolved (was: Patch Available) The latest patch looks good to me. The failed tests should be unrelated and they passed on my local machine. +1 I've committed the patch to trunk. Thanks for the contribution, [~hkoneru]! > Move TransferFsImage#doGetUrl function to a Util class > -- > > Key: HDFS-11273 > URL: https://issues.apache.org/jira/browse/HDFS-11273 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru > Fix For: 3.0.0-alpha2 > > Attachments: HDFS-11273.000.patch, HDFS-11273.001.patch, > HDFS-11273.002.patch, HDFS-11273.003.patch, HDFS-11273.004.patch > > > TransferFsImage#doGetUrl downloads files from the specified url and stores > them in the specified storage location. HDFS-4025 plans to synchronize the > log segments in JournalNodes. If a log segment is missing from a JN, the JN > downloads it from another JN which has the required log segment. We need > TransferFsImage#doGetUrl and TransferFsImage#receiveFile to accomplish this. > So we propose to move the said functions to a Utility class so as to be able > to use it for JournalNode syncing as well, without duplication of code.
[jira] [Commented] (HDFS-11273) Move TransferFsImage#doGetUrl function to a Util class
[ https://issues.apache.org/jira/browse/HDFS-11273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15799513#comment-15799513 ] Jing Zhao commented on HDFS-11273: -- Thanks for updating the patch, [~hkoneru]! The updated patch is almost there. I have several extra minor comments (sorry I did not mention them in my last review...):
# The new Util#setTimeout method may no longer only load the timeout value from "DFS_IMAGE_TRANSFER_TIMEOUT_KEY". Thus the code loading the timeout from the configuration can be left in TransferFsImage#doGetUrl.
{code}
+  /**
+   * Sets a timeout value in milliseconds for the Http connection.
+   * @param connection the Http connection for which timeout needs to be set
+   * @param timeout value to be set as timeout in milliseconds
+   */
+  public static void setTimeout(HttpURLConnection connection, int timeout) {
+    if (timeout <= 0) {
+      Configuration conf = new HdfsConfiguration();
+      timeout = conf.getInt(
+          DFSConfigKeys.DFS_IMAGE_TRANSFER_TIMEOUT_KEY,
+          DFSConfigKeys.DFS_IMAGE_TRANSFER_TIMEOUT_DEFAULT);
+      LOG.info("Image Transfer timeout configured to " + timeout +
+          " milliseconds");
+    }
+
+    if (timeout > 0) {
+      connection.setConnectTimeout(timeout);
+      connection.setReadTimeout(timeout);
+    }
+  }
{code}
# HttpGetFailedException can be defined as an upper-level class and be moved to the o.a.h.hdfs.server.common package.
# The following code can be reformatted.
{code}
+  public static MD5Hash doGetUrl(URL url, List localPaths,
+      Storage dstStorage, boolean getChecksum, URLConnectionFactory
+      connectionFactory, int ioFileBufferSize, boolean isSpnegoEnabled, int
+      timeout) throws IOException {
{code}
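The Util#setTimeout refactor suggested above (apply only a caller-supplied timeout; leave configuration loading at the call site in TransferFsImage#doGetUrl) would reduce the helper to roughly the sketch below. The class name is an assumption; this is not the committed code.

```java
import java.net.HttpURLConnection;

// Illustrative sketch of the suggested refactor: the helper only applies a
// caller-supplied timeout. Loading the value from the configuration stays
// at the call site, so the helper no longer depends on DFSConfigKeys.
public final class TimeoutUtil {
  private TimeoutUtil() {}

  /** Sets connect/read timeouts (milliseconds) when timeout is positive. */
  public static void setTimeout(HttpURLConnection connection, int timeout) {
    if (timeout > 0) {
      connection.setConnectTimeout(timeout);
      connection.setReadTimeout(timeout);
    }
  }
}
```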
[jira] [Commented] (HDFS-11273) Move TransferFsImage#doGetUrl function to a Util class
[ https://issues.apache.org/jira/browse/HDFS-11273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796829#comment-15796829 ] Jing Zhao commented on HDFS-11273: -- Thanks for the patch, [~hkoneru]! The patch looks good to me. Just two nits:
# We can use this chance to clean up the imports of TransferFsImage and Util.
# In Util.java, CONTENT_LENGTH, MD5_HEADER, and deleteTmpFiles do not need to be public.
Besides, we can add a little more detail in the description to explain why moving the code is necessary.
> Move TransferFsImage#doGetUrl function to a Util class > -- > > Key: HDFS-11273 > URL: https://issues.apache.org/jira/browse/HDFS-11273 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru > Attachments: HDFS-11273.000.patch > > > TransferFsImage#doGetUrl function is required for JournalNode syncing as > well. We can move the code to a Utility class to avoid duplication of code.
[jira] [Updated] (HDFS-11273) Move TransferFsImage#doGetUrl function to a Util class
[ https://issues.apache.org/jira/browse/HDFS-11273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-11273: - Issue Type: Improvement (was: Bug)
[jira] [Updated] (HDFS-11273) Move TransferFsImage#doGetUrl function to a Util class
[ https://issues.apache.org/jira/browse/HDFS-11273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-11273: - Component/s: (was: hdfs)
[jira] [Commented] (HDFS-4025) QJM: Synchronize past log segments to JNs that missed them
[ https://issues.apache.org/jira/browse/HDFS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655302#comment-15655302 ] Jing Zhao commented on HDFS-4025: - Thanks for the patch, [~hanishakoneru]! The patch looks good to me in general. Please see comments below:
# In JournalNodeSyncer#startSyncJournalsThread, the following sleep may be unnecessary: in most cases the journal is formatted before we start the sync thread.
{code}
try {
  // Wait for the JournalNodes to get formatted before attempting sync
  Thread.sleep(SYNC_JOURNALS_TIMEOUT / 2);
} catch (InterruptedException e) {
  LOG.error(e);
}
{code}
# The syncJournalThread should be a daemon. Also we can add a flag to control when the thread should exit the while loop.
# {{getAllJournalNodeAddrs}} shares the same functionality with {{QuorumJournalManager#getLoggerAddresses}}. We can convert it into a utility function and use it in these two places.
# Since currently we do not support changing the JournalNode configuration while the JN is running, we can initialize all the other JN proxies at the very beginning. Then later we can randomly pick a proxy instead of an InetSocketAddress.
# We usually only deploy 3 or 5 JNs in practice, thus we may also choose a round-robin way to pick the sync target. Also, if an error/exception happens during the sync, we can wait till the next run (instead of retrying another JN immediately).
# Typo: getMisingLogList --> getMissingLogList
# {{getMisingLogList}} can use a merge-sort style to compare the two lists.
# Let's see if we can avoid copying code from {{TransferFsImage}} and instead reuse its methods.
# We need to make sure we eventually purge old tmp editlog files left behind by failures during the downloading/renaming.
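The merge-sort-style comparison suggested above can be sketched as a single linear pass over two sorted manifests. For brevity this sketch compares bare start txids; real log segments carry start/end txids, so this is an illustration of the technique, not the patch's code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: both manifests are sorted ascending by start txid, so
// one merge-style pass finds every remote segment the local JN is missing.
public final class MissingSegments {
  private MissingSegments() {}

  public static List<Long> missing(List<Long> local, List<Long> remote) {
    List<Long> result = new ArrayList<>();
    int i = 0, j = 0;
    while (j < remote.size()) {
      if (i >= local.size() || local.get(i) > remote.get(j)) {
        result.add(remote.get(j));  // remote has a segment local lacks
        j++;
      } else if (local.get(i) < remote.get(j)) {
        i++;                        // local-only segment; nothing to fetch
      } else {
        i++; j++;                   // present on both sides
      }
    }
    return result;                  // ascending, so older segments download first
  }
}
```

Because the result stays sorted, the syncer naturally downloads older segments first, which also keeps a lastSyncedTxId-style cursor consistent.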
[jira] [Commented] (HDFS-11095) BlockManagerSafeMode should respect extension period default config value (30s)
[ https://issues.apache.org/jira/browse/HDFS-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630482#comment-15630482 ] Jing Zhao commented on HDFS-11095: -- +1 pending Jenkins. > BlockManagerSafeMode should respect extension period default config value > (30s) > --- > > Key: HDFS-11095 > URL: https://issues.apache.org/jira/browse/HDFS-11095 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-11095.000.patch, HDFS-11095.001.patch > > > {code:title=BlockManagerSafeMode.java} > this.extension = conf.getInt(DFS_NAMENODE_SAFEMODE_EXTENSION_KEY, 0); > {code} > Though the default value (30s) is loaded from {{hdfs-default.xml}}, we should > also respect this in the code by using > {{DFSConfigKeys#DFS_NAMENODE_SAFEMODE_EXTENSION_DEFAULT}}.
[jira] [Commented] (HDFS-11090) Leave safemode immediately if all blocks have reported in
[ https://issues.apache.org/jira/browse/HDFS-11090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629948#comment-15629948 ] Jing Zhao commented on HDFS-11090: -- If the 100% block threshold has been met, this means all blocks have achieved the minimum replication requirement (usually 1 replica). It is therefore still possible that the NN has not received some full block reports (FBRs). Having the safemode extension can still avoid unnecessary replication work. But in the meanwhile, the number of pending FBRs in the above scenario should be limited, considering we're using random replication. Also, we already have extra logic to initialize the replication queues earlier ({{initializeReplQueuesIfNecessary}}). My main concern about the approach in the current patch is whether it is that useful in practice. For a large cluster, it is not rare to have a few missing blocks, or at least we have to wait a long time to reach 100% of blocks safe, so people usually set the safemode threshold to <1. For a small cluster, we can directly set the safemode extension to 0 in the configuration. So do we want to add an extra check to the safemode code, which is already very complicated? > Leave safemode immediately if all blocks have reported in > - > > Key: HDFS-11090 > URL: https://issues.apache.org/jira/browse/HDFS-11090 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.3 >Reporter: Andrew Wang >Assignee: Yiqun Lin > Attachments: HDFS-11090.001.patch > > > Startup safemode is triggered by two thresholds: % blocks reported in, and > min # datanodes. It's extended by an interval (default 30s) until these two > thresholds are met. > Safemode extension is helpful when the cluster has data, and the default % > blocks threshold (0.99) is used. It gives DNs a little extra time to report > in and thus avoid unnecessary replication work. > However, we can leave startup safemode early if 100% of blocks have reported > in.
> Note that operators sometimes change the % blocks threshold to > 1 to never > automatically leave safemode. We should maintain this behavior.
[jira] [Updated] (HDFS-10827) When there are unrecoverable ec block groups, Namenode Web UI shows "There are X missing blocks." but doesn't show the block names.
[ https://issues.apache.org/jira/browse/HDFS-10827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-10827: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0-alpha2 Status: Resolved (was: Patch Available) +1 on the latest patch. I've committed it into trunk. Thanks for the contribution, [~tasanuma0829]! > When there are unrecoverable ec block groups, Namenode Web UI shows "There > are X missing blocks." but doesn't show the block names. > --- > > Key: HDFS-10827 > URL: https://issues.apache.org/jira/browse/HDFS-10827 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma > Fix For: 3.0.0-alpha2 > > Attachments: HDFS-10827.1.patch, HDFS-10827.2.patch, > HDFS-10827.3.patch, HDFS-10827.4.patch, case_2.png, case_3.png > > > For RS-6-3, when there is one ec block group and > 1) 0~3 out of 9 internal blocks are missing, NN Web UI doesn't show any warns. > 2) 4~8 out of 9 internal blocks are missing, NN Web UI shows "There are 1 > missing blocks." but doesn't show the block names. (please see case_2.png) > 3) 9 out of 9 internal blocks are missing, NN Web UI shows "There are 1 > missing blocks." and also shows the block name. (please see case_3.png) > We should fix the case 2). I think NN Web UI should show the block names > since the ec block group is unrecoverable. > The values come from JMX. "There are X missing blocks." is > {{NumberOfMissingBlocks}} and the block names are {{CorruptFiles}}.
[jira] [Commented] (HDFS-10827) When there are unrecoverable ec block groups, Namenode Web UI shows "There are X missing blocks." but doesn't show the block names.
[ https://issues.apache.org/jira/browse/HDFS-10827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569512#comment-15569512 ] Jing Zhao commented on HDFS-10827: -- [~tasanuma0829], I think you're right. We actually do not need to add that extra check there.
[jira] [Updated] (HDFS-10968) BlockManager#isInNewRack should consider decommissioning nodes
[ https://issues.apache.org/jira/browse/HDFS-10968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-10968: - Resolution: Fixed Fix Version/s: 3.0.0-alpha2 Status: Resolved (was: Patch Available) Thanks for the review, Nicholas! I've committed this into trunk. > BlockManager#isInNewRack should consider decommissioning nodes > -- > > Key: HDFS-10968 > URL: https://issues.apache.org/jira/browse/HDFS-10968 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Jing Zhao >Assignee: Jing Zhao > Fix For: 3.0.0-alpha2 > > Attachments: HDFS-10968.000.patch > > > For an EC block, it is possible we have enough internal blocks but without > enough racks. The current reconstruction code calls > {{BlockManager#isInNewRack}} to check if the target node can increase the > total rack number for the case, which compares the target node's rack with > source node racks: > {code} > for (DatanodeDescriptor src : srcs) { > if (src.getNetworkLocation().equals(target.getNetworkLocation())) { > return false; > } > } > {code} > However here the {{srcs}} may include a decommissioning node, in which case > we should allow the target node to be in the same rack with it. > For e.g., suppose we have 11 nodes: h1 ~ h11, which are located in racks r1, > r1, r2, r2, r3, r3, r4, r4, r5, r5, r6, respectively. In case that an EC > block has 9 live internal blocks on (h1~h8 + h11), and one internal block on > h9 which is to be decommissioned. The current code will not choose h10 for > reconstruction because isInNewRack thinks h10 is on the same rack with h9.
[jira] [Commented] (HDFS-10759) Change fsimage bool isStriped from boolean to an enum
[ https://issues.apache.org/jira/browse/HDFS-10759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556534#comment-15556534 ] Jing Zhao commented on HDFS-10759: -- Yeah I also think the idea is good. But we need to guarantee the compatibility: the old fsimage should still be supported and new enum types should be easily added (which means we may need to add UNKNOWN_TYPE in the enum according to the link). > Change fsimage bool isStriped from boolean to an enum > - > > Key: HDFS-10759 > URL: https://issues.apache.org/jira/browse/HDFS-10759 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-alpha1, 3.0.0-beta1, 3.0.0-alpha2 >Reporter: Ewan Higgs > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-10759.0001.patch > > > The new erasure coding project has updated the protocol for fsimage such that > the {{INodeFile}} has a boolean '{{isStriped}}'. I think this is better as an > enum or integer since a boolean precludes any future block types. > For example: > {code} > enum BlockType { > CONTIGUOUS = 0, > STRIPED = 1, > } > {code} > We can also make this more robust to future changes where there are different > block types supported in a staged rollout. Here, we would use > {{UNKNOWN_BLOCK_TYPE}} as the first value since this is the default value. > See > [here|http://androiddevblog.com/protocol-buffers-pitfall-adding-enum-values/] > for more discussion. > {code} > enum BlockType { > UNKNOWN_BLOCK_TYPE = 0, > CONTIGUOUS = 1, > STRIPED = 2, > } > {code} > But I'm not convinced this is necessary since there are other enums that > don't use this approach. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
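The compatibility concern in the comment above can be illustrated with a hedged sketch (names are hypothetical, not the actual fsimage loader): keeping an UNKNOWN value at ordinal 0 means an unrecognized or absent value degrades to a safe default, and the legacy boolean {{isStriped}} maps cleanly onto the enum when an old fsimage is loaded.

```java
// Illustrative only: models the proposed BlockType enum with UNKNOWN first,
// mirroring the protobuf pitfall that value 0 is the implicit default.
public class BlockTypeCompat {
    enum BlockType { UNKNOWN_BLOCK_TYPE, CONTIGUOUS, STRIPED }

    // Loading an old fsimage: the legacy boolean maps onto the enum, so old
    // images remain supported.
    static BlockType fromLegacyIsStriped(boolean isStriped) {
        return isStriped ? BlockType.STRIPED : BlockType.CONTIGUOUS;
    }

    // A reader seeing an ordinal it does not know (e.g. written by a newer
    // release during a staged rollout) falls back to UNKNOWN_BLOCK_TYPE
    // rather than misclassifying the block.
    static BlockType fromOrdinal(int ordinal) {
        BlockType[] values = BlockType.values();
        return (ordinal >= 0 && ordinal < values.length)
            ? values[ordinal] : BlockType.UNKNOWN_BLOCK_TYPE;
    }

    public static void main(String[] args) {
        System.out.println(fromLegacyIsStriped(true)); // STRIPED
        System.out.println(fromOrdinal(7));            // UNKNOWN_BLOCK_TYPE
    }
}
```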
[jira] [Commented] (HDFS-10827) When there are unrecoverable ec block groups, Namenode Web UI shows "There are X missing blocks." but doesn't show the block names.
[ https://issues.apache.org/jira/browse/HDFS-10827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1949#comment-1949 ] Jing Zhao commented on HDFS-10827: -- Thanks for working on this, [~tasanuma0829]! The patch looks good to me overall. Some minors: # I think we can rename "isCorrupt" to "isMissing". In the outer "while" loop, we're scanning all the blocks with {{QUEUE_WITH_CORRUPT_BLOCKS}} priority, and this "if" logic is to select missing blocks. # We may also want to take into account the decommissioning/decommissioned internal blocks for the check. # As follow-on work, maybe we can create a utility function to check if a block is corrupted/missing, considering this is widely used in BlockManager/FSNamesystem. # For the unit test, do you think we can use {{MiniDFSCluster#corruptReplica}} to simplify the code? > When there are unrecoverable ec block groups, Namenode Web UI shows "There > are X missing blocks." but doesn't show the block names. > --- > > Key: HDFS-10827 > URL: https://issues.apache.org/jira/browse/HDFS-10827 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma > Attachments: HDFS-10827.1.patch, HDFS-10827.2.patch, case_2.png, > case_3.png > > > For RS-6-3, when there is one ec block group and > 1) 0~3 out of 9 internal blocks are missing, NN Web UI doesn't show any warns. > 2) 4~8 out of 9 internal blocks are missing, NN Web UI shows "There are 1 > missing blocks." but doesn't show the block names. (please see case_2.png) > 3) 9 out of 9 internal blocks are missing, NN Web UI shows "There are 1 > missing blocks." and also shows the block name. (please see case_3.png) > We should fix the case 2). I think NN Web UI should show the block names > since the ec block group is unrecoverable. > The values come from JMX. "There are X missing blocks." is > {{NumberOfMissingBlocks}} and the block names are {{CorruptFiles}}. 
[jira] [Updated] (HDFS-10968) BlockManager#isNewRack should consider decommissioning nodes
[ https://issues.apache.org/jira/browse/HDFS-10968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-10968: - Status: Patch Available (was: Open) > BlockManager#isNewRack should consider decommissioning nodes > > > Key: HDFS-10968 > URL: https://issues.apache.org/jira/browse/HDFS-10968 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-10968.000.patch > > > For an EC block, it is possible we have enough internal blocks but without > enough racks. The current reconstruction code calls > {{BlockManager#isNewRack}} to check if the target node can increase the total > rack number for the case, which compares the target node's rack with source > node racks: > {code} > for (DatanodeDescriptor src : srcs) { > if (src.getNetworkLocation().equals(target.getNetworkLocation())) { > return false; > } > } > {code} > However here the {{srcs}} may include a decommissioning node, in which case > we should allow the target node to be in the same rack with it. > For e.g., suppose we have 11 nodes: h1 ~ h11, which are located in racks r1, > r1, r2, r2, r3, r3, r4, r4, r5, r5, r6, respectively. In case that an EC > block has 9 live internal blocks on (h1~h8 + h11), and one internal block on > h9 which is to be decommissioned. The current code will not choose h10 for > reconstruction because isNewRack thinks h10 is on the same rack with h9. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10968) BlockManager#isNewRack should consider decommissioning nodes
[ https://issues.apache.org/jira/browse/HDFS-10968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-10968: - Attachment: HDFS-10968.000.patch Upload a patch to fix the issue. The patch also includes a unit test that reproduces the example in the description. > BlockManager#isNewRack should consider decommissioning nodes > > > Key: HDFS-10968 > URL: https://issues.apache.org/jira/browse/HDFS-10968 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-10968.000.patch > > > For an EC block, it is possible we have enough internal blocks but without > enough racks. The current reconstruction code calls > {{BlockManager#isNewRack}} to check if the target node can increase the total > rack number for the case, which compares the target node's rack with source > node racks: > {code} > for (DatanodeDescriptor src : srcs) { > if (src.getNetworkLocation().equals(target.getNetworkLocation())) { > return false; > } > } > {code} > However here the {{srcs}} may include a decommissioning node, in which case > we should allow the target node to be in the same rack with it. > For e.g., suppose we have 11 nodes: h1 ~ h11, which are located in racks r1, > r1, r2, r2, r3, r3, r4, r4, r5, r5, r6, respectively. In case that an EC > block has 9 live internal blocks on (h1~h8 + h11), and one internal block on > h9 which is to be decommissioned. The current code will not choose h10 for > reconstruction because isNewRack thinks h10 is on the same rack with h9. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-10968) BlockManager#isNewRack should consider decommissioning nodes
Jing Zhao created HDFS-10968: Summary: BlockManager#isNewRack should consider decommissioning nodes Key: HDFS-10968 URL: https://issues.apache.org/jira/browse/HDFS-10968 Project: Hadoop HDFS Issue Type: Sub-task Components: erasure-coding, namenode Affects Versions: 3.0.0-alpha1 Reporter: Jing Zhao Assignee: Jing Zhao For an EC block, it is possible we have enough internal blocks but without enough racks. The current reconstruction code calls {{BlockManager#isNewRack}} to check if the target node can increase the total rack number for the case, which compares the target node's rack with source node racks: {code} for (DatanodeDescriptor src : srcs) { if (src.getNetworkLocation().equals(target.getNetworkLocation())) { return false; } } {code} However here the {{srcs}} may include a decommissioning node, in which case we should allow the target node to be in the same rack with it. For e.g., suppose we have 11 nodes: h1 ~ h11, which are located in racks r1, r1, r2, r2, r3, r3, r4, r4, r5, r5, r6, respectively. In case that an EC block has 9 live internal blocks on (h1~h8 + h11), and one internal block on h9 which is to be decommissioned. The current code will not choose h10 for reconstruction because isNewRack thinks h10 is on the same rack with h9. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10826) Correctly report missing EC blocks in FSCK
[ https://issues.apache.org/jira/browse/HDFS-10826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-10826: - Summary: Correctly report missing EC blocks in FSCK (was: The result of fsck should be CRITICAL when there are unrecoverable ec block groups.) > Correctly report missing EC blocks in FSCK > -- > > Key: HDFS-10826 > URL: https://issues.apache.org/jira/browse/HDFS-10826 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma > Attachments: HDFS-10826.2.patch, HDFS-10826.3.patch, > HDFS-10826.4.patch, HDFS-10826.5.patch, HDFS-10826.WIP.1.patch > > > For RS-6-3, when there is one ec block group and > 1) 0~3 out of 9 internal blocks are missing, the result of fsck is HEALTY. > 2) 4~8 out of 9 internal blocks are missing, the result of fsck is HEALTY. > {noformat} > Erasure Coded Block Groups: > Total size:536870912 B > Total files: 1 > Total block groups (validated):1 (avg. block group size 536870912 B) > > UNRECOVERABLE BLOCK GROUPS: 1 (100.0 %) > > Minimally erasure-coded block groups: 0 (0.0 %) > Over-erasure-coded block groups: 0 (0.0 %) > Under-erasure-coded block groups: 1 (100.0 %) > Unsatisfactory placement block groups: 0 (0.0 %) > Default ecPolicy: RS-DEFAULT-6-3-64k > Average block group size: 5.0 > Missing block groups: 0 > Corrupt block groups: 0 > Missing internal blocks: 4 (44.43 %) > FSCK ended at Wed Aug 31 13:42:05 JST 2016 in 4 milliseconds > The filesystem under path '/' is HEALTHY > {noformat} > 3) 9 out of 9 internal blocks are missing, the result of fsck is CRITICAL. > (Because it is regarded as a missing block group.) > In case 2), the result should be CRITICAL since the ec block group is > unrecoverable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10826) The result of fsck should be CRITICAL when there are unrecoverable ec block groups.
[ https://issues.apache.org/jira/browse/HDFS-10826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15549491#comment-15549491 ] Jing Zhao commented on HDFS-10826: -- I've committed the patch into trunk. Thanks for the contribution, [~tasanuma0829]. Thanks for the review, [~ajisakaa] and [~jojochuang]. > The result of fsck should be CRITICAL when there are unrecoverable ec block > groups. > --- > > Key: HDFS-10826 > URL: https://issues.apache.org/jira/browse/HDFS-10826 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma > Attachments: HDFS-10826.2.patch, HDFS-10826.3.patch, > HDFS-10826.4.patch, HDFS-10826.5.patch, HDFS-10826.WIP.1.patch > > > For RS-6-3, when there is one ec block group and > 1) 0~3 out of 9 internal blocks are missing, the result of fsck is HEALTY. > 2) 4~8 out of 9 internal blocks are missing, the result of fsck is HEALTY. > {noformat} > Erasure Coded Block Groups: > Total size:536870912 B > Total files: 1 > Total block groups (validated):1 (avg. block group size 536870912 B) > > UNRECOVERABLE BLOCK GROUPS: 1 (100.0 %) > > Minimally erasure-coded block groups: 0 (0.0 %) > Over-erasure-coded block groups: 0 (0.0 %) > Under-erasure-coded block groups: 1 (100.0 %) > Unsatisfactory placement block groups: 0 (0.0 %) > Default ecPolicy: RS-DEFAULT-6-3-64k > Average block group size: 5.0 > Missing block groups: 0 > Corrupt block groups: 0 > Missing internal blocks: 4 (44.43 %) > FSCK ended at Wed Aug 31 13:42:05 JST 2016 in 4 milliseconds > The filesystem under path '/' is HEALTHY > {noformat} > 3) 9 out of 9 internal blocks are missing, the result of fsck is CRITICAL. > (Because it is regarded as a missing block group.) > In case 2), the result should be CRITICAL since the ec block group is > unrecoverable. 
[jira] [Commented] (HDFS-10826) The result of fsck should be CRITICAL when there are unrecoverable ec block groups.
[ https://issues.apache.org/jira/browse/HDFS-10826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546996#comment-15546996 ] Jing Zhao commented on HDFS-10826: -- The latest patch looks good to me. To do further code refactoring in HDFS-10933 also sounds good to me. Do you have further comments, [~jojochuang]? > The result of fsck should be CRITICAL when there are unrecoverable ec block > groups. > --- > > Key: HDFS-10826 > URL: https://issues.apache.org/jira/browse/HDFS-10826 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma > Attachments: HDFS-10826.2.patch, HDFS-10826.3.patch, > HDFS-10826.4.patch, HDFS-10826.5.patch, HDFS-10826.WIP.1.patch > > > For RS-6-3, when there is one ec block group and > 1) 0~3 out of 9 internal blocks are missing, the result of fsck is HEALTY. > 2) 4~8 out of 9 internal blocks are missing, the result of fsck is HEALTY. > {noformat} > Erasure Coded Block Groups: > Total size:536870912 B > Total files: 1 > Total block groups (validated):1 (avg. block group size 536870912 B) > > UNRECOVERABLE BLOCK GROUPS: 1 (100.0 %) > > Minimally erasure-coded block groups: 0 (0.0 %) > Over-erasure-coded block groups: 0 (0.0 %) > Under-erasure-coded block groups: 1 (100.0 %) > Unsatisfactory placement block groups: 0 (0.0 %) > Default ecPolicy: RS-DEFAULT-6-3-64k > Average block group size: 5.0 > Missing block groups: 0 > Corrupt block groups: 0 > Missing internal blocks: 4 (44.43 %) > FSCK ended at Wed Aug 31 13:42:05 JST 2016 in 4 milliseconds > The filesystem under path '/' is HEALTHY > {noformat} > 3) 9 out of 9 internal blocks are missing, the result of fsck is CRITICAL. > (Because it is regarded as a missing block group.) > In case 2), the result should be CRITICAL since the ec block group is > unrecoverable. 
[jira] [Commented] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice
[ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15533611#comment-15533611 ] Jing Zhao commented on HDFS-10797: -- Actually my proposal is like your .005 patch. The current semantic and approach looks good to me overall. > Disk usage summary of snapshots causes renamed blocks to get counted twice > -- > > Key: HDFS-10797 > URL: https://issues.apache.org/jira/browse/HDFS-10797 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sean Mackrory >Assignee: Sean Mackrory > Attachments: HDFS-10797.001.patch, HDFS-10797.002.patch, > HDFS-10797.003.patch, HDFS-10797.004.patch, HDFS-10797.005.patch > > > DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how > much disk usage is used by a snapshot by tallying up the files in the > snapshot that have since been deleted (that way it won't overlap with regular > files whose disk usage is computed separately). However that is determined > from a diff that shows moved (to Trash or otherwise) or renamed files as a > deletion and a creation operation that may overlap with the list of blocks. > Only the deletion operation is taken into consideration, and this causes > those blocks to get represented twice in the disk usage tallying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice
[ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15531432#comment-15531432 ] Jing Zhao commented on HDFS-10797: -- [~mackrorysd], I agree it will be great to have a consistent and user-friendly semantic. To me a better semantic can be like this: if the renamed source (which is inside of some snapshot) and the renamed target are both under the same directory for counting, we count them once. Otherwise they will be counted separately. With this semantic maybe we only need to move your hashset to the context object passed from the beginning of the counting call, and use it to avoid duplicated counting. What do you think? > Disk usage summary of snapshots causes renamed blocks to get counted twice > -- > > Key: HDFS-10797 > URL: https://issues.apache.org/jira/browse/HDFS-10797 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sean Mackrory >Assignee: Sean Mackrory > Attachments: HDFS-10797.001.patch, HDFS-10797.002.patch, > HDFS-10797.003.patch > > > DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how > much disk usage is used by a snapshot by tallying up the files in the > snapshot that have since been deleted (that way it won't overlap with regular > files whose disk usage is computed separately). However that is determined > from a diff that shows moved (to Trash or otherwise) or renamed files as a > deletion and a creation operation that may overlap with the list of blocks. > Only the deletion operation is taken into consideration, and this causes > those blocks to get represented twice in the disk usage tallying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
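The proposed semantic can be sketched like this (a toy model with hypothetical names, not the DirectoryWithSnapshotFeature code): a set of already-counted block IDs travels in a context object from the beginning of the counting call, so a block reachable both via a snapshot diff's deleted entry and via its renamed location under the same directory is counted exactly once.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of duplicate-free disk-usage counting; class and field names are
// hypothetical stand-ins for the real ContentSummary computation.
public class SnapshotDuSketch {
    // Context object passed from the root of the counting call, as suggested
    // in the comment above.
    static class CountingContext {
        final Set<Long> countedBlocks = new HashSet<>();
        long totalBytes = 0;
    }

    // Count a block only if it has not already been seen in this traversal
    // (e.g. once via the snapshot's deleted list, once via the renamed file).
    static void countBlock(CountingContext ctx, long blockId, long numBytes) {
        if (ctx.countedBlocks.add(blockId)) {
            ctx.totalBytes += numBytes;
        }
    }

    public static void main(String[] args) {
        CountingContext ctx = new CountingContext();
        countBlock(ctx, 1001L, 128L); // reached via the snapshot diff (deletion)
        countBlock(ctx, 1001L, 128L); // reached again via the renamed target
        countBlock(ctx, 1002L, 64L);
        System.out.println(ctx.totalBytes); // 192, not 320
    }
}
```

If the rename target lies outside the directory being summarized, its traversal uses a different context, so the two counts remain separate, matching the "otherwise they will be counted separately" semantic.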
[jira] [Commented] (HDFS-10897) Ozone: SCM: Add NodeManager
[ https://issues.apache.org/jira/browse/HDFS-10897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15531259#comment-15531259 ] Jing Zhao commented on HDFS-10897: -- Thanks for the reply, Anu. bq. The reason of breaking up these data structures into 3 separate maps is to reduce the single lock contention we seem to run into in the current HDFS. I think we can still avoid lock contention with a single map. Also most stale nodes are temporary and dead nodes may be directly removed. So it may not be very helpful to have separate maps for them. bq. Just want to make sure that we are both on the same page on this one. In the current scheme, we get a heartbeat and insert it into a queue – with no time stamp. Here my concern is that we may need at least two threads for the work done by the current worker. Dead node detection work may need to be separate out and done by another thread (as today's HeartbeatMonitor) considering there may be a lot of following work after a dead node is detected (e.g., triggering re-replication of containers etc.). Putting all the work, including handling heartbeat msgs and scanning all the healthy/stale nodes, into a single loop may finally lead to limit throughput for handling heartbeats. I think currently most of my concerns have been or can be addressed your future patches. So I'm +1 on the current patch and we can continue the discussion. > Ozone: SCM: Add NodeManager > --- > > Key: HDFS-10897 > URL: https://issues.apache.org/jira/browse/HDFS-10897 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-10897-HDFS-7240.001.patch, > HDFS-10897-HDFS-7240.002.patch, HDFS-10897-HDFS-7240.003.patch > > > Add a nodeManager class that will be used by Storage Controller Manager > eventually. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10912) Ozone:SCM: Add safe mode support to NodeManager.
[ https://issues.apache.org/jira/browse/HDFS-10912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15530843#comment-15530843 ] Jing Zhao commented on HDFS-10912: -- Thanks for the work, [~anu]. For SCM, I think we may have two different types of "safemode": # the first one is to make sure the SCM receives enough DN registration. If we persist the container-node mapping in SCM, we do not need to wait for full container reports. Also SCM does not take the responsibility for maintaining the container states/durability, thus this type of safemode is very lightweight compared with the current NN safemode. (maybe we can rename it ...) # the second one is the manual safemode (triggered by {{forceEnterSafeMode}}). This safemode is actually against the whole SCM instead of its node manager (just like in today's HDFS the manual safemode is for the whole NN instead of the blockmanager). Therefore, to me {{forceExitSafeMode}}/{{forceEnterSafeMode}}/{{isInManualSafeMode}} can be moved to SCM level. {{forceExitSafeMode}} will reset both the manual safemode and the safemode in nodemanager. > Ozone:SCM: Add safe mode support to NodeManager. > > > Key: HDFS-10912 > URL: https://issues.apache.org/jira/browse/HDFS-10912 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-10912-HDFS-7240.001.patch > > > Add Safe mode support : That is add the ability to force exit or enter safe > mode. As well as get the current safe mode status from node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
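The separation suggested above might look like this (a hypothetical sketch; the real Ozone/SCM classes differ): the manual safemode flag lives at the SCM level and overrides the node manager's lightweight, registration-based safemode.

```java
// Hypothetical sketch of the two safemode layers discussed above; not the
// actual Ozone/SCM API.
public class ScmSafeModeSketch {
    // Layer 1: lightweight automatic safemode in the node manager, based only
    // on how many datanodes have registered so far (no full container reports).
    static class NodeManager {
        private final int requiredRegistrations;
        private int registered = 0;
        NodeManager(int requiredRegistrations) {
            this.requiredRegistrations = requiredRegistrations;
        }
        void register() { registered++; }
        boolean isInSafeMode() { return registered < requiredRegistrations; }
    }

    // Layer 2: manual safemode tracked at the SCM level, since it applies to
    // the whole service rather than just node management.
    static class Scm {
        final NodeManager nodeManager;
        private boolean manualSafeMode = false;
        Scm(NodeManager nm) { this.nodeManager = nm; }
        void forceEnterSafeMode() { manualSafeMode = true; }
        // Per the comment above, the real version would also reset the node
        // manager's automatic safemode here.
        void forceExitSafeMode() { manualSafeMode = false; }
        boolean isInSafeMode() {
            return manualSafeMode || nodeManager.isInSafeMode();
        }
    }

    public static void main(String[] args) {
        NodeManager nm = new NodeManager(2);
        Scm scm = new Scm(nm);
        nm.register();
        nm.register();                          // enough registrations
        System.out.println(scm.isInSafeMode()); // false
        scm.forceEnterSafeMode();               // operator override at SCM level
        System.out.println(scm.isInSafeMode()); // true
    }
}
```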
[jira] [Comment Edited] (HDFS-10897) Ozone: SCM: Add NodeManager
[ https://issues.apache.org/jira/browse/HDFS-10897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15530818#comment-15530818 ] Jing Zhao edited comment on HDFS-10897 at 9/28/16 8:43 PM: --- Thanks for working on this, [~anu]. The patch looks good to me overall. Some comments: # My main concern is about the current way tracking the heartbeat time for DataNodes. Instead of using 3 String-Long maps, I think it's better to use {{DatanodeInfo}} (or a simplified version) to store the latest heartbeat/report time. Later we still need to capture other information about DataNodes (its current load and state etc.) thus {{DatanodeInfo}} can be the central place to store all the information about a DN (just like today's HDFS). Also in this way we only need to maintain a single datanode map (which is more static compared with the current 3 maps) and most of the lock protection can be put into the DatanodeInfo level. # Also with this change we can have a more fair way for heartbeat time calculation: for every heartbeat msg, we can update the corresponding datanode's latest update time before putting the heartbeat into the queue, in order to avoid the penalty on DN due to SCM's local latency. # For Node state, we may want to follow the current HDFS, i.e., we need to have AdminStates which includes NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED, ENTERING_MAINTENANCE, and IN_MAINTENANCE. Stale/dead are calculated based on the latest heartbeat time thus maybe we do not need to define them as an explicit state (and for dead nodes we may want to directly remove it). {code} 36 * 4. A node can be in any of these 4 states: {HEALTHY, STALE, DEAD, 37 * DECOMMISSIONED} 38 * 39 * HEALTHY - It is a datanode that is regularly heartbeating us. 40 * 41 * STALE - A datanode for which we have missed few heart beats. 42 * 43 * DEAD - A datanode that we have not heard from for a while. 
44 * 45 * DECOMMISSIONED - Someone told us to remove this node from the tracking 46 * list, by calling removeNode. We will throw away this nodes info soon. {code} # {{getNodes}}/{{getNodeCount}} can be defined in a metrics interface (like today's FSNamesystemMBean). # Any reason we need a NodeManager interface? > Ozone: SCM: Add NodeManager > --- > > Key: HDFS-10897 > URL: https://issues.apache.org/jira/browse/HDFS-10897 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-10897-HDFS-7240.001.patch, > HDFS-10897-HDFS-7240.002.patch, HDFS-10897-HDFS-7240.003.patch > > > Add a nodeManager class that will
[jira] [Commented] (HDFS-10897) Ozone: SCM: Add NodeManager
[ https://issues.apache.org/jira/browse/HDFS-10897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15530818#comment-15530818 ] Jing Zhao commented on HDFS-10897: -- Thanks for working on this, [~anu]. The patch looks good to me overall. Some comments: # My main concern is about the current way tracking the heartbeat time for DataNodes. Instead of using 3 String-Long maps, I think it's better to use {{DatanodeInfo}} to store the latest heartbeat/report time. Later we still need to capture other information about DataNodes (its current load and state etc.) thus {{DatanodeInfo}} can be the central place to store all the information about a DN (just like today's HDFS). Also in this way we only need to maintain a single datanode map (which is more static compared with the current 3 maps) and most of the lock protection can be put into the DatanodeInfo level. # Also with this change we can have a more fair way for heartbeat time calculation: for every heartbeat msg, we can update the corresponding datanode's latest update time before putting the heartbeat into the queue, in order to avoid the penalty on DN due to SCM's local latency. # For Node state, we may want to follow the current HDFS, i.e., we need to have AdminStates which includes NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED, ENTERING_MAINTENANCE, and IN_MAINTENANCE. Stale/dead are calculated based on the latest heartbeat time thus maybe we do not need to define them as an explicit state (and for dead nodes we may want to directly remove it). {code} 36 * 4. A node can be in any of these 4 states: {HEALTHY, STALE, DEAD, 37 * DECOMMISSIONED} 38 * 39 * HEALTHY - It is a datanode that is regularly heartbeating us. 40 * 41 * STALE - A datanode for which we have missed few heart beats. 42 * 43 * DEAD - A datanode that we have not heard from for a while. 44 * 45 * DECOMMISSIONED - Someone told us to remove this node from the tracking 46 * list, by calling removeNode. 
We will throw away this nodes info soon. {code} # {{getNodes}}/{{getNodeCount}} can be defined in a metrics interface (like today's FSNamesystemMBean). # Any reason we need a NodeManager interface? > Ozone: SCM: Add NodeManager > --- > > Key: HDFS-10897 > URL: https://issues.apache.org/jira/browse/HDFS-10897 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-10897-HDFS-7240.001.patch, > HDFS-10897-HDFS-7240.002.patch, HDFS-10897-HDFS-7240.003.patch > > > Add a nodeManager class that will be used by Storage Controller Manager > eventually. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
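The single-map design proposed in the comments might be sketched like this (hypothetical names, not the actual patch): one map from datanode ID to a per-node record holding the last heartbeat timestamp, with HEALTHY/STALE/DEAD derived from that timestamp instead of being stored in three separate maps, and the timestamp recorded before the heartbeat is queued so SCM-local latency does not penalize the node.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the single-datanode-map idea from the review comments.
public class NodeStateSketch {
    enum HealthState { HEALTHY, STALE, DEAD }

    static class DatanodeInfo {
        volatile long lastHeartbeatMs; // stamped on arrival, before queueing
        DatanodeInfo(long now) { this.lastHeartbeatMs = now; }
    }

    // Thresholds are assumptions for illustration only.
    static final long STALE_MS = 90_000;
    static final long DEAD_MS = 600_000;

    final ConcurrentHashMap<String, DatanodeInfo> nodes = new ConcurrentHashMap<>();

    void onHeartbeat(String datanodeId, long nowMs) {
        // Timestamp first, then enqueue for processing: the node is not
        // penalized for latency in the manager's own worker loop.
        nodes.computeIfAbsent(datanodeId, id -> new DatanodeInfo(nowMs))
             .lastHeartbeatMs = nowMs;
    }

    // State is derived, not stored, so nodes never migrate between maps and
    // per-node updates need no global lock.
    HealthState stateOf(String datanodeId, long nowMs) {
        DatanodeInfo info = nodes.get(datanodeId);
        long age = nowMs - info.lastHeartbeatMs;
        if (age >= DEAD_MS) return HealthState.DEAD;
        if (age >= STALE_MS) return HealthState.STALE;
        return HealthState.HEALTHY;
    }

    public static void main(String[] args) {
        NodeStateSketch mgr = new NodeStateSketch();
        mgr.onHeartbeat("dn1", 0);
        System.out.println(mgr.stateOf("dn1", 10_000));  // HEALTHY
        System.out.println(mgr.stateOf("dn1", 100_000)); // STALE
        System.out.println(mgr.stateOf("dn1", 700_000)); // DEAD
    }
}
```

This also matches the point that most stale states are transient: a fresh heartbeat simply updates the timestamp and the derived state flips back to HEALTHY with no map shuffling.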
[jira] [Commented] (HDFS-10826) The result of fsck should be CRITICAL when there are unrecoverable ec block groups.
[ https://issues.apache.org/jira/browse/HDFS-10826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15530710#comment-15530710 ] Jing Zhao commented on HDFS-10826: -- # The failure of TestLeaseRecoveryStriped seems related. Could you please take a look, [~tasanuma0829]? # {{countNodes}} has already been called in {{createLocatedBlock}}. We can reuse the result. {code} 1071final boolean isCorrupt; 1072if (blk.isStriped()) { 1073 BlockInfoStriped sblk = (BlockInfoStriped) blk; 1074 isCorrupt = numCorruptReplicas != 0 && 1075 countNodes(blk).liveReplicas() < sblk.getRealDataBlockNum(); 1076} else { 1077 isCorrupt = numCorruptReplicas != 0 && numCorruptReplicas == numNodes; 1078} {code} > The result of fsck should be CRITICAL when there are unrecoverable ec block > groups. > --- > > Key: HDFS-10826 > URL: https://issues.apache.org/jira/browse/HDFS-10826 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma > Attachments: HDFS-10826.2.patch, HDFS-10826.3.patch, > HDFS-10826.WIP.1.patch > > > For RS-6-3, when there is one ec block group and > 1) 0~3 out of 9 internal blocks are missing, the result of fsck is HEALTY. > 2) 4~8 out of 9 internal blocks are missing, the result of fsck is HEALTY. > {noformat} > Erasure Coded Block Groups: > Total size:536870912 B > Total files: 1 > Total block groups (validated):1 (avg. 
block group size 536870912 B) > > UNRECOVERABLE BLOCK GROUPS: 1 (100.0 %) > > Minimally erasure-coded block groups: 0 (0.0 %) > Over-erasure-coded block groups: 0 (0.0 %) > Under-erasure-coded block groups: 1 (100.0 %) > Unsatisfactory placement block groups: 0 (0.0 %) > Default ecPolicy: RS-DEFAULT-6-3-64k > Average block group size: 5.0 > Missing block groups: 0 > Corrupt block groups: 0 > Missing internal blocks: 4 (44.43 %) > FSCK ended at Wed Aug 31 13:42:05 JST 2016 in 4 milliseconds > The filesystem under path '/' is HEALTHY > {noformat} > 3) 9 out of 9 internal blocks are missing, the result of fsck is CRITICAL. > (Because it is regarded as a missing block group.) > In case 2), the result should be CRITICAL since the ec block group is > unrecoverable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
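The review point about reusing {{countNodes}} can be illustrated as follows (simplified stand-ins, not BlockManager itself): the replica counts are computed once by the caller and passed into the missing-block check, rather than calling the relatively expensive count a second time inside it.

```java
// Simplified stand-in for the BlockManager logic quoted above; NumberReplicas
// and the block parameters are modeled with only what this sketch needs.
public class CorruptCheckSketch {
    static class NumberReplicas {
        final int live;
        final int corrupt;
        NumberReplicas(int live, int corrupt) {
            this.live = live;
            this.corrupt = corrupt;
        }
    }

    // The countNodes(blk) result is computed once in createLocatedBlock and
    // reused here, instead of being recomputed inside the check.
    static boolean isCorrupt(boolean striped, int realDataBlockNum,
                             int numNodes, NumberReplicas counts) {
        if (striped) {
            // A striped group is effectively missing when its live internal
            // blocks drop below the number of data blocks.
            return counts.corrupt != 0 && counts.live < realDataBlockNum;
        }
        // Contiguous block: corrupt only when every replica is corrupt.
        return counts.corrupt != 0 && counts.corrupt == numNodes;
    }

    public static void main(String[] args) {
        // RS-6-3 with 4 missing internal blocks, as in case 2) above.
        NumberReplicas counts = new NumberReplicas(5, 4);
        System.out.println(isCorrupt(true, 6, 9, counts)); // true
    }
}
```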