[jira] [Resolved] (HDFS-17181) WebHDFS not considering whether a DN is good when called from outside the cluster

2024-02-20 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-17181.

Fix Version/s: 3.5.0
   Resolution: Fixed

> WebHDFS not considering whether a DN is good when called from outside the 
> cluster
> -
>
> Key: HDFS-17181
> URL: https://issues.apache.org/jira/browse/HDFS-17181
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, webhdfs
>Affects Versions: 3.3.6
>Reporter: Lars Francke
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
> Attachments: Test_fix_for_HDFS-171811.patch
>
>
> When calling WebHDFS to create a file (I'm sure the same problem occurs for 
> other actions, e.g. OPEN, but I haven't checked all of them yet) it will 
> happily redirect to nodes that are in maintenance.
> The reason is in the 
> [{{chooseDatanode}}|https://github.com/apache/hadoop/blob/rel/release-3.3.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/web/resources/NamenodeWebHdfsMethods.java#L307C9-L315]
>  method in {{NamenodeWebHdfsMethods}}, which only calls the 
> {{BlockPlacementPolicy}} (which considers all these edge cases) in case the 
> {{remoteAddr}} (i.e. the address making the request to WebHDFS) is also 
> running a DataNode.
>  
> In all other cases it just falls back to 
> [{{NetworkTopology#chooseRandom}}|https://github.com/apache/hadoop/blob/rel/release-3.3.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/web/resources/NamenodeWebHdfsMethods.java#L342-L343],
>  which does not consider any of these circumstances (e.g. load, maintenance).
> I don't see a reason not to always defer to the placement policy, and we're 
> currently testing a patch that does just that.
> I have attached a draft patch for now.
>  
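For context, a hedged paraphrase of the logic in question, abbreviated from the linked 3.3.6 source (see the file above for the exact code): the placement policy is consulted only when the caller itself hosts a DataNode; any other caller gets a bare topology pick.

{code:java}
// Paraphrased from NamenodeWebHdfsMethods#chooseDatanode (3.3.6), abbreviated.
final DatanodeDescriptor clientNode =
    bm.getDatanodeManager().getDatanodeByHost(remoteAddr);
if (clientNode != null) {
  // Caller runs a DataNode: the BlockPlacementPolicy is consulted, which
  // excludes nodes in maintenance, decommissioning, or under excess load.
  final DatanodeStorageInfo[] storages =
      bm.chooseTarget4WebHDFS(path, clientNode, excludes, blocksize);
  if (storages.length > 0) {
    return storages[0].getDatanodeDescriptor();
  }
}
// Any other caller: a random topology pick that skips all of those checks.
return (DatanodeDescriptor) bm.getDatanodeManager().getNetworkTopology()
    .chooseRandom(NodeBase.ROOT, excludes);
{code}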






[jira] [Resolved] (HDFS-17024) Potential data race introduced by HDFS-15865

2023-10-26 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-17024.

Fix Version/s: 3.4.0
   Resolution: Fixed

> Potential data race introduced by HDFS-15865
> 
>
> Key: HDFS-17024
> URL: https://issues.apache.org/jira/browse/HDFS-17024
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient
>Affects Versions: 3.3.1
>Reporter: Wei-Chiu Chuang
>Assignee: Segawa Hiroaki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> After HDFS-15865, we found a client abort due to an NPE.
> {noformat}
> 2023-04-10 16:07:43,409 ERROR 
> org.apache.hadoop.hbase.regionserver.HRegionServer: * ABORTING region 
> server kqhdp36,16020,1678077077562: Replay of WAL required. Forcing server 
> shutdown *
> org.apache.hadoop.hbase.DroppedSnapshotException: region: WAFER_ALL,16|CM 
> RIE.MA1|CP1114561.18|PROC|,1625899466315.0fbdf0f1810efa9e68af831247e6555f.
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2870)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2539)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2511)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2401)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:613)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:582)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:69)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:362)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.DataStreamer.waitForAckedSeqno(DataStreamer.java:880)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:781)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:898)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:850)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:76)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishClose(HFileWriterImpl.java:859)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:687)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:393)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:69)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:78)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1047)
> at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2349)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2806)
> {noformat}
> This is only possible if a data race happened. Filing this jira to 
> investigate and eliminate the data race.
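To illustrate the class of bug suspected here, a minimal self-contained sketch (illustrative only, not the actual DataStreamer code): a check-then-act on a shared field can throw an NPE when another thread clears the field between the check and the use.

{code:java}
// Minimal illustration of a check-then-act data race.
public class CheckThenActRace {
  private volatile Thread worker = new Thread();

  void close() {               // thread B clears the shared field
    worker = null;
  }

  void waitForAck() {          // thread A
    // Racy: "if (worker != null) worker.interrupt();" can NPE if close()
    // runs between the check and the call.
    // Safe: read the field once into a local, then use the local.
    Thread w = worker;
    if (w != null) {
      w.interrupt();
    }
  }
}
{code}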






[jira] [Created] (HDFS-17080) Fix ec connection leak (GitHub PR#5807)

2023-07-11 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created HDFS-17080:
--

 Summary: Fix ec connection leak (GitHub PR#5807)
 Key: HDFS-17080
 URL: https://issues.apache.org/jira/browse/HDFS-17080
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: Wei-Chiu Chuang


Creating this jira to track GitHub PR #5807.

{quote}Description of PR
This PR fixes an EC connection leak that occurs when an exception is thrown 
while constructing the reader.

How was this patch tested?
Cluster: Presto
Data: EC
Query: select col from table(EC data) limit 10

Presto is a long-running process that serves queries.
In this case, once it has fetched 10 records, it interrupts the other threads.
Those threads may be constructing a reader or fetching the next record.
If fetching the next record is interrupted, the exception is caught and Presto 
closes the reader.
But if constructing the reader is interrupted, Presto cannot close it, because 
the reader has not yet been handed to Presto.
So we can observe whether the EC connections are closed during an EC limit 
query, using the netstat command.{quote}
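A hedged sketch of the general fix pattern for this kind of leak (illustrative names, not the actual PR diff): on a construction-time failure the caller never receives an object to close, so the factory itself must release the connection.

{code:java}
import java.io.Closeable;
import java.io.IOException;
import java.io.InterruptedIOException;

final class Reader implements Closeable {
  private final Closeable connection;

  Reader(Closeable connection) throws IOException {
    this.connection = connection;
    // Stand-in for real setup work that can be interrupted mid-construction.
    if (Thread.currentThread().isInterrupted()) {
      throw new InterruptedIOException("interrupted while constructing reader");
    }
  }

  @Override public void close() throws IOException { connection.close(); }

  // Leak-free factory: if construction fails, release the half-constructed
  // connection here before rethrowing, since no handle reaches the caller.
  static Reader open(Closeable connection) throws IOException {
    try {
      return new Reader(connection);
    } catch (IOException | RuntimeException e) {
      connection.close();
      throw e;
    }
  }
}
{code}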






[jira] [Created] (HDFS-17024) Potential data race introduced by HDFS-15865

2023-05-23 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created HDFS-17024:
--

 Summary: Potential data race introduced by HDFS-15865
 Key: HDFS-17024
 URL: https://issues.apache.org/jira/browse/HDFS-17024
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Wei-Chiu Chuang


After HDFS-15865, we found a client abort due to an NPE.
{noformat}
2023-04-10 16:07:43,409 ERROR 
org.apache.hadoop.hbase.regionserver.HRegionServer: * ABORTING region 
server kqhdp36,16020,1678077077562: Replay of WAL required. Forcing server 
shutdown *
org.apache.hadoop.hbase.DroppedSnapshotException: region: WAFER_ALL,16|CM 
RIE.MA1|CP1114561.18|PROC|,1625899466315.0fbdf0f1810efa9e68af831247e6555f.
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2870)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2539)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2511)
at 
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2401)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:613)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:582)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:69)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:362)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.DataStreamer.waitForAckedSeqno(DataStreamer.java:880)
at 
org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:781)
at 
org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:898)
at 
org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:850)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:76)
at 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
at 
org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishClose(HFileWriterImpl.java:859)
at 
org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:687)
at 
org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:393)
at 
org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:69)
at 
org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:78)
at 
org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1047)
at 
org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2349)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2806)
{noformat}

This is only possible if a data race happened. Filing this jira to investigate 
and eliminate the data race.






[jira] [Resolved] (HDFS-16947) RBF NamenodeHeartbeatService to report error for not being able to register namenode in state store

2023-03-15 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16947.

Resolution: Fixed

> RBF NamenodeHeartbeatService to report error for not being able to register 
> namenode in state store
> ---
>
> Key: HDFS-16947
> URL: https://issues.apache.org/jira/browse/HDFS-16947
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The namenode heartbeat service should log an error with the full stacktrace 
> if it cannot register the namenode in the state store. As of today, we only 
> log an info msg.
> For the zookeeper based impl, this might mean either a) the curator manager 
> is not initialized or b) the write to the znode failed after exhausting 
> retries. In either case, reporting only an INFO log might not be good enough 
> and we might have to look for errors elsewhere.
>  
> Sample example:
> {code:java}
> 2023-02-20 23:10:33,714 DEBUG [NamenodeHeartbeatService {ns} nn0-0] 
> router.NamenodeHeartbeatService - Received service state: ACTIVE from HA 
> namenode: {ns}-nn0:nn-0-{ns}.{cluster}:9000
> 2023-02-20 23:10:33,731 INFO  [NamenodeHeartbeatService {ns} nn0-0] 
> impl.MembershipStoreImpl - Inserting new NN registration: 
> nn-0.namenode.{cluster}:->{ns}:nn0:nn-0-{ns}.{cluster}:9000-ACTIVE
> 2023-02-20 23:10:33,731 INFO  [NamenodeHeartbeatService {ns} nn0-0] 
> router.NamenodeHeartbeatService - Cannot register namenode in the State Store
>  {code}
> If we could log full stacktrace:
> {code:java}
> 2023-02-21 00:20:24,691 ERROR [NamenodeHeartbeatService {ns} nn0-0] 
> router.NamenodeHeartbeatService - Cannot register namenode in the State Store
> org.apache.hadoop.hdfs.server.federation.store.StateStoreUnavailableException:
>  State Store driver StateStoreZooKeeperImpl in nn-0.namenode.{cluster} is not 
> ready.
>         at 
> org.apache.hadoop.hdfs.server.federation.store.driver.StateStoreDriver.verifyDriverReady(StateStoreDriver.java:158)
>         at 
> org.apache.hadoop.hdfs.server.federation.store.driver.impl.StateStoreZooKeeperImpl.putAll(StateStoreZooKeeperImpl.java:235)
>         at 
> org.apache.hadoop.hdfs.server.federation.store.driver.impl.StateStoreBaseImpl.put(StateStoreBaseImpl.java:74)
>         at 
> org.apache.hadoop.hdfs.server.federation.store.impl.MembershipStoreImpl.namenodeHeartbeat(MembershipStoreImpl.java:179)
>         at 
> org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:381)
>         at 
> org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:317)
>         at 
> org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.lambda$periodicInvoke$0(NamenodeHeartbeatService.java:244)
> ...
> ... {code}
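A minimal sketch of the logging change (illustrative, assuming the resolver, report, and SLF4J LOG of NamenodeHeartbeatService; the actual patch is in the linked PR): pass the caught exception to the logger so the full stacktrace is emitted.

{code:java}
try {
  resolver.registerNamenode(report);
} catch (IOException e) {
  // Before: LOG.info("Cannot register namenode in the State Store");
  // After: include the cause, producing the stacktrace shown above.
  LOG.error("Cannot register namenode in the State Store", e);
}
{code}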






[jira] [Resolved] (HDFS-16873) FileStatus compareTo does not specify ordering

2022-12-20 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16873.

Fix Version/s: 3.4.0
   Resolution: Fixed

> FileStatus compareTo does not specify ordering
> --
>
> Key: HDFS-16873
> URL: https://issues.apache.org/jira/browse/HDFS-16873
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: DDillon
>Assignee: DDillon
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The Javadoc of FileStatus does not specify the field by which, or the manner 
> in which, objects are ordered. This is critical to understand before using 
> the Comparable interface, to avoid making wrong assumptions. Inspection of 
> the code quickly shows that ordering is by path name, but we shouldn't have 
> to go into the code to confirm even obvious assumptions.
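A small self-contained demonstration of the (now documented) ordering, using the plain FileStatus constructor; no cluster is needed.

{code:java}
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

public class FileStatusOrdering {
  public static void main(String[] args) {
    FileStatus a = new FileStatus(0, true, 0, 0, 0, new Path("/a"));
    FileStatus b = new FileStatus(0, true, 0, 0, 0, new Path("/b"));
    System.out.println(a.compareTo(b) < 0);  // true: ordered by path name
  }
}
{code}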






[jira] [Resolved] (HDFS-16871) DiskBalancer process may throw IllegalArgumentException when the target DataNode has a capital letter in its hostname

2022-12-20 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16871.

Fix Version/s: 3.4.0
   Resolution: Fixed

> DiskBalancer process may throw IllegalArgumentException when the target 
> DataNode has a capital letter in its hostname
> 
>
> Key: HDFS-16871
> URL: https://issues.apache.org/jira/browse/HDFS-16871
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Daniel Ma
>Assignee: Daniel Ma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> The DiskBalancer process reads DataNode hostnames as lowercase letters,
>  !screenshot-1.png! 
>  but there is no letter-case transform in getNodeByName.
>  !screenshot-2.png! 
> For a DataNode with a lowercase hostname, everything is ok.
> But for a DataNode with an uppercase hostname, when the Balancer process 
> tries to migrate onto it, an IllegalArgumentException is thrown as below,
> {code:java}
> 2022-10-09 16:15:26,631 ERROR tools.DiskBalancerCLI: 
> java.lang.IllegalArgumentException: Unable to find the specified node. 
> node-group-1YlRf0002
> {code}
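A hedged sketch of the fix idea (illustrative names, not the actual patch): normalize the lookup key the same way the stored key was normalized.

{code:java}
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class NodeLookup {
  private final Map<String, String> nodesByName = new HashMap<>();

  void add(String hostname) {
    // Hostnames are stored lower-cased, mirroring the read path above.
    nodesByName.put(hostname.toLowerCase(Locale.ROOT), hostname);
  }

  String getNodeByName(String hostname) {
    // Fix: normalize the lookup key too, so "node-group-1YlRf0002" is found.
    return nodesByName.get(hostname.toLowerCase(Locale.ROOT));
  }
}
{code}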






[jira] [Resolved] (HDFS-16854) TestDFSIO to support non-default file system

2022-12-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16854.

Resolution: Duplicate

> TestDFSIO to support non-default file system
> 
>
> Key: HDFS-16854
> URL: https://issues.apache.org/jira/browse/HDFS-16854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
>
> TestDFSIO expects a parameter {{-Dtest.build.data=}} which is where the data 
> is located. Only paths on the default file system are supported. Running it 
> against other file systems, such as Ozone, throws an exception.
> It can be worked around by specifying {{-Dfs.defaultFS=}} but it would be 
> even nicer to support non-default file systems out of the box, because no one 
> would know this trick unless she looks at the code.
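A hedged usage sketch of the workaround (the jar name and Ozone URI are illustrative):

{noformat}
hadoop jar hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO \
  -Dfs.defaultFS=o3fs://bucket.volume.om.example.com/ \
  -Dtest.build.data=/benchmarks/TestDFSIO \
  -write -nrFiles 4 -size 128MB
{noformat}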






[jira] [Resolved] (HDFS-16839) It should consider EC reconstruction work when we determine if a node is busy

2022-11-29 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16839.

Resolution: Fixed

> It should consider EC reconstruction work when we determine if a node is busy
> -
>
> Key: HDFS-16839
> URL: https://issues.apache.org/jira/browse/HDFS-16839
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Kidd5368
>Assignee: Kidd5368
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> In chooseSourceDatanodes(), I think it's more reasonable to take EC 
> reconstruction work into consideration when we determine whether a node is 
> busy or not.
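A hedged sketch of the idea (the method and limit names are illustrative, not the exact patch): count pending EC reconstruction work alongside replication work when judging busyness.

{code:java}
boolean isBusy(DatanodeDescriptor node, int replicationStreamsHardLimit) {
  int pending = node.getNumberOfBlocksToBeReplicated()
      + node.getNumberOfBlocksToBeErasureCoded();  // the added consideration
  return pending >= replicationStreamsHardLimit;
}
{code}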






[jira] [Created] (HDFS-16854) TestDFSIO to support non-default file system

2022-11-23 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created HDFS-16854:
--

 Summary: TestDFSIO to support non-default file system
 Key: HDFS-16854
 URL: https://issues.apache.org/jira/browse/HDFS-16854
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Wei-Chiu Chuang


TestDFSIO expects a parameter {{-Dtest.build.data=}} which is where the data is 
located. Only paths on the default file system are supported. Trying to run it 
against other file systems, such as Ozone, throws an exception.

It can be worked around by specifying {{-Dfs.defaultFS=}} but it would be even 
nicer to support non-default file systems out of the box, because no one would 
know this trick unless she looks at the code.






[jira] [Resolved] (HDFS-9536) OOM errors during parallel upgrade to Block-ID based layout

2022-10-29 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-9536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-9536.
---
Resolution: Duplicate

I believe this is no longer an issue after HDFS-15937 and HDFS-15610.

> OOM errors during parallel upgrade to Block-ID based layout
> ---
>
> Key: HDFS-9536
> URL: https://issues.apache.org/jira/browse/HDFS-9536
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>Priority: Major
>
> This is a follow-up jira for the OOM errors observed during parallel upgrade 
> to the Block-ID based datanode layout using the HDFS-8578 fix.
> More clues 
> [here|https://issues.apache.org/jira/browse/HDFS-8578?focusedCommentId=15042012&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15042012]






[jira] [Resolved] (HDFS-4043) Namenode Kerberos Login does not use proper hostname for host qualified hdfs principal name.

2022-08-22 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-4043.
---
Resolution: Fixed

> Namenode Kerberos Login does not use proper hostname for host qualified hdfs 
> principal name.
> 
>
> Key: HDFS-4043
> URL: https://issues.apache.org/jira/browse/HDFS-4043
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.0.0-alpha, 2.0.1-alpha, 2.0.2-alpha, 2.0.3-alpha, 
> 3.4.0, 3.3.9
> Environment: CDH4U1 on Ubuntu 12.04
>Reporter: Ahad Rana
>Assignee: Steve Vaughan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>   Original Estimate: 24h
>  Time Spent: 50m
>  Remaining Estimate: 23h 10m
>
> The Namenode uses the loginAsNameNodeUser method in NameNode.java to log in 
> using the hdfs principal. This method in turn invokes SecurityUtil.login with 
> a hostname (last parameter) obtained via a call to InetAddress.getHostName. 
> This call does not always return the fully qualified host name, and thus 
> causes the namenode login to fail due to Kerberos's inability to find a 
> matching hdfs principal in the hdfs.keytab file. Instead it should use 
> InetAddress.getCanonicalHostName. This is consistent with what 
> SecurityUtil.java uses internally to log in other services, such as the 
> DataNode.
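A small self-contained illustration of the difference (the output depends on the host's DNS setup):

{code:java}
import java.net.InetAddress;

public class HostnameCheck {
  public static void main(String[] args) throws Exception {
    InetAddress addr = InetAddress.getLocalHost();
    // May return a short name like "nn1", which cannot match a principal
    // such as hdfs/nn1.example.com@REALM in the keytab.
    System.out.println(addr.getHostName());
    // Resolves the fully qualified name, e.g. "nn1.example.com", which is
    // what the login code should hand to SecurityUtil.login.
    System.out.println(addr.getCanonicalHostName());
  }
}
{code}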






[jira] [Created] (HDFS-16730) Update the doc that append to EC files is supported

2022-08-16 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created HDFS-16730:
--

 Summary: Update the doc that append to EC files is supported
 Key: HDFS-16730
 URL: https://issues.apache.org/jira/browse/HDFS-16730
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: Wei-Chiu Chuang


Our doc has a statement regarding EC limitations:
https://hadoop.apache.org/docs/r3.3.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html#Limitations

{noformat}
append() and truncate() on an erasure coded file will throw IOException.

{noformat}

In fact, HDFS-7663 added the support in Hadoop 3.3.0. The caveat is that it 
only supports "Append to a closed striped file, with NEW_BLOCK flag enabled".






[jira] [Created] (HDFS-16727) Consider reading chunk files using MappedByteBuffer

2022-08-10 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created HDFS-16727:
--

 Summary: Consider reading chunk files using MappedByteBuffer
 Key: HDFS-16727
 URL: https://issues.apache.org/jira/browse/HDFS-16727
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Wei-Chiu Chuang
 Attachments: Screen Shot 2022-08-04 at 7.12.55 AM.png, 
ozone_dn-rhel03.ozone.cisco.local.html

Running Impala TPC-DS, which stresses the Ozone DN read path, 
BufferUtils#assignByteBuffers stands out as one of the offenders.

We can experiment with MappedByteBuffer and see if it makes performance better.
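A minimal self-contained sketch of the experiment: map a chunk file read-only instead of copying it through heap buffers.

{code:java}
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MappedChunkRead {
  public static void main(String[] args) throws IOException {
    try (FileChannel ch = FileChannel.open(Paths.get(args[0]),
        StandardOpenOption.READ)) {
      MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
      // Pages are faulted in lazily as the buffer is consumed, avoiding an
      // up-front copy into user-space byte arrays.
      System.out.println("first byte: " + buf.get(0));
    }
  }
}
{code}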







[jira] [Resolved] (HDFS-16619) Fix HttpHeaders.Values And HttpHeaders.Names Deprecated Import.

2022-07-27 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16619.

Resolution: Fixed

> Fix HttpHeaders.Values And HttpHeaders.Names Deprecated Import.
> ---
>
> Key: HDFS-16619
> URL: https://issues.apache.org/jira/browse/HDFS-16619
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: fanshilun
>Assignee: fanshilun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: Fix HttpHeaders.Values And HttpHeaders.Names 
> Deprecated.png
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> HttpHeaders.Values and HttpHeaders.Names are deprecated; use 
> HttpHeaderValues and HttpHeaderNames instead.
> HttpHeaders.Names
> Deprecated. 
> Use HttpHeaderNames instead. Standard HTTP header names.
> {code:java}
> /** @deprecated */
> @Deprecated
> public static final class Names {
>   public static final String ACCEPT = "Accept";
>   public static final String ACCEPT_CHARSET = "Accept-Charset";
>   public static final String ACCEPT_ENCODING = "Accept-Encoding";
>   public static final String ACCEPT_LANGUAGE = "Accept-Language";
>   public static final String ACCEPT_RANGES = "Accept-Ranges";
>   public static final String ACCEPT_PATCH = "Accept-Patch";
>   public static final String ACCESS_CONTROL_ALLOW_CREDENTIALS = 
> "Access-Control-Allow-Credentials";
>   public static final String ACCESS_CONTROL_ALLOW_HEADERS = 
> "Access-Control-Allow-Headers"; {code}
> HttpHeaders.Values
> Deprecated. 
> Use HttpHeaderValues instead. Standard HTTP header values.
> {code:java}
> /** @deprecated */
> @Deprecated
> public static final class Values {
>   public static final String APPLICATION_JSON = "application/json";
>   public static final String APPLICATION_X_WWW_FORM_URLENCODED = 
> "application/x-www-form-urlencoded";
>   public static final String BASE64 = "base64";
>   public static final String BINARY = "binary";
>   public static final String BOUNDARY = "boundary";
>   public static final String BYTES = "bytes";
>   public static final String CHARSET = "charset";
>   public static final String CHUNKED = "chunked";
>   public static final String CLOSE = "close"; {code}
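A hedged sketch of the replacement usage (the response object is illustrative; the constants are Netty 4's):

{code:java}
import io.netty.handler.codec.http.HttpHeaderNames;
import io.netty.handler.codec.http.HttpHeaderValues;

// Before: response.headers().set(HttpHeaders.Names.CONTENT_TYPE,
//                                HttpHeaders.Values.APPLICATION_JSON);
// After:
response.headers().set(HttpHeaderNames.CONTENT_TYPE,
    HttpHeaderValues.APPLICATION_JSON);
{code}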






[jira] [Resolved] (HDFS-16595) Slow peer metrics - add median, mad and upper latency limits

2022-06-03 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16595.

Fix Version/s: 3.4.0
   Resolution: Fixed

> Slow peer metrics - add median, mad and upper latency limits
> 
>
> Key: HDFS-16595
> URL: https://issues.apache.org/jira/browse/HDFS-16595
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Slow datanode metrics include the slow node and its reporting node details. 
> With HDFS-16582, we added the aggregate latency perceived by the reporting 
> nodes.
> In order to get more insight into how the outlier slownode's latencies 
> differ from the rest of the nodes, we should also expose the median, the 
> median absolute deviation, and the calculated upper latency limit.
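For reference, a self-contained sketch of the three statistics (the 3x multiplier is a common outlier threshold, not necessarily Hadoop's exact constant):

{code:java}
import java.util.Arrays;

public class LatencyOutliers {
  static double median(double[] sorted) {
    int n = sorted.length;
    return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
  }

  public static void main(String[] args) {
    double[] latencies = {1.2, 1.3, 1.4, 1.5, 9.8};  // ms, one outlier
    Arrays.sort(latencies);
    double med = median(latencies);
    double[] dev = new double[latencies.length];
    for (int i = 0; i < latencies.length; i++) {
      dev[i] = Math.abs(latencies[i] - med);
    }
    Arrays.sort(dev);
    double mad = median(dev);                 // median absolute deviation
    double upper = med + 3 * mad;             // calculated upper latency limit
    System.out.printf("median=%.2f mad=%.2f upper=%.2f%n", med, mad, upper);
  }
}
{code}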






[jira] [Resolved] (HDFS-16583) DatanodeAdminDefaultMonitor can get stuck in an infinite loop

2022-05-31 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16583.

Resolution: Fixed

> DatanodeAdminDefaultMonitor can get stuck in an infinite loop
> -
>
> Key: HDFS-16583
> URL: https://issues.apache.org/jira/browse/HDFS-16583
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> We encountered a case where the decommission monitor in the namenode got 
> stuck for about 6 hours. The logs give:
> {code}
> 2022-05-15 01:09:25,490 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager: Stopping 
> maintenance of dead node 10.185.3.132:50010
> 2022-05-15 01:10:20,918 INFO org.apache.hadoop.http.HttpServer2: Process 
> Thread Dump: jsp requested
> 
> 2022-05-15 01:19:06,810 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
> PendingReconstructionMonitor timed out blk_4501753665_3428271426
> 2022-05-15 01:19:06,810 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
> PendingReconstructionMonitor timed out blk_4501753659_3428271420
> 2022-05-15 01:19:06,810 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
> PendingReconstructionMonitor timed out blk_4501753662_3428271423
> 2022-05-15 01:19:06,810 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
> PendingReconstructionMonitor timed out blk_4501753663_3428271424
> 2022-05-15 06:00:57,281 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager: Stopping 
> maintenance of dead node 10.185.3.34:50010
> 2022-05-15 06:00:58,105 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock 
> held for 17492614 ms via
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:263)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:220)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1601)
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.run(DatanodeAdminManager.java:496)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
>   Number of suppressed write-lock reports: 0
>   Longest write-lock held interval: 17492614
> {code}
> We only have the one thread dump triggered by the FC:
> {code}
> Thread 80 (DatanodeAdminMonitor-0):
>   State: RUNNABLE
>   Blocked count: 16
>   Waited count: 453693
>   Stack:
> 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.check(DatanodeAdminManager.java:538)
> 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.run(DatanodeAdminManager.java:494)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
> {code}
> This was the line of code:
> {code}
> private void check() {
>   final Iterator<Map.Entry<DatanodeDescriptor, AbstractList<BlockInfo>>>
>   it = new CyclicIteration<>(outOfServiceNodeBlocks,
>   iterkey).iterator();
>   final LinkedList<DatanodeDescriptor> toRemove = new LinkedList<>();
>   while (it.hasNext() && !exceededNumBlocksPerCheck() && namesystem
>   .isRunning()) {
> numNodesChecked++;
> final Map.Entry<DatanodeDescriptor, AbstractList<BlockInfo>>
> entry = it.next();
> final DatanodeDescriptor dn = entry.getKey();
> AbstractList<BlockInfo> blocks = entry.getValue();
> boolean fullScan = false;
> if (dn.isMaintenance() && 

[jira] [Resolved] (HDFS-16603) Improve DatanodeHttpServer With Netty recommended method

2022-05-31 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16603.

Resolution: Fixed

> Improve DatanodeHttpServer With Netty recommended method
> 
>
> Key: HDFS-16603
> URL: https://issues.apache.org/jira/browse/HDFS-16603
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: fanshilun
>Assignee: fanshilun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> While reading the code, I found that some usages are outdated due to the 
> upgrade of the Netty components.
> {color:#172b4d}*1.DatanodeHttpServer#Constructor*{color}
> {code:java}
> @Deprecated
> public static final ChannelOption<Integer> WRITE_BUFFER_HIGH_WATER_MARK = 
> valueOf("WRITE_BUFFER_HIGH_WATER_MARK"); 
> Deprecated. Use WRITE_BUFFER_WATER_MARK
> @Deprecated
> public static final ChannelOption<Integer> WRITE_BUFFER_LOW_WATER_MARK = 
> valueOf("WRITE_BUFFER_LOW_WATER_MARK");
> Deprecated. Use WRITE_BUFFER_WATER_MARK
> -
> this.httpServer.childOption(
>           ChannelOption.WRITE_BUFFER_HIGH_WATER_MARK,
>           conf.getInt(
>               DFSConfigKeys.DFS_WEBHDFS_NETTY_HIGH_WATERMARK,
>               DFSConfigKeys.DFS_WEBHDFS_NETTY_HIGH_WATERMARK_DEFAULT));
> this.httpServer.childOption(
>           ChannelOption.WRITE_BUFFER_LOW_WATER_MARK,
>           conf.getInt(
>               DFSConfigKeys.DFS_WEBHDFS_NETTY_LOW_WATERMARK,
>               DFSConfigKeys.DFS_WEBHDFS_NETTY_LOW_WATERMARK_DEFAULT));
> {code}
> *2.Duplicate code* 
> {code:java}
> ChannelFuture f = httpServer.bind(infoAddr);
> try {
>  f.syncUninterruptibly();
> } catch (Throwable e) {
>   if (e instanceof BindException) {
>    throw NetUtils.wrapException(null, 0, infoAddr.getHostName(),
>    infoAddr.getPort(), (SocketException) e);
>  } else {
>    throw e;
>  }
> }
> httpAddress = (InetSocketAddress) f.channel().localAddress();
> LOG.info("Listening HTTP traffic on " + httpAddress);{code}
> *3.io.netty.bootstrap.ChannelFactory Deprecated*
> *use io.netty.channel.ChannelFactory instead.*
> {code:java}
> /** @deprecated */
> @Deprecated
> public interface ChannelFactory<T extends Channel> {
>     T newChannel();
> }{code}
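A hedged sketch of the recommended replacement for the two deprecated watermark options (the low/high values are illustrative):

{code:java}
import io.netty.channel.ChannelOption;
import io.netty.channel.WriteBufferWaterMark;

// One combined option replaces the two deprecated per-bound options.
httpServer.childOption(ChannelOption.WRITE_BUFFER_WATER_MARK,
    new WriteBufferWaterMark(32 * 1024, 64 * 1024));  // low, high (bytes)
{code}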






[jira] [Resolved] (HDFS-16585) Add @VisibleForTesting in Dispatcher.java after HDFS-16268

2022-05-26 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16585.

Fix Version/s: 3.4.0
   3.2.4
   3.3.4
   Resolution: Fixed

> Add @VisibleForTesting in Dispatcher.java after HDFS-16268
> --
>
> Key: HDFS-16585
> URL: https://issues.apache.org/jira/browse/HDFS-16585
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wei-Chiu Chuang
>Assignee: groot
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The scope of a few methods was opened up by HDFS-16268 to facilitate unit 
> testing. We should annotate them with {{@VisibleForTesting}} so that they 
> don't get used by production code.
> The affected methods include:
> PendingMove
> markMovedIfGoodBlock
> isGoodBlockCandidate
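A hedged sketch of the change (the parameter list is illustrative; recent Hadoop ships its own VisibleForTesting annotation, older branches used Guava's):

{code:java}
import org.apache.hadoop.classification.VisibleForTesting;

@VisibleForTesting
boolean isGoodBlockCandidate(DBlock block) {  // parameter list illustrative
  // unchanged body; the annotation only documents the widened visibility
  return true;
}
{code}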






[jira] [Resolved] (HDFS-16583) DatanodeAdminDefaultMonitor can get stuck in an infinite loop

2022-05-26 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16583.

Resolution: Fixed

> DatanodeAdminDefaultMonitor can get stuck in an infinite loop
> -
>
> Key: HDFS-16583
> URL: https://issues.apache.org/jira/browse/HDFS-16583
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.4
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> We encountered a case where the decommission monitor in the namenode got 
> stuck for about 6 hours. The logs give:
> {code}
> 2022-05-15 01:09:25,490 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager: Stopping 
> maintenance of dead node 10.185.3.132:50010
> 2022-05-15 01:10:20,918 INFO org.apache.hadoop.http.HttpServer2: Process 
> Thread Dump: jsp requested
> 
> 2022-05-15 01:19:06,810 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
> PendingReconstructionMonitor timed out blk_4501753665_3428271426
> 2022-05-15 01:19:06,810 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
> PendingReconstructionMonitor timed out blk_4501753659_3428271420
> 2022-05-15 01:19:06,810 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
> PendingReconstructionMonitor timed out blk_4501753662_3428271423
> 2022-05-15 01:19:06,810 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
> PendingReconstructionMonitor timed out blk_4501753663_3428271424
> 2022-05-15 06:00:57,281 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager: Stopping 
> maintenance of dead node 10.185.3.34:50010
> 2022-05-15 06:00:58,105 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock 
> held for 17492614 ms via
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:263)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:220)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1601)
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.run(DatanodeAdminManager.java:496)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
>   Number of suppressed write-lock reports: 0
>   Longest write-lock held interval: 17492614
> {code}
> We only have the one thread dump triggered by the FC:
> {code}
> Thread 80 (DatanodeAdminMonitor-0):
>   State: RUNNABLE
>   Blocked count: 16
>   Waited count: 453693
>   Stack:
> 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.check(DatanodeAdminManager.java:538)
> 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.run(DatanodeAdminManager.java:494)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
> {code}
> This was the line of code:
> {code}
> private void check() {
>   final Iterator>>
>   it = new CyclicIteration<>(outOfServiceNodeBlocks,
>   iterkey).iterator();
>   final LinkedList toRemove = new LinkedList<>();
>   while (it.hasNext() && !exceededNumBlocksPerCheck() && namesystem
>   .isRunning()) {
> numNodesChecked++;
> final Map.Entry>
> entry = it.next();
> final DatanodeDescriptor dn = entry.getKey();
> AbstractList blocks = entry.getValue();
> boolean fullScan = false;
> if (dn.isMaintenance() && 

[jira] [Created] (HDFS-16585) Add @VisibleForTesting in Dispatcher.java after HDFS-16268

2022-05-20 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created HDFS-16585:
--

 Summary: Add @VisibleForTesting in Dispatcher.java after HDFS-16268
 Key: HDFS-16585
 URL: https://issues.apache.org/jira/browse/HDFS-16585
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Wei-Chiu Chuang


The scope of a few methods was opened up by HDFS-16268 to facilitate unit 
testing. We should annotate them with {{@VisibleForTesting}} so that they don't 
get used by production code.

The affected methods include:
PendingMove
markMovedIfGoodBlock
isGoodBlockCandidate







[jira] [Resolved] (HDFS-16520) Improve EC pread: avoid potential reading whole block

2022-05-06 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16520.

Resolution: Fixed

Merged the PR and cherrypicked into branch-3.3.

Thanks!

> Improve EC pread: avoid potential reading whole block
> -
>
> Key: HDFS-16520
> URL: https://issues.apache.org/jira/browse/HDFS-16520
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: dfsclient, ec, erasure-coding
>Affects Versions: 3.3.1, 3.3.2
>Reporter: daimin
>Assignee: daimin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.4
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> HDFS client 'pread' stands for 'positional read': this kind of read needs 
> only a range of data instead of the whole file/block. By using 
> BlockReaderFactory#setLength, the client tells the datanode the block length 
> to be read from disk and sent to the client.
> For EC files, the block length to read is not set well: by default 
> 'block.getBlockSize() - offsetInBlock' is used for both pread and sread. Thus 
> the datanode reads much more data than needed and sends it to the client, 
> aborting when the client closes the connection. This situation wastes a lot 
> of resources.
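For reference, the positional-read contract the fix exploits (variables are illustrative):

{code:java}
// Only `length` bytes starting at `position` are wanted, so the datanode
// should be asked for exactly that range, not blockSize - offsetInBlock.
byte[] buf = new byte[length];
int n = in.read(position, buf, 0, length);  // FSDataInputStream positional read
{code}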






[jira] [Resolved] (HDFS-16521) DFS API to retrieve slow datanodes

2022-05-05 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16521.

Resolution: Fixed

> DFS API to retrieve slow datanodes
> --
>
> Key: HDFS-16521
> URL: https://issues.apache.org/jira/browse/HDFS-16521
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.4
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> Providing a DFS API to retrieve slow nodes would help add an additional 
> option to "dfsadmin -report" that lists slow datanodes info for operators to 
> take a look at, a specifically useful filter for larger clusters.
> The other purpose of such an API is for HDFS downstreamers without direct 
> access to the namenode http port (only the rpc port accessible) to retrieve 
> slownodes.
> Moreover, 
> [FanOutOneBlockAsyncDFSOutput|https://github.com/apache/hbase/blob/master/hbase-asyncfs/src/main/java/org/apache/hadoop/hbase/io/asyncfs/FanOutOneBlockAsyncDFSOutput.java]
>  in HBase currently has to rely on its own way of marking and excluding slow 
> nodes while 1) creating pipelines and 2) handling acks, based on factors like 
> the data length of the packet, processing time with the last ack timestamp, 
> whether flush to replicas is finished, etc. If it can utilize the slownode 
> API from HDFS to exclude nodes appropriately while writing blocks, a lot of 
> its own post-ack computation of slow nodes can be _saved_ or _improved_, or, 
> based on further experiments, we could find a _better solution_ to manage 
> slow node detection logic both in HDFS and HBase. However, in order to 
> collect more data points and run more POCs around this area, HDFS should 
> provide an API for downstreamers to efficiently utilize slownode info for 
> such critical low-latency use-cases (like writing WALs).
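A hedged usage sketch of the client-side API introduced here (assuming an HDFS filesystem and a Configuration in scope):

{code:java}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

// ...
DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
for (DatanodeInfo dn : dfs.getSlowDatanodeStats()) {
  System.out.println("slow node: " + dn.getHostName());  // rpc-only access
}
{code}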






[jira] [Resolved] (HDFS-16551) Backport HADOOP-17588 to 3.3 and other active old branches.

2022-04-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16551.

Fix Version/s: 2.10.2
   3.2.4
   Resolution: Fixed

Done. Thanks!

> Backport HADOOP-17588 to 3.3 and other active old branches.
> ---
>
> Key: HDFS-16551
> URL: https://issues.apache.org/jira/browse/HDFS-16551
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Renukaprasad C
>Assignee: Renukaprasad C
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.10.2, 3.2.4, 3.3.4
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This intermittent issue has been handled in trunk; the same needs to be 
> backported to the active branches.
> In org.apache.hadoop.crypto.CryptoInputStream.close(), when 2 threads try to 
> close the stream, the second thread fails with an error.
> The close operation should be synchronized to avoid multiple threads 
> performing it concurrently.
> [~Hemanth Boyina] 
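A minimal sketch of the fix pattern (illustrative, not the HADOOP-17588 diff): make close() synchronized and idempotent so a second concurrent closer is a no-op.

{code:java}
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SafeCloseStream extends FilterInputStream {
  private boolean closed;

  public SafeCloseStream(InputStream in) { super(in); }

  @Override
  public synchronized void close() throws IOException {
    if (closed) {
      return;        // second caller returns quietly instead of failing
    }
    closed = true;
    super.close();
  }
}
{code}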






[jira] [Resolved] (HDFS-16389) Improve NNThroughputBenchmark test mkdirs

2022-04-17 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16389.

Fix Version/s: 3.4.0
   Resolution: Fixed

> Improve NNThroughputBenchmark test mkdirs
> -
>
> Key: HDFS-16389
> URL: https://issues.apache.org/jira/browse/HDFS-16389
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: benchmarks, namenode
>Affects Versions: 2.9.2
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> When using the NNThroughputBenchmark test to create a large number of 
> directories, exceptions are printed.
> Here is the command:
> ./bin/hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs 
> hdfs:// -op mkdirs -threads 30 -dirs 500
> There are some exceptions here, such as:
> 21/12/20 10:25:00 INFO namenode.NNThroughputBenchmark: Starting benchmark: 
> mkdirs
> 21/12/20 10:25:01 INFO namenode.NNThroughputBenchmark: Generate 500 
> inputs for mkdirs
> 21/12/20 10:25:08 ERROR namenode.NNThroughputBenchmark: 
> java.lang.ArrayIndexOutOfBoundsException: 20
>   at 
> org.apache.hadoop.hdfs.server.namenode.FileNameGenerator.getNextDirName(FileNameGenerator.java:65)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FileNameGenerator.getNextFileName(FileNameGenerator.java:73)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$MkdirsStats.generateInputs(NNThroughputBenchmark.java:668)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$OperationStatsBase.benchmark(NNThroughputBenchmark.java:257)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.run(NNThroughputBenchmark.java:1528)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.runBenchmark(NNThroughputBenchmark.java:1430)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.main(NNThroughputBenchmark.java:1550)
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 20
>   at 
> org.apache.hadoop.hdfs.server.namenode.FileNameGenerator.getNextDirName(FileNameGenerator.java:65)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FileNameGenerator.getNextFileName(FileNameGenerator.java:73)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$MkdirsStats.generateInputs(NNThroughputBenchmark.java:668)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$OperationStatsBase.benchmark(NNThroughputBenchmark.java:257)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.run(NNThroughputBenchmark.java:1528)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.runBenchmark(NNThroughputBenchmark.java:1430)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.main(NNThroughputBenchmark.java:1550)
> These messages appear because some parameters, such as dirsPerDir or 
> filesPerDir, are set incorrectly.
> Seeing only this log leaves the user with questions.
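A hedged usage sketch: sizing -dirsPerDir so the generator can produce the requested number of directories avoids the ArrayIndexOutOfBoundsException above (the namenode URI stays elided as in the report; the value 32 is illustrative).

{noformat}
./bin/hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark \
  -fs hdfs:// -op mkdirs -threads 30 -dirs 500 -dirsPerDir 32
{noformat}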






[jira] [Resolved] (HDFS-16535) SlotReleaser should reuse the domain socket based on socket paths

2022-04-17 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16535.

Fix Version/s: 3.3.3
   3.4.0
   Resolution: Fixed

Merged. Thanks [~stigahuang] and [~leosun08]!

> SlotReleaser should reuse the domain socket based on socket paths
> -
>
> Key: HDFS-16535
> URL: https://issues.apache.org/jira/browse/HDFS-16535
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Quanlong Huang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> HDFS-13639 improves the performance of short-circuit shm slot releasing by 
> reusing the domain socket that the client previously used to send release 
> requests to the DataNode.
> This is fine when only one DataNode is colocated with the client (true in 
> most production environments). However, if we launch multiple DataNodes on a 
> machine (usually for testing, e.g. Impala's end-to-end tests), the request 
> could be sent to the wrong DataNode. See an example in IMPALA-11234.






[jira] [Resolved] (HDFS-16502) Reconfigure Block Invalidate limit

2022-03-15 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16502.

Fix Version/s: 3.4.0
   3.3.3
   Resolution: Fixed

> Reconfigure Block Invalidate limit
> --
>
> Key: HDFS-16502
> URL: https://issues.apache.org/jira/browse/HDFS-16502
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Based on the cluster load, it would be helpful to be able to tune the block 
> invalidate limit (dfs.block.invalidate.limit). The only way we can do this 
> without restarting the Namenode as of today is by reconfiguring the 
> heartbeat interval 
> {code:java}
> Math.max(heartbeatInt*20, blockInvalidateLimit){code}
> This logic is not straightforward and operators are usually not aware of it 
> (lack of documentation); also, updating the heartbeat interval is not 
> desired in all cases.
> We should provide the ability to alter the block invalidate limit without 
> affecting the heartbeat interval on a live cluster, to adjust some load at 
> the Datanode level.
> We should also take this opportunity to keep the (heartbeatInterval * 20) 
> computation logic in a common method.
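A hedged usage sketch of applying the new setting to a live NameNode via the standard reconfiguration commands (host:port illustrative):

{noformat}
# after updating dfs.block.invalidate.limit in hdfs-site.xml:
hdfs dfsadmin -reconfig namenode nn1.example.com:9820 start
hdfs dfsadmin -reconfig namenode nn1.example.com:9820 status
{noformat}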






[jira] [Resolved] (HDFS-16422) Fix thread safety of EC decoding during concurrent preads

2022-02-10 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16422.

Fix Version/s: 3.4.0
   3.2.3
   3.3.3
   Resolution: Fixed

Thanks [~cndaimin] for the great finding!

> Fix thread safety of EC decoding during concurrent preads
> -
>
> Key: HDFS-16422
> URL: https://issues.apache.org/jira/browse/HDFS-16422
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient, ec, erasure-coding
>Affects Versions: 3.3.0, 3.3.1
>Reporter: daimin
>Assignee: daimin
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.3
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Reading data from an erasure-coded file with missing replicas (internal 
> blocks of a block group) causes online reconstruction: the dataUnits part of 
> the data is read and the missing target data is decoded from it. Each 
> DFSStripedInputStream object has one RawErasureDecoder object, so when we do 
> preads concurrently, RawErasureDecoder.decode is invoked concurrently too. 
> RawErasureDecoder.decode is not thread safe, and as a result we occasionally 
> get wrong data from pread.
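A hedged sketch of the general fix pattern (illustrative; see the merged change for the actual fix): serialize access to the shared decoder, or give each pread its own decoder instance.

{code:java}
// Assuming `decoder` is the shared, non-thread-safe RawErasureDecoder:
synchronized (decoder) {
  decoder.decode(inputs, erasedIndexes, outputs);
}
{code}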






[jira] [Resolved] (HDFS-16437) ReverseXML processor doesn't accept XML files without the SnapshotDiffSection.

2022-02-05 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16437.

Fix Version/s: 3.4.0
   Resolution: Fixed

Merged. Thanks [~it_singer] for the contribution!

> ReverseXML processor doesn't accept XML files without the SnapshotDiffSection.
> --
>
> Key: HDFS-16437
> URL: https://issues.apache.org/jira/browse/HDFS-16437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1, 3.3.0
>Reporter: yanbin.zhang
>Assignee: yanbin.zhang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> In a cluster environment without snapshots, converting the generated xml 
> back to an fsimage reports an error.
> {code:java}
> [test@test001 ~]$ hdfs oiv -p ReverseXML -i fsimage_0257220.xml 
> -o fsimage_0257220
> OfflineImageReconstructor failed: FSImage XML ended prematurely, without 
> including section(s) SnapshotDiffSection
> java.io.IOException: FSImage XML ended prematurely, without including 
> section(s) SnapshotDiffSection
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.processXml(OfflineImageReconstructor.java:1765)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.run(OfflineImageReconstructor.java:1842)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.run(OfflineImageViewerPB.java:211)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.main(OfflineImageViewerPB.java:149)
> 22/01/25 15:56:52 INFO util.ExitUtil: Exiting with status 1: ExitException 
> {code}
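A minimal sketch of the idea behind the fix (structure and names are
hypothetical): treat the snapshot-related sections as optional when the
reconstructor verifies that every section was present:

{code:java}
// Sketch: sections that may legitimately be absent on clusters without
// snapshots should not fail the "all sections present" check.
private static final Set<String> OPTIONAL_SECTIONS =
    new HashSet<>(Arrays.asList("SnapshotSection", "SnapshotDiffSection"));

private void verifySectionsSeen(Set<String> seen, Set<String> known)
    throws IOException {
  for (String name : known) {
    if (!seen.contains(name) && !OPTIONAL_SECTIONS.contains(name)) {
      throw new IOException(
          "FSImage XML ended prematurely, without including section " + name);
    }
  }
}
{code}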



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16423) balancer should not get blocks on stale storages

2022-01-25 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16423.

Fix Version/s: 3.3.3
   Resolution: Fixed

> balancer should not get blocks on stale storages
> 
>
> Key: HDFS-16423
> URL: https://issues.apache.org/jira/browse/HDFS-16423
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
> Attachments: image-2022-01-13-17-18-32-409.png
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> We have met a problem as described in HDFS-16420.
> We found that the balancer copied a block multiple times without deleting the 
> source block when the block was placed on a stale storage. This results in a 
> block with many copies, but these redundant copies are not deleted until the 
> storage is no longer stale.
>  
> !image-2022-01-13-17-18-32-409.png|width=657,height=275!
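A minimal sketch of the idea (not the exact patch): when the NameNode picks
source blocks for the balancer, skip storages whose block contents are still
stale:

{code:java}
// Sketch: a stale storage may still report replicas that the DataNode has
// already deleted, so do not offer its blocks as balancer sources.
private boolean isGoodSourceStorage(DatanodeStorageInfo storage) {
  return !storage.areBlockContentsStale();
}
{code}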



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16403) Improve FUSE IO performance by supporting FUSE parameter max_background

2022-01-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16403.

Resolution: Fixed

Thank you [~cndaimin] for the great work and the excellent performance test!
Thanks [~pifta] for the code review.

> Improve FUSE IO performance by supporting FUSE parameter max_background
> ---
>
> Key: HDFS-16403
> URL: https://issues.apache.org/jira/browse/HDFS-16403
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fuse-dfs
>Affects Versions: 3.3.0, 3.3.1
>Reporter: daimin
>Assignee: daimin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.3
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> While examining FUSE IO performance on HDFS, we found that the number of 
> simultaneous IO requests is limited to a fixed value, such as 12. This 
> limitation makes IO performance on the FUSE client quite unacceptable. We did 
> some research on this and, guided by the article [Performance and Resource 
> Utilization of FUSE User-Space File 
> Systems|https://dl.acm.org/doi/fullHtml/10.1145/3310148], found that the FUSE 
> parameter '{{{}max_background{}}}' determines the number of simultaneous IO 
> requests; it is 12 by default.
> We added 'max_background' to the fuse_dfs mount options; the FUSE kernel 
> applies it when an option value is given.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-16423) balancer should not get blocks on stale storages

2022-01-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reopened HDFS-16423:


Reopening to backport this to lower branches.

> balancer should not get blocks on stale storages
> 
>
> Key: HDFS-16423
> URL: https://issues.apache.org/jira/browse/HDFS-16423
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2022-01-13-17-18-32-409.png
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> We have met a problem as described in HDFS-16420.
> We found that the balancer copied a block multiple times without deleting the 
> source block when the block was placed on a stale storage. This results in a 
> block with many copies, but these redundant copies are not deleted until the 
> storage is no longer stale.
>  
> !image-2022-01-13-17-18-32-409.png|width=657,height=275!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16317) Backport HDFS-14729 for branch-3.2

2021-12-21 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16317.

Fix Version/s: 3.2.3
   Resolution: Fixed

Merged the commit into branch-3.2 and branch-3.2.3.

> Backport HDFS-14729 for branch-3.2
> --
>
> Key: HDFS-16317
> URL: https://issues.apache.org/jira/browse/HDFS-16317
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.2.2
>Reporter: Ananya Singh
>Assignee: Ananya Singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.3
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Our security tool raised the following security flaws on Hadoop 3.2.2:
> * [CVE-2015-9251|https://nvd.nist.gov/vuln/detail/CVE-2015-9251]
> * [CVE-2019-11358|https://nvd.nist.gov/vuln/detail/CVE-2019-11358]
> * [CVE-2020-11022|https://nvd.nist.gov/vuln/detail/CVE-2020-11022]
> * [CVE-2020-11023|https://nvd.nist.gov/vuln/detail/CVE-2020-11023]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-16384) Upgrade Netty to 4.1.72.Final

2021-12-16 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reopened HDFS-16384:


> Upgrade Netty to 4.1.72.Final
> -
>
> Key: HDFS-16384
> URL: https://issues.apache.org/jira/browse/HDFS-16384
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.3.1
>Reporter: Tamas Penzes
>Assignee: Tamas Penzes
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> New fixes for Netty; nothing else changed. The Netty version is bumped, and 
> two more exclusions were added in hdfs-client because of the new Netty.
> No new tests were added, as none are needed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16337) Show start time of Datanode on Web

2021-11-22 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16337.

Fix Version/s: 3.4.0
   Resolution: Fixed

> Show start time of Datanode on Web
> --
>
> Key: HDFS-16337
> URL: https://issues.apache.org/jira/browse/HDFS-16337
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2021-11-19-08-55-58-343.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Show _start time_ of Datanode on Web.
> !image-2021-11-19-08-55-58-343.png|width=540,height=155!
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16241) Standby close reconstruction thread

2021-10-11 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16241.

Fix Version/s: 3.3.2
   3.2.3
   3.4.0
   Resolution: Fixed

Thanks. Merged.

> Standby close reconstruction thread
> ---
>
> Key: HDFS-16241
> URL: https://issues.apache.org/jira/browse/HDFS-16241
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: zhanghuazong
>Assignee: zhanghuazong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: HDFS-16241
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When the "Reconstruction Queue Initializer" thread of the active namenode has 
> not stopped, switch to standby namenode. The "Reconstruction Queue 
> Initializer" thread should be closed
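A minimal sketch of the idea (the field name is hypothetical): stop the
initializer thread as part of the active-to-standby transition:

{code:java}
// Sketch: interrupt and join the initializer daemon when leaving the active
// state so it cannot keep running under the standby namenode.
void stopReconstructionQueuesInitializer() {
  if (reconstructionQueuesInitializer != null) {
    reconstructionQueuesInitializer.interrupt();
    try {
      reconstructionQueuesInitializer.join(3000L);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    reconstructionQueuesInitializer = null;
  }
}
{code}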



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16258) HDFS-13671 breaks TestBlockManager in branch-3.2

2021-10-06 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created HDFS-16258:
--

 Summary: HDFS-13671 breaks TestBlockManager in branch-3.2
 Key: HDFS-16258
 URL: https://issues.apache.org/jira/browse/HDFS-16258
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.2.3
Reporter: Wei-Chiu Chuang


TestBlockManager in branch-3.2 has two failed tests: 
* testDeleteCorruptReplicaWithStatleStorages
* testBlockManagerMachinesArray

Looks like they were broken by HDFS-13671. CC: [~brahmareddy]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16238) Improve comments related to EncryptionZoneManager

2021-09-30 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16238.

Fix Version/s: 3.4.0
   Resolution: Fixed

Thanks [~vjasani] and [~hexiaoqiao] for the review!

> Improve comments related to EncryptionZoneManager
> -
>
> Key: HDFS-16238
> URL: https://issues.apache.org/jira/browse/HDFS-16238
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, encryption, namenode
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> In EncryptionZoneManager, some comments are missing descriptions of the 
> relevant parameters. The purpose of this jira is to complete them.
> E.g:
>/**
> * Re-encrypts the given encryption zone path. If the given path is not the
> * root of an encryption zone, an exception is thrown.
> * @param zoneIIP
> * @param keyVersionName
> * @throws IOException
> */
>List<XAttr> reencryptEncryptionZone(final INodesInPath zoneIIP,
>final String keyVersionName) throws IOException {
> ..
> }
> The description of zoneIIP and keyVersionName is missing here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16232) Fix java doc for BlockReaderRemote#newBlockReader

2021-09-23 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16232.

Fix Version/s: 3.4.0
   Resolution: Fixed

> Fix java doc for BlockReaderRemote#newBlockReader
> -
>
> Key: HDFS-16232
> URL: https://issues.apache.org/jira/browse/HDFS-16232
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Fix java doc for BlockReaderRemote#newBlockReader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16233) Do not use exception handler to implement copy-on-write for EnumCounters

2021-09-22 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created HDFS-16233:
--

 Summary: Do not use exception handler to implement copy-on-write 
for EnumCounters
 Key: HDFS-16233
 URL: https://issues.apache.org/jira/browse/HDFS-16233
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Wei-Chiu Chuang
 Attachments: Screen Shot 2021-09-22 at 1.59.59 PM.png

HDFS-14547 saves the NameNode heap space occupied by EnumCounters by 
essentially implementing a copy-on-write strategy.

At the beginning, all EnumCounters references point to the same 
ConstEnumCounters to save heap space. When one is modified, an exception is 
thrown and the exception handler converts the ConstEnumCounters into an 
EnumCounters object and updates it.

Using an exception handler for anything more than occasional control flow is 
bad for performance.

Proposal: use the instanceof keyword to detect the object's type and perform 
the copy-on-write accordingly, as sketched below.
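A minimal sketch of the proposal (the field and method shapes are illustrative,
not the final patch):

{code:java}
// Sketch: detect the shared immutable flavor with instanceof and copy it on
// the first write, instead of relying on an exception handler.
private EnumCounters<StorageType> typeSpaces;

void addTypeSpace(StorageType type, long delta) {
  if (typeSpaces instanceof ConstEnumCounters) {
    EnumCounters<StorageType> copy = new EnumCounters<>(StorageType.class);
    copy.set(typeSpaces);   // copy the shared const values
    typeSpaces = copy;      // later updates mutate the private copy
  }
  typeSpaces.add(type, delta);
}
{code}

The instanceof check adds a predictable branch on every update, which is far
cheaper than constructing and handling an exception.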



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16192) ViewDistributedFileSystem#rename wrongly using src in the place of dst.

2021-08-30 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16192.

Fix Version/s: 3.3.2
   3.4.0
   Resolution: Fixed

Thanks [~umamaheswararao]!

> ViewDistributedFileSystem#rename wrongly using src in the place of dst.
> ---
>
> Key: HDFS-16192
> URL: https://issues.apache.org/jira/browse/HDFS-16192
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In ViewDistributedFileSystem, we mistakenly used the src path in place of 
> the dst path when finding the mount path info.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16173) Improve CopyCommands#Put#executor queue configurability

2021-08-26 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16173.

Resolution: Fixed

> Improve CopyCommands#Put#executor queue configurability
> ---
>
> Key: HDFS-16173
> URL: https://issues.apache.org/jira/browse/HDFS-16173
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> In CopyCommands#Put, the capacity of the executor queue is a fixed value, 
> 1024.
> We should make it configurable, because usage environments differ.
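A minimal sketch (the configuration key name is hypothetical):

{code:java}
// Sketch: read the queue capacity from the configuration instead of
// hard-coding 1024 when building the Put command's thread pool.
final int numThreads = 4;   // illustrative; in practice the -t option value
int queueCapacity = getConf().getInt("fs.shell.put.queue.capacity", 1024);
ThreadPoolExecutor executor = new ThreadPoolExecutor(
    numThreads, numThreads, 1, TimeUnit.SECONDS,
    new ArrayBlockingQueue<>(queueCapacity),
    new ThreadPoolExecutor.CallerRunsPolicy());
{code}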



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16175) Improve the configurable value of Server #PURGE_INTERVAL_NANOS

2021-08-25 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16175.

Fix Version/s: 3.2.4
   3.3.2
   3.4.0
   Resolution: Fixed

> Improve the configurable value of Server #PURGE_INTERVAL_NANOS
> --
>
> Key: HDFS-16175
> URL: https://issues.apache.org/jira/browse/HDFS-16175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ipc
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> In Server, Server#PURGE_INTERVAL_NANOS is a fixed value, 15 minutes.
> We should make Server#PURGE_INTERVAL_NANOS configurable, which would make 
> RPC more flexible:
> private final static long PURGE_INTERVAL_NANOS = TimeUnit.NANOSECONDS.convert(
>   15, TimeUnit.MINUTES);
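A minimal sketch (the key name is hypothetical):

{code:java}
// Sketch: derive the purge interval from the configuration, keeping the
// current 15 minutes as the default. The field would become an instance
// field rather than a static constant.
long purgeMinutes = conf.getLong("ipc.server.purge.interval-minutes", 15);
this.purgeIntervalNanos = TimeUnit.NANOSECONDS.convert(
    purgeMinutes, TimeUnit.MINUTES);
{code}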



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16180) FsVolumeImpl.nextBlock should consider that the block meta file has been deleted.

2021-08-23 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16180.

Fix Version/s: 3.4.0
   Resolution: Fixed

> FsVolumeImpl.nextBlock should consider that the block meta file has been 
> deleted.
> -
>
> Key: HDFS-16180
> URL: https://issues.apache.org/jira/browse/HDFS-16180
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0, 3.4.0
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In my cluster, we found that when the VolumeScanner runs, the DN sometimes 
> logs the errors below:
> ```
>  
> 2021-08-19 08:00:11,549 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
>  Deleted BP-1020175758-nnip-1597745872895 blk_1142977964_69237147 URI 
> file:/disk1/dfs/data/current/BP-1020175758- 
> nnip-1597745872895/current/finalized/subdir0/subdir21/blk_1142977964
> 2021-08-19 08:00:48,368 ERROR 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl: 
> nextBlock(DS-060c8e4c-1ef6-49f5-91ef-91957356891a, BP-1020175758- 
> nnip-1597745872895): I/O error
> java.io.IOException: Meta file not found, 
> blockFile=/disk1/dfs/data/current/BP-1020175758- 
> nnip-1597745872895/current/finalized/subdir0/subdir21/blk_1142977964
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetUtil.findMetaFile(FsDatasetUtil.java:101)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl.nextBlock(FsVolumeImpl.java:809)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:528)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:628)
> 2021-08-19 08:00:48,368 WARN 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: 
> VolumeScanner(/disk1/dfs/data, DS-060c8e4c-1ef6-49f5-91ef-91957356891a): 
> nextBlock error on 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl@7febc6b4
> ```
> When the VolumeScanner scanned block blk_1142977964, the block had already 
> been deleted by the datanode, so the scanner could not find its meta file 
> and logged these errors.
>  
> Maybe we should handle FileNotFoundException during nextBlock to reduce the 
> error logging and the number of nextBlock retries.
>  
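A minimal sketch of that handling (nextBlockLocked stands in for the existing
lookup logic):

{code:java}
// Sketch: a block deleted between listing and scanning is not an error;
// skip it quietly and move on to the next block.
public ExtendedBlock nextBlock() throws IOException {
  while (true) {
    try {
      return nextBlockLocked();   // may return null at the end of iteration
    } catch (FileNotFoundException e) {
      LOG.debug("Block or meta file deleted while iterating; skipping.", e);
    }
  }
}
{code}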



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16112) Fix flaky unit test TestDecommissioningStatusWithBackoffMonitor

2021-08-23 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16112.

Resolution: Duplicate

Closed it for you. You have already been granted contributor privileges, so 
you should be able to close it yourself (that is my understanding).

> Fix flaky unit test TestDecommissioningStatusWithBackoffMonitor 
> 
>
> Key: HDFS-16112
> URL: https://issues.apache.org/jira/browse/HDFS-16112
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>
> These unit tests, 
> TestDecommissioningStatusWithBackoffMonitor#testDecommissionStatus and 
> TestDecommissioningStatus#testDecommissionStatus, have recently seemed a 
> little flaky; we should fix them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16177) Bug fix for Util#receiveFile

2021-08-18 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16177.

Fix Version/s: 3.3.2
   3.2.3
   3.4.0
   Resolution: Fixed

Thanks [~tomscut] and [~ferhui]!

> Bug fix for Util#receiveFile
> 
>
> Key: HDFS-16177
> URL: https://issues.apache.org/jira/browse/HDFS-16177
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: download-fsimage.jpg
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The time taken to write the file was miscalculated in Util#receiveFile.
> !download-fsimage.jpg|width=578,height=134!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-14477) No enum constant Operation.GET_BLOCK_LOCATIONS

2021-08-18 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-14477.

Resolution: Duplicate

> No enum constant Operation.GET_BLOCK_LOCATIONS 
> ---
>
> Key: HDFS-14477
> URL: https://issues.apache.org/jira/browse/HDFS-14477
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.7.0, 2.8.0, 2.7.1, 2.7.2, 2.7.3, 2.9.0, 2.7.4, 2.8.1, 
> 2.8.2, 2.8.3, 2.7.5, 3.0.0, 2.9.1, 2.8.4, 2.7.6, 2.9.2, 2.8.5, 2.7.7, 2.7.8, 
> 2.8.6
> Environment: Running on Ubuntu 16.04
> Hadoop v2.7.4
> Minikube v1.0.1
> Scala v2.11
> Spark v2.4.2
>  
>Reporter: Roksolana Diachuk
>Priority: Major
>
> I was trying to read Avro file contents from HDFS using a Spark application 
> and Httpfs configured in minikube (to use Kubernetes locally). Each time I 
> try to read the files I get this exception:
> {code:java}
> Exception in thread "main" 
> org.apache.hadoop.ipc.RemoteException(com.sun.jersey.api.ParamException$QueryParamException):
>  java.lang.IllegalArgumentException: No enum constant 
> org.apache.hadoop.fs.http.client.HttpFSFileSystem.Operation.GET_BLOCK_LOCATIONS
>  at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:118)
>  at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:367)
>  at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:98)
>  at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:625)
>  at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:472)
>  at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:502)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
>  at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:498)
>  at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileBlockLocations(WebHdfsFileSystem.java:1420)
>  at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileBlockLocations(WebHdfsFileSystem.java:1404)
>  at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:343)
>  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:204)
>  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
>  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
>  at scala.Option.getOrElse(Option.scala:121)
>  at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
>  at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
>  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
>  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
>  at scala.Option.getOrElse(Option.scala:121)
>  at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
>  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
>  at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>  at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
>  at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
>  at spark_test.TestSparkJob$.main(TestSparkJob.scala:48)
>  at spark_test.TestSparkJob.main(TestSparkJob.scala){code}
>  
> I access HDFS through an Httpfs setup in Kubernetes. My Spark application 
> runs outside of the K8s cluster, so all the services are accessed using 
> NodePorts. When I launch the Spark app inside the K8s cluster and use only 
> the HDFS client or WebHDFS, I can read all the file contents. The error 
> occurs only when I execute the app outside of the cluster, which is when I 
> access HDFS through Httpfs.
> So I checked the Hadoop sources and found that there is no enum constant 
> named GET_BLOCK_LOCATIONS; it is named GETFILEBLOCKLOCATIONS in the 
> Operation enum (see 
> [HttpFSFileSystem.java|https://github.com/apache/hadoop/blob/release-2.7.4-RC0/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/client/HttpFSFileSystem.java]).
> The same applies to all the Hadoop versions I have checked (2.7.4 and 
> higher).
> The conclusion would be that HDFS and HttpFS are not compatible in their 
> operation names, and the same may be true for other operations. So it is 
> not yet possible to read the data from HDFS using Httpfs.
> Is it possible to fix this error somehow?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HDFS-16162) Improve DFSUtil#checkProtectedDescendants() related parameter comments

2021-08-17 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16162.

Fix Version/s: 3.4.0
   Resolution: Fixed

> Improve DFSUtil#checkProtectedDescendants() related parameter comments
> --
>
> Key: HDFS-16162
> URL: https://issues.apache.org/jira/browse/HDFS-16162
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Some parameter comments related to DFSUtil#checkProtectedDescendants() are 
> missing, for example:
> /**
>  * If the given directory has any non-empty protected descendants, throw
>  * (Including itself).
>  *
>  * @param iip directory, to check its descendants.
>  * @throws AccessControlException if it is a non-empty protected 
> descendant
>  *found.
>  * @throws ParentNotDirectoryException
>  * @throws UnresolvedLinkException
>  */
> public static void checkProtectedDescendants(
> FSDirectory fsd, INodesInPath iip)
> throws AccessControlException, UnresolvedLinkException,
> ParentNotDirectoryException {
> The description of fsd is missing here.
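One possible completed javadoc (a sketch):

{code:java}
/**
 * If the given directory has any non-empty protected descendants, throw
 * (including itself).
 *
 * @param fsd the FSDirectory used to look up the configured protected
 *directories.
 * @param iip directory, to check its descendants.
 * @throws AccessControlException if a non-empty protected descendant is
 *found.
 * @throws ParentNotDirectoryException if a path component is not a directory.
 * @throws UnresolvedLinkException if a symlink in the path cannot be
 *resolved.
 */
{code}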



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16161) Corrupt block checksum is not reported to NameNode

2021-08-10 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16161.

Resolution: Duplicate

Turns out it was fixed by HDFS-14706

> Corrupt block checksum is not reported to NameNode
> --
>
> Key: HDFS-16161
> URL: https://issues.apache.org/jira/browse/HDFS-16161
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Priority: Major
>
> One of our user reported this error in the log:
> {noformat}
> 2021-07-30 09:51:27,509 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: 
> an02nphda5777.example.com:1004:DataXceiver error processing READ_BLOCK 
> operation  src: /10.30.10.68:35680 dst: /10.30.10.67:1004
> java.lang.IllegalArgumentException: id=-46 out of range [0, 5)
> at 
> org.apache.hadoop.util.DataChecksum$Type.valueOf(DataChecksum.java:76)
> at 
> org.apache.hadoop.util.DataChecksum.newDataChecksum(DataChecksum.java:167)
> {noformat}
> Analysis:
> It looks like the first few bytes of the checksum were bad. The first few 
> bytes determine the type of checksum (CRC32, CRC32C, etc.). But the block 
> was never reported to the NameNode and removed.
> If the DN throws an IOException while reading a block, it starts another 
> thread to scan the block, and if the block is indeed bad, it tells the NN it 
> has a bad block. But this is an IllegalArgumentException, which is a 
> RuntimeException rather than an IOE, so it is not handled that way.
> It's a bug in the error handling code, and it should be made more graceful.
> Suggestion: catch the IllegalArgumentException in 
> BlockMetadataHeader.preadHeader() and throw CorruptMetaHeaderException, so 
> that the DN catches the exception and performs the regular block scan check.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16161) Corrupt block checksum is not reported to NameNode

2021-08-10 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created HDFS-16161:
--

 Summary: Corrupt block checksum is not reported to NameNode
 Key: HDFS-16161
 URL: https://issues.apache.org/jira/browse/HDFS-16161
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Wei-Chiu Chuang


One of our user reported this error in the log:

{noformat}
2021-07-30 09:51:27,509 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
an02nphda5777.npa.bfsiplatform.com:1004:DataXceiver error processing READ_BLOCK 
operation  src: /10.30.10.68:35680 dst: /10.30.10.67:1004
java.lang.IllegalArgumentException: id=-46 out of range [0, 5)
at 
org.apache.hadoop.util.DataChecksum$Type.valueOf(DataChecksum.java:76)
at 
org.apache.hadoop.util.DataChecksum.newDataChecksum(DataChecksum.java:167)
{noformat}

Analysis:
It looks like the first few bytes of the checksum were bad. The first few 
bytes determine the type of checksum (CRC32, CRC32C, etc.).

If the DN throws an IOException while reading a block, it starts another 
thread to scan the block, and if the block is indeed bad, it tells the NN it 
has a bad block. But this is an IllegalArgumentException, which is a 
RuntimeException rather than an IOE, so it is not handled that way.

It's a bug in the error handling code, and it should be made more graceful.

Suggestion: catch the IllegalArgumentException in 
BlockMetadataHeader.preadHeader() and throw CorruptMetaHeaderException, so that 
the DN catches the exception and performs the regular block scan check.
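A minimal sketch of that suggestion (assuming a
CorruptMetaHeaderException(String) constructor; the parsing helper is
illustrative):

{code:java}
// Sketch: a bad checksum-type byte means the on-disk header is corrupt, so
// surface it as a checked exception the DataNode's error handling reacts to.
static DataChecksum parseChecksum(byte[] headerBuf, int offset)
    throws CorruptMetaHeaderException {
  try {
    return DataChecksum.newDataChecksum(headerBuf, offset);
  } catch (IllegalArgumentException e) {
    throw new CorruptMetaHeaderException(
        "Corrupt checksum header in block meta file: " + e.getMessage());
  }
}
{code}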



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16154) TestMiniJournalCluster failing intermittently because of not reseting UserGroupInformation completely

2021-08-06 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16154.

Fix Version/s: 3.3.2
   3.2.3
   3.4.0
   Resolution: Fixed

> TestMiniJournalCluster failing intermittently because of not reseting 
> UserGroupInformation completely
> -
>
> Key: HDFS-16154
> URL: https://issues.apache.org/jira/browse/HDFS-16154
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: wangzhaohui
>Assignee: wangzhaohui
>Priority: Minor
>  Labels: patch-available, pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: HDFS-16154-001.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When I run the unit tests of org.apache.hadoop.hdfs.qjournal together in 
> IDEA, many of them fail.
> I found they all fail for the same reason:
> {code:java}
> java.io.IOException: Running in secure mode, but config doesn't have a 
> keytab
>  at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:308) at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode.start(JournalNode.java:230)
>  at 
> org.apache.hadoop.hdfs.qjournal.MiniJournalCluster.<init>(MiniJournalCluster.java:121)
>  at 
> org.apache.hadoop.hdfs.qjournal.MiniJournalCluster.<init>(MiniJournalCluster.java:48)
>  at 
> org.apache.hadoop.hdfs.qjournal.MiniJournalCluster$Builder.build(MiniJournalCluster.java:80)
>  at 
> org.apache.hadoop.hdfs.qjournal.TestMiniJournalCluster.testStartStop(TestMiniJournalCluster.java:38)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363) at 
> org.junit.runner.JUnitCore.run(JUnitCore.java:137) at 
> org.junit.runner.JUnitCore.run(JUnitCore.java:115) at 
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:40)
>  at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) 
> at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) 
> at java.util.Iterator.forEachRemaining(Iterator.java:116) at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) 
> at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) 
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
>  at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at 
> java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) at 
> org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80)
>  at 
> org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:71) 
> at 
> org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:229)
>  at 
> org.junit.platform.launcher.core.DefaultLauncher.lambda$execute$6(DefaultLauncher.java:197)
>  at 
> org.junit.platform.launcher.core.DefaultLauncher.withInterceptedStreams(DefaultLauncher.java:211)
>  at 
> org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:191)
>  at 
> org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:128)
>  at 
> com.intellij.junit5.JUnit5IdeaTestRunner.startRunnerWithArgs(JUnit5IdeaTestRunner.java:71)
>  at 
> 

[jira] [Resolved] (HDFS-16149) Improve the parameter annotation in FairCallQueue#priorityLevels

2021-08-03 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16149.

Fix Version/s: 3.4.0
   Resolution: Fixed

> Improve the parameter annotation in FairCallQueue#priorityLevels
> 
>
> Key: HDFS-16149
> URL: https://issues.apache.org/jira/browse/HDFS-16149
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The parameter description of FairCallQueue#priorityLevels is missing in 
> FairCallQueue.
>/**
> * Create a FairCallQueue.
> * @param capacity the total size of all sub-queues
> * @param ns the prefix to use for configuration
> * @param capacityWeights the weights array for capacity allocation
> * among subqueues
> * @param conf the configuration to read from
> * Notes: Each sub-queue has a capacity of `capacity / numSubqueues`.
> * The first or the highest priority sub-queue has an excess capacity
> * of `capacity % numSubqueues`
> */
>public FairCallQueue(int priorityLevels, int capacity, String ns,
>int[] capacityWeights, Configuration conf) {
> We should complete it.
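One possible completed javadoc (a sketch):

{code:java}
/**
 * Create a FairCallQueue.
 * @param priorityLevels the number of priority levels, i.e. the number of
 *sub-queues to create
 * @param capacity the total size of all sub-queues
 * @param ns the prefix to use for configuration
 * @param capacityWeights the weights array for capacity allocation
 * among subqueues
 * @param conf the configuration to read from
 * Notes: Each sub-queue has a capacity of `capacity / numSubqueues`.
 * The first or the highest priority sub-queue has an excess capacity
 * of `capacity % numSubqueues`
 */
{code}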



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-14529) SetTimes to throw FileNotFoundException if inode is not found

2021-07-30 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-14529.

Fix Version/s: 3.4.0
   Resolution: Fixed

> SetTimes to throw FileNotFoundException if inode is not found
> -
>
> Key: HDFS-14529
> URL: https://issues.apache.org/jira/browse/HDFS-14529
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Harshakiran Reddy
>Assignee: Wei-Chiu Chuang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {noformat}
> 2019-05-31 15:15:42,397 ERROR namenode.FSEditLogLoader: Encountered exception 
> on operation TimesOp [length=0, 
> path=/testLoadSpace/dir0/dir0/dir0/dir2/_file_9096763, mtime=-1, 
> atime=1559294343288, opCode=OP_TIMES, txid=18927893]
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:490)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:711)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:286)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:181)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:924)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:771)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:331)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1105)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:726)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.doRecovery(NameNode.java:1558)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1640)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1725){noformat}
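A minimal sketch of the fix inside FSDirAttrOp#unprotectedSetTimes (the null
check is the essential part):

{code:java}
// Sketch: the inode may no longer exist when an edit is replayed; fail with
// a checked FileNotFoundException instead of dereferencing a null inode.
final INode inode = iip.getLastINode();
if (inode == null) {
  throw new FileNotFoundException(
      "File/Directory " + iip.getPath() + " does not exist.");
}
{code}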



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15936) Solve BlockSender#sendPacket() does not record SocketTimeout exception

2021-07-29 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15936.

Fix Version/s: 3.3.2
   3.4.0
   Resolution: Fixed

Thanks!

> Solve BlockSender#sendPacket() does not record SocketTimeout exception
> --
>
> Key: HDFS-15936
> URL: https://issues.apache.org/jira/browse/HDFS-15936
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> In BlockSender#sendPacket(), if a SocketTimeoutException occurs, no 
> information is recorded:
> try {
>..
> } catch (IOException e) {
>if (e instanceof SocketTimeoutException) {
>  /*
>   * writing to client timed out. This happens if the client reads
>   * part of a block and then decides not to read the rest (but leaves
>   * the socket open).
>   *
>   * Reporting of this case is done in DataXceiver#run
>   */
>}
> }
> Nothing is logged here, which makes troubleshooting difficult.
> We should add a warning-level log line.
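A minimal sketch of the suggested change (the write call is illustrative):

{code:java}
try {
  out.write(buf, headerOff, dataLen);   // illustrative: the packet write
} catch (IOException e) {
  if (e instanceof SocketTimeoutException) {
    // Sketch: record the timeout so a stalled client can be traced from the
    // DataNode log; reporting still happens in DataXceiver#run.
    LOG.warn("Failed to send packet: write to client timed out.", e);
  }
  throw e;
}
{code}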



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16137) Improve the comments related to FairCallQueue#queues

2021-07-28 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16137.

Fix Version/s: 3.4.0
   Resolution: Fixed

> Improve the comments related to FairCallQueue#queues
> 
>
> Key: HDFS-16137
> URL: https://issues.apache.org/jira/browse/HDFS-16137
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ipc
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The comments on FairCallQueue#queues are too brief:
>/* The queues */
>private final ArrayList<BlockingQueue<E>> queues;
> They do not convey the meaning of FairCallQueue#queues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16111) Add a configuration to RoundRobinVolumeChoosingPolicy to avoid failed volumes at datanodes.

2021-07-27 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16111.

Fix Version/s: 3.4.0
   Resolution: Fixed

Thanks!

> Add a configuration to RoundRobinVolumeChoosingPolicy to avoid failed volumes 
> at datanodes.
> ---
>
> Key: HDFS-16111
> URL: https://issues.apache.org/jira/browse/HDFS-16111
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Zhihai Xu
>Assignee: Zhihai Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When we upgraded our hadoop cluster from hadoop 2.6.0 to hadoop 3.2.2, we got 
> failed volumes on a lot of datanodes, which caused some missing blocks at 
> that time. Although we later recovered all the missing blocks by symlinking 
> the path (dfs/dn/current) on each failed volume to a new directory and 
> copying all the data to the new directory, we missed our SLA and the upgrade 
> of our production cluster was delayed for several hours.
> When this issue happened, we saw a lot of these exceptions on the datanode 
> before the volume failed:
>  [DataXceiver for client  at /[XX.XX.XX.XX:XXX|http://10.104.103.159:33986/] 
> [Receiving block BP-XX-XX.XX.XX.XX-XX:blk_X_XXX]] 
> datanode.DataNode (BlockReceiver.java:(289)) - IOException in 
> BlockReceiver constructor :Possible disk error: Failed to create 
> /XXX/dfs/dn/current/BP-XX-XX.XX.XX.XX-X/tmp/blk_XX. Cause 
> is
> java.io.IOException: No space left on device
>         at java.io.UnixFileSystem.createFileExclusively(Native Method)
>         at java.io.File.createNewFile(File.java:1012)
>         at 
> org.apache.hadoop.hdfs.server.datanode.FileIoProvider.createFile(FileIoProvider.java:302)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createFileWithExistsCheck(DatanodeUtil.java:69)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createTmpFile(BlockPoolSlice.java:292)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createTmpFile(FsVolumeImpl.java:532)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createTemporary(FsVolumeImpl.java:1254)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1598)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:212)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1314)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:768)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:173)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:107)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:291)
>         at java.lang.Thread.run(Thread.java:748)
>  
> We found this issue happened for the following two reasons:
> First, the upgrade process added some extra disk usage on each disk volume 
> of the datanode:
> BlockPoolSliceStorage.doUpgrade 
> (https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceStorage.java#L445)
>  is the main upgrade function in the datanode, and it adds some extra 
> storage. The extra storage consists of new directories created in 
> /current//current, although all block data files and block meta data 
> files are hard-linked with /current//previous after the upgrade. Since a lot 
> of new directories are created, this uses some disk space on each disk 
> volume.
>  
> Second, there is a potential bug when picking a disk volume to write a new 
> block file (replica). By default, Hadoop uses RoundRobinVolumeChoosingPolicy, 
> whose disk-selection code only checks whether the available space on the 
> selected disk exceeds the size in bytes of the block file to store 
> (https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/RoundRobinVolumeChoosingPolicy.java#L86).
>  But creating a new block actually creates two files: one is the block file 
> blk_, the other is the block metadata file blk__.meta. This is the code 
> used when finalizing a block, where both the block file size and the meta 
> data file size are updated: 
> 
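A minimal sketch of a guard against the second point (the key name is
hypothetical):

{code:java}
// Sketch: require headroom beyond the block size, since finalizing a replica
// also writes the blk__.meta file next to the block file.
long reserved = conf.getLong(
    "dfs.datanode.round-robin-volume-choosing.reserved-bytes", 0);
if (volume.getAvailable() > blockSize + reserved) {
  return volume;   // enough space on this volume for both files
}
{code}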

[jira] [Created] (HDFS-16136) Handle all occurrence of InvalidEncryptionKeyException

2021-07-22 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created HDFS-16136:
--

 Summary: Handle all occurrence of InvalidEncryptionKeyException 
 Key: HDFS-16136
 URL: https://issues.apache.org/jira/browse/HDFS-16136
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: Wei-Chiu Chuang


After HDFS-10609, HDFS-11741, we still observe InvalidEncryptionKeyException 
errors that are not retried.

{noformat}
2021-07-12 11:10:58,795 ERROR datanode.DataNode 
(DataXceiver.java:writeBlock(863)) - 
DataNode{data=FSDataset{dirpath='[/grid/01/hadoop/hdfs/data, 
/grid/02/hadoop/hdfs/data, /grid/03/hadoop/hdfs/data, 
/grid/04/hadoop/hdfs/data, /grid/05/hadoop/hdfs/data, 
/grid/06/hadoop/hdfs/data, /grid/07/hadoop/hdfs/data, 
/grid/08/hadoop/hdfs/data, /grid/09/hadoop/hdfs/data, 
/grid/10/hadoop/hdfs/data, /grid/11/hadoop/hdfs/data, 
/grid/12/hadoop/hdfs/data, /grid/13/hadoop/hdfs/data, 
/grid/14/hadoop/hdfs/data, /grid/15/hadoop/hdfs/data, 
/grid/16/hadoop/hdfs/data, /grid/17/hadoop/hdfs/data, 
/grid/18/hadoop/hdfs/data, /grid/19/hadoop/hdfs/data, 
/grid/20/hadoop/hdfs/data, /grid/21/hadoop/hdfs/data, 
/grid/22/hadoop/hdfs/data]'}, 
localName='lxdmelcly-lxw01-p01-whw10289.oan:10019', 
datanodeUuid='70403b64-cb39-4b4a-ac6c-787ce7bdbe2c', 
xmitsInProgress=0}:Exception transfering block 
BP-1743446178-172.18.16.38-1537373339905:blk_2196991498_1131235321 to mirror 
172.18.16.33:10019
org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: 
Can't re-compute encryption key for nonce, since the required block key 
(keyID=-213389155) doesn't exist. Current key: 1804780309
at 
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:419)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:479)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:303)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:245)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:215)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:800)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:173)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:107)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:290)
at java.lang.Thread.run(Thread.java:745)
2021-07-12 11:10:58,796 ERROR datanode.DataNode (DataXceiver.java:run(321)) - 
xxx:10019:DataXceiver error processing WRITE_BLOCK operation  src: 
/172.18.16.8:41992 dst: /172.18.16.20:10019
org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: 
Can't re-compute encryption key for nonce, since the required block key 
(keyID=-213389155) doesn't exist. Current key: 1804780309
{noformat}

We should handle this exception whenever SaslDataTransferClient.socketSend() is 
invoked:

DataXceiver.writeBlock()
BlockDispatcher.moveBlock()
DataNode.run()
DataXceiver.replaceBlock()
StripedBlockWriter.init()

This issue isn't that obvious, because the existing HDFS fault tolerance 
mechanisms should mask the data encryption key error.
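A minimal sketch of the handling pattern at one call site (variable names are
illustrative; the real retry bookkeeping is more involved):

{code:java}
// Sketch: the block key has rolled, so retry the SASL handshake once with a
// freshly negotiated data encryption key before failing the operation.
IOStreamPair saslStreams;
try {
  saslStreams = saslClient.socketSend(
      sock, unbufOut, unbufIn, keyFactory, accessToken, targetDatanodeId);
} catch (InvalidEncryptionKeyException e) {
  LOG.info("Encryption key is stale; retrying with a refreshed key", e);
  saslStreams = saslClient.socketSend(
      sock, unbufOut, unbufIn, keyFactory, accessToken, targetDatanodeId);
}
{code}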



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15650) Make the socket timeout for computing checksum of striped blocks configurable

2021-07-15 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15650.

Fix Version/s: 3.3.2
   3.2.3
   3.4.0
   Resolution: Fixed

Thanks [~yhaya]! The PR is merged and cherry-picked to trunk, branch-3.3 and 
branch-3.2.

> Make the socket timeout for computing checksum of striped blocks configurable
> -
>
> Key: HDFS-15650
> URL: https://issues.apache.org/jira/browse/HDFS-15650
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, ec, erasure-coding
>Reporter: Yushi Hayasaka
>Assignee: Yushi Hayasaka
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When the DataNode fetches the checksums of EC internal blocks from another 
> DataNode to compute the checksum of a striped block, the socket timeout is 
> currently hard-coded; it should be configurable.
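A minimal sketch of the change (treat the exact key name as an assumption):

{code:java}
// Sketch: read the socket timeout for EC checksum reconstruction from the
// configuration instead of hard-coding it.
int ecChecksumSocketTimeoutMs = (int) conf.getTimeDuration(
    "dfs.checksum.ec.socket-timeout", 3000, TimeUnit.MILLISECONDS);
socket.setSoTimeout(ecChecksumSocketTimeoutMs);
{code}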



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16110) Remove unused method reportChecksumFailure in DFSClient

2021-07-05 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16110.

Fix Version/s: 3.4.0
   Resolution: Fixed

Thanks [~tomscut] and [~tasanuma] for the review!

> Remove unused method reportChecksumFailure in DFSClient
> ---
>
> Key: HDFS-16110
> URL: https://issues.apache.org/jira/browse/HDFS-16110
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Remove the unused method reportChecksumFailure in DFSClient and fix some 
> code style issues along the way.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-11528) NameNode load EditRecords throws NPE

2021-07-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-11528.

Resolution: Duplicate

I'll resolve this one and use HDFS-14529 for further discussion.

> NameNode load EditRecords throws NPE
> 
>
> Key: HDFS-11528
> URL: https://issues.apache.org/jira/browse/HDFS-11528
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Shangwen Tang
>Priority: Major
>
> This is my log:
> {noformat}
> [2017-03-13T19:18:02.187+08:00] [ERROR] 
> server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java 242) 
> [main] : Encountered exception on operation TimesOp [length=0, 
> path=/user/spark/log/application_1487848228144_0004, mtime=-1, 
> atime=1489392253959, opCode=OP_TIMES, txid=26215]
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:473)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:299)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:629)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:234)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:837)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:692)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:294)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:980)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:686)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:589)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:649)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:816)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:800)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1498)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1564)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16086) Add volume information to datanode log for tracing

2021-06-30 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16086.

Fix Version/s: 3.4.0
   Resolution: Fixed

> Add volume information to datanode log for tracing
> --
>
> Key: HDFS-16086
> URL: https://issues.apache.org/jira/browse/HDFS-16086
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: CreatingRbw.jpg, Received.jpg
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> To keep track of which volume a block is stored on, we can add the volume 
> information to the datanode log (a hedged sketch follows below).
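> A hedged sketch of what such a log line could look like (the class and 
> method names below are illustrative, not the committed change):
> {code:java}
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
> 
> class VolumeLoggingSketch {
>   private static final Logger LOG =
>       LoggerFactory.getLogger(VolumeLoggingSketch.class);
> 
>   void onBlockReceived(String blockId, String volumePath) {
>     // Including the volume lets a block be traced to the disk it landed on.
>     LOG.info("Received block {} on volume {}", blockId, volumePath);
>   }
> }
> {code}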



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16096) Delete useless method DirectoryWithQuotaFeature#setQuota

2021-06-30 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16096.

Resolution: Fixed

Merged the PR.

Thanks [~zhuxiangyi] and [~vjasani] for the review.

> Delete useless method DirectoryWithQuotaFeature#setQuota
> 
>
> Key: HDFS-16096
> URL: https://issues.apache.org/jira/browse/HDFS-16096
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Delete useless method DirectoryWithQuotaFeature#setQuota.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16089) EC: Add metric EcReconstructionValidateTimeMillis for StripedBlockReconstructor

2021-06-29 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16089.

Fix Version/s: 3.3.2
   3.4.0
   Resolution: Fixed

> EC: Add metric EcReconstructionValidateTimeMillis for 
> StripedBlockReconstructor
> ---
>
> Key: HDFS-16089
> URL: https://issues.apache.org/jira/browse/HDFS-16089
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Add the metric EcReconstructionValidateTimeMillis for 
> StripedBlockReconstructor, so that we can measure the time spent validating 
> reconstructed striped blocks (a hedged sketch follows below).
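> A minimal sketch of the timing pattern, with a counter named after the 
> proposed metric (the surrounding class and wiring are illustrative):
> {code:java}
> import java.util.concurrent.atomic.AtomicLong;
> 
> class EcValidateTimingSketch {
>   // Accumulates the proposed EcReconstructionValidateTimeMillis metric.
>   private final AtomicLong ecReconstructionValidateTimeMillis = new AtomicLong();
> 
>   void validateWithTiming(Runnable validate) {
>     long begin = System.currentTimeMillis();
>     validate.run();  // the actual block validation step
>     ecReconstructionValidateTimeMillis.addAndGet(
>         System.currentTimeMillis() - begin);
>   }
> }
> {code}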



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16074) Remove an expensive debug string concatenation

2021-06-16 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16074.

Fix Version/s: 3.3.2
   3.2.3
   3.4.0
   Resolution: Fixed

Thanks a lot for the review!

> Remove an expensive debug string concatenation
> --
>
> Key: HDFS-16074
> URL: https://issues.apache.org/jira/browse/HDFS-16074
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: Screen Shot 2021-06-16 at 2.32.29 PM.png, Screen Shot 
> 2021-06-17 at 10.32.21 AM.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> While running a YCSB load query, we found that we do an expensive string 
> concatenation on the write path in DFSOutputStream.writeChunkPrepare(). 
> Nearly 25% of HDFS client write CPU time is spent here. It is unnecessary 
> because the message is only meant for debug logging, so let's remove it.
> {code}
>  if (currentPacket == null) {
>   currentPacket = createPacket(packetSize, chunksPerPacket, getStreamer()
>   .getBytesCurBlock(), getStreamer().getAndIncCurrentSeqno(), false);
>   DFSClient.LOG.debug("WriteChunk allocating new packet seqno={},"
>   + " src={}, packetSize={}, chunksPerPacket={}, 
> bytesCurBlock={}",
>   currentPacket.getSeqno(), src, packetSize, chunksPerPacket,
>   getStreamer().getBytesCurBlock() + ", " + this); < here
> }
> {code}
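> A sketch of the fix shape (not necessarily the exact commit): pass every 
> value as a logging parameter, so SLF4J only builds the string when DEBUG is 
> actually enabled instead of concatenating on every write:
> {code}
>   DFSClient.LOG.debug("WriteChunk allocating new packet seqno={},"
>           + " src={}, packetSize={}, chunksPerPacket={}, bytesCurBlock={}, {}",
>       currentPacket.getSeqno(), src, packetSize, chunksPerPacket,
>       getStreamer().getBytesCurBlock(), this);
> {code}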



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16074) Remove an expensive debug string concatenation

2021-06-16 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created HDFS-16074:
--

 Summary: Remove an expensive debug string concatenation
 Key: HDFS-16074
 URL: https://issues.apache.org/jira/browse/HDFS-16074
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.4.0
Reporter: Wei-Chiu Chuang
 Attachments: Screen Shot 2021-06-16 at 2.32.29 PM.png

While running a YCSB load query, we found that we do an expensive string 
concatenation on the write path in DFSOutputStream.writeChunkPrepare(). 

Nearly 25% of HDFS client write CPU time is spent here. It is unnecessary 
because the message is only meant for debug logging, so let's remove it.





--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist

2021-05-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15790.

Fix Version/s: 3.4.0
   3.3.1
   Resolution: Fixed

> Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
> --
>
> Key: HDFS-15790
> URL: https://issues.apache.org/jira/browse/HDFS-15790
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available, release-blocker
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Changing from Protobuf 2 to Protobuf 3 broke some things in the Apache Hive 
> project. This was not a good thing to do between minor versions with regard 
> to backwards compatibility for downstream projects.
> Additionally, these two frameworks are not drop-in replacements; they have 
> some differences. Also, Protobuf 2 is not deprecated, so let us have both 
> protocols available at the same time. In Hadoop 4.x, Protobuf 2 support can 
> be dropped.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16027) HDFS-15245 breaks source code compatibility between 3.3.0 and 3.3.1.

2021-05-18 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-16027.

Fix Version/s: 3.3.1
   Resolution: Fixed

Thanks [~ayushtkn]!

>  HDFS-15245 breaks source code compatibility between 3.3.0 and 3.3.1.
> -
>
> Key: HDFS-16027
> URL: https://issues.apache.org/jira/browse/HDFS-16027
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: journal-node, ui
>Affects Versions: 3.3.1
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Ok, this one is on me.
> JournalNodeMXBean is a Public, Evolving interface.
> But HDFS-15245 was cherrypicked to branch-3.3 which breaks source 
> compatibility between 3.3.0 and 3.3.1 by adding the following three methods:
> {noformat}
> /**
>* Get host and port of JournalNode.
>*
>* @return colon separated host and port.
>*/
>   String getHostAndPort();
>   /**
>* Get list of the clusters of JournalNode's journals
>* as one JournalNode may support multiple clusters.
>*
>* @return list of clusters.
>*/
>   List<String> getClusterIds();
>   /**
>* Gets the version of Hadoop.
>*
>* @return the version of Hadoop.
>*/
>   String getVersion();
> {noformat}
> api checker error:
> {quote}
> Recompilation of a client program may be terminated with the message: a 
> client class C is not abstract and does not override abstract method 
> getClusterIds ( ) in JournalNodeMXBean.
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16027) HDFS-15245 breaks source code compatibility between 3.3.0 and 3.3.1.

2021-05-17 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created HDFS-16027:
--

 Summary:  HDFS-15245 breaks source code compatibility between 
3.3.0 and 3.3.1.
 Key: HDFS-16027
 URL: https://issues.apache.org/jira/browse/HDFS-16027
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: journal-node, ui
Affects Versions: 3.3.1
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang


Ok, this one is on me.

JournalNodeMXBean is a Public, Evolving interface.

But HDFS-15245 was cherrypicked to branch-3.3 which breaks source compatibility 
between 3.3.0 and 3.3.1 by adding the following three methods:
{noformat}
/**
   * Get host and port of JournalNode.
   *
   * @return colon separated host and port.
   */
  String getHostAndPort();

  /**
   * Get list of the clusters of JournalNode's journals
   * as one JournalNode may support multiple clusters.
   *
   * @return list of clusters.
   */
  List<String> getClusterIds();

  /**
   * Gets the version of Hadoop.
   *
   * @return the version of Hadoop.
   */
  String getVersion();
{noformat}





--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-13934) Multipart uploaders to be created through API call to FileSystem/FileContext, not service loader

2021-05-16 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-13934.

Resolution: Fixed

> Multipart uploaders to be created through API call to FileSystem/FileContext, 
> not service loader
> 
>
> Key: HDFS-13934
> URL: https://issues.apache.org/jira/browse/HDFS-13934
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, fs/s3, hdfs
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Fix For: 3.3.1
>
>
> The multipart uploaders are created via service loaders. This is troublesome:
> # HADOOP-12636, HADOOP-13323, HADOOP-13625 highlight how the load process 
> forces the transient loading of dependencies. If a dependent class cannot be 
> loaded (e.g. aws-sdk is not on the classpath), that service won't load. 
> Without error handling around the load process, this stops any uploader from 
> loading. Even with that error handling, the cost of that load, especially 
> with reshaded dependencies, hurts performance (HADOOP-13138).
> # It makes wrapping the load with any filter impossible, and stops 
> transitive binding through viewFS, mocking, etc.
> # It complicates security in a kerberized world. If you have an FS instance 
> of user A, then you should be able to create an MPU instance with that 
> user's permissions. Currently, if a service were to try to create one, you'd 
> be looking at doAs() games around the service loading, and a more complex 
> bind process.
> Proposed:
> # remove the service loader mechanism entirely
> # add a createMultipartUploader(path) call to FileSystem and FileContext, 
> which will create an uploader bound to the current FS, with its permissions, 
> DTs, etc. (a usage sketch follows below)
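> A hedged usage sketch of the proposed call (the method name follows the 
> proposal text; the committed signature and return type may differ):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> 
> public class MpuSketch {
>   public static void main(String[] args) throws Exception {
>     FileSystem fs = FileSystem.get(new Configuration());
>     // Created through the FS instance itself, so it inherits that user's
>     // permissions and delegation tokens; no service loader involved.
>     Object uploader = fs.createMultipartUploader(new Path("/tmp/upload"));
>     System.out.println(uploader);
>   }
> }
> {code}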



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15995) Rework HDFS-15624 Fix the SetQuotaByStorageTypeOp problem after updating hadoop

2021-05-03 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15995.

Fix Version/s: 3.4.0
   Resolution: Done

> Rework HDFS-15624 Fix the SetQuotaByStorageTypeOp problem after updating 
> hadoop
> ---
>
> Key: HDFS-15995
> URL: https://issues.apache.org/jira/browse/HDFS-15995
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Blocker
> Fix For: 3.4.0
>
>
> As discussed in the mailing list, 
> {quote}
> In HDFS-15624 (fix the function of setting quota by storage type), a new 
> layout version was added:
> NVDIMM_SUPPORT(-66, -61, "Support NVDIMM storage type");
> This was added for 3.4.0 (trunk)
> However, there's another jira
> HDFS-15566 (NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0)
> SNAPSHOT_MODIFICATION_TIME(-66, -61, "Support modification time for 
> snapshot");
> where Brahma wanted to add a new layout version in branch-3.3 (3.3.1). The 
> patch got stalled awhile ago and I'm trying to commit it in preparation of 
> 3.3.1 release.
> However, both new layout versions conflict because they intend to use the 
> same new version id. We can't release 3.3.1 without HDFS-15566, but we can't 
> use layout id -66 because of HDFS-15624.
> I propose:
> revert HDFS-15624 (NVDIMM_SUPPORT),
> commit HDFS-15566 (SNAPSHOT_MODIFICATION_TIME)
> re-work on HDFS-15624 but with layout version id -67
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15624) Fix the SetQuotaByStorageTypeOp problem after updating hadoop

2021-04-29 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15624.

Resolution: Fixed

The updated patch was committed. Thanks Ayush for the help!

>  Fix the SetQuotaByStorageTypeOp problem after updating hadoop 
> ---
>
> Key: HDFS-15624
> URL: https://issues.apache.org/jira/browse/HDFS-15624
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: YaYun Wang
>Assignee: huangtianhua
>Priority: Major
>  Labels: pull-request-available, release-blocker
> Fix For: 3.4.0
>
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> HDFS-15025 adds a new storage type, NVDIMM, which changes the ordinal() 
> values of the StorageType enum. Setting a quota by storage type depends on 
> ordinal(), so quota settings may become invalid after an upgrade (an 
> illustration follows below).
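> An illustrative demo of the hazard (the enum contents are examples, not the 
> real StorageType definition):
> {code:java}
> public class OrdinalHazard {
>   enum Old { RAM_DISK, SSD, DISK, ARCHIVE }
>   enum New { RAM_DISK, SSD, DISK, NVDIMM, ARCHIVE }  // constant inserted mid-list
> 
>   public static void main(String[] args) {
>     int persisted = Old.ARCHIVE.ordinal();        // quota recorded as ordinal 3
>     System.out.println(New.values()[persisted]);  // prints NVDIMM, not ARCHIVE
>   }
> }
> {code}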



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16002) TestJournalNodeRespectsBindHostKeys#testHttpsBindHostKey very flaky

2021-04-29 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created HDFS-16002:
--

 Summary: TestJournalNodeRespectsBindHostKeys#testHttpsBindHostKey 
very flaky
 Key: HDFS-16002
 URL: https://issues.apache.org/jira/browse/HDFS-16002
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Wei-Chiu Chuang


This test appears to be failing a lot lately. I suspect it has to do with the 
new change to support reloading HttpServer2 certificates, but I've not looked 
into it.
{noformat}
Stacktrace
java.lang.NullPointerException
at sun.nio.fs.UnixPath.normalizeAndCheck(UnixPath.java:77)
at sun.nio.fs.UnixPath.<init>(UnixPath.java:71)
at sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:281)
at java.nio.file.Paths.get(Paths.java:84)
at 
org.apache.hadoop.http.HttpServer2$Builder.makeConfigurationChangeMonitor(HttpServer2.java:609)
at 
org.apache.hadoop.http.HttpServer2$Builder.createHttpsChannelConnector(HttpServer2.java:592)
at 
org.apache.hadoop.http.HttpServer2$Builder.build(HttpServer2.java:518)
at 
org.apache.hadoop.hdfs.qjournal.server.JournalNodeHttpServer.start(JournalNodeHttpServer.java:81)
at 
org.apache.hadoop.hdfs.qjournal.server.JournalNode.start(JournalNode.java:238)
at 
org.apache.hadoop.hdfs.qjournal.MiniJournalCluster.<init>(MiniJournalCluster.java:120)
at 
org.apache.hadoop.hdfs.qjournal.MiniJournalCluster.<init>(MiniJournalCluster.java:47)
at 
org.apache.hadoop.hdfs.qjournal.MiniJournalCluster$Builder.build(MiniJournalCluster.java:79)
at 
org.apache.hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys.testHttpsBindHostKey(TestJournalNodeRespectsBindHostKeys.java:180)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15995) Rework HDFS-15624 Fix the SetQuotaByStorageTypeOp problem after updating hadoop

2021-04-23 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created HDFS-15995:
--

 Summary: Rework HDFS-15624 Fix the SetQuotaByStorageTypeOp problem 
after updating hadoop
 Key: HDFS-15995
 URL: https://issues.apache.org/jira/browse/HDFS-15995
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.4.0
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang


As discussed in the mailing list, 

{quote}
In HDFS-15624 (fix the function of setting quota by storage type), a new layout 
version was added:
NVDIMM_SUPPORT(-66, -61, "Support NVDIMM storage type");
This was added for 3.4.0 (trunk)

However, there's another jira
HDFS-15566 (NN restart fails after RollingUpgrade from 3.1.3/3.2.1 to 3.3.0)
SNAPSHOT_MODIFICATION_TIME(-66, -61, "Support modification time for snapshot");

where Brahma wanted to add a new layout version in branch-3.3 (3.3.1). The 
patch got stalled awhile ago and I'm trying to commit it in preparation of 
3.3.1 release.

However, both new layout versions conflict because they intend to use the same 
new version id. We can't release 3.3.1 without HDFS-15566, but we can't use 
layout id -66 because of HDFS-15624.

I propose:
revert HDFS-15624 (NVDIMM_SUPPORT),
commit HDFS-15566 (SNAPSHOT_MODIFICATION_TIME)
re-work on HDFS-15624 but with layout version id -67
{quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-15624) Fix the SetQuotaByStorageTypeOp problem after updating hadoop

2021-04-22 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reopened HDFS-15624:


>  Fix the SetQuotaByStorageTypeOp problem after updating hadoop 
> ---
>
> Key: HDFS-15624
> URL: https://issues.apache.org/jira/browse/HDFS-15624
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: YaYun Wang
>Assignee: huangtianhua
>Priority: Major
>  Labels: pull-request-available, release-blocker
> Fix For: 3.4.0
>
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> HDFS-15025 adds a new storage type, NVDIMM, which changes the ordinal() 
> values of the StorageType enum. Setting a quota by storage type depends on 
> ordinal(), so quota settings may become invalid after an upgrade.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15734) [READ] DirectoryScanner#scan need not check StorageType.PROVIDED

2021-02-22 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15734.

Fix Version/s: 3.4.0
   Resolution: Fixed

> [READ] DirectoryScanner#scan need not check StorageType.PROVIDED
> 
>
> Key: HDFS-15734
> URL: https://issues.apache.org/jira/browse/HDFS-15734
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Yuxuan Wang
>Assignee: Yuxuan Wang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Since https://issues.apache.org/jira/browse/HDFS-12777, there is no PROVIDED 
> storage in the volume report.
> We don't need to check it in DirectoryScanner#scan.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15826) Solve the problem of incorrect progress of delegation tokens when loading FsImage

2021-02-21 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15826.

Fix Version/s: 3.2.3
   3.1.5
   3.4.0
   3.3.1
   Resolution: Fixed

Thanks! 

> Solve the problem of incorrect progress of delegation tokens when loading 
> FsImage
> -
>
> Key: HDFS-15826
> URL: https://issues.apache.org/jira/browse/HDFS-15826
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: 2.jpg, in_ progress.jpg
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When loading the FsImage, if delegation token information is included, the 
> progress bar on the UI shows 100%. However, the delegation token information 
> is still being processed at this time, which is incorrect.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15801) Backport HDFS-14582 to branch-2.10 (Failed to start DN with ArithmeticException when NULL checksum used)

2021-02-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15801.

Fix Version/s: 2.10.2
   Resolution: Fixed

Thanks. This is merged.

> Backport HDFS-14582 to branch-2.10 (Failed to start DN with 
> ArithmeticException when NULL checksum used)
> 
>
> Key: HDFS-15801
> URL: https://issues.apache.org/jira/browse/HDFS-15801
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Janus Chow
>Assignee: Janus Chow
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.10.2
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In HDFS-14582, the error message is clearer, as follows:
> {code:java}
> Caused by: java.lang.ArithmeticException: / by zero
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.validateIntegrityAndSetLength(BlockPoolSlice.java:823)
> at 
> {code}
> But in branch-2.10.1, the exception message is omitted as follows:
> {code:java}
> 2021-01-29 14:20:30,694 INFO  impl.FsDatasetImpl (FsVolumeList.java:run(204)) 
> - Caught exception while adding replicas from /mnt/disk/0/hdfs/data/current. 
> Will throw later.
> java.io.IOException: Failed to start sub tasks to add replica in replica map 
> :java.lang.ArithmeticExceptionjava.io.IOException: Failed to start sub tasks 
> to add replica in replica map :java.lang.ArithmeticException at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.getVolumeMap(BlockPoolSlice.java:434)
>  at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getVolumeMap(FsVolumeImpl.java:930)
>  at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$1.run(FsVolumeList.java:196)
> {code}
> The specific error message is omitted, making it harder to find the root 
> cause (a hedged sketch of the improvement follows below).
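> A hedged sketch of the improvement's shape (names are illustrative): chain 
> the cause and keep its text, so a detail like "/ by zero" is not swallowed:
> {code:java}
> import java.io.IOException;
> 
> class CauseChainingSketch {
>   static void addReplicas(Runnable task) throws IOException {
>     try {
>       task.run();  // e.g. the per-volume replica scan
>     } catch (RuntimeException e) {
>       // Keep the cause's message and chain the cause itself.
>       throw new IOException(
>           "Failed to add replicas to the replica map: " + e, e);
>     }
>   }
> }
> {code}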



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15791) Possible Resource Leak in FSImageFormatProtobuf

2021-02-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15791.

Fix Version/s: 3.4.0
   Resolution: Fixed

> Possible Resource Leak in FSImageFormatProtobuf
> ---
>
> Key: HDFS-15791
> URL: https://issues.apache.org/jira/browse/HDFS-15791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Narges Shadab
>Assignee: Narges Shadab
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> We noticed a possible resource leak 
> [here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java#L271].
>  If an I/O error occurs at line 
> [273|https://github.com/apache/hadoop/blob/06a5d3437f68546207f18d23fe527895920c756a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java#L273]
>  or 
> [277|https://github.com/apache/hadoop/blob/06a5d3437f68546207f18d23fe527895920c756a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java#L277],
>  {{fin}} remains open since the exception isn't caught locally, and there is 
> no way for any caller to close the FileInputStream (a try-with-resources 
> sketch follows below).
> I'll submit a pull request to fix it.
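> A minimal sketch of the fix shape, assuming try-with-resources (the method 
> name is illustrative):
> {code:java}
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.IOException;
> 
> class FsImageLoadSketch {
>   static void load(File imageFile) throws IOException {
>     // try-with-resources closes fin even when parsing throws, so no
>     // caller is left holding an unclosable stream.
>     try (FileInputStream fin = new FileInputStream(imageFile)) {
>       fin.read();  // placeholder for seeking/parsing the FSImage sections
>     }
>   }
> }
> {code}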



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15777) Hadoop

2021-01-17 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15777.

Resolution: Invalid

> Hadoop
> --
>
> Key: HDFS-15777
> URL: https://issues.apache.org/jira/browse/HDFS-15777
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Pushpalatha S K
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15719) [Hadoop 3] Both NameNodes can crash simultaneously due to the short JN socket timeout

2021-01-04 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15719.

Fix Version/s: 3.2.3
   3.1.5
   3.4.0
   3.3.1
   Resolution: Fixed

> [Hadoop 3] Both NameNodes can crash simultaneously due to the short JN socket 
> timeout
> -
>
> Key: HDFS-15719
> URL: https://issues.apache.org/jira/browse/HDFS-15719
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> After Hadoop 3, we migrated Jetty 6 to Jetty 9. It was implemented in 
> HADOOP-10075.
> However, HADOOP-10075 erroneously set the HttpServer2 socket idle timeout too 
> low.
> We replaced SelectChannelConnector.setLowResourceMaxIdleTime() with 
> ServerConnector.setIdleTimeout() but they aren't the same.
> Essentially, the HttpServer2's idle timeout was the default timeout set by 
> Jetty 6, which is 200 seconds. After Hadoop 3, the idle timeout is set to 10 
> seconds, which is unreasonable for JN. If NameNodes try to download a big 
> edit log from JournalNodes (say a few hundred MB), it is likely to exceed 10 
> seconds. When it happens, both NNs crash and there's no way to work around 
> it unless you apply the patch in HADOOP-15696 to add a config switch for the 
> idle timeout. Fortunately, it doesn't happen a lot.
> Propose: bump the idle timeout default to 200 seconds to match the behavior 
> in Jetty 6. (Jetty 9 reduces the default idle timeout to 30 seconds, which is 
> not suitable for JN)
> Other things to consider:
> 1. fsck servlet? (somehow I suspect this is related to the socket timeout 
> reported in HDFS-7175)
> 2. webhdfs, httpfs? --> we've also received reports that webhdfs can time 
> out, so having a longer timeout makes sense here.
> 3. kms? will the longer timeout cause more lingering sockets?
> Thanks [~zhenshan.wen] for the discussion.
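> A hedged sketch of where the proposed default lands in Jetty 9 terms (the 
> port and wiring below are illustrative, not HttpServer2's actual setup):
> {code:java}
> import org.eclipse.jetty.server.Server;
> import org.eclipse.jetty.server.ServerConnector;
> 
> public class IdleTimeoutSketch {
>   public static void main(String[] args) throws Exception {
>     Server server = new Server();
>     ServerConnector connector = new ServerConnector(server);
>     // Proposed default: 200 s, matching the old Jetty 6 behavior. Jetty 9's
>     // own default (30 s) is too short for JournalNode edit log downloads.
>     connector.setIdleTimeout(200_000L);  // milliseconds
>     connector.setPort(8480);  // illustrative port
>     server.addConnector(connector);
>     server.start();
>   }
> }
> {code}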



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15720) namenode audit async logger should add some log4j config

2020-12-10 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15720.

Fix Version/s: 3.2.3
   3.1.5
   3.3.1
   Resolution: Fixed

Thanks. This is merged and cherrypicked to branch-3.3 ~ branch-3.1.

> namenode audit async logger should add some log4j config
> 
>
> Key: HDFS-15720
> URL: https://issues.apache.org/jira/browse/HDFS-15720
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.3.0
> Environment: hadoop 3.3.0
>Reporter: Max  Xie
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.1.5, 3.2.3
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The Hadoop project uses log4j 1.2.x, so we can't configure some logger 
> properties in the log4j.properties file, for example the AsyncAppender 
> BufferSize and Blocking properties; see 
> https://logging.apache.org/log4j/1.2/apidocs/index.html.
> The NameNode should add some log4j configuration for the async audit logger, 
> to make it easier to adjust log4j usage and tune audit log output 
> performance.
> The new configuration is as follows (a hedged sketch of how these values map 
> onto the AsyncAppender follows below):
> dfs.namenode.audit.log.async.blocking false
> dfs.namenode.audit.log.async.buffer.size 128
>  
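> A minimal, hedged sketch of applying these two values to log4j 1.2's 
> AsyncAppender programmatically (the surrounding wiring is illustrative):
> {code:java}
> import org.apache.log4j.AsyncAppender;
> import org.apache.log4j.ConsoleAppender;
> import org.apache.log4j.Logger;
> import org.apache.log4j.SimpleLayout;
> 
> public class AuditAsyncSketch {
>   public static void main(String[] args) {
>     AsyncAppender async = new AsyncAppender();
>     // Values of dfs.namenode.audit.log.async.blocking and
>     // dfs.namenode.audit.log.async.buffer.size from the proposal above.
>     async.setBlocking(false);
>     async.setBufferSize(128);
>     async.addAppender(new ConsoleAppender(new SimpleLayout()));
>     Logger audit = Logger.getLogger("audit");
>     audit.addAppender(async);
>     audit.info("sample audit event");
>   }
> }
> {code}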



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15719) [Hadoop 3] Both NameNodes can crash simultaneously due to the short JN socket timeout

2020-12-08 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created HDFS-15719:
--

 Summary: [Hadoop 3] Both NameNodes can crash simultaneously due to 
the short JN socket timeout
 Key: HDFS-15719
 URL: https://issues.apache.org/jira/browse/HDFS-15719
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Wei-Chiu Chuang


After Hadoop 3, we migrated Jetty 6 to Jetty 9. It was implemented in 
HADOOP-10075.

However, HADOOP-10075 erroneously set the HttpServer2 socket idle timeout too 
low.
We replaced SelectChannelConnector.setLowResourceMaxIdleTime() with 
ServerConnector.setIdleTimeout() but they aren't the same.

Essentially, the HttpServer2's idle timeout was the default timeout set by 
Jetty 6, which is 200 seconds. After Hadoop 3, the idle timeout is set to 10 
seconds, which is unreasonable for JN. If NameNodes try to download a big edit 
log from JournalNodes (say a few hundred MB), it is likely to exceed 10 
seconds. When it happens, both NNs crash and there's no way to work around it 
unless you apply the patch in HADOOP-15696 to add a config switch for the idle 
timeout. Fortunately, it doesn't happen a lot.

Propose: bump the idle timeout default to 200 seconds to match the behavior in 
Jetty 6. (Jetty 9 reduces the default idle timeout to 30 seconds, which is not 
suitable for JN)

Other things to consider:
1. fsck servlet? (somehow I suspect this is related to the socket timeout 
reported in HDFS-7175)
2. webhdfs, httpfs? --> we've also received reports that webhdfs can time 
out, so having a longer timeout makes sense here.
3. kms? will the longer timeout cause more lingering sockets?

Thanks [~zhenshan.wen] for the discussion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15709) EC: Socket file descriptor leak in StripedBlockChecksumReconstructor

2020-12-07 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15709.

Fix Version/s: 3.2.3
   3.1.5
   3.4.0
   3.3.1
   Resolution: Fixed

Thanks! I merged the PR and cherrypicked the fix to branch-3.* branches.

> EC: Socket file descriptor leak in StripedBlockChecksumReconstructor
> 
>
> Key: HDFS-15709
> URL: https://issues.apache.org/jira/browse/HDFS-15709
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ec, erasure-coding
>Reporter: Yushi Hayasaka
>Assignee: Yushi Hayasaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We found a socket file descriptor leak when we tried to get the checksum of 
> an EC file while reconstruction happened during the operation.
> The cause of the leak seems to be that StripedBlockChecksumReconstructor 
> does not close its StripedReader. Once the reader is closed, the CLOSE_WAIT 
> connections are gone (a sketch of the fix shape follows below).
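> A minimal, hedged sketch of the fix shape (the reader type is generalized 
> here to java.io.Closeable; the real class is StripedReader):
> {code:java}
> import java.io.Closeable;
> import java.io.IOException;
> 
> class CloseReaderSketch {
>   static void reconstructAndChecksum(Closeable reader) throws IOException {
>     try {
>       // ... read from the source DataNodes, decode, and checksum ...
>     } finally {
>       // Without this close, the DN connections linger in CLOSE_WAIT and
>       // their file descriptors leak.
>       reader.close();
>     }
>   }
> }
> {code}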



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15706) HttpFS: Log more information on request failures

2020-12-03 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15706.

Fix Version/s: 3.2.3
   3.1.5
   3.4.0
   3.3.1
   Resolution: Fixed

Thanks [~ahussein]

> HttpFS: Log more information on request failures
> 
>
> Key: HDFS-15706
> URL: https://issues.apache.org/jira/browse/HDFS-15706
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [~kihwal] reported that the exception provider does not log anything for 
> requests failing with 403 or 500. This has made debugging extremely 
> difficult. As more customers are using the HttpFS server, this needs an 
> improvement:
> * Turn on/off logging.
> * Log the full stack trace with the exception when it is on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15485) Fix outdated properties of JournalNode when performing rollback

2020-11-10 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15485.

Fix Version/s: 3.2.3
   3.1.5
   3.3.1
   Resolution: Fixed

Cherrypicked the commit into branch-3.3 ~ branch-3.1. Thanks [~Deegue]!

> Fix outdated properties of JournalNode when performing rollback
> ---
>
> Key: HDFS-15485
> URL: https://issues.apache.org/jira/browse/HDFS-15485
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Deegue
>Assignee: Deegue
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When rolling back an HDFS cluster, the properties in JNStorage won't be 
> refreshed after the storage dir changes, which leads to exceptions when 
> starting the namenode.
> The exception looks like:
> {code:java}
> 2020-07-09 19:04:12,810 FATAL [IPC Server handler 105 on 8022] 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: 
> recoverUnfinalizedSegments failed for required journal 
> (JournalAndStream(mgr=QJM to [10.0.118.217:8485, 10.0.117.208:8485, 
> 10.0.118.179:8485], stream=null))
> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many 
> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
> 10.0.118.217:8485: Incompatible namespaceID for journal Storage Directory 
> /mnt/vdc-11176G-0/dfs/jn/nameservicetest1: NameNode has nsId 647617129 but 
> storage has nsId 0
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JNStorage.checkConsistentNamespace(JNStorage.java:236)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.newEpoch(Journal.java:300)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.newEpoch(JournalNodeRpcServer.java:136)
>   at 
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.newEpoch(QJournalProtocolServerSideTranslatorPB.java:133)
>   at 
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25417)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2278)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2274)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2274)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-13639) SlotReleaser is not fast enough

2020-05-21 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-13639.

Fix Version/s: 3.4.0
   Resolution: Fixed

> SlotReleaser is not fast enough
> ---
>
> Key: HDFS-13639
> URL: https://issues.apache.org/jira/browse/HDFS-13639
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0, 2.6.0, 3.0.2
> Environment: 1. YCSB:
>  recordcount=20
>  fieldcount=1
>  fieldlength=1000
>  operationcount=1000
>  
>  workload=com.yahoo.ycsb.workloads.CoreWorkload
>  
>  table=ycsb-test
>  columnfamily=C
>  readproportion=1
>  updateproportion=0
>  insertproportion=0
>  scanproportion=0
>  
>  maxscanlength=0
>  requestdistribution=zipfian
>  
>  # default 
>  readallfields=true
>  writeallfields=true
>  scanlengthdistribution=constant
> 2. datanode:
> -Xmx2048m -Xms2048m -Xmn1024m -XX:MaxDirectMemorySize=1024m 
> -XX:MaxPermSize=256m -Xloggc:$run_dir/stdout/datanode_gc_${start_time}.log 
> -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError 
> -XX:HeapDumpPath=$log_dir -XX:+PrintGCApplicationStoppedTime 
> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSParallelRemarkEnabled 
> -XX:+CMSClassUnloadingEnabled -XX:CMSMaxAbortablePrecleanTime=1 
> -XX:+CMSScavengeBeforeRemark -XX:+PrintPromotionFailure 
> -XX:+CMSConcurrentMTEnabled -XX:+ExplicitGCInvokesConcurrent 
> -XX:+SafepointTimeout -XX:MonitorBound=16384 -XX:-UseBiasedLocking 
> -verbose:gc -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps
> 3. regionserver:
> -Xmx10g -Xms10g -XX:MaxDirectMemorySize=10g 
> -XX:MaxGCPauseMillis=150 -XX:MaxTenuringThreshold=2 
> -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=5 
> -Xloggc:$run_dir/stdout/regionserver_gc_${start_time}.log -Xss256k 
> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$log_dir -verbose:gc 
> -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime 
> -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy 
> -XX:+PrintTenuringDistribution -XX:+PrintSafepointStatistics 
> -XX:PrintSafepointStatisticsCount=1 -XX:PrintFLSStatistics=1 
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=100 -XX:GCLogFileSize=128m 
> -XX:+SafepointTimeout -XX:MonitorBound=16384 -XX:-UseBiasedLocking 
> -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=65 
> -XX:+ParallelRefProcEnabled -XX:ConcGCThreads=4 -XX:ParallelGCThreads=16 
> -XX:G1HeapRegionSize=32m -XX:G1MixedGCCountTarget=64 
> -XX:G1OldCSetRegionThresholdPercent=5
> block cache is disabled:
>  hbase.bucketcache.size = 0.9
>  
>Reporter: Gang Xie
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-13639-2.4.diff, HDFS-13639.001.patch, 
> HDFS-13639.002.patch, ShortCircuitCache_new_slotReleaser.diff, 
> perf_after_improve_SlotReleaser.png, perf_before_improve_SlotReleaser.png
>
>
> When testing the performance of HDFS short-circuit reads with YCSB, we found 
> that the SlotReleaser of the ShortCircuitCache has a performance issue. 
> The problem is that the QPS of slot releasing only reaches ~1000 while the 
> QPS of slot allocating is ~3000. This means the replica info on the datanode 
> cannot be released in time, which causes a lot of GCs and finally full GCs.
>  
> The flame graph shows that SlotReleaser spends a lot of time connecting to 
> the domain socket and throwing/catching exceptions when closing the domain 
> socket and its streams. It doesn't make sense to do the connecting and 
> closing each time: every time we connect to the domain socket, the DataNode 
> allocates a new thread to free the slot. There is a lot of initialization 
> work, and it's costly. We need to reuse the domain socket. 
>  
> After switching to reusing the domain socket (see the attached diff), we get 
> a great improvement (see the perf graphs):
>  # Without reusing the domain socket, the YCSB get QPS gets worse and worse, 
> and after about 45 minutes, full GCs start. When we reuse the domain socket, 
> no full GC is seen, the stress test finishes smoothly, and the QPS of 
> allocating and releasing match.
>  # Due to datanode young GCs, the YCSB get QPS without the improvement is 
> even lower than with the improvement: ~3700 vs ~4200.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15202) HDFS-client: boost ShortCircuit Cache

2020-05-18 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15202.

Fix Version/s: 3.4.0
   Resolution: Fixed

> HDFS-client: boost ShortCircuit Cache
> -
>
> Key: HDFS-15202
> URL: https://issues.apache.org/jira/browse/HDFS-15202
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: dfsclient
> Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem.
> 8 RegionServers (2 by host)
> 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total
> Random read in 800 threads via YCSB and a little bit updates (10% of reads)
>Reporter: Danil Lipovoy
>Assignee: Danil Lipovoy
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: HDFS_CPU_full_cycle.png, cpu_SSC.png, cpu_SSC2.png, 
> hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png, 
> hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png
>
>
> I want to propose how to improve the read performance of the HDFS client. 
> The idea: create several ShortCircuitCache instances instead of one. 
> The key points:
> 1. Create an array of caches (sized by 
> clientShortCircuitNum=*dfs.client.short.circuit.num*, see the pull 
> requests below):
> {code:java}
> private ClientContext(String name, DfsClientConf conf, Configuration config) {
> ...
> shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum];
> for (int i = 0; i < this.clientShortCircuitNum; i++) {
>   this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf);
> }
> {code}
> 2. Then divide blocks among the caches:
> {code:java}
>   public ShortCircuitCache getShortCircuitCache(long idx) {
> return shortCircuitCache[(int) (idx % clientShortCircuitNum)];
>   }
> {code}
> 3. And how to call it:
> {code:java}
> ShortCircuitCache cache = 
> clientContext.getShortCircuitCache(block.getBlockId());
> {code}
> The last digit of the block ID is evenly distributed from 0 to 9 - that's 
> why all the caches fill up approximately evenly.
> This is good for performance. The attachments below show a load test reading 
> HDFS via HBase with clientShortCircuitNum = 1 vs 3. We can see that 
> performance grows ~30%, with CPU usage about +15%. 
> Hope this is interesting for someone.
> I'm ready to explain some unobvious details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15334) INodeAttributeProvider's new API checkPermissionWithContext not getting called in for authorization

2020-05-05 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15334.

Fix Version/s: 3.3.0
   Resolution: Fixed

PR is merged. Cherrypicked to branch-3.3 and branch-3.3.0.
Thanks, Arpit.

> INodeAttributeProvider's new API checkPermissionWithContext not getting 
> called in for authorization
> ---
>
> Key: HDFS-15334
> URL: https://issues.apache.org/jira/browse/HDFS-15334
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Fix For: 3.3.0
>
>
> Our integration test found the new API is not being used.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15334) INodeAttributeProvider's new API checkPermissionWithContext not getting called in for authorization

2020-05-05 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created HDFS-15334:
--

 Summary: INodeAttributeProvider's new API 
checkPermissionWithContext not getting called in for authorization
 Key: HDFS-15334
 URL: https://issues.apache.org/jira/browse/HDFS-15334
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.3.0
Reporter: Wei-Chiu Chuang


Our integration test found the new API is not being used.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15270) Account for *env == NULL in hdfsThreadDestructor

2020-05-04 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15270.

Fix Version/s: 3.4.0
   Resolution: Fixed

Thanks [~babsingh], this is in trunk. Do you have a branch in mind that you 
want this cherrypicked to?

> Account for *env == NULL in hdfsThreadDestructor
> 
>
> Key: HDFS-15270
> URL: https://issues.apache.org/jira/browse/HDFS-15270
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: Please refer to the "steps to reproduce" the failure in 
> https://github.com/eclipse/openj9/issues/7752#issue-521732953.
>Reporter: Babneet Singh
>Assignee: Babneet Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> OpenJ9 JVM properly terminates the thread before hdfsThreadDestructor is
> invoked. JNIEnv is a mirror of J9VMThread in OpenJ9. After proper thread
> termination, accessing the JNIEnv in hdfsThreadDestructor via
> (*env)->GetJavaVM yields a SIGSEGV, since *env is NULL after thread cleanup
> is performed.
> The main purpose of hdfsThreadDestructor is to invoke
> DetachCurrentThread, which performs thread cleanup in OpenJ9. Since
> OpenJ9 performs thread cleanup before hdfsThreadDestructor is invoked,
> hdfsThreadDestructor should account for *env == NULL and skip
> DetachCurrentThread.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15317) Fix libhdfspp warnings

2020-04-30 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created HDFS-15317:
--

 Summary: Fix libhdfspp warnings
 Key: HDFS-15317
 URL: https://issues.apache.org/jira/browse/HDFS-15317
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: libhdfs++
Reporter: Wei-Chiu Chuang


Saw these warnings in an unrelated libhdfs patch precommit build.
{noformat}
[WARNING] 
/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1951/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/libhdfspp/lib/proto/ReconfigurationProtocol.pb.cc:286:13:
 warning: 'dynamic_init_dummy_ReconfigurationProtocol_2eproto' defined but not 
used [-Wunused-variable]
[WARNING] 
/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1951/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/libhdfspp/lib/proto/acl.pb.cc:533:13:
 warning: 'dynamic_init_dummy_acl_2eproto' defined but not used 
[-Wunused-variable]
[WARNING] 
/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1951/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/libhdfspp/lib/proto/encryption.pb.cc:467:13:
 warning: 'dynamic_init_dummy_encryption_2eproto' defined but not used 
[-Wunused-variable]
[WARNING] 
/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1951/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/libhdfspp/lib/proto/erasurecoding.pb.cc:745:13:
 warning: 'dynamic_init_dummy_erasurecoding_2eproto' defined but not used 
[-Wunused-variable]
[WARNING] 
/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1951/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/libhdfspp/tests/test_rpc_service.pb.cc:122:13:
 warning: 'dynamic_init_dummy_test_5frpc_5fservice_2eproto' defined but not 
used [-Wunused-variable] {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15269) NameNode should check the authorization API version only once during initialization

2020-04-09 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15269.

Fix Version/s: 3.3.0
   Resolution: Fixed

PR is merged in trunk, and cherrypicked to branch-3.3. 

Thanks [~aajisaka] and [~tasanuma]!

> NameNode should check the authorization API version only once during 
> initialization
> ---
>
> Key: HDFS-15269
> URL: https://issues.apache.org/jira/browse/HDFS-15269
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Blocker
> Fix For: 3.3.0
>
>
> After HDFS-14743, every authorization check logs a messages like the following
> {noformat}
> 2020-04-07 23:44:55,276 INFO org.apache.hadoop.security.UserGroupInformation: 
> Default authorization provider supports the new authorization provider API
> 2020-04-07 23:44:55,276 INFO org.apache.hadoop.security.UserGroupInformation: 
> Default authorization provider supports the new authorization provider API
> 2020-04-07 23:44:55,277 INFO org.apache.hadoop.security.UserGroupInformation: 
> Default authorization provider supports the new authorization provider API
> 2020-04-07 23:44:55,278 INFO org.apache.hadoop.security.UserGroupInformation: 
> Default authorization provider supports the new authorization provider API
> {noformat}
> The intent was to check the authorization provider's API compatibility once 
> during initialization, but apparently it is checked on every call. This 
> results in a serious performance regression (a hedged sketch of the intended 
> pattern follows below).
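> A hedged sketch of the intended pattern (all names below are illustrative): 
> probe the provider once at initialization and cache the answer, so the hot 
> path only reads a boolean:
> {code:java}
> class CapabilityCache {
>   private final boolean newApiSupported;
> 
>   CapabilityCache(Object provider) {
>     this.newApiSupported = probe(provider);  // one probe, one log line
>   }
> 
>   private static boolean probe(Object provider) {
>     for (java.lang.reflect.Method m : provider.getClass().getMethods()) {
>       if (m.getName().equals("checkPermissionWithContext")) {
>         return true;
>       }
>     }
>     return false;
>   }
> 
>   boolean useNewApi() {
>     return newApiSupported;  // cheap field read on every authorization call
>   }
> }
> {code}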



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15269) NameNode should check the authorization API version only once during initialization

2020-04-07 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created HDFS-15269:
--

 Summary: NameNode should check the authorization API version only 
once during initialization
 Key: HDFS-15269
 URL: https://issues.apache.org/jira/browse/HDFS-15269
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.3.0
Reporter: Wei-Chiu Chuang


After HDFS-14743, every authorization check logs a messages like the following

{noformat}
2020-04-07 23:44:55,276 INFO org.apache.hadoop.security.UserGroupInformation: 
Default authorization provider supports the new authorization provider API
2020-04-07 23:44:55,276 INFO org.apache.hadoop.security.UserGroupInformation: 
Default authorization provider supports the new authorization provider API
2020-04-07 23:44:55,277 INFO org.apache.hadoop.security.UserGroupInformation: 
Default authorization provider supports the new authorization provider API
2020-04-07 23:44:55,278 INFO org.apache.hadoop.security.UserGroupInformation: 
Default authorization provider supports the new authorization provider API
{noformat}

The intent was to check the authorization provider's API compatibility once during 
initialization, but the check (and its log message) runs on every authorization 
call. This results in a serious performance regression.
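A minimal sketch of the intended behavior, assuming a hypothetical cache field and 
probeNewApi() helper (neither is the actual patch): memoize the compatibility result 
so the probe, and its INFO log, happen once instead of on every call.

{code:java}
// Hypothetical sketch, not the committed patch. The field name and the
// probeNewApi() helper are illustrative assumptions.
private volatile Boolean supportsNewAuthApi;  // cached compatibility result

private boolean providerSupportsNewApi() {
  Boolean cached = supportsNewAuthApi;
  if (cached == null) {
    synchronized (this) {
      if (supportsNewAuthApi == null) {
        // Probe (and log) exactly once, e.g. during NameNode startup,
        // instead of on every authorization check.
        supportsNewAuthApi = probeNewApi();
      }
      cached = supportsNewAuthApi;
    }
  }
  return cached;
}
{code}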



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-14587) Support fail fast when client wait ACK by pipeline over threshold

2020-03-23 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-14587.

Resolution: Duplicate

I believe this is a dup of HDFS-8311, so I'll resolve this one. Feel free to 
reopen if I am wrong.

> Support fail fast when client wait ACK by pipeline over threshold
> -
>
> Key: HDFS-14587
> URL: https://issues.apache.org/jira/browse/HDFS-14587
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
>
> Recently, I met a corner case where the client waited over 9 hours for data to be 
> acknowledged by the pipeline. After checking branch trunk, I think this issue still 
> exists, so I propose adding a threshold for the wait timeout and failing fast when 
> it is exceeded.
> {code:java}
> 2019-06-18 12:53:46,217 WARN [Thread-127] org.apache.hadoop.hdfs.DFSClient: 
> Slow waitForAckedSeqno took 35560718ms (threshold=3ms)
> {code}
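A sketch of what the proposed fail-fast could look like; dataQueue, lastAckedSeqno 
and streamerClosed are assumed DataStreamer-style fields, and the timeout would come 
from a new client configuration key. This illustrates the idea, not an actual patch.

{code:java}
// Hypothetical fail-fast sketch: bound the wait for pipeline ACKs by a
// deadline instead of blocking indefinitely.
private void waitForAckedSeqno(long seqno, long timeoutMs) throws IOException {
  final long deadline = System.currentTimeMillis() + timeoutMs;
  synchronized (dataQueue) {
    while (!streamerClosed && lastAckedSeqno < seqno) {
      long remaining = deadline - System.currentTimeMillis();
      if (remaining <= 0) {
        // Fail fast instead of waiting for hours.
        throw new java.io.InterruptedIOException("Timed out after " + timeoutMs
            + "ms waiting for ack of seqno " + seqno
            + " (last acked: " + lastAckedSeqno + ")");
      }
      try {
        dataQueue.wait(remaining);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new java.io.InterruptedIOException(
            "Interrupted while waiting for ack of seqno " + seqno);
      }
    }
  }
}
{code}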



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-15113) Missing IBR when NameNode restart if open processCommand async feature

2020-03-22 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reopened HDFS-15113:


Reopening to have the addendum tested.

> Missing IBR when NameNode restart if open processCommand async feature
> --
>
> Key: HDFS-15113
> URL: https://issues.apache.org/jira/browse/HDFS-15113
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Blocker
> Fix For: 3.3.0
>
> Attachments: HDFS-15113.001.patch, HDFS-15113.002.patch, 
> HDFS-15113.003.patch, HDFS-15113.004.patch, HDFS-15113.005.patch, 
> HDFS-15113.addendum.patch
>
>
> Recently, I hit a case where the NameNode was missing blocks after a restart, 
> related to HDFS-14997.
> a. During NameNode restart, the NameNode returns the `DNA_REGISTER` command to a 
> DataNode when it receives certain RPC requests from that DataNode.
> b. When the DataNode receives the `DNA_REGISTER` command, it runs #reRegister 
> asynchronously.
> {code:java}
>   void reRegister() throws IOException {
> if (shouldRun()) {
>   // re-retrieve namespace info to make sure that, if the NN
>   // was restarted, we still match its version (HDFS-2120)
>   NamespaceInfo nsInfo = retrieveNamespaceInfo();
>   // and re-register
>   register(nsInfo);
>   scheduler.scheduleHeartbeat();
>   // HDFS-9917,Standby NN IBR can be very huge if standby namenode is down
>   // for sometime.
>   if (state == HAServiceState.STANDBY || state == 
> HAServiceState.OBSERVER) {
> ibrManager.clearIBRs();
>   }
> }
>   }
> {code}
> c. As we know, #register triggers a full block report (FBR) immediately.
> d. Because #reRegister runs asynchronously, there is no guarantee of ordering 
> between sending the FBR and clearing the IBRs. If the IBRs are cleared first, all 
> is well. But if the FBR is sent first and the IBRs are cleared afterwards, any 
> blocks received between those two points are missing until the next FBR.
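One way to remove the race, sketched here as an assumption rather than the committed 
addendum, is to clear the stale IBRs before re-registering, so the FBR triggered by 
#register can never be followed by a clear that drops fresh blocks:

{code:java}
// Hypothetical reordering, not necessarily the committed fix: clear the
// pending IBRs *before* register(), so the full block report it triggers
// cannot be wiped out by a later clearIBRs() call.
void reRegister() throws IOException {
  if (shouldRun()) {
    NamespaceInfo nsInfo = retrieveNamespaceInfo();
    if (state == HAServiceState.STANDBY || state == HAServiceState.OBSERVER) {
      ibrManager.clearIBRs();   // drop stale IBRs first
    }
    register(nsInfo);           // FBR happens after the clear
    scheduler.scheduleHeartbeat();
  }
}
{code}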



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15230) Sanity check should not assume key base name can be derived from version name

2020-03-20 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created HDFS-15230:
--

 Summary: Sanity check should not assume key base name can be 
derived from version name
 Key: HDFS-15230
 URL: https://issues.apache.org/jira/browse/HDFS-15230
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Wei-Chiu Chuang


HDFS-14884 checks if the encryption info of a file matches the encryption zone 
key.

{code}
if (!KeyProviderCryptoExtension.
getBaseName(keyVersionName).equals(zoneKeyName)) {
  throw new IllegalArgumentException(String.format(
  "KeyVersion '%s' does not belong to the key '%s'",
  keyVersionName, zoneKeyName));
}
{code}
Here it assumes the "base name" can be derived from the key version name, and that 
the base name should be the same as the zone key name.

However, there is no published definition of what a key version name should be. 

While the code works for the built-in JKS key provider, it may not work for 
other kinds of key providers. (Specifically, it breaks Cloudera's KeyTrustee KMS 
KeyProvider.)
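A provider-agnostic variant, sketched under the assumption that getBaseName signals 
an unparseable version name with an exception, would only enforce the comparison 
when a base name can actually be derived:

{code:java}
// Hypothetical relaxation, not the committed fix: only enforce the
// base-name comparison when the version name follows the
// "<keyName>@<version>" convention; other naming schemes skip the check.
String baseName = null;
try {
  baseName = KeyProviderCryptoExtension.getBaseName(keyVersionName);
} catch (IOException e) {
  // Version name does not follow the JKS convention; do not reject
  // a provider just because its scheme differs.
}
if (baseName != null && !baseName.equals(zoneKeyName)) {
  throw new IllegalArgumentException(String.format(
      "KeyVersion '%s' does not belong to the key '%s'",
      keyVersionName, zoneKeyName));
}
{code}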



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15208) Suppress bogus AbstractWadlGeneratorGrammarGenerator in KMS stderr in hdfs

2020-03-18 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15208.

Fix Version/s: 3.2.2
   3.1.4
   Resolution: Fixed

> Suppress bogus AbstractWadlGeneratorGrammarGenerator in KMS stderr in hdfs
> --
>
> Key: HDFS-15208
> URL: https://issues.apache.org/jira/browse/HDFS-15208
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Trivial
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
>
> Continuation of HADOOP-15686
> Add the same log4j property to disable error log in hadoop-hdfs.
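For reference, the property in question has roughly this shape; the logger name is 
inferred from the Jersey class in the summary and may not match the committed patch 
exactly.

{noformat}
# Assumed log4j.properties entry, mirroring HADOOP-15686 (logger name is
# an inference, not copied from the patch):
log4j.logger.com.sun.jersey.server.wadl.generators.AbstractWadlGeneratorGrammarGenerator=OFF
{noformat}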



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-14743) Enhance INodeAttributeProvider/ AccessControlEnforcer Interface in HDFS to support Authorization of mkdir, rm, rmdir, copy, move etc...

2020-03-13 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-14743.

Fix Version/s: 3.3.0
   Resolution: Fixed

Thanks [~xyao] and [~ste...@apache.org] for the thorough reviews!

> Enhance INodeAttributeProvider/ AccessControlEnforcer Interface in HDFS to 
> support Authorization of mkdir, rm, rmdir, copy, move etc...
> ---
>
> Key: HDFS-14743
> URL: https://issues.apache.org/jira/browse/HDFS-14743
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: Ramesh Mani
>Assignee: Wei-Chiu Chuang
>Priority: Critical
> Fix For: 3.3.0
>
> Attachments: HDFS-14743 Enhance INodeAttributeProvider_ 
> AccessControlEnforcer Interface.pdf
>
>
> Enhance INodeAttributeProvider / AccessControlEnforcer Interface in HDFS to 
> support Authorization of mkdir, rm, rmdir, copy, move etc..., this should 
> help the implementors of the interface like Apache Ranger's HDFS 
> Authorization plugin to authorize and audit those command sets.
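To illustrate what this enables for implementors, a hedged sketch follows; the 
method name and context accessors are assumptions, not the exact interface added by 
this JIRA.

{code:java}
// Illustrative only -- names are assumptions, not the exact API from this
// JIRA. The point is that the enforcer now sees which operation (mkdir,
// rm, rename, ...) is being authorized, so a plugin can authorize and
// audit per command. The pre-existing checkPermission overload is omitted.
public class AuditingEnforcer
    implements INodeAttributeProvider.AccessControlEnforcer {
  @Override
  public void checkPermissionWithContext(AuthorizationContext ctx)
      throws AccessControlException {
    LOG.info("authorize op={} path={} user={}",
        ctx.getOperationName(), ctx.getPath(), ctx.getCallerUgi());
    // Delegate to a policy engine (e.g. a Ranger-style plugin) or to the
    // default enforcer here.
  }
}
{code}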



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-15208) Suppress bogus AbstractWadlGeneratorGrammarGenerator in KMS stderr in hdfs

2020-03-05 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created HDFS-15208:
--

 Summary: Suppress bogus AbstractWadlGeneratorGrammarGenerator in 
KMS stderr in hdfs
 Key: HDFS-15208
 URL: https://issues.apache.org/jira/browse/HDFS-15208
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Wei-Chiu Chuang


Continuation of HADOOP-15686

Add the same log4j property to disable error log in hadoop-hdfs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-14668) Support Fuse with Users from multiple Security Realms

2020-02-27 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-14668.

Fix Version/s: 3.2.2
   3.1.4
   3.3.0
   Resolution: Fixed

Thanks [~pifta]!

> Support Fuse with Users from multiple Security Realms
> -
>
> Key: HDFS-14668
> URL: https://issues.apache.org/jira/browse/HDFS-14668
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fuse-dfs
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Sailesh Patel
>Assignee: Istvan Fajth
>Priority: Critical
>  Labels: regression
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
>
> UPDATE:
> See 
> [this|https://issues.apache.org/jira/browse/HDFS-14668?focusedCommentId=16979466=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16979466]
>  comment for the complete description of what is happening here.
> Users from a non-default krb5 realm can't use hadoop-fuse.
> There are two realms, each with its own KDC:
> - one realm is for human users (USERS.COM.US)
> - the other is for service principals (SERVICE.COM.US)
> Cross-realm trust is set up.
> In krb5.conf the default realm is set to SERVICE.COM.US.
> Users within the USERS.COM.US realm are not able to put any files to a Fuse-mounted 
> location.
> The client shows:
>   cp: cannot create regular file ‘/hdfs_mount/tmp/hello_from_fuse.txt’: 
> Input/output error
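A minimal krb5.conf shape for this setup; realm names come from the report, while 
the KDC host names are assumptions for illustration.

{noformat}
# Illustrative krb5.conf fragment; KDC host names are assumptions.
[libdefaults]
  default_realm = SERVICE.COM.US

[realms]
  USERS.COM.US = {
    kdc = kdc.users.com.us
  }
  SERVICE.COM.US = {
    kdc = kdc.service.com.us
  }

# Cross-realm trust additionally requires the krbtgt/SERVICE.COM.US@USERS.COM.US
# (and reverse) principals to exist in the respective KDCs.
{noformat}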



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org


