[jira] [Updated] (HDFS-16625) Unit tests aren't checking for PMDK availability

2022-08-23 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16625:

Priority: Major  (was: Blocker)

> Unit tests aren't checking for PMDK availability
> ------------------------------------------------
>
> Key: HDFS-16625
> URL: https://issues.apache.org/jira/browse/HDFS-16625
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.4.0, 3.3.9
>Reporter: Steve Vaughan
>Assignee: Steve Vaughan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> There are unit tests that require the native PMDK libraries but don't check 
> whether the library is available, resulting in test failures.  Adding the 
> following to the test setup addresses the problem.
> {code:java}
> assumeTrue("Requires PMDK", NativeIO.POSIX.isPmdkAvailable()); {code}
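The assumption-based skipping used above can be sketched outside of JUnit as follows. This is a minimal illustration only: the `isPmdkAvailable()` stub stands in for `NativeIO.POSIX.isPmdkAvailable()`, and `AssumptionViolatedException` mimics what JUnit's `Assume.assumeTrue` throws so a runner reports the test as skipped rather than failed.

```java
// Minimal sketch of assumption-based test skipping, mirroring JUnit's
// Assume.assumeTrue. The isPmdkAvailable() stub stands in for
// NativeIO.POSIX.isPmdkAvailable(), which probes the native library.
public class PmdkAssumptionSketch {

    // Thrown when an assumption fails; a test runner would report the
    // test as skipped rather than failed.
    static class AssumptionViolatedException extends RuntimeException {
        AssumptionViolatedException(String message) { super(message); }
    }

    static void assumeTrue(String message, boolean condition) {
        if (!condition) {
            throw new AssumptionViolatedException(message);
        }
    }

    // Stub: a real test would call NativeIO.POSIX.isPmdkAvailable().
    static boolean isPmdkAvailable() {
        return false;  // simulate an environment without PMDK
    }

    public static void main(String[] args) {
        try {
            assumeTrue("Requires PMDK", isPmdkAvailable());
            System.out.println("test body ran");
        } catch (AssumptionViolatedException e) {
            System.out.println("test skipped: " + e.getMessage());
        }
    }
}
```

Placing the guard in an `@Before` setup method, as the fix does, skips every test in the class on hosts without the native library instead of failing them.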



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16625) Unit tests aren't checking for PMDK availability

2022-08-23 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki resolved HDFS-16625.
-
Hadoop Flags: Reviewed
  Resolution: Fixed

> Unit tests aren't checking for PMDK availability
> ------------------------------------------------
>
> Key: HDFS-16625
> URL: https://issues.apache.org/jira/browse/HDFS-16625
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.4.0, 3.3.9
>Reporter: Steve Vaughan
>Assignee: Steve Vaughan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> There are unit tests that require the native PMDK libraries but don't check 
> whether the library is available, resulting in test failures.  Adding the 
> following to the test setup addresses the problem.
> {code:java}
> assumeTrue("Requires PMDK", NativeIO.POSIX.isPmdkAvailable()); {code}






[jira] [Assigned] (HDFS-16714) Remove okhttp and kotlin dependencies

2022-08-09 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki reassigned HDFS-16714:
---

Assignee: Cheng Pan

> Remove okhttp and kotlin dependencies
> -
>
> Key: HDFS-16714
> URL: https://issues.apache.org/jira/browse/HDFS-16714
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.3.4
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> hadoop-common already has Apache HTTP client dependencies, so okhttp is 
> unnecessary.






[jira] [Commented] (HDFS-14084) Need for more stats in DFSClient

2022-07-12 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17565906#comment-17565906
 ] 

Masatake Iwasaki commented on HDFS-14084:
-

Updating the target version to 3.2.5 in preparation for the 3.2.4 release.

> Need for more stats in DFSClient
> --------------------------------
>
> Key: HDFS-14084
> URL: https://issues.apache.org/jira/browse/HDFS-14084
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Pranay Singh
>Priority: Minor
> Attachments: HDFS-14084.001.patch, HDFS-14084.002.patch, 
> HDFS-14084.003.patch, HDFS-14084.004.patch, HDFS-14084.005.patch, 
> HDFS-14084.006.patch, HDFS-14084.007.patch, HDFS-14084.008.patch, 
> HDFS-14084.009.patch, HDFS-14084.010.patch, HDFS-14084.011.patch, 
> HDFS-14084.012.patch, HDFS-14084.013.patch, HDFS-14084.014.patch, 
> HDFS-14084.015.patch, HDFS-14084.016.patch, HDFS-14084.017.patch, 
> HDFS-14084.018.patch
>
>
> The usage of HDFS has changed: once used mainly as a map-reduce filesystem, 
> it is becoming more of a general-purpose filesystem. In most cases the 
> issues are with the Namenode, so we have metrics to know the workload or 
> stress on the Namenode.
> However, there is a need to collect more statistics for the different 
> operations/RPCs in DFSClient, to know which RPC operations are taking 
> longer or how frequently an operation is issued. These statistics can be 
> exposed to users of the DFS Client, who can periodically log them or apply 
> some form of flow control when responses are slow. This will also help to 
> isolate HDFS issues in a mixed environment where, say, Spark, HBase and 
> Impala run together on a node. We can check the throughput of different 
> operations across clients and isolate problems caused by a noisy neighbor, 
> network congestion, or a shared JVM.
> We have dealt with several problems from the field for which there is no 
> conclusive evidence as to what caused the problem. If we had metrics or 
> stats in DFSClient we would be better equipped to solve such complex 
> problems.
> List of jiras for reference:
> HADOOP-15538, HADOOP-15530 (client-side deadlock)
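The per-operation client statistics described above could be kept in a small registry like the following sketch. The `DfsClientStats` class and the operation name are hypothetical illustrations, not part of the actual DFSClient API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Illustrative sketch of per-RPC client-side stats. DfsClientStats and the
// operation names are hypothetical; a real integration would record the
// elapsed time around each DFSClient RPC invocation.
public class DfsClientStats {
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();

    /** Record one completed operation and its latency. */
    public void record(String op, long elapsedNanos) {
        counts.computeIfAbsent(op, k -> new LongAdder()).increment();
        totalNanos.computeIfAbsent(op, k -> new LongAdder()).add(elapsedNanos);
    }

    /** Number of times the operation was recorded. */
    public long count(String op) {
        LongAdder c = counts.get(op);
        return c == null ? 0 : c.sum();
    }

    /** Mean latency in nanoseconds, or 0 if never recorded. */
    public long meanNanos(String op) {
        long n = count(op);
        return n == 0 ? 0 : totalNanos.get(op).sum() / n;
    }

    public static void main(String[] args) {
        DfsClientStats stats = new DfsClientStats();
        stats.record("getFileInfo", 1_000_000L);  // 1 ms
        stats.record("getFileInfo", 3_000_000L);  // 3 ms
        System.out.println("getFileInfo count=" + stats.count("getFileInfo")
            + " meanNanos=" + stats.meanNanos("getFileInfo"));
    }
}
```

A caller could poll such a registry periodically to log slow RPCs or throttle, which is the flow-control use case the description mentions.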






[jira] [Updated] (HDFS-14084) Need for more stats in DFSClient

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-14084:

Target Version/s: 3.2.5  (was: 3.2.4)

> Need for more stats in DFSClient
> --------------------------------
>
> Key: HDFS-14084
> URL: https://issues.apache.org/jira/browse/HDFS-14084
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Pranay Singh
>Priority: Minor
> Attachments: HDFS-14084.001.patch, HDFS-14084.002.patch, 
> HDFS-14084.003.patch, HDFS-14084.004.patch, HDFS-14084.005.patch, 
> HDFS-14084.006.patch, HDFS-14084.007.patch, HDFS-14084.008.patch, 
> HDFS-14084.009.patch, HDFS-14084.010.patch, HDFS-14084.011.patch, 
> HDFS-14084.012.patch, HDFS-14084.013.patch, HDFS-14084.014.patch, 
> HDFS-14084.015.patch, HDFS-14084.016.patch, HDFS-14084.017.patch, 
> HDFS-14084.018.patch
>
>
> The usage of HDFS has changed: once used mainly as a map-reduce filesystem, 
> it is becoming more of a general-purpose filesystem. In most cases the 
> issues are with the Namenode, so we have metrics to know the workload or 
> stress on the Namenode.
> However, there is a need to collect more statistics for the different 
> operations/RPCs in DFSClient, to know which RPC operations are taking 
> longer or how frequently an operation is issued. These statistics can be 
> exposed to users of the DFS Client, who can periodically log them or apply 
> some form of flow control when responses are slow. This will also help to 
> isolate HDFS issues in a mixed environment where, say, Spark, HBase and 
> Impala run together on a node. We can check the throughput of different 
> operations across clients and isolate problems caused by a noisy neighbor, 
> network congestion, or a shared JVM.
> We have dealt with several problems from the field for which there is no 
> conclusive evidence as to what caused the problem. If we had metrics or 
> stats in DFSClient we would be better equipped to solve such complex 
> problems.
> List of jiras for reference:
> HADOOP-15538, HADOOP-15530 (client-side deadlock)






[jira] [Commented] (HDFS-14571) Command line to force volume failures

2022-07-12 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17565904#comment-17565904
 ] 

Masatake Iwasaki commented on HDFS-14571:
-

Updating the target version to 3.2.5 in preparation for the 3.2.4 release.

> Command line to force volume failures
> -
>
> Key: HDFS-14571
> URL: https://issues.apache.org/jira/browse/HDFS-14571
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, hdfs
> Environment: Linux
>Reporter: Scott A. Wehner
>Priority: Major
>  Labels: disks, volumes
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> A datanode with a failed hard drive reports to the namenode that it has a 
> failed volume. In line with enabling slow-datanode detection, when we have 
> a drive that is failing but has not yet failed outright, or has 
> uncorrectable sectors, I want to be able to run a command to force-fail a 
> datanode volume based on its storageID or target storage location (a.k.a. 
> mount point).






[jira] [Commented] (HDFS-14349) Edit log may be rolled more frequently than necessary with multiple Standby nodes

2022-07-12 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17565905#comment-17565905
 ] 

Masatake Iwasaki commented on HDFS-14349:
-

Updating the target version to 3.2.5 in preparation for the 3.2.4 release.

> Edit log may be rolled more frequently than necessary with multiple Standby 
> nodes
> -
>
> Key: HDFS-14349
> URL: https://issues.apache.org/jira/browse/HDFS-14349
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, hdfs, qjm
>Reporter: Erik Krogen
>Assignee: Ekanth Sethuramalingam
>Priority: Major
>  Labels: multi-sbnn
>
> When HDFS-14317 was fixed, we tackled the problem that in a cluster with 
> in-progress edit log tailing enabled, a Standby NameNode may _never_ roll the 
> edit logs, which can eventually cause data loss.
> Unfortunately, in the process, it was made so that if there are multiple 
> Standby NameNodes, they will all roll the edit logs at their specified 
> frequency, so the edit log will be rolled X times more frequently than they 
> should be (where X is the number of Standby NNs). This is not as bad as the 
> original bug since rolling frequently does not affect correctness or data 
> availability, but may degrade performance by creating more edit log segments 
> than necessary.
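The multiplication effect described above can be illustrated with a toy simulation. The tick counts and roll period below are arbitrary; the only point is that uncoordinated standbys each trigger their own roll, so total rolls scale with the number of standbys.

```java
// Toy simulation of the over-rolling effect: each Standby NN runs its own
// roll timer and none of them coordinates, so with X standbys the Active's
// edit log is rolled X times per roll period instead of once.
public class RollFrequencySketch {

    /**
     * Count rolls over `ticks` time units when each of `standbys`
     * independently triggers a roll every `period` ticks.
     */
    static int totalRolls(int standbys, int ticks, int period) {
        int rolls = 0;
        for (int t = 1; t <= ticks; t++) {
            if (t % period == 0) {
                rolls += standbys;  // every standby fires at its own deadline
            }
        }
        return rolls;
    }

    public static void main(String[] args) {
        int ticks = 300, period = 100;
        for (int standbys = 1; standbys <= 3; standbys++) {
            System.out.println(standbys + " standby(s): "
                + totalRolls(standbys, ticks, period)
                + " rolls in " + ticks + " ticks");
        }
    }
}
```

With one standby the log rolls once per period (the intended rate); with three standbys it rolls three times per period, producing three times as many edit log segments.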






[jira] [Updated] (HDFS-15289) Allow viewfs mounts with HDFS/HCFS scheme and centralized mount table

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-15289:

Target Version/s: 3.4.0, 3.3.9, 3.2.5  (was: 3.4.0, 3.2.4, 3.3.9)

> Allow viewfs mounts with HDFS/HCFS scheme and centralized mount table
> -
>
> Key: HDFS-15289
> URL: https://issues.apache.org/jira/browse/HDFS-15289
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 3.2.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Attachments: ViewFSOverloadScheme - V1.0.pdf, ViewFSOverloadScheme.png
>
>
> ViewFS provides the flexibility to mount different filesystem types via a 
> mount-point configuration table. This approach solves the scalability 
> problems, but users need to reconfigure the filesystem to ViewFS and to its 
> scheme.  This is problematic for paths persisted in meta stores, e.g. Hive: 
> systems like Hive store URIs in the meta store, so changing the filesystem 
> scheme creates a burden to upgrade or recreate meta stores. In our 
> experience many users are not ready to make that change.
> Router-based federation is another implementation that provides coordinated 
> mount points for HDFS federation clusters. Even though it handles mount 
> points easily, it does not allow other (non-HDFS) filesystems to be 
> mounted, so it does not serve users who want to mount external (non-HDFS) 
> filesystems.
> So the problem is: even though many users want to adopt the scalable fs 
> options available, the technical challenge of changing schemes (e.g. in 
> meta stores) in deployments is obstructing them.
> We therefore propose to allow the hdfs scheme in a ViewFS-like client-side 
> mount system, letting users create mount links without changing URI paths.
> I will upload a detailed design doc shortly.
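The kind of client-side configuration proposed here might look roughly like the following sketch. The property names follow the ViewFS overload-scheme design; the mount-table name, namenode hostname, and paths are made up for illustration.

```xml
<!-- Illustrative sketch only: mount-table name, hostname, and paths are made up. -->
<configuration>
  <!-- Serve hdfs:// URIs through the ViewFS mount-table resolver,
       so clients keep their existing hdfs:// paths. -->
  <property>
    <name>fs.hdfs.impl</name>
    <value>org.apache.hadoop.fs.viewfs.ViewFileSystemOverloadScheme</value>
  </property>
  <!-- The real implementation backing hdfs:// mount targets. -->
  <property>
    <name>fs.viewfs.overload.scheme.target.hdfs.impl</name>
    <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
  </property>
  <!-- One mount link: hdfs://mycluster/data resolves to the target
       cluster without the application changing its URI scheme. -->
  <property>
    <name>fs.viewfs.mounttable.mycluster.link./data</name>
    <value>hdfs://nn1.example.com:8020/data</value>
  </property>
</configuration>
```

Because the scheme stays `hdfs`, URIs already persisted in meta stores such as Hive's continue to resolve unchanged.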






[jira] [Updated] (HDFS-14571) Command line to force volume failures

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-14571:

Target Version/s: 3.2.5  (was: 3.2.4)

> Command line to force volume failures
> -
>
> Key: HDFS-14571
> URL: https://issues.apache.org/jira/browse/HDFS-14571
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, hdfs
> Environment: Linux
>Reporter: Scott A. Wehner
>Priority: Major
>  Labels: disks, volumes
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> A datanode with a failed hard drive reports to the namenode that it has a 
> failed volume. In line with enabling slow-datanode detection, when we have 
> a drive that is failing but has not yet failed outright, or has 
> uncorrectable sectors, I want to be able to run a command to force-fail a 
> datanode volume based on its storageID or target storage location (a.k.a. 
> mount point).






[jira] [Commented] (HDFS-15289) Allow viewfs mounts with HDFS/HCFS scheme and centralized mount table

2022-07-12 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17565902#comment-17565902
 ] 

Masatake Iwasaki commented on HDFS-15289:
-

Updating the target version to 3.2.5 in preparation for the 3.2.4 release.

> Allow viewfs mounts with HDFS/HCFS scheme and centralized mount table
> -
>
> Key: HDFS-15289
> URL: https://issues.apache.org/jira/browse/HDFS-15289
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 3.2.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Attachments: ViewFSOverloadScheme - V1.0.pdf, ViewFSOverloadScheme.png
>
>
> ViewFS provides the flexibility to mount different filesystem types via a 
> mount-point configuration table. This approach solves the scalability 
> problems, but users need to reconfigure the filesystem to ViewFS and to its 
> scheme.  This is problematic for paths persisted in meta stores, e.g. Hive: 
> systems like Hive store URIs in the meta store, so changing the filesystem 
> scheme creates a burden to upgrade or recreate meta stores. In our 
> experience many users are not ready to make that change.
> Router-based federation is another implementation that provides coordinated 
> mount points for HDFS federation clusters. Even though it handles mount 
> points easily, it does not allow other (non-HDFS) filesystems to be 
> mounted, so it does not serve users who want to mount external (non-HDFS) 
> filesystems.
> So the problem is: even though many users want to adopt the scalable fs 
> options available, the technical challenge of changing schemes (e.g. in 
> meta stores) in deployments is obstructing them.
> We therefore propose to allow the hdfs scheme in a ViewFS-like client-side 
> mount system, letting users create mount links without changing URI paths.
> I will upload a detailed design doc shortly.






[jira] [Updated] (HDFS-14349) Edit log may be rolled more frequently than necessary with multiple Standby nodes

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-14349:

Target Version/s: 3.2.5  (was: 3.2.4)

> Edit log may be rolled more frequently than necessary with multiple Standby 
> nodes
> -
>
> Key: HDFS-14349
> URL: https://issues.apache.org/jira/browse/HDFS-14349
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, hdfs, qjm
>Reporter: Erik Krogen
>Assignee: Ekanth Sethuramalingam
>Priority: Major
>  Labels: multi-sbnn
>
> When HDFS-14317 was fixed, we tackled the problem that in a cluster with 
> in-progress edit log tailing enabled, a Standby NameNode may _never_ roll the 
> edit logs, which can eventually cause data loss.
> Unfortunately, in the process, it was made so that if there are multiple 
> Standby NameNodes, they will all roll the edit logs at their specified 
> frequency, so the edit log will be rolled X times more frequently than they 
> should be (where X is the number of Standby NNs). This is not as bad as the 
> original bug since rolling frequently does not affect correctness or data 
> availability, but may degrade performance by creating more edit log segments 
> than necessary.






[jira] [Updated] (HDFS-16022) matlab mapreduce v95 demos can't run hadoop-3.2.2 run time

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16022:

Target Version/s: 3.2.5  (was: 3.2.4)

> matlab mapreduce v95 demos can't run hadoop-3.2.2 run time
> --
>
> Key: HDFS-16022
> URL: https://issues.apache.org/jira/browse/HDFS-16022
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient
>Affects Versions: 3.2.2
> Environment: hadoop-3.2.2 + MATLAB Runtime + CentOS 7. The 
> maxArrivalDelay.ctf file is generated on Win10 + MATLAB 2018b (v95) by the 
> MATLAB Hadoop compiler tools, and airlinesmall.csv is uploaded to HDFS. 
> Hadoop runs the hadoop-mapreduce-examples-3.2.2.jar wordcount demo fine, 
> even with a jar compiled from source in a Win10 + Eclipse environment. 
> Please help; I have no idea what is going on.
>Reporter: cathonxiong
>Priority: Blocker
> Attachments: matlab_errorlog
>
>
> hadoop jar \
>     /usr/local/MATLAB/MATLAB_Runtime/v95/toolbox/mlhadoop/jar/a2.2.0/mwmapreduce.jar \
>     com.mathworks.hadoop.MWMapReduceDriver \
>     -D mw.mcrroot=/usr/local/MATLAB/MATLAB_Runtime/v95 \
>     /usr/local/MATLAB/MATLAB_Runtime/v95/maxArrivalDelay.ctf \
>     hdfs://hadoop.namenode:50070/user/matlab/datasets/airlinesmall.csv \
>     hdfs://hadoop.namenode:50070/user/matlab/results
> java.library.path: /usr/local/hadoop-3.2.2/lib/native
> HDFSCTFPath=hdfs://hadoop.namenode:8020/user/root/maxArrivalDelay/maxArrivalDelay.ctf
> Uploading CTF into distributed cache completed.
> mapred.child.env: MCR_CACHE_ROOT=/tmp,LD_LIBRARY_PATH=/usr/local/MATLAB/MATLAB_Runtime/v95/runtime/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/bin/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/sys/os/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/sys/opengl/lib/glnxa64
> mapred.child.java.opts: -Djava.library.path=/usr/local/MATLAB/MATLAB_Runtime/v95/runtime/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/bin/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/sys/os/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/sys/opengl/lib/glnxa64
> New java.library.path: /usr/local/hadoop-3.2.2/lib/native:/usr/local/MATLAB/MATLAB_Runtime/v95/runtime/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/bin/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/sys/os/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/sys/opengl/lib/glnxa64
> Using MATLAB mapper.
> Set input format class to: ChunkFileRecordReader.
> Using MATLAB reducer.
> Set outputformat class to: class org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
> Set map output key class to: class com.mathworks.hadoop.MxArrayWritable2
> Set map output value class to: class com.mathworks.hadoop.MxArrayWritable2
> Set reduce output key class to: class com.mathworks.hadoop.MxArrayWritable2
> Set reduce output value class to: class com.mathworks.hadoop.MxArrayWritable2
> *** run ***
> 2021-05-11 14:58:47,043 INFO client.RMProxy: Connecting to ResourceManager at hadoop.namenode/192.168.0.25:8032
> 2021-05-11 14:58:47,139 WARN net.NetUtils: Unable to wrap exception of type class org.apache.hadoop.ipc.RpcException: it has no (String) constructor
> java.lang.NoSuchMethodException: org.apache.hadoop.ipc.RpcException.<init>(java.lang.String)
>     at java.lang.Class.getConstructor0(Class.java:3082)
>     at java.lang.Class.getConstructor(Class.java:1825)
>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:835)
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:811)
>     at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1566)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1508)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1405)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>     at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:910)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>     at 

[jira] [Commented] (HDFS-16022) matlab mapreduce v95 demos can't run hadoop-3.2.2 run time

2022-07-12 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17565900#comment-17565900
 ] 

Masatake Iwasaki commented on HDFS-16022:
-

Updating the target version to 3.2.5 in preparation for the 3.2.4 release.

> matlab mapreduce v95 demos can't run hadoop-3.2.2 run time
> --
>
> Key: HDFS-16022
> URL: https://issues.apache.org/jira/browse/HDFS-16022
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient
>Affects Versions: 3.2.2
> Environment: hadoop-3.2.2 + MATLAB Runtime + CentOS 7. The 
> maxArrivalDelay.ctf file is generated on Win10 + MATLAB 2018b (v95) by the 
> MATLAB Hadoop compiler tools, and airlinesmall.csv is uploaded to HDFS. 
> Hadoop runs the hadoop-mapreduce-examples-3.2.2.jar wordcount demo fine, 
> even with a jar compiled from source in a Win10 + Eclipse environment. 
> Please help; I have no idea what is going on.
>Reporter: cathonxiong
>Priority: Blocker
> Attachments: matlab_errorlog
>
>
> hadoop jar \
>     /usr/local/MATLAB/MATLAB_Runtime/v95/toolbox/mlhadoop/jar/a2.2.0/mwmapreduce.jar \
>     com.mathworks.hadoop.MWMapReduceDriver \
>     -D mw.mcrroot=/usr/local/MATLAB/MATLAB_Runtime/v95 \
>     /usr/local/MATLAB/MATLAB_Runtime/v95/maxArrivalDelay.ctf \
>     hdfs://hadoop.namenode:50070/user/matlab/datasets/airlinesmall.csv \
>     hdfs://hadoop.namenode:50070/user/matlab/results
> java.library.path: /usr/local/hadoop-3.2.2/lib/native
> HDFSCTFPath=hdfs://hadoop.namenode:8020/user/root/maxArrivalDelay/maxArrivalDelay.ctf
> Uploading CTF into distributed cache completed.
> mapred.child.env: MCR_CACHE_ROOT=/tmp,LD_LIBRARY_PATH=/usr/local/MATLAB/MATLAB_Runtime/v95/runtime/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/bin/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/sys/os/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/sys/opengl/lib/glnxa64
> mapred.child.java.opts: -Djava.library.path=/usr/local/MATLAB/MATLAB_Runtime/v95/runtime/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/bin/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/sys/os/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/sys/opengl/lib/glnxa64
> New java.library.path: /usr/local/hadoop-3.2.2/lib/native:/usr/local/MATLAB/MATLAB_Runtime/v95/runtime/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/bin/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/sys/os/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/sys/opengl/lib/glnxa64
> Using MATLAB mapper.
> Set input format class to: ChunkFileRecordReader.
> Using MATLAB reducer.
> Set outputformat class to: class org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
> Set map output key class to: class com.mathworks.hadoop.MxArrayWritable2
> Set map output value class to: class com.mathworks.hadoop.MxArrayWritable2
> Set reduce output key class to: class com.mathworks.hadoop.MxArrayWritable2
> Set reduce output value class to: class com.mathworks.hadoop.MxArrayWritable2
> *** run ***
> 2021-05-11 14:58:47,043 INFO client.RMProxy: Connecting to ResourceManager at hadoop.namenode/192.168.0.25:8032
> 2021-05-11 14:58:47,139 WARN net.NetUtils: Unable to wrap exception of type class org.apache.hadoop.ipc.RpcException: it has no (String) constructor
> java.lang.NoSuchMethodException: org.apache.hadoop.ipc.RpcException.<init>(java.lang.String)
>     at java.lang.Class.getConstructor0(Class.java:3082)
>     at java.lang.Class.getConstructor(Class.java:1825)
>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:835)
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:811)
>     at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1566)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1508)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1405)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>     at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:910)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>     at 

[jira] [Updated] (HDFS-16022) matlab mapreduce v95 demos can't run hadoop-3.2.2 run time

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16022:

Priority: Major  (was: Blocker)

> matlab mapreduce v95 demos can't run hadoop-3.2.2 run time
> --
>
> Key: HDFS-16022
> URL: https://issues.apache.org/jira/browse/HDFS-16022
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient
>Affects Versions: 3.2.2
> Environment: hadoop-3.2.2 + MATLAB Runtime + CentOS 7. The 
> maxArrivalDelay.ctf file is generated on Win10 + MATLAB 2018b (v95) by the 
> MATLAB Hadoop compiler tools, and airlinesmall.csv is uploaded to HDFS. 
> Hadoop runs the hadoop-mapreduce-examples-3.2.2.jar wordcount demo fine, 
> even with a jar compiled from source in a Win10 + Eclipse environment. 
> Please help; I have no idea what is going on.
>Reporter: cathonxiong
>Priority: Major
> Attachments: matlab_errorlog
>
>
> hadoop jar \
>     /usr/local/MATLAB/MATLAB_Runtime/v95/toolbox/mlhadoop/jar/a2.2.0/mwmapreduce.jar \
>     com.mathworks.hadoop.MWMapReduceDriver \
>     -D mw.mcrroot=/usr/local/MATLAB/MATLAB_Runtime/v95 \
>     /usr/local/MATLAB/MATLAB_Runtime/v95/maxArrivalDelay.ctf \
>     hdfs://hadoop.namenode:50070/user/matlab/datasets/airlinesmall.csv \
>     hdfs://hadoop.namenode:50070/user/matlab/results
> java.library.path: /usr/local/hadoop-3.2.2/lib/native
> HDFSCTFPath=hdfs://hadoop.namenode:8020/user/root/maxArrivalDelay/maxArrivalDelay.ctf
> Uploading CTF into distributed cache completed.
> mapred.child.env: MCR_CACHE_ROOT=/tmp,LD_LIBRARY_PATH=/usr/local/MATLAB/MATLAB_Runtime/v95/runtime/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/bin/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/sys/os/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/sys/opengl/lib/glnxa64
> mapred.child.java.opts: -Djava.library.path=/usr/local/MATLAB/MATLAB_Runtime/v95/runtime/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/bin/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/sys/os/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/sys/opengl/lib/glnxa64
> New java.library.path: /usr/local/hadoop-3.2.2/lib/native:/usr/local/MATLAB/MATLAB_Runtime/v95/runtime/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/bin/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/sys/os/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/sys/opengl/lib/glnxa64
> Using MATLAB mapper.
> Set input format class to: ChunkFileRecordReader.
> Using MATLAB reducer.
> Set outputformat class to: class org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
> Set map output key class to: class com.mathworks.hadoop.MxArrayWritable2
> Set map output value class to: class com.mathworks.hadoop.MxArrayWritable2
> Set reduce output key class to: class com.mathworks.hadoop.MxArrayWritable2
> Set reduce output value class to: class com.mathworks.hadoop.MxArrayWritable2
> *** run ***
> 2021-05-11 14:58:47,043 INFO client.RMProxy: Connecting to ResourceManager at hadoop.namenode/192.168.0.25:8032
> 2021-05-11 14:58:47,139 WARN net.NetUtils: Unable to wrap exception of type class org.apache.hadoop.ipc.RpcException: it has no (String) constructor
> java.lang.NoSuchMethodException: org.apache.hadoop.ipc.RpcException.<init>(java.lang.String)
>     at java.lang.Class.getConstructor0(Class.java:3082)
>     at java.lang.Class.getConstructor(Class.java:1825)
>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:835)
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:811)
>     at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1566)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1508)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1405)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>     at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:910)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>     at 

[jira] [Updated] (HDFS-16177) Bug fix for Util#receiveFile

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16177:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> Bug fix for Util#receiveFile
> 
>
> Key: HDFS-16177
> URL: https://issues.apache.org/jira/browse/HDFS-16177
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: download-fsimage.jpg
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The time taken to write the file was miscalculated in Util#receiveFile.
> !download-fsimage.jpg|width=578,height=134!
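The Jira does not spell out the exact miscalculation, but the usual correct pattern for timing a transfer is to capture a single monotonic start timestamp before the copy loop and diff against it once at the end. A hypothetical sketch (not the actual Util#receiveFile code):

```java
public class ReceiveFileTimingSketch {
    // Returns the elapsed wall time in milliseconds for a piece of work,
    // measured from a single monotonic start timestamp.
    static double timedWork(long sleepMillis) {
        long startTime = System.nanoTime();  // capture once, before the work
        try {
            Thread.sleep(sleepMillis);       // stand-in for the receive/copy loop
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return (System.nanoTime() - startTime) / 1_000_000.0;  // diff once, at the end
    }

    public static void main(String[] args) {
        System.out.println("took ~" + timedWork(50) + " ms");
    }
}
```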



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16198) Short circuit read leaks Slot objects when InvalidToken exception is thrown

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16198:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> Short circuit read leaks Slot objects when InvalidToken exception is thrown
> ---
>
> Key: HDFS-16198
> URL: https://issues.apache.org/jira/browse/HDFS-16198
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eungsop Yoo
>Assignee: Eungsop Yoo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 2.10.2, 3.3.2, 3.2.4
>
> Attachments: HDFS-16198.patch, screenshot-2.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> In secure mode, 'dfs.block.access.token.enable' should be set to 'true'. With 
> this configuration, a SecretManager.InvalidToken exception may be thrown if 
> the access token expires during a short-circuit read. The exception itself is 
> harmless because the failed read is retried, but it leaks 
> ShortCircuitShm.Slot objects. 
>  
> We found this problem in our secure HBase clusters: the number of open file 
> descriptors on RegionServers kept increasing while short-circuit reads were 
> in use. 
> !screenshot-2.png!
>  
> It was caused by the leakage of shared memory segments used by short circuit 
> reading.
> {code:java}
> [root ~]# lsof -p $(ps -ef | grep proc_regionserver | grep -v grep | awk 
> '{print $2}') | grep /dev/shm | wc -l
> 3925
> [root ~]# lsof -p $(ps -ef | grep proc_regionserver | grep -v grep | awk 
> '{print $2}') | grep /dev/shm | head -5
> java 86309 hbase DEL REG 0,19 2308279984 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_743473959
> java 86309 hbase DEL REG 0,19 2306359893 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_1594162967
> java 86309 hbase DEL REG 0,19 2305496758 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_2043027439
> java 86309 hbase DEL REG 0,19 2304784261 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_689571088
> java 86309 hbase DEL REG 0,19 2302621988 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_347008590 
> {code}
>  
> We finally found that the root cause of this is the leakage of 
> ShortCircuitShm.Slot.
>  
> The fix is trivial: just free the slot when the InvalidToken exception is thrown.
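The fix pattern can be sketched as follows. This is a simplified, hypothetical model of the slot bookkeeping; SlotLeakSketch and its methods are stand-ins, not the real ShortCircuitShm API:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SlotLeakSketch {
    // Stand-in for shared-memory slot bookkeeping (simplified: LIFO).
    static final Deque<Integer> allocatedSlots = new ArrayDeque<>();

    static int allocSlot() {
        int slot = allocatedSlots.size();
        allocatedSlots.push(slot);
        return slot;
    }

    static void freeSlot(int slot) {
        allocatedSlots.pop();  // return the slot to the pool
    }

    // Simulates requesting a short-circuit read whose access token has expired.
    static void requestShortCircuitRead(boolean tokenExpired) throws Exception {
        int slot = allocSlot();
        try {
            if (tokenExpired) {
                throw new Exception("InvalidToken: access token expired");
            }
        } catch (Exception e) {
            freeSlot(slot);  // the fix: free the slot before propagating
            throw e;
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            try {
                requestShortCircuitRead(true);
            } catch (Exception ignored) { }
        }
        // Without the freeSlot call in the catch block, 100 slots would leak here.
        System.out.println("leaked slots: " + allocatedSlots.size());
    }
}
```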



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16241) Standby close reconstruction thread

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16241:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> Standby close reconstruction thread
> ---
>
> Key: HDFS-16241
> URL: https://issues.apache.org/jira/browse/HDFS-16241
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: zhanghuazong
>Assignee: zhanghuazong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: HDFS-16241
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> If the "Reconstruction Queue Initializer" thread of the active namenode has 
> not stopped when the node transitions to standby, that thread should be 
> closed.
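A minimal sketch of the described fix, assuming the thread can simply be interrupted and joined on the active-to-standby transition (class and method names are hypothetical, not the NameNode's real ones):

```java
public class StopThreadOnTransitionSketch {
    static Thread reconstructionThread;

    static void startActiveServices() {
        reconstructionThread = new Thread(() -> {
            try {
                Thread.sleep(60_000);  // stand-in for long-running queue initialization
            } catch (InterruptedException e) {
                // interrupted: exit promptly
            }
        }, "Reconstruction Queue Initializer");
        reconstructionThread.start();
    }

    // The fix: stop the initializer thread when leaving the active state.
    static void stopActiveServices() {
        if (reconstructionThread != null) {
            reconstructionThread.interrupt();
            try {
                reconstructionThread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        startActiveServices();
        stopActiveServices();  // transition to standby
        System.out.println("alive: " + reconstructionThread.isAlive());
    }
}
```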



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16187) SnapshotDiff behaviour with Xattrs and Acls is not consistent across NN restarts with checkpointing

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16187:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> SnapshotDiff behaviour with Xattrs and Acls is not consistent across NN 
> restarts with checkpointing
> ---
>
> Key: HDFS-16187
> URL: https://issues.apache.org/jira/browse/HDFS-16187
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Srinivasu Majeti
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> The test below shows that the snapshot diff across snapshots is not 
> consistent for Xattrs (here an encryption zone sets the Xattr) across NN 
> restarts with a checkpointed FsImage.
> {code:java}
> @Test
> public void testEncryptionZonesWithSnapshots() throws Exception {
>   final Path snapshottable = new Path("/zones");
>   fsWrapper.mkdir(snapshottable, FsPermission.getDirDefault(),
>   true);
>   dfsAdmin.allowSnapshot(snapshottable);
>   dfsAdmin.createEncryptionZone(snapshottable, TEST_KEY, NO_TRASH);
>   fs.createSnapshot(snapshottable, "snap1");
>   SnapshotDiffReport report =
>   fs.getSnapshotDiffReport(snapshottable, "snap1", "");
>   Assert.assertEquals(0, report.getDiffList().size());
>   report =
>   fs.getSnapshotDiffReport(snapshottable, "snap1", "");
>   System.out.println(report);
>   Assert.assertEquals(0, report.getDiffList().size());
>   fs.setSafeMode(SafeModeAction.SAFEMODE_ENTER);
>   fs.saveNamespace();
>   fs.setSafeMode(SafeModeAction.SAFEMODE_LEAVE);
>   cluster.restartNameNode(true);
>   report =
>   fs.getSnapshotDiffReport(snapshottable, "snap1", "");
>   Assert.assertEquals(0, report.getDiffList().size());
> }{code}
> {code:java}
> Pre Restart:
> Difference between snapshot snap1 and current directory under directory 
> /zones:
> Post Restart:
> Difference between snapshot snap1 and current directory under directory 
> /zones:
> M .{code}
> The side effect of this behavior is that distcp with snapshot diff fails 
> with the error below, complaining that the target cluster has changed data.
> {code:java}
> WARN tools.DistCp: The target has been modified since snapshot x
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16182:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> numOfReplicas is given the wrong value in  
> BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with 
> Heterogeneous Storage  
> ---
>
> Key: HDFS-16182
> URL: https://issues.apache.org/jira/browse/HDFS-16182
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: HDFS-16182.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> In our HDFS cluster, we use heterogeneous storage to store data on SSD for 
> better performance. Sometimes, when the HDFS client transfers data in a 
> pipeline, it throws an IOException and exits. The exception log is below: 
> ```
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK],
>  
> DatanodeInfoWithStorage[dn03_ip:5004,DS-a388c067-76a4-4014-a16c-ccc49c8da77b,SSD],
>  
> DatanodeInfoWithStorage[dn04_ip:5004,DS-b81da262-0dd9-4567-a498-c516fab84fe0,SSD],
>  
> DatanodeInfoWithStorage[dn05_ip:5004,DS-34e3af2e-da80-46ac-938c-6a3218a646b9,SSD]],
>  
> original=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK]]).
>  The current failed datanode replacement policy is DEFAULT, and a client may 
> configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
> ```
> After investigating, I found that when an existing pipeline needs a new 
> datanode to transfer data, the client gets one additional datanode from the 
> namenode and checks that the number of datanodes is the original number + 1.
> ```
> ## DataStreamer$findNewDatanode
> if (nodes.length != original.length + 1) {
>  throw new IOException(
>  "Failed to replace a bad datanode on the existing pipeline "
>  + "due to no more good datanodes being available to try. "
>  + "(Nodes: current=" + Arrays.asList(nodes)
>  + ", original=" + Arrays.asList(original) + "). "
>  + "The current failed datanode replacement policy is "
>  + dfsClient.dtpReplaceDatanodeOnFailure
>  + ", and a client may configure this via '"
>  + BlockWrite.ReplaceDatanodeOnFailure.POLICY_KEY
>  + "' in its configuration.");
> }
> ```
> The root cause is that Namenode$getAdditionalDatanode returns multiple 
> datanodes, not one, in DataStreamer.addDatanode2ExistingPipeline. 
>  
> Maybe we can fix it in BlockPlacementPolicyDefault$chooseTarget: I think 
> numOfReplicas should not be assigned from requiredStorageTypes.
>  
>    
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16350) Datanode start time should be set after RPC server starts successfully

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16350:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> Datanode start time should be set after RPC server starts successfully
> --
>
> Key: HDFS-16350
> URL: https://issues.apache.org/jira/browse/HDFS-16350
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: Screenshot 2021-11-23 at 4.32.04 PM.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> We set the start time of the Datanode when the class is instantiated, but 
> ideally it should be set only after the RPC server has started and the RPC 
> handlers are initialized to serve client requests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16337) Show start time of Datanode on Web

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16337:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> Show start time of Datanode on Web
> --
>
> Key: HDFS-16337
> URL: https://issues.apache.org/jira/browse/HDFS-16337
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: image-2021-11-19-08-55-58-343.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Show _start time_ of Datanode on Web.
> !image-2021-11-19-08-55-58-343.png|width=540,height=155!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16352) return the real datanode numBlocks in #getDatanodeStorageReport

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16352:

Fix Version/s: 3.2.4

> return the real datanode numBlocks in #getDatanodeStorageReport
> ---
>
> Key: HDFS-16352
> URL: https://issues.apache.org/jira/browse/HDFS-16352
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.9
>
> Attachments: image-2021-11-23-22-04-06-131.png
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> #getDatanodeStorageReport returns an array of DatanodeStorageReport, each of 
> which contains a DatanodeInfo, but the numBlocks in DatanodeInfo is always 
> zero, which is confusing.
> !image-2021-11-23-22-04-06-131.png|width=683,height=338!
> Alternatively, we could return the real numBlocks in DatanodeInfo when 
> #getDatanodeStorageReport is called.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16430) Validate maximum blocks in EC group when adding an EC policy

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16430:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> Validate maximum blocks in EC group when adding an EC policy
> 
>
> Key: HDFS-16430
> URL: https://issues.apache.org/jira/browse/HDFS-16430
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ec, erasure-coding
>Affects Versions: 3.3.0, 3.3.1
>Reporter: daimin
>Assignee: daimin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.9
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> HDFS EC uses the last 4 bits of the block ID to store the block index within 
> an EC block group. Therefore the maximum number of blocks in an EC block 
> group is 2^4 = 16, as defined by HdfsServerConstants#MAX_BLOCKS_IN_GROUP.
> Currently there is no validation or warning when adding a bad EC policy with 
> numDataUnits + numParityUnits > 16. It only surfaces later as read/write 
> errors on EC files using that policy, which is not very straightforward for 
> users.
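The proposed validation can be sketched as below. The method name is hypothetical, but the 16-block limit follows from the 4 index bits described above:

```java
public class EcPolicyValidation {
    // 4 bits of the block ID hold the index within an EC block group.
    static final int MAX_BLOCKS_IN_GROUP = 1 << 4;  // 16

    // Reject a policy up front if its group would exceed the index space.
    static void validatePolicy(int numDataUnits, int numParityUnits) {
        int total = numDataUnits + numParityUnits;
        if (total > MAX_BLOCKS_IN_GROUP) {
            throw new IllegalArgumentException(
                "numDataUnits + numParityUnits = " + total
                + " exceeds maximum blocks in an EC group ("
                + MAX_BLOCKS_IN_GROUP + ")");
        }
    }

    public static void main(String[] args) {
        validatePolicy(6, 3);    // RS-6-3: accepted
        validatePolicy(10, 4);   // RS-10-4: accepted
        try {
            validatePolicy(14, 4);  // 18 > 16: rejected at add time
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```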



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16403) Improve FUSE IO performance by supporting FUSE parameter max_background

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16403:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> Improve FUSE IO performance by supporting FUSE parameter max_background
> ---
>
> Key: HDFS-16403
> URL: https://issues.apache.org/jira/browse/HDFS-16403
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fuse-dfs
>Affects Versions: 3.3.0, 3.3.1
>Reporter: daimin
>Assignee: daimin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.9
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> When examining FUSE IO performance on HDFS, we found that the number of 
> simultaneous IO requests is limited to a fixed value, such as 12. This 
> limitation makes IO performance on the FUSE client quite unacceptable. We did 
> some research and, inspired by the article [Performance and Resource 
> Utilization of FUSE User-Space File 
> Systems|https://dl.acm.org/doi/fullHtml/10.1145/3310148], found that the FUSE 
> parameter '{{{}max_background{}}}' determines the number of simultaneous IO 
> requests, and it is 12 by default.
> We add 'max_background' to the fuse_dfs mount options; the FUSE kernel takes 
> it into effect when an option value is given.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16437) ReverseXML processor doesn't accept XML files without the SnapshotDiffSection.

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16437:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> ReverseXML processor doesn't accept XML files without the SnapshotDiffSection.
> --
>
> Key: HDFS-16437
> URL: https://issues.apache.org/jira/browse/HDFS-16437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1, 3.3.0
>Reporter: yanbin.zhang
>Assignee: yanbin.zhang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> In a cluster environment without snapshots, converting the generated XML 
> back to an fsimage reports an error.
> {code:java}
> // code placeholder
> [test@test001 ~]$ hdfs oiv -p ReverseXML -i fsimage_0257220.xml 
> -o fsimage_0257220
> OfflineImageReconstructor failed: FSImage XML ended prematurely, without 
> including section(s) SnapshotDiffSection
> java.io.IOException: FSImage XML ended prematurely, without including 
> section(s) SnapshotDiffSection
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.processXml(OfflineImageReconstructor.java:1765)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.run(OfflineImageReconstructor.java:1842)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.run(OfflineImageViewerPB.java:211)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.main(OfflineImageViewerPB.java:149)
> 22/01/25 15:56:52 INFO util.ExitUtil: Exiting with status 1: ExitException 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11041) Unable to unregister FsDatasetState MBean if DataNode is shutdown twice

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-11041:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> Unable to unregister FsDatasetState MBean if DataNode is shutdown twice
> ---
>
> Key: HDFS-11041
> URL: https://issues.apache.org/jira/browse/HDFS-11041
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Trivial
> Fix For: 3.4.0, 2.10.2, 3.2.4, 3.3.3
>
> Attachments: HDFS-11041.01.patch, HDFS-11041.02.patch, 
> HDFS-11041.03.patch
>
>
> I saw error message like the following in some tests
> {noformat}
> 2016-10-21 04:09:03,900 [main] WARN  util.MBeans 
> (MBeans.java:unregister(114)) - Error unregistering 
> Hadoop:service=DataNode,name=FSDatasetState-33cd714c-0b1a-471f-8efe-f431d7d874bc
> javax.management.InstanceNotFoundException: 
> Hadoop:service=DataNode,name=FSDatasetState-33cd714c-0b1a-471f-8efe-f431d7d874bc
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
>   at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
>   at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:112)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:2127)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:2016)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1985)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1962)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1936)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1929)
>   at 
> org.apache.hadoop.hdfs.TestDatanodeReport.testDatanodeReport(TestDatanodeReport.java:144)
> {noformat}
> The test shuts down a datanode and then shuts down the cluster, which shuts 
> the datanode down twice. Resetting the FsDatasetSpi reference in DataNode to 
> null resolves the issue.
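The described fix, nulling out the reference so a second shutdown becomes a no-op, can be sketched as follows (a stand-in class, not the real DataNode code):

```java
public class IdempotentShutdownSketch {
    private Object mbeanHandle = new Object();  // stand-in for the MBean reference
    int unregisterCalls = 0;

    // Shutdown is made idempotent: unregister happens at most once because
    // the reference is nulled after the first call.
    void shutdown() {
        if (mbeanHandle != null) {
            unregisterCalls++;   // stand-in for MBeans.unregister(...)
            mbeanHandle = null;  // the fix: drop the reference after use
        }
    }

    public static void main(String[] args) {
        IdempotentShutdownSketch dn = new IdempotentShutdownSketch();
        dn.shutdown();  // direct datanode shutdown
        dn.shutdown();  // cluster shutdown hits the same node again
        System.out.println("unregister calls: " + dn.unregisterCalls);
    }
}
```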



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16428) Source path with storagePolicy cause wrong typeConsumed while rename

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16428:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> Source path with storagePolicy cause wrong typeConsumed while rename
> 
>
> Key: HDFS-16428
> URL: https://issues.apache.org/jira/browse/HDFS-16428
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: lei w
>Assignee: lei w
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
> Attachments: example.txt
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> When computing quota in a rename operation, we use the storage policy of the 
> target directory to compute the source's quota usage. This causes a wrong 
> typeConsumed value when the source path has its own storage policy set. I 
> provided a unit test to demonstrate this situation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15737) Don't remove datanodes from outOfServiceNodeBlocks while checking in DatanodeAdminManager

2022-05-31 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-15737:

Fix Version/s: (was: 2.10.2)

> Don't remove datanodes from outOfServiceNodeBlocks while checking in 
> DatanodeAdminManager
> -
>
> Key: HDFS-15737
> URL: https://issues.apache.org/jira/browse/HDFS-15737
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Ye Ni
>Assignee: Ye Ni
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> With CyclicIteration, removing an item while iterating causes either an 
> infinite loop or a ConcurrentModificationException.
> The item should instead be collected for later removal via
> {{toRemove.add(dn);}}
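The collect-then-remove pattern described above can be sketched generically (this is an illustration, not the actual DatanodeAdminManager code):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SafeRemovalSketch {
    // Remove all entries whose value is 0 without mutating the map while
    // iterating over it, which would throw ConcurrentModificationException
    // (or loop forever under a cyclic iterator).
    static void removeFinished(Map<String, Integer> outOfService) {
        List<String> toRemove = new ArrayList<>();
        for (Map.Entry<String, Integer> e : outOfService.entrySet()) {
            if (e.getValue() == 0) {
                toRemove.add(e.getKey());        // collect during iteration ...
            }
        }
        toRemove.forEach(outOfService::remove);  // ... remove after the loop
    }

    public static void main(String[] args) {
        Map<String, Integer> outOfService = new LinkedHashMap<>();
        outOfService.put("dn1", 1);
        outOfService.put("dn2", 0);  // finished; should be removed
        outOfService.put("dn3", 2);
        removeFinished(outOfService);
        System.out.println(outOfService.keySet());  // [dn1, dn3]
    }
}
```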



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13678) StorageType is incompatible when rolling upgrade to 2.6/2.6+ versions

2022-05-24 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17541340#comment-17541340
 ] 

Masatake Iwasaki commented on HDFS-13678:
-

updated the target version for preparing 2.10.2 release.

> StorageType is incompatible when rolling upgrade to 2.6/2.6+ versions
> -
>
> Key: HDFS-13678
> URL: https://issues.apache.org/jira/browse/HDFS-13678
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rolling upgrades
>Affects Versions: 2.5.0
>Reporter: Yiqun Lin
>Priority: Major
>
> In version 2.6.0, we supported more storage types in HDFS, implemented in 
> HDFS-6584. But this seems to be an incompatible change: when we do a rolling 
> upgrade of our cluster from 2.5.0 to 2.6.0, the following error is thrown.
> {noformat}
> 2018-06-14 11:43:39,246 ERROR [DataNode: 
> [[[DISK]file:/home/vipshop/hard_disk/dfs/, [DISK]file:/data1/dfs/, 
> [DISK]file:/data2/dfs/]] heartbeating to xx.xx.xx.xx:8022] 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in BPOfferService 
> for Block pool BP-670256553-xx.xx.xx.xx-1528795419404 (Datanode Uuid 
> ab150e05-fcb7-49ed-b8ba-f05c27593fee) service to xx.xx.xx.xx:8022
> java.lang.ArrayStoreException
>  at java.util.ArrayList.toArray(ArrayList.java:412)
>  at 
> java.util.Collections$UnmodifiableCollection.toArray(Collections.java:1034)
>  at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1030)
>  at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:836)
>  at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:146)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:566)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:664)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:835)
>  at java.lang.Thread.run(Thread.java:748)
> {noformat}
> The scenario is that the old DN fails to parse the StorageType it got from 
> the new NN. This error occurs while sending heartbeats to the NN, so blocks 
> won't be reported to the NN successfully, which leads to subsequent errors.
> Corresponding logic in 2.5.0:
> {code}
>   public static BlockCommand convert(BlockCommandProto blkCmd) {
> ...
> StorageType[][] targetStorageTypes = new StorageType[targetList.size()][];
> List targetStorageTypesList = 
> blkCmd.getTargetStorageTypesList();
> if (targetStorageTypesList.isEmpty()) { // missing storage types
>   for(int i = 0; i < targetStorageTypes.length; i++) {
> targetStorageTypes[i] = new StorageType[targets[i].length];
> Arrays.fill(targetStorageTypes[i], StorageType.DEFAULT);
>   }
> } else {
>   for(int i = 0; i < targetStorageTypes.length; i++) {
> List p = 
> targetStorageTypesList.get(i).getStorageTypesList();
> targetStorageTypes[i] = p.toArray(new StorageType[p.size()]);  < 
> error here
>   }
> }
> {code}
> But given the current logic, it would be better to return the default type 
> instead of throwing an exception, in case StorageType changes (new fields or 
> new types are added) in newer versions during a rolling upgrade.
> {code:java}
> public static StorageType convertStorageType(StorageTypeProto type) {
> switch(type) {
> case DISK:
>   return StorageType.DISK;
> case SSD:
>   return StorageType.SSD;
> case ARCHIVE:
>   return StorageType.ARCHIVE;
> case RAM_DISK:
>   return StorageType.RAM_DISK;
> case PROVIDED:
>   return StorageType.PROVIDED;
> default:
>   throw new IllegalStateException(
>   "BUG: StorageTypeProto not found, type=" + type);
> }
>   }
> {code}
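The suggested fall-back behavior can be sketched as below, using a stand-in enum and string-based proto names rather than the real StorageTypeProto:

```java
public class StorageTypeConvertSketch {
    // Storage types known to this (older) software version.
    enum StorageType { DISK, SSD, ARCHIVE, RAM_DISK }

    // Sketch of the suggested behavior: fall back to a default type for
    // proto values this version doesn't know, instead of throwing.
    static StorageType convertStorageType(String protoName) {
        try {
            return StorageType.valueOf(protoName);
        } catch (IllegalArgumentException e) {
            return StorageType.DISK;  // stand-in for StorageType.DEFAULT
        }
    }

    public static void main(String[] args) {
        System.out.println(convertStorageType("SSD"));       // known type
        System.out.println(convertStorageType("PROVIDED"));  // unknown to the old DN
    }
}
```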



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13678) StorageType is incompatible when rolling upgrade to 2.6/2.6+ versions

2022-05-24 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-13678:

Target Version/s: 2.10.3, 2.9.3  (was: 2.9.3, 2.10.2)

> StorageType is incompatible when rolling upgrade to 2.6/2.6+ versions
> -
>
> Key: HDFS-13678
> URL: https://issues.apache.org/jira/browse/HDFS-13678
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rolling upgrades
>Affects Versions: 2.5.0
>Reporter: Yiqun Lin
>Priority: Major
>
> In version 2.6.0, we supported more storage types in HDFS, implemented in 
> HDFS-6584. But this seems to be an incompatible change: when we do a rolling 
> upgrade of our cluster from 2.5.0 to 2.6.0, the following error is thrown.
> {noformat}
> 2018-06-14 11:43:39,246 ERROR [DataNode: 
> [[[DISK]file:/home/vipshop/hard_disk/dfs/, [DISK]file:/data1/dfs/, 
> [DISK]file:/data2/dfs/]] heartbeating to xx.xx.xx.xx:8022] 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in BPOfferService 
> for Block pool BP-670256553-xx.xx.xx.xx-1528795419404 (Datanode Uuid 
> ab150e05-fcb7-49ed-b8ba-f05c27593fee) service to xx.xx.xx.xx:8022
> java.lang.ArrayStoreException
>  at java.util.ArrayList.toArray(ArrayList.java:412)
>  at 
> java.util.Collections$UnmodifiableCollection.toArray(Collections.java:1034)
>  at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1030)
>  at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:836)
>  at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:146)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:566)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:664)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:835)
>  at java.lang.Thread.run(Thread.java:748)
> {noformat}
> The scenario is that the old DN fails to parse the StorageType it got from 
> the new NN. This error occurs while sending heartbeats to the NN, so blocks 
> won't be reported to the NN successfully, which leads to subsequent errors.
> Corresponding logic in 2.5.0:
> {code}
>   public static BlockCommand convert(BlockCommandProto blkCmd) {
> ...
> StorageType[][] targetStorageTypes = new StorageType[targetList.size()][];
> List<StorageTypesProto> targetStorageTypesList = 
> blkCmd.getTargetStorageTypesList();
> if (targetStorageTypesList.isEmpty()) { // missing storage types
>   for(int i = 0; i < targetStorageTypes.length; i++) {
> targetStorageTypes[i] = new StorageType[targets[i].length];
> Arrays.fill(targetStorageTypes[i], StorageType.DEFAULT);
>   }
> } else {
>   for(int i = 0; i < targetStorageTypes.length; i++) {
> List<StorageTypeProto> p = 
> targetStorageTypesList.get(i).getStorageTypesList();
> targetStorageTypes[i] = p.toArray(new StorageType[p.size()]); // <-- error here
>   }
> }
> {code}
> But given the current logic, it would be better to return the 
> default type instead of throwing an exception in case StorageType changes (new fields 
> added or new types) in newer versions during a rolling upgrade.
> {code:java}
> public static StorageType convertStorageType(StorageTypeProto type) {
> switch(type) {
> case DISK:
>   return StorageType.DISK;
> case SSD:
>   return StorageType.SSD;
> case ARCHIVE:
>   return StorageType.ARCHIVE;
> case RAM_DISK:
>   return StorageType.RAM_DISK;
> case PROVIDED:
>   return StorageType.PROVIDED;
> default:
>   throw new IllegalStateException(
>   "BUG: StorageTypeProto not found, type=" + type);
> }
>   }
> {code}
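The fallback the reporter proposes can be sketched as follows. This is a hedged illustration, not the actual HDFS patch: the enums are simplified mirrors of the HDFS types, and the hypothetical `FUTURE_TYPE` value stands in for a newer peer sending a storage type the old code does not know.

```java
// Hedged sketch of the proposed fallback, not the actual HDFS patch: unknown
// proto values map to a default storage type instead of throwing, so an old
// DN can keep heartbeating to a newer NN during a rolling upgrade.
// FUTURE_TYPE is a hypothetical value standing in for a type added later.
enum StorageTypeProto { DISK, SSD, ARCHIVE, RAM_DISK, PROVIDED, FUTURE_TYPE }

enum StorageType {
  DISK, SSD, ARCHIVE, RAM_DISK, PROVIDED;
  static final StorageType DEFAULT = DISK;
}

class TolerantConverter {
  static StorageType convert(StorageTypeProto type) {
    switch (type) {
      case DISK:     return StorageType.DISK;
      case SSD:      return StorageType.SSD;
      case ARCHIVE:  return StorageType.ARCHIVE;
      case RAM_DISK: return StorageType.RAM_DISK;
      case PROVIDED: return StorageType.PROVIDED;
      default:
        // Unknown value from a newer peer: degrade gracefully.
        return StorageType.DEFAULT;
    }
  }
}
```

With this shape, the heartbeat keeps working and only the unknown type is downgraded, rather than the whole BPOfferService failing.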



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14794) [SBN read] reportBadBlock is rejected by Observer.

2022-05-24 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17541339#comment-17541339
 ] 

Masatake Iwasaki commented on HDFS-14794:
-

updated the target version for preparing 2.10.2 release.

> [SBN read] reportBadBlock is rejected by Observer.
> --
>
> Key: HDFS-14794
> URL: https://issues.apache.org/jira/browse/HDFS-14794
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Priority: Major
>
> {{reportBadBlock}} is rejected by Observer via StandbyException
> {code}StandbyException: Operation category WRITE is not supported in state 
> observer{code}
> We should investigate what are the consequences of this and if we should 
> treat {{reportBadBlock}} as IBRs. Note that {{reportBadBlock}} is a part of 
> both {{ClientProtocol}} and {{DatanodeProtocol}}
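The rejection quoted above comes from the standard HA operation-category check. A minimal sketch of that check follows; the enum and method names are illustrative stand-ins, not the actual NameNode code. The point is that {{reportBadBlock}} is categorized as WRITE, which an Observer refuses.

```java
// Hedged sketch of the HA operation-category check that produces the quoted
// StandbyException-style message. Names are illustrative, not HDFS code.
enum OperationCategory { READ, WRITE }
enum HAState { ACTIVE, STANDBY, OBSERVER }

class OpChecker {
  static void checkOperation(HAState state, OperationCategory op) {
    // An Observer only serves READ; everything else needs the Active NN.
    boolean allowed = (state == HAState.ACTIVE)
        || (state == HAState.OBSERVER && op == OperationCategory.READ);
    if (!allowed) {
      throw new IllegalStateException("Operation category " + op
          + " is not supported in state " + state.name().toLowerCase());
    }
  }
}
```

Treating {{reportBadBlock}} like an IBR would mean routing it outside this check rather than classifying it as WRITE.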






[jira] [Updated] (HDFS-14794) [SBN read] reportBadBlock is rejected by Observer.

2022-05-24 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-14794:

Target Version/s: 2.10.3  (was: 2.10.2)

> [SBN read] reportBadBlock is rejected by Observer.
> --
>
> Key: HDFS-14794
> URL: https://issues.apache.org/jira/browse/HDFS-14794
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Priority: Major
>
> {{reportBadBlock}} is rejected by Observer via StandbyException
> {code}StandbyException: Operation category WRITE is not supported in state 
> observer{code}
> We should investigate what are the consequences of this and if we should 
> treat {{reportBadBlock}} as IBRs. Note that {{reportBadBlock}} is a part of 
> both {{ClientProtocol}} and {{DatanodeProtocol}}






[jira] [Updated] (HDFS-15037) Encryption Zone operations should not block other RPC calls while retrieving encryption keys.

2022-05-24 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-15037:

Target Version/s: 2.10.3  (was: 2.10.2)

> Encryption Zone operations should not block other RPC calls while retrieving 
> encryption keys.
> -
>
> Key: HDFS-15037
> URL: https://issues.apache.org/jira/browse/HDFS-15037
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, namenode
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Priority: Major
>
> I believe the intention was to avoid blocking other operations while 
> retrieving keys by holding only {{FSDirectory.dirLock}}. But in reality all 
> other operations enter {{FSNamesystemLock}} first, then {{dirLock}}, so they 
> are all blocked waiting for the key.
> We see substantial increase in RPC wait time ({{RpcQueueTimeAvgTime}}) on 
> NameNode when encryption operations are intermixed with regular workloads.
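One way to avoid the blocking described above is to perform the slow key retrieval before taking the global lock. A hedged sketch of that shape follows, with illustrative names (`KeyFetcher`, `fsLock`) rather than the actual FSNamesystem code:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// KeyFetcher stands in for the round trip to the KMS; in the real code
// this network call is the slow part that should not block other RPCs.
class KeyFetcher {
  static String slowFetchKey() {
    return "edek";
  }
}

class EncryptionZoneOp {
  static final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();

  // Proposed shape: do the slow fetch BEFORE taking the global lock, then
  // hold the lock only for the fast in-memory namespace update.
  static String createEncryptedFile() {
    String edek = KeyFetcher.slowFetchKey(); // no lock held here
    fsLock.writeLock().lock();
    try {
      return edek; // fast in-memory work only while locked
    } finally {
      fsLock.writeLock().unlock();
    }
  }
}
```

The design point is lock scope, not the lock type: other RPCs queue only for the in-memory update, never for the KMS round trip.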






[jira] [Commented] (HDFS-15357) Do not trust bad block reports from clients

2022-05-24 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17541336#comment-17541336
 ] 

Masatake Iwasaki commented on HDFS-15357:
-

updated the target version for preparing 2.10.2 release.

> Do not trust bad block reports from clients
> ---
>
> Key: HDFS-15357
> URL: https://issues.apache.org/jira/browse/HDFS-15357
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Priority: Major
>
> {{reportBadBlocks()}} is implemented by both ClientNamenodeProtocol and 
> DatanodeProtocol. When DFSClient is calling it, a faulty client can cause 
> data availability issues in a cluster. 
> In the past we had such an incident where a node with a faulty NIC was 
> randomly corrupting data. All clients running on the machine reported all 
> accessed blocks and all associated replicas to be corrupt. More recently, a 
> single faulty client process caused a small number of missing blocks. In 
> all cases, the actual data was fine.
> The bad block reports from clients shouldn't be trusted blindly. Instead, the 
> namenode should send a datanode command to verify the claim. A bonus would be 
> to keep the record for a while and ignore repeated reports from the same 
> nodes.
> At minimum, there should be an option to ignore bad block reports from 
> clients, perhaps after logging them. A very crude way would be to short-circuit 
> in {{ClientNamenodeProtocolServerSideTranslatorPB#reportBadBlocks()}}. A 
> more sophisticated way would be to check for the datanode user name in 
> {{FSNamesystem#reportBadBlocks()}} so that it can be easily logged, or 
> optionally do further processing.
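The minimal option described above (ignore but log client reports, trusting only datanode callers) can be sketched as follows; `BadBlockGate` and the user-name comparison are illustrative stand-ins, not the actual HDFS code:

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of the "at minimum" option: ignore, but log, bad block
// reports that do not come from the datanode user. Illustrative only.
class BadBlockGate {
  static final List<String> logged = new ArrayList<>();

  static boolean acceptReport(String callerUser, String datanodeUser) {
    if (!callerUser.equals(datanodeUser)) {
      logged.add("ignoring bad block report from client " + callerUser);
      return false; // short-circuit: do not mark replicas corrupt
    }
    return true; // datanode-originated report: process normally
  }
}
```

A verification-based design (send the datanode a command to re-check the replica) would replace the boolean with a deferred decision, but the gate above captures the minimal safety net.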






[jira] [Updated] (HDFS-15004) Refactor TestBalancer for faster execution.

2022-05-24 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-15004:

Target Version/s: 2.10.3  (was: 2.10.2)

> Refactor TestBalancer for faster execution.
> ---
>
> Key: HDFS-15004
> URL: https://issues.apache.org/jira/browse/HDFS-15004
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, test
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Priority: Major
>
> {{TestBalancer}} is a big test by itself, and it is also a part of many other 
> tests. Running these tests involves spinning up a {{MiniDFSCluster}} and 
> shutting it down for every test case, which is inefficient. Many of the test 
> cases can run using the same instance of {{MiniDFSCluster}}, but not all of 
> them. It would be good to refactor the tests to optimize their running time.
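The refactoring idea amounts to the standard shared-fixture pattern: start the cluster once per test class instead of once per test case. A hedged sketch follows, with a hypothetical `ExpensiveCluster` standing in for MiniDFSCluster and `beforeAll`/`afterAll` playing the role of JUnit's `@BeforeClass`/`@AfterClass`:

```java
// Hedged sketch of the shared-fixture pattern: the expensive cluster is
// started once and reused by every test case. Illustrative names only.
class ExpensiveCluster {
  static int startCount = 0;

  ExpensiveCluster() {
    startCount++; // expensive startup happens here, exactly once per class
  }

  void shutdown() { /* release resources */ }
}

class BalancerTestSuite {
  static ExpensiveCluster cluster; // shared across all test cases

  static void beforeAll() { cluster = new ExpensiveCluster(); } // @BeforeClass
  static void afterAll()  { cluster.shutdown(); }               // @AfterClass

  void testMoveBlocks()   { /* would exercise the balancer against cluster */ }
  void testExcludeNodes() { /* second case reuses the same cluster */ }
}
```

Test cases that mutate cluster-wide state (node counts, rack topology) still need their own cluster, which is why the issue notes "but not all of them".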






[jira] [Commented] (HDFS-15004) Refactor TestBalancer for faster execution.

2022-05-24 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17541338#comment-17541338
 ] 

Masatake Iwasaki commented on HDFS-15004:
-

updated the target version for preparing 2.10.2 release.

> Refactor TestBalancer for faster execution.
> ---
>
> Key: HDFS-15004
> URL: https://issues.apache.org/jira/browse/HDFS-15004
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, test
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Priority: Major
>
> {{TestBalancer}} is a big test by itself, and it is also a part of many other 
> tests. Running these tests involves spinning up a {{MiniDFSCluster}} and 
> shutting it down for every test case, which is inefficient. Many of the test 
> cases can run using the same instance of {{MiniDFSCluster}}, but not all of 
> them. It would be good to refactor the tests to optimize their running time.






[jira] [Commented] (HDFS-15037) Encryption Zone operations should not block other RPC calls while retrieving encryption keys.

2022-05-24 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17541337#comment-17541337
 ] 

Masatake Iwasaki commented on HDFS-15037:
-

updated the target version for preparing 2.10.2 release.

> Encryption Zone operations should not block other RPC calls while retrieving 
> encryption keys.
> -
>
> Key: HDFS-15037
> URL: https://issues.apache.org/jira/browse/HDFS-15037
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, namenode
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Priority: Major
>
> I believe the intention was to avoid blocking other operations while 
> retrieving keys by holding only {{FSDirectory.dirLock}}. But in reality all 
> other operations enter {{FSNamesystemLock}} first, then {{dirLock}}, so they 
> are all blocked waiting for the key.
> We see substantial increase in RPC wait time ({{RpcQueueTimeAvgTime}}) on 
> NameNode when encryption operations are intermixed with regular workloads.






[jira] [Updated] (HDFS-15357) Do not trust bad block reports from clients

2022-05-24 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-15357:

Target Version/s: 3.4.0, 2.10.3  (was: 3.4.0, 2.10.2)

> Do not trust bad block reports from clients
> ---
>
> Key: HDFS-15357
> URL: https://issues.apache.org/jira/browse/HDFS-15357
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Priority: Major
>
> {{reportBadBlocks()}} is implemented by both ClientNamenodeProtocol and 
> DatanodeProtocol. When DFSClient is calling it, a faulty client can cause 
> data availability issues in a cluster. 
> In the past we had such an incident where a node with a faulty NIC was 
> randomly corrupting data. All clients running on the machine reported all 
> accessed blocks and all associated replicas to be corrupt. More recently, a 
> single faulty client process caused a small number of missing blocks. In 
> all cases, the actual data was fine.
> The bad block reports from clients shouldn't be trusted blindly. Instead, the 
> namenode should send a datanode command to verify the claim. A bonus would be 
> to keep the record for a while and ignore repeated reports from the same 
> nodes.
> At minimum, there should be an option to ignore bad block reports from 
> clients, perhaps after logging them. A very crude way would be to short-circuit 
> in {{ClientNamenodeProtocolServerSideTranslatorPB#reportBadBlocks()}}. A 
> more sophisticated way would be to check for the datanode user name in 
> {{FSNamesystem#reportBadBlocks()}} so that it can be easily logged, or 
> optionally do further processing.






[jira] [Updated] (HDFS-16165) Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x

2022-05-24 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16165:

Target Version/s: 2.10.3  (was: 2.10.2)

> Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x
> --
>
> Key: HDFS-16165
> URL: https://issues.apache.org/jira/browse/HDFS-16165
> Project: Hadoop HDFS
>  Issue Type: Wish
> Environment: Can be reproduced in docker HDFS environment with 
> Kerberos 
> https://github.com/vdesabou/kafka-docker-playground/blob/93a93de293ad2f9bb22afb244f2d8729a178296e/connect/connect-hdfs2-sink/hdfs2-sink-ha-kerberos-repro-gss-exception.sh
>Reporter: Daniel Osvath
>Priority: Major
>  Labels: Confluent
>
> *Problem Description*
> For more than a year Apache Kafka Connect users have been running into a 
> Kerberos renewal issue that causes our HDFS2 connectors to fail. 
> We have been able to consistently reproduce the issue under high load with 40 
> connectors (threads) that use the library. When we try an alternate 
> workaround that uses the Kerberos keytab on the system, the connector operates 
> without issues.
> We identified the root cause to be a race condition bug in the Hadoop 2.x 
> library that causes the ticket renewal to fail with the error below:
> {code:java}
> Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
>  at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
> {code}
> We reached the conclusion about the root cause once we tried the same 
> environment (40 connectors) with Hadoop 3.x and our HDFS3 connectors, and they 
> operated without renewal issues. Additionally, seeing that the synchronization 
> issue has been fixed in the newer Hadoop 3.x releases confirmed our 
> hypothesis about the root cause.
> There are many changes in the HDFS 3 
> [UserGroupInformation.java|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java]
>  related to UGI synchronization, done as part of 
> https://issues.apache.org/jira/browse/HADOOP-9747, and those changes suggest 
> some race conditions were happening in the older version, i.e. HDFS 2.x, which 
> would explain why we can reproduce the problem with HDFS 2.
> For example (among others):
> {code:java}
>   private void relogin(HadoopLoginContext login, boolean ignoreLastLoginTime)
>   throws IOException {
> // ensure the relogin is atomic to avoid leaving credentials in an
> // inconsistent state.  prevents other ugi instances, SASL, and SPNEGO
> // from accessing or altering credentials during the relogin.
> synchronized(login.getSubjectLock()) {
>   // another racing thread may have beat us to the relogin.
>   if (login == getLogin()) {
> unprotectedRelogin(login, ignoreLastLoginTime);
>   }
> }
>   }
> {code}
> All those changes were not backported to Hadoop 2.x (our HDFS2 connector uses 
> 2.10.1), on which several CDH distributions are based. 
> *Request*
> We would like to ask for the synchronization fix to be backported to Hadoop 
> 2.x so that our users can operate without issues. 
> *Impact*
> The older 2.x Hadoop version is used by our HDFS connector, which is used in 
> production by our community. Currently, the issue causes our HDFS connector 
> to fail, as it is unable to recover and renew the ticket at a later point. 
> Having the backported fix would allow our users to operate without issues 
> that require manual intervention every week (or few days in some cases). The 
> only workaround available to community for the issue is to run a command or 
> restart their workers. 
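The synchronization fix quoted earlier follows the double-checked relogin pattern. A generic, hedged sketch follows, with `LoginState` as an illustrative stand-in for HadoopLoginContext:

```java
// Hedged sketch of the double-checked relogin pattern (HADOOP-9747 style):
// the relogin runs under a lock, and each thread re-checks whether a racing
// thread already replaced the login before redoing the expensive work.
class LoginState {
  static volatile LoginState current = new LoginState();
  static int reloginCount = 0;
  static final Object subjectLock = new Object();

  static void relogin(LoginState observed) {
    synchronized (subjectLock) {
      // another racing thread may have beaten us to the relogin
      if (observed == current) {
        reloginCount++;
        current = new LoginState();
      }
    }
  }
}
```

Without the `observed == current` re-check, two threads that both saw an expired login would each perform a relogin, which is the kind of race that can leave credentials in an inconsistent state.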






[jira] [Commented] (HDFS-16165) Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x

2022-05-24 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17541334#comment-17541334
 ] 

Masatake Iwasaki commented on HDFS-16165:
-

updated the target version for preparing 2.10.2 release.

> Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x
> --
>
> Key: HDFS-16165
> URL: https://issues.apache.org/jira/browse/HDFS-16165
> Project: Hadoop HDFS
>  Issue Type: Wish
> Environment: Can be reproduced in docker HDFS environment with 
> Kerberos 
> https://github.com/vdesabou/kafka-docker-playground/blob/93a93de293ad2f9bb22afb244f2d8729a178296e/connect/connect-hdfs2-sink/hdfs2-sink-ha-kerberos-repro-gss-exception.sh
>Reporter: Daniel Osvath
>Priority: Major
>  Labels: Confluent
>
> *Problem Description*
> For more than a year Apache Kafka Connect users have been running into a 
> Kerberos renewal issue that causes our HDFS2 connectors to fail. 
> We have been able to consistently reproduce the issue under high load with 40 
> connectors (threads) that use the library. When we try an alternate 
> workaround that uses the Kerberos keytab on the system, the connector operates 
> without issues.
> We identified the root cause to be a race condition bug in the Hadoop 2.x 
> library that causes the ticket renewal to fail with the error below:
> {code:java}
> Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
>  at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
> {code}
> We reached the conclusion about the root cause once we tried the same 
> environment (40 connectors) with Hadoop 3.x and our HDFS3 connectors, and they 
> operated without renewal issues. Additionally, seeing that the synchronization 
> issue has been fixed in the newer Hadoop 3.x releases confirmed our 
> hypothesis about the root cause.
> There are many changes in the HDFS 3 
> [UserGroupInformation.java|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java]
>  related to UGI synchronization, done as part of 
> https://issues.apache.org/jira/browse/HADOOP-9747, and those changes suggest 
> some race conditions were happening in the older version, i.e. HDFS 2.x, which 
> would explain why we can reproduce the problem with HDFS 2.
> For example (among others):
> {code:java}
>   private void relogin(HadoopLoginContext login, boolean ignoreLastLoginTime)
>   throws IOException {
> // ensure the relogin is atomic to avoid leaving credentials in an
> // inconsistent state.  prevents other ugi instances, SASL, and SPNEGO
> // from accessing or altering credentials during the relogin.
> synchronized(login.getSubjectLock()) {
>   // another racing thread may have beat us to the relogin.
>   if (login == getLogin()) {
> unprotectedRelogin(login, ignoreLastLoginTime);
>   }
> }
>   }
> {code}
> All those changes were not backported to Hadoop 2.x (our HDFS2 connector uses 
> 2.10.1), on which several CDH distributions are based. 
> *Request*
> We would like to ask for the synchronization fix to be backported to Hadoop 
> 2.x so that our users can operate without issues. 
> *Impact*
> The older 2.x Hadoop version is used by our HDFS connector, which is used in 
> production by our community. Currently, the issue causes our HDFS connector 
> to fail, as it is unable to recover and renew the ticket at a later point. 
> Having the backported fix would allow our users to operate without issues 
> that require manual intervention every week (or few days in some cases). The 
> only workaround available to community for the issue is to run a command or 
> restart their workers. 






[jira] [Commented] (HDFS-14277) [SBN read] Observer benchmark results

2022-05-24 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17541287#comment-17541287
 ] 

Masatake Iwasaki commented on HDFS-14277:
-

I updated the target version and priority for preparing 2.10.3.

> [SBN read] Observer benchmark results
> -
>
> Key: HDFS-14277
> URL: https://issues.apache.org/jira/browse/HDFS-14277
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: ha, namenode
>Affects Versions: 2.10.0, 3.3.0
> Environment: Hardware: 4-node cluster, each node has 4 core, Xeon 
> 2.5Ghz, 25GB memory.
> Software: CentOS 7.4, CDH 6.0 + Consistent Reads from Standby, Kerberos, SSL, 
> RPC encryption + Data Transfer Encryption, Cloudera Navigator.
>Reporter: Wei-Chiu Chuang
>Priority: Major
> Attachments: Observer profiler.png, Screen Shot 2019-02-14 at 
> 11.50.37 AM.png, observer RPC queue processing time.png
>
>
> Ran a few benchmarks and profiler (VisualVM) today on an Observer-enabled 
> cluster. Would like to share the results with the community. The cluster has 
> 1 Observer node.
> h2. NNThroughputBenchmark
> Generate 1 million files and send fileStatus RPCs.
> {code:java}
> hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs 
>   -op fileStatus -threads 100 -files 100 -useExisting 
> -keepResults
> {code}
> h3. Kerberos, SSL, RPC encryption, Data Transfer Encryption enabled:
> ||Node||fileStatus (Ops per sec)||
> |Active NameNode|4865|
> |Observer|3996|
> h3. Kerberos, SSL:
> ||Node||fileStatus (Ops per sec)||
> |Active NameNode|7078|
> |Observer|6459|
> Observation:
>  * Due to the edit tailing overhead, the Observer node consumes 30% CPU 
> utilization even when the cluster is idle.
>  * While Active NN has less than 1ms RPC processing time, Observer node has > 
> 5ms RPC processing time. I am still looking for the source of the longer 
> processing time. The longer RPC processing time may be the cause for the 
> performance degradation compared to that of Active NN. Note the cluster has 
> Cloudera Navigator installed which adds additional overhead to RPC processing 
> time.
>  * {{GlobalStateIdContext#isCoordinatedCall()}} pops up as one of the top 
> hotspots in the profiler. 
>  






[jira] [Updated] (HDFS-14277) [SBN read] Observer benchmark results

2022-05-24 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-14277:

Target Version/s: 2.10.3  (was: 2.10.2)

> [SBN read] Observer benchmark results
> -
>
> Key: HDFS-14277
> URL: https://issues.apache.org/jira/browse/HDFS-14277
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: ha, namenode
>Affects Versions: 2.10.0, 3.3.0
> Environment: Hardware: 4-node cluster, each node has 4 core, Xeon 
> 2.5Ghz, 25GB memory.
> Software: CentOS 7.4, CDH 6.0 + Consistent Reads from Standby, Kerberos, SSL, 
> RPC encryption + Data Transfer Encryption, Cloudera Navigator.
>Reporter: Wei-Chiu Chuang
>Priority: Blocker
> Attachments: Observer profiler.png, Screen Shot 2019-02-14 at 
> 11.50.37 AM.png, observer RPC queue processing time.png
>
>
> Ran a few benchmarks and profiler (VisualVM) today on an Observer-enabled 
> cluster. Would like to share the results with the community. The cluster has 
> 1 Observer node.
> h2. NNThroughputBenchmark
> Generate 1 million files and send fileStatus RPCs.
> {code:java}
> hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs 
>   -op fileStatus -threads 100 -files 100 -useExisting 
> -keepResults
> {code}
> h3. Kerberos, SSL, RPC encryption, Data Transfer Encryption enabled:
> ||Node||fileStatus (Ops per sec)||
> |Active NameNode|4865|
> |Observer|3996|
> h3. Kerberos, SSL:
> ||Node||fileStatus (Ops per sec)||
> |Active NameNode|7078|
> |Observer|6459|
> Observation:
>  * Due to the edit tailing overhead, the Observer node consumes 30% CPU 
> utilization even when the cluster is idle.
>  * While Active NN has less than 1ms RPC processing time, Observer node has > 
> 5ms RPC processing time. I am still looking for the source of the longer 
> processing time. The longer RPC processing time may be the cause for the 
> performance degradation compared to that of Active NN. Note the cluster has 
> Cloudera Navigator installed which adds additional overhead to RPC processing 
> time.
>  * {{GlobalStateIdContext#isCoordinatedCall()}} pops up as one of the top 
> hotspots in the profiler. 
>  






[jira] [Updated] (HDFS-14277) [SBN read] Observer benchmark results

2022-05-24 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-14277:

Priority: Major  (was: Blocker)

> [SBN read] Observer benchmark results
> -
>
> Key: HDFS-14277
> URL: https://issues.apache.org/jira/browse/HDFS-14277
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: ha, namenode
>Affects Versions: 2.10.0, 3.3.0
> Environment: Hardware: 4-node cluster, each node has 4 core, Xeon 
> 2.5Ghz, 25GB memory.
> Software: CentOS 7.4, CDH 6.0 + Consistent Reads from Standby, Kerberos, SSL, 
> RPC encryption + Data Transfer Encryption, Cloudera Navigator.
>Reporter: Wei-Chiu Chuang
>Priority: Major
> Attachments: Observer profiler.png, Screen Shot 2019-02-14 at 
> 11.50.37 AM.png, observer RPC queue processing time.png
>
>
> Ran a few benchmarks and profiler (VisualVM) today on an Observer-enabled 
> cluster. Would like to share the results with the community. The cluster has 
> 1 Observer node.
> h2. NNThroughputBenchmark
> Generate 1 million files and send fileStatus RPCs.
> {code:java}
> hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs 
>   -op fileStatus -threads 100 -files 100 -useExisting 
> -keepResults
> {code}
> h3. Kerberos, SSL, RPC encryption, Data Transfer Encryption enabled:
> ||Node||fileStatus (Ops per sec)||
> |Active NameNode|4865|
> |Observer|3996|
> h3. Kerberos, SSL:
> ||Node||fileStatus (Ops per sec)||
> |Active NameNode|7078|
> |Observer|6459|
> Observation:
>  * Due to the edit tailing overhead, the Observer node consumes 30% CPU 
> utilization even when the cluster is idle.
>  * While Active NN has less than 1ms RPC processing time, Observer node has > 
> 5ms RPC processing time. I am still looking for the source of the longer 
> processing time. The longer RPC processing time may be the cause for the 
> performance degradation compared to that of Active NN. Note the cluster has 
> Cloudera Navigator installed which adds additional overhead to RPC processing 
> time.
>  * {{GlobalStateIdContext#isCoordinatedCall()}} pops up as one of the top 
> hotspots in the profiler. 
>  






[jira] [Commented] (HDFS-12548) HDFS Jenkins build is unstable on branch-2

2022-05-16 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537599#comment-17537599
 ] 

Masatake Iwasaki commented on HDFS-12548:
-

updated the target version for preparing the 2.10.2 release and reduced the priority.

> HDFS Jenkins build is unstable on branch-2
> --
>
> Key: HDFS-12548
> URL: https://issues.apache.org/jira/browse/HDFS-12548
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.9.0
>Reporter: Rushabh Shah
>Priority: Major
>
> Feel free to move the ticket to another project (e.g. infra).
> Recently I attached a branch-2 patch while working on one jira 
> [HDFS-12386|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180676=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180676]
> There were at least 100 failed and timed-out tests. I am sure they are not 
> related to my patch.
> Also I came across another jira which was just a javadoc-related change, and 
> there were around 100 failed tests.
> Below are the details for pre-commits that failed in branch-2
> 1 [HDFS-12386 attempt 
> 1|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180069=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180069]
> {noformat}
> Ran on slave: asf912.gq1.ygridcore.net/H12
> Failed with following error message:
> Build timed out (after 300 minutes). Marking the build as aborted.
> Build was aborted
> Performing Post build task...
> {noformat}
> 2. [HDFS-12386 attempt 
> 2|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180676=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180676]
> {noformat}
> Ran on slave: asf900.gq1.ygridcore.net
> Failed with following error message:
> FATAL: command execution failed
> Command close created at
>   at hudson.remoting.Command.<init>(Command.java:60)
>   at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1123)
>   at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1121)
>   at hudson.remoting.Channel.close(Channel.java:1281)
>   at hudson.remoting.Channel.close(Channel.java:1263)
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1128)
> Caused: hudson.remoting.Channel$OrderlyShutdown
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1129)
>   at hudson.remoting.Channel$1.handle(Channel.java:527)
>   at 
> hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:83)
> Caused: java.io.IOException: Backing channel 'H0' is disconnected.
>   at 
> hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:192)
>   at 
> hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:257)
>   at com.sun.proxy.$Proxy125.isAlive(Unknown Source)
>   at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1043)
>   at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1035)
>   at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:155)
>   at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109)
>   at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
>   at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
>   at 
> hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:735)
>   at hudson.model.Build$BuildExecution.build(Build.java:206)
>   at hudson.model.Build$BuildExecution.doRun(Build.java:163)
>   at 
> hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:490)
>   at hudson.model.Run.execute(Run.java:1735)
>   at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
>   at hudson.model.ResourceController.execute(ResourceController.java:97)
>   at hudson.model.Executor.run(Executor.java:405)
> {noformat}
> 3. [HDFS-12531 attempt 
> 1|https://issues.apache.org/jira/browse/HDFS-12531?focusedCommentId=16176493&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16176493]
> {noformat}
> Ran on slave:  asf911.gq1.ygridcore.net
> Failed with following error message:
> FATAL: command execution failed
> Command close created at
>   at hudson.remoting.Command.<init>(Command.java:60)
>   at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1123)
>   at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1121)
>   at hudson.remoting.Channel.close(Channel.java:1281)
>   at hudson.remoting.Channel.close(Channel.java:1263)
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1128)
> Caused: hudson.remoting.Channel$OrderlyShutdown
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1129)
>   at 

[jira] [Updated] (HDFS-12548) HDFS Jenkins build is unstable on branch-2

2022-05-16 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-12548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-12548:

Target Version/s: 2.10.3  (was: 2.10.2)

> HDFS Jenkins build is unstable on branch-2
> --
>
> Key: HDFS-12548
> URL: https://issues.apache.org/jira/browse/HDFS-12548
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.9.0
>Reporter: Rushabh Shah
>Priority: Major
>
> Feel free to move the ticket to another project (e.g. infra).
> Recently I attached a branch-2 patch while working on 
> [HDFS-12386|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180676&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180676].
> There were at least 100 failed and timed-out tests. I am sure they are not 
> related to my patch.
> I also came across another jira that was just a javadoc-related change, and 
> there were around 100 failed tests.
> Below are the details for pre-commits that failed in branch-2
> 1. [HDFS-12386 attempt 
> 1|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180069&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180069]
> {noformat}
> Ran on slave: asf912.gq1.ygridcore.net/H12
> Failed with following error message:
> Build timed out (after 300 minutes). Marking the build as aborted.
> Build was aborted
> Performing Post build task...
> {noformat}
> 2. [HDFS-12386 attempt 
> 2|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180676&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180676]
> {noformat}
> Ran on slave: asf900.gq1.ygridcore.net
> Failed with following error message:
> FATAL: command execution failed
> Command close created at
>   at hudson.remoting.Command.<init>(Command.java:60)
>   at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1123)
>   at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1121)
>   at hudson.remoting.Channel.close(Channel.java:1281)
>   at hudson.remoting.Channel.close(Channel.java:1263)
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1128)
> Caused: hudson.remoting.Channel$OrderlyShutdown
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1129)
>   at hudson.remoting.Channel$1.handle(Channel.java:527)
>   at 
> hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:83)
> Caused: java.io.IOException: Backing channel 'H0' is disconnected.
>   at 
> hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:192)
>   at 
> hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:257)
>   at com.sun.proxy.$Proxy125.isAlive(Unknown Source)
>   at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1043)
>   at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1035)
>   at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:155)
>   at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109)
>   at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
>   at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
>   at 
> hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:735)
>   at hudson.model.Build$BuildExecution.build(Build.java:206)
>   at hudson.model.Build$BuildExecution.doRun(Build.java:163)
>   at 
> hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:490)
>   at hudson.model.Run.execute(Run.java:1735)
>   at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
>   at hudson.model.ResourceController.execute(ResourceController.java:97)
>   at hudson.model.Executor.run(Executor.java:405)
> {noformat}
> 3. [HDFS-12531 attempt 
> 1|https://issues.apache.org/jira/browse/HDFS-12531?focusedCommentId=16176493&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16176493]
> {noformat}
> Ran on slave:  asf911.gq1.ygridcore.net
> Failed with following error message:
> FATAL: command execution failed
> Command close created at
>   at hudson.remoting.Command.<init>(Command.java:60)
>   at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1123)
>   at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1121)
>   at hudson.remoting.Channel.close(Channel.java:1281)
>   at hudson.remoting.Channel.close(Channel.java:1263)
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1128)
> Caused: hudson.remoting.Channel$OrderlyShutdown
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1129)
>   at hudson.remoting.Channel$1.handle(Channel.java:527)
>   at 
> 

[jira] [Updated] (HDFS-12548) HDFS Jenkins build is unstable on branch-2

2022-05-16 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-12548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-12548:

Priority: Major  (was: Critical)

> HDFS Jenkins build is unstable on branch-2
> --
>
> Key: HDFS-12548
> URL: https://issues.apache.org/jira/browse/HDFS-12548
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.9.0
>Reporter: Rushabh Shah
>Priority: Major
>
> Feel free to move the ticket to another project (e.g. infra).
> Recently I attached a branch-2 patch while working on 
> [HDFS-12386|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180676&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180676].
> There were at least 100 failed and timed-out tests. I am sure they are not 
> related to my patch.
> I also came across another jira that was just a javadoc-related change, and 
> there were around 100 failed tests.
> Below are the details for pre-commits that failed in branch-2
> 1. [HDFS-12386 attempt 
> 1|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180069&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180069]
> {noformat}
> Ran on slave: asf912.gq1.ygridcore.net/H12
> Failed with following error message:
> Build timed out (after 300 minutes). Marking the build as aborted.
> Build was aborted
> Performing Post build task...
> {noformat}
> 2. [HDFS-12386 attempt 
> 2|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180676&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180676]
> {noformat}
> Ran on slave: asf900.gq1.ygridcore.net
> Failed with following error message:
> FATAL: command execution failed
> Command close created at
>   at hudson.remoting.Command.<init>(Command.java:60)
>   at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1123)
>   at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1121)
>   at hudson.remoting.Channel.close(Channel.java:1281)
>   at hudson.remoting.Channel.close(Channel.java:1263)
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1128)
> Caused: hudson.remoting.Channel$OrderlyShutdown
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1129)
>   at hudson.remoting.Channel$1.handle(Channel.java:527)
>   at 
> hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:83)
> Caused: java.io.IOException: Backing channel 'H0' is disconnected.
>   at 
> hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:192)
>   at 
> hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:257)
>   at com.sun.proxy.$Proxy125.isAlive(Unknown Source)
>   at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1043)
>   at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1035)
>   at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:155)
>   at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109)
>   at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
>   at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
>   at 
> hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:735)
>   at hudson.model.Build$BuildExecution.build(Build.java:206)
>   at hudson.model.Build$BuildExecution.doRun(Build.java:163)
>   at 
> hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:490)
>   at hudson.model.Run.execute(Run.java:1735)
>   at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
>   at hudson.model.ResourceController.execute(ResourceController.java:97)
>   at hudson.model.Executor.run(Executor.java:405)
> {noformat}
> 3. [HDFS-12531 attempt 
> 1|https://issues.apache.org/jira/browse/HDFS-12531?focusedCommentId=16176493&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16176493]
> {noformat}
> Ran on slave:  asf911.gq1.ygridcore.net
> Failed with following error message:
> FATAL: command execution failed
> Command close created at
>   at hudson.remoting.Command.<init>(Command.java:60)
>   at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1123)
>   at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1121)
>   at hudson.remoting.Channel.close(Channel.java:1281)
>   at hudson.remoting.Channel.close(Channel.java:1263)
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1128)
> Caused: hudson.remoting.Channel$OrderlyShutdown
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1129)
>   at hudson.remoting.Channel$1.handle(Channel.java:527)
>   at 
> 

[jira] [Updated] (HDFS-14630) Configuration.getTimeDurationHelper() should not log time unit warning in info log.

2022-03-19 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-14630:

Fix Version/s: 3.2.3
   (was: 3.2.4)

> Configuration.getTimeDurationHelper() should not log time unit warning in 
> info log.
> ---
>
> Key: HDFS-14630
> URL: https://issues.apache.org/jira/browse/HDFS-14630
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: Hemanth Boyina
>Priority: Minor
> Fix For: 3.3.0, 3.2.3
>
> Attachments: HDFS-14630.001.patch, HDFS-14630.patch
>
>
> To solve the [HDFS-12920|https://issues.apache.org/jira/browse/HDFS-12920] issue 
> we configured "dfs.client.datanode-restart.timeout" without a time unit. Now the 
> log file is full of
> {noformat}
> 2019-06-22 20:13:14,605 | INFO  | pool-12-thread-1 | No unit for 
> dfs.client.datanode-restart.timeout(30) assuming SECONDS 
> org.apache.hadoop.conf.Configuration.logDeprecation(Configuration.java:1409){noformat}
> There is no need to log this; just document the behavior in the property description.
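The suffix handling behind that log line can be illustrated with a small self-contained sketch. This is only an approximation of what `org.apache.hadoop.conf.Configuration#getTimeDuration` does; the class and method names (`TimeDurationSketch`, `parseMillis`) are invented for this example. A value with an explicit suffix such as "30s" never hits the assumed-default-unit path, which is exactly the case the INFO message complains about.

```java
import java.util.concurrent.TimeUnit;

public class TimeDurationSketch {
    // Rough approximation of Hadoop's unit-suffix handling. When the value
    // carries no suffix, the caller-supplied default unit is assumed -- the
    // situation in which Hadoop logs "No unit for ... assuming SECONDS".
    static long parseMillis(String value, TimeUnit defaultUnit) {
        value = value.trim();
        TimeUnit unit = defaultUnit;
        String number = value;
        if (value.endsWith("ms")) {
            unit = TimeUnit.MILLISECONDS;
            number = value.substring(0, value.length() - 2);
        } else if (value.endsWith("s")) {
            unit = TimeUnit.SECONDS;
            number = value.substring(0, value.length() - 1);
        } else if (value.endsWith("m")) {
            unit = TimeUnit.MINUTES;
            number = value.substring(0, value.length() - 1);
        } else if (value.endsWith("h")) {
            unit = TimeUnit.HOURS;
            number = value.substring(0, value.length() - 1);
        }
        return unit.toMillis(Long.parseLong(number));
    }

    public static void main(String[] args) {
        // "30s" parses via its explicit suffix; bare "30" needs the default unit.
        System.out.println(parseMillis("30s", TimeUnit.SECONDS)); // 30000
        System.out.println(parseMillis("30", TimeUnit.SECONDS));  // 30000
    }
}
```

Writing the value with an explicit suffix (e.g. "30s") sidesteps the assumed-unit branch entirely, so no per-read logging is needed.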



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14630) Configuration.getTimeDurationHelper() should not log time unit warning in info log.

2022-03-19 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509376#comment-17509376
 ] 

Masatake Iwasaki commented on HDFS-14630:
-

and branch-3.2.3.

> Configuration.getTimeDurationHelper() should not log time unit warning in 
> info log.
> ---
>
> Key: HDFS-14630
> URL: https://issues.apache.org/jira/browse/HDFS-14630
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: Hemanth Boyina
>Priority: Minor
> Fix For: 3.3.0, 3.2.4
>
> Attachments: HDFS-14630.001.patch, HDFS-14630.patch
>
>
> To solve the [HDFS-12920|https://issues.apache.org/jira/browse/HDFS-12920] issue 
> we configured "dfs.client.datanode-restart.timeout" without a time unit. Now the 
> log file is full of
> {noformat}
> 2019-06-22 20:13:14,605 | INFO  | pool-12-thread-1 | No unit for 
> dfs.client.datanode-restart.timeout(30) assuming SECONDS 
> org.apache.hadoop.conf.Configuration.logDeprecation(Configuration.java:1409){noformat}
> There is no need to log this; just document the behavior in the property description.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14630) Configuration.getTimeDurationHelper() should not log time unit warning in info log.

2022-03-18 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17508676#comment-17508676
 ] 

Masatake Iwasaki commented on HDFS-14630:
-

cherry-picked this to branch-3.2.

https://lists.apache.org/thread/2pn7go8wx6w8tftwf3gotjh7rvzndv6z

> Configuration.getTimeDurationHelper() should not log time unit warning in 
> info log.
> ---
>
> Key: HDFS-14630
> URL: https://issues.apache.org/jira/browse/HDFS-14630
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: Hemanth Boyina
>Priority: Minor
> Fix For: 3.3.0, 3.2.4
>
> Attachments: HDFS-14630.001.patch, HDFS-14630.patch
>
>
> To solve the [HDFS-12920|https://issues.apache.org/jira/browse/HDFS-12920] issue 
> we configured "dfs.client.datanode-restart.timeout" without a time unit. Now the 
> log file is full of
> {noformat}
> 2019-06-22 20:13:14,605 | INFO  | pool-12-thread-1 | No unit for 
> dfs.client.datanode-restart.timeout(30) assuming SECONDS 
> org.apache.hadoop.conf.Configuration.logDeprecation(Configuration.java:1409){noformat}
> There is no need to log this; just document the behavior in the property description.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14630) Configuration.getTimeDurationHelper() should not log time unit warning in info log.

2022-03-18 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-14630:

Fix Version/s: 3.2.4

> Configuration.getTimeDurationHelper() should not log time unit warning in 
> info log.
> ---
>
> Key: HDFS-14630
> URL: https://issues.apache.org/jira/browse/HDFS-14630
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: Hemanth Boyina
>Priority: Minor
> Fix For: 3.3.0, 3.2.4
>
> Attachments: HDFS-14630.001.patch, HDFS-14630.patch
>
>
> To solve the [HDFS-12920|https://issues.apache.org/jira/browse/HDFS-12920] issue 
> we configured "dfs.client.datanode-restart.timeout" without a time unit. Now the 
> log file is full of
> {noformat}
> 2019-06-22 20:13:14,605 | INFO  | pool-12-thread-1 | No unit for 
> dfs.client.datanode-restart.timeout(30) assuming SECONDS 
> org.apache.hadoop.conf.Configuration.logDeprecation(Configuration.java:1409){noformat}
> There is no need to log this; just document the behavior in the property description.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16354) Add description of GETSNAPSHOTDIFFLISTING to WebHDFS doc

2021-12-07 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16354:

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Add description of GETSNAPSHOTDIFFLISTING to WebHDFS doc
> 
>
> Key: HDFS-16354
> URL: https://issues.apache.org/jira/browse/HDFS-16354
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> HDFS-16091 added GETSNAPSHOTDIFFLISTING op leveraging 
> ClientProtocol#getSnapshotDiffReportListing.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16358) HttpFS implementation for getSnapshotDiffReportListing

2021-12-02 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16358:

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> HttpFS implementation for getSnapshotDiffReportListing
> --
>
> Key: HDFS-16358
> URL: https://issues.apache.org/jira/browse/HDFS-16358
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> HttpFS should support the getSnapshotDiffReportListing API for improved snapshot 
> diff. The WebHDFS implementation is available in HDFS-16091.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16354) Add description of GETSNAPSHOTDIFFLISTING to WebHDFS doc

2021-11-30 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16354:

Status: Patch Available  (was: Open)

> Add description of GETSNAPSHOTDIFFLISTING to WebHDFS doc
> 
>
> Key: HDFS-16354
> URL: https://issues.apache.org/jira/browse/HDFS-16354
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HDFS-16091 added GETSNAPSHOTDIFFLISTING op leveraging 
> ClientProtocol#getSnapshotDiffReportListing.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16354) Add description of GETSNAPSHOTDIFFLISTING to WebHDFS doc

2021-11-24 Thread Masatake Iwasaki (Jira)
Masatake Iwasaki created HDFS-16354:
---

 Summary: Add description of GETSNAPSHOTDIFFLISTING to WebHDFS doc
 Key: HDFS-16354
 URL: https://issues.apache.org/jira/browse/HDFS-16354
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki


HDFS-16091 added GETSNAPSHOTDIFFLISTING op leveraging 
ClientProtocol#getSnapshotDiffReportListing.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16091) WebHDFS should support getSnapshotDiffReportListing

2021-10-30 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16091:

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> WebHDFS should support getSnapshotDiffReportListing
> ---
>
> Key: HDFS-16091
> URL: https://issues.apache.org/jira/browse/HDFS-16091
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> When there are millions of diffs between two snapshots, the old 
> getSnapshotDiffReport() isn't scalable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16248) Add builder to SnapshotDiffReportListing

2021-10-01 Thread Masatake Iwasaki (Jira)
Masatake Iwasaki created HDFS-16248:
---

 Summary: Add builder to SnapshotDiffReportListing
 Key: HDFS-16248
 URL: https://issues.apache.org/jira/browse/HDFS-16248
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16249) Avoid calling getSnapshotDiffReportListing multiple times if not supported

2021-10-01 Thread Masatake Iwasaki (Jira)
Masatake Iwasaki created HDFS-16249:
---

 Summary: Avoid calling getSnapshotDiffReportListing multiple times 
if not supported
 Key: HDFS-16249
 URL: https://issues.apache.org/jira/browse/HDFS-16249
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16240) Replace unshaded guava in HttpFSServerWebServer

2021-09-27 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16240:

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Replace unshaded guava in HttpFSServerWebServer
> ---
>
> Key: HDFS-16240
> URL: https://issues.apache.org/jira/browse/HDFS-16240
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: httpfs
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> HDFS-16129 added a use of com.google.common.annotations.VisibleForTesting to 
> HttpFSServerWebServer. It is replaced by the replace-guava replacer from 
> HADOOP-17288 at every build.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16129) HttpFS signature secret file misusage

2021-09-27 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16129:

Fix Version/s: 3.4.0

> HttpFS signature secret file misusage
> -
>
> Key: HDFS-16129
> URL: https://issues.apache.org/jira/browse/HDFS-16129
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: httpfs
>Affects Versions: 3.4.0
>Reporter: Tamas Domok
>Assignee: Tamas Domok
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> I started to work on the YARN-10814 issue and found this bug in HttpFS. 
> I investigated the problem and already have a fix for it.
>  
> If the deprecated *httpfs.authentication.signature.secret.file* is not set in 
> the configuration (e.g. httpfs-site.xml), then the new 
> *hadoop.http.authentication.signature.secret.file* config option won't be 
> used; it silently falls back to the random secret provider.
> The _HttpFSServerWebServer_ sets an _authFilterConfigurationPrefix_ for the 
> old path (*httpfs.authentication.*) when building the server. Later, 
> _AuthenticationFilter.constructSecretProvider_ immediately falls back to 
> +random+, because the config won't contain the file. If the old path was set 
> too, the file was handled and the provider was set to the +file+ type.
> The configuration should be based on both the old and the new prefix filters, 
> merging the two. In my opinion, the new config option should win.
>  
> There is another, closely related issue in the _HttpFSAuthenticationFilter_.
> If both config options are set, the _HttpFSAuthenticationFilter_ fails 
> with an impossible file path (e.g. 
> *${httpfs.config.dir}/httpfs-signature.secret*).
> _HttpFSAuthenticationFilter_ constructs the configuration by filtering first 
> the new config prefix and then the old prefix. The old-prefix code works 
> correctly: it uses _conf.get(key)_ instead of _entry.getValue()_, which 
> returns the unresolved file path mentioned earlier. The code duplication can 
> be eliminated, and I think it would be better to change the order: first add 
> the config options from the old path, then the new, so that the new values 
> overwrite the old ones, with a warning log message.
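The merge order proposed in the report can be sketched with plain maps. This is a hedged approximation, not the actual HttpFSAuthenticationFilter code: the class and method names (`AuthConfMerge`, `filterPrefix`, `mergeAuthConf`) are invented for illustration, and only the ordering idea (old prefix first, new prefix overwriting) comes from the description above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AuthConfMerge {
    static final String OLD_PREFIX = "httpfs.authentication.";
    static final String NEW_PREFIX = "hadoop.http.authentication.";

    // Collect the entries under a prefix, stripping the prefix from each key.
    static Map<String, String> filterPrefix(Map<String, String> conf, String prefix) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                out.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return out;
    }

    // Apply old-prefix values first, then new-prefix values, so the new
    // config names win whenever both prefixes define the same option.
    static Map<String, String> mergeAuthConf(Map<String, String> conf) {
        Map<String, String> merged = filterPrefix(conf, OLD_PREFIX);
        merged.putAll(filterPrefix(conf, NEW_PREFIX));
        return merged;
    }
}
```

With both `httpfs.authentication.signature.secret.file` and `hadoop.http.authentication.signature.secret.file` set, the merged view would carry the new value, matching the report's "the new config option should win".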



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16240) Replace unshaded guava in HttpFSServerWebServer

2021-09-27 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16240:

Status: Patch Available  (was: Open)

> Replace unshaded guava in HttpFSServerWebServer
> ---
>
> Key: HDFS-16240
> URL: https://issues.apache.org/jira/browse/HDFS-16240
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: httpfs
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HDFS-16129 added a use of com.google.common.annotations.VisibleForTesting to 
> HttpFSServerWebServer. It is replaced by the replace-guava replacer from 
> HADOOP-17288 at every build.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16240) Replace unshaded guava in HttpFSServerWebServer

2021-09-27 Thread Masatake Iwasaki (Jira)
Masatake Iwasaki created HDFS-16240:
---

 Summary: Replace unshaded guava in HttpFSServerWebServer
 Key: HDFS-16240
 URL: https://issues.apache.org/jira/browse/HDFS-16240
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: httpfs
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki


HDFS-16129 added a use of com.google.common.annotations.VisibleForTesting to 
HttpFSServerWebServer. It is replaced by the replace-guava replacer from HADOOP-17288 
at every build.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16091) WebHDFS should support getSnapshotDiffReportListing

2021-09-02 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16091:

Status: Patch Available  (was: Open)

> WebHDFS should support getSnapshotDiffReportListing
> ---
>
> Key: HDFS-16091
> URL: https://issues.apache.org/jira/browse/HDFS-16091
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When there are millions of diffs between two snapshots, the old 
> getSnapshotDiffReport() isn't scalable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12920) HDFS default value change (with adding time unit) breaks old version MR tarball work with new version (3.0) of hadoop

2021-07-23 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386221#comment-17386221
 ] 

Masatake Iwasaki commented on HDFS-12920:
-

+1 on reverting HDFS-10845.

> HDFS default value change (with adding time unit) breaks old version MR 
> tarball work with new version (3.0) of hadoop
> -
>
> Key: HDFS-12920
> URL: https://issues.apache.org/jira/browse/HDFS-12920
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Junping Du
>Priority: Blocker
>
> After HADOOP-15059 get resolved. I tried to deploy 2.9.0 tar ball with 3.0.0 
> RC1, and run the job with following errors:
> {noformat}
> 2017-12-12 13:29:06,824 INFO [main] 
> org.apache.hadoop.service.AbstractService: Service 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.NumberFormatException: For input string: "30s"
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.NumberFormatException: For input string: "30s"
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:542)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:522)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1764)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:522)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:308)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1722)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1719)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1650)
> {noformat}
> This is because of HDFS-10845: we added time units to hdfs-default.xml, but 
> old-version MR jars cannot recognize them. 
> This breaks our rolling upgrade story, so it should be marked as a blocker.
> A quick workaround is to add values in hdfs-site.xml with all time units 
> removed. But the right way may be to revert HDFS-10845 (and get rid of the 
> noisy warnings).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13916) Distcp SnapshotDiff to support WebHDFS

2021-06-26 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-13916:

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Distcp SnapshotDiff to support WebHDFS
> --
>
> Key: HDFS-13916
> URL: https://issues.apache.org/jira/browse/HDFS-13916
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: distcp, webhdfs
>Affects Versions: 3.0.1, 3.1.1
>Reporter: Xun REN
>Assignee: Xun REN
>Priority: Major
>  Labels: easyfix, newbie, patch
> Fix For: 3.4.0
>
> Attachments: HDFS-13916.002.patch, HDFS-13916.003.patch, 
> HDFS-13916.004.patch, HDFS-13916.005.patch, HDFS-13916.006.patch, 
> HDFS-13916.007.patch, HDFS-13916.patch
>
>
> [~ljain] worked on HDFS-13052 to make DistCp of SnapshotDiff possible with 
> WebHDFSFileSystem. However, the patch does not modify the Java class that is 
> actually used when launching the command "hadoop distcp ..."
>  
> You can check the latest version here:
> [https://github.com/apache/hadoop/blob/branch-3.1.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java#L96-L100]
> In the method "preSyncCheck" of the class "DistCpSync", we still check that 
> the file system is DFS. 
> So I propose to change the class DistCpSync to take into account what was 
> committed by Lokesh Jain.
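The proposed change can be sketched as follows. This is an illustrative stand-in, not Hadoop's actual class hierarchy: the idea is to replace a concrete check against {{DistributedFileSystem}} with a capability check that also admits WebHDFS.

```java
public class PreSyncCheckDemo {
    // stand-ins for the real Hadoop types (names are illustrative)
    interface FileSystem {}
    interface SnapshotDiffCapable extends FileSystem {}
    static class DistributedFileSystem implements SnapshotDiffCapable {}
    static class WebHdfsFileSystem implements SnapshotDiffCapable {}
    static class LocalFileSystem implements FileSystem {}

    // before: fs instanceof DistributedFileSystem (rejects WebHDFS);
    // after: a capability check admits both DFS and WebHDFS
    static boolean preSyncCheck(FileSystem fs) {
        return fs instanceof SnapshotDiffCapable;
    }

    public static void main(String[] args) {
        System.out.println(preSyncCheck(new DistributedFileSystem())); // true
        System.out.println(preSyncCheck(new WebHdfsFileSystem()));     // true
        System.out.println(preSyncCheck(new LocalFileSystem()));       // false
    }
}
```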



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13916) Distcp SnapshotDiff to support WebHDFS

2021-06-26 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370069#comment-17370069
 ] 

Masatake Iwasaki commented on HDFS-13916:
-

Thanks, [~weichiu] and [~ayushtkn]. I committed the patch and filed HDFS-16091 
as a follow-up.

> Distcp SnapshotDiff to support WebHDFS
> --
>
> Key: HDFS-13916
> URL: https://issues.apache.org/jira/browse/HDFS-13916
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: distcp, webhdfs
>Affects Versions: 3.0.1, 3.1.1
>Reporter: Xun REN
>Assignee: Xun REN
>Priority: Major
>  Labels: easyfix, newbie, patch
> Attachments: HDFS-13916.002.patch, HDFS-13916.003.patch, 
> HDFS-13916.004.patch, HDFS-13916.005.patch, HDFS-13916.006.patch, 
> HDFS-13916.007.patch, HDFS-13916.patch
>
>
> [~ljain] worked on HDFS-13052 to make DistCp of SnapshotDiff possible with 
> WebHDFSFileSystem. However, the patch does not modify the Java class that is 
> actually used when launching the command "hadoop distcp ..."
>  
> You can check the latest version here:
> [https://github.com/apache/hadoop/blob/branch-3.1.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java#L96-L100]
> In the method "preSyncCheck" of the class "DistCpSync", we still check that 
> the file system is DFS. 
> So I propose to change the class DistCpSync to take into account what was 
> committed by Lokesh Jain.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16091) WebHDFS should support getSnapshotDiffReportListing

2021-06-26 Thread Masatake Iwasaki (Jira)
Masatake Iwasaki created HDFS-16091:
---

 Summary: WebHDFS should support getSnapshotDiffReportListing
 Key: HDFS-16091
 URL: https://issues.apache.org/jira/browse/HDFS-16091
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki


When there are millions of diffs between two snapshots, the old 
getSnapshotDiffReport() isn't scalable.
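A hedged sketch of why a listing (paged) variant scales better than returning the whole report at once; the names and parameters below are illustrative, not the actual WebHDFS API:

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotDiffPagingDemo {
    // stand-in for a huge diff between two snapshots
    static final int TOTAL_DIFFS = 1_000_000;

    // paged variant: returns at most 'limit' entries starting at 'startIndex',
    // so neither server nor client materializes the full report at once
    static List<Integer> diffReportListing(int startIndex, int limit) {
        List<Integer> page = new ArrayList<>();
        int end = Math.min(startIndex + limit, TOTAL_DIFFS);
        for (int i = startIndex; i < end; i++) {
            page.add(i);
        }
        return page;
    }

    public static void main(String[] args) {
        // client walks the diff a page at a time, bounding memory per call
        int pageSize = 1000, fetched = 0, cursor = 0;
        while (cursor < TOTAL_DIFFS) {
            List<Integer> page = diffReportListing(cursor, pageSize);
            fetched += page.size();
            cursor += page.size();
        }
        System.out.println(fetched); // 1000000
    }
}
```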



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13916) Distcp SnapshotDiff to support WebHDFS

2021-06-21 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17366562#comment-17366562
 ] 

Masatake Iwasaki commented on HDFS-13916:
-

I rebased the patch on behalf of [~renxunsaky] and attached 007.

> Distcp SnapshotDiff to support WebHDFS
> --
>
> Key: HDFS-13916
> URL: https://issues.apache.org/jira/browse/HDFS-13916
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: distcp, webhdfs
>Affects Versions: 3.0.1, 3.1.1
>Reporter: Xun REN
>Assignee: Xun REN
>Priority: Major
>  Labels: easyfix, newbie, patch
> Attachments: HDFS-13916.002.patch, HDFS-13916.003.patch, 
> HDFS-13916.004.patch, HDFS-13916.005.patch, HDFS-13916.006.patch, 
> HDFS-13916.007.patch, HDFS-13916.patch
>
>
> [~ljain] worked on HDFS-13052 to make DistCp of SnapshotDiff possible with 
> WebHDFSFileSystem. However, the patch does not modify the Java class that is 
> actually used when launching the command "hadoop distcp ..."
>  
> You can check the latest version here:
> [https://github.com/apache/hadoop/blob/branch-3.1.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java#L96-L100]
> In the method "preSyncCheck" of the class "DistCpSync", we still check that 
> the file system is DFS. 
> So I propose to change the class DistCpSync to take into account what was 
> committed by Lokesh Jain.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13916) Distcp SnapshotDiff to support WebHDFS

2021-06-21 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-13916:

Attachment: HDFS-13916.007.patch

> Distcp SnapshotDiff to support WebHDFS
> --
>
> Key: HDFS-13916
> URL: https://issues.apache.org/jira/browse/HDFS-13916
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: distcp, webhdfs
>Affects Versions: 3.0.1, 3.1.1
>Reporter: Xun REN
>Assignee: Xun REN
>Priority: Major
>  Labels: easyfix, newbie, patch
> Attachments: HDFS-13916.002.patch, HDFS-13916.003.patch, 
> HDFS-13916.004.patch, HDFS-13916.005.patch, HDFS-13916.006.patch, 
> HDFS-13916.007.patch, HDFS-13916.patch
>
>
> [~ljain] worked on HDFS-13052 to make DistCp of SnapshotDiff possible with 
> WebHDFSFileSystem. However, the patch does not modify the Java class that is 
> actually used when launching the command "hadoop distcp ..."
>  
> You can check the latest version here:
> [https://github.com/apache/hadoop/blob/branch-3.1.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java#L96-L100]
> In the method "preSyncCheck" of the class "DistCpSync", we still check that 
> the file system is DFS. 
> So I propose to change the class DistCpSync to take into account what was 
> committed by Lokesh Jain.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13916) Distcp SnapshotDiff to support WebHDFS

2021-05-20 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17348950#comment-17348950
 ] 

Masatake Iwasaki commented on HDFS-13916:
-

[~renxunsaky] Could you rebase the patch on current trunk? I filed HADOOP-17719 
and submitted similar PR before finding this JIRA. I can add supplemental fix 
in HADOOP-17719 after this comes in.

> Distcp SnapshotDiff to support WebHDFS
> --
>
> Key: HDFS-13916
> URL: https://issues.apache.org/jira/browse/HDFS-13916
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: distcp, webhdfs
>Affects Versions: 3.0.1, 3.1.1
>Reporter: Xun REN
>Assignee: Xun REN
>Priority: Major
>  Labels: easyfix, newbie, patch
> Attachments: HDFS-13916.002.patch, HDFS-13916.003.patch, 
> HDFS-13916.004.patch, HDFS-13916.005.patch, HDFS-13916.006.patch, 
> HDFS-13916.patch
>
>
> [~ljain] worked on HDFS-13052 to make DistCp of SnapshotDiff possible with 
> WebHDFSFileSystem. However, the patch does not modify the Java class that is 
> actually used when launching the command "hadoop distcp ..."
>  
> You can check the latest version here:
> [https://github.com/apache/hadoop/blob/branch-3.1.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java#L96-L100]
> In the method "preSyncCheck" of the class "DistCpSync", we still check that 
> the file system is DFS. 
> So I propose to change the class DistCpSync to take into account what was 
> committed by Lokesh Jain.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15740) Make basename cross-platform

2021-01-31 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-15740:

Fix Version/s: (was: 3.3.1)
   3.4.0

> Make basename cross-platform
> 
>
> Key: HDFS-15740
> URL: https://issues.apache.org/jira/browse/HDFS-15740
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs++
>Affects Versions: 3.4.0
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>   Original Estimate: 24h
>  Time Spent: 9.5h
>  Remaining Estimate: 14.5h
>
> The *basename* function isn't available on Visual Studio 2019 compiler. We 
> need to make it cross platform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15802) Address build failure of hadoop-hdfs-native-client on RHEL/CentOS 8

2021-01-31 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki resolved HDFS-15802.
-
Resolution: Duplicate

> Address build failure of hadoop-hdfs-native-client on RHEL/CentOS 8
> ---
>
> Key: HDFS-15802
> URL: https://issues.apache.org/jira/browse/HDFS-15802
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs++
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Major
>
> Building environment described in BUILDING.txt does not work for RHEL CentOS 
> 8 due to HDFS-15740.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15802) Address build failure of hadoop-hdfs-native-client on RHEL/CentOS 8

2021-01-29 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275474#comment-17275474
 ] 

Masatake Iwasaki edited comment on HDFS-15802 at 1/30/21, 5:07 AM:
---

For gcc, using gcc-toolset-9 from AppStream or adding {{-lstdc++fs}} at link 
time might be an option. For cmake, installing CMake 3.19 or above from source 
might be needed.


was (Author: iwasakims):
For gcc, using gcc-devtoolset-9-toolchain from AppStream or adding 
{{-lstdc++fs}} at link time might be an option. For cmake, installing CMake 
3.19 or above from source might be needed.
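The workaround above can be sketched as shell commands. This is a configuration sketch under assumptions: the package name and the need for {{-lstdc++fs}} come from the comment, and details may differ across RHEL/CentOS 8 point releases.

```shell
# Option 1 (assumed): use the newer GCC toolchain from the AppStream repo,
# which can link std::filesystem without an extra flag.
sudo dnf install -y gcc-toolset-9
scl enable gcc-toolset-9 -- g++ -std=c++17 -o demo demo.cc

# Option 2 (assumed): keep the platform g++ and link the separate
# filesystem library explicitly.
g++ -std=c++17 -o demo demo.cc -lstdc++fs
```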

> Address build failure of hadoop-hdfs-native-client on RHEL/CentOS 8
> ---
>
> Key: HDFS-15802
> URL: https://issues.apache.org/jira/browse/HDFS-15802
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs++
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Major
>
> Building environment described in BUILDING.txt does not work for RHEL CentOS 
> 8 due to HDFS-15740.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15802) Address build failure of hadoop-hdfs-native-client on RHEL/CentOS 8

2021-01-29 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275474#comment-17275474
 ] 

Masatake Iwasaki commented on HDFS-15802:
-

For gcc, using gcc-devtoolset-9-toolchain from AppStream or adding 
{{-lstdc++fs}} at link time might be an option. For cmake, installing CMake 
3.19 or above from source might be needed.

> Address build failure of hadoop-hdfs-native-client on RHEL/CentOS 8
> ---
>
> Key: HDFS-15802
> URL: https://issues.apache.org/jira/browse/HDFS-15802
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs++
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Major
>
> Building environment described in BUILDING.txt does not work for RHEL CentOS 
> 8 due to HDFS-15740.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15802) Address build failure of hadoop-hdfs-native-client on RHEL/CentOS 8

2021-01-29 Thread Masatake Iwasaki (Jira)
Masatake Iwasaki created HDFS-15802:
---

 Summary: Address build failure of hadoop-hdfs-native-client on 
RHEL/CentOS 8
 Key: HDFS-15802
 URL: https://issues.apache.org/jira/browse/HDFS-15802
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: libhdfs++
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki


Building environment described in BUILDING.txt does not work for RHEL CentOS 8 
due to HDFS-15740.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15648) TestFileChecksum should be parameterized

2021-01-03 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-15648:

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> TestFileChecksum should be parameterized
> 
>
> Key: HDFS-15648
> URL: https://issues.apache.org/jira/browse/HDFS-15648
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Masatake Iwasaki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> {{TestFileChecksumCompositeCrc}} extends {{TestFileChecksum}} overriding 3 
> methods that return a constant flag True/False.
> The class is useless and it causes confusion with two different jiras, while 
> the main bug should be in TestFileChecksum.
> The {{TestFileChecksum}} should be parameterized
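The refactoring idea can be sketched in plain Java (the checksum-type names below are stand-ins, not HDFS's exact identifiers): instead of a subclass overriding methods that return a constant flag, one test body is driven with both flag values.

```java
public class ChecksumParamDemo {
    // stand-in for the flag methods that TestFileChecksumCompositeCrc
    // overrode to return a constant
    static String checksumType(boolean compositeCrc) {
        return compositeCrc ? "COMPOSITE_CRC" : "MD5MD5CRC";
    }

    public static void main(String[] args) {
        // a parameterized runner would invoke the same test body once per value
        for (boolean flag : new boolean[] {false, true}) {
            System.out.println(checksumType(flag));
        }
    }
}
```

With JUnit's {{Parameterized}} runner, the loop in {{main}} becomes the runner's parameter list, so both configurations report under a single test class and failures point at one place.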



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15743) Fix -Pdist build failure of hadoop-hdfs-native-client

2020-12-21 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-15743:

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Fix -Pdist build failure of hadoop-hdfs-native-client
> -
>
> Key: HDFS-15743
> URL: https://issues.apache.org/jira/browse/HDFS-15743
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {noformat}
> [INFO] --- exec-maven-plugin:1.3.1:exec (pre-dist) @ 
> hadoop-hdfs-native-client ---
> tar: ./*: Cannot stat: No such file or directory
> tar: Exiting with failure status due to previous errors
> Checking to bundle with:
> bundleoption=false, liboption=snappy.lib, pattern=libsnappy. libdir=
> Checking to bundle with:
> bundleoption=false, liboption=zstd.lib, pattern=libzstd. libdir=
> Checking to bundle with:
> bundleoption=false, liboption=openssl.lib, pattern=libcrypto. libdir=
> Checking to bundle with:
> bundleoption=false, liboption=isal.lib, pattern=libisal. libdir=
> Checking to bundle with:
> bundleoption=, liboption=pmdk.lib, pattern=pmdk libdir=
> Bundling bin files failed
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15743) Fix -Pdist build failure of hadoop-hdfs-native-client

2020-12-21 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-15743:

Status: Patch Available  (was: Open)

> Fix -Pdist build failure of hadoop-hdfs-native-client
> -
>
> Key: HDFS-15743
> URL: https://issues.apache.org/jira/browse/HDFS-15743
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {noformat}
> [INFO] --- exec-maven-plugin:1.3.1:exec (pre-dist) @ 
> hadoop-hdfs-native-client ---
> tar: ./*: Cannot stat: No such file or directory
> tar: Exiting with failure status due to previous errors
> Checking to bundle with:
> bundleoption=false, liboption=snappy.lib, pattern=libsnappy. libdir=
> Checking to bundle with:
> bundleoption=false, liboption=zstd.lib, pattern=libzstd. libdir=
> Checking to bundle with:
> bundleoption=false, liboption=openssl.lib, pattern=libcrypto. libdir=
> Checking to bundle with:
> bundleoption=false, liboption=isal.lib, pattern=libisal. libdir=
> Checking to bundle with:
> bundleoption=, liboption=pmdk.lib, pattern=pmdk libdir=
> Bundling bin files failed
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15743) Fix -Pdist build failure of hadoop-hdfs-native-client

2020-12-21 Thread Masatake Iwasaki (Jira)
Masatake Iwasaki created HDFS-15743:
---

 Summary: Fix -Pdist build failure of hadoop-hdfs-native-client
 Key: HDFS-15743
 URL: https://issues.apache.org/jira/browse/HDFS-15743
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki


{noformat}
[INFO] --- exec-maven-plugin:1.3.1:exec (pre-dist) @ hadoop-hdfs-native-client 
---
tar: ./*: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors
Checking to bundle with:
bundleoption=false, liboption=snappy.lib, pattern=libsnappy. libdir=
Checking to bundle with:
bundleoption=false, liboption=zstd.lib, pattern=libzstd. libdir=
Checking to bundle with:
bundleoption=false, liboption=openssl.lib, pattern=libcrypto. libdir=
Checking to bundle with:
bundleoption=false, liboption=isal.lib, pattern=libisal. libdir=
Checking to bundle with:
bundleoption=, liboption=pmdk.lib, pattern=pmdk libdir=
Bundling bin files failed
{noformat}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15670) Testcase TestBalancer#testBalancerWithPinnedBlocks always fails

2020-12-02 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki resolved HDFS-15670.
-
Resolution: Cannot Reproduce

I'm closing this as not reproducible. I suspect it is an environmental issue. 
Feel free to reopen if you have updates.

> Testcase TestBalancer#testBalancerWithPinnedBlocks always fails
> ---
>
> Key: HDFS-15670
> URL: https://issues.apache.org/jira/browse/HDFS-15670
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-beta1
>Reporter: Jianfei Jiang
>Priority: Major
> Attachments: HADOOP-15108.000.patch
>
>
> When running the test cases without any code changes, the function 
> testBalancerWithPinnedBlocks in TestBalancer.java never succeeds. I tried 
> Ubuntu 16.04 and Red Hat 7, so the failure may not be related to a particular 
> Linux environment. I am not sure whether there is a bug in this test or I 
> used the wrong environment and settings. Could anyone give some advice?
> ---
> Test set: org.apache.hadoop.hdfs.server.balancer.TestBalancer
> ---
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 100.389 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.server.balancer.TestBalancer
> testBalancerWithPinnedBlocks(org.apache.hadoop.hdfs.server.balancer.TestBalancer)
>   Time elapsed: 100.134 sec  <<< ERROR!
> java.lang.Exception: test timed out after 10 milliseconds
>   at java.lang.Object.wait(Native Method)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.waitForAckedSeqno(DataStreamer.java:903)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:773)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:870)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:842)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>   at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:441)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerWithPinnedBlocks(TestBalancer.java:515)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15648) TestFileChecksum should be parameterized

2020-12-01 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-15648:

Summary: TestFileChecksum should be parameterized  (was: TestFileChecksum 
should be parameterized with a boolean flag)

> TestFileChecksum should be parameterized
> 
>
> Key: HDFS-15648
> URL: https://issues.apache.org/jira/browse/HDFS-15648
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Masatake Iwasaki
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {{TestFileChecksumCompositeCrc}} extends {{TestFileChecksum}} overriding 3 
> methods that return a constant flag True/False.
> The class is useless and it causes confusion with two different jiras, while 
> the main bug should be in TestFileChecksum.
> The {{TestFileChecksum}} should be parameterized



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15702) Fix intermittent failure of TestDecommission#testAllocAndIBRWhileDecommission

2020-12-01 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-15702:

Status: Patch Available  (was: Open)

> Fix intermittent failure of TestDecommission#testAllocAndIBRWhileDecommission
> --
>
> Key: HDFS-15702
> URL: https://issues.apache.org/jira/browse/HDFS-15702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {noformat}
> java.lang.AssertionError: expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hadoop.hdfs.TestDecommission.testAllocAndIBRWhileDecommission(TestDecommission.java:1025)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15702) Fix intermittent failure of TestDecommission#testAllocAndIBRWhileDecommission

2020-11-30 Thread Masatake Iwasaki (Jira)
Masatake Iwasaki created HDFS-15702:
---

 Summary: Fix intermittent failure of 
TestDecommission#testAllocAndIBRWhileDecommission
 Key: HDFS-15702
 URL: https://issues.apache.org/jira/browse/HDFS-15702
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki


{noformat}
java.lang.AssertionError: expected: but was:
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hadoop.hdfs.TestDecommission.testAllocAndIBRWhileDecommission(TestDecommission.java:1025)
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15676) TestRouterRpcMultiDestination#testNamenodeMetrics fails on trunk

2020-11-30 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki resolved HDFS-15676.
-
Resolution: Duplicate

I'm closing this as fixed by HDFS-15677. I will reopen this if the failure of 
testNamenodeMetrics alone arises again.

> TestRouterRpcMultiDestination#testNamenodeMetrics fails on trunk
> 
>
> Key: HDFS-15676
> URL: https://issues.apache.org/jira/browse/HDFS-15676
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Masatake Iwasaki
>Priority: Major
>
> qbt report (Nov 8, 2020, 11:28 AM) shows failures in testNamenodeMetrics



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15677) TestRouterRpcMultiDestination#testGetCachedDatanodeReport fails on trunk

2020-11-30 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-15677:

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> TestRouterRpcMultiDestination#testGetCachedDatanodeReport fails on trunk
> 
>
> Key: HDFS-15677
> URL: https://issues.apache.org/jira/browse/HDFS-15677
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Masatake Iwasaki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> qbt report (Nov 8, 2020, 11:28 AM) shows failures in 
> testGetCachedDatanodeReport



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15677) TestRouterRpcMultiDestination#testGetCachedDatanodeReport fails on trunk

2020-11-30 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-15677:

Status: Patch Available  (was: Open)

> TestRouterRpcMultiDestination#testGetCachedDatanodeReport fails on trunk
> 
>
> Key: HDFS-15677
> URL: https://issues.apache.org/jira/browse/HDFS-15677
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Masatake Iwasaki
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> qbt report (Nov 8, 2020, 11:28 AM) shows failures in 
> testGetCachedDatanodeReport



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15676) TestRouterRpcMultiDestination#testNamenodeMetrics fails on trunk

2020-11-30 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240554#comment-17240554
 ] 

Masatake Iwasaki commented on HDFS-15676:
-

The test fails only if I run both testNamenodeMetrics and 
testGetCachedDatanodeReport. Running 
{{-Dtest=TestRouterRpcMultiDestination#testNamenodeMetrics}} alone has no 
issue. Since TestRouterRpcMultiDestination reuses a mini cluster across the 
test cases, {{testNamenodeMetrics}} could be fixed by addressing the 
{{testGetCachedDatanodeReport}} issue.

{noformat}
$ for i in `seq 100` ; do echo $i && mvn test -DignoreTestFailure=false 
-Dtest=TestRouterRpcMultiDestination#testNamenodeMetrics,TestRouterRpcMultiDestination#testGetCachedDatanodeReport
  || break ; done
...
[ERROR] Failures:
[ERROR]   TestRouterRpcMultiDestination>TestRouterRpc.testNamenodeMetrics:1682 
expected:<12> but was:<11>
[ERROR] Errors:
[ERROR]   
TestRouterRpcMultiDestination>TestRouterRpc.testGetCachedDatanodeReport:1829 » 
Timeout
[INFO]
[ERROR] Tests run: 2, Failures: 1, Errors: 1, Skipped: 0
{noformat}


> TestRouterRpcMultiDestination#testNamenodeMetrics fails on trunk
> 
>
> Key: HDFS-15676
> URL: https://issues.apache.org/jira/browse/HDFS-15676
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Masatake Iwasaki
>Priority: Major
>
> qbt report (Nov 8, 2020, 11:28 AM) shows failures in testNamenodeMetrics






[jira] [Assigned] (HDFS-15677) TestRouterRpcMultiDestination#testGetCachedDatanodeReport fails on trunk

2020-11-30 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki reassigned HDFS-15677:
---

Assignee: Masatake Iwasaki

> TestRouterRpcMultiDestination#testGetCachedDatanodeReport fails on trunk
> 
>
> Key: HDFS-15677
> URL: https://issues.apache.org/jira/browse/HDFS-15677
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Masatake Iwasaki
>Priority: Major
>
> qbt report (Nov 8, 2020, 11:28 AM) shows failures in 
> testGetCachedDatanodeReport






[jira] [Assigned] (HDFS-15676) TestRouterRpcMultiDestination#testNamenodeMetrics fails on trunk

2020-11-29 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki reassigned HDFS-15676:
---

Assignee: Masatake Iwasaki

> TestRouterRpcMultiDestination#testNamenodeMetrics fails on trunk
> 
>
> Key: HDFS-15676
> URL: https://issues.apache.org/jira/browse/HDFS-15676
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Masatake Iwasaki
>Priority: Major
>
> qbt report (Nov 8, 2020, 11:28 AM) shows failures in testNamenodeMetrics






[jira] [Updated] (HDFS-15648) TestFileChecksum should be parameterized with a boolean flag

2020-11-29 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-15648:

Status: Patch Available  (was: Open)

> TestFileChecksum should be parameterized with a boolean flag
> 
>
> Key: HDFS-15648
> URL: https://issues.apache.org/jira/browse/HDFS-15648
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Masatake Iwasaki
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{TestFileChecksumCompositeCrc}} extends {{TestFileChecksum}}, overriding 3 
> methods that return a constant True/False flag.
> The class is redundant, and it causes confusion across two different jiras 
> while the main bug belongs in TestFileChecksum.
> {{TestFileChecksum}} should be parameterized instead.
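The parameterization the description asks for can be sketched without any 
Hadoop or JUnit dependencies (all names below are illustrative stand-ins, not 
the real test classes): the boolean that {{TestFileChecksumCompositeCrc}} 
hard-codes through overridden methods becomes a constructor parameter instead.

```java
import java.util.List;

public class ChecksumParamSketch {
    // Stand-in for TestFileChecksum: the flag that the CompositeCrc subclass
    // used to hard-code via method overrides is now a constructor parameter.
    static class ChecksumTest {
        final boolean useCompositeCrc;

        ChecksumTest(boolean useCompositeCrc) {
            this.useCompositeCrc = useCompositeCrc;
        }

        String checksumType() {
            return useCompositeCrc ? "COMPOSITE_CRC" : "MD5MD5CRC";
        }
    }

    public static void main(String[] args) {
        // One test class driven by a parameter, instead of one subclass
        // per flag value.
        for (boolean flag : List.of(true, false)) {
            ChecksumTest t = new ChecksumTest(flag);
            System.out.println(flag + " -> " + t.checksumType());
        }
    }
}
```

In the actual fix this would map onto JUnit's parameterized-runner support, 
with the flag supplied as test parameters rather than a hand-rolled loop.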






[jira] [Assigned] (HDFS-15648) TestFileChecksum should be parameterized with a boolean flag

2020-11-29 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki reassigned HDFS-15648:
---

Assignee: Masatake Iwasaki

> TestFileChecksum should be parameterized with a boolean flag
> 
>
> Key: HDFS-15648
> URL: https://issues.apache.org/jira/browse/HDFS-15648
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Masatake Iwasaki
>Priority: Major
>
> {{TestFileChecksumCompositeCrc}} extends {{TestFileChecksum}}, overriding 3 
> methods that return a constant True/False flag.
> The class is redundant, and it causes confusion across two different jiras 
> while the main bug belongs in TestFileChecksum.
> {{TestFileChecksum}} should be parameterized instead.






[jira] [Resolved] (HDFS-15296) TestBPOfferService#testMissBlocksWhenReregister is flaky

2020-11-23 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki resolved HDFS-15296.
-
Resolution: Duplicate

I'm closing this as a duplicate of HDFS-15654.

> TestBPOfferService#testMissBlocksWhenReregister is flaky
> 
>
> Key: HDFS-15296
> URL: https://issues.apache.org/jira/browse/HDFS-15296
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Mingliang Liu
>Priority: Major
>
> TestBPOfferService.testMissBlocksWhenReregister fails intermittently in 
> {{trunk}} branch, not sure about other branches. Example failures are
> - 
> https://builds.apache.org/job/hadoop-multibranch/job/PR-1964/4/testReport/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testMissBlocksWhenReregister/
> - 
> https://builds.apache.org/job/PreCommit-HDFS-Build/29175/testReport/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testMissBlocksWhenReregister/
> Sample exception stack is:
> {quote}
> Stacktrace
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testMissBlocksWhenReregister(TestBPOfferService.java:350)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {quote}






[jira] [Updated] (HDFS-15672) TestBalancerWithMultipleNameNodes#testBalancingBlockpoolsWithBlockPoolPolicy fails on trunk

2020-11-19 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-15672:

Status: Patch Available  (was: Open)

> TestBalancerWithMultipleNameNodes#testBalancingBlockpoolsWithBlockPoolPolicy 
> fails on trunk
> ---
>
> Key: HDFS-15672
> URL: https://issues.apache.org/jira/browse/HDFS-15672
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Masatake Iwasaki
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> qbt report shows the following error:
> {code:bash}
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancingBlockpoolsWithBlockPoolPolicy
> Failing for the past 1 build (Since Failed#317 )
> Took 10 min.
> Error Message
> test timed out after 60 milliseconds
> Stacktrace
> org.junit.runners.model.TestTimedOutException: test timed out after 60 
> milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.sleep(TestBalancerWithMultipleNameNodes.java:353)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.wait(TestBalancerWithMultipleNameNodes.java:159)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.runBalancer(TestBalancerWithMultipleNameNodes.java:175)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.runTest(TestBalancerWithMultipleNameNodes.java:550)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancingBlockpoolsWithBlockPoolPolicy(TestBalancerWithMultipleNameNodes.java:609)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Updated] (HDFS-15674) TestBPOfferService#testMissBlocksWhenReregister fails on trunk

2020-11-17 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-15674:

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> TestBPOfferService#testMissBlocksWhenReregister fails on trunk
> --
>
> Key: HDFS-15674
> URL: https://issues.apache.org/jira/browse/HDFS-15674
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Masatake Iwasaki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> qbt report (Nov 8, 2020, 11:28 AM) shows failures timing out in 
> testMissBlocksWhenReregister 






[jira] [Assigned] (HDFS-15672) TestBalancerWithMultipleNameNodes#testBalancingBlockpoolsWithBlockPoolPolicy fails on trunk

2020-11-17 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki reassigned HDFS-15672:
---

Assignee: Masatake Iwasaki

> TestBalancerWithMultipleNameNodes#testBalancingBlockpoolsWithBlockPoolPolicy 
> fails on trunk
> ---
>
> Key: HDFS-15672
> URL: https://issues.apache.org/jira/browse/HDFS-15672
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Masatake Iwasaki
>Priority: Major
>
> qbt report shows the following error:
> {code:bash}
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancingBlockpoolsWithBlockPoolPolicy
> Failing for the past 1 build (Since Failed#317 )
> Took 10 min.
> Error Message
> test timed out after 60 milliseconds
> Stacktrace
> org.junit.runners.model.TestTimedOutException: test timed out after 60 
> milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.sleep(TestBalancerWithMultipleNameNodes.java:353)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.wait(TestBalancerWithMultipleNameNodes.java:159)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.runBalancer(TestBalancerWithMultipleNameNodes.java:175)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.runTest(TestBalancerWithMultipleNameNodes.java:550)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancingBlockpoolsWithBlockPoolPolicy(TestBalancerWithMultipleNameNodes.java:609)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Commented] (HDFS-15674) TestBPOfferService#testMissBlocksWhenReregister fails on trunk

2020-11-17 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233582#comment-17233582
 ] 

Masatake Iwasaki commented on HDFS-15674:
-

The test intermittently fails if the count of blocks reported in IBRs does 
not reach the expected value (4000).

{noformat}
[ERROR] Failures:
[ERROR]   TestBPOfferService.testMissBlocksWhenReregister:341 Timed out wait 
for IBR counts FBRCount = 2441, IBRCount = 3999; expected = 4000. Exception: 
Timed out waiting for condition. Thread diagnostics:
Timestamp: 2020-11-16 12:45:57,425
{noformat}

This waiting condition does not make sense because pending IBRs can be 
deleted by {{clearIBRs()}} in {{BPServiceActor#reRegister}}. That is fine 
because the succeeding full block report covers them, per the discussion of 
HDFS-15113.

The test should instead check that all blocks are reported, rather than 
counting the numbers of blocks reported via FBR and IBR independently.

I submitted [PR #2467|https://github.com/apache/hadoop/pull/2467] for this.
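The proposed check can be illustrated with a minimal, Hadoop-free sketch (the 
class and the {{waitFor}} helper below are hypothetical stand-ins, loosely 
modeled on the test's use of {{GenericTestUtils.waitFor}}): instead of 
asserting that FBR and IBR counters independently reach fixed values, 
accumulate the union of blocks reported through either path and wait until 
the whole expected set has been seen. This condition stays correct even if a 
re-registration clears pending IBRs, because the blocks reappear in the 
subsequent full block report.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class BlockReportWait {
    // Polls a condition until it holds or the timeout expires; a simplified
    // stand-in for GenericTestUtils.waitFor.
    static void waitFor(Supplier<Boolean> condition, long intervalMs,
                        long timeoutMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!condition.get()) {
            if (System.currentTimeMillis() > deadline) {
                throw new TimeoutException(
                    "condition not met within " + timeoutMs + " ms");
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        Set<Long> expected = new HashSet<>();
        for (long b = 0; b < 4000; b++) {
            expected.add(b);
        }
        // Union of blocks seen via FBRs and IBRs. Even if re-registration
        // drops pending IBRs, the following FBR re-reports those blocks,
        // so the union still converges to the expected set.
        Set<Long> reported = new HashSet<>();
        Thread reporter = new Thread(() -> {
            for (long b = 0; b < 4000; b++) {
                synchronized (reported) {
                    reported.add(b);
                }
            }
        });
        reporter.start();
        waitFor(() -> {
            synchronized (reported) {
                return reported.containsAll(expected);
            }
        }, 10, 5000);
        reporter.join();
        System.out.println("all " + reported.size() + " blocks reported");
    }
}
```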

> TestBPOfferService#testMissBlocksWhenReregister fails on trunk
> --
>
> Key: HDFS-15674
> URL: https://issues.apache.org/jira/browse/HDFS-15674
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Masatake Iwasaki
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> qbt report (Nov 8, 2020, 11:28 AM) shows failures timing out in 
> testMissBlocksWhenReregister 






[jira] [Updated] (HDFS-15674) TestBPOfferService#testMissBlocksWhenReregister fails on trunk

2020-11-17 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-15674:

Status: Patch Available  (was: Open)

> TestBPOfferService#testMissBlocksWhenReregister fails on trunk
> --
>
> Key: HDFS-15674
> URL: https://issues.apache.org/jira/browse/HDFS-15674
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Masatake Iwasaki
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> qbt report (Nov 8, 2020, 11:28 AM) shows failures timing out in 
> testMissBlocksWhenReregister 






[jira] [Assigned] (HDFS-15674) TestBPOfferService#testMissBlocksWhenReregister fails on trunk

2020-11-16 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki reassigned HDFS-15674:
---

Assignee: Masatake Iwasaki

> TestBPOfferService#testMissBlocksWhenReregister fails on trunk
> --
>
> Key: HDFS-15674
> URL: https://issues.apache.org/jira/browse/HDFS-15674
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Assignee: Masatake Iwasaki
>Priority: Major
>
> qbt report (Nov 8, 2020, 11:28 AM) shows failures timing out in 
> testMissBlocksWhenReregister 






[jira] [Commented] (HDFS-15296) TestBPOfferService#testMissBlocksWhenReregister is flaky

2020-11-15 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232468#comment-17232468
 ] 

Masatake Iwasaki commented on HDFS-15296:
-

This looks like a duplicate of HDFS-15654; HDFS-15674 is the follow-up.

> TestBPOfferService#testMissBlocksWhenReregister is flaky
> 
>
> Key: HDFS-15296
> URL: https://issues.apache.org/jira/browse/HDFS-15296
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Mingliang Liu
>Priority: Major
>
> TestBPOfferService.testMissBlocksWhenReregister fails intermittently in 
> {{trunk}} branch, not sure about other branches. Example failures are
> - 
> https://builds.apache.org/job/hadoop-multibranch/job/PR-1964/4/testReport/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testMissBlocksWhenReregister/
> - 
> https://builds.apache.org/job/PreCommit-HDFS-Build/29175/testReport/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testMissBlocksWhenReregister/
> Sample exception stack is:
> {quote}
> Stacktrace
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testMissBlocksWhenReregister(TestBPOfferService.java:350)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {quote}






[jira] [Commented] (HDFS-15567) [SBN Read] HDFS should expose msync() API to allow downstream applications to call it explicitly.

2020-09-12 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194711#comment-17194711
 ] 

Masatake Iwasaki commented on HDFS-15567:
-

I updated the target version.

> [SBN Read] HDFS should expose msync() API to allow downstream applications 
> to call it explicitly.
> --
>
> Key: HDFS-15567
> URL: https://issues.apache.org/jira/browse/HDFS-15567
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, hdfs-client
>Reporter: Konstantin Shvachko
>Priority: Major
>
> Consistent reads from Standby introduced the {{msync()}} API (HDFS-13688), 
> which updates the client's state ID with the current state of the Active 
> NameNode to guarantee consistency of subsequent calls to an ObserverNode. 
> Currently this API is exposed only via {{DFSClient}}, which makes it hard 
> for applications to access {{msync()}}. One way is to use something like 
> this:
> {code}
> if (fs instanceof DistributedFileSystem) {
>   ((DistributedFileSystem) fs).getClient().msync();
> }
> {code}
> This should be exposed for both {{FileSystem}} and {{FileContext}}.
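The awkwardness of the downcast in the description's snippet can be shown 
with a minimal, Hadoop-free model (the classes below are simplified 
stand-ins, not the real HDFS API): putting {{msync()}} on the base class 
with a default no-op lets call sites drop the instanceof check entirely.

```java
// Simplified stand-ins for FileSystem / DistributedFileSystem / DFSClient.
abstract class FileSystem {
    // Proposed shape: a base-class hook that filesystems without
    // Observer-read semantics can simply leave as a no-op.
    public void msync() {
        // Default: no coordination needed; HA-aware subclasses override.
    }
}

class DFSClient {
    long lastSeenStateId = -1;

    void msync() {
        // The real client would contact the Active NameNode and record its
        // current state ID; a placeholder value stands in here.
        lastSeenStateId = 42;
    }
}

class DistributedFileSystem extends FileSystem {
    final DFSClient client = new DFSClient();

    @Override
    public void msync() {
        client.msync();
    }
}

public class MsyncSketch {
    public static void main(String[] args) {
        FileSystem fs = new DistributedFileSystem();

        // Today's workaround: downcast to reach DFSClient.
        if (fs instanceof DistributedFileSystem) {
            ((DistributedFileSystem) fs).client.msync();
        }

        // With msync() on FileSystem, the call site needs no cast:
        fs.msync();
        System.out.println("msync done");
    }
}
```

A similar default method on {{FileContext}} would cover the second half of 
the request.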






[jira] [Updated] (HDFS-15567) [SBN Read] HDFS should expose msync() API to allow downstream applications to call it explicitly.

2020-09-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-15567:

Target Version/s: 2.10.2  (was: 2.10.1)

> [SBN Read] HDFS should expose msync() API to allow downstream applications 
> to call it explicitly.
> --
>
> Key: HDFS-15567
> URL: https://issues.apache.org/jira/browse/HDFS-15567
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, hdfs-client
>Reporter: Konstantin Shvachko
>Priority: Major
>
> Consistent reads from Standby introduced the {{msync()}} API (HDFS-13688), 
> which updates the client's state ID with the current state of the Active 
> NameNode to guarantee consistency of subsequent calls to an ObserverNode. 
> Currently this API is exposed only via {{DFSClient}}, which makes it hard 
> for applications to access {{msync()}}. One way is to use something like 
> this:
> {code}
> if (fs instanceof DistributedFileSystem) {
>   ((DistributedFileSystem) fs).getClient().msync();
> }
> {code}
> This should be exposed for both {{FileSystem}} and {{FileContext}}.





