Re: Recommended way of using hadoop-minicluster for unit testing?
Hi Richard,

Thanks for sharing the steps to reproduce the issue. I cloned the Apache Storm repo and was able to reproduce it. The build was indeed failing due to missing classes.

I spent some time debugging the issue. My analysis might not be entirely right (I have no experience with Storm). There are two ways to get this going.

*First Approach: If we want to use the shaded classes*

1. I think the artifact to use for the minicluster should be `hadoop-client-minicluster`; even Spark uses it [1]. The one you are using is `hadoop-minicluster`, which is itself empty:

```
ayushsaxena@ayushsaxena ~ % jar tf /Users/ayushsaxena/.m2/repository/org/apache/hadoop/hadoop-minicluster/3.3.6/hadoop-minicluster-3.3.6.jar | grep .class
ayushsaxena@ayushsaxena ~ %
```

It just defines the artifacts to be used by `hadoop-client-minicluster`, and that jar is the one that does the shading. Using `hadoop-minicluster` is like adding the Hadoop dependencies to the pom transitively, without any shading, which tends to conflict with the `hadoop-client-api` and `hadoop-client-runtime` jars, which use the shaded classes.

2. Once you change `hadoop-minicluster` to `hadoop-client-minicluster`, the tests still won't pass, because the `storm-autocreds` dependency pulls in the Hadoop jars via `hbase-client` & `hive-exec`. So we need to exclude those as well.

3. I reverted your classpath hack, changed the jar, excluded the dependencies from `storm-autocreds`, and ran the storm-hdfs tests. All the tests that were failing initially passed, without any code change:

```
[INFO] Results:
[INFO]
[INFO] Tests run: 57, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO]
[INFO] BUILD SUCCESS
[INFO]
```

4. Putting the code diff here might make this mail unreadable, so I am sharing the link to the commit that fixed Storm for me [2]. Let me know if there are any access issues and I will put the diff in the mail itself in text form.

*Second Approach: If we don't want to use the shaded classes*

1. The `hadoop-client-api` & the `hadoop-client-runtime` jars use shading, which tends to conflict with your non-shaded `hadoop-minicluster`. Rather than using these jars, use the `hadoop-client` jar.

2. I removed your hack and replaced those two jars with the `hadoop-client` jar, and the storm-hdfs tests pass.

3. I am sharing the link to the commit in my fork; it is at [3]. One advantage is that you don't have to change your existing jar, nor do you need to add those exclusions to the `storm-autocreds` dependency.

++ Adding common-dev, in case any fellow developers with more experience around using the hadoop-client jars can help, if things still don't work or Storm needs something more. The downstream projects which I have experience with don't use these jars (which they should ideally) :-)

-Ayush

[1] https://github.com/apache/spark/blob/master/pom.xml#L1382
[2] https://github.com/ayushtkn/storm/commit/e0cd8e21201e01d6d0e1f3ac1bc5ada8354436e6
[3] https://github.com/apache/storm/commit/fb5acdedd617de65e494c768b6ae4bab9b3f7ac8

On Fri, 12 Apr 2024 at 10:41, Richard Zowalla wrote:

> Hi,
>
> thanks for the fast reply. The PR is here [1].
>
> It works if I exclude the client-api and client-api-runtime from being
> scanned in surefire, which is a hacky workaround for the actual issue.
>
> The hadoop-common jar is a transitive dependency of the minicluster, which
> is used for testing.
>
> Debugging the situation shows that HttpServer2 is in the same package in
> hadoop-common as well as in the client-api, but with differences in methods
> / classes used, so depending on the classpath order the wrong class is
> loaded.
>
> Stack traces are in the first GH Action run here: [1].
>
> A reproducer would be to check out Storm, go to storm-hdfs, remove the
> exclusion in [2], and run the tests in that module, which will fail due to a
> missing jetty server class (as the HttpServer2 class is loaded from
> client-api instead of minicluster).
>
> Regards & thanks
> Richard
>
> [1] https://github.com/apache/storm/pull/3637
> [2]
> https://github.com/apache/storm/blob/e44f72767370d10a682446f8f36b75242040f675/external/storm-hdfs/pom.xml#L120
>
> On 2024/04/11 21:29:13 Ayush Saxena wrote:
> > Hi Richard,
> > I am not able to decode the issue properly here. It would have been
> > better if you shared the PR or the failure trace as well.
> > QQ: Why are you having hadoop-common as an explicit dependency? That
> > hadoop-common stuff should be there in hadoop-client-api.
> > I quickly checked once on the 3.4.0 release and I think it does have
> > them.
> >
> > ```
> > ayushsaxena@ayushsaxena client % jar tf hadoop-client-api-3.4.0.jar |
> > grep org/apache/hadoop/fs/FileSystem.class
> > org/apache/hadoop/fs/FileSystem.class
> > ```
> >
> > You didn't mention which shaded classes are being reported as
> > missing... I think spark uses
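
Richard's observation in the thread above, that HttpServer2 exists both in hadoop-common and in the client-api jar and that the classpath order decides which copy gets loaded, can be checked directly on a test classpath. The following is only a minimal diagnostic sketch in Java and is not part of the Storm or Hadoop code. The class `ClassOrigin` and its method names are hypothetical, and the default class name `org.apache.hadoop.http.HttpServer2` is an assumption about the package involved. It uses only standard JDK calls (`ClassLoader.getResources`, `Class.forName`, `ProtectionDomain.getCodeSource`).

```java
import java.io.IOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: shows which jars provide a class and which one wins. */
public class ClassOrigin {

    /** Converts a class name such as "a.b.C" into the resource path "a/b/C.class". */
    public static String resourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns every location visible to this class loader that contains the class file. */
    public static List<URL> findAllCopies(String className) throws IOException {
        ClassLoader loader = ClassOrigin.class.getClassLoader();
        return Collections.list(loader.getResources(resourcePath(className)));
    }

    /** Loads the class without initializing it and reports where it came from. */
    public static String loadedFrom(String className) throws ClassNotFoundException {
        Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
        CodeSource source = c.getProtectionDomain().getCodeSource();
        // Classes from the JDK runtime itself have no code source.
        return source == null ? "(bootstrap / JDK runtime)" : source.getLocation().toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumed default; pass another fully qualified class name as the first argument.
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.http.HttpServer2";
        List<URL> copies = findAllCopies(name);
        System.out.println(copies.size() + " copies of " + name + " visible on the classpath:");
        for (URL url : copies) {
            System.out.println("  " + url);
        }
        try {
            System.out.println("Loaded from: " + loadedFrom(name));
        } catch (ClassNotFoundException | LinkageError e) {
            System.out.println("Could not load " + name + ": " + e);
        }
    }
}
```

If this is run with the same classpath as the failing tests and prints more than one copy, the "Loaded from" line shows which jar actually supplies the class. That is the information needed to decide between excluding a dependency and switching artifacts.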
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1557/

[Apr 11, 2024, 10:04:57 AM] (github) HDFS-17455. Fix Client throw IndexOutOfBoundsException in DFSInputStream#fetchBlockAt (#6710). Contributed by Haiyang Hu.
[Apr 11, 2024, 6:38:15 PM] (github) HADOOP-19079. HttpExceptionUtils to verify that loaded class is really an exception before instantiation (#6557)

-1 overall

The following subsystems voted -1:
    blanks hadolint pathlen spotbugs xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck

The following subsystems are considered long running:
(runtime bigger than 1h 0m 0s)
    unit

Specific tests:

    XML :

        Parsing Error(s):
        hadoop-common-project/hadoop-common/src/test/resources/xml/external-dtd.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

    spotbugs :

        module:hadoop-hdfs-project/hadoop-hdfs-client
        Redundant nullcheck of sockStreamList, which is known to be non-null in org.apache.hadoop.hdfs.PeerCache.getInternal(DatanodeID, boolean). Redundant null check at PeerCache.java:[line 158]

    spotbugs :

        module:hadoop-hdfs-project/hadoop-hdfs-httpfs
        Redundant nullcheck of xAttrs, which is known to be non-null in org.apache.hadoop.fs.http.client.HttpFSFileSystem.getXAttr(Path, String). Redundant null check at HttpFSFileSystem.java:[line 1373]

    spotbugs :

        module:hadoop-yarn-project/hadoop-yarn
        org.apache.hadoop.yarn.service.ServiceScheduler$1.load(ConfigFile) may return null, but is declared @Nonnull. At ServiceScheduler.java:[line 555]

    spotbugs :

        module:hadoop-hdfs-project/hadoop-hdfs-rbf
        Redundant nullcheck of dns, which is known to be non-null in org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getCachedDatanodeReport(HdfsConstants$DatanodeReportType). Redundant null check at RouterRpcServer.java:[line 1093]

    spotbugs :

        module:hadoop-hdfs-project
        Redundant nullcheck of xAttrs, which is known to be non-null in org.apache.hadoop.fs.http.client.HttpFSFileSystem.getXAttr(Path, String). Redundant null check at HttpFSFileSystem.java:[line 1373]
        Redundant nullcheck of sockStreamList, which is known to be non-null in org.apache.hadoop.hdfs.PeerCache.getInternal(DatanodeID, boolean). Redundant null check at PeerCache.java:[line 158]
        Redundant nullcheck of dns, which is known to be non-null in org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getCachedDatanodeReport(HdfsConstants$DatanodeReportType). Redundant null check at RouterRpcServer.java:[line 1093]

    spotbugs :

        module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications
        org.apache.hadoop.yarn.service.ServiceScheduler$1.load(ConfigFile) may return null, but is declared @Nonnull. At ServiceScheduler.java:[line 555]

    spotbugs :

        module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services
Apache Hadoop qbt Report: branch-2.10+JDK7 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/

No changes

-1 overall

The following subsystems voted -1:
    asflicense hadolint mvnsite pathlen unit

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck whitespace

The following subsystems are considered long running:
(runtime bigger than 1h 0m 0s)
    unit

Specific tests:

    Failed junit tests :

        hadoop.fs.TestFileUtil
        hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
        hadoop.hdfs.server.blockmanagement.TestReplicationPolicyWithUpgradeDomain
        hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion
        hadoop.hdfs.TestFileLengthOnClusterRestart
        hadoop.hdfs.TestDFSInotifyEventInputStream
        hadoop.hdfs.server.namenode.snapshot.TestSnapshotBlocksMap
        hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
        hadoop.hdfs.server.federation.router.TestRouterQuota
        hadoop.hdfs.server.federation.router.TestRouterNamenodeHeartbeat
        hadoop.hdfs.server.federation.resolver.order.TestLocalResolver
        hadoop.hdfs.server.federation.resolver.TestMultipleDestinationResolver
        hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
        hadoop.mapreduce.lib.input.TestLineRecordReader
        hadoop.mapred.TestLineRecordReader
        hadoop.mapreduce.jobhistory.TestHistoryViewerPrinter
        hadoop.resourceestimator.service.TestResourceEstimatorService
        hadoop.resourceestimator.solver.impl.TestLpSolver
        hadoop.yarn.sls.TestSLSRunner
        hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestNumaResourceAllocator
        hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestNumaResourceHandlerImpl
        hadoop.yarn.server.resourcemanager.TestClientRMService
        hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore
        hadoop.yarn.server.resourcemanager.monitor.invariants.TestMetricsInvariantChecker

    cc:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/diff-compile-cc-root.txt [4.0K]

    javac:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/diff-compile-javac-root.txt [488K]

    checkstyle:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/diff-checkstyle-root.txt [14M]

    hadolint:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/diff-patch-hadolint.txt [4.0K]

    mvnsite:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/patch-mvnsite-root.txt [568K]

    pathlen:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/pathlen.txt [12K]

    pylint:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/diff-patch-pylint.txt [20K]

    shellcheck:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/diff-patch-shellcheck.txt [72K]

    whitespace:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/whitespace-eol.txt [12M]
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/whitespace-tabs.txt [1.3M]

    javadoc:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/patch-javadoc-root.txt [36K]

    unit:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt [220K]
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [456K]
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt [36K]
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt [16K]
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt [104K]
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/patch-unit-hadoop-tools_hadoop-azure.txt [20K]
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/patch-unit-hadoop-tools_hadoop-resourceestimator.txt [16K]