Re: Recommended way of using hadoop-minicluster for unit testing?

2024-04-12 Thread Ayush Saxena
Hi Richard,
Thanx for sharing the steps to reproduce the issue. I cloned the Apache
Storm repo and was able to repro the issue. The build was indeed failing
due to missing classes.

I spent some time debugging the issue; my take might not be entirely
right (I have no experience with Storm), but there are two ways to get
this going:

*First Approach: If we want to use the shaded classes*

1. I think the artifact to be used for the minicluster should be
`hadoop-client-minicluster`; even Spark uses the same [1]. The one you
are using is `hadoop-minicluster`, which on its own is empty:
```
ayushsaxena@ayushsaxena ~ %  jar tf
/Users/ayushsaxena/.m2/repository/org/apache/hadoop/hadoop-minicluster/3.3.6/hadoop-minicluster-3.3.6.jar
 | grep .class
ayushsaxena@ayushsaxena ~ %
```

It just declares the artifacts that are to be used by
`hadoop-client-minicluster`, and that jar has the shading and stuff.
Using `hadoop-minicluster` is like adding the hadoop dependencies into
the pom transitively, without any shading, which tends to conflict with
the `hadoop-client-api` and `hadoop-client-runtime` jars, which use the
shaded classes.
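For illustration, a minimal pom.xml sketch of that swap (the version
shown is an assumption matching the 3.3.6 jar inspected above; adjust
it to whatever hadoop version your build pins):

```xml
<!-- Sketch: replace the empty hadoop-minicluster artifact with the
     shaded aggregate jar that actually contains the classes -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-minicluster</artifactId>
  <version>3.3.6</version>
  <scope>test</scope>
</dependency>
```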

2. Once you change `hadoop-minicluster` to `hadoop-client-minicluster`,
the tests still won't pass, the reason being the `storm-autocreds`
dependency, which pulls in the hadoop jars via `hbase-client` &
`hive-exec`. So, we need to exclude those as well.
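A rough (untested) sketch of what that exclusion could look like; the
wildcard excludes every unshaded `org.apache.hadoop` jar that
`storm-autocreds` drags in transitively, but the exact list worth
excluding should be confirmed with `mvn dependency:tree`:

```xml
<!-- Sketch: keep storm-autocreds but drop the unshaded hadoop jars it
     pulls in via hbase-client & hive-exec -->
<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-autocreds</artifactId>
  <version>${project.version}</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```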

3. I reverted your classpath hack, changed the jar, excluded the
dependencies from `storm-autocreds`, & ran the storm-hdfs tests: all
the tests which were failing initially now passed, without any code
change.
```
[INFO] Results:
[INFO]
[INFO] Tests run: 57, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] BUILD SUCCESS
```

4. Putting the code diff here might make this mail unreadable, so I am
sharing the link to the commit which fixed Storm for me [2]. Let me
know if it has any access issues, and I will put the diff in the mail
itself in text form.

*Second Approach: If we don't want to use the shaded classes*

1. The `hadoop-client-api` & `hadoop-client-runtime` jars use shading,
which tends to conflict with your non-shaded `hadoop-minicluster`.
Rather than using these jars, use the `hadoop-client` jar.

2. I removed your hack & replaced those two jars with the
`hadoop-client` jar, & the storm-hdfs tests pass.

3. I am sharing the link to the commit in my fork, it is here at [3].
One advantage is that you don't have to change your existing jar, nor
would you need to add those exclusions to the `storm-autocreds`
dependency.
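Sketched as a pom.xml fragment, the second approach is just a swap of
the two shaded artifacts for the unshaded aggregate (`${hadoop.version}`
is a placeholder property here, not something Storm's pom necessarily
defines):

```xml
<!-- Sketch: drop hadoop-client-api & hadoop-client-runtime and use the
     unshaded hadoop-client aggregate instead, so nothing conflicts
     with the unshaded hadoop-minicluster on the test classpath -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
</dependency>
```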

++ Adding common-dev, in case any fellow developers with more
experience using the hadoop-client jars can help, if things still
don't work or Storm needs something more. The downstream projects which
I have experience with don't use these jars (which they ideally
should) :-)

-Ayush


[1] https://github.com/apache/spark/blob/master/pom.xml#L1382
[2]
https://github.com/ayushtkn/storm/commit/e0cd8e21201e01d6d0e1f3ac1bc5ada8354436e6
[3]
https://github.com/apache/storm/commit/fb5acdedd617de65e494c768b6ae4bab9b3f7ac8


On Fri, 12 Apr 2024 at 10:41, Richard Zowalla  wrote:

> Hi,
>
> thanks for the fast reply. The PR is here [1].
>
> It works if I exclude the client-api and client-runtime jars from
> being scanned in surefire, which is a hacky workaround for the actual
> issue.
>
> The hadoop-common jar is a transitive dependency of the minicluster,
> which is used for testing.
>
> Debugging the situation shows that HttpServer2 is in the same package
> in hadoop-common as well as in the client-api, but with differences
> in the methods / classes used, so depending on the classpath order
> the wrong class is loaded.
>
> Stacktraces are in the first GH Actions run, here: [1].
>
> A reproducer would be to check out Storm, go to storm-hdfs, remove
> the exclusion in [2], and run the tests in that module, which will
> fail due to a missing Jetty server class (as the HttpServer2 class is
> loaded from client-api instead of minicluster).
>
> Gruß & Thx
> Richard
>
> [1] https://github.com/apache/storm/pull/3637
> [2]
> https://github.com/apache/storm/blob/e44f72767370d10a682446f8f36b75242040f675/external/storm-hdfs/pom.xml#L120
>
> On 2024/04/11 21:29:13 Ayush Saxena wrote:
> > Hi Richard,
> > I am not able to decode the issue properly here; it would have been
> > better if you shared the PR or the failure trace as well.
> > QQ: Why are you having hadoop-common as an explicit dependency? That
> > hadoop-common stuff should be there in hadoop-client-api.
> > I quickly checked on the 3.4.0 release and I think it does have
> them.
> >
> > ```
> > ayushsaxena@ayushsaxena client % jar tf hadoop-client-api-3.4.0.jar |
> > grep org/apache/hadoop/fs/FileSystem.class
> > org/apache/hadoop/fs/FileSystem.class
> > ```
> >
> > You didn't mention which shaded classes are being reported as
> > missing... I think spark uses 

Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64

2024-04-12 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1557/

[Apr 11, 2024, 10:04:57 AM] (github) HDFS-17455. Fix Client throw 
IndexOutOfBoundsException in DFSInputStream#fetchBlockAt (#6710). Contributed 
by Haiyang Hu.
[Apr 11, 2024, 6:38:15 PM] (github) HADOOP-19079. HttpExceptionUtils to verify 
that loaded class is really an exception before instantiation (#6557)




-1 overall


The following subsystems voted -1:
blanks hadolint pathlen spotbugs xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   
hadoop-common-project/hadoop-common/src/test/resources/xml/external-dtd.xml 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml
 

spotbugs :

   module:hadoop-hdfs-project/hadoop-hdfs-client 
   Redundant nullcheck of sockStreamList, which is known to be non-null in 
org.apache.hadoop.hdfs.PeerCache.getInternal(DatanodeID, boolean) Redundant 
null check at PeerCache.java:is known to be non-null in 
org.apache.hadoop.hdfs.PeerCache.getInternal(DatanodeID, boolean) Redundant 
null check at PeerCache.java:[line 158] 

spotbugs :

   module:hadoop-hdfs-project/hadoop-hdfs-httpfs 
   Redundant nullcheck of xAttrs, which is known to be non-null in 
org.apache.hadoop.fs.http.client.HttpFSFileSystem.getXAttr(Path, String) 
Redundant null check at HttpFSFileSystem.java:is known to be non-null in 
org.apache.hadoop.fs.http.client.HttpFSFileSystem.getXAttr(Path, String) 
Redundant null check at HttpFSFileSystem.java:[line 1373] 

spotbugs :

   module:hadoop-yarn-project/hadoop-yarn 
   org.apache.hadoop.yarn.service.ServiceScheduler$1.load(ConfigFile) may 
return null, but is declared @Nonnull At ServiceScheduler.java:is declared 
@Nonnull At ServiceScheduler.java:[line 555] 

spotbugs :

   module:hadoop-hdfs-project/hadoop-hdfs-rbf 
   Redundant nullcheck of dns, which is known to be non-null in 
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getCachedDatanodeReport(HdfsConstants$DatanodeReportType)
 Redundant null check at RouterRpcServer.java:is known to be non-null in 
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getCachedDatanodeReport(HdfsConstants$DatanodeReportType)
 Redundant null check at RouterRpcServer.java:[line 1093] 

spotbugs :

   module:hadoop-hdfs-project 
   Redundant nullcheck of xAttrs, which is known to be non-null in 
org.apache.hadoop.fs.http.client.HttpFSFileSystem.getXAttr(Path, String) 
Redundant null check at HttpFSFileSystem.java:is known to be non-null in 
org.apache.hadoop.fs.http.client.HttpFSFileSystem.getXAttr(Path, String) 
Redundant null check at HttpFSFileSystem.java:[line 1373] 
   Redundant nullcheck of sockStreamList, which is known to be non-null in 
org.apache.hadoop.hdfs.PeerCache.getInternal(DatanodeID, boolean) Redundant 
null check at PeerCache.java:is known to be non-null in 
org.apache.hadoop.hdfs.PeerCache.getInternal(DatanodeID, boolean) Redundant 
null check at PeerCache.java:[line 158] 
   Redundant nullcheck of dns, which is known to be non-null in 
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getCachedDatanodeReport(HdfsConstants$DatanodeReportType)
 Redundant null check at RouterRpcServer.java:is known to be non-null in 
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getCachedDatanodeReport(HdfsConstants$DatanodeReportType)
 Redundant null check at RouterRpcServer.java:[line 1093] 

spotbugs :

   module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications 
   org.apache.hadoop.yarn.service.ServiceScheduler$1.load(ConfigFile) may 
return null, but is declared @Nonnull At ServiceScheduler.java:is declared 
@Nonnull At ServiceScheduler.java:[line 555] 

spotbugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services
 
   

Apache Hadoop qbt Report: branch-2.10+JDK7 on Linux/x86_64

2024-04-12 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/

No changes




-1 overall


The following subsystems voted -1:
asflicense hadolint mvnsite pathlen unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Failed junit tests :

   hadoop.fs.TestFileUtil 
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints 
   
hadoop.hdfs.server.blockmanagement.TestReplicationPolicyWithUpgradeDomain 
   hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion 
   hadoop.hdfs.TestFileLengthOnClusterRestart 
   hadoop.hdfs.TestDFSInotifyEventInputStream 
   hadoop.hdfs.server.namenode.snapshot.TestSnapshotBlocksMap 
   hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys 
   hadoop.hdfs.server.federation.router.TestRouterQuota 
   hadoop.hdfs.server.federation.router.TestRouterNamenodeHeartbeat 
   hadoop.hdfs.server.federation.resolver.order.TestLocalResolver 
   hadoop.hdfs.server.federation.resolver.TestMultipleDestinationResolver 
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints 
   hadoop.mapreduce.lib.input.TestLineRecordReader 
   hadoop.mapred.TestLineRecordReader 
   hadoop.mapreduce.jobhistory.TestHistoryViewerPrinter 
   hadoop.resourceestimator.service.TestResourceEstimatorService 
   hadoop.resourceestimator.solver.impl.TestLpSolver 
   hadoop.yarn.sls.TestSLSRunner 
   
hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestNumaResourceAllocator
 
   
hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestNumaResourceHandlerImpl
 
   hadoop.yarn.server.resourcemanager.TestClientRMService 
   hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore 
   
hadoop.yarn.server.resourcemanager.monitor.invariants.TestMetricsInvariantChecker
 
  

   cc:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/diff-compile-javac-root.txt
  [488K]

   checkstyle:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/diff-checkstyle-root.txt
  [14M]

   hadolint:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/diff-patch-hadolint.txt
  [4.0K]

   mvnsite:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/patch-mvnsite-root.txt
  [568K]

   pathlen:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/pathlen.txt
  [12K]

   pylint:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/diff-patch-pylint.txt
  [20K]

   shellcheck:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/diff-patch-shellcheck.txt
  [72K]

   whitespace:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/whitespace-eol.txt
  [12M]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/whitespace-tabs.txt
  [1.3M]

   javadoc:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/patch-javadoc-root.txt
  [36K]

   unit:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt
  [220K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [456K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
  [36K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt
  [16K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt
  [104K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/patch-unit-hadoop-tools_hadoop-azure.txt
  [20K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1360/artifact/out/patch-unit-hadoop-tools_hadoop-resourceestimator.txt
  [16K]