Re: [DISCUSS] Shade guava into hadoop-thirdparty
Great question! I can run the Java API Compliance Checker to detect any API changes. I guess that's the only way to find out.

On Sat, Apr 4, 2020 at 1:19 PM Igor Dvorzhak wrote:
> How will this proposal impact public APIs? I.e., does Hadoop expose any
> Guava classes in the client APIs that will require recompiling all client
> applications because they need to use shaded Guava classes?
Re: [DISCUSS] Shade guava into hadoop-thirdparty
How will this proposal impact public APIs? I.e., does Hadoop expose any Guava classes in the client APIs that will require recompiling all client applications because they need to use shaded Guava classes?

On Sat, Apr 4, 2020 at 12:13 PM Wei-Chiu Chuang wrote:
> Here is my proposal:
> (1) shade guava into hadoop-thirdparty, relocate the classpath to
> org.hadoop.thirdparty.com.google.common.*
> (2) make a hadoop-thirdparty 1.1.0 release.
> (3) update existing references to guava to the relocated path. There are
> more than 2k imports that need an update.
> (4) release Hadoop 3.3.1 / 3.2.2 that contains this change.
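Step (3) of the proposal, rewriting the 2k-plus `com.google.common` imports to the relocated package, is mechanical enough to script. A minimal sketch of such a rewrite (illustrative only, not the actual tooling used; the relocated prefix is the one named in the proposal):

```python
import re

# Package prefixes: the original Guava package and the relocated
# prefix proposed for hadoop-thirdparty.
GUAVA_PKG = "com.google.common"
RELOCATED_PKG = "org.hadoop.thirdparty.com.google.common"

# Match only import statements (including static imports), so that
# string literals and comments mentioning com.google.common are left alone.
IMPORT_RE = re.compile(
    r"^(\s*import\s+(?:static\s+)?)" + re.escape(GUAVA_PKG) + r"\.",
    flags=re.MULTILINE,
)

def relocate_imports(java_source: str) -> str:
    """Rewrite Guava imports in one Java source file to the relocated package."""
    return IMPORT_RE.sub(r"\g<1>" + RELOCATED_PKG + ".", java_source)

src = (
    "import com.google.common.base.Preconditions;\n"
    "import static com.google.common.base.Preconditions.checkNotNull;\n"
)
print(relocate_imports(src))
```

Applied over the source tree, a script like this handles the bulk edit; the remaining work is reviewing the odd cases (fully-qualified class names in code, reflection, javadoc references) by hand.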
[DISCUSS] Shade guava into hadoop-thirdparty
Hi Hadoop devs,

I spent a good part of the past 7 months working with a dozen colleagues to update the Guava version in Cloudera's software (that includes Hadoop, HBase, Spark, Hive, Cloudera Manager ... more than 20 projects).

After 7 months, I finally came to a conclusion: updating to Hadoop 3.3 / 3.2.1 / 3.1.3, even if you just come from Hadoop 3.0 / 3.1.0, is going to be really hard because of Guava. Because of Guava, the amount of work to certify a minor release update is almost equivalent to that of a major release update.

That is because:
(1) Going from Guava 11 to Guava 27 is a big jump. There are several incompatible API changes in many places. Too bad the Google developers are not sympathetic to their users.
(2) Guava is used in all Hadoop jars: not just the Hadoop servers but also the client jars and the Hadoop common libs.
(3) The Hadoop library is used in practically all software at Cloudera.

Here is my proposal:
(1) Shade Guava into hadoop-thirdparty and relocate the classes to org.hadoop.thirdparty.com.google.common.*
(2) Make a hadoop-thirdparty 1.1.0 release.
(3) Update existing references to Guava to the relocated package. There are more than 2k imports that need an update.
(4) Release Hadoop 3.3.1 / 3.2.2 containing this change.

In this way, we will be able to update Guava in Hadoop in the future without disrupting Hadoop applications.

Note: HBase already did this, and this Guava update project would have been much more difficult if HBase hadn't.

Thoughts? Other options include:
(1) Force downstream applications to migrate to the Hadoop client artifacts listed at https://hadoop.apache.org/docs/r3.1.1/hadoop-project-dist/hadoop-common/DownstreamDev.html, but that's nearly impossible.
(2) Migrate from Guava to Java APIs. I suppose this is a big project and I can't estimate how much work it would be.

Weichiu
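For steps (1) and (2), the relocation itself happens at build time in the hadoop-thirdparty pom. A minimal maven-shade-plugin sketch of the idea (illustrative only, assuming a typical shade setup; not the actual hadoop-thirdparty configuration):

```xml
<!-- Illustrative relocation of Guava into the proposed thirdparty package.
     The pattern/shadedPattern pair below is the one named in the proposal. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>org.hadoop.thirdparty.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Because the shaded artifact bundles Guava under its own package, Hadoop can later bump the bundled Guava version in hadoop-thirdparty without the new classes ever colliding with whatever com.google.common version a downstream application ships.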
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1459/

[Apr 3, 2020 1:59:07 AM] (github) MAPREDUCE-7268. Fix TestMapreduceConfigFields (#1935)
[Apr 3, 2020 7:37:41 AM] (pjoseph) YARN-10120. Amendment fix for Java Doc.
[Apr 3, 2020 8:27:02 AM] (ayushsaxena) HADOOP-16952. Add .diff to gitignore. Contributed by Ayush Saxena.
[Apr 3, 2020 3:13:41 PM] (github) HDFS-15258. RBF: Mark Router FSCK unstable. (#1934)
[Apr 3, 2020 10:20:51 PM] (iwasakims) HADOOP-16647. Support OpenSSL 1.1.1 LTS. Contributed by Rakesh

-1 overall

The following subsystems voted -1:
    asflicense findbugs pathlen unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

XML Parsing Error(s):
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

FindBugs module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common
    org.apache.hadoop.yarn.server.webapp.WebServiceClient.sslFactory should be package protected At WebServiceClient.java:[line 42]

FindBugs module:hadoop-cloud-storage-project/hadoop-cos
    Redundant nullcheck of dir, which is known to be non-null in org.apache.hadoop.fs.cosn.BufferPool.createDir(String) At BufferPool.java:[line 66]
    org.apache.hadoop.fs.cosn.CosNInputStream$ReadBuffer.getBuffer() may expose internal representation by returning CosNInputStream$ReadBuffer.buffer At CosNInputStream.java:[line 87]
    Found reliance on default encoding in org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFile(String, File, byte[]): new String(byte[]) At CosNativeFileSystemStore.java:[line 199]
    Found reliance on default encoding in org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFileWithRetry(String, InputStream, byte[], long): new String(byte[]) At CosNativeFileSystemStore.java:[line 178]
    org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.uploadPart(File, String, String, int) may fail to clean up java.io.InputStream; obligation to clean up resource created at CosNativeFileSystemStore.java:[line 252] is not discharged

Failed junit tests:
    hadoop.hdfs.server.datanode.TestBPOfferService
    hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks
    hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy
    hadoop.yarn.applications.distributedshell.TestDistributedShell
    hadoop.mapred.TestNetworkedJob
    hadoop.yarn.sls.TestSLSRunner

cc: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1459/artifact/out/diff-compile-cc-root.txt [8.0K]
javac: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1459/artifact/out/diff-compile-javac-root.txt [428K]
checkstyle: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1459/artifact/out/diff-checkstyle-root.txt [16M]
pathlen: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1459/artifact/out/pathlen.txt [12K]
pylint (the source tree stderr): https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1459/artifact/out/patch-pylint-stderr.txt []
shellcheck: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1459/artifact/out/diff-patch-shellcheck.txt [16K]
shelldocs: https://builds.apache.org/job/hadoop-qbt-trunk-java8-lin
Apache Hadoop qbt Report: branch2.10+JDK7 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/645/

No changes

-1 overall

The following subsystems voted -1:
    asflicense findbugs hadolint pathlen unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

XML Parsing Error(s):
    hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml
    hadoop-tools/hadoop-azure/src/config/checkstyle-suppressions.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/public/crossdomain.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml

FindBugs module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client
    Boxed value is unboxed and then immediately reboxed in org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result, byte[], byte[], KeyConverter, ValueConverter, boolean) At ColumnRWHelper.java:[line 335]

Failed junit tests:
    hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA
    hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
    hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
    hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
    hadoop.yarn.client.api.impl.TestAMRMProxy
    hadoop.registry.secure.TestSecureLogins

cc: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/645/artifact/out/diff-compile-cc-root-jdk1.7.0_95.txt [4.0K]
javac: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/645/artifact/out/diff-compile-javac-root-jdk1.7.0_95.txt [324K]
cc: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/645/artifact/out/diff-compile-cc-root-jdk1.8.0_242.txt [4.0K]
javac: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/645/artifact/out/diff-compile-javac-root-jdk1.8.0_242.txt [304K]
checkstyle: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/645/artifact/out/diff-checkstyle-root.txt [16M]
hadolint: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/645/artifact/out/diff-patch-hadolint.txt [4.0K]
pathlen: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/645/artifact/out/pathlen.txt [12K]
pylint (the source tree stderr): https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/645/artifact/out/patch-pylint-stderr.txt []
shellcheck: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/645/artifact/out/diff-patch-shellcheck.txt [56K]
shelldocs: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/645/artifact/out/diff-patch-shelldocs.txt [8.0K]
whitespace: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/645/artifact/out/whitespace-eol.txt [12M]
    https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/645/artifact/out/whitespace-tabs.txt [1.3M]
xml: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/645/artifact/out/xml.txt [12K]
findbugs: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/645/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-hbase_hadoop-yarn-server-timelineservice-hbase-client-warnings.html [8.0K]
javadoc: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/645/artifact/out/diff-javadoc-javadoc-root-jdk1.7.0_95.txt [16K]
    https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/645/artifact/out/diff-javadoc-javadoc-root-jdk1.8.0_242.txt [1.1M]
unit: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/645/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [236K]
    https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/645/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt [12K]
    https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/645/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt [96K]
    https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/645/artifact/out/patch-unit-ha