[jira] [Created] (MAPREDUCE-7474) [ABFS] Improve commit resilience and performance in Manifest Committer

2024-04-03 Thread Steve Loughran (Jira)
Steve Loughran created MAPREDUCE-7474:
-

 Summary: [ABFS] Improve commit resilience and performance in Manifest Committer
 Key: MAPREDUCE-7474
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7474
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 3.3.6, 3.4.0
Reporter: Steve Loughran



* The manifest committer is not resilient to rename failures on task commit unless
HADOOP-18012 rename recovery is enabled.
* A large burst of delete calls has been noted: are they needed?


Relates to HADOOP-19093, but takes a more minimal approach, with the goal of changing
the manifest committer only.

Initial proposed changes:
* always retry recovery on task commit rename (repeat save, delete, rename)
* audit delete use and see if it can be pruned
* maybe: rate-limit some IO internally, rather than delegating that to ABFS
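The sketch below shows the first proposed change in Java. It assumes a local java.nio.file filesystem stands in for ABFS. Every attempt repeats the full save, delete, rename sequence instead of retrying only the rename. The names ManifestSaveRetry, saveWithRetry and Renamer are hypothetical and are not part of the ManifestCommitter API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch of "retry recovery on task commit rename, always":
 * every attempt repeats save, delete, rename. Not the ManifestCommitter code;
 * all names here are illustrative.
 */
final class ManifestSaveRetry {

  /** A rename step that may fail, so tests can inject transient failures. */
  interface Renamer {
    void rename(Path src, Path dest) throws IOException;
  }

  /** Default rename using java.nio; it fails if dest exists, hence the delete. */
  static void nioRename(Path src, Path dest) throws IOException {
    Files.move(src, dest);
  }

  /** Returns the attempt number that succeeded; rethrows the last failure. */
  static int saveWithRetry(String manifest, Path tempFile, Path dest,
      int maxAttempts, Renamer renamer) throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        // save: (re)write the manifest to the temporary path
        Files.write(tempFile, manifest.getBytes(StandardCharsets.UTF_8));
        // delete: remove any destination left behind by an earlier attempt
        Files.deleteIfExists(dest);
        // rename: move the temporary file into its final place
        renamer.rename(tempFile, dest);
        return attempt;
      } catch (IOException e) {
        last = e; // loop round and repeat all three steps
      }
    }
    throw last;
  }
}
```

Passing the rename step in as a Renamer lets a test simulate a transient failure without a real store.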



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-7470) multi-thread mapreduce committer

2024-03-20 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7470.
---
Resolution: Duplicate

> multi-thread mapreduce committer
> 
>
> Key: MAPREDUCE-7470
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7470
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Reporter: TianyiMa
>Priority: Major
>  Labels: mapreduce, pull-request-available
> Attachments: MAPREDUCE-7470.0.patch
>
>
> In cloud environments such as AWS, Aliyun, etc., network latency is
> non-trivial when we commit thousands of files.
> In our situation, the ping latency is about 0.03ms in our IDC, but after moving
> to the cloud it is about 3ms, which is roughly 100x slower. We found that
> committing tens of thousands of files takes a few tens of minutes. The
> more files there are, the longer it takes.
> So we propose a new committer algorithm, a variant of committer
> algorithm version 1, called version 3. In this new algorithm 3, in order to decrease
> the commit time, we use a thread pool to commit the job's final output.
> Our test results in cloud production show that the new algorithm 3
> reduces the commit time by a factor of several tens.
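Committing the job's final output with a thread pool can be sketched as follows. This is an illustration under stated assumptions, not the MAPREDUCE-7470 patch. It uses java.nio.file.Files.move on a local filesystem, and ParallelCommit and commitAll are hypothetical names. Each rename is submitted as a separate task, so the per-call latency of many renames overlaps rather than accumulating.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: rename every task output file using a fixed thread pool. */
final class ParallelCommit {

  /** Renames every src to dest pair using {@code threads} workers; returns the count. */
  static int commitAll(Map<Path, Path> renames, int threads)
      throws IOException, InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Path>> futures = new ArrayList<>();
      for (Map.Entry<Path, Path> entry : renames.entrySet()) {
        Path src = entry.getKey();
        Path dest = entry.getValue();
        Callable<Path> task = () -> Files.move(src, dest);
        futures.add(pool.submit(task));
      }
      // wait for every rename; surface the first failure as an IOException
      for (Future<Path> future : futures) {
        try {
          future.get();
        } catch (ExecutionException e) {
          throw new IOException("rename failed", e.getCause());
        }
      }
      return futures.size();
    } finally {
      pool.shutdownNow();
    }
  }
}
```

Calling shutdownNow() in the finally block stops the workers if any rename fails. The first failure is rethrown as an IOException.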






Re: [jira] [Created] (MAPREDUCE-7472) decode value of hive.query.string for the job Confguration which was encoded by hive

2024-03-19 Thread Steve Loughran
who is reviewing MR patches these days?

On Fri, 2 Feb 2024 at 01:46, wangzhongwei (Jira)  wrote:

> wangzhongwei created MAPREDUCE-7472:
> ---
>
>  Summary: decode value of hive.query.string for the job
> Confguration which was encoded by hive
>  Key: MAPREDUCE-7472
>  URL: https://issues.apache.org/jira/browse/MAPREDUCE-7472
>  Project: Hadoop Map/Reduce
>   Issue Type: Bug
> Affects Versions: 3.3.3
> Reporter: wangzhongwei
> Assignee: wangzhongwei
>  Attachments: image-2024-02-02-09-44-57-503.png
>
> The value of hive.query.string in the job Configuration is URL-encoded by Hive
> and written to HDFS; it should be decoded before being rendered.
>
> !image-2024-02-02-09-44-57-503.png!
>
>
>
>
>
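A minimal sketch of the proposed fix follows. It assumes the value was produced by java.net.URLEncoder-style (form) encoding. The helper decodes hive.query.string before rendering it and leaves every other key untouched. HiveQueryStringDecoder and displayValue are hypothetical names. URLDecoder.decode(String, Charset) needs Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper: decode hive.query.string before rendering it.
 * Other configuration keys are returned untouched.
 */
final class HiveQueryStringDecoder {
  static final String KEY = "hive.query.string";

  static String displayValue(String key, String value) {
    if (!KEY.equals(key) || value == null) {
      return value;
    }
    try {
      // form decoding: "%XX" escapes are decoded and '+' becomes a space
      return URLDecoder.decode(value, StandardCharsets.UTF_8);
    } catch (IllegalArgumentException e) {
      // not valid URL encoding: show the raw value instead of failing
      return value;
    }
  }
}
```

Decoding turns a literal '+' into a space, so this should only be applied to values known to be encoded. Falling back to the raw value on malformed input means the page still renders.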


Re: [VOTE] Release Apache Hadoop 3.4.0 (RC3)

2024-03-12 Thread Steve Loughran
followup: the overnight work was happy too.

one interesting pain point is that on 64-bit Raspberry Pi OS, checknative
complains that libcrypto is missing:

> bin/hadoop checknative

2024-03-12 11:50:24,359 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
2024-03-12 11:50:24,363 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2024-03-12 11:50:24,370 WARN erasurecode.ErasureCodeNative: ISA-L support is not available in your platform... using builtin-java codec where applicable
2024-03-12 11:50:24,429 INFO nativeio.NativeIO: The native code was built without PMDK support.
2024-03-12 11:50:24,431 WARN crypto.OpensslCipher: Failed to load OpenSSL Cipher.
java.lang.UnsatisfiedLinkError: Cannot load libcrypto.so (libcrypto.so: cannot open shared object file: No such file or directory)!
        at org.apache.hadoop.crypto.OpensslCipher.initIDs(Native Method)
        at org.apache.hadoop.crypto.OpensslCipher.(OpensslCipher.java:90)
        at org.apache.hadoop.util.NativeLibraryChecker.main(NativeLibraryChecker.java:111)
Native library checking:
hadoop:  true /home/stevel/Projects/hadoop-release-support/target/arm-untar/hadoop-3.4.0/lib/native/libhadoop.so.1.0.0
zlib:    true /lib/aarch64-linux-gnu/libz.so.1
zstd  :  true /lib/aarch64-linux-gnu/libzstd.so.1
bzip2:   true /lib/aarch64-linux-gnu/libbz2.so.1
openssl: false Cannot load libcrypto.so (libcrypto.so: cannot open shared object file: No such file or directory)!
ISA-L:   false libhadoop was built without ISA-L support
PMDK:    false The native code was built without PMDK support.

which happens because it's not in /lib/aarch64-linux-gnu but instead in
/usr/lib/aarch64-linux-gnu/l
> ls -l /usr/lib/aarch64-linux-gnu/libcrypto*
-rw-r--r-- 1 root root 2739952 Sep 19 13:09 /usr/lib/aarch64-linux-gnu/libcrypto.so.1.1
-rw-r--r-- 1 root root 4466856 Oct 27 13:40 /usr/lib/aarch64-linux-gnu/libcrypto.so.3

Anyone got any insights on how I should set up this (debian-based) OS here?
I know it's only a small box, but with arm64 VMs becoming available in cloud
infrastructure, it'd be good to know if they are similar.

Note: checknative itself is happy, but checknative -a will fail because of
this, though it's an OS setup issue, nothing related to the hadoop binaries.

steve

On Tue, 12 Mar 2024 at 02:26, Xiaoqiao He  wrote:

> Hi Shilun, the count should include your own vote, so the current summary
> is 5 +1 binding and 1 +1 non-binding. Let's re-count at the deadline.
> Thanks again.
>
> Best Regards,
> - He Xiaoqiao
>
> On Tue, Mar 12, 2024 at 9:00 AM slfan1989  wrote:
>
> > As of now, we have collected 5 affirmative votes, with 4 votes binding
> > and 1 vote non-binding.
> >
> > Thank you very much for voting and verifying!
> >
> > This voting will continue until March 15th, this Friday.
> >
> > Best Regards,
> > Shilun Fan.
> >
> > On Tue, Mar 12, 2024 at 4:29 AM Steve Loughran
> > wrote:
> >
> > > +1 binding
> > >
> > > (sorry, this had ended in the yarn-dev folder, otherwise I'd have seen it
> > > earlier. been testing it this afternoon:
> > >
> > > pulled the latest version of
> > > https://github.com/apache/hadoop-release-support
> > > (note, this module is commit-then-review; whoever is working on/validating
> > > a release can commit as they go along. This is not production code...)
> > >
> > > * went through the "validating a release" step, validating maven artifacts
> > > * building the same downstream modules which built for me last time (avro
> > > too complex; hboss not aws v2 in apache yet)
> > >
> > > spark build is still ongoing, but I'm not going to wait. It is building,
> > > which is key.
> > >
> > > The core changes I needed in are at the dependency level and I've
> > > verified they are good.
> > >
> > > Oh, and I've also got my raspberry p5 doing the download of the arm
> > > stuff for its checknative; not expecting problems.
> > >
> > > So: i've got some stuff still ongoing, but the core changes to packaging
> > > are in and the rest I'm not worried about -they shouldn't block the release
> > > as I already validated them on RC2
> > >
> > > On Mon, 4 Mar 2024 at 22:08, slfan1989  wrote:
> > >
> > > > Hi folks,
> > > >
> > > > Xiaoqiao He and I have put together a release candidate (RC3) for Hadoop
> > > > 3.4.0.
> > > >
> > > > What we would like is for anyone who can to verify the tarballs,

Re: [VOTE] Release Apache Hadoop 3.4.0 (RC3)

2024-03-11 Thread Steve Loughran
+1 binding

(sorry, this had ended up in the yarn-dev folder, otherwise I'd have seen it
earlier). Been testing it this afternoon:

pulled the latest version of
https://github.com/apache/hadoop-release-support
(note, this module is commit-then-review; whoever is working on/validating
a release can commit as they go along. This is not production code...)

* went through the "validating a release" step, validating maven artifacts
* built the same downstream modules which built for me last time (avro
is too complex; hboss isn't on aws sdk v2 in apache yet)

spark build is still ongoing, but I'm not going to wait. It is building,
which is key.

The core changes I needed to get in are at the dependency level, and I've
verified they are good.

Oh, and I've also got my raspberry pi5 doing the download of the arm
stuff for its checknative; not expecting problems.

So: I've got some stuff still ongoing, but the core changes to packaging
are in, and I'm not worried about the rest; they shouldn't block the release
as I already validated them on RC2.





On Mon, 4 Mar 2024 at 22:08, slfan1989  wrote:

> Hi folks,
>
> Xiaoqiao He and I have put together a release candidate (RC3) for Hadoop
> 3.4.0.
>
> What we would like is for anyone who can to verify the tarballs, especially
> anyone who can try the arm64 binaries as we want to include them too.
>
> The RC is available at:
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.4.0-RC3/
>
> The git tag is release-3.4.0-RC3, commit bd8b77f398f
>
> The maven artifacts are staged at
> https://repository.apache.org/content/repositories/orgapachehadoop-1408
>
> You can find my public key at:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
> Change log
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.4.0-RC3/CHANGELOG.md
>
> Release notes
>
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.4.0-RC3/RELEASENOTES.md
>
> This is off branch-3.4.0 and is the first big release since 3.3.6.
>
> Key changes include
>
> * S3A: Upgrade AWS SDK to V2
> * HDFS DataNode Split one FsDatasetImpl lock to volume grain locks
> * YARN Federation improvements
> * YARN Capacity Scheduler improvements
> * HDFS RBF: Code Enhancements, New Features, and Bug Fixes
> * HDFS EC: Code Enhancements and Bug Fixes
> * Transitive CVE fixes
>
> Differences from Hadoop-3.4.0-RC2
>
> * Backported 2 PRs from branch-3.4 to branch-3.4.0:
>   * HADOOP-18088: Replacing log4j 1.x with reload4j. (ad8b6541117b)
>   * HADOOP-19084: Pruning hadoop-common transitive dependencies. (80b4bb68159c)
> * Use hadoop-release-support[1] for packaging and verification.
> * Add protobuf compatibility issue description
>
> Note, because the arm64 binaries are built separately on a different
> platform and JVM, their jar files may not match those of the x86
> release -and therefore the maven artifacts. I don't think this is
> an issue (the ASF actually releases source tarballs, the binaries are
> there for help only, though with the maven repo that's a bit blurred).
>
> The only way to be consistent would actually be to untar the x86.tar.gz,
> overwrite its binaries with the arm stuff, retar, sign and push out
> for the vote. Even automating that would be risky.
>
> [1] hadoop-release-support:
> https://github.com/apache/hadoop-release-support
> Thanks to steve for providing hadoop-release-support.
>
> Best Regards,
> Shilun Fan.
>
>


Re: [DISCUSS] Support/Fate of HBase v1 in Hadoop

2024-03-11 Thread Steve Loughran
 +1 for cutting hbase 1; it only reduces dependency pain (no more protobuf
2.5!)

Created the JIRA on that a few days back
https://issues.apache.org/jira/browse/YARN-11658

On Tue, 5 Mar 2024 at 12:08, Bryan Beaudreault 
wrote:

> HBase v1 has been EOL for a while now, so option 2 probably makes sense. While
> you are at it, you should probably update the hbase2 version, because 2.2.x
> is also very old and EOL. 2.5.x is the currently maintained release for
> hbase2, with 2.5.7 being the latest. We’re soon going to release 2.6.0 as
> well.
>
> On Tue, Mar 5, 2024 at 6:56 AM Ayush Saxena  wrote:
>
> > Hi Folks,
> > As of now we have two profiles for HBase: one for HBase v1 (1.7.1) & the other
> > for v2 (2.2.4). The versions are specified here: [1]; how to build is
> > described here: [2]
> >
> > As of now, our Jenkins runs "only" for HBase v1 by default, so we have
> > seen the HBase v2 profile silently break a couple of times.
> >
> > Considering there are stable versions for HBase v2 as per [3], & HBase v2
> > is no longer new, here are some options we can consider:
> >
> > * Make the HBase v2 profile the default profile & let the HBase v1 profile
> > stay in our code.
> > * Ditch the HBase v1 profile & just support the HBase v2 profile.
> > * Let everything stay as is, just add a Jenkins job/GitHub action which
> > compiles HBase v2 as well, so we make sure no change breaks it.
> >
> > Personally, I would go with the second option: the last HBase v1 release
> > seems to be 2 years back, it might be pulling in some
> > problematic transitive dependencies, & it will open scope for us to support
> > HBase 3.x when they have a stable release in future.
> >
> >
> > Let me know your thoughts!!!
> >
> > -Ayush
> >
> >
> > [1] https://github.com/apache/hadoop/blob/dae871e3e0783e1fe6ea09131c3f4650abfa8a1d/hadoop-project/pom.xml#L206-L207
> >
> > [2] https://github.com/apache/hadoop/blob/dae871e3e0783e1fe6ea09131c3f4650abfa8a1d/BUILDING.txt#L168-L172
> >
> > [3] https://hbase.apache.org/downloads.html
> >
>


Re: [VOTE] Release Apache Hadoop 3.4.0 (RC2)

2024-02-23 Thread Steve Loughran
I have been testing this all week, and I'm a -1 until some very minor changes
go in.


   1. build the arm64 binaries with the same jar artifacts as the x86 one
   2. include ad8b6541117b HADOOP-18088. Replace log4j 1.x with reload4j.
   3. include 80b4bb68159c HADOOP-19084. Prune hadoop-common transitive
   dependencies


For #1 we have automation in my client-validator module, which I have
moved to be a hadoop-managed project and tried to make more
manageable:
https://github.com/apache/hadoop-release-support

This contains an ant project to perform a lot of the documented build
stages, including using SCP to copy down an x86 release tarball and make a
signed copy of this containing (locally built) arm artifacts.

Although that only works with my development environment (macbook m1 laptop
and remote ec2 server), it should be straightforward to make it more
flexible.

It also includes and tests a maven project which imports many of the
hadoop-* pom files and runs some tests with it; this caught some problems
with exported slf4j and log4j2 artifacts getting into the classpath. That
is: hadoop-common pulling in log4j 1.2 and 2.x bindings.

HADOOP-19084 fixes this; the build file now includes a target to scan the
dependencies and fail if "forbidden" artifacts are found. I have not been
able to stop logback ending up on the transitive dependency list, but at least
there is only one slf4j there.
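The sketch below shows the kind of check such a build target performs. It is written in Java rather than as the Ant target in hadoop-release-support. It reads the output of `mvn dependency:list` and fails if any groupId:artifactId is on a forbidden list. The forbidden set shown (the log4j 1.x jar plus the slf4j bindings for log4j 1.2 and log4j 2) is an assumed example, not the list the real target uses. ForbiddenDependencyScan is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of a "forbidden artifacts" dependency check.
 * Input is the text of `mvn dependency:list`; the forbidden set is an example only.
 */
final class ForbiddenDependencyScan {

  /** Example groupId:artifactId pairs: log4j 1.x and slf4j-to-log4j bindings. */
  static final Set<String> FORBIDDEN = Set.of(
      "log4j:log4j",
      "org.slf4j:slf4j-log4j12",
      "org.apache.logging.log4j:log4j-slf4j-impl");

  /** Returns the coordinates of every dependency whose groupId:artifactId is forbidden. */
  static List<String> findForbidden(List<String> lines, Set<String> forbidden) {
    List<String> hits = new ArrayList<>();
    for (String raw : lines) {
      // dependency lines look like "[INFO]    group:artifact:type:version:scope"
      String line = raw.replace("[INFO]", "").trim();
      String[] parts = line.split(":");
      if (parts.length < 4) {
        continue; // not a dependency coordinate
      }
      if (forbidden.contains(parts[0] + ":" + parts[1])) {
        hits.add(line);
      }
    }
    return hits;
  }

  /** Fails the "build" if anything forbidden is on the dependency list. */
  static void check(List<String> lines) {
    List<String> hits = findForbidden(lines, FORBIDDEN);
    if (!hits.isEmpty()) {
      throw new IllegalStateException("Forbidden artifacts found: " + hits);
    }
  }
}
```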

HADOOP-18088 (Replace log4j 1.x with reload4j) switches over to reload4j,
while the move to log4j v2 is still something we have to consider a WiP.

I have tried making some other changes to the packaging this week:
- creating a lean distro without the AWS SDK
- trying to get protobuf-2.5 out of yarn-api
However, I think it is too late to try applying patches this risky.

I believe we should get the 3.4.0 release out for people to start playing
with while we rapidly iterate on a 3.4.1 release with
- updated dependencies (where possible)
- separate "lean" and "full" installations, where "full" includes all the
cloud connectors and their dependencies; the default is lean and doesn't.
That will cut the default download size in half.
- fixes for critical issues which people who use the 3.4.0 release raise with us.

That is: a packaging and bugs release, with a minimal number of new
features.

I've created HADOOP-19087
<https://issues.apache.org/jira/browse/HADOOP-19087> to cover this.
I'm willing to get my hands dirty here: Shilun Fan and Xiaoqiao He have put
a lot of work into 3.4.0 and probably need other people to take up the work
for the next release. Who else is willing to participate? (Yes Mukund, I have
you in mind too)

One thing I would like to revisit is: what hadoop-tools modules can we cut?
Are rumen and hadoop-streaming being actively used? Or can we consider them
implicitly EOL and strip them? Just think of the maintenance effort we would
save.

---

Incidentally, I have tested the arm stuff on my raspberry pi5 which is now
running 64 bit linux. I believe it is the first time we have qualified a
Hadoop release with the media player under someone's television.

On Thu, 15 Feb 2024 at 20:41, Mukund Madhav Thakur 
wrote:

> Thanks, Shilun for putting this together.
>
> Tried the below things and everything worked for me.
>
> validated checksum and gpg signature.
> compiled from source.
> Ran AWS integration tests.
> untarred the binaries and was able to access objects in S3 via hadoop fs commands.
> compiled gcs-connector successfully using the 3.4.0 version.
>
> qq: what is the difference between RC1 and RC2, apart from some extra
> patches?
>
>
>
> On Thu, Feb 15, 2024 at 10:58 AM slfan1989  wrote:
>
>> Thank you for explaining this part!
>>
>> hadoop-3.4.0-RC2 used the validate-hadoop-client-artifacts tool to
>> generate the ARM tar package, which should meet expectations.
>>
>> We also look forward to other members helping to verify.
>>
>> Best Regards,
>> Shilun Fan.
>>
>> On Fri, Feb 16, 2024 at 12:22 AM Steve Loughran 
>> wrote:
>>
>> >
>> >
>> > On Mon, 12 Feb 2024 at 15:32, slfan1989  wrote:
>> >
>> >>
>> >>
>> >> Note, because the arm64 binaries are built separately on a different
>> >> platform and JVM, their jar files may not match those of the x86
>> >> release -and therefore the maven artifacts. I don't think this is
>> >> an issue (the ASF actually releases source tarballs, the binaries are
>> >> there for help only, though with the maven repo that's a bit blurred).
>> >>
>> >> The only way to be consistent would actually untar the x86.tar.gz,
>> >> overwrite its binaries with the arm stuff, retar, sign and push out
>> >> for the vote.
>> >
>> >
>> >

Re: [VOTE] Release Apache Hadoop 3.4.0 (RC2)

2024-02-15 Thread Steve Loughran
On Mon, 12 Feb 2024 at 15:32, slfan1989  wrote:

>
>
> Note, because the arm64 binaries are built separately on a different
> platform and JVM, their jar files may not match those of the x86
> release -and therefore the maven artifacts. I don't think this is
> an issue (the ASF actually releases source tarballs, the binaries are
> there for help only, though with the maven repo that's a bit blurred).
>
> The only way to be consistent would actually be to untar the x86.tar.gz,
> overwrite its binaries with the arm stuff, retar, sign and push out
> for the vote.



that's exactly what the "arm.release" target in my client-validator does:
it builds an arm tar with the x86 binaries but the arm native libs, and signs it.



> Even automating that would be risky.
>
>
automating is the *only* way to do it; apache ant has everything needed for
this including the ability to run gpg.

we did this on the relevant 3.3.x releases and nobody has yet complained...


Re: [VOTE] Release Apache Hadoop 3.4.0 (RC1)

2024-02-12 Thread Steve Loughran
so it'll have the s3a checksum and region stuff? that'd be wonderful! I've
been assuming I'd have to push out a 3.4.1 release for that alone.

On Sat, 10 Feb 2024 at 23:36, slfan1989  wrote:

> We will end the voting for hadoop-3.4.0-RC1 and will open the voting for
> hadoop-3.4.0-RC2 in 1-2 days.
>
> * hadoop-3.4.0-RC2 will contain all commits in branch-3.4, and newly
> submitted commits (in branch-3.4) will be cherry-picked to branch-3.4.0.
> * hadoop-3.4.0-RC2 will use hadoop-thirdparty-1.2.0.
>
> Best Regards,
> Shilun Fan.
>
> On Mon, Jan 29, 2024 at 10:33 PM slfan1989  wrote:
>
> > Apache Hadoop 3.4.0
> >
> > Xiaoqiao He and I have put together a release candidate (RC1) for Hadoop
> > 3.4.0.
> >
> > What we would like is for anyone who can to verify the tarballs, especially
> > anyone who can try the arm64 binaries as we want to include them too.
> >
> > The RC is available at:
> > https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.4.0-RC1/
> >
> > The git tag is release-3.4.0-RC1, commit 7e2edd8c5d1
> >
> > The maven artifacts are staged at
> > https://repository.apache.org/content/repositories/orgapachehadoop-1395/
> >
> > You can find my public key at:
> > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> >
> > Change log
> > https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.4.0-RC1/CHANGELOG.md
> >
> > Release notes
> > https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.4.0-RC1/RELEASENOTES.md
> >
> > This is off branch-3.4.0 and is the first big release since 3.3.6.
> >
> > Key changes include
> >
> > * S3A: Upgrade AWS SDK to V2
> > * HDFS DataNode Split one FsDatasetImpl lock to volume grain locks
> > * YARN Federation improvements
> > * YARN Capacity Scheduler improvements
> > * HDFS RBF: Code Enhancements, New Features, and Bug Fixes
> > * HDFS EC: Code Enhancements and Bug Fixes
> > * Transitive CVE fixes
> >
> > Differences from RC0
> >
> > * We've improved the Hadoop 3.4.0 highlights (big features and improvements).
> > * Confirmed the JIRA status of Hadoop, HDFS, YARN, and MAPREDUCE modules.
> > * Use validate-hadoop-client-artifacts[1] for packaging and verification.
> >
> > Note, because the arm64 binaries are built separately on a different
> > platform and JVM, their jar files may not match those of the x86
> > release -and therefore the maven artifacts. I don't think this is
> > an issue (the ASF actually releases source tarballs, the binaries are
> > there for help only, though with the maven repo that's a bit blurred).
> >
> > The only way to be consistent would actually be to untar the x86.tar.gz,
> > overwrite its binaries with the arm stuff, retar, sign and push out
> > for the vote. Even automating that would be risky.
> >
> > [1] validate-hadoop-client-artifacts:
> > https://github.com/steveloughran/validate-hadoop-client-artifacts
> > Thanks to steve for providing validate-hadoop-client-artifacts.
> >
> > Best Regards,
> > Shilun Fan.
> >
>


Re: Fw:Re: [VOTE] Release Apache Hadoop 3.4.0 RC0

2024-01-15 Thread Steve Loughran
-1 I'm afraid, just due to staging/packaging issues.

This took me a few goes to get right myself, so nothing unusual.

Note I used my validator project, which is set to retrieve binaries, check
signatures, run maven builds against staged artifacts *and clean up any
local copies first* and more.

This uses apache ant to manage all this:

https://github.com/steveloughran/validate-hadoop-client-artifacts

Here's the initial build.properties file I used to try and manage this:

## build.properties:
hadoop.version=3.4.0
rc=RC0
amd.src.dir=https://home.apache.org/~slfan1989/hadoop-3.4.0-RC0-amd64/
http.source=https://home.apache.org/~slfan1989/hadoop-3.4.0-RC0-amd64

release=hadoop-${hadoop.version}-RC0
rc.dirname=${release}
release.native.binaries=false
git.commit.id=cdb8af4f22ec
nexus.staging.url=https://repository.apache.org/content/repositories/orgapachehadoop-1391/
hadoop.source.dir=${local.dev.dir}/hadoop-trunk
##

When I did my own builds, all the artifacts created were without the RC0
suffix. It is critical this happens because the .sha512 checksums include
the file name, RC0 suffix and all:

> cat hadoop-3.4.0-RC0.tar.gz.sha512
SHA512 (hadoop-3.4.0-RC0.tar.gz) = e50e68aecb36867c610db8309ccd3aae812184da21354b50d2a461b29c73f21d097fb27372c73c150e1c035003bb99a61c64db26c090fe0fb9e7ed6041722eab
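The reason the name matters can be shown with a short Java sketch that verifies a BSD-style checksum line like the one above. The recorded file name is compared with the artifact's actual name before the SHA-512 digest is checked. A file renamed after its checksum was generated therefore fails verification, even when its bytes are intact. Sha512Check is a hypothetical name; the sketch uses only java.security.MessageDigest and java.util.regex.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch of verifying a BSD-style line "SHA512 (file-name) = hex-digest".
 * The file name is part of the line, so the artifact must carry exactly that name.
 */
final class Sha512Check {
  private static final Pattern LINE =
      Pattern.compile("SHA512 \\((.+)\\) = ([0-9a-fA-F]{128})");

  static String sha512Hex(byte[] data) throws NoSuchAlgorithmException {
    byte[] digest = MessageDigest.getInstance("SHA-512").digest(data);
    StringBuilder sb = new StringBuilder();
    for (byte b : digest) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }

  /** True only if both the recorded file name and the digest match. */
  static boolean verify(Path artifact, String checksumLine)
      throws IOException, NoSuchAlgorithmException {
    Matcher m = LINE.matcher(checksumLine.trim());
    if (!m.matches()) {
      throw new IllegalArgumentException("not a BSD-style SHA512 line: " + checksumLine);
    }
    if (!m.group(1).equals(artifact.getFileName().toString())) {
      return false; // e.g. the line names the -RC0 file but the artifact does not
    }
    return m.group(2).equalsIgnoreCase(sha512Hex(Files.readAllBytes(artifact)));
  }
}
```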


Maven artifacts: staging problems

Couldn't build with a -Pstaging profile as the staging repository wasn't
yet closed; I tried to close it myself.

This failed with a rule problem:

Event: Failed: Checksum Validation
Monday, January 15, 2024 14:37:13 GMT (GMT+)
typeId checksum-staging
failureMessage INVALID SHA-1:
'/org/apache/hadoop/hadoop-mapreduce-client-jobclient/3.4.0/hadoop-mapreduce-client-jobclient-3.4.0-tests.jar.sha1'
failureMessage Requires one-of SHA-1:
/org/apache/hadoop/hadoop-mapreduce-client-jobclient/3.4.0/hadoop-mapreduce-client-jobclient-3.4.0-tests.jar.sha1,
SHA-256:
/org/apache/hadoop/hadoop-mapreduce-client-jobclient/3.4.0/hadoop-mapreduce-client-jobclient-3.4.0-tests.jar.sha256,
SHA-512:
/org/apache/hadoop/hadoop-mapreduce-client-jobclient/3.4.0/hadoop-mapreduce-client-jobclient-3.4.0-tests.jar.sha512

I don't know precisely what this means... my guess is that the upload didn't
include everything.
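The rule may mean what it appears to say: each artifact needs at least one .sha1, .sha256 or .sha512 file whose digest actually matches. Under that assumption, the staged directory can be audited locally before upload, as in the sketch below. It is not how Nexus implements the rule, and StagingChecksumAudit is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical local pre-upload audit: every .jar/.pom in a directory must have
 * at least one .sha1/.sha256/.sha512 sidecar whose digest matches the file.
 */
final class StagingChecksumAudit {
  private static final String[] SUFFIXES = {".sha1", ".sha256", ".sha512"};
  private static final String[] ALGORITHMS = {"SHA-1", "SHA-256", "SHA-512"};

  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }

  /** Returns the sorted names of artifacts with no valid checksum sidecar. */
  static List<String> audit(Path dir) throws IOException, NoSuchAlgorithmException {
    List<String> problems = new ArrayList<>();
    try (DirectoryStream<Path> artifacts = Files.newDirectoryStream(dir, "*.{jar,pom}")) {
      for (Path artifact : artifacts) {
        if (!hasValidChecksum(artifact)) {
          problems.add(artifact.getFileName().toString());
        }
      }
    }
    Collections.sort(problems);
    return problems;
  }

  static boolean hasValidChecksum(Path artifact)
      throws IOException, NoSuchAlgorithmException {
    byte[] data = Files.readAllBytes(artifact);
    for (int i = 0; i < SUFFIXES.length; i++) {
      Path sidecar = artifact.resolveSibling(artifact.getFileName() + SUFFIXES[i]);
      if (!Files.exists(sidecar)) {
        continue;
      }
      // the checksum file holds the hex digest, possibly followed by a file name
      String recorded = new String(Files.readAllBytes(sidecar), StandardCharsets.US_ASCII)
          .trim().split("\\s+")[0];
      String actual = hex(MessageDigest.getInstance(ALGORITHMS[i]).digest(data));
      if (recorded.equalsIgnoreCase(actual)) {
        return true;
      }
    }
    return false;
  }
}
```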

Note my client-validator module can check this; just run its maven test
commands:

mvn clean test -U -P3.4 -Pstaging

GPG signing: all good.

Picked your key up from the site (ant gpg.keys)... the first validation with
ant gpg.verify was unhappy as your key wasn't trusted. I've signed it and
pushed that signature up, so people who trust me get some reassurance about
you.

My build then failed as the gpg code couldn't find the
hadoop-3.4.0-aarch64.tar.gz.asc

The problem here is that although we want separate arm and x86 tar files,
we don't really want separate binaries as it only creates different jars in
the wild.

The way I addressed that was: after creating the x86 release on an ec2 vm
and downloading it, I did a local arm64 build, created an arm
.tar.gz file, and copied it into the same dir as the amd64 binaries, with
the arm64 .tar.gz filename, .asc and .sha512 checksum files all renamed
(the checksum file patched to match the name).
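The rename step can be sketched in Java as below. The sketch assumes the .sha512 file uses the BSD-style "SHA512 (name) = hex" line shown earlier. The .asc file only needs a rename, because a detached GPG signature covers the file's bytes rather than its name. The .sha512 file, by contrast, must have its recorded name patched. RenameReleaseArtifact and renameWithSidecars are hypothetical names, not part of validate-hadoop-client-artifacts.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch: rename a tarball plus its .asc and .sha512 sidecars,
 * patching the file name recorded inside the BSD-style .sha512 line.
 */
final class RenameReleaseArtifact {

  static Path renameWithSidecars(Path tarball, String newName) throws IOException {
    String oldName = tarball.getFileName().toString();
    Path renamed = Files.move(tarball, tarball.resolveSibling(newName));

    // a detached signature covers the bytes, not the name: a plain rename is enough
    Files.move(tarball.resolveSibling(oldName + ".asc"),
        tarball.resolveSibling(newName + ".asc"));

    // "SHA512 (old-name) = <hex>" embeds the name, so rewrite it
    Path oldSum = tarball.resolveSibling(oldName + ".sha512");
    String patched = new String(Files.readAllBytes(oldSum), StandardCharsets.UTF_8)
        .replace("(" + oldName + ")", "(" + newName + ")");
    Files.write(tarball.resolveSibling(newName + ".sha512"),
        patched.getBytes(StandardCharsets.UTF_8));
    Files.delete(oldSum);
    return renamed;
  }
}
```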

https://github.com/steveloughran/validate-hadoop-client-artifacts?tab=readme-ov-file#arm64-binaries


Re: [VOTE] Release Apache Hadoop 3.4.0 RC0

2024-01-11 Thread Steve Loughran
wonderful! I'll be testing over the weekend

Meanwhile, new changes I'm putting into trunk are tagged as fixed in 3.5.0,
correct?

steve


On Thu, 11 Jan 2024 at 05:15, slfan1989  wrote:

> Hello all,
>
> We plan to release hadoop 3.4.0 based on hadoop trunk, which is the first
> hadoop 3.4.0-RC version.
>
> The RC is available at:
> https://home.apache.org/~slfan1989/hadoop-3.4.0-RC0-amd64/ (for amd64)
> https://home.apache.org/~slfan1989/hadoop-3.4.0-RC0-arm64/ (for arm64)
>
> Maven artifacts are built on an x86 machine and are staged at
> https://repository.apache.org/content/repositories/orgapachehadoop-1391/
>
> My public key:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
> Changelog:
> https://home.apache.org/~slfan1989/hadoop-3.4.0-RC0-amd64/CHANGELOG.md
>
> Release notes:
> https://home.apache.org/~slfan1989/hadoop-3.4.0-RC0-amd64/RELEASENOTES.md
>
> This is a relatively big release (by Hadoop standards) containing about 2852
> commits.
>
> Please give it a try, this RC vote will run for 7 days.
>
> Feature highlights:
>
> DataNode FsDatasetImpl Fine-Grained Locking via BlockPool
> 
> [HDFS-15180](https://issues.apache.org/jira/browse/HDFS-15180) Split
> FsDatasetImpl datasetLock via blockpool to solve the issue of heavy
> FsDatasetImpl datasetLock
> when there are many namespaces in a large cluster.
>
> YARN Federation improvements
> 
> [YARN-5597](https://issues.apache.org/jira/browse/YARN-5597) brings many
> improvements, including the following:
>
> 1. YARN Router now boasts a full implementation of all relevant interfaces
> including the ApplicationClientProtocol,
> ResourceManagerAdministrationProtocol, and RMWebServiceProtocol.
> 2. Enhanced support for Application cleanup and automatic offline
> mechanisms for SubCluster are now facilitated by the YARN Router.
> 3. Code optimization for Router and AMRMProxy was undertaken, coupled with
> improvements to previously pending functionalities.
> 4. Audit logs and Metrics for Router received upgrades.
> 5. A boost in cluster security features was achieved, with the inclusion of
> Kerberos support.
> 6. The page function of the router has been enhanced.
>
> Upgrade AWS SDK to V2
> 
> [HADOOP-18073](https://issues.apache.org/jira/browse/HADOOP-18073)
> The S3A connector now uses the V2 AWS SDK.  This is a significant change at
> the source code level.
> Any applications using the internal extension/override points in the
> filesystem connector are likely to break.
> Consult the document aws_sdk_upgrade for the full details.
>
> hadoop-thirdparty will also provide the new RC0 soon.
>
> Best Regards,
> Shilun Fan.
>


Re: [VOTE] Hadoop 3.2.x EOL

2023-12-06 Thread Steve Loughran
+1

On Wed, 6 Dec 2023 at 04:09, Xiaoqiao He  wrote:

> Dear Hadoop devs,
>
> Given the feedback from the discussion thread [1], I'd like to start
> an official thread for the community to vote on release line 3.2 EOL.
>
> It will include,
> a. An official announcement that there will be no further regular Hadoop 3.2.x
> releases.
> b. Issues which target 3.2.5 will not be fixed.
>
> This vote will run for 7 days and conclude by Dec 13, 2023.
>
> I’ll start with my +1.
>
> Best Regards,
> - He Xiaoqiao
>
> [1] https://lists.apache.org/thread/bbf546c6jz0og3xcl9l3qfjo93b65szr
>


Re: [DISCUSS] Make some release lines EOL

2023-12-05 Thread Steve Loughran
+1 for making 3.3 and 3.4 the maintained lines

For 3.2.x, we should say (as it is true) that the age of the dependencies is
such that it is transitively insecure. To fix those, people must upgrade.

For 2.10.x, we should think about whether to cherry-pick fixes for our own CVEs there,
but not actually do any new ASF releases.
I couldn't even get hold of a java7 JDK to do the release even if I wanted
to; the same must hold for many others, so getting a new release qualified
would be hard. Best to say "upgrade time".


This goes well with a 3.4.0 release, as there's a clear story: we have a
new 3.4.x line stabilising; if you want something already stable, move onto
3.3.x if you haven't already.





On Mon, 4 Dec 2023 at 12:39, Xiaoqiao He  wrote:

> Hi folks,
>
> There have been many discussions in the past about which release lines we should still
> consider actively maintained. I want to launch this topic again, and try to get a
> consensus.
>
> From the download page [1] and the active branches page [2], we have the following
> release lines:
> Hadoop 3.3 Release (release-3.3.5 at Jun 22 2022),  360 commits checked in
> since last release.
> Hadoop 3.2 Release (release-3.2.4 at Jul 11, 2022) 36 commits checked in
> since last release.
> Hadoop 2.10 Release (release-2.10.2 at May 17, 2022) 24 commits checked in
> since last release.
>
> And Hadoop 3.4.0 will be coming soon: Shilun Fan (maybe cooperating
> with Ahmar Suhail?)
> has been actively working on getting the 3.4.0 release out.
>
> Considering how few updates some active branches get, should we declare to
> our downstream
> users that some of these lines are EOL?
>
> IMO we should announce EOL for branch-2.10 and branch-3.2, which are not active
> now.
> Then we could focus on the active minor branches (branch-3.3 and branch-3.4)
> and increase the release pace.
>
> So how about keeping the branch-3.3 and branch-3.4 release lines actively
> maintained, and marking branch-2.10 and branch-3.2 EOL? Any opinions? Thanks.
>
> Best Regards,
> - He Xiaoqiao
>
> [1] https://hadoop.apache.org/releases.html
> [2] https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Active+Release+Lines
>


[jira] [Resolved] (MAPREDUCE-7451) review TrackerDistributedCacheManager.checkPermissionOfOther

2023-08-21 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7451.
---
Resolution: Won't Fix

The class TrackerDistributedCacheManager only exists in hadoop releases <= 1.2;
no need to look at it.

> review TrackerDistributedCacheManager.checkPermissionOfOther
> 
>
> Key: MAPREDUCE-7451
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7451
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.22.0, 1.2.1
>Reporter: Yiheng Cao
>Priority: Major
>
> TrackerDistributedCacheManager.checkPermissionOfOther() doesn't seem to work 
> reliably






[jira] [Created] (MAPREDUCE-7452) ManifestCommitter to support / as a destination

2023-08-21 Thread Steve Loughran (Jira)
Steve Loughran created MAPREDUCE-7452:
-

 Summary: ManifestCommitter to support / as a destination
 Key: MAPREDUCE-7452
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7452
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Affects Versions: 3.3.6
Reporter: Steve Loughran


you can't commit work to the root of an object store through the manifest 
committer, as it will fail if the destination path exists, which always holds 
for root.

proposed
* in job setup, check whether the destination is /; if the path is not root,
use createNewDirectory() as today
* if the path is root, delete all its children but not the directory itself.

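A minimal sketch of the proposed job-setup rule, assuming a hypothetical in-memory store modelled as a set of absolute path strings. `DestinationSetup` and `setupDestination` are illustrative names, not the manifest committer's API, and the non-root branch only mimics the "fail if the destination exists" behaviour described above for createNewDirectory().

```java
import java.util.Set;

/**
 * Hypothetical model of an object store as a set of absolute paths.
 * Illustrates the proposed job-setup rule only; it is not committer code.
 */
class DestinationSetup {

  static final String ROOT = "/";

  /** Prepare the job destination; the store set is modified in place. */
  static void setupDestination(Set<String> store, String dest) {
    if (ROOT.equals(dest)) {
      // Root always exists, so "fail if it exists" would always fail.
      // Instead delete every child entry but keep the root itself.
      store.removeIf(path -> !ROOT.equals(path));
      store.add(ROOT);
    } else {
      // Non-root: behave like a create-new-directory call that fails
      // when the destination is already present.
      if (store.contains(dest)) {
        throw new IllegalStateException("Destination already exists: " + dest);
      }
      store.add(dest);
    }
  }
}
```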







Re: Signing releases using automated release infra

2023-07-20 Thread Steve Loughran
could be good.

why not set it up for the third-party module first to see how well it works?

On Tue, 18 Jul 2023 at 21:05, Ayush Saxena  wrote:

> Something we can explore as well!!
>
> -Ayush
>
> Begin forwarded message:
>
> > From: Volkan Yazıcı 
> > Date: 19 July 2023 at 1:24:49 AM IST
> > To: d...@community.apache.org
> > Subject: Signing releases using automated release infra
> > Reply-To: d...@community.apache.org
> >
> > Abstract: Signing release artifacts using an automated release
> > infrastructure has been officially approved by LEGAL. This enables
> > projects to sign artifacts using, say, GitHub Actions.
> >
> > I have been trying to overhaul the Log4j release process and make it
> > as frictionless as possible since last year. As a part of that effort,
> > I wanted to sign artifacts in CI during deployment and in a
> > `members@a.o` thread[0] I explained how one can do that securely with
> > the help of Infra. That was in December 2022. It has been a long,
> > rough journey, but we succeeded. In this PR[1], Legal has updated the
> > release policy to reflect that this process is officially allowed.
> > Further, Infra put together guides[2][3] to assist projects. Logging
> > Services PMC has already successfully performed 4 Log4j Tools releases
> > using this approach, see its release process[4] for a demonstration.
> >
> > [0] (members only!)
> > https://lists.apache.org/thread/1o12mkjrhyl45f9pof94pskg55vhs61n
> > [1] https://github.com/apache/www-site/pull/235
> > [2] https://infra.apache.org/release-publishing.html#signing
> > [3] https://infra.apache.org/release-signing.html#automated-release-signing
> > [4] https://github.com/apache/logging-log4j-tools/blob/master/RELEASING.adoc
> >
> > # F.A.Q.
> >
> > ## Why shall a project be interested in this?
> >
> > It greatly simplifies the release process. See Log4j Tools release
> > process[4], probably the simplest among all Java-based ASF projects.
> >
> > ## How can a project get started?
> >
> > 1. Make sure your project builds are reproducible (otherwise there is
> > no way PMC can verify the integrity of CI-produced and -signed
> > artifacts)
> > 2. Clone and adapt INFRA-23996 (GPG keys in GitHub secrets)
> > 3. Clone and adapt INFRA-23974 (Nexus creds. in GitHub secrets for
> > snapshot deployments)
> > 4. Clone and adapt INFRA-24051 (Nexus creds. in GitHub secrets for
> > staging deployments)
> >
> > You might also want to check this[5] GitHub Action workflow for inspiration.
> >
> > [5] https://github.com/apache/logging-log4j-tools/blob/master/.github/workflows/build.yml
> >
> > ## Does the "automated release infrastructure" (CI) perform the full release?
> >
> > No. CI *only* uploads signed artifacts to Nexus. The release manager
> > (RM) still needs to copy the CI-generated files to SVN, PMC needs to
> > vote, and, upon consensus, RM needs to "close" the release in Nexus
> > and so on.
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@community.apache.org
> > For additional commands, e-mail: dev-h...@community.apache.org
> >
>
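Step 1 of the forwarded FAQ says reproducible builds are what let the PMC verify CI-produced artifacts. As a rough sketch of that check, assuming a PMC member rebuilds an artifact locally and compares it with the CI-built file, the class below compares SHA-512 digests using only the JDK. `ArtifactDigest` and its methods are hypothetical names, and a real verification would also check the GPG signature, which this sketch does not do.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: compare a local rebuild with a CI-built artifact by digest. */
class ArtifactDigest {

  /** Hex-encoded SHA-512 of an in-memory byte array. */
  static String sha512Hex(byte[] data) {
    return toHex(newSha512().digest(data));
  }

  /** Hex-encoded SHA-512 of a file, streamed so large tarballs are not loaded into memory. */
  static String sha512Hex(Path file) throws IOException {
    MessageDigest md = newSha512();
    byte[] buffer = new byte[64 * 1024];
    try (InputStream in = Files.newInputStream(file)) {
      int n;
      while ((n = in.read(buffer)) != -1) {
        md.update(buffer, 0, n);
      }
    }
    return toHex(md.digest());
  }

  /** True if the locally rebuilt artifact has the same SHA-512 as the CI-built one. */
  static boolean sameArtifact(Path localBuild, Path ciBuild) throws IOException {
    return sha512Hex(localBuild).equals(sha512Hex(ciBuild));
  }

  private static MessageDigest newSha512() {
    try {
      return MessageDigest.getInstance("SHA-512");
    } catch (NoSuchAlgorithmException e) {
      // Standard JDKs ship SHA-512; treat its absence as a configuration error.
      throw new IllegalStateException(e);
    }
  }

  private static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
      sb.append(Character.forDigit((b >> 4) & 0xF, 16));
      sb.append(Character.forDigit(b & 0xF, 16));
    }
    return sb.toString();
  }
}
```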


[jira] [Resolved] (MAPREDUCE-7432) Make Manifest Committer the default for abfs and gcs

2023-06-27 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7432.
---
Fix Version/s: 3.4.0
   3.3.9
 Release Note: By default, the mapreduce manifest committer is used for 
jobs working with abfs and gcs. Hadoop mapreduce jobs will pick this up 
automatically; for Spark it is a bit complicated: read the docs to see the 
steps required.
   Resolution: Fixed

> Make Manifest Committer the default for abfs and gcs
> 
>
> Key: MAPREDUCE-7432
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7432
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: client
>Affects Versions: 3.3.5
>    Reporter: Steve Loughran
>    Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> Switch to the manifest committer as default for abfs and gcs
> * abfs: needed for performance, scale and resilience under some failure modes
> * gcs: provides correctness through atomic task commit and better job commit 
> performance
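The release note says Spark needs extra steps. Based on our reading of the Hadoop manifest committer documentation (check it against your Hadoop and Spark versions before relying on this), the properties involved look like the ones below; the Spark classes come from Spark's hadoop-cloud module, which must be on the classpath. The sketch only collects them in a map, for example to write into spark-defaults.conf; `ManifestCommitterSparkOptions` is a hypothetical helper.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper collecting the Spark properties that, per our reading
 * of the Hadoop manifest committer docs, bind Spark SQL to the manifest
 * committer. Verify against the documentation for your versions.
 */
class ManifestCommitterSparkOptions {

  static Map<String, String> forScheme(String scheme) {
    String factory;
    switch (scheme) {
      case "abfs":
        factory = "org.apache.hadoop.fs.azurebfs.commit.AzureManifestCommitterFactory";
        break;
      case "gs":
        factory = "org.apache.hadoop.mapreduce.lib.output.committer.manifest.ManifestCommitterFactory";
        break;
      default:
        throw new IllegalArgumentException("No manifest committer binding in this sketch for " + scheme);
    }
    Map<String, String> options = new LinkedHashMap<>();
    // Hadoop-side committer factory for the store's URI scheme (set explicitly here).
    options.put("spark.hadoop.mapreduce.outputcommitter.factory.scheme." + scheme, factory);
    // Spark-side binding classes from the spark-hadoop-cloud module.
    options.put("spark.sql.sources.commitProtocolClass",
        "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol");
    options.put("spark.sql.parquet.output.committer.class",
        "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter");
    return options;
  }
}
```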






Re: [VOTE] Release Apache Hadoop 3.3.6 RC0

2023-06-21 Thread Steve Loughran
1. we should patch gcs to ignore those new keys.
2. I may be able to validate RC1 this weekend.

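The gcs failures quoted below appear to come from the connector's openFile() rejecting `fs.option.openfile.length` as an unknown mandatory key. As an illustration only, assuming a builder that records mandatory keys and validates them against the keys it knows, the sketch shows the rejection and one way to "ignore those new keys": accept anything under the standard `fs.option.openfile.` prefix. All names here are hypothetical; Hadoop and the GCS connector have their own validation code, which is not reproduced.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of mandatory-key validation in an openFile() builder. */
class OpenFileKeyCheck {

  /** Prefix of the standard option seen in the failing tests. */
  static final String STANDARD_PREFIX = "fs.option.openfile.";

  /**
   * Reject any mandatory key the store does not recognise. When
   * ignoreStandardKeys is true, keys under the standard prefix are accepted
   * (and may simply be ignored) even without special handling.
   */
  static void checkMandatory(Set<String> mandatory, Set<String> known, boolean ignoreStandardKeys) {
    Set<String> unknown = new TreeSet<>();
    for (String key : mandatory) {
      boolean standard = ignoreStandardKeys && key.startsWith(STANDARD_PREFIX);
      if (!standard && !known.contains(key)) {
        unknown.add(key);
      }
    }
    if (!unknown.isEmpty()) {
      throw new IllegalArgumentException("Unknown mandatory key(s): " + unknown);
    }
  }
}
```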


On Fri, 16 Jun 2023 at 03:20, Wei-Chiu Chuang 
wrote:

> Overall so far so good.
>
> hadoop-api-shim:
> built, tested successfully.
>
> cloudstore:
> built successfully.
>
> Spark:
> built successfully. Passed hadoop-cloud tests.
>
> Ozone:
> One test failure due to an unrelated Ozone issue. This test is being
> disabled in the latest Ozone code.
>
> org.apache.hadoop.hdds.utils.NativeLibraryNotLoadedException: Unable to load library ozone_rocksdb_tools from both java.library.path & resource file libozone_rocksdb_tools.so from jar.
> at org.apache.hadoop.hdds.utils.db.managed.ManagedSSTDumpTool.(ManagedSSTDumpTool.java:49)
>
>
> Google gcs:
> There are two test failures. The tests were added recently by HADOOP-18724
> in Hadoop 3.3.6. This is okay: it is not a production code problem, and it
> can be addressed in the GCS code.
>
> [ERROR] Errors:
> [ERROR] TestInMemoryGoogleContractOpen>AbstractContractOpenTest.testFloatingPointLength:403 » IllegalArgument Unknown mandatory key for gs://fake-in-memory-test-bucket/contract-test/testFloatingPointLength "fs.option.openfile.length"
> [ERROR] TestInMemoryGoogleContractOpen>AbstractContractOpenTest.testOpenFileApplyAsyncRead:341 » IllegalArgument Unknown mandatory key for gs://fake-in-memory-test-bucket/contract-test/testOpenFileApplyAsyncRead "fs.option.openfile.length"
>
>
>
>
>
> On Wed, Jun 14, 2023 at 5:01 PM Wei-Chiu Chuang 
> wrote:
>
> > The hbase-filesystem tests passed after reverting HADOOP-18596 and
> > HADOOP-18633 from my local tree.
> > So I think it's a matter of the default behavior being changed. It's not
> > the end of the world. I think we can address it by adding an incompatible
> > change flag and a release note.
> >
> > On Wed, Jun 14, 2023 at 3:55 PM Wei-Chiu Chuang 
> > wrote:
> >
> >> Cross-referenced git history and jira. The changelog needs some updates.
> >>
> >> Not in the release:
> >>
> >>    1. HDFS-16858
> >>    2. HADOOP-18532
> >>    3. HDFS-16861
> >>    4. HDFS-16866
> >>    5. HADOOP-18320
> >>
> >> Updated the fixed versions. Will generate a new changelog in the next RC.
> >>
> >> Was able to build HBase and hbase-filesystem without any code change.
> >>
> >> hbase has one unit test failure. This one is reproducible even with
> >> Hadoop 3.3.5, so maybe a red herring. Local env or something.
> >>
> >> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 9.007 s <<< FAILURE! - in org.apache.hadoop.hbase.regionserver.TestSyncTimeRangeTracker
> >> [ERROR] org.apache.hadoop.hbase.regionserver.TestSyncTimeRangeTracker.testConcurrentIncludeTimestampCorrectness  Time elapsed: 3.13 s  <<< ERROR!
> >> java.lang.OutOfMemoryError: Java heap space
> >> at org.apache.hadoop.hbase.regionserver.TestSyncTimeRangeTracker$RandomTestData.(TestSyncTimeRangeTracker.java:91)
> >> at org.apache.hadoop.hbase.regionserver.TestSyncTimeRangeTracker.testConcurrentIncludeTimestampCorrectness(TestSyncTimeRangeTracker.java:156)
> >>
> >> hbase-filesystem has three test failures in TestHBOSSContractDistCp, which
> >> are not reproducible with Hadoop 3.3.5.
> >> [ERROR] Failures:
> >> [ERROR] TestHBOSSContractDistCp>AbstractContractDistCpTest.testDistCpUpdateCheckFileSkip:976->Assert.fail:88 10 errors in file of length 10
> >> [ERROR] TestHBOSSContractDistCp>AbstractContractDistCpTest.testUpdateDeepDirectoryStructureNoChange:270->AbstractContractDistCpTest.assertCounterInRange:290->Assert.assertTrue:41->Assert.fail:88 Files Skipped value 0 too below minimum 1
> >> [ERROR] TestHBOSSContractDistCp>AbstractContractDistCpTest.testUpdateDeepDirectoryStructureToRemote:259->AbstractContractDistCpTest.distCpUpdateDeepDirectoryStructure:334->AbstractContractDistCpTest.assertCounterInRange:294->Assert.assertTrue:41->Assert.fail:88 Files Copied value 2 above maximum 1
> >> [INFO]
> >> [ERROR] Tests run: 240, Failures: 3, Errors: 0, Skipped: 58
> >>
> >>
> >> Ozone
> >> test in progress. Will report back.
> >>
> >>
> >> On Tue, Jun 13, 2023 at 11:27 PM Wei-Chiu Chuang 
> >> wrote:
> >>
> >>> I am inviting anyone to try and vote on this release candidate.
> >>>
> >>> Note:
> >>> This is built off branch-3.3.6 plus PR#5741 (aws sdk update) and PR#5740
> >>> (LICENSE file update)
> >>>
> >>> The RC is available at:
> >>> 

Re: [VOTE] Release Apache Hadoop 3.3.6 RC0

2023-06-15 Thread Steve Loughran
Which branch is branch-3.3.6 off? 3.3.5 or 3.3?

I'm travelling for the next few days and unlikely to be able to test this;
will do my best

On Wed, 14 Jun 2023 at 07:27, Wei-Chiu Chuang  wrote:

> I am inviting anyone to try and vote on this release candidate.
>
> Note:
> This is built off branch-3.3.6 plus PR#5741 (aws sdk update) and PR#5740
> (LICENSE file update)
>
> The RC is available at:
> https://home.apache.org/~weichiu/hadoop-3.3.6-RC0-amd64/ (for amd64)
> https://home.apache.org/~weichiu/hadoop-3.3.6-RC0-arm64/ (for arm64)
>
> Git tag: release-3.3.6-RC0
> https://github.com/apache/hadoop/releases/tag/release-3.3.6-RC0
>
> Maven artifacts are built on an x86 machine and are staged at
> https://repository.apache.org/content/repositories/orgapachehadoop-1378/
>
> My public key:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
> Changelog:
> https://home.apache.org/~weichiu/hadoop-3.3.6-RC0-amd64/CHANGELOG.md
>
> Release notes:
> https://home.apache.org/~weichiu/hadoop-3.3.6-RC0-amd64/RELEASENOTES.md
>
> This is a relatively small release (by Hadoop standards) containing about
> 120 commits.
> Please give it a try, this RC vote will run for 7 days.
>
>
> Feature highlights:
>
> SBOM artifacts
> 
> Starting from this release, Hadoop publishes a Software Bill of Materials
> (SBOM) using the CycloneDX Maven plugin. For more information about SBOM,
> please go to [SBOM](https://cwiki.apache.org/confluence/display/COMDEV/SBOM).
>
> HDFS RBF: RDBMS based token storage support
> 
> HDFS Router-Based Federation now supports storing delegation tokens on
> MySQL ([HADOOP-18535](https://issues.apache.org/jira/browse/HADOOP-18535)),
> which improves token operation throughput over the original
> ZooKeeper-based implementation.
>
>
> New File System APIs
> 
> [HADOOP-18671](https://issues.apache.org/jira/browse/HADOOP-18671) moved a
> number of HDFS-specific APIs to Hadoop Common to make it possible for
> certain applications that depend on HDFS semantics to run on other
> Hadoop-compatible file systems.
>
> In particular, recoverLease() and isFileClosed() are exposed through the
> LeaseRecoverable interface, while setSafeMode() is exposed through the
> SafeMode interface.
>
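The LeaseRecoverable change above lets applications probe for a capability instead of depending on HDFS classes. A minimal sketch of that probe pattern follows; because it must compile without Hadoop, it declares a stand-in interface with the same two method names (the real org.apache.hadoop.fs.LeaseRecoverable takes a Path, not a String), and `LeasingStore`, `PlainStore` and `LeaseProbe` are hypothetical.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

/**
 * Stand-in mirroring the shape of org.apache.hadoop.fs.LeaseRecoverable so
 * this compiles without Hadoop; the real methods take a Path.
 */
interface LeaseRecoverable {
  boolean recoverLease(String file) throws IOException;
  boolean isFileClosed(String file) throws IOException;
}

/** Hypothetical store with no lease concept (think: an object store). */
class PlainStore { }

/** Hypothetical HDFS-like store that tracks files still open for write. */
class LeasingStore implements LeaseRecoverable {
  private final Set<String> openFiles = new HashSet<>();

  void openForWrite(String file) {
    openFiles.add(file);
  }

  @Override
  public boolean recoverLease(String file) {
    // Simplification: recovery completes immediately and closes the file.
    openFiles.remove(file);
    return true;
  }

  @Override
  public boolean isFileClosed(String file) {
    return !openFiles.contains(file);
  }
}

class LeaseProbe {
  /**
   * Probe for the interface rather than casting to a concrete HDFS class.
   * Returns true if the file is (now) closed.
   */
  static boolean ensureClosed(Object fs, String file) throws IOException {
    if (!(fs instanceof LeaseRecoverable)) {
      // Assumption for this sketch: stores without leases have nothing to recover.
      return true;
    }
    LeaseRecoverable leases = (LeaseRecoverable) fs;
    return leases.isFileClosed(file) || leases.recoverLease(file);
  }
}
```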


[jira] [Resolved] (MAPREDUCE-7435) ManifestCommitter OOM on azure job

2023-06-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7435.
---
Fix Version/s: 3.4.0
   3.3.9
   Resolution: Fixed

> ManifestCommitter OOM on azure job
> --
>
> Key: MAPREDUCE-7435
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7435
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 3.3.5
>    Reporter: Steve Loughran
>    Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> I've got some reports of spark jobs hitting OOM when the manifest committer
> is used through abfs.
> either the manifests are using too much memory, or something is not working 
> with azure stream memory use (or both).
> before proposing a solution, the first step should be to write a test to load 
> many, many manifests, each with lots of dirs and files, to see what breaks.
> note: we did have OOM issues with the s3a committer on teragen, but those 
> structures have to include every etag of every block, so the manifest size is 
> O(blocks); the new committer is O(files + dirs).
> {code}
> java.lang.OutOfMemoryError: Java heap space
> at org.apache.hadoop.fs.azurebfs.services.AbfsInputStream.readOneBlock(AbfsInputStream.java:314)
> at org.apache.hadoop.fs.azurebfs.services.AbfsInputStream.read(AbfsInputStream.java:267)
> at java.io.DataInputStream.read(DataInputStream.java:149)
> at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.ensureLoaded(ByteSourceJsonBootstrapper.java:539)
> at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.detectEncoding(ByteSourceJsonBootstrapper.java:133)
> at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.constructParser(ByteSourceJsonBootstrapper.java:256)
> at com.fasterxml.jackson.core.JsonFactory._createParser(JsonFactory.java:1656)
> at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:1085)
> at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3585)
> at org.apache.hadoop.util.JsonSerialization.fromJsonStream(JsonSerialization.java:164)
> at org.apache.hadoop.util.JsonSerialization.load(JsonSerialization.java:279)
> at org.apache.hadoop.mapreduce.lib.output.committer.manifest.files.TaskManifest.load(TaskManifest.java:361)
> at org.apache.hadoop.mapreduce.lib.output.committer.manifest.impl.ManifestStoreOperationsThroughFileSystem.loadTaskManifest(ManifestStoreOperationsThroughFileSystem.java:133)
> at org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.AbstractJobOrTaskStage.lambda$loadManifest$6(AbstractJobOrTaskStage.java:493)
> at org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.AbstractJobOrTaskStage$$Lambda$231/1813048085.apply(Unknown Source)
> at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:543)
> at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:524)
> at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding$$Lambda$217/489150849.apply(Unknown Source)
> at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:445)
> at org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.AbstractJobOrTaskStage.loadManifest(AbstractJobOrTaskStage.java:492)
> at org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.LoadManifestsStage.fetchTaskManifest(LoadManifestsStage.java:170)
> at org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.LoadManifestsStage.processOneManifest(LoadManifestsStage.java:138)
> at org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.LoadManifestsStage$$Lambda$229/137752948.run(Unknown Source)
> at org.apache.hadoop.util.functional.TaskPool$Builder.lambda$runParallel$0(TaskPool.java:410)
> at org.apache.hadoop.util.functional.TaskPool$Builder$$Lambda$230/467893357.run(Unknown Source)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750)
> {code}






Re: [DISCUSS] Hadoop 3.3.6 release planning

2023-05-05 Thread Steve Loughran
Wei-Chiu has suggested a minimal release: "things in 3.3.5 which were very
broken, api change for ozone and any critical jar updates"

so much lower risk/easier to qualify and ship.

I need to get https://issues.apache.org/jira/browse/HADOOP-18724 in here;
maybe look at a refresh of the "classic" jars (slf4j, reload4j, jackson*,
remove json-smart...)

I'd also like to downgrade protobuf 2.5 from required to optional; even
though hadoop uses the shaded one, to support hbase etc the IPC code still
has direct use of the 2.5 classes. that could be optional

if anyone wants to take up this PR, I would be very happy
https://github.com/apache/hadoop/pull/4996

On Fri, 5 May 2023 at 04:27, Xiaoqiao He  wrote:

> Thanks Wei-Chiu for driving this release.
> Cherry-pick YARN-11482 to branch-3.3 and mark 3.3.6 as the fixed version.
>
> so far only 8 jiras were resolved in the branch-3.3 line.
>
>
> Should we consider both 3.3.6 and 3.3.9 (which comes from the release-3.3.5
> discussion)[1] for this release line?
> I tried querying `project in (HDFS, YARN, HADOOP, MAPREDUCE) AND
> fixVersion in (3.3.6, 3.3.9)`[2], and there are more than a hundred jiras now.
>
> Best Regards,
> - He Xiaoqiao
>
> [1] https://lists.apache.org/thread/kln96frt2tcg93x6ht99yck9m7r9qwxp
> [2] https://issues.apache.org/jira/browse/YARN-11482?jql=project%20in%20(HDFS%2C%20YARN%2C%20HADOOP%2C%20MAPREDUCE)%20AND%20fixVersion%20in%20(3.3.6%2C%203.3.9)
>
>
> On Fri, May 5, 2023 at 1:19 AM Wei-Chiu Chuang  wrote:
>
> > Hi community,
> >
> > I'd like to kick off the discussion around Hadoop 3.3.6 release plan.
> >
> > I'm being selfish, but my intent for 3.3.6 is to have the new APIs in
> > HADOOP-18671 added so that HBase can adopt this new API. Other than that,
> > perhaps thirdparty dependency updates.
> >
> > If you have open items to be added in the coming weeks, please add 3.3.6 to
> > the target release version. Right now I am only seeing three open jiras
> > targeting 3.3.6.
> >
> > I imagine this is going to be a small release as 3.3.5 (hat tip to Steve)
> > was only made two months back, and so far only 8 jiras were resolved in the
> > branch-3.3 line.
> >
> > Best,
> > Weichiu
> >
>


[jira] [Resolved] (MAPREDUCE-7437) spotbugs complaining about .Fetcher's update of a nonatomic static counter

2023-04-25 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7437.
---
Fix Version/s: 3.4.0
   3.3.9
   Resolution: Fixed

> spotbugs complaining about .Fetcher's update of a nonatomic static counter
> --
>
> Key: MAPREDUCE-7437
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7437
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build, client
>Affects Versions: 3.4.0
>    Reporter: Steve Loughran
>    Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> I'm having to do this to get MAPREDUCE-7435 through the build; spotbugs is 
> complaining about the Fetcher constructor non-atomically incrementing a shared
> static counter. Which is true; it's just odd that it has only just surfaced.
> going to fix as a standalone patch but include that in the commit chain of 
> that PR too






[jira] [Created] (MAPREDUCE-7437) spotbugs complaining about .Fetcher's update of a nonatomic static counter

2023-04-21 Thread Steve Loughran (Jira)
Steve Loughran created MAPREDUCE-7437:
-

 Summary: spotbugs complaining about .Fetcher's update of a 
nonatomic static counter
 Key: MAPREDUCE-7437
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7437
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build, client
Affects Versions: 3.4.0
Reporter: Steve Loughran
Assignee: Steve Loughran


I'm having to do this to get MAPREDUCE-7435 through the build; spotbugs is 
complaining about the Fetcher constructor non-atomically incrementing a shared
static counter. Which is true; it's just odd that it has only just surfaced.

going to fix as a standalone patch but include that in the commit chain of that 
PR too

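For readers unfamiliar with the warning: the flagged pattern is a constructor doing something like `id = ++nextId` on a plain static int, which races when instances are created from several threads. A common fix, sketched here under the assumption that the counter only needs to hand out unique ids, is a static AtomicInteger. `FetcherLike` is a hypothetical class, not Hadoop's Fetcher, and the actual patch may differ.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical class showing the usual fix; not Hadoop's Fetcher. */
class FetcherLike {

  // Flagged version: "private static int nextId;" plus "id = ++nextId;" in
  // the constructor, an unsynchronized read-modify-write of shared state.
  private static final AtomicInteger NEXT_ID = new AtomicInteger(0);

  private final int id;

  FetcherLike() {
    // incrementAndGet() is atomic, so concurrent constructors never share an id.
    this.id = NEXT_ID.incrementAndGet();
  }

  int getId() {
    return id;
  }
}
```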





Re: [VOTE] Release Apache Hadoop 3.3.5 (RC3)

2023-03-31 Thread Steve Loughran
go ahead and cut it...I'd forgotten about it.

thanks,

steve

On Fri, 31 Mar 2023 at 06:44, Ayush Saxena  wrote:

> We have a daily build running for 3.3.5:
> https://ci-hadoop.apache.org/job/hadoop-qbt-3.3.5-java8-linux-x86_64/
>
> We have already released it, so I feel we can disable it. Will do it
> tomorrow, if nobody objects. In case the one who configured it wants
> to do it early, feel free to do so.
>
> We already have one for branch-3.3 which runs weekly, though most of us
> probably don't follow it :)
>
> -Ayush
>
> On Wed, 22 Mar 2023 at 00:20, Steve Loughran
>  wrote:
> >
> > ok, here's my summary, even though most of the binding voters forgot to
> > declare they were on the PMC.
> >
> > +1 binding
> >
> > Steve Loughran
> > Chris Nauroth
> > Masatake Iwasaki
> > Ayush Saxena
> > Xiaoqiao He
> >
> > +1 non-binding
> >
> > Viraj Jasani
> >
> >
> > 0 or -1 votes: none.
> >
> >
> > Accordingly: the release is good!
> >
> > I will send the formal announcement out tomorrow
> >
> > A big thank you to everyone who qualified the RC; I know it's a lot of work.
> > We can now get this out and *someone else* can plan the followup.
> >
> >
> > steve
> >
> > On Mon, 20 Mar 2023 at 16:01, Chris Nauroth  wrote:
> >
> > > +1
> > >
> > > Thank you for the release candidate, Steve!
> > >
> > > * Verified all checksums.
> > > * Verified all signatures.
> > > * Built from source, including native code on Linux.
> > >   * mvn clean package -Pnative -Psrc -Drequire.openssl -Drequire.snappy
> > >     -Drequire.zstd -DskipTests
> > > * Tests passed.
> > >   * mvn --fail-never clean test -Pnative -Dparallel-tests
> > >     -Drequire.snappy -Drequire.zstd -Drequire.openssl
> > >     -Dsurefire.rerunFailingTestsCount=3 -DtestsThreadCount=8
> > > * Checked dependency tree to make sure we have all of the expected library
> > >   updates that are mentioned in the release notes.
> > >   * mvn -o dependency:tree
> > > * Confirmed that hadoop-openstack is now just a stub placeholder artifact
> > >   with no code.
> > > * For ARM verification:
> > >   * Ran "file " on all native binaries in the ARM tarball to confirm
> > >     they actually came out with ARM as the architecture.
> > >   * Output of hadoop checknative -a on ARM looks good.
> > >   * Ran a MapReduce job with the native bzip2 codec for compression, and
> > >     it worked fine.
> > >   * Ran a MapReduce job with YARN configured to use
> > >     LinuxContainerExecutor and verified launching the containers through
> > >     container-executor worked.
> > >
> > > Chris Nauroth
> > >
> > >
> > > On Mon, Mar 20, 2023 at 3:45 AM Ayush Saxena 
> wrote:
> > >
> > > > +1(Binding)
> > > >
> > > > * Built from source (x86 & ARM)
> > > > * Successful Native Build (x86 & ARM)
> > > > * Verified Checksums (x86 & ARM)
> > > > * Verified Signature (x86 & ARM)
> > > > * Checked the output of hadoop version (x86 & ARM)
> > > > * Verified the output of hadoop checknative (x86 & ARM)
> > > > * Ran some basic HDFS shell commands.
> > > > * Ran some basic Yarn shell commands.
> > > > * Played a bit with HDFS Erasure Coding.
> > > > * Ran TeraGen & TeraSort
> > > > * Browsed through NN, DN, RM & NM UI
> > > > * Skimmed over the contents of website.
> > > > * Skimmed over the contents of maven repo.
> > > > * Selectively ran some HDFS & CloudStore tests
> > > >
> > > > Thanx Steve for driving the release. Good Luck!!!
> > > >
> > > > -Ayush
> > > >
> > > > > On 20-Mar-2023, at 12:54 PM, Xiaoqiao He 
> > > wrote:
> > > > >
> > > > > +1
> > > > >
> > > > > * Verified signature and checksum of the source tarball.
> > > > > * Built the source code on Ubuntu and OpenJDK 11 by `mvn clean package
> > > > > -DskipTests -Pnative -Pdist -Dtar`.
> > > > > * Setup pseudo cluster with HDFS and YARN.
> > > > > * Run simple FsShell - mkdir/put/get/mv/rm (include EC) and check the
> > > > > result.
> > > > > * Run example mr applications and check the result - Pi & wordcount.
> 
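The tally in Steve's summary applies the ASF release-vote rule: a release needs at least three binding +1 votes (from PMC members) and more binding +1 than binding -1 votes, while non-binding votes are advisory. A minimal sketch of that check follows, with a hypothetical `Vote` class; applied to the five binding and one non-binding +1 listed in the summary, it passes.

```java
import java.util.List;

/** One vote on a release candidate (hypothetical model). */
class Vote {
  final String voter;
  final int value;        // +1, 0 or -1
  final boolean binding;  // true if cast by a PMC member

  Vote(String voter, int value, boolean binding) {
    this.voter = voter;
    this.value = value;
    this.binding = binding;
  }
}

class ReleaseVoteTally {

  /** ASF majority rule: at least 3 binding +1 and more binding +1 than binding -1. */
  static boolean passes(List<Vote> votes) {
    int plus = 0;
    int minus = 0;
    for (Vote v : votes) {
      if (!v.binding) {
        continue; // non-binding votes are welcome but not counted
      }
      if (v.value > 0) {
        plus++;
      } else if (v.value < 0) {
        minus++;
      }
    }
    return plus >= 3 && plus > minus;
  }
}
```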

Re: [DISCUSS] hadoop branch-3.3+ going to java11 only

2023-03-28 Thread Steve Loughran
well, how about we flip the switch and get on with it.

slf4j seems happy on java11.

side issue: has anyone seen test failures on zulu1.8? somehow my test run is
failing and I'm trying to work out whether it's a mismatch between command
line and IDE JVM versions, or whether the 3.3.5 JARs have been built with an
openjdk version which requires IntBuffer to implement an overridden method
IntBuffer rewind().

java.lang.NoSuchMethodError: java.nio.IntBuffer.rewind()Ljava/nio/IntBuffer;

at org.apache.hadoop.fs.FSInputChecker.verifySums(FSInputChecker.java:341)
at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:308)
at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:257)
at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:202)
at java.io.DataInputStream.read(DataInputStream.java:149)

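One well-known way to get exactly this NoSuchMethodError, offered as a possible explanation rather than a diagnosis: from Java 9 onwards IntBuffer overrides rewind() (along with flip(), clear() and friends) with a covariant IntBuffer return type. Code compiled against a newer JDK's class library without `javac --release 8` therefore links to IntBuffer.rewind()Ljava/nio/IntBuffer;, which a Java 8 runtime lacks. Besides building with `--release 8`, a source-level workaround is to call the method through the Buffer supertype; `BufferCompat` is a hypothetical helper.

```java
import java.nio.Buffer;
import java.nio.IntBuffer;

/** Hypothetical helper showing the Java 8 compatible call pattern. */
class BufferCompat {

  /**
   * Rewind through the Buffer supertype. The compiled call site then refers
   * to Buffer.rewind()Ljava/nio/Buffer;, which exists on Java 8 and later,
   * instead of the IntBuffer override that only exists from Java 9.
   */
  static IntBuffer rewind(IntBuffer buffer) {
    ((Buffer) buffer).rewind();
    return buffer;
  }
}
```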
On Tue, 28 Mar 2023 at 15:52, Viraj Jasani  wrote:

> IIRC some of the ongoing major dependency upgrades (log4j 1 to 2, jersey 1
> to 2 and junit 4 to 5) are blockers for java 11 compile + test stability.
>
>
> On Tue, Mar 28, 2023 at 4:55 AM Steve Loughran
> wrote:
>
> >  Now that hadoop 3.3.5 is out, i want to propose something new
> >
> > we switch branch-3.3 and trunk to being java11 only
> >
> >
> >1. java 11 has been out for years
> >2. oracle java 8 is no longer available under "premier support"; you
> >can't really get upgrades
> >https://www.oracle.com/java/technologies/java-se-support-roadmap.html
> >3. openJDK 8 releases != oracle ones, and things you compile with them
> >don't always link to oracle java 8 (some classes in java.nio have added
> >more overrides)
> >4. more and more libraries we want to upgrade to/bundle are java 11 only
> >5. moving to java 11 would cut our yetus build workload in half, and
> >line up for adding java 17 builds instead.
> >
> >
> > I know there are some outstanding issues still in
> > https://issues.apache.org/jira/browse/HADOOP-16795 -but are they blockers?
> > Could we just move to java11 and enhance at our leisure, once java8 is no
> > longer a concern.
> >
>


[DISCUSS] hadoop branch-3.3+ going to java11 only

2023-03-28 Thread Steve Loughran
 Now that hadoop 3.3.5 is out, i want to propose something new

we switch branch-3.3 and trunk to being java11 only


   1. java 11 has been out for years
   2. oracle java 8 is no longer available under "premier support"; you
   can't really get upgrades
   https://www.oracle.com/java/technologies/java-se-support-roadmap.html
   3. openJDK 8 releases != oracle ones, and things you compile with them
   don't always link to oracle java 8 (some classes in java.nio have added
   more overrides)
   4. more and more libraries we want to upgrade to/bundle are java 11 only
   5. moving to java 11 would cut our yetus build workload in half, and
   line up for adding java 17 builds instead.


I know there are some outstanding issues still in
https://issues.apache.org/jira/browse/HADOOP-16795 -but are they blockers?
Could we just move to java11 and enhance at our leisure, once java8 is no
longer a concern.


[jira] [Created] (MAPREDUCE-7435) ManifestCommitter OOM on azure job

2023-03-27 Thread Steve Loughran (Jira)
Steve Loughran created MAPREDUCE-7435:
-

 Summary: ManifestCommitter OOM on azure job
 Key: MAPREDUCE-7435
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7435
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 3.3.5
Reporter: Steve Loughran
Assignee: Steve Loughran


I've got some reports of spark jobs hitting OOM when the manifest committer is
used through abfs.

either the manifests are using too much memory, or something is not working 
with azure stream memory use (or both).

before proposing a solution, the first step should be to write a test to load
many, many manifests, each with lots of dirs and files, to see what breaks.

note: we did have OOM issues with the s3a committer on teragen, but those 
structures have to include every etag of every block, so the manifest size is 
O(blocks); the new committer is O(files + dirs).

{code}
java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.fs.azurebfs.services.AbfsInputStream.readOneBlock(AbfsInputStream.java:314)
at org.apache.hadoop.fs.azurebfs.services.AbfsInputStream.read(AbfsInputStream.java:267)
at java.io.DataInputStream.read(DataInputStream.java:149)
at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.ensureLoaded(ByteSourceJsonBootstrapper.java:539)
at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.detectEncoding(ByteSourceJsonBootstrapper.java:133)
at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.constructParser(ByteSourceJsonBootstrapper.java:256)
at com.fasterxml.jackson.core.JsonFactory._createParser(JsonFactory.java:1656)
at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:1085)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3585)
at org.apache.hadoop.util.JsonSerialization.fromJsonStream(JsonSerialization.java:164)
at org.apache.hadoop.util.JsonSerialization.load(JsonSerialization.java:279)
at org.apache.hadoop.mapreduce.lib.output.committer.manifest.files.TaskManifest.load(TaskManifest.java:361)
at org.apache.hadoop.mapreduce.lib.output.committer.manifest.impl.ManifestStoreOperationsThroughFileSystem.loadTaskManifest(ManifestStoreOperationsThroughFileSystem.java:133)
at org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.AbstractJobOrTaskStage.lambda$loadManifest$6(AbstractJobOrTaskStage.java:493)
at org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.AbstractJobOrTaskStage$$Lambda$231/1813048085.apply(Unknown Source)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:543)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:524)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding$$Lambda$217/489150849.apply(Unknown Source)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:445)
at org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.AbstractJobOrTaskStage.loadManifest(AbstractJobOrTaskStage.java:492)
at org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.LoadManifestsStage.fetchTaskManifest(LoadManifestsStage.java:170)
at org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.LoadManifestsStage.processOneManifest(LoadManifestsStage.java:138)
at org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.LoadManifestsStage$$Lambda$229/137752948.run(Unknown Source)
at org.apache.hadoop.util.functional.TaskPool$Builder.lambda$runParallel$0(TaskPool.java:410)
at org.apache.hadoop.util.functional.TaskPool$Builder$$Lambda$230/467893357.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)

{code}
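A minimal sketch of the kind of synthetic input that first test could generate. It assumes, purely for illustration, that a manifest is a JSON document listing directories to create plus source/destination pairs for files to rename; the `SyntheticManifest` class and its field names are hypothetical and are *not* the real `TaskManifest` schema. What it does show is the sizing argument above: the serialized size grows with files + dirs, regardless of how many blocks each file has.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical generator of manifest-like JSON for load/stress tests.
 * The layout is an illustrative assumption, NOT the real TaskManifest format.
 */
public class SyntheticManifest {

    /** Build one manifest with the given number of directories and files. */
    public static String build(String taskId, int dirs, int files) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"taskId\":\"").append(taskId).append("\",\"dirs\":[");
        for (int d = 0; d < dirs; d++) {
            if (d > 0) {
                sb.append(',');
            }
            sb.append("\"/output/part=").append(d).append('"');
        }
        sb.append("],\"files\":[");
        for (int f = 0; f < files; f++) {
            if (f > 0) {
                sb.append(',');
            }
            // One entry per file: where the task wrote it, and where job commit renames it to.
            sb.append("{\"src\":\"/tmp/attempt/").append(taskId).append("/file-").append(f)
              .append("\",\"dest\":\"/output/part=").append(f % Math.max(dirs, 1))
              .append("/file-").append(f).append("\"}");
        }
        sb.append("]}");
        return sb.toString();
    }

    /** UTF-8 size of a generated manifest, in bytes. */
    public static long sizeBytes(String taskId, int dirs, int files) {
        return build(taskId, dirs, files).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Rough total of JSON a job committer would read for many manifests of one shape.
        int manifests = 1_000;
        for (int files : new int[] {100, 1_000, 10_000}) {
            long total = manifests * sizeBytes("task_0001", 100, files);
            System.out.printf("%d manifests x %d files: ~%d MB of JSON%n",
                manifests, files, total / (1024 * 1024));
        }
    }
}
```

A real test would write such files to the store under test and run the job-commit manifest loading against them, watching heap use as the counts grow.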








[ANNOUNCE] Apache Hadoop 3.3.5 release

2023-03-23 Thread Steve Loughran
On behalf of the Apache Hadoop Project Management Committee, I am
pleased to announce the release of Apache Hadoop 3.3.5.


This is a release of the Apache Hadoop 3.3 line.

Key changes include

* A big update of dependencies to try and keep those reports of
  transitive CVEs under control -both genuine and false positives.
* Critical fix to ABFS input stream prefetching for correct reading.
* Vectored IO API for all FSDataInputStream implementations, with
  high-performance versions for file:// and s3a:// filesystems:
  file:// through Java native IO,
  s3a:// through parallel GET requests.
* Arm64 binaries. Note: because the arm64 release was built on a different
  platform, its jar files may not match those of the x86
  release, and therefore the maven artifacts.
* Security fixes in Hadoop's own code.
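For the file:// side, the core idea of a vectored read, issuing several positioned reads at once and then collecting the results, can be sketched with the JDK's `AsynchronousFileChannel`. This is a stand-alone illustration, not Hadoop's implementation (whose public entry point is `readVectored()` on the input stream); the `LocalVectoredRead` class and its `Range` type are hypothetical.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/** Sketch: read several byte ranges of a local file with concurrent async reads. */
public class LocalVectoredRead {

    /** Hypothetical (offset, length) pair; Hadoop's own type for this is FileRange. */
    public static final class Range {
        final long offset;
        final int length;

        public Range(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Returns one flipped buffer per range, in the same order as the input. */
    public static List<ByteBuffer> readRanges(Path file, List<Range> ranges)
            throws IOException, InterruptedException, ExecutionException {
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            List<ByteBuffer> buffers = new ArrayList<>();
            List<Future<Integer>> pending = new ArrayList<>();
            // Submit every read before waiting on any of them, so they can overlap.
            for (Range r : ranges) {
                ByteBuffer buf = ByteBuffer.allocate(r.length);
                buffers.add(buf);
                pending.add(ch.read(buf, r.offset));
            }
            for (int i = 0; i < ranges.size(); i++) {
                ByteBuffer buf = buffers.get(i);
                long offset = ranges.get(i).offset;
                int n = pending.get(i).get();
                // A positioned read may return fewer bytes than requested; keep reading.
                while (n >= 0 && buf.hasRemaining()) {
                    n = ch.read(buf, offset + buf.position()).get();
                }
                if (buf.hasRemaining()) {
                    throw new EOFException("range at offset " + offset + " extends past end of file");
                }
                buf.flip();
            }
            return buffers;
        }
    }
}
```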

Users of Apache Hadoop 3.3.4 and earlier should upgrade to
this release.

All users are encouraged to read the [overview of major changes][1]
since release 3.3.4.

For details of bug fixes, improvements, and other enhancements since
the previous 3.3.4 release, please check [release notes][2]
and [changelog][3].


*Azure ABFS: Critical Stream Prefetch Fix*


The ABFS connector has a critical bug fix
https://issues.apache.org/jira/browse/HADOOP-18546:
*ABFS. Disable purging list of in-progress reads in abfs stream close().*

All users of the abfs connector in hadoop releases 3.3.2+ MUST either
upgrade to this release or disable prefetching by setting
`fs.azure.readaheadqueue.depth` to `0`.


[1]: http://hadoop.apache.org/docs/r3.3.5/index.html
[2]: http://hadoop.apache.org/docs/r3.3.5/hadoop-project-dist/hadoop-common/release/3.3.5/RELEASENOTES.3.3.5.html
[3]: http://hadoop.apache.org/docs/r3.3.5/hadoop-project-dist/hadoop-common/release/3.3.5/CHANGELOG.3.3.5.html


Many thanks to everyone who helped in this release by supplying patches,
reviewing them, helping get this release building and testing and
reviewing the final artifacts.

steve


Re: [VOTE] Release Apache Hadoop 3.3.5 (RC3)

2023-03-21 Thread Steve Loughran
ok, here's my summary, even though most of the binding voters forgot to
declare they were on the PMC.

+1 binding

Steve Loughran
Chris Nauroth
Masatake Iwasaki
Ayush Saxena
Xiaoqiao He

+1 non-binding

Viraj Jasani


0 or -1 votes: none.


Accordingly: the release is good!
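For anyone checking the arithmetic behind "the release is good", here is a small sketch of the tally. It assumes the rule is at least three binding +1 votes and more binding +1 than binding -1 votes, which is the usual reading of the ASF release-approval policy and of the lazy-majority rule mentioned later in this thread. `VoteTally` is purely illustrative; non-binding votes are recorded but not counted.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Illustrative tally of a release vote under an assumed majority-approval rule:
 * at least three binding +1 votes and more binding +1 than binding -1 votes.
 */
public class VoteTally {

    public static final class Vote {
        final String voter;
        final int value;        // +1, 0 or -1
        final boolean binding;  // true for PMC members

        public Vote(String voter, int value, boolean binding) {
            this.voter = voter;
            this.value = value;
            this.binding = binding;
        }
    }

    public static boolean passes(List<Vote> votes) {
        int plus = 0;
        int minus = 0;
        for (Vote v : votes) {
            if (!v.binding) {
                continue; // non-binding votes are welcome but do not count here
            }
            if (v.value > 0) {
                plus++;
            } else if (v.value < 0) {
                minus++;
            }
        }
        return plus >= 3 && plus > minus;
    }

    public static void main(String[] args) {
        List<Vote> rc3 = Arrays.asList(
            new Vote("Steve Loughran", 1, true),
            new Vote("Chris Nauroth", 1, true),
            new Vote("Masatake Iwasaki", 1, true),
            new Vote("Ayush Saxena", 1, true),
            new Vote("Xiaoqiao He", 1, true),
            new Vote("Viraj Jasani", 1, false));
        System.out.println(passes(rc3)); // prints "true"
    }
}
```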

I will send the formal announcement out tomorrow.

A big thank you to everyone who qualified the RC; I know it's a lot of work.
We can now get this out and *someone else* can plan the followup.


steve

On Mon, 20 Mar 2023 at 16:01, Chris Nauroth  wrote:

> +1
>
> Thank you for the release candidate, Steve!
>
> * Verified all checksums.
> * Verified all signatures.
> * Built from source, including native code on Linux.
>     * mvn clean package -Pnative -Psrc -Drequire.openssl -Drequire.snappy -Drequire.zstd -DskipTests
> * Tests passed.
>     * mvn --fail-never clean test -Pnative -Dparallel-tests -Drequire.snappy -Drequire.zstd -Drequire.openssl -Dsurefire.rerunFailingTestsCount=3 -DtestsThreadCount=8
> * Checked dependency tree to make sure we have all of the expected library
>   updates that are mentioned in the release notes.
>     * mvn -o dependency:tree
> * Confirmed that hadoop-openstack is now just a stub placeholder artifact
>   with no code.
> * For ARM verification:
> * Ran "file " on all native binaries in the ARM tarball to confirm
>   they actually came out with ARM as the architecture.
> * Output of hadoop checknative -a on ARM looks good.
> * Ran a MapReduce job with the native bzip2 codec for compression, and
>   it worked fine.
> * Ran a MapReduce job with YARN configured to use
>   LinuxContainerExecutor and verified launching the containers through
>   container-executor worked.
>
> Chris Nauroth
>
>
> On Mon, Mar 20, 2023 at 3:45 AM Ayush Saxena  wrote:
>
> > +1(Binding)
> >
> > * Built from source (x86 & ARM)
> > * Successful Native Build (x86 & ARM)
> > * Verified Checksums (x86 & ARM)
> > * Verified Signature (x86 & ARM)
> > * Checked the output of hadoop version (x86 & ARM)
> > * Verified the output of hadoop checknative (x86 & ARM)
> > * Ran some basic HDFS shell commands.
> > * Ran some basic Yarn shell commands.
> > * Played a bit with HDFS Erasure Coding.
> > * Ran TeraGen & TeraSort
> > * Browsed through NN, DN, RM & NM UI
> > * Skimmed over the contents of website.
> > * Skimmed over the contents of maven repo.
> > * Selectively ran some HDFS & CloudStore tests
> >
> > Thanx Steve for driving the release. Good Luck!!!
> >
> > -Ayush
> >
> > > On 20-Mar-2023, at 12:54 PM, Xiaoqiao He wrote:
> > >
> > > +1
> > >
> > > * Verified signature and checksum of the source tarball.
> > > * Built the source code on Ubuntu and OpenJDK 11 by `mvn clean package -DskipTests -Pnative -Pdist -Dtar`.
> > > * Setup pseudo cluster with HDFS and YARN.
> > > * Run simple FsShell - mkdir/put/get/mv/rm (include EC) and check the result.
> > > * Run example mr applications and check the result - Pi & wordcount.
> > > * Check the Web UI of NameNode/DataNode/Resourcemanager/NodeManager etc.
> > >
> > > Thanks Steve for your work.
> > >
> > > Best Regards,
> > > - He Xiaoqiao
> > >
> > >> On Mon, Mar 20, 2023 at 12:04 PM Masatake Iwasaki <iwasak...@oss.nttdata.com> wrote:
> > >>
> > >> +1
> > >>
> > >> + verified the signature and checksum of the source tarball.
> > >>
> > >> + built from the source tarball on Rocky Linux 8 (x86_64) and OpenJDK 8
> > >> with native profile enabled.
> > >>   + launched pseudo distributed cluster including kms and httpfs with
> > >> Kerberos and SSL enabled.
> > >>   + created encryption zone, put and read files via httpfs.
> > >>   + ran example MR wordcount over encryption zone.
> > >>   + checked the binary of container-executor.
> > >>
> > >> + built rpm packages by Bigtop (with trivial modifications) on Rocky Linux
> > >> 8 (aarch64).
> > >>   + ran smoke-tests of hdfs, yarn and mapreduce.
> > >> + built site documentation and skimmed the contents.
> > >>   +  Javadocs are contained.
> > >>
> > >> Thanks,
> > >> Masatake Iwasaki
> > >>
> > >>> On 2023/03/16 4:47, Steve Loughran wrote:
> > >>> Apache Hadoop 3.3.5
> > >>>
> > >>> Mukund and I have put toget
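Most voters in this thread report verifying checksums. A minimal Java sketch of that step, assuming the published `.sha512` file contains the digest as one contiguous run of 128 hex digits somewhere in its text (tools differ in the surrounding layout, and some group the hex in blocks, which this parser would not handle). `ChecksumCheck` is a hypothetical helper and the file names in the usage comment are placeholders.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: check a downloaded artifact against a published SHA-512 checksum file. */
public class ChecksumCheck {

    // Assumes the digest appears as one contiguous run of 128 hex digits.
    private static final Pattern SHA512_HEX = Pattern.compile("\\b[0-9a-fA-F]{128}\\b");

    /** Streams the file through SHA-512 so large tarballs are never fully in memory. */
    public static String sha512Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Pulls the expected digest out of the checksum file's text, lower-cased. */
    public static String expectedDigest(String checksumText) {
        Matcher m = SHA512_HEX.matcher(checksumText);
        if (!m.find()) {
            throw new IllegalArgumentException("no SHA-512 digest found");
        }
        return m.group().toLowerCase(Locale.ROOT);
    }

    public static boolean verify(Path artifact, Path checksumFile)
            throws IOException, NoSuchAlgorithmException {
        String text = new String(Files.readAllBytes(checksumFile), StandardCharsets.UTF_8);
        return sha512Hex(artifact).equals(expectedDigest(text));
    }

    public static void main(String[] args) throws Exception {
        // e.g. java ChecksumCheck hadoop-3.3.5.tar.gz hadoop-3.3.5.tar.gz.sha512
        System.out.println(verify(Paths.get(args[0]), Paths.get(args[1])) ? "OK" : "MISMATCH");
    }
}
```

Signature checking against the KEYS file is a separate step and still needs GPG.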

Re: [VOTE] Release Apache Hadoop 3.3.5 (RC3)

2023-03-18 Thread Steve Loughran
Thank you for this!

Can anyone else with time do a review too? I really want to get this one
done, now that the HDFS issues are all resolved.

I do not want this release to fall by the wayside through lack of votes
alone. In fact, I would be very unhappy.



On Sat, 18 Mar 2023 at 06:47, Viraj Jasani  wrote:

> +1 (non-binding)
>
> * Signature/Checksum: ok
> * Rat check (1.8.0_341): ok
>  - mvn clean apache-rat:check
> * Built from source (1.8.0_341): ok
>  - mvn clean install  -DskipTests
> * Built tar from source (1.8.0_341): ok
>  - mvn clean package  -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true
>
> Containerized deployments:
> * Deployed and started Hdfs - NN, DN, JN with Hbase 2.5 and Zookeeper 3.7
> * Deployed and started JHS, RM, NM
> * Hbase, hdfs CRUD looks good
> * Sample RowCount MapReduce job looks good
>
> * S3A tests with scale profile looks good
>
>
> On Wed, Mar 15, 2023 at 12:48 PM Steve Loughran wrote:
>
> > Apache Hadoop 3.3.5
> >
> > Mukund and I have put together a release candidate (RC3) for Hadoop 3.3.5.
> >
> > What we would like is for anyone who can to verify the tarballs, especially
> > anyone who can try the arm64 binaries as we want to include them too.
> >
> > The RC is available at:
> > https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC3/
> >
> > The git tag is release-3.3.5-RC3, commit 706d88266ab
> >
> > The maven artifacts are staged at
> > https://repository.apache.org/content/repositories/orgapachehadoop-1369/
> >
> > You can find my public key at:
> > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> >
> > Change log
> > https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC3/CHANGELOG.md
> >
> > Release notes
> > https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC3/RELEASENOTES.md
> >
> > This is off branch-3.3 and is the first big release since 3.3.2.
> >
> > Key changes include
> >
> > * Big update of dependencies to try and keep those reports of
> >   transitive CVEs under control -both genuine and false positives.
> > * HDFS RBF enhancements
> > * Critical fix to ABFS input stream prefetching for correct reading.
> > * Vectored IO API for all FSDataInputStream implementations, with
> >   high-performance versions for file:// and s3a:// filesystems.
> >   file:// through java native io
> >   s3a:// parallel GET requests.
> > * This release includes Arm64 binaries. Please can anyone with
> >   compatible systems validate these.
> > * and compared to the previous RC, all the major changes are
> >   HDFS issues.
> >
> > Note, because the arm64 binaries are built separately on a different
> > platform and JVM, their jar files may not match those of the x86
> > release -and therefore the maven artifacts. I don't think this is
> > an issue (the ASF actually releases source tarballs, the binaries are
> > there for help only, though with the maven repo that's a bit blurred).
> >
> > The only way to be consistent would actually be to untar the x86.tar.gz,
> > overwrite its binaries with the arm stuff, retar, sign and push out
> > for the vote. Even automating that would be risky.
> >
> > Please try the release and vote. The vote will run for 5 days.
> >
> > -Steve
> >
>


Re: [VOTE] Release Apache Hadoop 3.3.5 (RC3)

2023-03-17 Thread Steve Loughran
and my vote

My vote

+1 binding

I've been using the RCs for a while as my CLI entry point, and testing them
through other builds.

For this RC:
* Local builds of cloudstore
* fs-api-shim
* spark
* built and ran my cloud integration tests, which now include large CSV
file jobs which should show the Azure prefetch bug if it still existed.

Downloaded the tar, expanded it, ran command line code with it, including
cloudstore against the stores. We need to get hadoop-azure and its
dependencies onto the path by default, to make abfs IO easier.


I have the arm binaries building, and did a checknative to make sure all
was good

stevel@0da162643f99:~/hadoop/patchprocess/hadoop-3.3.5$ bin/hadoop checknative
2023-03-17 13:00:27,107 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
2023-03-17 13:00:27,112 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2023-03-17 13:00:27,121 WARN erasurecode.ErasureCodeNative: ISA-L support is not available in your platform... using builtin-java codec where applicable
2023-03-17 13:00:27,156 INFO nativeio.NativeIO: The native code was built without PMDK support.
Native library checking:
hadoop:  true /home/stevel/hadoop/patchprocess/hadoop-3.3.5/lib/native/libhadoop.so.1.0.0
zlib:    true /lib/aarch64-linux-gnu/libz.so.1
zstd  :  true /lib/aarch64-linux-gnu/libzstd.so.1
bzip2:   true /lib/aarch64-linux-gnu/libbz2.so.1
openssl: true /lib/aarch64-linux-gnu/libcrypto.so
ISA-L:   false libhadoop was built without ISA-L support
PMDK:    false The native code was built without PMDK support.

---
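If you want to gate a build on the checknative output above (for example, failing an ARM packaging job when zstd or openssl did not load), the `Native library checking:` section is easy to parse. A sketch, assuming the `name: true|false detail` line layout seen in that output also holds for other versions; `CheckNativeParser` and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: turn the "Native library checking:" lines of `hadoop checknative` into a map. */
public class CheckNativeParser {

    // Matches lines such as "zstd  :  true /lib/..." or "ISA-L:   false libhadoop was built ...".
    // Log lines ("2023-03-17 13:00:27,107 INFO ...") do not match because of the timestamp.
    private static final Pattern LIB_LINE =
        Pattern.compile("^\\s*([A-Za-z0-9-]+)\\s*:\\s*(true|false)\\b.*$");

    public static Map<String, Boolean> parse(String output) {
        Map<String, Boolean> libs = new LinkedHashMap<>();
        for (String line : output.split("\\R")) {
            Matcher m = LIB_LINE.matcher(line);
            if (m.matches()) {
                libs.put(m.group(1), Boolean.parseBoolean(m.group(2)));
            }
        }
        return libs;
    }

    /** True if every named library was reported as loaded. */
    public static boolean allLoaded(Map<String, Boolean> libs, String... required) {
        for (String name : required) {
            if (!Boolean.TRUE.equals(libs.get(name))) {
                return false;
            }
        }
        return true;
    }
}
```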


On Wed, 15 Mar 2023 at 19:47, Steve Loughran  wrote:

>
> Apache Hadoop 3.3.5
>
> Mukund and I have put together a release candidate (RC3) for Hadoop 3.3.5.
>
> What we would like is for anyone who can to verify the tarballs, especially
> anyone who can try the arm64 binaries as we want to include them too.
>
> The RC is available at:
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC3/
>
> The git tag is release-3.3.5-RC3, commit 706d88266ab
>
> The maven artifacts are staged at
> https://repository.apache.org/content/repositories/orgapachehadoop-1369/
>
> You can find my public key at:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
> Change log
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC3/CHANGELOG.md
>
> Release notes
>
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC3/RELEASENOTES.md
>
> This is off branch-3.3 and is the first big release since 3.3.2.
>
> Key changes include
>
> * Big update of dependencies to try and keep those reports of
>   transitive CVEs under control -both genuine and false positives.
> * HDFS RBF enhancements
> * Critical fix to ABFS input stream prefetching for correct reading.
> * Vectored IO API for all FSDataInputStream implementations, with
>   high-performance versions for file:// and s3a:// filesystems.
>   file:// through java native io
>   s3a:// parallel GET requests.
> * This release includes Arm64 binaries. Please can anyone with
>   compatible systems validate these.
> * and compared to the previous RC, all the major changes are
>   HDFS issues.
>
> Note, because the arm64 binaries are built separately on a different
> platform and JVM, their jar files may not match those of the x86
> release -and therefore the maven artifacts. I don't think this is
> an issue (the ASF actually releases source tarballs, the binaries are
> there for help only, though with the maven repo that's a bit blurred).
>
> The only way to be consistent would actually be to untar the x86.tar.gz,
> overwrite its binaries with the arm stuff, retar, sign and push out
> for the vote. Even automating that would be risky.
>
> Please try the release and vote. The vote will run for 5 days.
>
> -Steve
>


[VOTE] Release Apache Hadoop 3.3.5 (RC3)

2023-03-15 Thread Steve Loughran
Apache Hadoop 3.3.5

Mukund and I have put together a release candidate (RC3) for Hadoop 3.3.5.

What we would like is for anyone who can to verify the tarballs, especially
anyone who can try the arm64 binaries as we want to include them too.

The RC is available at:
https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC3/

The git tag is release-3.3.5-RC3, commit 706d88266ab

The maven artifacts are staged at
https://repository.apache.org/content/repositories/orgapachehadoop-1369/

You can find my public key at:
https://dist.apache.org/repos/dist/release/hadoop/common/KEYS

Change log
https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC3/CHANGELOG.md

Release notes
https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC3/RELEASENOTES.md

This is off branch-3.3 and is the first big release since 3.3.2.

Key changes include

* Big update of dependencies to try and keep those reports of
  transitive CVEs under control -both genuine and false positives.
* HDFS RBF enhancements
* Critical fix to ABFS input stream prefetching for correct reading.
* Vectored IO API for all FSDataInputStream implementations, with
  high-performance versions for file:// and s3a:// filesystems.
  file:// through java native io
  s3a:// parallel GET requests.
* This release includes Arm64 binaries. Please can anyone with
  compatible systems validate these.
* and compared to the previous RC, all the major changes are
  HDFS issues.
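For s3a://, the general idea behind serving a vectored read with parallel GET requests is to sort the requested ranges and merge those that sit close together, so that one HTTP GET covers several of them and the merged requests can then be issued in parallel. A minimal sketch of only that merging step; `maxGap` and `maxMergedSize` are hypothetical parameters here (the S3A connector exposes its own settings for the equivalent thresholds, see its documentation), and this is not the connector's code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Sketch of coalescing sorted byte ranges so that one GET can serve several of them.
 * Illustrative only; not the S3A connector's implementation.
 */
public class RangeCoalescer {

    /** Byte range [offset, offset + length). Hypothetical helper type. */
    public static final class Range {
        public final long offset;
        public final long length;

        public Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        public long end() {
            return offset + length;
        }

        @Override
        public String toString() {
            return "[" + offset + ", " + end() + ")";
        }
    }

    /**
     * Sorts ranges by offset and merges a range into its predecessor when the gap
     * between them is at most maxGap bytes and the merged span stays within
     * maxMergedSize bytes. Overlapping ranges (negative gap) are simply absorbed.
     */
    public static List<Range> coalesce(List<Range> input, long maxGap, long maxMergedSize) {
        List<Range> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparingLong(r -> r.offset));
        List<Range> merged = new ArrayList<>();
        Range current = null;
        for (Range r : sorted) {
            if (current == null) {
                current = r;
                continue;
            }
            long gap = r.offset - current.end();
            long newEnd = Math.max(current.end(), r.end());
            if (gap <= maxGap && newEnd - current.offset <= maxMergedSize) {
                current = new Range(current.offset, newEnd - current.offset);
            } else {
                merged.add(current);
                current = r;
            }
        }
        if (current != null) {
            merged.add(current);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Range> wanted = new ArrayList<>();
        wanted.add(new Range(10_000, 50));
        wanted.add(new Range(0, 100));
        wanted.add(new Range(150, 100));
        // Two requests: [0, 250) covers the first two ranges, [10000, 10050) the last.
        System.out.println(coalesce(wanted, 100, 1_000_000)); // prints [[0, 250), [10000, 10050)]
    }
}
```

The trade-off is that merging reads the bytes in each gap for nothing, in exchange for fewer round trips; the size cap stops one merged request from growing unboundedly.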

Note, because the arm64 binaries are built separately on a different
platform and JVM, their jar files may not match those of the x86
release -and therefore the maven artifacts. I don't think this is
an issue (the ASF actually releases source tarballs, the binaries are
there for help only, though with the maven repo that's a bit blurred).

The only way to be consistent would actually be to untar the x86.tar.gz,
overwrite its binaries with the arm stuff, retar, sign and push out
for the vote. Even automating that would be risky.
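If you want to see how far an arm64-built JAR really differs from its x86 counterpart, comparing entries by content hash tells you more than comparing whole-file checksums, since entry timestamps alone can change the latter. A minimal sketch using only the JDK's `java.util.zip`; `JarDiff` is a hypothetical helper, and the paths in the usage comment are placeholders.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Enumeration;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Sketch: list JAR entries whose contents differ between two builds of "the same" JAR. */
public class JarDiff {

    /** Entry name -> SHA-256 (hex) of the entry's uncompressed bytes; directories skipped. */
    public static Map<String, String> entryHashes(String jarPath)
            throws IOException, NoSuchAlgorithmException {
        Map<String, String> hashes = new TreeMap<>();
        try (ZipFile zip = new ZipFile(jarPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                MessageDigest md = MessageDigest.getInstance("SHA-256");
                try (InputStream in = zip.getInputStream(entry)) {
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        md.update(buf, 0, n);
                    }
                }
                StringBuilder hex = new StringBuilder();
                for (byte b : md.digest()) {
                    hex.append(String.format("%02x", b));
                }
                hashes.put(entry.getName(), hex.toString());
            }
        }
        return hashes;
    }

    /** Names present in only one JAR, or present in both with different content. */
    public static Set<String> differingEntries(String jarA, String jarB)
            throws IOException, NoSuchAlgorithmException {
        Map<String, String> a = entryHashes(jarA);
        Map<String, String> b = entryHashes(jarB);
        Set<String> names = new TreeSet<>(a.keySet());
        names.addAll(b.keySet());
        Set<String> diffs = new TreeSet<>();
        for (String name : names) {
            if (!Objects.equals(a.get(name), b.get(name))) {
                diffs.add(name);
            }
        }
        return diffs;
    }

    public static void main(String[] args) throws Exception {
        // Usage (paths are placeholders): java JarDiff x86/some.jar arm64/some.jar
        for (String name : differingEntries(args[0], args[1])) {
            System.out.println(name);
        }
    }
}
```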

Please try the release and vote. The vote will run for 5 days.

-Steve


Re: [VOTE] Release Apache Hadoop 3.3.5 (RC2)

2023-03-07 Thread Steve Loughran
thanks.

Now looking at a critical kerby CVE
(https://github.com/apache/hadoop/pull/5458) and revisiting one for netty
from last week.

I am never a fan of last-minute jar updates, but if we don't ship with them
we will be fielding JIRAs of "update kerby/netty on 3.3.5" for the next 18
months.

On Mon, 6 Mar 2023 at 23:29, Erik Krogen  wrote:

> > OK. Could you have a go with a (locally built) patch release
>
> Just validated the same on the latest HEAD of branch-3.3.5, which includes
> the two HDFS Jiras I mentioned plus one additional one:
>
> * 143fe8095d4 (HEAD -> branch-3.3.5) 2023-03-06 HDFS-16934. TestDFSAdmin.testAllDatanodesReconfig regression (#5434) [slfan1989 <55643692+slfan1...@users.noreply.github.com>]
> * d4ea9687a8e 2023-03-03 HDFS-16923. [SBN read] getlisting RPC to observer will throw NPE if path does not exist (#5400) [ZanderXu <zande...@apache.org>]
> * 44bf8aadedf 2023-03-03 HDFS-16832. [SBN READ] Follow-on to HDFS-16732. Fix NPE when check the block location of empty directory (#5099) [zhengchenyu ]
> * 72f8c2a4888 (tag: release-3.3.5-RC2) 2023-02-25 HADOOP-18641. Cloud connector dependency and LICENSE fixup. (#5429) [Steve Loughran <ste...@cloudera.com>]
>
> On Mon, Mar 6, 2023 at 2:17 AM Steve Loughran wrote:
>
> > I looked at that test and wondered if it was just being brittle to
> > time. I'm not a fan of those -there's one in abfs which is particularly
> > bad for me- maybe we could see if the test can be cut as it is quite a
> > slow one.
> >
> > On Sat, 4 Mar 2023 at 18:28, Viraj Jasani  wrote:
> >
> > > A minor update on ITestS3AConcurrentOps#testParallelRename
> > >
> > > I was previously connected to a vpn due to which bandwidth was getting
> > > throttled earlier. Ran the test again today without vpn and had no issues
> > > (earlier only 40% of the overall putObject were able to get completed
> > > within timeout).
> > >
> > >
> > > On Sat, Mar 4, 2023 at 4:29 AM Steve Loughran wrote:
> > >
> > > > On Sat, 4 Mar 2023 at 01:47, Erik Krogen  wrote:
> > > >
> > > > > Thanks Steve. I see now that the branch cut was way back in October so I
> > > > > definitely understand your frustration here!
> > > > >
> > > > > This made me realize that HDFS-16832
> > > > > <https://issues.apache.org/jira/browse/HDFS-16832>, which resolves a very
> > > > > similar issue as the aforementioned HDFS-16923, is also missing from the
> > > > > RC. I erroneously marked it with a fix version of 3.3.5 -- it was before
> > > > > the initial 3.3.5 RC was made and I didn't notice the branch was cut. My
> > > > > apologies for that. I've pushed both HDFS-16832 and HDFS-16932 to
> > > > > branch-3.3.5, so they are ready if/when an RC3 is cut.
> > > > >
> > > >
> > > > thanks.
> > > >
> > > > >
> > > > > In the meantime, I tested for RC2 that a local cluster of NN + standby +
> > > > > observer + QJM works as expected for some basic HDFS commands.
> > > > > observer + QJM works as expected for some basic HDFS commands.
> > > > >
> > > >
> > > > OK. Could you have a go with a (locally built) patch release
> > > >
> > > > >
> > > > > On Fri, Mar 3, 2023 at 2:52 AM Steve Loughran wrote:
> > > > >
> > > > >> shipping broken hdfs isn't something we'd want to do, but if we can be
> > > > >> confident that all other issues can be addressed in RC3 then I'd be happy.
> > > > >>
> > > > >> On Fri, 3 Mar 2023 at 05:09, Ayush Saxena wrote:
> > > > >>
> > > > >> > I will highlight that I am completely fed up with doing this release and
> > > > >> >> really want to get it out the way -for which I depend on support from as
> > > > >> >> many other developers as possible.
> > > > >> >
> > > > >> >
> > > > >> > hmm, I can feel the pain. I tried to find if there is any config or any
> > > > >> > workaround which can dodge this HD

Re: [VOTE] Release Apache Hadoop 3.3.5 (RC2)

2023-03-06 Thread Steve Loughran
I looked at that test and wondered if it was just being brittle to
time. I'm not a fan of those -there's one in abfs which is particularly bad
for me- maybe we could see if the test can be cut as it is quite a slow one.

On Sat, 4 Mar 2023 at 18:28, Viraj Jasani  wrote:

> A minor update on ITestS3AConcurrentOps#testParallelRename
>
> I was previously connected to a vpn due to which bandwidth was getting
> throttled earlier. Ran the test again today without vpn and had no issues
> (earlier only 40% of the overall putObject were able to get completed
> within timeout).
>
>
> On Sat, Mar 4, 2023 at 4:29 AM Steve Loughran wrote:
>
> > On Sat, 4 Mar 2023 at 01:47, Erik Krogen  wrote:
> >
> > > Thanks Steve. I see now that the branch cut was way back in October so I
> > > definitely understand your frustration here!
> > >
> > > This made me realize that HDFS-16832
> > > <https://issues.apache.org/jira/browse/HDFS-16832>, which resolves a very
> > > similar issue as the aforementioned HDFS-16923, is also missing from the
> > > RC. I erroneously marked it with a fix version of 3.3.5 -- it was before
> > > the initial 3.3.5 RC was made and I didn't notice the branch was cut. My
> > > apologies for that. I've pushed both HDFS-16832 and HDFS-16932 to
> > > branch-3.3.5, so they are ready if/when an RC3 is cut.
> > >
> >
> > thanks.
> >
> > >
> > > In the meantime, I tested for RC2 that a local cluster of NN + standby +
> > > observer + QJM works as expected for some basic HDFS commands.
> > >
> >
> > OK. Could you have a go with a (locally built) patch release
> >
> > >
> > > On Fri, Mar 3, 2023 at 2:52 AM Steve Loughran wrote:
> > >
> > >> shipping broken hdfs isn't something we'd want to do, but if we can be
> > >> confident that all other issues can be addressed in RC3 then I'd be happy.
> > >>
> > >> On Fri, 3 Mar 2023 at 05:09, Ayush Saxena  wrote:
> > >>
> > >> > I will highlight that I am completely fed up with doing this release and
> > >> >> really want to get it out the way -for which I depend on support from as
> > >> >> many other developers as possible.
> > >> >
> > >> >
> > >> > hmm, I can feel the pain. I tried to find if there is any config or any
> > >> > workaround which can dodge this HDFS issue, but unfortunately couldn't find
> > >> > any. If someone does a getListing with needLocation and the file doesn't
> > >> > exist at Observer he is gonna get a NPE rather than a FNF, It isn't just
> > >> > the exception, AFAIK Observer reads have some logic around handling FNF
> > >> > specifically, that it attempts Active NN or something like that in such
> > >> > cases, So, that will be broken as well for this use case.
> > >> >
> > >> > Now, there is no denying the fact there is an issue on the HDFS side, and
> > >> > it has already been too much work on your side, so you can argue that it
> > >> > might not be a very frequent use case or so. It's your call.
> > >> >
> > >> > Just sharing, no intentions of saying you should do that, But as an RM
> > >> > "nobody" can force you for a new iteration of a RC, it is gonna be your
> > >> > call and discretion. As far as I know a release can not be vetoed by
> > >> > "anybody" as per the apache by laws.(
> > >> > https://www.apache.org/legal/release-policy.html#release-approval). Even
> > >> > our bylaws say that product release requires a Lazy Majority not a
> > >> > Consensus Approval.
> > >> >
> > >> > So, you have a way out. You guys are 2 already and 1 I will give you a
> > >> > pass, in case you are really in a state: ''Get me out of this mess" state,
> > >> > my basic validations on x86 & Aarch64 both are passing as of now, couldn't
> > >> > reach the end for any of the RC's
> > >> >
> > >> > -Ayush
> > >> >
> > >> > On Fri, 3 Mar 2023 at 08:41, Viraj Jasani wrote:

Re: [VOTE] Release Apache Hadoop 3.3.5 (RC2)

2023-03-04 Thread Steve Loughran
On Sat, 4 Mar 2023 at 01:47, Erik Krogen  wrote:

> Thanks Steve. I see now that the branch cut was way back in October so I
> definitely understand your frustration here!
>
> This made me realize that HDFS-16832
> <https://issues.apache.org/jira/browse/HDFS-16832>, which resolves a very
> similar issue as the aforementioned HDFS-16923, is also missing from the
> RC. I erroneously marked it with a fix version of 3.3.5 -- it was before
> the initial 3.3.5 RC was made and I didn't notice the branch was cut. My
> apologies for that. I've pushed both HDFS-16832 and HDFS-16932 to
> branch-3.3.5, so they are ready if/when an RC3 is cut.
>

thanks.

>
> In the meantime, I tested for RC2 that a local cluster of NN + standby +
> observer + QJM works as expected for some basic HDFS commands.
>

OK. Could you have a go with a (locally built) patch release

>
> On Fri, Mar 3, 2023 at 2:52 AM Steve Loughran wrote:
>
>> shipping broken hdfs isn't something we'd want to do, but if we can be
>> confident that all other issues can be addressed in RC3 then I'd be happy.
>>
>> On Fri, 3 Mar 2023 at 05:09, Ayush Saxena  wrote:
>>
>> > I will highlight that I am completely fed up with doing this  release and
>> >> really want to get it out the way -for which I depend on support from as
>> >> many other developers as possible.
>> >
>> >
>> > hmm, I can feel the pain. I tried to find if there is any config or any
>> > workaround which can dodge this HDFS issue, but unfortunately couldn't find
>> > any. If someone does a getListing with needLocation and the file doesn't
>> > exist at Observer he is gonna get a NPE rather than a FNF, It isn't just
>> > the exception, AFAIK Observer reads have some logic around handling FNF
>> > specifically, that it attempts Active NN or something like that in such
>> > cases, So, that will be broken as well for this use case.
>> >
>> > Now, there is no denying the fact there is an issue on the HDFS side, and
>> > it has already been too much work on your side, so you can argue that it
>> > might not be a very frequent use case or so. It's your call.
>> >
>> > Just sharing, no intentions of saying you should do that, But as an RM
>> > "nobody" can force you for a new iteration of a RC, it is gonna be your
>> > call and discretion. As far as I know a release can not be vetoed by
>> > "anybody" as per the apache by laws.(
>> > https://www.apache.org/legal/release-policy.html#release-approval). Even
>> > our bylaws say that product release requires a Lazy Majority not a
>> > Consensus Approval.
>> >
>> > So, you have a way out. You guys are 2 already and 1 I will give you a
>> > pass, in case you are really in a state: ''Get me out of this mess" state,
>> > my basic validations on x86 & Aarch64 both are passing as of now, couldn't
>> > reach the end for any of the RC's
>> >
>> > -Ayush
>> >
>> > On Fri, 3 Mar 2023 at 08:41, Viraj Jasani  wrote:
>> >
>> >> While this RC is not going to be final, I just wanted to share the results
>> >> of the testing I have done so far with RC1 and RC2.
>> >>
>> >> * Signature: ok
>> >> * Checksum : ok
>> >> * Rat check (1.8.0_341): ok
>> >>  - mvn clean apache-rat:check
>> >> * Built from source (1.8.0_341): ok
>> >>  - mvn clean install  -DskipTests
>> >> * Built tar from source (1.8.0_341): ok
>> >>  - mvn clean package  -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true
>> >>
>> >> * Built images using the tarball, installed and started all of Hdfs, JHS
>> >> and Yarn components
>> >> * Ran Hbase (latest 2.5) tests against Hdfs, ran RowCounter Mapreduce job
>> >> * Hdfs CRUD tests
>> >> * MapReduce wordcount job
>> >>
>> >> * Ran S3A tests with scale profile against us-west-2:
>> >> mvn clean verify -Dparallel-tests -DtestsThreadCount=8 -Dscale
>> >>
>> >> ITestS3AConcurrentOps#testParallelRename is timing out after ~960s. This is
>> >> consistently failing, looks like a recent regression.
>> >> I was also able to repro on trunk, will create Jira.
>> >>
>> >>
>> >> On Mon, Feb 27, 2023 at 9:59 AM Steve Loughran wrote:
>> >>
>> >> 

Re: [VOTE] Release Apache Hadoop 3.3.5 (RC2)

2023-03-03 Thread Steve Loughran
shipping broken hdfs isn't something we'd want to do, but if we can be
confident that all other issues can be addressed in RC3 then I'd be happy.
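To make the failure mode in Ayush's reply below concrete: if, as he recalls, the observer-read client path treats `FileNotFoundException` specially by retrying against the active NameNode, then an NPE from the observer skips that retry entirely. A minimal sketch of that control flow only; `NameNodeProxy` and `listWithFallback` are hypothetical names, not HDFS classes, and the real client logic is more involved.

```java
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of an observer-first read with fallback on FileNotFoundException. */
public class ObserverFallbackSketch {

    /** Stand-in for a NameNode listing RPC; not a real HDFS interface. */
    public interface NameNodeProxy {
        List<String> getListing(String path) throws FileNotFoundException;
    }

    /**
     * Try the observer first. If it says the path does not exist, it may just be
     * lagging behind the active NameNode, so ask the active one. Any other
     * exception type (e.g. an NPE) propagates and the fallback never runs.
     */
    public static List<String> listWithFallback(NameNodeProxy observer, NameNodeProxy active,
                                                String path) throws FileNotFoundException {
        try {
            return observer.getListing(path);
        } catch (FileNotFoundException e) {
            return active.getListing(path);
        }
    }

    public static void main(String[] args) throws Exception {
        NameNodeProxy active = path -> Arrays.asList("part-00000");
        NameNodeProxy laggingObserver = path -> {
            throw new FileNotFoundException(path);
        };
        NameNodeProxy buggyObserver = path -> {
            throw new NullPointerException("no block locations");
        };

        System.out.println(listWithFallback(laggingObserver, active, "/new/dir")); // [part-00000]
        try {
            listWithFallback(buggyObserver, active, "/new/dir");
        } catch (NullPointerException e) {
            System.out.println("NPE escaped; the active NameNode was never asked");
        }
    }
}
```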

On Fri, 3 Mar 2023 at 05:09, Ayush Saxena  wrote:

> I will highlight that I am completely fed up with doing this  release and
>> really want to get it out the way -for which I depend on support from as
>> many other developers as possible.
>
>
> hmm, I can feel the pain. I tried to find if there is any config or any
> workaround which can dodge this HDFS issue, but unfortunately couldn't find
> any. If someone does a getListing with needLocation and the file doesn't
> exist at Observer he is gonna get a NPE rather than a FNF, It isn't just
> the exception, AFAIK Observer reads have some logic around handling FNF
> specifically, that it attempts Active NN or something like that in such
> cases, So, that will be broken as well for this use case.
>
> Now, there is no denying the fact there is an issue on the HDFS side, and
> it has already been too much work on your side, so you can argue that it
> might not be a very frequent use case or so. It's your call.
>
> Just sharing, no intentions of saying you should do that, But as an RM
> "nobody" can force you for a new iteration of a RC, it is gonna be your
> call and discretion. As far as I know a release can not be vetoed by
> "anybody" as per the apache by laws.(
> https://www.apache.org/legal/release-policy.html#release-approval). Even
> our bylaws say that product release requires a Lazy Majority not a
> Consensus Approval.
>
> So, you have a way out. You guys are 2 already and 1 I will give you a
> pass, in case you are really in a state: ''Get me out of this mess" state,
> my basic validations on x86 & Aarch64 both are passing as of now, couldn't
> reach the end for any of the RC's
>
> -Ayush
>
> On Fri, 3 Mar 2023 at 08:41, Viraj Jasani  wrote:
>
>> While this RC is not going to be final, I just wanted to share the results
>> of the testing I have done so far with RC1 and RC2.
>>
>> * Signature: ok
>> * Checksum : ok
>> * Rat check (1.8.0_341): ok
>>  - mvn clean apache-rat:check
>> * Built from source (1.8.0_341): ok
>>  - mvn clean install  -DskipTests
>> * Built tar from source (1.8.0_341): ok
>>  - mvn clean package  -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true
>>
>> * Built images using the tarball, installed and started all of HDFS, JHS
>> and YARN components
>> * Ran HBase (latest 2.5) tests against HDFS, ran RowCounter MapReduce job
>> * HDFS CRUD tests
>> * MapReduce wordcount job
>>
>> * Ran S3A tests with scale profile against us-west-2:
>> mvn clean verify -Dparallel-tests -DtestsThreadCount=8 -Dscale
>>
>> ITestS3AConcurrentOps#testParallelRename is timing out after ~960s. This is
>> consistently failing and looks like a recent regression.
>> I was also able to repro on trunk, will create a JIRA.
>>
>


Re: [VOTE] Release Apache Hadoop 3.3.5 (RC2)

2023-03-02 Thread Steve Loughran
Well, let's see what others say.

We don't want to ship anything with a serious regression in HDFS.

I will highlight that I am completely fed up with doing this release and
really want to get it out of the way, for which I depend on support from as
many other developers as possible.

Erik, right now the way you can help is by testing all the rest of the
release, knowing that this issue exists, and seeing if you can identify
anything else. That way this fix will be the sole blocker and we can get
through the next RC with nothing else surfacing.

I had noticed that the arm64 release somehow left out the native binaries
and was going to investigate that, but didn't consider it a blocker. I was
just going to cut that artefact and, post Darcy, create a new arm64 release
using all the jars of the x86 build but replacing the x86 native libs with
the arm versions. This helps ensure that the JAR files in the wild all
match, which strikes me as a good thing.
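A minimal sketch of that repack step, using only Python's standard `tarfile` module. It is hypothetical and is not the Hadoop release tooling. It assumes both tarballs keep their native libraries under a `lib/native/` directory. It keeps every other entry, including all the JARs, from the x86 tarball, and it leaves signing and checksum generation out.

```python
import tarfile

# Assumption: native libraries live under <root>/lib/native/ in both tarballs.
NATIVE_DIR = "/lib/native/"


def is_native(name: str) -> bool:
    """True if a tar member path lies inside a lib/native/ directory."""
    return NATIVE_DIR in "/" + name


def repack_with_arm_natives(x86_tar: str, arm_tar: str, out_tar: str) -> list:
    """Write out_tar: all non-native x86 entries plus the arm64 native entries.

    Returns the names of the native entries copied from the arm64 tarball.
    """
    replaced = []
    with tarfile.open(x86_tar, "r:gz") as x86, \
            tarfile.open(arm_tar, "r:gz") as arm, \
            tarfile.open(out_tar, "w:gz") as out:
        # Keep everything from the x86 build (JARs, scripts, docs) except natives.
        for member in x86.getmembers():
            if not is_native(member.name):
                data = x86.extractfile(member) if member.isfile() else None
                out.addfile(member, data)
        # Take only the native libraries from the arm64 build.
        for member in arm.getmembers():
            if is_native(member.name):
                data = arm.extractfile(member) if member.isfile() else None
                out.addfile(member, data)
                replaced.append(member.name)
    return replaced
```

Because the JARs are copied across unchanged, checksums of the shared entries should match the x86 tarball. Comparing them is a cheap way to confirm that the repack did what was intended.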

Can I also encourage people in the HDFS team to put their hands up and
volunteer to lead the next release, with a goal of getting something
out later this year?



On Thu, 2 Mar 2023 at 00:27, Erik Krogen  wrote:

> Hi folks, apologies for being late to the release conversation, but I think
> we need to get HDFS-16923 <https://issues.apache.org/jira/browse/HDFS-16923>
> into 3.3.5. HDFS-16732 <https://issues.apache.org/jira/browse/HDFS-16732>,
> which also went into 3.3.5, introduced an issue whereby Observer NameNodes
> throw an NPE on any getListing call on a directory that doesn't exist. It
> will make the Observer NN pretty much unusable in 3.3.5. Zander put up a
> patch for this and it has been merged to trunk/branch-3.3 as of a few
> minutes ago. I'd like to see about merging it to branch-3.3.5 as well.
>
> Thanks for the consideration and sorry for not bringing this up in RC1 or
> earlier.
>
>


[VOTE] Release Apache Hadoop 3.3.5 (RC2)

2023-02-27 Thread Steve Loughran
Mukund and I have put together a release candidate (RC2) for Hadoop 3.3.5.

We need anyone who is able to verify the source and binary artifacts,
including the JARs staged on maven, the site documentation and the arm64
tar file.
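For anyone new to release verification, the checksum half of the job can be scripted. The sketch below streams an artifact through SHA-512 and compares the result with the artifact's `.sha512` file. It assumes that file uses either the BSD-style `SHA512 (name) = <hex>` layout or the GNU-style `<hex>  name` layout, so check the actual file in the RC directory first. Signature checking is a separate step: import the KEYS file with `gpg --import`, then run `gpg --verify` on the `.asc` file.

```python
import hashlib
import re


def sha512_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-512 and return the lowercase hex digest."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_digest(checksum_text: str) -> str:
    """Extract the hex digest from the contents of a .sha512 file.

    Assumed layouts: BSD-style 'SHA512 (name) = <hex>' or GNU-style '<hex>  name'.
    """
    text = checksum_text.strip()
    m = re.search(r"=\s*([0-9a-fA-F]{128})\b", text)  # BSD-style
    if m is None:
        m = re.match(r"([0-9a-fA-F]{128})\b", text)  # GNU-style
    if m is None:
        raise ValueError("unrecognised .sha512 format")
    return m.group(1).lower()


def verify(artifact: str, checksum_file: str) -> bool:
    """True if the artifact's SHA-512 matches the digest in checksum_file."""
    with open(checksum_file, encoding="utf-8") as f:
        return sha512_of(artifact) == expected_digest(f.read())
```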

The RC is available at:
https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC2/

The git tag is release-3.3.5-RC2, commit 72f8c2a4888

The maven artifacts are staged at
https://repository.apache.org/content/repositories/orgapachehadoop-1369/

You can find my public key at:
https://dist.apache.org/repos/dist/release/hadoop/common/KEYS

Change log
https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC2/CHANGELOG.md

Release notes
https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC2/RELEASENOTES.md

This is off branch-3.3 and is the first big release since 3.3.2.

As for what changed since the RC1 attempt last week:


   1. Version fixup in JIRA (credit due to Takanobu Asanuma there)
   2. HADOOP-18470. Remove HDFS RBF text in the 3.3.5 index.md file
   3. Revert "HADOOP-18590. Publish SBOM artifacts (#5281)" (creating build
   issues in maven 3.9.0)
   4. HADOOP-18641. Cloud connector dependency and LICENSE fixup. (#5429)


Note: because the arm64 binaries are built separately on a different
platform and JVM, their jar files may not match those of the x86
release, and therefore the maven artifacts. I don't think this is
an issue (the ASF actually releases source tarballs; the binaries are
there for help only, though with the maven repo that's a bit blurred).

The only way to be consistent would be to untar the x86.tar.gz,
overwrite its binaries with the arm ones, retar, sign and push that out
for the vote. Even automating that would be risky.

Please try the release and vote. The vote will run for 5 days.

Steve and Mukund


Re: [VOTE] Release Apache Hadoop 3.3.5

2023-02-24 Thread Steve Loughran
We need this PR in too: https://github.com/apache/hadoop/pull/5429

   1. cuts back on some transitive dependencies from hadoop-aliyun
   2. fixes LICENSE-bin to be correct

#2 is the blocker... and it looks like 3.2.x will need a fixup too, as well
as the later releases: hadoop binaries have shipped without that file being
up to date, but at least all the transitive dependencies are correctly
licensed. And I think we need to change the PR template to mention
transitive updates in the license section too.

If this goes in, I will do the rebuild on Monday, UK time.


Re: [VOTE] Release Apache Hadoop 3.3.5

2023-02-23 Thread Steve Loughran
And I've just hit HADOOP-18641: the cyclonedx maven plugin breaks on recent
maven releases (3.9.0).

That was on a new local build with maven updated via homebrew (which I
needed for Spark), so that's a code change too. The issue doesn't surface on
our release dockers, but it will hit other people, especially over time.
Going to revert HADOOP-18590. Publish SBOM artifacts (#5281).





Re: [VOTE] Release Apache Hadoop 3.3.5

2023-02-23 Thread Steve Loughran
OK, let me cancel, update those JIRAs and kick off again. That will save
anyone else from having to do their homework.

On Thu, 23 Feb 2023 at 08:56, Takanobu Asanuma  wrote:

> I'm now -1 as I found the wrong information on the top page (index.md).
>
> > 1. HDFS-13522, HDFS-16767 & Related Jiras: Allow Observer Reads in HDFS
> Router Based Federation.
>
> The fix versions of HDFS-13522 and HDFS-16767 previously included 3.3.5,
> though the changes are actually not in branch-3.3. I corrected the fix
> versions and created HDFS-16889 to backport them to branch-3.3 about a month
> ago. Unfortunately, it won't be fixed soon. I should have let you know at
> that time, sorry. Supporting Observer NameNode in RBF is a prominent
> feature, so I think we have to delete the description from the top page so
> as not to confuse Hadoop users.
>
> - Takanobu
>
> On Thu, 23 Feb 2023 at 17:17, Takanobu Asanuma wrote:
>
> > Thanks for driving the release, Steve and Mukund.
> >
> > I found some JIRAs with wrong fix versions.
> >
> > These fix versions included 3.3.5, but the changes aren't actually in 3.3.5-RC1:
> > - HDFS-16845
> > - HADOOP-18345
> >
> > These fix versions didn't include 3.3.5, but the changes are actually in
> > 3.3.5-RC1 (and not in release-3.3.4):
> > - HADOOP-17276
> > - HDFS-13293
> > - HDFS-15630
> > - HDFS-16266
> > - HADOOP-18003
> > - HDFS-16310
> > - HADOOP-18014
> >
> > I corrected all the wrong fix versions just now. I'm not sure we need to
> > re-vote, since it only affects the changelog.
> >
> > - Takanobu
>


Re: yetus reporting javadoc errors on @InterfaceAudience attributes

2023-02-22 Thread Steve Loughran
in hadoop, the compile support ain't there, it is being
>> tracked here:
>> https://issues.apache.org/jira/browse/HADOOP-16795
>>
>> There are some issues: one with Jersey that I know of, and maybe a couple more.
>>
>> -Ayush
>>
>> On Fri, 16 Dec 2022 at 20:07, Steve Loughran 
>> wrote:
>>
>>> OK, it's a JDK bug.
>>>
>>> Both the java8 and java11 javadoc runs are now using java11. Have we
>>> stopped the java8 builds?
>>>
>>> If so, I am happy with that; we just need to make an explicit declaration
>>> and wrap up anything outstanding.
>>>
>>>
>>>
>>> On Thu, 15 Dec 2022 at 22:30, Ayush Saxena  wrote:
>>>
>>> > Thanks Ashutosh, let me know if you need any help there.
>>> >
>>> > Got some time to recheck the Javadoc stuff, it seems like a JDK bug
>>> > https://bugs.openjdk.org/browse/JDK-8295850
>>> >
>>> > more details over here:
>>> >
>>> https://github.com/apache/hadoop/pull/5226#pullrequestreview-1220041496
>>> >
>>> > -Ayush
>>> >
>>> > On Mon, 12 Dec 2022 at 19:46, Ashutosh Gupta <ashutoshgupta...@gmail.com>
>>> > wrote:
>>> >
>>> >> Thanks Ayush for pointing out the failures related to the JUnit 5
>>> >> upgrade. As I have worked closely on upgrading JUnit 4 to JUnit 5
>>> >> throughout the hadoop project, I will create a JIRA for these failures
>>> >> and fix them as a priority.
>>> >>
>>> >> -Ashutosh
>>> >>
>>> >> On Mon, Dec 12, 2022 at 1:59 PM Ayush Saxena
>>> >> wrote:
>>> >>
>>> >>> Try to fix it the same way it was done here and in a couple of similar
>>> >>> PRs:
>>> >>> https://github.com/apache/hadoop/pull/5179
>>> >>>
>>> >>> There are a bunch of PRs in YARN fixing the same error module by
>>> >>> module; the problem will be there in many other modules as well...
>>> >>>
>>> >>> The daily JDK-11 build also shows that failure here:
>>> >>>
>>> >>>
>>> https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java11-linux-x86_64/410/artifact/out/patch-javadoc-root.txt
>>> >>>
>>> >>> BTW, the daily build is also broken, with a whopping 150+ failures:
>>> >>>
>>> >>>
>>> https://ci-hadoop.apache.org/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86_64/1071/testReport/
>>> >>>
>>> >>> A JUnit upgrade patch seems to be the main reason.
>>> >>>
>>> >>> -Ayush
>>> >>>
>>> >>> On Mon, 12 Dec 2022 at 18:46, Steve Loughran
>>> >>> wrote:
>>> >>>
>>> >>> > yetus is now reporting errors on our @InterfaceAudience tags in java8
>>> >>> > and java11 javadoc generation
>>> >>> > https://github.com/apache/hadoop/pull/5205#issuecomment-1344664692
>>> >>> >
>>> >>> >
>>> >>>
>>> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5205/2/artifact/out/branch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt
>>> >>> >
>>> >>> > it looks a bit like both sets of javadocs are being generated with the
>>> >>> > java11 version, and it is unhappy.
>>> >>> >
>>> >>> > any suggestions as to a fix?
>>> >>> >
>>> >>>
>>> >>
>>>
>>


[VOTE] Release Apache Hadoop 3.3.5

2023-02-21 Thread Steve Loughran
Apache Hadoop 3.3.5

Mukund and I have put together a release candidate (RC1) for Hadoop 3.3.5.

What we would like is for anyone who can to verify the tarballs, especially
anyone who can try the arm64 binaries as we want to include them too.

The RC is available at:
https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC1/

The git tag is release-3.3.5-RC1, commit 274f91a3259

The maven artifacts are staged at
https://repository.apache.org/content/repositories/orgapachehadoop-1368/

You can find my public key at:
https://dist.apache.org/repos/dist/release/hadoop/common/KEYS

Change log
https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC1/CHANGELOG.md

Release notes
https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC1/RELEASENOTES.md

This is off branch-3.3 and is the first big release since 3.3.2.

Key changes include

* A big update of dependencies to try to keep reports of
  transitive CVEs under control, both genuine and false positives.
* HDFS RBF enhancements
* Critical fix to ABFS input stream prefetching for correct reading.
* Vectored IO API for all FSDataInputStream implementations, with
  high-performance versions for the file:// and s3a:// filesystems:
  - file:// through Java native IO
  - s3a:// through parallel GET requests.
* This release includes Arm64 binaries. Please can anyone with
  compatible systems validate these.
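The idea behind the vectored IO API is to hand the stream several byte ranges at once and let it fetch them concurrently. A small Python analogy illustrates this. It is not Hadoop's API, which is Java (`readVectored()` on `FSDataInputStream`, taking a list of `FileRange` objects). In this sketch, a thread pool with one file handle per range stands in for the parallel GET requests that s3a issues. No range merging is attempted.

```python
from concurrent.futures import ThreadPoolExecutor


def read_range(path: str, offset: int, length: int) -> bytes:
    """Read one byte range using its own file handle (one 'request' per range)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def read_vectored(path: str, ranges, max_workers: int = 4) -> list:
    """Fetch every (offset, length) range concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(read_range, path, off, length) for off, length in ranges]
        return [fut.result() for fut in futures]
```

The caller issues all of its reads in one call rather than seeking back and forth. That is where the gain comes from on a high-latency store like S3.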

Note: because the arm64 binaries are built separately on a different
platform and JVM, their jar files may not match those of the x86
release, and therefore the maven artifacts. I don't think this is
an issue (the ASF actually releases source tarballs; the binaries are
there for help only, though with the maven repo that's a bit blurred).

The only way to be consistent would be to untar the x86.tar.gz,
overwrite its binaries with the arm ones, retar, sign and push that out
for the vote. Even automating that would be risky.

Please try the release and vote. The vote will run for 5 days.

Steve and Mukund


[jira] [Created] (MAPREDUCE-7432) Make Manifest Committer the default for abfs and gcs

2023-02-09 Thread Steve Loughran (Jira)
Steve Loughran created MAPREDUCE-7432:
-

 Summary: Make Manifest Committer the default for abfs and gcs
 Key: MAPREDUCE-7432
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7432
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: client
Affects Versions: 3.3.5
Reporter: Steve Loughran


Switch to the manifest committer as default for abfs and gcs

* abfs: needed for performance, scale and resilience under some failure modes
* gcs: provides correctness through atomic task commit and better job commit 
performance
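Until the default changes, the committer can be selected per filesystem scheme through configuration. The sketch below generates the relevant `<property>` entries. The property names and factory classes are believed to match the manifest committer documentation, but verify them against the docs for the release being deployed. Note that the Google Cloud Storage connector's URL scheme is `gs`, not `gcs`.

```python
import xml.etree.ElementTree as ET

# Scheme -> committer factory. Class names assumed from the manifest committer
# docs; "gs" is the URL scheme used by the Google Cloud Storage connector.
COMMITTER_FACTORIES = {
    "abfs": "org.apache.hadoop.fs.azurebfs.commit.AzureManifestCommitterFactory",
    "gs": "org.apache.hadoop.mapreduce.lib.output.committer.manifest.ManifestCommitterFactory",
}


def committer_properties(factories: dict) -> str:
    """Render a <configuration> element binding each scheme to its committer factory."""
    root = ET.Element("configuration")
    for scheme, factory in factories.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = f"mapreduce.outputcommitter.factory.scheme.{scheme}"
        ET.SubElement(prop, "value").text = factory
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(committer_properties(COMMITTER_FACTORIES))
```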






reviewers sought for MAPREDUCE-7430 FileSystemCount enumeration changes will cause mapreduce application failure during upgrade

2023-01-16 Thread Steve Loughran
Before I commit this PR, does anyone else want to look at it?
https://github.com/apache/hadoop/pull/5255

I don't think it's critical for the 3.3.5 release, as the issue has clearly
been around for a while.


Re: [VOTE] Release Apache Hadoop 3.3.5

2023-01-16 Thread Steve Loughran
Thanks.

I'm pulling in a few of the recent changes which seem needed/important, and
am now wondering about the javadocs.

I will add a new probe for this in our automated release ant build so we
can't cut a release without them:
https://github.com/steveloughran/validate-hadoop-client-artifacts

On Mon, 2 Jan 2023 at 15:47, Masatake Iwasaki 
wrote:

> >- building HBase 2.4.13 and Hive 3.1.3 against 3.3.5 failed due to
> dependency change.
>
> For HBase, classes under com/sun/jersey/json/* and com/sun/xml/* are not
> expected in hbase-shaded-with-hadoop-check-invariants.
> Updating hbase-shaded/pom.xml is expected to be the fix as done in
> HBASE-27292.
>
> https://github.com/apache/hbase/commit/00612106b5fa78a0dd198cbcaab610bd8b1be277
>
>
Are we adding some new dependencies from somewhere, then? I never even knew
there was a com.sun.json module.

Hey, imagine if there was a single, standard JSON library with a minimal
object/JSON mapping (strings, numbers, arrays and maps): we'd be able to cut
out all of jackson, gson and jettison, and maybe even avoid the eternal
jackson-databind CVE homework.


>[INFO] --- exec-maven-plugin:1.6.0:exec
> (check-jar-contents-for-stuff-with-hadoop) @
> hbase-shaded-with-hadoop-check-invariants ---
>[ERROR] Found artifact with unexpected contents:
> '/home/rocky/srcs/bigtop/build/hbase/rpm/BUILD/hbase-2.4.13/hbase-shaded/hbase-shaded-client/target/hbase-shaded-client-2.4.13.jar'
>Please check the following and either correct the build or update
>the allowed list with reasoning.
>
>com/
>com/sun/
>com/sun/jersey/
>com/sun/jersey/json/
>...
>
>
> For Hive, classes belonging to org.bouncycastle:bcprov-jdk15on:1.68 seem
> to be problematic.
> Excluding them from hive-jdbc might be the fix.
>
>[ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-shade-plugin:3.2.1:shade (default) on
> project hive-jdbc: Error creating shaded jar: Problem shading JAR
> /home/rocky/.m2/repository/org/bouncycastle/bcprov-jdk15on/1.68/bcprov-jdk15on-1.68.jar
> entry
> META-INF/versions/15/org/bouncycastle/jcajce/provider/asymmetric/edec/SignatureSpi$EdDSA.class:
> java.lang.IllegalArgumentException: Unsupported class file major version 59
> -> [Help 1]
>...
>
>
Ahh, that's covered in https://issues.apache.org/jira/browse/HADOOP-17563 ... the
maven shade plugin needs to be updated to handle multi-release JARs.
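The shade error quoted above can be decoded from the class file itself. A class file's header carries its major version, and 59 corresponds to Java 15. That matches the `META-INF/versions/15/` entry of the multi-release JAR. The helper below is a hypothetical diagnostic, not part of any Hadoop or Maven tooling, for spotting such entries before a shading step trips over them. It relies only on the class-file layout: the magic number `0xCAFEBABE`, then the minor version, then the major version, all big-endian.

```python
import struct
import zipfile


def class_file_version(header: bytes) -> tuple:
    """Return (major, minor) from the first 8 bytes of a .class file."""
    magic, minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor


def java_release(major: int) -> int:
    """Map a class-file major version to its Java release (valid for major >= 46)."""
    return major - 44


def versioned_entries(jar_path: str) -> dict:
    """Group META-INF/versions/<N>/ class entries of a multi-release JAR by N."""
    groups = {}
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            parts = name.split("/")
            if len(parts) > 3 and parts[:2] == ["META-INF", "versions"] and name.endswith(".class"):
                groups.setdefault(int(parts[2]), []).append(name)
    return groups
```

For the Hive failure, `java_release(59)` gives 15. This suggests that the shade plugin version in use simply predates Java 15 class files, which fits the plugin-upgrade fix.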

>
> On 2023/01/02 22:02, Masatake Iwasaki wrote:
> > Thanks for your great effort on the new release, Steve and Mukund.
> >
> > +1, though it would be nice if we could address the missing Javadocs.
> >
> > + verified the signature and checksum.
> > + built from the source tarball on Rocky Linux 8 and OpenJDK 8 with the
> >   native profile enabled.
> >   + launched a pseudo-distributed cluster including kms and httpfs with
> >     Kerberos and SSL enabled.
> >   + created an encryption zone, put and read files via httpfs.
> >   + ran the example MR wordcount over the encryption zone.
> > + built rpm packages with Bigtop and ran smoke tests on Rocky Linux 8
> >   (both x86_64 and aarch64).
> >   - building HBase 2.4.13 and Hive 3.1.3 against 3.3.5 failed due to a
> >     dependency change.
> >     # whereas building HBase 2.4.13 and Hive 3.1.3 against Hadoop 3.3.4
> >       worked.
> > + skimmed the site contents.
> >   - Javadocs are not included (under r3.3.5/api).
> >     # The issue can be reproduced even when I build the site docs from
> >       the source.
> >
> > Masatake Iwasaki
> >
>


[VOTE] Release Apache Hadoop 3.3.5

2022-12-21 Thread Steve Loughran
Mukund and I have put together a release candidate (RC0) for Hadoop 3.3.5.

Given the time of year, it's a bit unrealistic to run a 5-day vote and
expect people to be able to test it thoroughly enough to make this the one
we can ship.

What we would like is for anyone who can to verify the tarballs and test
the binaries, especially anyone who can try the arm64 binaries. We've got
the building of those done, and now the build file will incorporate them
into the release, but neither of us has actually tested them yet. Maybe I
should try it on my Pi 400 over Xmas.

The maven artifacts are up on the Apache staging repo; they are the ones
from the x86 build. Building and testing downstream apps will be incredibly
helpful.

The RC is available at:
https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC0/

The git tag is release-3.3.5-RC0, commit 3262495904d

The maven artifacts are staged at
https://repository.apache.org/content/repositories/orgapachehadoop-1365/

You can find my public key at:
https://dist.apache.org/repos/dist/release/hadoop/common/KEYS

Change log
https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC0/CHANGELOG.md

Release notes
https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC0/RELEASENOTES.md

This is off branch-3.3 and is the first big release since 3.3.2.

Key changes include

* A big update of dependencies to try to keep reports of
  transitive CVEs under control, both genuine and false positives.
* HDFS RBF enhancements
* Critical fix to ABFS input stream prefetching for correct reading.
* Vectored IO API for all FSDataInputStream implementations, with
  high-performance versions for the file:// and s3a:// filesystems:
  - file:// through Java native IO
  - s3a:// through parallel GET requests.
* This release includes Arm64 binaries. Please can anyone with
  compatible systems validate these.
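The idea behind a vectored read is that the caller hands over a whole list of (offset, length) ranges at once and gets back one future per range, leaving the filesystem free to fetch them in parallel (parallel GETs on s3a, for example). The sketch below illustrates that shape using only the JDK: positional reads on a local `FileChannel`, submitted to a thread pool. It does not call Hadoop's own vectored IO API, and the `VectoredReadSketch` and `Range` names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

public final class VectoredReadSketch {

    private VectoredReadSketch() {
    }

    /** Hypothetical (offset, length) pair describing one range to read. */
    public static final class Range {
        final long offset;
        final int length;

        public Range(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Submit one positional read per range; each future completes independently. */
    public static List<CompletableFuture<ByteBuffer>> readVectored(
            FileChannel channel, List<Range> ranges, ExecutorService pool) {
        List<CompletableFuture<ByteBuffer>> results = new ArrayList<>();
        for (Range r : ranges) {
            results.add(CompletableFuture.supplyAsync(() -> readFully(channel, r), pool));
        }
        return results;
    }

    /** Read exactly r.length bytes starting at r.offset, failing on premature EOF. */
    private static ByteBuffer readFully(FileChannel channel, Range r) {
        ByteBuffer buf = ByteBuffer.allocate(r.length);
        long pos = r.offset;
        try {
            while (buf.hasRemaining()) {
                // positional read: does not move the channel's shared position,
                // so concurrent reads on the same channel do not interfere
                int n = channel.read(buf, pos);
                if (n < 0) {
                    throw new IOException("EOF before end of range at offset " + pos);
                }
                pos += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        buf.flip(); // ready for the caller to consume
        return buf;
    }
}
```

Using one future per range, rather than a single blocking call, is what lets a caller start processing whichever range arrives first.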


Please try the release and vote on it, even though I don't know what a
good timeline is here... I'm actually going on holiday in early January.
Mukund is around and so can drive the process while I'm offline.

If we do need another iteration, RC1 will not be before mid-January
for that reason.

Steve (and Mukund)


[jira] [Resolved] (MAPREDUCE-7428) Fix failures related to Junit 4 to Junit 5 upgrade in org.apache.hadoop.mapreduce.v2.app.webapp

2022-12-14 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7428.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

> Fix failures related to Junit 4 to Junit 5 upgrade in 
> org.apache.hadoop.mapreduce.v2.app.webapp
> ---
>
> Key: MAPREDUCE-7428
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7428
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> A few tests are failing due to the JUnit 4 to JUnit 5 upgrade in
> org.apache.hadoop.mapreduce.v2.app.webapp:
> [https://ci-hadoop.apache.org/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86_64/1071/testReport/]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



exciting new content needed for the 3.3.5 index.md file

2022-12-01 Thread Steve Loughran
The first "smoke test" RC is about to go up for people to play with; we are
just testing things here and getting that Arm build done.

Can I have some content for the index.html page describing what has changed?
hadoop-project/src/site/markdown/index.md.vm

I can (and will) speak highly of stuff I've been involved in, but need
contributions from others for what is new in this release in HDFS, YARN,
and MR (other than the manifest committer).

It'd be good to have a list of CVEs fixed by upgrading JARs. Maybe we
should have a transitive-CVE tag for all JIRAs which update a dependency
for this reason, so that the release notes could list these explicitly
in their own section.

Please submit changes to branch-3.3.5; use HADOOP-18470 as the JIRA for
all the release notes.

Thanks.


[jira] [Resolved] (MAPREDUCE-7401) Optimize liststatus for better performance by using recursive listing

2022-11-29 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7401.
---
Resolution: Won't Fix

> Optimize liststatus for better performance by using recursive listing
> -
>
> Key: MAPREDUCE-7401
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7401
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 3.3.3
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This change adds recursive listing APIs to FileSystem. The purpose is to
> enable different FileSystem implementations to optimize their listStatus
> calls where they can. A default implementation is provided for normal
> FileSystem implementations, which does level-by-level listing of each directory.
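For reference, the "level by level" default described above amounts to a breadth-first walk that issues one listing call per directory. A minimal sketch against the local filesystem with `java.nio.file` shows where those per-directory round trips come from, which is what a store with a native recursive listing could collapse into fewer calls. This is not Hadoop's `FileSystem` API, and the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public final class LevelByLevelListing {

    private LevelByLevelListing() {
    }

    /** Return every non-directory entry under root, listing one directory at a time (breadth-first). */
    public static List<Path> listFilesRecursively(Path root) throws IOException {
        List<Path> files = new ArrayList<>();
        Deque<Path> pending = new ArrayDeque<>();
        pending.add(root);
        while (!pending.isEmpty()) {
            Path dir = pending.poll();
            // one listing call per directory: on an object store each of these
            // is a separate round trip, hence the interest in a recursive API
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                for (Path entry : entries) {
                    // NOFOLLOW_LINKS avoids looping forever on symlink cycles
                    if (Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS)) {
                        pending.add(entry);
                    } else {
                        files.add(entry);
                    }
                }
            }
        }
        return files;
    }
}
```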






[jira] [Resolved] (MAPREDUCE-7386) Maven parallel builds (skipping tests) fail

2022-11-04 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7386.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

In trunk; backport once we are happy that it is stable.

> Maven parallel builds (skipping tests) fail
> ---
>
> Key: MAPREDUCE-7386
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7386
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.4.0, 3.3.5
> Environment: The problem occurred while using the Hadoop development 
> environment (Ubuntu)
>Reporter: Steve Vaughan
>Assignee: Steve Vaughan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> A parallel build fails during assembly with the following error when
> running either package or install:
> {code:java}
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
> goal org.apache.maven.plugins:maven-assembly-plugin:2.4:single 
> (package-mapreduce) on project hadoop-mapreduce: Failed to create assembly: 
> Artifact: org.apache.hadoop:hadoop-mapreduce-client-core:jar:3.4.0-SNAPSHOT 
> (included by module) does not have an artifact with a file. Please ensure the 
> package phase is run before the assembly is generated. {code}
> {code:java}
> Caused by: org.apache.maven.plugin.MojoExecutionException: Failed to create 
> assembly: Artifact: 
> org.apache.hadoop:hadoop-mapreduce-client-core:jar:3.4.0-SNAPSHOT (included 
> by module) does not have an artifact with a file. Please ensure the package 
> phase is run before the assembly is generated.  {code}
> The command executed was:
> {code:java}
> $ mvn -nsu clean install -Pdist,native -DskipTests -Dtar 
> -Dmaven.javadoc.skip=true -T 2C {code}
> Adding dependencies to the assembly plugin configuration addresses the issue.






[jira] [Resolved] (MAPREDUCE-7411) Use secure XML parser utils in MapReduce

2022-10-26 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7411.
---
Fix Version/s: 3.4.0
   3.3.5
   Resolution: Fixed

merged back to branch-3.3.5

> Use secure XML parser utils in MapReduce
> 
>
> Key: MAPREDUCE-7411
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7411
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
>
> Uptake of HADOOP-18469






Re: [DISCUSS] Hadoop 3.3.5 release planning

2022-10-11 Thread Steve Loughran
On Fri, 7 Oct 2022 at 22:36, Wei-Chiu Chuang  wrote:

> Bumping this up. Adding the [DISCUSS] text to make this message stand out
> of your inbox.
>
> I certainly missed this message and only realized 3.3.5 has more than just
> security updates.
>
> What was the issue with the ARM64 build? I was able to publish ARM64 build
> for 3.3.1 release without problems.
>

Changes in the underlying dependencies, Python libraries in particular;
things wouldn't build on branch-3.3.

Also, building on a MacBook M1 wasn't a problem then. It is now, and some
of the bash tests needed tuning.

Fixed now: HADOOP-18401



>
>
> On Tue, Sep 27, 2022 at 9:35 AM Steve Loughran
> wrote:
>
> > Mukund has just created the new Hadoop release JIRA,
> > https://issues.apache.org/jira/browse/HADOOP-18470, and is doing the first
> > build/test before pushing it up. This is off branch-3.3, so it will have
> > more significant changes than the 3.3.3 and 3.3.4 releases, which were just
> > CVE/integration fixes.
> >
> > The new branch, branch-3.3.5, has its maven/hadoop.version set to
> > 3.3.5-SNAPSHOT.
> >
> > All JIRA issues fixed/blocked for 3.3.9 now reference 3.3.5. The next step
> > of the release is to actually move those wontfix issues back to being
> > against 3.3.9.
> >
> > There is still a 3.3.9 version; branch-3.3's maven build still refers to
> > it. Issues found/fixed in branch-3.3 *but not the new branch-3.3.5 branch*
> > should still refer to this version. Anything targeting the 3.3.5 release
> > must be committed to the new branch, and have its JIRA version tagged
> > appropriately.
> >
> > All changes to be cherry-picked into 3.3.5, except for those related to
> > the release process itself, MUST be in branch-3.3 first, and SHOULD be in
> > trunk unless there is some fundamental reason they can't apply there
> > (reload4j etc).
> >
> > Let's try and stabilise this release, especially bringing up to date all
> > the JAR dependencies which we can safely update.
> >
> > Anyone planning to give talks at ApacheCon about forthcoming features
> > already in 3.3 SHOULD
> >
> >1. reference Hadoop 3.3.5 as the version
> >2. make sure their stuff works.
> >
> > Mukund will be at the conference; find him and offer any help you can in
> > getting this release out.
> >
> > I'd like to get that Arm64 build working. Does anyone else want to get
> > involved?
> >
> > -steve
> >
>


HADOOP-18470 Release hadoop 3.3.5

2022-09-27 Thread Steve Loughran
Mukund has just created the new Hadoop release JIRA,
https://issues.apache.org/jira/browse/HADOOP-18470, and is doing the first
build/test before pushing it up. This is off branch-3.3, so it will have
more significant changes than the 3.3.3 and 3.3.4 releases, which were just
CVE/integration fixes.

The new branch, branch-3.3.5, has its maven/hadoop.version set to
3.3.5-SNAPSHOT.

All JIRA issues fixed/blocked for 3.3.9 now reference 3.3.5. The next step
of the release is to actually move those wontfix issues back to being
against 3.3.9.

There is still a 3.3.9 version; branch-3.3's maven build still refers to
it. Issues found/fixed in branch-3.3 *but not the new branch-3.3.5 branch*
should still refer to this version. Anything targeting the 3.3.5 release
must be committed to the new branch, and have its JIRA version tagged
appropriately.

All changes to be cherry-picked into 3.3.5, except for those related to
the release process itself, MUST be in branch-3.3 first, and SHOULD be in
trunk unless there is some fundamental reason they can't apply there
(reload4j etc).

Let's try and stabilise this release, especially bringing up to date all
the JAR dependencies which we can safely update.

Anyone planning to give talks at ApacheCon about forthcoming features
already in 3.3 SHOULD

   1. reference Hadoop 3.3.5 as the version
   2. make sure their stuff works.

Mukund will be at the conference; find him and offer any help you can in
getting this release out.

I'd like to get that Arm64 build working. Does anyone else want to get
involved?

-steve


the next hadoop 3.3.x release

2022-09-09 Thread Steve Loughran
Hi

Mukund Thakur plans to fork off the next branch-3.3 release in 10 days'
time. Last chance to get changes in.

I'm away next week; try not to break the branch. Any testing you can do
would be appreciated.

Steve


[jira] [Resolved] (MAPREDUCE-7403) Support spark dynamic partitioning in the Manifest Committer

2022-08-24 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7403.
---
Fix Version/s: 3.4.0
   3.3.9
   Resolution: Fixed

Hadoop side is in; needs Spark to match.

> Support spark dynamic partitioning in the Manifest Committer
> 
>
> Key: MAPREDUCE-7403
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7403
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.3.9
>    Reporter: Steve Loughran
>    Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> Currently the Spark integration with PathOutputCommitters rejects attempts to
> instantiate them if dynamic partitioning is enabled. That is because the
> Spark partitioning code assumes that
> # file rename works as a fast and safe commit algorithm
> # the working directory is in the same FS as the final directory
> Assumption 1 doesn't hold on s3a, and #2 isn't true for the staging
> committers.
> The new abfs/gcs manifest committer and the target stores do meet both
> requirements. So we no longer need to reject the operation, provided the
> Spark-side binding code can identify when all is good.
> Proposed: add a new hasCapability() probe which, if a committer implements
> StreamCapabilities, can be used to see if the committer will work.
> ManifestCommitter will declare that it holds. As the API has existed since
> 2.10, it will be immediately available.
> Spark's PathOutputCommitProtocol to query the committer in setupCommitter,
> and fail if dynamicPartitionOverwrite is requested but not available.
> BindingParquetOutputCommitter to implement and forward
> StreamCapabilities.hasCapability.






[jira] [Created] (MAPREDUCE-7403) Support spark dynamic partitioning in the Manifest Committer

2022-08-09 Thread Steve Loughran (Jira)
Steve Loughran created MAPREDUCE-7403:
-

 Summary: Support spark dynamic partitioning in the Manifest 
Committer
 Key: MAPREDUCE-7403
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7403
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 3.3.9
Reporter: Steve Loughran
Assignee: Steve Loughran



Currently the Spark integration with PathOutputCommitters rejects attempts to
instantiate them if dynamic partitioning is enabled. That is because the Spark
partitioning code assumes that
# file rename works as a fast and safe commit algorithm
# the working directory is in the same FS as the final directory

Assumption 1 doesn't hold on s3a, and #2 isn't true for the staging committers.


The new abfs/gcs manifest committer and the target stores do meet both
requirements. So we no longer need to reject the operation, provided the
Spark-side binding code can identify when all is good.


Proposed: add a new hasCapability() probe which, if a committer implements
StreamCapabilities, can be used to see if the committer will work.
ManifestCommitter will declare that it holds. As the API has existed since
2.10, it will be immediately available.

Spark's PathOutputCommitProtocol to query the committer in setupCommitter, and
fail if dynamicPartitionOverwrite is requested but not available.

BindingParquetOutputCommitter to implement and forward
StreamCapabilities.hasCapability.
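A minimal sketch of the proposed probe, modelling the committer declaration, the Spark-side check and the forwarding wrapper in plain Java. To keep it self-contained, `Capabilities` is a local stand-in for Hadoop's `StreamCapabilities` (whose single method is `hasCapability(String)`); the capability string and every class name here are hypothetical, not the ones Hadoop or Spark actually use.

```java
public final class CommitterCapabilityProbe {

    private CommitterCapabilityProbe() {
    }

    /** Local stand-in for the StreamCapabilities interface: one string-keyed probe. */
    public interface Capabilities {
        boolean hasCapability(String capability);
    }

    /** Hypothetical capability name used only in this sketch. */
    public static final String DYNAMIC_PARTITIONING = "example.committer.dynamic.partitioning";

    /** Hypothetical committer that declares support, as ManifestCommitter is proposed to. */
    public static final class ManifestLikeCommitter implements Capabilities {
        @Override
        public boolean hasCapability(String capability) {
            return DYNAMIC_PARTITIONING.equals(capability);
        }
    }

    /** Hypothetical committer that does not implement the probe at all. */
    public static final class LegacyCommitter {
    }

    /** Hypothetical wrapper that forwards the probe, as BindingParquetOutputCommitter is proposed to. */
    public static final class ForwardingCommitter implements Capabilities {
        private final Object inner;

        public ForwardingCommitter(Object inner) {
            this.inner = inner;
        }

        @Override
        public boolean hasCapability(String capability) {
            return inner instanceof Capabilities && ((Capabilities) inner).hasCapability(capability);
        }
    }

    /**
     * The Spark-side check, as proposed for setupCommitter: when dynamic partition
     * overwrite is requested, the committer must implement the probe AND declare the capability.
     */
    public static void checkDynamicPartitioning(Object committer, boolean dynamicPartitionOverwrite) {
        if (!dynamicPartitionOverwrite) {
            return; // nothing to check
        }
        boolean supported = committer instanceof Capabilities
                && ((Capabilities) committer).hasCapability(DYNAMIC_PARTITIONING);
        if (!supported) {
            throw new UnsupportedOperationException(
                    committer.getClass().getSimpleName() + " does not support dynamic partition overwrite");
        }
    }
}
```

Because the probe is an optional interface, committers that predate it simply fail the `instanceof` test and are rejected, which keeps the existing behaviour for them.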






[ANNOUNCE] Apache Hadoop 3.3.4 release

2022-08-08 Thread Steve Loughran
On behalf of the Apache Hadoop Project Management Committee, I am
pleased to announce the release of Apache Hadoop 3.3.4.

---
This is a release of the Apache Hadoop 3.3 line.

It contains a small number of security and critical integration fixes since
3.3.3.

Users of Apache Hadoop 3.3.3 should upgrade to this release.

Users of Hadoop 2.x and Hadoop 3.2 should also upgrade to the 3.3.x line.
As well as feature enhancements, this is the sole branch currently
receiving fixes for anything other than critical security/data integrity
issues.

Users are encouraged to read the [overview of major changes][1] since
release 3.3.3.
For details of bug fixes, improvements, and other enhancements since the
previous 3.3.3 release, please check [release notes][2] and [changelog][3].

[1]: http://hadoop.apache.org/docs/r3.3.4/index.html
[2]: http://hadoop.apache.org/docs/r3.3.4/hadoop-project-dist/hadoop-common/release/3.3.4/RELEASENOTES.3.3.4.html
[3]: http://hadoop.apache.org/docs/r3.3.4/hadoop-project-dist/hadoop-common/release/3.3.4/CHANGELOG.3.3.4.html


Many thanks to everyone who helped in this release by supplying patches,
reviewing them, helping get this release building and testing and
reviewing the final artifacts.

-Steve


Re: [VOTE] Release Apache Hadoop 3.3.4

2022-08-08 Thread Steve Loughran
Thanks.

The release is up for download, as is the site; I will do the announcement
now.

I've also automated a lot more of the work of doing the release, including
testing across projects:
https://github.com/steveloughran/validate-hadoop-client-artifacts

On Thu, 4 Aug 2022 at 19:08, Stack  wrote:

> +1 (Sorry, took me a while)
>
> Ran: ./dev-support/hadoop-vote.sh --source
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/
>
> * Signature: ok
>
> * Checksum : failed
>
> * Rat check (17.0.1): ok
>
>  - mvn clean apache-rat:check
>
> * Built from source (17.0.1): ok
>
>  - mvn clean install  -DskipTests
>
> * Built tar from source (17.0.1): ok
>
>  - mvn clean package  -Pdist -DskipTests -Dtar
> -Dmaven.javadoc.skip=true
>
> Took a look at website. Home page says stuff like, “ARM Support: This is
> the first release to support ARM architectures.”, which I don’t think is
> true of 3.3.4 but otherwise, looks fine.
>
> Only played with HDFS. UIs looked right.
>
> Deployed to ten node arm64 cluster. Ran the hbase verification job on top
> of it and all passed. Did some kills, stuff came back.
>
> I didn't spend time on unit tests but one set passed on a local rig here:
>
> [image: image.png]
> Stack
>
> On Fri, Jul 29, 2022 at 11:48 AM Steve Loughran
>  wrote:
>
>> I have put together a release candidate (RC1) for Hadoop 3.3.4
>>
>> The RC is available at:
>> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/
>>
>> The git tag is release-3.3.4-RC1, commit a585a73c3e0
>>
>> The maven artifacts are staged at
>> https://repository.apache.org/content/repositories/orgapachehadoop-1358/
>>
>> You can find my public key at:
>> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>>
>> Change log
>>
>> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/CHANGELOG.md
>>
>> Release notes
>>
>> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/RELEASENOTES.md
>>
>> There's a very small number of changes, primarily critical code/packaging
>> issues and security fixes.
>>
>> See the release notes for details.
>>
>> Please try the release and vote. The vote will run for 5 days.
>>
>> steve
>>
>


Re: [VOTE] Release Apache Hadoop 3.3.4

2022-08-04 Thread Steve Loughran
The vote passed with the following result.

Binding PMC members:

+1 Chris Nauroth
+1 Steve Loughran
+1 Masatake Iwasaki

Non-binding votes:

+1 Ashutosh Gupta

Cheng Pan was worried about the new transitive Kotlin dependency. They are
working on a PR for that, which we can target for the successor to this
release:
https://issues.apache.org/jira/browse/HDFS-16714

I'm going to publish the artifacts, site docs, maven artifacts, then
announce.

Thank you to all who helped to push this release out.


On Thu, 4 Aug 2022 at 11:48, Ashutosh Gupta 
wrote:

> +1 (non-binding)
>
> * Builds from source looks good.
> * Checksums and signatures are correct.
> * Running basic HDFS commands and running simple MapReduce jobs looks good.
> * Skimmed through the contents of site documentation and it looks good.
>
> Thanks Steve for driving this release.
>
> Ashutosh
>
>
> On Wed, Aug 3, 2022 at 9:39 PM Chris Nauroth  wrote:
>
> > +1 (binding)
> >
> > * Verified all checksums.
> > * Verified all signatures.
> > * Built from source, including native code on Linux.
> > * mvn clean package -Pnative -Psrc -Drequire.openssl -Drequire.snappy
> > -Drequire.zstd -DskipTests
> > * Tests passed.
> > * mvn --fail-never clean test -Pnative -Dparallel-tests
> > -Drequire.snappy -Drequire.zstd -Drequire.openssl
> > -Dsurefire.rerunFailingTestsCount=3 -DtestsThreadCount=8
> > * Checked dependency tree to make sure we have all of the expected library
> > updates that are mentioned in the release notes.
> > * mvn -o dependency:tree
> >
> > I saw a LibHDFS test failure, but I know it's something flaky that's
> > already tracked in a JIRA issue. The release looks good. Steve, thank you
> > for driving this.
> >
> > Chris Nauroth
> >
> >
> > On Wed, Aug 3, 2022 at 11:27 AM Steve Loughran
> > wrote:
> >
> > > My vote for this is +1, binding.
> > >
> > > Obviously I'm biased, but I do not want to have to issue any more interim
> > > releases before the feature release off branch-3.3, so I am trying to be
> > > ruthless.
> > >
> > > My client validator Ant project has more targets to help with releasing,
> > > and now builds a lot more of my local projects:
> > > https://github.com/steveloughran/validate-hadoop-client-artifacts
> > > All good as far as my test coverage goes, with these projects validating
> > > the staged dependencies.
> > >
> > > Now, who else can review?
> > >
> > > On Fri, 29 Jul 2022 at 19:47, Steve Loughran 
> > wrote:
> > >
> > > >
> > > >
> > > > I have put together a release candidate (RC1) for Hadoop 3.3.4
> > > >
> > > > The RC is available at:
> > > > https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/
> > > >
> > > > The git tag is release-3.3.4-RC1, commit a585a73c3e0
> > > >
> > > > The maven artifacts are staged at
> > > > https://repository.apache.org/content/repositories/orgapachehadoop-1358/
> > > >
> > > > You can find my public key at:
> > > > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> > > >
> > > > Change log
> > > > https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/CHANGELOG.md
> > > >
> > > > Release notes
> > > > https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/RELEASENOTES.md
> > > >
> > > > There's a very small number of changes, primarily critical code/packaging
> > > > issues and security fixes.
> > > >
> > > > See the release notes for details.
> > > >
> > > > Please try the release and vote. The vote will run for 5 days.
> > > >
> > > > steve
> > > >
> > >
> >
>


Re: [VOTE] Release Apache Hadoop 3.3.4

2022-08-03 Thread Steve Loughran
My vote for this is +1, binding.

Obviously I'm biased, but I do not want to have to issue any more interim
releases before the feature release off branch-3.3, so I am trying to be
ruthless.

My client validator Ant project has more targets to help with releasing,
and now builds a lot more of my local projects:
https://github.com/steveloughran/validate-hadoop-client-artifacts
All good as far as my test coverage goes, with these projects validating
the staged dependencies.

Now, who else can review?

On Fri, 29 Jul 2022 at 19:47, Steve Loughran  wrote:

>
>
> I have put together a release candidate (RC1) for Hadoop 3.3.4
>
> The RC is available at:
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/
>
> The git tag is release-3.3.4-RC1, commit a585a73c3e0
>
> The maven artifacts are staged at
> https://repository.apache.org/content/repositories/orgapachehadoop-1358/
>
> You can find my public key at:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
> Change log
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/CHANGELOG.md
>
> Release notes
>
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/RELEASENOTES.md
>
> There's a very small number of changes, primarily critical code/packaging
> issues and security fixes.
>
> See the release notes for details.
>
> Please try the release and vote. The vote will run for 5 days.
>
> steve
>


[VOTE] Release Apache Hadoop 3.3.4

2022-07-29 Thread Steve Loughran
I have put together a release candidate (RC1) for Hadoop 3.3.4

The RC is available at:
https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/

The git tag is release-3.3.4-RC1, commit a585a73c3e0

The maven artifacts are staged at
https://repository.apache.org/content/repositories/orgapachehadoop-1358/

You can find my public key at:
https://dist.apache.org/repos/dist/release/hadoop/common/KEYS

Change log
https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/CHANGELOG.md

Release notes
https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/RELEASENOTES.md

There's a very small number of changes, primarily critical code/packaging
issues and security fixes.

See the release notes for details.

Please try the release and vote. The vote will run for 5 days.

steve


Re: [VOTE] Release Apache Hadoop 3.3.4

2022-07-26 Thread Steve Loughran
Cancelling this RC so I can issue a new one with an updated reload4j:
https://issues.apache.org/jira/browse/HADOOP-18354

I will also do a PR to update the AWS SDK, which is less critical (the
jackson-databind classes don't seem to get used); upgrading will stop audit
tools overreacting.

On Thu, 21 Jul 2022 at 19:07, Steve Loughran  wrote:

>
> I have put together a release candidate (RC0) for Hadoop 3.3.4
>
> The RC is available at:
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC0/
>
> The git tag is release-3.3.4-RC0, commit c679bc76d26
>
> The maven artifacts are staged at
> https://repository.apache.org/content/repositories/orgapachehadoop-1356/
>
> You can find my public key at:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
> Change log
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC0/CHANGELOG.md
>
> Release notes
>
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC0/RELEASENOTES.md
>
> There's a very small number of changes, primarily critical code/packaging
> issues and security fixes.
>
> See the release notes for details.
>
> Please try the release and vote. The vote will run for 5 days.
>
> -Steve
>
>
>


[VOTE] Release Apache Hadoop 3.3.4

2022-07-21 Thread Steve Loughran
I have put together a release candidate (RC0) for Hadoop 3.3.4

The RC is available at:
https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC0/

The git tag is release-3.3.4-RC0, commit c679bc76d26

The maven artifacts are staged at
https://repository.apache.org/content/repositories/orgapachehadoop-1356/

You can find my public key at:
https://dist.apache.org/repos/dist/release/hadoop/common/KEYS

Change log
https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC0/CHANGELOG.md

Release notes
https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC0/RELEASENOTES.md

There's a very small number of changes, primarily critical code/packaging
issues and security fixes.

See the release notes for details.

Please try the release and vote. The vote will run for 5 days.

-Steve


Fwd: [jira] [Created] (MAPREDUCE-7372) MapReduce set permission too late in copyJar method

2022-07-18 Thread Steve Loughran
Could someone look at the PR for this?
https://github.com/apache/hadoop/pull/4026

I think it is OK to go in without tests, but I don't want to break things.
If someone else could also review, that would be great.

-- Forwarded message -
From: Zhang Dongsheng (Jira) 
Date: Thu, 24 Feb 2022 at 03:53
Subject: [jira] [Created] (MAPREDUCE-7372) MapReduce set permission too
late in copyJar method
To: 


Zhang Dongsheng created MAPREDUCE-7372:
--

 Summary: MapReduce set permission too late in copyJar method
 Key: MAPREDUCE-7372
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7372
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 3.3.1
Reporter: Zhang Dongsheng


When copyJar executes in JobResourceUploader, setPermission runs after
setReplication, but setReplication needs the permission to be set first. So
if we set a restrictive umask in the project, such as 0600, the MapReduce
process will fail.

In the patch file, I put setPermission before setReplication.





[jira] [Resolved] (MAPREDUCE-7397) Received status code 501 from server: HTTPS Required

2022-07-18 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7397.
---
Resolution: Cannot Reproduce

> Received status code 501 from server: HTTPS Required
> 
>
> Key: MAPREDUCE-7397
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7397
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build, client
>Affects Versions: 3.3.3
>Reporter: Jitin Dominic
>Priority: Major
>
> I've updated my _hadoop-client_ dependency to v3.3.3. While using
> {_}gradle{_}(v1.10) to build my jars, I keep getting the following error:
> {code:java}
> org.apache.hadoop:hadoop-client:3.3.3 > 
> org.apache.hadoop:hadoop-mapreduce-client-jobclient:3.3.3 > 
> org.apache.hadoop:hadoop-mapreduce-client-common:3.3.3
> Could not HEAD 
> 'http://repo1.maven.org/maven2/javax/ws/rs/javax.ws.rs-api/2.1.1/javax.ws.rs-api-2.1.1.pom'.
>  Received status code 501 from server: HTTPS Required
>    > Could not parse POM 
> https://repo1.maven.org/maven2/javax/ws/rs/javax.ws.rs-api/2.1.1/javax.ws.rs-api-2.1.1.pom
>       > Illegal character in path at index 88: 
> https://repo1.maven.org/maven2/javax/ws/rs/javax.ws.rs-api/2.1.1/javax.ws.rs-api-2.1.1.${packaging.type}
>    > Could not parse POM 
> https://repo.grails.org/grails/core/javax/ws/rs/javax.ws.rs-api/2.1.1/javax.ws.rs-api-2.1.1.pom
>       > Illegal character in path at index 93: 
> https://repo.grails.org/grails/core/javax/ws/rs/javax.ws.rs-api/2.1.1/javax.ws.rs-api-2.1.1.${packaging.type}
>  {code}
>  
> Is there a work-around for this?






[jira] [Resolved] (MAPREDUCE-7391) TestLocalDistributedCacheManager failing after HADOOP-16202

2022-06-22 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7391.
---
Fix Version/s: 3.4.0
   3.3.9
   Resolution: Fixed

> TestLocalDistributedCacheManager failing after HADOOP-16202
> ---
>
> Key: MAPREDUCE-7391
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7391
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.4.0, 3.3.9
>    Reporter: Steve Loughran
>    Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> After HADOOP-16202, TestLocalDistributedCacheManager.testDownload is failing 
> with an NPE






Re: Hadoop 3.3.4 release process underway

2022-06-22 Thread Steve Loughran
update


   1. branch 3.3 has a version of 3.3.9-SNAPSHOT
   2. and 3.3.9 should be the version for fixes against this, with 3.3.4
   for the new point release
   3. please don't use 3.3.4 for branch-3.3 changes from now on

I've already got a PR up with the changes; going to create an asf
branch-3.3.4 branch mirroring it and kicking off with the pom update.

I'm going to do a dry run of a release this week to build the binaries on
x86 and ARM but not put for a vote as I am away next week. instead it'll be
a validation of my processes, ant-based automation etc. (
https://github.com/steveloughran/validate-hadoop-client-artifacts)

I will kick off the release the following week, which, being July 4 week,
may have more people offline. It does give Larry a chance to get
https://issues.apache.org/jira/browse/HADOOP-18074 in, as it may have
security implications.



On Mon, 20 Jun 2022 at 19:02, Steve Loughran  wrote:

> I'm setting things up for a new release
>
> https://issues.apache.org/jira/browse/HADOOP-18305
>
> An absolute minimum of fixes. As well as some related to ZK lockdown, I would
> like to include
>
> https://issues.apache.org/jira/browse/HADOOP-18303
> remove shading exclusion of javax.ws.rs-api from hadoop-client-runtime
>
> and
> https://issues.apache.org/jira/browse/HADOOP-18307
> remove hadoop-cos as a dependency of hadoop-cloud-storage
>
> The last one is a last-minute workaround for a classpath ordering issue due
> to too many libraries having unshaded references to mozilla/prefix-list.txt
> https://issues.apache.org/jira/browse/HADOOP-18159
>
> The proper fixes would be getting that updated library tested (who is set
> up to test Tencent Cloud?) and, ideally, getting the aws, cos and gcs shaded
> libraries to shade their references.
>
> For 3.3.4, I am just going to cut the declaration of the module as a
> dependency of hadoop-cloud-storage, so it stays out of downstream apps unless
> they explicitly ask for it.
>
> now, numbering
>
>
>1. I am calling this 3.3.4
>2. I am going to increase the version of branch-3.3 to 3.3.9. That
>leaves space for some more but doesn't confuse jira dropdown dialogs.
>
>
> I do believe branch-3.3 should be renamed branch-3.4 and the release I
> plan to do with Mukund called 3.4.0, but that is another bit of project
> organisation.
>
> Expect the first RC up soon; I am going to be away on vacation from June
> 28 to July 23 though, which complicates things.
>
>
>


Hadoop 3.3.4 release process underway

2022-06-20 Thread Steve Loughran
I'm setting things up for a new release

https://issues.apache.org/jira/browse/HADOOP-18305

An absolute minimum of fixes. As well as some related to ZK lockdown, I would
like to include

https://issues.apache.org/jira/browse/HADOOP-18303
remove shading exclusion of javax.ws.rs-api from hadoop-client-runtime

and
https://issues.apache.org/jira/browse/HADOOP-18307
remove hadoop-cos as a dependency of hadoop-cloud-storage

The last one is a last-minute workaround for a classpath ordering issue due
to too many libraries having unshaded references to mozilla/prefix-list.txt
https://issues.apache.org/jira/browse/HADOOP-18159

The proper fixes would be getting that updated library tested (who is set
up to test Tencent Cloud?) and, ideally, getting the aws, cos and gcs shaded
libraries to shade their references.

For 3.3.4, I am just going to cut the declaration of the module as a
dependency of hadoop-cloud-storage, so it stays out of downstream apps unless
they explicitly ask for it.

now, numbering


   1. I am calling this 3.3.4
   2. I am going to increase the version of branch-3.3 to 3.3.9. That
   leaves space for some more but doesn't confuse jira dropdown dialogs.


I do believe branch-3.3 should be renamed branch-3.4 and the release I
plan to do with Mukund called 3.4.0, but that is another bit of project
organisation.

Expect the first RC up soon; I am going to be away on vacation from June 28
to July 23 though, which complicates things.


[DISCUSS] Forthcoming Hadoop releases

2022-06-08 Thread Steve Loughran
I want to start a quick discussion on a plan for hadoop releases this
summer. I am willing to do the release manager work. Mukund and Mehakmeet
have already volunteered to help even if they don't know that yet.

I've got two goals

   1. minor followup to 3.3.3
   2. feature release of new stuff


*Followup to 3.3.3, working title "3.3.4"*

I've a PR up on GitHub to add those changes to the 3.3.2/3.3.3 line which
have shipped elsewhere and/or which we consider critical.

https://github.com/apache/hadoop/pull/4345

This is for critical data integrity/service availability patches; things
like test values we will just triage.

I can start a new release of this at the end of the week, with an RC up
next week ready for review. With the wonderful docker based build and some
extra automation I've been adding for validating releases
(validate-hadoop-client-artifacts), getting that RC out is not that
problematic; issuing git commands is the heavy lifting.

What does take effort is the testing by everybody else; the smaller the set
of changes the more this is limited to validating the artifacts and the
maven publishing.

As it is a follow up to hadoop 3.3.3 then it needs the version number
3.3.4. This raises the question "what about branch-3.3", which brings me to
the next deliverable.

*branch-3.3 => branch-3.4, targeting hadoop 3.4.0 in 3Q22*

With the 3.3.x line being maintained for critical fixes only, make the
hadoop version in branch-3.3 "hadoop-3.4.0" and release later this year.

A release schedule which is probably doable despite people taking time off
over the summer could be

   - feature complete by July/August
   - RC(s) sept/oct with goal of shipping by October


I volunteer to be release manager, albeit with critical help from
colleagues. For people who haven't worked with me on a project release
before, know that I'm fairly ruthless about getting changes in once the
branch is locked down. So get those features in now.

hadoop trunk gets its version number incremented to 3.5.0-SNAPSHOT

It's probably time we thought about what a release off trunk would mean, but
I would like to get a branch-3.3 release out sooner rather than later.

What do people think of this? And is there anyone else willing to get
involved with the release process?

-Steve


Re: [VOTE] Release Apache Hadoop 2.10.2 - RC0

2022-05-30 Thread Steve Loughran
+1 binding

I've extended my validator project
https://github.com/steveloughran/validate-hadoop-client-artifacts

it now
* fetches KEYS
* fetches an RC from a remote url
* validates signing
* untars and tries to build source
* untars binary release and tries some native commands

the source build fails because I'm on a mac M1 and branch-2 doesn't have
the dynamic protoc switching. not sure if it can go in at all.

binary untar worked, basic commands ok except for bin/hadoop checknative
-again, no native libs for this system.

so: as far as all my newly automated process goes, and allowing for mac m1
native code issues, I'm happy.
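Alongside signature validation, several voters in these threads also report verifying checksums. The Python below is a minimal sketch of that step, not code from the validator project; it assumes each artifact ships with a `.sha512` file in either the tagged form `SHA512 (name) = <hex>` or the coreutils form `<hex>  name`, and the helper names `sha512_of`, `expected_sha512` and `verify` are hypothetical.

```python
import hashlib
import re
from pathlib import Path

# Assumed .sha512 layouts: "SHA512 (name) = <hex>" or "<hex>  name" (coreutils).
TAGGED = re.compile(r"SHA512 \((?P<name>.+)\) = (?P<hex>[0-9a-fA-F]{128})")
PLAIN = re.compile(r"(?P<hex>[0-9a-fA-F]{128})\s+\*?(?P<name>\S+)")


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-512 so large tarballs need not fit in memory."""
    digest = hashlib.sha512()
    with path.open("rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_sha512(checksum_text: str) -> str:
    """Pull the hex digest out of the first line of a .sha512 file."""
    lines = checksum_text.strip().splitlines()
    if not lines:
        raise ValueError("empty checksum file")
    line = lines[0].strip()
    match = TAGGED.fullmatch(line) or PLAIN.fullmatch(line)
    if match is None:
        raise ValueError(f"unrecognised checksum line: {line!r}")
    return match.group("hex").lower()


def verify(artifact: Path, checksum_file: Path) -> bool:
    """True if the artifact's SHA-512 matches the published checksum."""
    return sha512_of(artifact) == expected_sha512(checksum_file.read_text())
```

For example, `verify(Path("hadoop-3.3.3.tar.gz"), Path("hadoop-3.3.3.tar.gz.sha512"))` returns `True` only when the digests agree. A matching checksum shows only that the download is intact; authenticity still comes from the GPG signature check.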

On Wed, 25 May 2022 at 03:41, Masatake Iwasaki 
wrote:

> Hi all,
>
> Here's Hadoop 2.10.2 release candidate #0:
>
> The RC is available at:
>https://home.apache.org/~iwasakims/hadoop-2.10.2-RC0/
>
> The RC tag is at:
>https://github.com/apache/hadoop/releases/tag/release-2.10.2-RC0
>
> The Maven artifacts are staged at:
>https://repository.apache.org/content/repositories/orgapachehadoop-1350
>
> You can find my public key at:
>https://downloads.apache.org/hadoop/common/KEYS
>
> Please evaluate the RC and vote.
> The vote will be open for (at least) 5 days.
>
> Thanks,
> Masatake Iwasaki
>
> -
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
>
>


[ANNOUNCE] Apache Hadoop 3.3.3 release

2022-05-18 Thread Steve Loughran
On behalf of the Apache Hadoop Project Management Committee, I'm pleased to
announce that Hadoop 3.3.3 has been released:

https://hadoop.apache.org/release/3.3.3.html

This is the third stable release of the Apache Hadoop 3.3 line.

It contains 23 bug fixes, improvements and enhancements since 3.3.2.

This is primarily a security update; for this reason, upgrading is strongly
advised.

Users are encouraged to read the overview of major changes[1] since 3.3.2.
For details of bug fixes, improvements, and other enhancements since the
previous 3.3.2 release, please check release notes[2] and changelog[3].

[1]: http://hadoop.apache.org/docs/r3.3.3/index.html
[2]:
http://hadoop.apache.org/docs/r3.3.3/hadoop-project-dist/hadoop-common/release/3.3.3/RELEASENOTES.3.3.3.html
[3]:
http://hadoop.apache.org/docs/r3.3.3/hadoop-project-dist/hadoop-common/release/3.3.3/CHANGELOG.3.3.3.html


As the release notes highlight, this release contains HADOOP-18088 "Replace
log4j 1.x with reload4j"
https://issues.apache.org/jira/browse/HADOOP-18088

This ensures that the version of log4j shipped is free of known CVEs. The
standard log4j 1.2.17 has some known CVEs in classes which were never used;
reload4j cuts them out. Audit scanning tools should stop highlighting
perceived risks here.

If you are using maven exclusions to manage logging libraries, or were
otherwise replacing the log4j artifacts in deployments, note the different
library/artifact names which need to be handled.
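As a sketch of that exclusion check, the Python below flags reload4j-era coordinates a build may also need to exclude once it already excludes the log4j 1.x ones. The mapping (`ch.qos.reload4j:reload4j` for `log4j:log4j`, `org.slf4j:slf4j-reload4j` for `org.slf4j:slf4j-log4j12`) is an assumption to confirm against the 3.3.3 release notes, and `missing_exclusions` is a hypothetical helper.

```python
from typing import Dict, Iterable, List

# Assumed old -> new logging coordinates after the reload4j switch;
# confirm against the release notes before relying on this mapping.
REPLACEMENTS: Dict[str, str] = {
    "log4j:log4j": "ch.qos.reload4j:reload4j",
    "org.slf4j:slf4j-log4j12": "org.slf4j:slf4j-reload4j",
}


def missing_exclusions(exclusions: Iterable[str]) -> List[str]:
    """Return replacement coordinates that are not yet excluded.

    Only replacements for old coordinates the build already excludes are
    reported, in the order they appear in REPLACEMENTS.
    """
    present = set(exclusions)
    return [new for old, new in REPLACEMENTS.items()
            if old in present and new not in present]
```

Feeding it the `groupId:artifactId` pairs from a pom's `<exclusions>` would list the entries to add; an empty result means the old and new names are both handled.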

Many thanks to everyone who helped in this release by supplying patches,
reviewing them, helping get this release building, and testing and reviewing
the final artifacts.

Steve


Re: [VOTE] Release Apache Hadoop 3.3.3 (RC1)

2022-05-17 Thread Steve Loughran
Here are the results of the vote.

Binding PMC members

+1 Stack
+1 Masatake Iwasaki
+1 Xiaoqiao He
+1 Chao Sun
+1 Chris Nauroth


Non binding votes

+1 Ayush Saxena
+1 Viraj Jasani
+1 Mukund Thakur


There were no negative votes, and maven is happy.
I'm therefore pleased to announce that this release candidate has been
approved to become the final release.

I'm going to go through the next steps of the process: publishing the
artifacts, site docs, maven artifacts, announcements etc.

Thank you to all who reviewed this!

-Steve

PS: I'm going to start some discussion on correcting/tuning the release
process, once I've got through it.

On Mon, 16 May 2022 at 19:16, Chris Nauroth  wrote:

> +1 (binding)
>
> - Verified all checksums.
> - Verified all signatures.
> - Built from source, including native code on Linux.
> - Ran several examples successfully.
>
> Chris Nauroth
>
>
> On Mon, May 16, 2022 at 10:06 AM Chao Sun  wrote:
>
> > +1
> >
> > - Compiled from source
> > - Verified checksums & signatures
> > - Launched a pseudo HDFS cluster and ran some simple commands
> > - Ran full Spark tests with the RC
> >
> > Thanks Steve!
> >
> > Chao
> >
> > On Mon, May 16, 2022 at 2:19 AM Ayush Saxena  wrote:
> > >
> > > +1,
> > > * Built from source.
> > > * Successful native build on Ubuntu 18.04
> > > * Verified Checksums.
> > >
> >
> (CHANGELOG.md,RELEASENOTES.md,hadoop-3.3.3-rat.txt,hadoop-3.3.3-site.tar.gz,hadoop-3.3.3-src.tar.gz,hadoop-3.3.3.tar.gz)
> > > * Verified Signature.
> > > * Successful RAT check
> > > * Ran basic HDFS shell commands.
> > > * Ran basic YARN shell commands.
> > > * Verified version in hadoop version command and UI
> > > * Ran some MR example Jobs.
> > > * Browsed
> UI(Namenode/Datanode/ResourceManager/NodeManager/HistoryServer)
> > > * Browsed the contents of Maven Artifacts.
> > > * Browsed the contents of the website.
> > >
> > > Thanx Steve for driving the release, Good Luck!!!
> > >
> > > -Ayush
> > >
> > > On Mon, 16 May 2022 at 08:20, Xiaoqiao He 
> wrote:
> > >
> > > > +1(binding)
> > > >
> > > > * Verified signature and checksum of the source tarball.
> > > > * Built the source code on Ubuntu and OpenJDK 11 by `mvn clean
> package
> > > > -DskipTests -Pnative -Pdist -Dtar`.
> > > > * Setup pseudo cluster with HDFS and YARN.
> > > > * Run simple FsShell - mkdir/put/get/mv/rm and check the result.
> > > > * Run example mr applications and check the result - Pi & wordcount.
> > > > * Check the Web UI of NameNode/DataNode/Resourcemanager/NodeManager
> > etc.
> > > >
> > > > Thanks Steve for your work.
> > > >
> > > > - He Xiaoqiao
> > > >
> > > > On Mon, May 16, 2022 at 4:25 AM Viraj Jasani 
> > wrote:
> > > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > * Signature: ok
> > > > > * Checksum : ok
> > > > > * Rat check (1.8.0_301): ok
> > > > >  - mvn clean apache-rat:check
> > > > > * Built from source (1.8.0_301): ok
> > > > >  - mvn clean install  -DskipTests
> > > > > * Built tar from source (1.8.0_301): ok
> > > > >  - mvn clean package  -Pdist -DskipTests -Dtar
> > -Dmaven.javadoc.skip=true
> > > > >
> > > > > HDFS, MapReduce and HBase (2.5) CRUD functional testing on
> > > > > pseudo-distributed mode looks good.
> > > > >
> > > > >
> > > > > On Wed, May 11, 2022 at 10:26 AM Steve Loughran
> > > > 
> > > > > wrote:
> > > > >
> > > > > > I have put together a release candidate (RC1) for Hadoop 3.3.3
> > > > > >
> > > > > > The RC is available at:
> > > > > > https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC1/
> > > > > >
> > > > > > The git tag is release-3.3.3-RC1, commit d37586cbda3
> > > > > >
> > > > > > The maven artifacts are staged at
> > > > > >
> > > >
> > https://repository.apache.org/content/repositories/orgapachehadoop-1349/
> > > > > >
> > > > > > You can find my public key at:
> > > > > > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> > > > > >
> > > > > 

[jira] [Resolved] (MAPREDUCE-7378) An error occurred while concurrently writing to a path

2022-05-16 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7378.
---
Resolution: Won't Fix

> An error occurred while concurrently writing to a path
> --
>
> Key: MAPREDUCE-7378
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7378
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: jingpan xiong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When FileOutputCommitter is used as the base class of the job committer in
> other components, a concurrent-write problem can occur.
> For example, with HadoopMapReduceCommitProtocol in Spark, when multiple
> applications write data to the same path, they all commit jobs and tasks in the
> "_temporary" dir. Once one job finishes, it deletes the "_temporary" dir,
> making the other jobs fail.
>  
> error message:
> {code:java}
> // code placeholder
> 21/11/04 19:01:21 ERROR Utils: Aborting task ExitCodeException exitCode=1: 
> chmod: cannot access 
> '/data/spark-examples/spark-warehouse/test/temporary/0/_temporary/attempt_202111041901182933014038999149736_0001_m_01
>  
> 4/dt=2021-11-03/hour=10/.part-1-95895b03-45d2-4ac6-806b-b76fd1dfa3dc.c000.snappy.parquet.crc':
>  No such file or directory at 
> org.apache.hadoop.util.Shell.runCommand(Shell.java:1008) at 
> org.apache.hadoop.util.Shell.run(Shell.java:901) at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at 
> org.apache.hadoop.util.Shell.execCommand(Shell.java:1307) at 
> org.apache.hadoop.util.Shell.execCommand(Shell.java:1289) at 
> org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:978)
>  at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.(RawLocalFileSystem.java:324)
>  at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.(RawLocalFileSystem.java:294)
>  at 
> org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:439)
>  at 
> org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:428) 
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:459) 
> at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.(ChecksumFileSystem.java:437)
>  at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:521) 
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:500) 
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195) at 
> org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1175) at 
> org.apache.parquet.hadoop.util.HadoopOutputFile.create(HadoopOutputFile.java:74)
>  at 
> org.apache.parquet.hadoop.ParquetFileWriter.(ParquetFileWriter.java:329)
>  at 
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:482)
>  at 
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:420)
>  at 
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:409)
>  at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.(ParquetOutputWriter.scala:36)
>  at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:150)
>  at 
> org.apache.spark.sql.execution.datasources.BaseDynamicPartitionDataWriter.renewCurrentWriter(FileFormatDataWriter.scala:290)
>  at 
> org.apache.spark.sql.execution.datasources.DynamicPartitionDataSingleWriter.write(FileFormatDataWriter.scala:357)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithMetrics(FileFormatDataWriter.scala:85)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:92)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:304)
>  at 
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1496)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:311)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$16(FileFormatWriter.scala:229)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at 
> org.apache.spark.scheduler.Task.run(Task.scala:131) at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>  at org.apach

[VOTE] Release Apache Hadoop 3.3.3 (RC1)

2022-05-11 Thread Steve Loughran
I have put together a release candidate (RC1) for Hadoop 3.3.3

The RC is available at:
https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC1/

The git tag is release-3.3.3-RC1, commit d37586cbda3

The maven artifacts are staged at
https://repository.apache.org/content/repositories/orgapachehadoop-1349/

You can find my public key at:
https://dist.apache.org/repos/dist/release/hadoop/common/KEYS

Change log
https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC1/CHANGELOG.md

Release notes
https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC1/RELEASENOTES.md

There's a very small number of changes, primarily critical code/packaging
issues and security fixes.

* The critical fixes which shipped in the 3.2.3 release.
* CVEs in our code and dependencies
* Shaded client packaging issues.
* A switch from log4j to reload4j

reload4j is an active fork of the log4j 1.2.17 library with the classes
which contain CVEs removed. Even though hadoop never used those classes,
they regularly raised alerts on security scans and concern from users.
Switching to the forked project allows us to ship a secure logging
framework. It will complicate the builds of downstream
maven/ivy/gradle projects which exclude our log4j artifacts, as they
need to cut the new dependency instead/as well.

See the release notes for details.

This is the second release attempt. It is the same git commit as before, but
fully recompiled with another republish to maven staging, which has been
verified by building Spark, as well as a minimal test project.

Please try the release and vote. The vote will run for 5 days.

-Steve


Re: [VOTE] Release Apache Hadoop 3.3.3

2022-05-10 Thread Steve Loughran
update: got a stub project whose test run requires the client api and
shaded artifacts, and whose clean target will delete all artifacts of a
specific version from your local repo, so as to ensure all is untainted

https://github.com/steveloughran/validate-hadoop-client-artifacts

I do now have a version in staging with the files; still don't know how the
previous deploy lacked them. I will do more testing tomorrow before putting
up the next RC, which will be off the same git commit as before, just a
rebuild, repackage and republish.
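The clean step described above (removing every cached artifact of one version from the local repository so the next build must fetch from staging) could look roughly like this Python sketch. It assumes the default Maven local-repository layout, `<repo>/<groupId as path>/<artifactId>/<version>/`; `version_dirs` and `purge_version` are hypothetical names, not part of the validate-hadoop-client-artifacts project.

```python
import shutil
from pathlib import Path
from typing import List


def version_dirs(local_repo: Path, group_id: str, version: str) -> List[Path]:
    """List every <artifactId>/<version> directory cached under one groupId.

    Assumes the default Maven local-repository layout:
    <local_repo>/<groupId with dots as slashes>/<artifactId>/<version>/
    """
    group_dir = local_repo.joinpath(*group_id.split("."))
    if not group_dir.is_dir():
        return []
    return sorted(
        artifact_dir / version
        for artifact_dir in group_dir.iterdir()
        if (artifact_dir / version).is_dir()
    )


def purge_version(local_repo: Path, group_id: str, version: str,
                  dry_run: bool = True) -> List[Path]:
    """Return the cached version directories; delete them unless dry_run."""
    targets = version_dirs(local_repo, group_id, version)
    if not dry_run:
        for target in targets:
            shutil.rmtree(target)
    return targets
```

`purge_version(Path.home() / ".m2" / "repository", "org.apache.hadoop", "3.3.3")` only lists what would go; pass `dry_run=False` to delete. Defaulting to a dry run is deliberate, since `shutil.rmtree` cannot be undone.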

-Steve

On Tue, 10 May 2022 at 00:18, Steve Loughran  wrote:

> I've done another docker build and the client jars appear to be there.
> I'll test tomorrow before putting up another vote. it will be exactly the
> same commit as before, just recompiled/republished
>
> On Mon, 9 May 2022 at 17:45, Chao Sun  wrote:
>
>> Agreed, that step #10 is out-dated and should be removed (I skipped
>> that when releasing Hadoop 3.3.2 but didn't update it, sorry).
>>
>> > How about using
>> https://repository.apache.org/content/repositories/orgapachehadoop-1348/
>>
>> Akira, I tried this too but it didn't work. I think we'd need the
>> artifacts to be properly pushed to the staging repository.
>>
>> > Could you please let me know how I can consume the Hadoop 3 jars in
>> maven?
>>
>> Gautham (if you are following this thread), you'll need to add the
>> following:
>>
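>> <repository>
>>   <id>staged</id>
>>   <name>staged-releases</name>
>>   <url>https://repository.apache.org/content/repositories/staging/</url>
>>   <releases>
>>     <enabled>true</enabled>
>>   </releases>
>>   <snapshots>
>>     <enabled>true</enabled>
>>   </snapshots>
>> </repository>
>>
>> to the `<repositories>` section in your Maven pom.xml file.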
>>
>> On Mon, May 9, 2022 at 8:52 AM Steve Loughran
>>  wrote:
>> >
>> > I didn't do that as the docker image was doing it itself...I discussed
>> this
>> > with Akira and Ayush & they agreed. So whatever went wrong, it was
>> > something else.
>> >
>> > I have been building a list of things I'd like to change there; cutting
>> > that line was one of them. but I need to work out the correct workflow.
>> >
>> > trying again, and creating a stub module to verify the client is in
>> staging
>> >
>> > On Mon, 9 May 2022 at 15:19, Masatake Iwasaki <
>> iwasak...@oss.nttdata.co.jp>
>> > wrote:
>> >
>> > > It seems to be caused by an obsolete instruction in the HowToRelease Wiki?
>> > >
>> > > After HADOOP-15058, `mvn deploy` is kicked by
>> > > `dev-support/bin/create-release --asfrelease`.
>> > > https://issues.apache.org/jira/browse/HADOOP-15058
>> > >
>> > > Step #10 in "Creating the release candidate (X.Y.Z-RC)" section
>> > > of the Wiki still instructs to run `mvn deploy` with `-DskipShade`.
>> > >
>> > > 2 sets of artifacts are deployed after creating the RC based on the
>> instruction.
>> > > The latest one contains empty shaded jars.
>> > >
>> > > hadoop-client-api and hadoop-client-runtime of the already released 3.2.3
>> > > look to have the same issue...
>> > >
>> > > Masatake Iwasaki
>> > >
>> > > On 2022/05/08 6:45, Akira Ajisaka wrote:
>> > > > Hi Chao,
>> > > >
>> > > > How about using
>> > > >
>> https://repository.apache.org/content/repositories/orgapachehadoop-1348/
>> > > > instead of
>> https://repository.apache.org/content/repositories/staging/ ?
>> > > >
>> > > > Akira
>> > > >
>> > > > On Sat, May 7, 2022 at 10:52 AM Ayush Saxena 
>> wrote:
>> > > >
>> > > >> Hmm, I see the artifacts ideally should have got overwritten by
>> the new
>> > > >> RC, but they didn’t. The reason seems like the staging path shared
>> > > doesn’t
>> > > >> have any jars…
>> > > >> That is why it was picking the old jars. I think Steve needs to
>> run mvn
>> > > >> deploy again…
>> > > >>
>> > > >> Sent from my iPhone
>> > > >>
>> > > >>> On 07-May-2022, at 7:12 AM, Chao Sun  wrote:
>> > > >>>
>> > > >>> 
>> > > >>>>
>> > > >>>> Chao can you use the one that Steve mentioned in the mail?
>> > > >>>
>> > > >>> Hm

Re: [VOTE] Release Apache Hadoop 3.3.3

2022-05-09 Thread Steve Loughran
I've done another docker build and the client jars appear to be there. I'll
test tomorrow before putting up another vote. it will be exactly the same
commit as before, just recompiled/republished
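One quick way to confirm that client jars are really populated, unlike the class-free `hadoop-client-api` jar listed with `jar tf` later in this thread, is to count `.class` entries, since a JAR is a ZIP archive. A minimal Python sketch; `class_entries` and `looks_empty` are hypothetical helper names.

```python
import zipfile
from typing import IO, List, Union


def class_entries(jar: Union[str, IO[bytes]]) -> List[str]:
    """Return the names of all .class entries in a JAR (a JAR is a ZIP file)."""
    with zipfile.ZipFile(jar) as archive:
        return [name for name in archive.namelist() if name.endswith(".class")]


def looks_empty(jar: Union[str, IO[bytes]]) -> bool:
    """True when the JAR carries no classes at all, e.g. only META-INF metadata."""
    return not class_entries(jar)
```

For instance, `looks_empty("hadoop-client-api-3.3.3.jar")` should be `False` for a correctly shaded build, whereas a jar holding only the `META-INF/` entries shown in that listing would return `True`.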

On Mon, 9 May 2022 at 17:45, Chao Sun  wrote:

> Agreed, that step #10 is out-dated and should be removed (I skipped
> that when releasing Hadoop 3.3.2 but didn't update it, sorry).
>
> > How about using
> https://repository.apache.org/content/repositories/orgapachehadoop-1348/
>
> Akira, I tried this too but it didn't work. I think we'd need the
> artifacts to be properly pushed to the staging repository.
>
> > Could you please let me know how I can consume the Hadoop 3 jars in
> maven?
>
> Gautham (if you are following this thread), you'll need to add the
> following:
>
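> <repository>
>   <id>staged</id>
>   <name>staged-releases</name>
>   <url>https://repository.apache.org/content/repositories/staging/</url>
>   <releases>
>     <enabled>true</enabled>
>   </releases>
>   <snapshots>
>     <enabled>true</enabled>
>   </snapshots>
> </repository>
>
> to the `<repositories>` section in your Maven pom.xml file.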
>
> On Mon, May 9, 2022 at 8:52 AM Steve Loughran
>  wrote:
> >
> > I didn't do that as the docker image was doing it itself...I discussed
> this
> > with Akira and Ayush & they agreed. So whatever went wrong, it was
> > something else.
> >
> > I have been building a list of things I'd like to change there; cutting
> > that line was one of them. but I need to work out the correct workflow.
> >
> > trying again, and creating a stub module to verify the client is in
> staging
> >
> > On Mon, 9 May 2022 at 15:19, Masatake Iwasaki <
> iwasak...@oss.nttdata.co.jp>
> > wrote:
> >
> > > It seems to be caused by an obsolete instruction in the HowToRelease Wiki?
> > >
> > > After HADOOP-15058, `mvn deploy` is kicked by
> > > `dev-support/bin/create-release --asfrelease`.
> > > https://issues.apache.org/jira/browse/HADOOP-15058
> > >
> > > Step #10 in "Creating the release candidate (X.Y.Z-RC)" section
> > > of the Wiki still instructs to run `mvn deploy` with `-DskipShade`.
> > >
> > > 2 sets of artifacts are deployed after creating the RC based on the
> instruction.
> > > The latest one contains empty shaded jars.
> > >
> > > hadoop-client-api and hadoop-client-runtime of the already released 3.2.3
> > > look to have the same issue...
> > >
> > > Masatake Iwasaki
> > >
> > > On 2022/05/08 6:45, Akira Ajisaka wrote:
> > > > Hi Chao,
> > > >
> > > > How about using
> > > >
> https://repository.apache.org/content/repositories/orgapachehadoop-1348/
> > > > instead of
> https://repository.apache.org/content/repositories/staging/ ?
> > > >
> > > > Akira
> > > >
> > > > On Sat, May 7, 2022 at 10:52 AM Ayush Saxena 
> wrote:
> > > >
> > > >> Hmm, I see the artifacts ideally should have got overwritten by the
> new
> > > >> RC, but they didn’t. The reason seems like the staging path shared
> > > doesn’t
> > > >> have any jars…
> > > >> That is why it was picking the old jars. I think Steve needs to run
> mvn
> > > >> deploy again…
> > > >>
> > > >> Sent from my iPhone
> > > >>
> > > >>> On 07-May-2022, at 7:12 AM, Chao Sun  wrote:
> > > >>>
> > > >>> 
> > > >>>>
> > > >>>> Chao can you use the one that Steve mentioned in the mail?
> > > >>>
> > > >>> Hmm how do I do that? Typically after closing the RC in nexus the
> > > >>> release bits will show up in
> > > >>>
> > > >>
> > >
> https://repository.apache.org/content/repositories/staging/org/apache/hadoop
> > > >>> and Spark build will be able to pick them up for testing. However
> in
> > > >>> this case I don't see any 3.3.3 jars in the URL.
> > > >>>
> > > >>>> On Fri, May 6, 2022 at 6:24 PM Ayush Saxena 
> > > wrote:
> > > >>>>
> > > >>>> There were two 3.3.3 staged. The earlier one was with skipShade,
> the
> > > >> date was also april 22, I archived that. Chao can you use the one
> that
> > > >> Steve mentioned in the mail?
> > > >>>>
> > > >>>>> On Sat, 7 May 2022 at 06:18, Chao Sun 
> wrote:
> > > >>>>>
> > > >>

Re: [VOTE] Release Apache Hadoop 3.3.3

2022-05-09 Thread Steve Loughran
I didn't do that as the docker image was doing it itself...I discussed this
with Akira and Ayush & they agreed. So whatever went wrong, it was
something else.

I have been building a list of things I'd like to change there; cutting
that line was one of them. but I need to work out the correct workflow.

trying again, and creating a stub module to verify the client is in staging
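To check whether a given artifact actually made it into a staging repository, one can compute its URL from the standard Maven repository layout and probe it. A Python sketch follows; `artifact_url` and `exists` are hypothetical helpers, and treating any 2xx reply to an HTTP HEAD request as "present" is an assumption about how the Nexus staging server responds.

```python
import urllib.request
from typing import Optional


def artifact_url(repo_url: str, group_id: str, artifact_id: str, version: str,
                 ext: str = "jar", classifier: Optional[str] = None) -> str:
    """Build an artifact URL using the standard Maven repository layout."""
    suffix = f"-{classifier}" if classifier else ""
    return "/".join([
        repo_url.rstrip("/"),
        group_id.replace(".", "/"),          # org.apache.hadoop -> org/apache/hadoop
        artifact_id,
        version,
        f"{artifact_id}-{version}{suffix}.{ext}",
    ])


def exists(url: str, timeout: float = 30.0) -> bool:
    """Send an HTTP HEAD request; True if the server answers with a 2xx status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:  # URLError and HTTPError are both OSError subclasses
        return False
```

For example, `exists(artifact_url("https://repository.apache.org/content/repositories/orgapachehadoop-1349/", "org.apache.hadoop", "hadoop-client-api", "3.3.3"))` would probe the RC1 staging repository. Presence alone does not prove the jar contains any classes.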

On Mon, 9 May 2022 at 15:19, Masatake Iwasaki 
wrote:

> It seems to be caused by an obsolete instruction in the HowToRelease Wiki?
>
> After HADOOP-15058, `mvn deploy` is kicked by
> `dev-support/bin/create-release --asfrelease`.
> https://issues.apache.org/jira/browse/HADOOP-15058
>
> Step #10 in "Creating the release candidate (X.Y.Z-RC)" section
> of the Wiki still instructs to run `mvn deploy` with `-DskipShade`.
>
> 2 sets of artifacts are deployed after creating the RC based on the instruction.
> The latest one contains empty shaded jars.
>
> hadoop-client-api and hadoop-client-runtime of the already released 3.2.3
> look to have the same issue...
>
> Masatake Iwasaki
>
> On 2022/05/08 6:45, Akira Ajisaka wrote:
> > Hi Chao,
> >
> > How about using
> > https://repository.apache.org/content/repositories/orgapachehadoop-1348/
> > instead of https://repository.apache.org/content/repositories/staging/ ?
> >
> > Akira
> >
> > On Sat, May 7, 2022 at 10:52 AM Ayush Saxena  wrote:
> >
> >> Hmm, I see the artifacts ideally should have got overwritten by the new
> >> RC, but they didn’t. The reason seems like the staging path shared
> doesn’t
> >> have any jars…
> >> That is why it was picking the old jars. I think Steve needs to run mvn
> >> deploy again…
> >>
> >> Sent from my iPhone
> >>
> >>> On 07-May-2022, at 7:12 AM, Chao Sun  wrote:
> >>>
> >>> 
> >>>>
> >>>> Chao can you use the one that Steve mentioned in the mail?
> >>>
> >>> Hmm how do I do that? Typically after closing the RC in nexus the
> >>> release bits will show up in
> >>>
> >>
> https://repository.apache.org/content/repositories/staging/org/apache/hadoop
> >>> and Spark build will be able to pick them up for testing. However in
> >>> this case I don't see any 3.3.3 jars in the URL.
> >>>
> >>>> On Fri, May 6, 2022 at 6:24 PM Ayush Saxena 
> wrote:
> >>>>
> >>>> There were two 3.3.3 staged. The earlier one was with skipShade, the
> >> date was also april 22, I archived that. Chao can you use the one that
> >> Steve mentioned in the mail?
> >>>>
> >>>>> On Sat, 7 May 2022 at 06:18, Chao Sun  wrote:
> >>>>>
> >>>>> Seems there are some issues with the shaded client as I was not able
> >>>>> to compile Apache Spark with the RC
> >>>>> (https://github.com/apache/spark/pull/36474). Looks like it's
> compiled
> >>>>> with the `-DskipShade` option and the hadoop-client-api JAR doesn't
> >>>>> contain any class:
> >>>>>
> >>>>> ➜  hadoop-client-api jar tf 3.3.3/hadoop-client-api-3.3.3.jar
> >>>>> META-INF/
> >>>>> META-INF/MANIFEST.MF
> >>>>> META-INF/NOTICE.txt
> >>>>> META-INF/LICENSE.txt
> >>>>> META-INF/maven/
> >>>>> META-INF/maven/org.apache.hadoop/
> >>>>> META-INF/maven/org.apache.hadoop/hadoop-client-api/
> >>>>> META-INF/maven/org.apache.hadoop/hadoop-client-api/pom.xml
> >>>>> META-INF/maven/org.apache.hadoop/hadoop-client-api/pom.properties
> >>>>>
> >>>>> On Fri, May 6, 2022 at 4:24 PM Stack  wrote:
> >>>>>>
> >>>>>> +1 (binding)
> >>>>>>
> >>>>>>   * Signature: ok
> >>>>>>   * Checksum : passed
> >>>>>>   * Rat check (1.8.0_191): passed
> >>>>>>- mvn clean apache-rat:check
> >>>>>>   * Built from source (1.8.0_191): failed
> >>>>>>- mvn clean install  -DskipTests
> >>>>>>- mvn -fae --no-transfer-progress -DskipTests
> >> -Dmaven.javadoc.skip=true
> >>>>>> -Pnative -Drequire.openssl -Drequire.snappy -Drequire.valgrind
> >>>>>> -Drequire.zstd -Drequire.test.libhadoop clean install
> >>>>>>   * Unit tests pass (1.8.0_191):
> >>>>>> - HDFS Tests passed (Didn't run more than 

Re: [VOTE] Release Apache Hadoop 3.3.3

2022-05-08 Thread Steve Loughran
> > >>> META-INF/
> > >>> META-INF/MANIFEST.MF
> > >>> META-INF/NOTICE.txt
> > >>> META-INF/LICENSE.txt
> > >>> META-INF/maven/
> > >>> META-INF/maven/org.apache.hadoop/
> > >>> META-INF/maven/org.apache.hadoop/hadoop-client-api/
> > >>> META-INF/maven/org.apache.hadoop/hadoop-client-api/pom.xml
> > >>> META-INF/maven/org.apache.hadoop/hadoop-client-api/pom.properties
> > >>>
> > >>> On Fri, May 6, 2022 at 4:24 PM Stack  wrote:
> > >>>>
> > >>>> +1 (binding)
> > >>>>
> > >>>>  * Signature: ok
> > >>>>  * Checksum : passed
> > >>>>  * Rat check (1.8.0_191): passed
> > >>>>   - mvn clean apache-rat:check
> > >>>>  * Built from source (1.8.0_191): failed
> > >>>>   - mvn clean install  -DskipTests
> > >>>>   - mvn -fae --no-transfer-progress -DskipTests
> > -Dmaven.javadoc.skip=true
> > >>>> -Pnative -Drequire.openssl -Drequire.snappy -Drequire.valgrind
> > >>>> -Drequire.zstd -Drequire.test.libhadoop clean install
> > >>>>  * Unit tests pass (1.8.0_191):
> > >>>>- HDFS Tests passed (Didn't run more than this).
> > >>>>
> > >>>> Deployed a ten node ha hdfs cluster with three namenodes and five
> > >>>> journalnodes. Ran a ten node hbase (older version of 2.5 branch built
> > >>>> against 3.3.2) against it. Tried a small verification job. Good. Ran a
> > >>>> bigger job with mild chaos. All seems to be working properly
> > >>>> (recoveries, logs look fine). Killed a namenode. Failover worked
> > >>>> promptly. UIs look good. Poked at the hdfs cli. Seems good.
> > >>>>
> > >>>> S
> > >>>>
> > >>>> On Tue, May 3, 2022 at 4:24 AM Steve Loughran wrote:
> > >>>>
> > >>>>> I have put together a release candidate (rc0) for Hadoop 3.3.3
> > >>>>>
> > >>>>> The RC is available at:
> > >>>>> https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC0/
> > >>>>>
> > >>>>> The git tag is release-3.3.3-RC0, commit d37586cbda3
> > >>>>>
> > >>>>> The maven artifacts are staged at
> > >>>>> https://repository.apache.org/content/repositories/orgapachehadoop-1348/
> > >>>>>
> > >>>>> You can find my public key at:
> > >>>>> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> > >>>>>
> > >>>>> Change log
> > >>>>> https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC0/CHANGELOG.md
> > >>>>>
> > >>>>> Release notes
> > >>>>> https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC0/RELEASENOTES.md
> > >>>>>
> > >>>>> There is a very small number of changes, primarily critical
> > >>>>> code/packaging issues and security fixes.
> > >>>>>
> > >>>>>
> > >>>>>   - The critical fixes which shipped in the 3.2.3 release.
> > >>>>>   - CVEs in our code and dependencies.
> > >>>>>   - Shaded client packaging issues.
> > >>>>>   - A switch from log4j to reload4j.
> > >>>>>
> > >>>>>
> > >>>>> reload4j is an active fork of the log4j 1.2.17 library with the
> > >>>>> classes which contain CVEs removed. Even though Hadoop never used
> > >>>>> those classes, they regularly raised alerts on security scans and
> > >>>>> concern from users. Switching to the forked project allows us to ship
> > >>>>> a secure logging framework. It will complicate the builds of
> > >>>>> downstream maven/ivy/gradle projects which exclude our log4j
> > >>>>> artifacts, as they need to exclude the new dependency instead of, or
> > >>>>> as well as, log4j.
> > >>>>>
> > >>>>> See the release notes for details.
> > >>>>>
> > >>>>> This is my first release through the new Docker build process, so
> > >>>>> please validate artifact signing to make sure it is good. I'll be
> > >>>>> trying builds of downstream projects.
> > >>>>>
> > >>>>> We know there are some outstanding issues with at least one library
> > >>>>> we are shipping (okhttp), but I don't want to hold this release up
> > >>>>> for it. If the Docker-based release process works smoothly enough we
> > >>>>> can do a follow-up security release in a few weeks.
> > >>>>>
> > >>>>> Please try the release and vote. The vote will run for 5 days.
> > >>>>>
> > >>>>> -Steve
> > >>>>>
> > >>>
> >
> >
>


[VOTE] Release Apache Hadoop 3.3.3

2022-05-03 Thread Steve Loughran
I have put together a release candidate (rc0) for Hadoop 3.3.3

The RC is available at:
https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC0/

The git tag is release-3.3.3-RC0, commit d37586cbda3

The maven artifacts are staged at
https://repository.apache.org/content/repositories/orgapachehadoop-1348/

You can find my public key at:
https://dist.apache.org/repos/dist/release/hadoop/common/KEYS

Change log
https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC0/CHANGELOG.md

Release notes
https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC0/RELEASENOTES.md

There is a very small number of changes, primarily critical code/packaging
issues and security fixes.


   - The critical fixes which shipped in the 3.2.3 release.
   - CVEs in our code and dependencies.
   - Shaded client packaging issues.
   - A switch from log4j to reload4j.


reload4j is an active fork of the log4j 1.2.17 library with the classes which
contain CVEs removed. Even though Hadoop never used those classes, they
regularly raised alerts on security scans and concern from users. Switching
to the forked project allows us to ship a secure logging framework. It will
complicate the builds of downstream maven/ivy/gradle projects which exclude
our log4j artifacts, as they need to exclude the new dependency instead of,
or as well as, log4j.

See the release notes for details.

This is my first release through the new Docker build process, so please
validate artifact signing to make sure it is good. I'll be trying builds
of downstream projects.

We know there are some outstanding issues with at least one library we are
shipping (okhttp), but I don't want to hold this release up for it. If the
Docker-based release process works smoothly enough we can do a follow-up
security release in a few weeks.

Please try the release and vote. The vote will run for 5 days.

-Steve


MAPREDUCE-7369: including heartbeats from workers as proof of liveness

2022-04-29 Thread Steve Loughran
Could someone competent in the MR codebase review the PR for

https://issues.apache.org/jira/browse/MAPREDUCE-7369

This is to address the problem "MapReduce tasks timing out when spends more
time on MultipleOutputs#close".

With object stores, close() calls can include a longer upload/close
overhead, so it is probably more of an issue there.

thanks,

steve
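One way to picture the approach, as a hypothetical sketch rather than the MAPREDUCE-7369 patch: while a slow close() runs, a background thread keeps signalling liveness at a fixed interval so the task is not declared dead. The `Heartbeat` class, its `withHeartbeat` method and the 50 ms interval are illustrative assumptions; in a real task the `Runnable` would be something like a call to `Progressable.progress()`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: keeps reporting liveness while a slow operation runs. */
final class Heartbeat {
  private Heartbeat() {}

  /**
   * Runs slowOp on the calling thread while a daemon thread invokes
   * progress every intervalMillis; heartbeats stop when slowOp returns or throws.
   */
  static <T> T withHeartbeat(Callable<T> slowOp, Runnable progress,
      long intervalMillis) throws Exception {
    ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
          Thread t = new Thread(r, "heartbeat");
          t.setDaemon(true); // never keep the JVM alive just for heartbeats
          return t;
        });
    ScheduledFuture<?> beats = scheduler.scheduleAtFixedRate(
        progress, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    try {
      return slowOp.call();
    } finally {
      beats.cancel(false);
      scheduler.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    AtomicInteger count = new AtomicInteger();
    // Simulate a close() that blocks for ~300 ms, e.g. finishing an upload.
    String result = withHeartbeat(() -> {
      Thread.sleep(300);
      return "closed";
    }, count::incrementAndGet, 50);
    System.out.println(result + " after " + count.get() + " heartbeats");
  }
}
```

The heartbeat runs on its own daemon thread so it keeps firing even while the calling thread is blocked inside close().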


[jira] [Resolved] (MAPREDUCE-7373) Building MapReduce NativeTask fails on Fedora 34+

2022-04-18 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7373.
---
Fix Version/s: 3.3.3
   (was: 3.4.0)
   (was: 3.3.4)
   Resolution: Fixed

Fixed in 3.3.3; updating fix versions as appropriate.

> Building MapReduce NativeTask fails on Fedora 34+
> -
>
> Key: MAPREDUCE-7373
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7373
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: build, nativetask
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.4, 3.3.3
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Fedora 34 adopted GCC 11, in which C++17 features are enabled by default.
> https://gcc.gnu.org/projects/cxx-status.html#cxx17
> Building MapReduce NativeTask with it leads to the following error.
> (I found it on branch-3.2, but it's supposed to be the same as trunk)
> {code}
> $ mvn package -Pdist,native -DskipTests -Dtar -Dmaven.javadoc.skip=true
> ...
> [WARNING] In file included from 
> /home/vagrant/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/MapOutputCollector.h:30,
> [WARNING]  from 
> /home/vagrant/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/handler/MCollectorOutputHandler.cc:24:
> [WARNING] 
> /home/vagrant/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/PartitionBucket.h:127:36:
>  error: ISO C++17 does not allow dynamic exception specifications
> [WARNING]   127 |   void spill(IFileWriter * writer) throw (IOException, 
> UnsupportException);
> [WARNING]   |^
> [WARNING] make[2]: *** [CMakeFiles/nativetask_static.dir/build.make:160: 
> CMakeFiles/nativetask_static.dir/main/native/src/handler/MCollectorOutputHandler.cc.o]
>  Error 1
> [WARNING] make[1]: *** [CMakeFiles/Makefile2:115: 
> CMakeFiles/nativetask_static.dir/all] Error 2
> [WARNING] make: *** [Makefile:91: all] Error 2
> ...
> [INFO] Apache Hadoop MapReduce HistoryServer Plugins .. SUCCESS [  0.570 s]
> [INFO] Apache Hadoop MapReduce NativeTask . FAILURE [ 11.016 s]
> [INFO] Apache Hadoop MapReduce Uploader ... SKIPPED
> [INFO] Apache Hadoop MapReduce Examples ... SKIPPED
> [INFO] Apache Hadoop MapReduce  SKIPPED
> [INFO] Apache Hadoop MapReduce Streaming .. SKIPPED
> [INFO] Apache Hadoop Distributed Copy . SKIPPED
> [INFO] Apache Hadoop Archives . SKIPPED
> [INFO] Apache Hadoop Archive Logs . SKIPPED
> [INFO] Apache Hadoop Rumen  SKIPPED
> [INFO] Apache Hadoop Gridmix .. SKIPPED
> [INFO] Apache Hadoop Data Join  SKIPPED
> [INFO] Apache Hadoop Extras ... SKIPPED
> [INFO] Apache Hadoop Pipes  SKIPPED
> [INFO] Apache Hadoop OpenStack support  SKIPPED
> [INFO] Apache Hadoop Amazon Web Services support .. SKIPPED
> [INFO] Apache Hadoop Kafka Library support  SKIPPED
> [INFO] Apache Hadoop Azure support  SKIPPED
> [INFO] Apache Hadoop Aliyun OSS support ... SKIPPED
> [INFO] Apache Hadoop Client Aggregator  SKIPPED
> [INFO] Apache Hadoop Scheduler Load Simulator . SKIPPED
> [INFO] Apache Hadoop Resource Estimator Service ... SKIPPED
> [INFO] Apache Hadoop Azure Data Lake support .. SKIPPED
> [INFO] Apache Hadoop Tools Dist ... SKIPPED
> [INFO] Apache Hadoop Tools  SKIPPED
> [INFO] Apache Hadoop Client API ... SKIPPED
> [INFO] Apache Hadoop Client Runtime ... SKIPPED
> [INFO] Apache Hadoop Client Packaging Invariants .. SKIPPED
> [INFO] Apache Hadoop Client Test Minicluster .. SKIPPED
> [INFO] Apache Hadoop Client Packaging Invariants for Test . SKIPPED
> [INFO] Apache Hadoop Client Packaging Integration Tests ... SKIPPED
> [INFO] Apache Hadoop Distribution . 

[ANNOUNCE] branch-3.3.3 created off 3.3.2; branch-3.3 hadoop.version is now 3.3.4-SNAPSHOT

2022-04-12 Thread Steve Loughran
There is now a branch-3.3.3 in the repo; this is forked off 3.3.2, *not*
branch-3.3.

branch-3.3 now has its hadoop version set to 3.3.4.

I've renamed the release version 3.3.3 in JIRA to 3.3.4, so all
fixed/target/affects references have been automatically updated. Text
references in JIRAs and PRs have not. If you have JIRAs with 3.3. in the
title, now is the time to change them.

There is now a new 3.3.3 version in JIRA for all fixes only targeted at
this release, and for future bug reports.

The list of JIRAs to go in is under
https://issues.apache.org/jira/browse/HADOOP-18198 ; I've re-opened them
all to track the merge

-Steve


[jira] [Reopened] (MAPREDUCE-7373) Building MapReduce NativeTask fails on Fedora 34+

2022-04-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reopened MAPREDUCE-7373:
---

> Building MapReduce NativeTask fails on Fedora 34+
> -
>
> Key: MAPREDUCE-7373
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7373
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: build, nativetask
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Fedora 34 adopted GCC 11, in which C++17 features are enabled by default.
> https://gcc.gnu.org/projects/cxx-status.html#cxx17
> Building MapReduce NativeTask with it leads to the following error.
> (I found it on branch-3.2, but it's supposed to be the same as trunk)
> {code}
> $ mvn package -Pdist,native -DskipTests -Dtar -Dmaven.javadoc.skip=true
> ...
> [WARNING] In file included from 
> /home/vagrant/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/MapOutputCollector.h:30,
> [WARNING]  from 
> /home/vagrant/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/handler/MCollectorOutputHandler.cc:24:
> [WARNING] 
> /home/vagrant/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/PartitionBucket.h:127:36:
>  error: ISO C++17 does not allow dynamic exception specifications
> [WARNING]   127 |   void spill(IFileWriter * writer) throw (IOException, 
> UnsupportException);
> [WARNING]   |^
> [WARNING] make[2]: *** [CMakeFiles/nativetask_static.dir/build.make:160: 
> CMakeFiles/nativetask_static.dir/main/native/src/handler/MCollectorOutputHandler.cc.o]
>  Error 1
> [WARNING] make[1]: *** [CMakeFiles/Makefile2:115: 
> CMakeFiles/nativetask_static.dir/all] Error 2
> [WARNING] make: *** [Makefile:91: all] Error 2
> ...
> [INFO] Apache Hadoop MapReduce HistoryServer Plugins .. SUCCESS [  0.570 s]
> [INFO] Apache Hadoop MapReduce NativeTask . FAILURE [ 11.016 s]
> [INFO] Apache Hadoop MapReduce Uploader ... SKIPPED
> [INFO] Apache Hadoop MapReduce Examples ... SKIPPED
> [INFO] Apache Hadoop MapReduce  SKIPPED
> [INFO] Apache Hadoop MapReduce Streaming .. SKIPPED
> [INFO] Apache Hadoop Distributed Copy . SKIPPED
> [INFO] Apache Hadoop Archives . SKIPPED
> [INFO] Apache Hadoop Archive Logs . SKIPPED
> [INFO] Apache Hadoop Rumen  SKIPPED
> [INFO] Apache Hadoop Gridmix .. SKIPPED
> [INFO] Apache Hadoop Data Join  SKIPPED
> [INFO] Apache Hadoop Extras ... SKIPPED
> [INFO] Apache Hadoop Pipes  SKIPPED
> [INFO] Apache Hadoop OpenStack support  SKIPPED
> [INFO] Apache Hadoop Amazon Web Services support .. SKIPPED
> [INFO] Apache Hadoop Kafka Library support  SKIPPED
> [INFO] Apache Hadoop Azure support  SKIPPED
> [INFO] Apache Hadoop Aliyun OSS support ... SKIPPED
> [INFO] Apache Hadoop Client Aggregator  SKIPPED
> [INFO] Apache Hadoop Scheduler Load Simulator . SKIPPED
> [INFO] Apache Hadoop Resource Estimator Service ... SKIPPED
> [INFO] Apache Hadoop Azure Data Lake support .. SKIPPED
> [INFO] Apache Hadoop Tools Dist ... SKIPPED
> [INFO] Apache Hadoop Tools  SKIPPED
> [INFO] Apache Hadoop Client API ... SKIPPED
> [INFO] Apache Hadoop Client Runtime ... SKIPPED
> [INFO] Apache Hadoop Client Packaging Invariants .. SKIPPED
> [INFO] Apache Hadoop Client Test Minicluster .. SKIPPED
> [INFO] Apache Hadoop Client Packaging Invariants for Test . SKIPPED
> [INFO] Apache Hadoop Client Packaging Integration Tests ... SKIPPED
> [INFO] Apache Hadoop Distribution . SKIPPED
> [INFO] Apache Hadoop Client Modules ... SKIPPED
> [INFO] Apache Hadoop Cloud Storage  SKIPPED
> [INFO] Apache Hadoop Cloud Storage Project  SKIPPE

Re: HADOOP-18198. Release Hadoop 3.3.3: hadoop-3.3.2 with CVE fixes

2022-04-12 Thread Steve Loughran
I should add that the CVEs in question are minor, unless you are running
Hadoop on Windows. Given you have to compile the native binaries yourself
for that, it is not something we know anyone actually does in production.

The reload4j switch means that we can get the log4j vulnerabilities out of
the classpath; they were never reachable from the Hadoop code, but audit
tools would flag them up.

I'd also like to update our shaded protobuf library.



On Mon, 11 Apr 2022 at 14:54, Steve Loughran  wrote:

>
> I've just created a new JIRA and assigned to myself: HADOOP-18198. Release
> Hadoop 3.3.3: hadoop-3.3.2 with CVE fixes
>
>
> https://issues.apache.org/jira/browse/HADOOP-18198
>
> --
>
> Hadoop 3.3.3 is a minor followup release to Hadoop 3.3.2 with
>
> * CVE fixes in Hadoop source
> * CVE fixes in dependencies
> * replacement of log4j 1.2.17 with reload4j
> * some changes which shipped in hadoop 3.2.3 for consistency
>
> --
>
>
> This is not a release off branch-3.3, it is a fork of 3.3.2 with the
> changes.
>
> The next release of branch-3.3 will be numbered hadoop-3.3.4; updating
> maven versions and JIRA fix versions is part of this release process.
>
> To get these fixes out fast and avoid any regressions, *I'm not putting
> anything else in other than the fixes which shipped in 3.2.4*
>
> For all non-CVE related fixes, consult this process:
> https://scarfolk.blogspot.com/2015/08/no-1973-1975.html
>
> I will try and do some ARM binaries too, but I'm not going to make a
> commitment. My laptop now has an ARM CPU, so in fact cutting this release
> involves me actually building it on a different machine; my previous
> laptop, or, if that doesn't work out, some remote server.
>
> as usual, any help testing would be wonderful.
>
> After this, I would like to start planning that 3.3.4 feature release. I
> think I will nominate myself as the release engineer there, with help from
> colleagues, especially Mehakmeet and Mukund.
>
> -Steve
>


HADOOP-18198. Release Hadoop 3.3.3: hadoop-3.3.2 with CVE fixes

2022-04-11 Thread Steve Loughran
I've just created a new JIRA and assigned to myself: HADOOP-18198. Release
Hadoop 3.3.3: hadoop-3.3.2 with CVE fixes


https://issues.apache.org/jira/browse/HADOOP-18198

--

Hadoop 3.3.3 is a minor followup release to Hadoop 3.3.2 with

* CVE fixes in Hadoop source
* CVE fixes in dependencies
* replacement of log4j 1.2.17 with reload4j
* some changes which shipped in hadoop 3.2.3 for consistency

--


This is not a release off branch-3.3, it is a fork of 3.3.2 with the
changes.

The next release of branch-3.3 will be numbered hadoop-3.3.4; updating
maven versions and JIRA fix versions is part of this release process.

To get these fixes out fast and avoid any regressions, *I'm not putting
anything else in other than the fixes which shipped in 3.2.4*

For all non-CVE related fixes, consult this process:
https://scarfolk.blogspot.com/2015/08/no-1973-1975.html

I will try and do some ARM binaries too, but I'm not going to make a
commitment. My laptop now has an ARM CPU, so cutting this release
actually involves building it on a different machine: my previous
laptop or, if that doesn't work out, some remote server.

As usual, any help testing would be wonderful.

After this, I would like to start planning that 3.3.4 feature release. I
think I will nominate myself as the release engineer there, with help from
colleagues, especially Mehakmeet and Mukund.

-Steve


[jira] [Resolved] (MAPREDUCE-7341) Add a task-manifest output committer for Azure and GCS

2022-03-17 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7341.
---
Fix Version/s: 3.3.3
   Resolution: Fixed

> Add a task-manifest output committer for Azure and GCS
> --
>
> Key: MAPREDUCE-7341
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7341
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: client
>Affects Versions: 3.3.1
>    Reporter: Steve Loughran
>    Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.3
>
>  Time Spent: 39.5h
>  Remaining Estimate: 0h
>
> Add a task-manifest output committer for Azure and GCS
> The S3A committers are very popular in Spark on S3, as they are both correct
> and fast.
> The classic FileOutputCommitter v1 and v2 algorithms are all that is
> available for Azure ABFS and Google GCS, and they have limitations.
> The v2 algorithm isn't safe in the presence of failed task attempt commits,
> so we recommend the v1 algorithm for Azure. But that is slow because it
> sequentially lists then renames files and directories, one-by-one. The
> latencies of list and rename make things slow.
> Google GCS lacks the atomic directory rename required for v1 correctness;
> v2 can be used (which doesn't have the job commit performance limitations),
> but it's not safe.
> Proposed:
> * Add a new FileOutputFormat committer which uses an intermediate manifest to
>   pass the list of files created by a task attempt (TA) to the job committer.
> * Job committer to parallelise reading these task manifests and submit all the
>   rename operations into a pool of worker threads (also: mkdir, directory
>   deletions on cleanup).
> * Use the committer plugin mechanism added for S3A to make this the default
>   committer for ABFS (i.e. no need to make any changes to FileOutputCommitter).
> * Add lots of IOStatistics instrumentation + logging of operations in the
>   job commit for visibility of where delays are occurring.
> * Reuse the S3A committer _SUCCESS JSON structure to publish IOStats & other
>   data for testing/support.
> This committer will be faster than the v1 algorithm because of the
> parallelisation, and because a manifest written by create-and-rename will be
> exclusive to a single task attempt, it delivers the isolation which the v2
> committer lacks.
> This is not an attempt to do an Iceberg/Hudi/Delta Lake style manifest-only
> format for describing the contents of a table; the final output is still a
> directory tree which must be scanned during query planning.
> As such the format is still suboptimal for cloud storage, but at least we
> will have faster job execution during the commit phases.
>
> Note: this will also work on HDFS, where again, it should be faster than
> the v1 committer. However the target is very much Spark with ABFS and GCS; no
> plans to worry about MR as that simplifies the challenge of dealing with job
> restart (i.e. you don't have to).
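A much-simplified sketch of the two core ideas in the proposal, using `java.nio.file` on a local filesystem instead of Hadoop's `FileSystem` API: a task attempt saves its manifest by writing a temporary file and renaming it into place, and the job committer loads every published manifest and submits the renames to a thread pool. The one-line-per-file `source<TAB>dest` manifest format and all class and method names are assumptions for illustration; the real committer's manifest format and feature set (directory preparation, cleanup, IOStatistics) differ. `ATOMIC_MOVE` assumes a filesystem that supports atomic rename.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical, much-simplified manifest committer flow on a local filesystem. */
final class ManifestCommitSketch {
  private ManifestCommitSketch() {}

  /** Task commit: write "source<TAB>dest" lines to a temp file, then rename it into place. */
  static void saveManifest(Path manifestDir, String taskAttemptId,
      List<String[]> renames) throws IOException {
    List<String> lines = new ArrayList<>();
    for (String[] r : renames) {
      lines.add(r[0] + "\t" + r[1]);
    }
    Path tmp = manifestDir.resolve(taskAttemptId + ".tmp");
    Files.write(tmp, lines, StandardCharsets.UTF_8);
    // Publishing by rename means a manifest is either complete or absent.
    Files.move(tmp, manifestDir.resolve(taskAttemptId + ".manifest"),
        StandardCopyOption.ATOMIC_MOVE);
  }

  /** Job commit: load every published manifest, then run all renames in a pool. */
  static int commitJob(Path manifestDir, int threads) throws Exception {
    List<String[]> renames = new ArrayList<>();
    // Only *.manifest files are read; unpublished *.tmp files are ignored.
    try (DirectoryStream<Path> manifests =
        Files.newDirectoryStream(manifestDir, "*.manifest")) {
      for (Path m : manifests) {
        for (String line : Files.readAllLines(m, StandardCharsets.UTF_8)) {
          if (!line.isEmpty()) {
            renames.add(line.split("\t", 2));
          }
        }
      }
    }
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (String[] r : renames) {
        pending.add(pool.submit(() -> {
          Path dest = Paths.get(r[1]);
          Files.createDirectories(dest.getParent()); // no error if it already exists
          Files.move(Paths.get(r[0]), dest);
          return null;
        }));
      }
      for (Future<?> f : pending) {
        f.get(); // rethrows the first rename failure, if any
      }
    } finally {
      pool.shutdown();
    }
    return renames.size();
  }

  public static void main(String[] args) throws Exception {
    Path base = Files.createTempDirectory("job");
    Path manifestDir = Files.createDirectories(base.resolve("manifests"));
    Path src = base.resolve("attempt_0-part-0000");
    Files.write(src, "data".getBytes(StandardCharsets.UTF_8));
    List<String[]> renames = new ArrayList<>();
    renames.add(new String[] {src.toString(), base.resolve("out/part-0000").toString()});
    saveManifest(manifestDir, "attempt_0", renames);
    System.out.println("renamed " + commitJob(manifestDir, 4) + " file(s)");
  }
}
```

Because the job committer only reads published `*.manifest` files, a failed or duplicate task attempt that never completed its rename contributes nothing, which is the isolation property the description contrasts with v2.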



--
This message was sent by Atlassian Jira
(v8.20.1#820001)




Re: [VOTE] Release Apache Hadoop 3.3.2 - RC3

2022-01-29 Thread Steve Loughran
Maybe, even before moving the catalog away, we could make it optional.

On Fri, 28 Jan 2022 at 08:24, Akira Ajisaka  wrote:

> Thank you Masatake and Chao!
>
> On Fri, Jan 28, 2022 at 5:11 PM Chao Sun  wrote:
>
> > Thanks Masatake and Akira for discovering the issue. I used
> > `dev-support/bin/create-release` which runs `mvn deploy -DskipTests
> > -Pnative -Pdist ...` in a separate container and somehow it didn't hit
> > this issue.
> >
> > Let me cherry-pick https://issues.apache.org/jira/browse/YARN-10561 to
> > branch-3.3.2 and start another RC then.
> >
> > Thanks,
> > Chao
> >
> > On Fri, Jan 28, 2022 at 12:01 AM Masatake Iwasaki <iwasak...@oss.nttdata.co.jp> wrote:
> >
> >> Thanks, Akira.
> >>
> >> I confirmed that the issue is fixed in current branch-3.3 containing
> >> YARN-10561.
> >>
> >> On 2022/01/28 14:25, Akira Ajisaka wrote:
> >> > Hi Masatake,
> >> >
> >> > I faced the same error in a clean environment and
> >> > https://issues.apache.org/jira/browse/YARN-10561 should fix this issue.
> >> > I'll rebase the patch shortly.
> >> >
> >> > By the way, I'm afraid there is no active maintainer in
> >> > hadoop-yarn-applications-catalog module. The module is for a sample
> >> > application catalog, so I think we can move the module to a separate
> >> > repository. Of course, it should be discussed separately.
> >> >
> >> > Thanks and regards,
> >> > Akira
> >> >
> >> > On Fri, Jan 28, 2022 at 1:39 PM Masatake Iwasaki <iwasak...@oss.nttdata.co.jp> wrote:
> >> >
> >> > Thanks for putting this up, Chao Sun.
> >> >
> >> > I got following error on building the RC3 source tarball.
> >> > It is reproducible even in the container launched by `./start-build-env.sh`.
> >> > There seems to be no relevant diff between release-3.3.2-RC0 and
> >> > release-3.3.2-RC3 (and trunk) under hadoop-yarn-applications-catalog-webapp.
> >> >
> >> > I guess developers having caches of related artifacts under ~/.m2 did not see this?
> >> >
> >> > ```
> >> > $ mvn clean install -DskipTests -Pnative -Pdist
> >> > ...
> >> > [INFO] Installing node version v8.11.3
> >> > [INFO] Downloading
> >> https://nodejs.org/dist/v8.11.3/node-v8.11.3-linux-x64.tar.gz <
> >> https://nodejs.org/dist/v8.11.3/node-v8.11.3-linux-x64.tar.gz> to
> >>
> /home/centos/.m2/repository/com/github/eirslett/node/8.11.3/node-8.11.3-linux-x64.tar.gz
> >> > [INFO] No proxies configured
> >> > [INFO] No proxy was configured, downloading directly
> >> > [INFO] Unpacking
> >>
> /home/centos/.m2/repository/com/github/eirslett/node/8.11.3/node-8.11.3-linux-x64.tar.gz
> >> into
> >>
> /home/centos/srcs/hadoop-3.3.2-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-webapp/target/node/tmp
> >> > [INFO] Copying node binary from
> >>
> /home/centos/srcs/hadoop-3.3.2-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-webapp/target/node/tmp/node-v8.11.3-linux-x64/bin/node
> >> to
> >>
> /home/centos/srcs/hadoop-3.3.2-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-webapp/target/node/node
> >> > [INFO] Installed node locally.
> >> > [INFO] Installing Yarn version v1.7.0
> >> > [INFO] Downloading
> >>
> https://github.com/yarnpkg/yarn/releases/download/v1.7.0/yarn-v1.7.0.tar.gz
> >> <
> >>
> https://github.com/yarnpkg/yarn/releases/download/v1.7.0/yarn-v1.7.0.tar.gz
> >
> >> to
> >>
> /home/centos/.m2/repository/com/github/eirslett/yarn/1.7.0/yarn-1.7.0.tar.gz
> >> > [INFO] No proxies configured
> >> > [INFO] No proxy was configured, downloading directly
> >> > [INFO] Unpacking
> >>
> /home/centos/.m2/repository/com/github/eirslett/yarn/1.7.0/yarn-1.7.0.tar.gz
> >> into
> >>
> /home/centos/srcs/hadoop-3.3.2-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-webapp/target/node/yarn
> >> > [INFO] Installed Yarn locally.
> >> > [INFO]
> >> > [INFO] --- frontend-maven-plugin:1.11.2:yarn (yarn install) @
> >> hadoop-yarn-applications-catalog-webapp ---
> >> > [INFO] testFailureIgnore property is ignored in non test phases
> >> > [INFO] Running 'yarn ' in
> >>
> /home/centos/srcs/hadoop-3.3.2-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-webapp/target
> >> > [INFO] yarn install v1.7.0
> >> > [INFO] info No lockfile found.
> >> > [INFO] [1/4] Resolving packages...
> >> > [INFO] [2/4] Fetching packages...
> >> > [INFO] error safe-stable-stringify@2.3.1: The engine "node" is
> >> incompatible with this module. Expected version ">=10".
> >> > [INFO] error 

Re: [VOTE] Release Apache Hadoop 3.3.2 - RC2

2022-01-25 Thread Steve Loughran
That error

org.restlet.jee:org.restlet:pom:2.3.0 from/to maven-default-http-blocker
(http://0.0.0.0/): Blocked mirror for repositories: [maven-restlet

implies Maven is not downloading artifacts over HTTP, and it had decided
that the restlet artifacts were coming off an HTTP repo, even though they
are in Maven Central.

Which means: look at your global Maven settings.



On Tue, 25 Jan 2022 at 07:27, Mukund Madhav Thakur wrote:

> Hi Chao,
> I was using the command "mvn package -Pdist -DskipTests -Dtar
> -Dmaven.javadoc.skip=true" on commit id *6da346a358c*.
> It is working for me today. So maybe it was an intermittent issue in my
> local last time when I was trying this. So we can ignore this. Thanks
>
>
>
> On Tue, Jan 25, 2022 at 6:21 AM Stack  wrote:
>
> > +1 (binding)
> >
> > * Signature: ok
> > * Checksum : ok
> > * Rat check (1.8.0_191): ok
> >  - mvn clean apache-rat:check
> > * Built from source (1.8.0_191): ok
> >  - mvn clean install  -DskipTests
> >
> > Poking around in the binary, it looks good. Unpacked site. Looks right.
> > Checked a few links work.
> >
> > Deployed over ten node cluster. Ran HBase ITBLL over it for a few hours
> > w/ chaos. Worked like 3.3.1...
> >
> > I tried to build with 3.8.1 maven and got the below.
> >
> > [ERROR] Failed to execute goal on project
> > hadoop-yarn-applications-catalog-webapp: Could not resolve dependencies for
> > project org.apache.hadoop:hadoop-yarn-applications-catalog-webapp:war:3.3.2:
> > Failed to collect dependencies at org.apache.solr:solr-core:jar:7.7.0 ->
> > org.restlet.jee:org.restlet:jar:2.3.0: Failed to read artifact descriptor
> > for org.restlet.jee:org.restlet:jar:2.3.0: Could not transfer artifact
> > org.restlet.jee:org.restlet:pom:2.3.0 from/to maven-default-http-blocker
> > (http://0.0.0.0/): Blocked mirror for repositories: [maven-restlet
> > (http://maven.restlet.org, default, releases+snapshots), apache.snapshots
> > (http://repository.apache.org/snapshots, default, disabled)] -> [Help 1]
> >
> > I used 3.6.3 mvn instead (looks like a simple fix).
> >
> > Thanks for packaging up this fat point release Chao Sun.
> >
> > S
> >
> > On Wed, Jan 19, 2022 at 9:50 AM Chao Sun  wrote:
> >
> > > Hi all,
> > >
> > > I've put together Hadoop 3.3.2 RC2 below:
> > >
> > > The RC is available at:
> > > http://people.apache.org/~sunchao/hadoop-3.3.2-RC2/
> > > The RC tag is at:
> > > https://github.com/apache/hadoop/releases/tag/release-3.3.2-RC2
> > > The Maven artifacts are staged at:
> > > https://repository.apache.org/content/repositories/orgapachehadoop-1332
> > >
> > > You can find my public key at:
> > > https://downloads.apache.org/hadoop/common/KEYS
> > >
> > > I've done the following tests and they look good:
> > > - Ran all the unit tests
> > > - Started a single node HDFS cluster and tested a few simple commands
> > > - Ran all the tests in Spark using the RC2 artifacts
> > >
> > > Please evaluate the RC and vote, thanks!
> > >
> > > Best,
> > > Chao
> > >
> >
>


Re: [VOTE] Release Apache Hadoop 3.3.2 - RC2

2022-01-24 Thread Steve Loughran
The fix is to disable auditing, which is now the default:
https://issues.apache.org/jira/browse/HADOOP-18094

Everything is OK for apps which retain the same FS instances for the life
of the app, but not for Hive...

I will do a better fix ASAP where, in exchange for the loss of auditing
after a GC event, only weak refs are held in maps private to the auditor.
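A minimal sketch of that trade-off, assuming a map keyed by thread ID; it is not the actual Hadoop class. Values are reachable only through `WeakReference`s, so the map cannot keep an otherwise-unused value alive, but a value may vanish after a GC and is then recreated, which corresponds to the "loss of auditing after a GC event". `WeakValueThreadMap` and its methods are hypothetical names.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: per-thread values held only through weak references,
 * so the map never keeps an otherwise-unreachable value alive.
 */
final class WeakValueThreadMap<V> {
  private final Map<Long, WeakReference<V>> map = new ConcurrentHashMap<>();
  private final Supplier<V> factory;

  WeakValueThreadMap(Supplier<V> factory) {
    this.factory = factory;
  }

  /** Value for the current thread; recreated if the old one was collected. */
  V getForCurrentThread() {
    long id = Thread.currentThread().getId();
    WeakReference<V> ref = map.get(id);
    V value = ref == null ? null : ref.get();
    if (value == null) {
      // Never set, or reclaimed by GC: create and register a fresh value.
      value = factory.get();
      map.put(id, new WeakReference<>(value));
    }
    return value;
  }

  /** Drop entries whose values have been garbage collected; returns how many. */
  int prune() {
    int before = map.size();
    map.values().removeIf(ref -> ref.get() == null);
    return before - map.size();
  }

  int size() {
    return map.size();
  }

  public static void main(String[] args) {
    WeakValueThreadMap<StringBuilder> spans = new WeakValueThreadMap<>(StringBuilder::new);
    StringBuilder span = spans.getForCurrentThread();
    span.append("active");
    // Same thread, value still strongly referenced: the same instance comes back.
    System.out.println(spans.getForCurrentThread() == span); // true
  }
}
```

Whoever is actively using a value must hold a strong reference to it; the map only helps find it again.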

I will put that in hadoop-common, as I would want to use the same code in
thread-level IOStatistics tracking. There we'd demand-create an IOStatistics
snapshot per thread; short-lived worker threads for stream IO would still
update the stats of the thread the stream was created in. This will let us
collect stats on store IO through the ORC/Parquet readers for each thread
doing work for a job, and include them in job stats.
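A hypothetical sketch of the "stats of the thread the stream was created in" idea: counters are created on demand per thread, and a stream captures its creating thread's counter at construction, so reads done on short-lived worker threads are still attributed to the creator. The class names and the single bytes-read counter are assumptions; this is not the IOStatistics API. Entries are never removed here, which a long-lived implementation would have to handle, for example with weak references as proposed in this message.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch of per-thread I/O counters, created on demand. */
final class PerThreadStats {
  private static final Map<Long, LongAdder> BYTES_READ = new ConcurrentHashMap<>();

  private PerThreadStats() {}

  /** Counter for the calling thread, created on first use. */
  static LongAdder forCurrentThread() {
    return BYTES_READ.computeIfAbsent(Thread.currentThread().getId(), id -> new LongAdder());
  }

  /** A stream binds to its creator's counter, not to whichever thread does the I/O. */
  static final class TrackedStream {
    private final LongAdder owner = forCurrentThread(); // evaluated on the creating thread

    void recordRead(long bytes) {
      owner.add(bytes);
    }
  }

  public static void main(String[] args) throws Exception {
    TrackedStream stream = new TrackedStream(); // created on the main thread
    ExecutorService workers = Executors.newFixedThreadPool(4);
    for (int i = 0; i < 8; i++) {
      workers.submit(() -> stream.recordRead(1024)); // short-lived worker I/O
    }
    workers.shutdown();
    workers.awaitTermination(10, TimeUnit.SECONDS);
    System.out.println("main thread read " + forCurrentThread().sum() + " bytes");
  }
}
```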

And how would that be useful? Well, look at this comparison of job/task
commit performance with the manifest committer:
https://gist.github.com/steveloughran/7dc1e68220db67327b781b345b42c0b8


Re: [VOTE] Release Apache Hadoop 3.3.2 - RC2

2022-01-22 Thread Steve Loughran
Now some bad news:
https://issues.apache.org/jira/browse/HADOOP-18091
S3A auditing leaks memory through ThreadLocal references

It surfaces in processes with long-lived threads creating and destroying
many S3A FS instances.

Working on a fix right now.
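For readers unfamiliar with this class of bug, a hypothetical sketch of one common ThreadLocal leak pattern (not necessarily the exact path in HADOOP-18091): each instance owns a ThreadLocal, and the value stored in a long-lived thread's ThreadLocalMap refers back to the owning instance, so the instance stays reachable for as long as that thread lives unless remove() is called. remove() only clears the entry for the calling thread. `PerThreadSpanHolder` and `Span` are invented names.

```java
/**
 * Hypothetical illustration of a ThreadLocal leak pattern and its per-thread fix.
 * Each instance owns a ThreadLocal; the value stored in the using thread's
 * ThreadLocalMap refers back to this instance.
 */
final class PerThreadSpanHolder implements AutoCloseable {
  private final ThreadLocal<Span> activeSpan = new ThreadLocal<>();

  /** Inner (non-static) class: every Span holds a reference to its holder. */
  final class Span {
    final String name;

    Span(String name) {
      this.name = name;
    }
  }

  Span activate(String name) {
    Span span = new Span(name);
    activeSpan.set(span);
    return span;
  }

  Span active() {
    return activeSpan.get();
  }

  /**
   * Clears the calling thread's entry. Without this, a long-lived thread keeps
   * the Span, hence this holder and its ThreadLocal, strongly reachable.
   * Entries set by other threads are not affected.
   */
  @Override
  public void close() {
    activeSpan.remove();
  }

  public static void main(String[] args) {
    // A long-lived thread (here: main) creating and closing many instances.
    for (int i = 0; i < 3; i++) {
      try (PerThreadSpanHolder fs = new PerThreadSpanHolder()) {
        fs.activate("op-" + i);
        System.out.println(fs.active().name);
      }
    }
  }
}
```

Because close() cannot reach entries created by other threads, a map holding only weak references avoids the problem more robustly.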

On Fri, 21 Jan 2022 at 21:02, Eric Payne wrote:

> +1 (binding)
>
> - Built from source
>
> - Brought up a non-secure virtual cluster w/ NN, 1 DN, RM, AHS, JHS, and 3 NMs
>
> - Validated inter- and intra-queue preemption
>
> - Validated exclusive node labels
>
> Thanks a lot Chao for your diligence and hard work on this release.
>
> Eric
>
> On Wednesday, January 19, 2022, 11:50:34 AM CST, Chao Sun <sunc...@apache.org> wrote:
>
> Hi all,
>
> I've put together Hadoop 3.3.2 RC2 below:
>
> The RC is available at:
> http://people.apache.org/~sunchao/hadoop-3.3.2-RC2/
> The RC tag is at:
> https://github.com/apache/hadoop/releases/tag/release-3.3.2-RC2
> The Maven artifacts are staged at:
> https://repository.apache.org/content/repositories/orgapachehadoop-1332
>
> You can find my public key at:
> https://downloads.apache.org/hadoop/common/KEYS
>
> I've done the following tests and they look good:
> - Ran all the unit tests
> - Started a single node HDFS cluster and tested a few simple commands
> - Ran all the tests in Spark using the RC2 artifacts
>
> Please evaluate the RC and vote, thanks!
>
> Best,
> Chao
>
>
>


Re: [VOTE] Release Apache Hadoop 3.3.2 - RC2

2022-01-20 Thread Steve Loughran
*+1 binding.*

reviewed binaries, source, artifacts in the staging maven repository in
downstream builds. all good.

*## test run*

checked out the asf github repo at commit 6da346a358c into a location
already set up with aws and azure test credentials

ran the hadoop-aws tests with
`-Dparallel-tests -DtestsThreadCount=6 -Dmarkers=delete -Dscale`
and the hadoop-azure tests against azure cardiff with
`-Dparallel-tests=abfs -DtestsThreadCount=6`

all happy



*## binary*
downloaded KEYS and imported, so adding your key to my list (also signed
this and updated the key servers)

downloaded rc tar and verified
```
> gpg2 --verify hadoop-3.3.2.tar.gz.asc hadoop-3.3.2.tar.gz
gpg: Signature made Sat Jan 15 23:41:10 2022 GMT
gpg:using RSA key DE7FA241EB298D027C97B2A1D8F1A97BE51ECA98
gpg: Good signature from "Chao Sun (CODE SIGNING KEY) " [full]


> cat hadoop-3.3.2.tar.gz.sha512
SHA512 (hadoop-3.3.2.tar.gz) = cdd3d9298ba7d6e63ed63f93c159729ea14d2b7d5e3a0640b1761c86c7714a721f88bdfa8cb1d8d3da316f616e4f0ceaace4f32845ee4441e6aaa7a12b8c647d

> shasum -a 512 hadoop-3.3.2.tar.gz
cdd3d9298ba7d6e63ed63f93c159729ea14d2b7d5e3a0640b1761c86c7714a721f88bdfa8cb1d8d3da316f616e4f0ceaace4f32845ee4441e6aaa7a12b8c647d  hadoop-3.3.2.tar.gz
```
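The comparison above is done by eye. A minimal sketch of automating it, assuming the `.sha512` file holds a line in the BSD-style `SHA512 (name) = <hex>` form shown above and that either `sha512sum` (GNU coreutils) or `shasum` is on the path; `verify_sha512` is a hypothetical helper, not part of any Hadoop release tooling.

```shell
# verify_sha512 ARTIFACT CHECKSUM_FILE
# Hypothetical helper: compares the digest recorded in a BSD-style
# "SHA512 (name) = <hex>" checksum file with one computed locally.
verify_sha512() {
  artifact="$1"
  checksum_file="$2"

  # Pull the hex digest out of the "SHA512 (name) = <hex>" line and
  # lower-case it so it compares equal to the tool output.
  expected=$(sed -n 's/^SHA512 ([^)]*) = \([0-9a-fA-F]*\).*/\1/p' "$checksum_file" | tr 'A-F' 'a-f')

  # Prefer GNU sha512sum; fall back to shasum where that is missing.
  if command -v sha512sum >/dev/null 2>&1; then
    actual=$(sha512sum "$artifact" | cut -d ' ' -f 1)
  else
    actual=$(shasum -a 512 "$artifact" | cut -d ' ' -f 1)
  fi

  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    echo "OK: $artifact"
    return 0
  fi
  echo "MISMATCH: $artifact" >&2
  return 1
}
```

For this RC that would be `verify_sha512 hadoop-3.3.2.tar.gz hadoop-3.3.2.tar.gz.sha512`. It only checks integrity; the `gpg2 --verify` step is still what ties the tarball to the release manager's key.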


*# cloudstore against staged artifacts*
```
cd ~/.m2/repository/org/apache/hadoop
find . -name \*3.3.2\* -print | xargs rm -r
```
ensures no local builds have tainted the repo.
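A narrower variant of that cleanup, sketched on the assumption that what needs purging is the per-version directory Maven keeps under each artifact (`<artifactId>/<version>/`). Pruning at that directory means `rm` is never handed files inside a directory it has already deleted. `purge_version` is a hypothetical name.

```shell
# purge_version REPO_DIR VERSION
# Hypothetical helper: deletes every directory named exactly VERSION
# below REPO_DIR, forcing later builds to fetch those artifacts again.
purge_version() {
  repo="$1"
  version="$2"
  # -prune stops find descending into a directory it is about to delete;
  # "-exec ... {} +" batches the paths and copes with spaces in names.
  find "$repo" -type d -name "$version" -prune -exec rm -rf {} +
}
```

Usage would be `purge_version ~/.m2/repository/org/apache/hadoop 3.3.2`. Unlike the original pipeline it ignores stray files that merely contain the version string outside a version directory.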

in cloudstore, a mvn build without tests
```
mci -Pextra -Phadoop-3.3.2 -Psnapshots-and-staging
```
this fetches all from asf staging

```
Downloading from ASF Staging: https://repository.apache.org/content/groups/staging/org/apache/hadoop/hadoop-client/3.3.2/hadoop-client-3.3.2.pom
Downloaded from ASF Staging: https://repository.apache.org/content/groups/staging/org/apache/hadoop/hadoop-client/3.3.2/hadoop-client-3.3.2.pom (11 kB at 20 kB/s)
```
there are no tests there, but it did audit the download process. FWIW, that
project has switched to logback, so I now have all hadoop imports excluding
slf4j and log4j. it takes too much effort right now.

build works.

tested abfs and s3a storediags, all happy




*### google GCS against staged artifacts*

gcs is now java 11 only, so I had to switch JVMs here.

had to add a snapshots and staging profile, after which I could build and
test.

```
 -Dhadoop.three.version=3.3.2 -Psnapshots-and-staging
```
two test failures were auth-related: the tests expected specific exceptions
to be raised, but things failed differently
```
[ERROR] Failures:
[ERROR]   GoogleHadoopFileSystemTest.eagerInitialization_fails_withInvalidCredentialsConfiguration:122 unexpected exception type thrown; expected: but was:
[ERROR]   GoogleHadoopFileSystemTest.lazyInitialization_deleteCall_fails_withInvalidCredentialsConfiguration:100 value of: throwable.getMessage()
expected: Failed to create GCS FS
but was : A JSON key file may not be specified at the same time as credentials via configuration.

```

I'm not worried here.

ran cloudstore's diagnostics against gcs.

Nice to see they are now collecting IOStatistics on their input streams. we
really need to get this collected through the parquet/orc libs and then
through the query engines.

```
> bin/hadoop jar $CLOUDSTORE storediag gs://stevel-london/

...
2022-01-20 17:52:47,447 [main] INFO  diag.StoreDiag
(StoreDurationInfo.java:(56)) - Starting: Reading a file
gs://stevel-london/dir-9cbfc774-76ff-49c0-b216-d7800369c3e1/file
input stream summary: org.apache.hadoop.fs.FSDataInputStream@6cfd9a54:
com.google.cloud.hadoop.fs.gcs.GoogleHadoopFSInputStream@78c1372d{counters=((stream_read_close_operations=1)
(stream_read_seek_backward_operations=0) (stream_read_total_bytes=7)
(stream_read_bytes=7) (stream_read_exceptions=0)
(stream_read_seek_operations=0) (stream_read_seek_bytes_skipped=0)
(stream_read_operations=3) (stream_read_bytes_backwards_on_seek=0)
(stream_read_seek_forward_operations=0)
(stream_read_operations_incomplete=1));
gauges=();
minimums=();
maximums=();
means=();
}
...
```
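The stream counters above are printed as `(name=value)` pairs. When comparing runs it can help to pull a single counter out of such a log line; a small sketch, assuming only that textual layout (it does not touch any Hadoop API) and using a hypothetical helper name `iostat_counter`.

```shell
# iostat_counter NAME < logtext
# Hypothetical helper: prints the value of the first "(NAME=value)"
# pair in the input text, or nothing if the counter is absent.
iostat_counter() {
  name="$1"
  # grep -o emits each "(NAME=digits)" match on its own line; requiring
  # "(" before and "=" after NAME avoids matching longer counter names.
  # sed keeps only the digits; head returns the first match.
  grep -o "($name=[0-9]*)" | sed 's/^[^=]*=\([0-9]*\))$/\1/' | head -n 1
}
```

For example, `grep GoogleHadoopFSInputStream storediag.log | iostat_counter stream_read_bytes` would print the bytes-read counter from a saved storediag run (the log file name is illustrative).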

*### source*

once I'd done builds and tests which fetched from staging, I did a local
build and test

repeated download/validate of source tarball, unzip/untar

build with java11.

I've not done the test run there, because that directory tree doesn't have
the credentials, and this morning's run was good.

altogether then: very happy. tests good, downstream libraries building and
linking.


Re: [DISCUSS] Migrate hadoop from log4j1 to log4j2

2022-01-20 Thread Steve Loughran
On Thu, 20 Jan 2022 at 17:15, Andrew Purtell 
wrote:

> Just to clarify: I think you want to upgrade to Log4J2 (or switch to
> LogBack) as a strategy for new releases, but you have the option in
> maintenance releases to use Reload4J to maintain Appender API and
> operational compatibility, and users who want to minimize risks in
> production while mitigating the security issues will prefer that.


i like this



>
>
> > On Jan 20, 2022, at 8:59 AM, Andrew Purtell wrote:
> >
> > Reload4J has fixed all of those CVEs without requiring an upgrade.
> >
> >> On Jan 20, 2022, at 5:56 AM, Duo Zhang  wrote:
> >>
> >> There are 3 new CVEs for log4j1 reported recently[1][2][3]. So I think
> it
> >> is time to speed up the migration to log4j2 work[4] now.
> >>
> >> You can see the discussion on the jira issue[4], our goal is to fully
> >> migrate to log4j2 and the current most blocking issue is lack of the
> >> "log4j.rootLogger=INFO,Console" grammer support for log4j2. I've already
> >> started a discussion thread on the log4j dev mailing list[5] and the
> result
> >> is optimistic and I've filed an issue for log4j2[6], but I do not think
> it
> >> could be addressed and released soon. If we want to fully migrate to
> >> log4j2, then either we introduce new environment variables or split the
> old
> >> HADOOP_ROOT_LOGGER variable in the startup scripts. And considering the
> >> complexity of our current startup scripts, the work is not easy and it
> will
> >> also break lots of other hadoop deployment systems if they do not use
> our
> >> startup scripts...
> >>
> >> So after reconsidering the current situation, I prefer we use the
> log4j1.2
> >> bridge to remove the log4j1 dependency first, and once LOG4J2-3341 is
> >> addressed and released, we start to fully migrate to log4j2. Of course
> we
> >> have other problems for log4j1.2 bridge too, as we have TaskLogAppender,
> >> ContainerLogAppender and ContainerRollingLogAppender which inherit
> >> FileAppender and RollingFileAppender in log4j1, which are not part of
> the
> >> log4j1.2 bridge. But anyway, at least we could just copy the source
> code to
> >> hadoop as we have WriteAppender in log4j1.2 bridge, and these two
> classes
> >> do not have related CVEs.
> >>
> >> Thoughts? For me I would like us to make a new 3.4.x release line to
> remove
> >> the log4j1 dependencies ASAP.
> >>
> >> Thanks.
> >>
> >> 1. https://nvd.nist.gov/vuln/detail/CVE-2022-23302
> >> 2. https://nvd.nist.gov/vuln/detail/CVE-2022-23305
> >> 3. https://nvd.nist.gov/vuln/detail/CVE-2022-23307
> >> 4. https://issues.apache.org/jira/browse/HADOOP-16206
> >> 5. https://lists.apache.org/thread/gvfb3jkg6t11cyds4jmpo7lrswmx28w3
> >> 6. https://issues.apache.org/jira/browse/LOG4J2-3341
>
>
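The option of splitting the old `HADOOP_ROOT_LOGGER` variable could look roughly like this. The sketch assumes the existing `LEVEL,APPENDER[,APPENDER...]` value format (for example `INFO,console`); the function `split_root_logger` and the variables `HADOOP_ROOT_LOGGER_LEVEL` and `HADOOP_ROOT_LOGGER_APPENDER` are hypothetical and are not defined by the Hadoop scripts.

```shell
# split_root_logger VALUE
# Hypothetical helper: splits a log4j1-style "LEVEL,APPENDER[,...]"
# value into a level and a comma-separated appender list.
split_root_logger() {
  value="$1"
  # Level is everything before the first comma.
  HADOOP_ROOT_LOGGER_LEVEL="${value%%,*}"
  # Appenders are everything after the first comma (empty if none).
  case "$value" in
    *,*) HADOOP_ROOT_LOGGER_APPENDER="${value#*,}" ;;
    *)   HADOOP_ROOT_LOGGER_APPENDER="" ;;
  esac
  export HADOOP_ROOT_LOGGER_LEVEL HADOOP_ROOT_LOGGER_APPENDER
}
```

Keeping the two parts apart is what would let a log4j2 configuration consume them independently; how they would be passed to the JVM is the open question in the thread and is not addressed here.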


Re: [VOTE] Release Apache Hadoop 3.3.2 - RC2

2022-01-20 Thread Steve Loughran
thanks, i'm on it. will run the aws and azure tests and then play with the
artifacts



Re: [VOTE] Release Apache Hadoop 3.3.2 - RC0

2021-12-30 Thread Steve Loughran
On Tue, 14 Dec 2021 at 22:56, Chao Sun  wrote:

> Thanks all for taking a look! looks like we need another RC addressing the
> following issues.
>
> > 1. the overview page of the doc is for the Hadoop 3.0 release. It would
> > be best to base the doc on top of the Hadoop 3.3.0 overview page. (it's a
> > miss on my part... The overview page of 3.3.1 wasn't updated)
>
> For this, I just need to update
> the hadoop-project/src/site/markdown/index.md.vm and incorporate notable
> changes made in 3.3.1/3.3.2, is that correct? looks like the file hasn't
> been touched for a while.
>
> > 2. ARM binaries are not included. For the 3.3.1 release, I had to run the
> > create release script on an ARM machine separately to create the binary
> > tarball.
>
> Hmm this might be challenging for me. Could you share the steps of how you
> did it? especially where did you get an ARM machine.
>
> > 3. the jdiff version
> > https://github.com/apache/hadoop/blob/branch-3.3.2/hadoop-project-dist/pom.xml#L137
>
> I just need to backport this commit:
> https://github.com/apache/hadoop/commit/a77bf7cf07189911da99e305e3b80c589edbbfb5
> to branch-3.3.2 (and potentially branch-3.3)?
>
> > The 3.3.1 binary tarball is 577mb. The 3.3.2 RC0 is 608mb. I'm curious
> > what has been added.
>
> The difference is mostly in aws-java-sdk-bundle jar: 3.3.1 uses
> https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-bundle/1.11.901
> while 3.3.2 uses
> https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-bundle/1.11.1026.
> The difference is ~32.5mb.
>
> Chao
>

Wow, that's a big change. But the bundling of a shaded jar avoids so
many problems we've had in the past. Come January I will look at moving to
the 1.12 SDK; I don't know how long that will take to stabilize.


Re: [VOTE] Release Apache Hadoop 3.3.2 - RC0

2021-12-14 Thread Steve Loughran
I'll do my best to test this; I'm a bit broken right now.

I think we should mention in the release notes that the version of log4j
included in this and all previous releases is not vulnerable, but also
provide a list, plus links, of any that have been fixed.

On Fri, 10 Dec 2021 at 02:09, Chao Sun  wrote:

> Hi all,
>
> Sorry for the long delay. I've prepared RC0 for Hadoop 3.3.2 below:
>
> The RC is available at:
> http://people.apache.org/~sunchao/hadoop-3.3.2-RC0/
> The RC tag is at:
> https://github.com/apache/hadoop/releases/tag/release-3.3.2-RC0
> The Maven artifacts are staged at:
> https://repository.apache.org/content/repositories/orgapachehadoop-1330/
>
> You can find my public key at: https://people.apache.org/~sunchao/KEYS
>
> Please evaluate the RC and vote.
>
> Thanks,
> Chao
>


[jira] [Created] (MAPREDUCE-7367) Parallelize file moves in FileOutputCommitter v1 job commit

2021-10-28 Thread Steve Loughran (Jira)
Steve Loughran created MAPREDUCE-7367:
-

 Summary: Parallelize file moves in FileOutputCommitter v1 job 
commit
 Key: MAPREDUCE-7367
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7367
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 3.4.0
Reporter: Steve Loughran
Assignee: Rajesh Balamohan


add options to v1 job commit to scan task attempt (TA) dirs and rename files in parallel.

This is work by Rajesh Balamohan and is an interim patch ahead of 
MAPREDUCE-7341 (I don't intend to merge it). It will be the commit before 
HADOOP-17981.
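As a rough shell analogue of the idea (not the patch itself, which changes the Java `FileOutputCommitter`), the sketch below scans a task attempt directory and moves each file to the same relative path under the destination, with several `mv` processes running at once. `parallel_commit` and its arguments are hypothetical; it assumes an `xargs` supporting `-0` and `-P`, which both GNU and BSD xargs provide.

```shell
# parallel_commit SRC_DIR DEST_DIR THREADS
# Hypothetical helper: moves every file under SRC_DIR to the same
# relative path under DEST_DIR, running up to THREADS moves at once.
parallel_commit() {
  src="$1"
  dest="$2"
  threads="$3"
  # List files relative to SRC_DIR, NUL-separated so unusual names survive.
  (cd "$src" && find . -type f -print0) |
    xargs -0 -n 1 -P "$threads" sh -c '
      src="$1"; dest="$2"; rel="$3"
      # GNU xargs may run once with no file argument; skip that case.
      [ -n "$rel" ] || exit 0
      # Create the parent directory, then rename the file into place.
      mkdir -p "$(dirname "$dest/$rel")" && mv "$src/$rel" "$dest/$rel"
    ' sh "$src" "$dest"
}
```

On object stores a rename is a copy plus delete, which is where fanning out the moves pays off. The sketch makes no attempt at the failure handling or exclusive-access assumptions a real committer has to deal with.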






[jira] [Resolved] (MAPREDUCE-7366) FileOutputCommitter Enable Concurent Writes

2021-10-26 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7366.
---
Resolution: Duplicate

topic already covered in MAPREDUCE-7331

> FileOutputCommitter Enable Concurent Writes 
> 
>
> Key: MAPREDUCE-7366
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7366
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Affects Versions: 3.3.1
>Reporter: ismail
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> is it possible to make `{{PENDING_DIR_NAME}}` configurable? 
> That would enable concurrent writes to the same location. Currently, if two 
> Spark processes write to the same destination, one of them fails.
> current
> {code:java}
>  public static final String PENDING_DIR_NAME = "_temporary";{code}
> new:
> {code:java}
> PENDING_DIR_NAME = conf.get("mapreduce.fileoutputcommitter.pending.dir", 
> "_temporary");{code}
> here is a custom committer doing it: 
> https://gist.github.com/ismailsimsek/33c55d8e1fcfc79160483c38a978edbd






[jira] [Reopened] (MAPREDUCE-7366) FileOutputCommitter Enable Concurent Writes

2021-10-26 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reopened MAPREDUCE-7366:
---

> FileOutputCommitter Enable Concurent Writes 
> 
>
> Key: MAPREDUCE-7366
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7366
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Affects Versions: 3.3.1
>Reporter: ismail
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> is it possible to make `{{PENDING_DIR_NAME}}` configurable? 
> That would enable concurrent writes to the same location. Currently, if two 
> Spark processes write to the same destination, one of them fails.
> current
> {code:java}
>  public static final String PENDING_DIR_NAME = "_temporary";{code}
> new:
> {code:java}
> PENDING_DIR_NAME = conf.get("mapreduce.fileoutputcommitter.pending.dir", 
> "_temporary");{code}
> here is a custom committer doing it: 
> https://gist.github.com/ismailsimsek/33c55d8e1fcfc79160483c38a978edbd






[jira] [Resolved] (MAPREDUCE-7366) FileOutputCommitter Enable Concurent Writes

2021-10-25 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7366.
---
Resolution: Won't Fix

closing as Won't Fix because:

* v2 is broken
* this doesn't work with v1 as v1 job commit assumes exclusive access to the 
dest dir
* general fear of going near FileOutputCommitter

Proposed a patch in MAPREDUCE-7366

> FileOutputCommitter Enable Concurent Writes 
> 
>
> Key: MAPREDUCE-7366
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7366
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: ismail
>Priority: Major
>  Labels: committers, easyfix, easytask, pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> is it possible to make `{{PENDING_DIR_NAME}}` configurable? 
> That would enable concurrent writes to the same location. Currently, if two 
> Spark processes write to the same destination, one of them fails.
> current
> {code:java}
>  public static final String PENDING_DIR_NAME = "_temporary";{code}
> new:
> {code:java}
> PENDING_DIR_NAME = conf.get("mapreduce.fileoutputcommitter.pending.dir", 
> "_temporary");{code}
> here is a custom committer doing it: 
> https://gist.github.com/ismailsimsek/33c55d8e1fcfc79160483c38a978edbd





