Re: [DISCUSS] hadoop branch-3.3+ going to java11 only

2023-03-28 Thread Chris Nauroth
In theory, I like the idea of setting aside Java 8. Unfortunately, I don't
know that upgrading within the 3.3 line adheres to our binary compatibility
policy [1]. I don't see specific discussion of the Java version there, but
it states that you should be able to drop in minor upgrades and have
existing apps keep working. Users might find it surprising if they try to
upgrade a cluster that has JDK 8.

There is also the question of impact on downstream projects [2]. We'd have
to check plans with our consumers.

What about the idea of shooting for a 3.4 release on JDK 11 (or even 17)?
The downside is that we'd probably need to set boundaries on end of
life/limited support for 3.2 and 3.3 to keep the workload manageable.

[1]
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#Java_Binary_compatibility_for_end-user_applications_i.e._Apache_Hadoop_ABI
[2] https://github.com/apache/spark/blob/v3.3.2/pom.xml#L109

Chris Nauroth


On Tue, Mar 28, 2023 at 11:10 AM Ayush Saxena  wrote:

> >
> >  it's already hard to migrate from JDK8 why not retarget JDK17.
> >
>
> +1, makes sense to me, sounds like a win-win situation to me, though there
> would be some additional issues to chase now :)
>
> -Ayush
>
>
> On Tue, 28 Mar 2023 at 23:29, Wei-Chiu Chuang  wrote:
>
> > My random thoughts. Probably bad takes:
> >
> > There are projects experimenting with JDK17 now.
> > JDK11 active support will end in 6 months. If it's already hard to
> > migrate from JDK8 why not retarget JDK17.
> >
> > On Tue, Mar 28, 2023 at 10:30 AM Ayush Saxena wrote:
> >
> >> I know the Jersey upgrade is a blocker. Some folks were chasing that last
> >> year during 3.3.4 time, I don’t know where it is now, didn’t see then
> >> what’s the problem there but I remember there was some initial PR which
> >> did it for HDFS at least, so I never looked beyond that…
> >>
> >> I too had jdk-11 in my mind, but only for trunk. 3.4.x can stay as
> >> java-11 only branch maybe, but that is something later to decide, once
> >> we get the code sorted…
> >>
> >> -Ayush
> >>
> >> > On 28-Mar-2023, at 9:16 PM, Steve Loughran wrote:
> >> >
> >> > well, how about we flip the switch and get on with it.
> >> >
> >> > slf4j seems happy on java11,
> >> >
> >> > side issue, anyone seen test failures on zulu1.8; somehow my test run
> >> > is failing and i'm trying to work out whether it's a mismatch in
> >> > command line/ide jvm versions, or the 3.3.5 JARs have been built with
> >> > an openjdk version which requires IntBuffer implements an overridden
> >> > method IntBuffer rewind().
> >> >
> >> > java.lang.NoSuchMethodError: java.nio.IntBuffer.rewind()Ljava/nio/IntBuffer;
> >> >
> >> > at org.apache.hadoop.fs.FSInputChecker.verifySums(FSInputChecker.java:341)
> >> > at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:308)
> >> > at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:257)
> >> > at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:202)
> >> > at java.io.DataInputStream.read(DataInputStream.java:149)
> >> >
> >> >> On Tue, 28 Mar 2023 at 15:52, Viraj Jasani wrote:
> >> >> IIRC some of the ongoing major dependency upgrades (log4j 1 to 2,
> >> >> jersey 1 to 2 and junit 4 to 5) are blockers for java 11 compile +
> >> >> test stability.
> >> >> On Tue, Mar 28, 2023 at 4:55 AM Steve Loughran wrote:
> >> >>> Now that hadoop 3.3.5 is out, i want to propose something new
> >> >>> we switch branch-3.3 and trunk to being java11 only
> >> >>> 1. java 11 has been out for years
> >> >>> 2. oracle java 8 is no longer available under "premier support"; you
> >> >>> can't really get upgrades
> >> >>> https://www.oracle.com/java/technologies/java-se-support-roadmap.html
> >> >>> 3. openJDK 8 releases != oracle ones, and things you compile with
> >> >>> them don't always link to oracle java 8 (some classes in java.nio
> >> >>> have added more overrides)
> >> >>> 4. more and more libraries we want to upgrade to/bundle are java 11
> >> >>> only
> >> >>> 5. moving to java 11 would cut our yetus build workload in half, and
> >> >>> line up for adding java 17 builds instead.
> >> >>> I know there are some outstanding issues still in
> >> >>> https://issues.apache.org/jira/browse/HADOOP-16795 -but are they
> >> >>> blockers?
> >> >>> Could we just move to java11 and enhance at our leisure, once java8
> >> >>> is no longer a concern.
> >>
> >> -
> >> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> >> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> >>
> >>
>
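
A note on the NoSuchMethodError quoted in this thread: one well-known way to
get exactly this signature, which may or may not be what happened with the
3.3.5 JARs, is compiling with JDK 9 or later while targeting Java 8 bytecode
without javac's --release 8 option. From Java 9 on, IntBuffer declares its own
rewind() returning IntBuffer, so the call site links against
java.nio.IntBuffer.rewind()Ljava/nio/IntBuffer;, which a Java 8 runtime does
not have. Below is a minimal sketch of the usual source-level workaround. The
class and method names are made up for illustration and are not taken from
FSInputChecker.

```java
import java.nio.Buffer;
import java.nio.IntBuffer;

public class BufferRewindSketch {
    // Hypothetical helper: calls rewind() through the Buffer supertype, so the
    // compiled call site references java.nio.Buffer.rewind(), which exists on
    // Java 8 as well as on later releases.
    public static IntBuffer rewindPortably(IntBuffer sums) {
        ((Buffer) sums).rewind();
        return sums;
    }

    public static void main(String[] args) {
        IntBuffer sums = IntBuffer.allocate(4);
        sums.put(1).put(2);
        rewindPortably(sums);
        // After the rewind the position is back at the start of the buffer.
        System.out.println("position after rewind: " + sums.position());
    }
}
```

Building with --release 8 avoids the problem without source changes, since
javac then compiles against the Java 8 API signatures.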


Re: [VOTE] Release Apache Hadoop 3.3.5 (RC3)

2023-03-20 Thread Chris Nauroth
+1

Thank you for the release candidate, Steve!

* Verified all checksums.
* Verified all signatures.
* Built from source, including native code on Linux.
    * mvn clean package -Pnative -Psrc -Drequire.openssl -Drequire.snappy
      -Drequire.zstd -DskipTests
* Tests passed.
    * mvn --fail-never clean test -Pnative -Dparallel-tests
      -Drequire.snappy -Drequire.zstd -Drequire.openssl
      -Dsurefire.rerunFailingTestsCount=3 -DtestsThreadCount=8
* Checked dependency tree to make sure we have all of the expected library
  updates that are mentioned in the release notes.
    * mvn -o dependency:tree
* Confirmed that hadoop-openstack is now just a stub placeholder artifact
  with no code.
* For ARM verification:
    * Ran "file " on all native binaries in the ARM tarball to confirm
      they actually came out with ARM as the architecture.
    * Output of hadoop checknative -a on ARM looks good.
    * Ran a MapReduce job with the native bzip2 codec for compression, and
      it worked fine.
    * Ran a MapReduce job with YARN configured to use
      LinuxContainerExecutor and verified launching the containers through
      container-executor worked.
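
For anyone repeating the checksum step, here is a minimal sketch of what the
check amounts to. It assumes the release candidate publishes a SHA-512 digest
next to each tarball, which this thread does not spell out; the class name and
command-line arguments are hypothetical. It does not cover signature
verification, which still needs gpg and the KEYS file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

public class ChecksumSketch {
    // Streams the input through SHA-512 and returns the digest as lowercase hex.
    public static String sha512Hex(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-512");
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                digest.update(buf, 0, n);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Compares against a published digest, ignoring whitespace and letter case.
    public static boolean matches(InputStream in, String publishedHex) {
        String expected = publishedHex.replaceAll("\\s", "").toLowerCase(Locale.ROOT);
        return sha512Hex(in).equals(expected);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the tarball; args[1]: hex digest copied from the checksum file
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(matches(in, args[1]) ? "checksum OK" : "checksum MISMATCH");
        }
    }
}
```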

Chris Nauroth


On Mon, Mar 20, 2023 at 3:45 AM Ayush Saxena  wrote:

> +1(Binding)
>
> * Built from source (x86 & ARM)
> * Successful Native Build (x86 & ARM)
> * Verified Checksums (x86 & ARM)
> * Verified Signature (x86 & ARM)
> * Checked the output of hadoop version (x86 & ARM)
> * Verified the output of hadoop checknative (x86 & ARM)
> * Ran some basic HDFS shell commands.
> * Ran some basic Yarn shell commands.
> * Played a bit with HDFS Erasure Coding.
> * Ran TeraGen & TeraSort
> * Browsed through NN, DN, RM & NM UI
> * Skimmed over the contents of website.
> * Skimmed over the contents of maven repo.
> * Selectively ran some HDFS & CloudStore tests
>
> Thanx Steve for driving the release. Good Luck!!!
>
> -Ayush
>
> > On 20-Mar-2023, at 12:54 PM, Xiaoqiao He  wrote:
> >
> > +1
> >
> > * Verified signature and checksum of the source tarball.
> > * Built the source code on Ubuntu and OpenJDK 11 by `mvn clean package
> > -DskipTests -Pnative -Pdist -Dtar`.
> > * Setup pseudo cluster with HDFS and YARN.
> > * Run simple FsShell - mkdir/put/get/mv/rm (include EC) and check the
> > result.
> > * Run example mr applications and check the result - Pi & wordcount.
> > * Check the Web UI of NameNode/DataNode/Resourcemanager/NodeManager etc.
> >
> > Thanks Steve for your work.
> >
> > Best Regards,
> > - He Xiaoqiao
> >
> >> On Mon, Mar 20, 2023 at 12:04 PM Masatake Iwasaki <iwasak...@oss.nttdata.com>
> >> wrote:
> >>
> >> +1
> >>
> >> + verified the signature and checksum of the source tarball.
> >>
> >> + built from the source tarball on Rocky Linux 8 (x86_64) and OpenJDK 8
> >> with native profile enabled.
> >>   + launched pseudo distributed cluster including kms and httpfs with
> >> Kerberos and SSL enabled.
> >>   + created encryption zone, put and read files via httpfs.
> >>   + ran example MR wordcount over encryption zone.
> >>   + checked the binary of container-executor.
> >>
> >> + built rpm packages by Bigtop (with trivial modifications) on Rocky
> >> Linux 8 (aarch64).
> >>   + ran smoke-tests of hdfs, yarn and mapreduce.
> >> + built site documentation and skimmed the contents.
> >>   +  Javadocs are contained.
> >>
> >> Thanks,
> >> Masatake Iwasaki
> >>
> >>> On 2023/03/16 4:47, Steve Loughran wrote:
> >>> Apache Hadoop 3.3.5
> >>>
> >>> Mukund and I have put together a release candidate (RC3) for Hadoop
> >>> 3.3.5.
> >>>
> >>> What we would like is for anyone who can to verify the tarballs,
> >>> especially anyone who can try the arm64 binaries as we want to include
> >>> them too.
> >>>
> >>> The RC is available at:
> >>> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC3/
> >>>
> >>> The git tag is release-3.3.5-RC3, commit 706d88266ab
> >>>
> >>> The maven artifacts are staged at
> >>> https://repository.apache.org/content/repositories/orgapachehadoop-1369/
> >>>
> >>> You can find my public key at:
> >>> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> >>>
> >>> Change log
> >>> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC3/CHANGELOG.md
> >>>
> >>> Release notes
> >>>

Re: [VOTE] Release Apache Hadoop 3.3.5 (RC3)

2023-03-18 Thread Chris Nauroth
Yes, I'm in progress on verification, so you can expect to get a vote from
me. Thank you, Steve!

Chris Nauroth


On Sat, Mar 18, 2023 at 9:19 AM Ashutosh Gupta wrote:

> Hi Steve
>
> I will also do it by today/tomorrow.
>
> Thanks,
> Ashutosh
>
> On Sat, 18 Mar, 2023, 4:07 pm Steve Loughran wrote:
>
> > Thank you for this!
> >
> > Can anyone else with time do a review too? i really want to get this one
> > done, now the HDFS issues are all resolved.
> >
> > I do not want this release to fall by the wayside through lack of votes
> > alone. In fact, I would be very unhappy
> >
> >
> >
> > On Sat, 18 Mar 2023 at 06:47, Viraj Jasani  wrote:
> >
> > > +1 (non-binding)
> > >
> > > * Signature/Checksum: ok
> > > * Rat check (1.8.0_341): ok
> > >  - mvn clean apache-rat:check
> > > * Built from source (1.8.0_341): ok
> > >  - mvn clean install  -DskipTests
> > > * Built tar from source (1.8.0_341): ok
> > >  - mvn clean package  -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true
> > >
> > > Containerized deployments:
> > > * Deployed and started Hdfs - NN, DN, JN with Hbase 2.5 and Zookeeper 3.7
> > > * Deployed and started JHS, RM, NM
> > > * Hbase, hdfs CRUD looks good
> > > * Sample RowCount MapReduce job looks good
> > >
> > > * S3A tests with scale profile look good
> > >
> > >
> > > On Wed, Mar 15, 2023 at 12:48 PM Steve Loughran wrote:
> > >
> > > > Apache Hadoop 3.3.5
> > > >
> > > > Mukund and I have put together a release candidate (RC3) for Hadoop
> > > > 3.3.5.
> > > >
> > > > What we would like is for anyone who can to verify the tarballs,
> > > > especially anyone who can try the arm64 binaries as we want to include
> > > > them too.
> > > >
> > > > The RC is available at:
> > > > https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC3/
> > > >
> > > > The git tag is release-3.3.5-RC3, commit 706d88266ab
> > > >
> > > > The maven artifacts are staged at
> > > > https://repository.apache.org/content/repositories/orgapachehadoop-1369/
> > > >
> > > > You can find my public key at:
> > > > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> > > >
> > > > Change log
> > > > https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC3/CHANGELOG.md
> > > >
> > > > Release notes
> > > > https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC3/RELEASENOTES.md
> > > >
> > > > This is off branch-3.3 and is the first big release since 3.3.2.
> > > >
> > > > Key changes include
> > > >
> > > > * Big update of dependencies to try and keep those reports of
> > > >   transitive CVEs under control -both genuine and false positives.
> > > > * HDFS RBF enhancements
> > > > * Critical fix to ABFS input stream prefetching for correct reading.
> > > > * Vectored IO API for all FSDataInputStream implementations, with
> > > >   high-performance versions for file:// and s3a:// filesystems.
> > > >   file:// through java native io
> > > >   s3a:// parallel GET requests.
> > > > * This release includes Arm64 binaries. Please can anyone with
> > > >   compatible systems validate these.
> > > > * and compared to the previous RC, all the major changes are
> > > >   HDFS issues.
> > > >
> > > > Note, because the arm64 binaries are built separately on a different
> > > > platform and JVM, their jar files may not match those of the x86
> > > > release -and therefore the maven artifacts. I don't think this is
> > > > an issue (the ASF actually releases source tarballs, the binaries are
> > > > there for help only, though with the maven repo that's a bit
> > > > blurred).
> > > >
> > > > The only way to be consistent would be to actually untar the x86.tar.gz,
> > > > overwrite its binaries with the arm stuff, retar, sign and push out
> > > > for the vote. Even automating that would be risky.
> > > >
> > > > Please try the release and vote. The vote will run for 5 days.
> > > >
> > > > -Steve
> > > >
> > >
> >
>


[jira] [Resolved] (MAPREDUCE-7375) JobSubmissionFiles don't set right permission after mkdirs

2023-01-12 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved MAPREDUCE-7375.
--
Fix Version/s: 3.4.0
   3.2.5
   3.3.9
   Resolution: Fixed

> JobSubmissionFiles don't set right permission after mkdirs
> --
>
> Key: MAPREDUCE-7375
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7375
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.3.2
>Reporter: Zhang Dongsheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.5, 3.3.9
>
> Attachments: MAPREDUCE-7375.patch
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> JobSubmissionFiles provides getStagingDir to get the staging directory. If
> stagingArea is missing, the method creates a new directory with this call:
> {quote}fs.mkdirs(stagingArea, new FsPermission(JOB_DIR_PERMISSION));{quote}
> It appears to create the new directory with JOB_DIR_PERMISSION, but the umask
> is applied to this permission. If the umask is too strict, the resulting
> permission may be 000 (if the umask is 700). So we should set the permission
> explicitly after creating the directory.
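
The arithmetic behind the report can be sketched as follows. This models only
the usual "requested mode AND NOT umask" rule with plain integers; it does not
call the Hadoop FileSystem API, the class and method names are hypothetical,
and JOB_DIR_PERMISSION is taken to be 0700 as the description implies.

```java
public class UmaskSketch {
    // Effective mode of a newly created directory: the requested permission
    // bits with every bit that is set in the umask cleared.
    public static int effectiveMode(int requested, int umask) {
        return requested & ~umask & 0777;
    }

    public static void main(String[] args) {
        int jobDirPermission = 0700; // assumed value of JOB_DIR_PERMISSION (rwx------)

        // With a typical umask the staging directory keeps its intended mode.
        System.out.println(String.format("%03o", effectiveMode(jobDirPermission, 0022))); // prints 700

        // With the strict umask from the report every owner bit is masked out.
        System.out.println(String.format("%03o", effectiveMode(jobDirPermission, 0700))); // prints 000
    }
}
```

This is why the description proposes setting the permission explicitly after
the directory has been created, rather than relying on the mode passed to
mkdirs.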



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.3.5

2023-01-04 Thread Chris Nauroth
Is it a problem limited to MiniDFSCluster, or is it a broader problem of
RPC client resource cleanup? The patch is changing connection close
cleanup, so I assumed the latter. If so, then it could potentially impact
applications integrating with the RPC clients.

If the problem is limited to MiniDFSCluster and restarts within a single
JVM, then I agree the impact is smaller. Then, we'd want to consider what
downstream projects have tests that do restarts on a MiniDFSCluster.

Chris Nauroth


On Wed, Jan 4, 2023 at 4:22 PM Ayush Saxena  wrote:

>> Hmm I'm looking at HADOOP-11867 related stuff but couldn't find it
>> mentioned anywhere in change log or release notes. Are they actually
>> up-to-date?
>
>
> I don't think there is any issue with the ReleaseNotes generation as such
> but with the Resolution type of this ticket, It ain't marked as Fixed but
> Done. The other ticket which is marked Done is also not part of the release
> notes. [1]
>
>> if I'm understanding the potential impact of HDFS-16853
>> correctly, then it's serious enough to fix before a release. (I could
>> change my vote if someone wants to make a case that it's not that
>> serious.)
>>
>
> Chris, I just had a very quick look at HDFS-16853, and I am not sure whether
> this can happen outside a MiniDFSCluster setup; I am just guessing from the
> description in the ticket. It looked like the problem occurred when we
> restarted the Namenode in the MiniDFSCluster, which I guess would be in the
> same single JVM, and that is why a previously blocked thread caused issues
> with the restart. That is what I understood; I haven't checked the code
> though.
>
> Second, in the same context, I am curious: if this ends up being a
> MiniDFSCluster-only issue, do we still consider it a release blocker? I am
> not saying it wouldn't be serious. MiniDFSCluster is very widely used by
> downstream projects, so I just wanted to know.
>
> Regarding Hive and Bouncy Castle: the PR seems to have a valid binding
> veto, and I am not sure it will get done any time soon, so if the use case
> is required, I would suggest handling it in Hadoop itself. It seems to be
> specific to Hive-3.x; I tried compiling the Hive master branch with 3.3.5
> and it passed. Other than that, Hive officially supports only Hadoop-3.3.1,
> and that too only in the last 4.x release [2].
>
>
> [1]
> https://issues.apache.org/jira/browse/HADOOP-11867?jql=project%20%3D%20HADOOP%20AND%20resolution%20%3D%20Done%20AND%20fixVersion%20%3D%203.3.5%20ORDER%20BY%20resolution%20DESC
> [2] https://issues.apache.org/jira/browse/HIVE-24484
>
> -Ayush
>
> On Tue, 3 Jan 2023 at 23:51, Chris Nauroth  wrote:
>
>> -1, because if I'm understanding the potential impact of HDFS-16853
>> correctly, then it's serious enough to fix before a release. (I could
>> change my vote if someone wants to make a case that it's not that
>> serious.)
>>
>> Otherwise, this RC was looking good:
>>
>> * Verified all checksums.
>> * Verified all signatures.
>> * Built from source, including native code on Linux.
>>     * mvn clean package -Pnative -Psrc -Drequire.openssl -Drequire.snappy
>>       -Drequire.zstd -DskipTests
>> * Tests passed.
>>     * mvn --fail-never clean test -Pnative -Dparallel-tests
>>       -Drequire.snappy -Drequire.zstd -Drequire.openssl
>>       -Dsurefire.rerunFailingTestsCount=3 -DtestsThreadCount=8
>> * Checked dependency tree to make sure we have all of the expected library
>>   updates that are mentioned in the release notes.
>>     * mvn -o dependency:tree
>> * Farewell, S3Guard.
>> * Confirmed that hadoop-openstack is now just a stub placeholder artifact
>>   with no code.
>> * For ARM verification:
>>     * Ran "file " on all native binaries in the ARM tarball to confirm
>>       they actually came out with ARM as the architecture.
>>     * Output of hadoop checknative -a on ARM looks good.
>>     * Ran a MapReduce job with the native bzip2 codec for compression, and
>>       it worked fine.
>>     * Ran a MapReduce job with YARN configured to use
>>       LinuxContainerExecutor and verified launching the containers through
>>       container-executor worked.
>>
>> My local setup didn't have the test failures mentioned by Viraj, though
>> there was some flakiness with a few HDFS snapshot tests timing out.
>>
>> Regarding Hive and Bouncy Castle, there is an existing issue and pull
>> request tracking an upgrade attempt. It's looking like some amount of code
>> changes are required:
>>
>> https://issues.apache.org/jira/browse/HIVE-26648
>> https://github.com/apache/hive/pull/3744
>>
>> Chris Nauroth
>>
>>
>> On Tue, Jan 3, 2

Re: [VOTE] Release Apache Hadoop 3.3.5

2023-01-03 Thread Chris Nauroth
-1, because if I'm understanding the potential impact of HDFS-16853
correctly, then it's serious enough to fix before a release. (I could
change my vote if someone wants to make a case that it's not that serious.)

Otherwise, this RC was looking good:

* Verified all checksums.
* Verified all signatures.
* Built from source, including native code on Linux.
    * mvn clean package -Pnative -Psrc -Drequire.openssl -Drequire.snappy
      -Drequire.zstd -DskipTests
* Tests passed.
    * mvn --fail-never clean test -Pnative -Dparallel-tests
      -Drequire.snappy -Drequire.zstd -Drequire.openssl
      -Dsurefire.rerunFailingTestsCount=3 -DtestsThreadCount=8
* Checked dependency tree to make sure we have all of the expected library
  updates that are mentioned in the release notes.
    * mvn -o dependency:tree
* Farewell, S3Guard.
* Confirmed that hadoop-openstack is now just a stub placeholder artifact
  with no code.
* For ARM verification:
    * Ran "file " on all native binaries in the ARM tarball to confirm
      they actually came out with ARM as the architecture.
    * Output of hadoop checknative -a on ARM looks good.
    * Ran a MapReduce job with the native bzip2 codec for compression, and
      it worked fine.
    * Ran a MapReduce job with YARN configured to use
      LinuxContainerExecutor and verified launching the containers through
      container-executor worked.

My local setup didn't have the test failures mentioned by Viraj, though
there was some flakiness with a few HDFS snapshot tests timing out.

Regarding Hive and Bouncy Castle, there is an existing issue and pull
request tracking an upgrade attempt. It's looking like some amount of code
changes are required:

https://issues.apache.org/jira/browse/HIVE-26648
https://github.com/apache/hive/pull/3744

Chris Nauroth


On Tue, Jan 3, 2023 at 8:57 AM Chao Sun  wrote:

> Hmm I'm looking at HADOOP-11867 related stuff but couldn't find it
> mentioned anywhere in change log or release notes. Are they actually
> up-to-date?
>
> On Mon, Jan 2, 2023 at 7:48 AM Masatake Iwasaki
>  wrote:
> >
> > >- building HBase 2.4.13 and Hive 3.1.3 against 3.3.5 failed due to
> > >  dependency change.
> >
> > > For HBase, classes under com/sun/jersey/json/* and com/sun/xml/* are not
> > > expected in hbase-shaded-with-hadoop-check-invariants.
> > > Updating hbase-shaded/pom.xml is expected to be the fix as done in
> > > HBASE-27292.
> > > https://github.com/apache/hbase/commit/00612106b5fa78a0dd198cbcaab610bd8b1be277
> >
> >[INFO] --- exec-maven-plugin:1.6.0:exec (check-jar-contents-for-stuff-with-hadoop) @ hbase-shaded-with-hadoop-check-invariants ---
> >[ERROR] Found artifact with unexpected contents: '/home/rocky/srcs/bigtop/build/hbase/rpm/BUILD/hbase-2.4.13/hbase-shaded/hbase-shaded-client/target/hbase-shaded-client-2.4.13.jar'
> >Please check the following and either correct the build or update
> >the allowed list with reasoning.
> >
> >com/
> >com/sun/
> >com/sun/jersey/
> >com/sun/jersey/json/
> >...
> >
> >
> > > For Hive, classes belonging to org.bouncycastle:bcprov-jdk15on:1.68 seem
> > > to be problematic.
> > > Excluding them on hive-jdbc might be the fix.
> >
> >[ERROR] Failed to execute goal org.apache.maven.plugins:maven-shade-plugin:3.2.1:shade (default) on project hive-jdbc: Error creating shaded jar: Problem shading JAR /home/rocky/.m2/repository/org/bouncycastle/bcprov-jdk15on/1.68/bcprov-jdk15on-1.68.jar entry META-INF/versions/15/org/bouncycastle/jcajce/provider/asymmetric/edec/SignatureSpi$EdDSA.class: java.lang.IllegalArgumentException: Unsupported class file major version 59 -> [Help 1]
> >...
> >
> >
> > On 2023/01/02 22:02, Masatake Iwasaki wrote:
> > > Thanks for your great effort for the new release, Steve and Mukund.
> > >
> > > +1 while it would be nice if we can address missed Javadocs.
> > >
> > > + verified the signature and checksum.
> > > > + built from source tarball on Rocky Linux 8 and OpenJDK 8 with native
> > > >   profile enabled.
> > > >   + launched pseudo distributed cluster including kms and httpfs with
> > > >     Kerberos and SSL enabled.
> > > >   + created encryption zone, put and read files via httpfs.
> > > >   + ran example MR wordcount over encryption zone.
> > > > + built rpm packages by Bigtop and ran smoke-tests on Rocky Linux 8
> > > >   (both x86_64 and aarch64).
> > > >   - building HBase 2.4.13 and Hive 3.1.3 against 3.3.5 failed due to
> > > >     dependency change.
> > > >     # while building HBase 2.4.13 and Hive 3.1.3 against Hadoop 3.3.4
> > > >       worked.
> > > > + skimmed the site contents.
> > > >   - Javadocs are not contained (under r3.3.5/
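
On the "Unsupported class file major version 59" message in the Hive build log
quoted in this thread: the number is the class file format version. For Java 5
and later, the major version is the Java release number plus 44, so 59
corresponds to Java 15, which lines up with the META-INF/versions/15 entry of
the bcprov jar named in the error. A plausible reading, not confirmed in the
thread, is that the bytecode reader used by that maven-shade-plugin version
predates Java 15. A minimal sketch of the mapping, with a hypothetical class
name:

```java
public class ClassFileVersionSketch {
    // Maps a class file major version to the Java release that introduced it.
    // Valid for Java 5 (major version 49) and later; earlier versions used a
    // different naming scheme (1.1 to 1.4) and are rejected here.
    public static int javaReleaseFor(int majorVersion) {
        if (majorVersion < 49) {
            throw new IllegalArgumentException("pre-Java 5 class file version: " + majorVersion);
        }
        return majorVersion - 44;
    }

    public static void main(String[] args) {
        System.out.println(javaReleaseFor(52)); // prints 8
        System.out.println(javaReleaseFor(55)); // prints 11
        System.out.println(javaReleaseFor(59)); // prints 15
    }
}
```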

Re: [VOTE] Release Apache Hadoop 3.3.5

2022-12-27 Thread Chris Nauroth
I'm not quite ready to vote yet, pending some additional testing.

However, I wanted to give a quick update that ARM support is looking good
from my perspective. I focused on verifying the native bits that would need
to be different for ARM vs. x64. Here is what I did:
* Ran "file " on all native binaries in the ARM tarball to confirm they
actually came out with ARM as the architecture.
* Output of hadoop checknative -a on ARM looks good.
* Ran a MapReduce job with the native bzip2 codec for compression, and it
worked fine.
* Ran a MapReduce job with YARN configured to use LinuxContainerExecutor
and verified launching the containers through container-executor worked.

Chris Nauroth


On Wed, Dec 21, 2022 at 11:29 AM Steve Loughran 
wrote:

> Mukund and I have put together a release candidate (RC0) for Hadoop 3.3.5.
>
> Given the time of year it's a bit unrealistic to run a 5 day vote and
> expect people to be able to test it thoroughly enough to make this the one
> we can ship.
>
> What we would like is for anyone who can to verify the tarballs, and test
> the binaries, especially anyone who can try the arm64 binaries. We've got
> the building of those done and now the build file will incorporate them
> into the release -but neither of us have actually tested it yet. Maybe I
> should try it on my pi400 over xmas.
>
> The maven artifacts are up on the apache staging repo -they are the ones
> from x86 build. Building and testing downstream apps will be incredibly
> helpful.
>
> The RC is available at:
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC0/
>
> The git tag is release-3.3.5-RC0, commit 3262495904d
>
> The maven artifacts are staged at
> https://repository.apache.org/content/repositories/orgapachehadoop-1365/
>
> You can find my public key at:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
> Change log
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC0/CHANGELOG.md
>
> Release notes
>
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.5-RC0/RELEASENOTES.md
>
> This is off branch-3.3 and is the first big release since 3.3.2.
>
> Key changes include
>
> * Big update of dependencies to try and keep those reports of
>   transitive CVEs under control -both genuine and false positive.
> * HDFS RBF enhancements
> * Critical fix to ABFS input stream prefetching for correct reading.
> * Vectored IO API for all FSDataInputStream implementations, with
>   high-performance versions for file:// and s3a:// filesystems.
>   file:// through java native io
>   s3a:// parallel GET requests.
> * This release includes Arm64 binaries. Please can anyone with
>   compatible systems validate these.
>
>
> Please try the release and vote on it, even though i don't know what is a
> good timeline here...i'm actually going on holiday in early jan. Mukund is
> around and so can drive the process while I'm offline.
>
> Assuming we do have another iteration, the RC1 will not be before mid jan
> for that reason
>
> Steve (and mukund)
>


[jira] [Resolved] (MAPREDUCE-7425) Document Fix for yarn.app.mapreduce.client-am.ipc.max-retries

2022-11-01 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved MAPREDUCE-7425.
--
Fix Version/s: 3.4.0
   3.3.5
   3.2.5
 Assignee: teng wang
   Resolution: Fixed

> Document Fix for yarn.app.mapreduce.client-am.ipc.max-retries
> -
>
> Key: MAPREDUCE-7425
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7425
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.3.4
>Reporter: teng wang
>Assignee: teng wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5, 3.2.5
>
>
> The documentation of *yarn.app.mapreduce.client-am.ipc.max-retries* and
> *yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts* is not detailed
> or complete. *yarn.app.mapreduce.client-am.ipc.max-retries* is used to
> overwrite *ipc.client.connect.max.retries* in ClientServiceDelegate.java. So
> the documentation should be amended as follows (refer to
> yarn.client.failover-retries):
>  
> {code:java}
> // mapred-default.xml
> <property>
>   <name>yarn.app.mapreduce.client-am.ipc.max-retries</name>
>   <value>3</value>
>   <description>The number of client retries to the AM - before reconnecting
> -    to the RM to fetch Application Status.</description>
> +    to the RM to fetch Application Status. 
> +    In other words, it is the ipc.client.connect.max.retries to be used during
> +    reconnecting to the RM and fetching Application Status.</description>
>  {code}
>  
>  
>  
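
The override described in this issue can be illustrated with a small sketch. It
uses a plain Map in place of Hadoop's Configuration class, and the class and
method names are hypothetical; only the two property names and the default of 3
come from the description.

```java
import java.util.HashMap;
import java.util.Map;

public class RetryOverrideSketch {
    static final String AM_RETRIES = "yarn.app.mapreduce.client-am.ipc.max-retries";
    static final String IPC_RETRIES = "ipc.client.connect.max.retries";
    static final int DEFAULT_AM_RETRIES = 3;

    // Returns a copy of the configuration in which the generic IPC retry count
    // is overwritten by the client-to-AM retry count (or its default). The
    // caller's map is left untouched.
    public static Map<String, String> clientConf(Map<String, String> conf) {
        Map<String, String> copy = new HashMap<>(conf);
        String amRetries = copy.getOrDefault(AM_RETRIES, String.valueOf(DEFAULT_AM_RETRIES));
        copy.put(IPC_RETRIES, amRetries);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(IPC_RETRIES, "10");
        // The copy used by the client carries 3, the original still carries 10.
        System.out.println(clientConf(conf).get(IPC_RETRIES)); // prints 3
        System.out.println(conf.get(IPC_RETRIES));             // prints 10
    }
}
```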






Re: [VOTE] Release Apache Hadoop 3.3.4

2022-08-03 Thread Chris Nauroth
+1 (binding)

* Verified all checksums.
* Verified all signatures.
* Built from source, including native code on Linux.
    * mvn clean package -Pnative -Psrc -Drequire.openssl -Drequire.snappy
      -Drequire.zstd -DskipTests
* Tests passed.
    * mvn --fail-never clean test -Pnative -Dparallel-tests
      -Drequire.snappy -Drequire.zstd -Drequire.openssl
      -Dsurefire.rerunFailingTestsCount=3 -DtestsThreadCount=8
* Checked dependency tree to make sure we have all of the expected library
  updates that are mentioned in the release notes.
    * mvn -o dependency:tree

I saw a LibHDFS test failure, but I know it's something flaky that's
already tracked in a JIRA issue. The release looks good. Steve, thank you
for driving this.

Chris Nauroth


On Wed, Aug 3, 2022 at 11:27 AM Steve Loughran 
wrote:

> my vote for this is +1, binding.
>
> obviously I'm biased, but i do not want to have to issue any more interim
> releases before the feature release off branch-3.3, so I am trying to be
> ruthless.
>
> my client validator ant project has more targets to help with releasing,
> and now builds a lot more of my local projects
> https://github.com/steveloughran/validate-hadoop-client-artifacts
> all good as far as my test coverage goes, with these projects validating
> the staged dependencies.
>
> now, who else can review
>
> On Fri, 29 Jul 2022 at 19:47, Steve Loughran  wrote:
>
> >
> >
> > I have put together a release candidate (RC1) for Hadoop 3.3.4
> >
> > The RC is available at:
> > https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/
> >
> > The git tag is release-3.3.4-RC1, commit a585a73c3e0
> >
> > The maven artifacts are staged at
> > https://repository.apache.org/content/repositories/orgapachehadoop-1358/
> >
> > You can find my public key at:
> > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> >
> > Change log
> >
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/CHANGELOG.md
> >
> > Release notes
> >
> >
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/RELEASENOTES.md
> >
> > There's a very small number of changes, primarily critical code/packaging
> > issues and security fixes.
> >
> > See the release notes for details.
> >
> > Please try the release and vote. The vote will run for 5 days.
> >
> > steve
> >
>


[jira] [Resolved] (MAPREDUCE-7372) MapReduce set permission too late in copyJar method

2022-07-25 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved MAPREDUCE-7372.
--
Fix Version/s: 3.4.0
   3.3.9
   3.2.5
 Hadoop Flags: Reviewed
   Resolution: Fixed

Committed fix to trunk, branch-3.3 and branch-3.2. [~skysider], thank you for 
the contribution.

> MapReduce set permission too late in copyJar method
> ---
>
> Key: MAPREDUCE-7372
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7372
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.3.1
>Reporter: Zhang Dongsheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9, 3.2.5
>
> Attachments: MAPREDUCE-7372.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> When copyJar runs in JobResourceUploader, setPermission is called after 
> setReplication, but setReplication needs the permission to be in place first. 
> So if the project sets a restrictive umask such as 0600, the MapReduce process 
> will fail.
> In the patch file, I put setPermission before setReplication.
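
The ordering problem described in this issue can be illustrated with a toy model. This is a minimal sketch, not Hadoop code: ToyFs, copy_jar_buggy and copy_jar_fixed are hypothetical names, and the permission rule used here (changing replication needs the owner write bit, while the owner may always change the mode) is an assumption made for illustration only.

```python
class ToyFs:
    """In-memory stand-in for a filesystem; not the Hadoop FileSystem API."""

    def __init__(self, umask):
        self.umask = umask
        self.modes = {}        # path -> permission bits
        self.replication = {}  # path -> replication factor

    def create(self, path):
        # New files start from 0o666 with the umask bits cleared.
        self.modes[path] = 0o666 & ~self.umask
        self.replication[path] = 1

    def set_permission(self, path, mode):
        # Assumption: the owner may always change the mode of its own file.
        self.modes[path] = mode

    def set_replication(self, path, factor):
        # Assumption: changing replication needs the owner write bit.
        if not self.modes[path] & 0o200:
            raise PermissionError(f"no write access to {path}")
        self.replication[path] = factor


def copy_jar_buggy(fs, path):
    """Order reported in the issue: replication first, permission second."""
    fs.create(path)
    fs.set_replication(path, 10)    # fails under a restrictive umask
    fs.set_permission(path, 0o644)


def copy_jar_fixed(fs, path):
    """Order proposed in the patch: permission first, replication second."""
    fs.create(path)
    fs.set_permission(path, 0o644)  # grant access before anything else
    fs.set_replication(path, 10)
```

With a umask of 0o600 the buggy order raises PermissionError, while the fixed order succeeds; with a common umask such as 0o022 both orders work, which may be why the problem went unnoticed.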



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.2.4 - RC0

2022-07-21 Thread Chris Nauroth
I'm changing my vote to +1 (binding).

Masatake and Ashutosh, thank you for investigating.

I reran tests without the parallel options, and that mostly addressed the
failures. Maybe the tests in question are just not sufficiently isolated to
support parallel execution. That looks to be the case for TestFsck, where
the failure was caused by missing audit log entries. This test works by
toggling global logging state, so I can see why multi-threaded execution
might confuse the test.
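
As a toy illustration of that hypothesis (not the actual TestFsck code, and not a diagnosis), the sketch below shows how two tests that toggle one shared logging switch can lose an audit entry when their steps interleave. All names are hypothetical, and the interleaving is written out by hand instead of using real threads so that the outcome is deterministic.

```python
audit_enabled = False  # global state shared by every test in the process
audit_log = []


def set_audit(enabled):
    global audit_enabled
    audit_enabled = enabled


def audited_action(name):
    # An entry is recorded only while the global switch is on.
    if audit_enabled:
        audit_log.append(name)


def run_serial():
    """Each test enables logging, acts, then restores the switch."""
    audit_log.clear()
    for test in ("test_a", "test_b"):
        set_audit(True)
        audited_action(test)
        set_audit(False)
    return list(audit_log)


def run_interleaved():
    """One possible schedule when the two tests overlap in time."""
    audit_log.clear()
    set_audit(True)            # test_a enables audit logging
    set_audit(True)            # test_b enables audit logging
    audited_action("test_b")   # test_b acts and is logged
    set_audit(False)           # test_b restores the global switch
    audited_action("test_a")   # test_a acts, but logging is now off
    set_audit(False)           # test_a restores the global switch
    return list(audit_log)
```

In the serial run both entries are present; in the interleaved run the entry for test_a is missing, which matches the symptom of missing audit log entries.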

Chris Nauroth


On Thu, Jul 21, 2022 at 12:01 AM Ashutosh Gupta 
wrote:

> +1(non-binding)
>
> * Builds from source look good.
> * Checksums and signatures are correct.
> * Running basic HDFS and MapReduce commands looks good.
>
> > * TestAMRMProxy - Not able to reproduce locally
> > * TestFsck - The only failure I can see is
> >   TestFsck.testFsckListCorruptSnapshotFiles, which passed after applying
> >   HDFS-15038
> > * TestSLSStreamAMSynth - Not able to reproduce locally
> > * TestServiceAM - Not able to reproduce locally
>
> Thanks Masatake for driving this release.
>
> On Thu, Jul 21, 2022 at 5:51 AM Masatake Iwasaki <
> iwasak...@oss.nttdata.com>
> wrote:
>
> > Hi developers,
> >
> > I'm still waiting for your vote.
> > I consider the intermittent test failures mentioned by Chris not to be
> > blockers.
> > Please file a JIRA and let me know if you find a blocker issue.
> >
> > I will appreciate your help for the release process.
> >
> > Regards,
> > Masatake Iwasaki
> >
> > On 2022/07/20 14:50, Masatake Iwasaki wrote:
> > >> TestServiceAM
> > >
> > > I can see the reported failure of TestServiceAM in some "Apache Hadoop
> > qbt Report: branch-3.2+JDK8 on Linux/x86_64".
> > > 3.3.0 and above might have been fixed by YARN-8867, which added a guard
> > > using GenericTestUtils#waitFor to stabilize
> > > testContainersReleasedWhenPreLaunchFails.
> > > YARN-8867 did not modify other code under hadoop-yarn-services.
> > > If that is the case, TestServiceAM can be tagged as flaky in branch-3.2.
> > >
> > >
> > > On 2022/07/20 14:21, Masatake Iwasaki wrote:
> > >> Thanks for testing the RC0, Chris.
> > >>
> > >>> The following are new test failures for me on 3.2.4:
> > >>> * TestAMRMProxy
> > >>> * TestFsck
> > >>> * TestSLSStreamAMSynth
> > >>> * TestServiceAM
> > >>
> > >> I could not reproduce the test failures on my local.
> > >>
> > >> For TestFsck, if the failed test case is
> > testFsckListCorruptSnapshotFiles,
> > >> cherry-picking HDFS-15038 (fixing only test code) could be the fix.
> > >>
> > >> The failure of TestSLSStreamAMSynth looks frequently reported by
> > >> "Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64".
> > >> It could be tagged as known flaky test.
> > >>
> > >> On 2022/07/20 9:15, Chris Nauroth wrote:
> > >>> -0 (binding)
> > >>>
> > >>> * Verified all checksums.
> > >>> * Verified all signatures.
> > >>> * Built from source, including native code on Linux.
> > >>>  * mvn clean package -Pnative -Psrc -Drequire.openssl
> > -Drequire.snappy
> > >>> -Drequire.zstd -DskipTests
> > >>> * Tests mostly passed, but see below.
> > >>>  * mvn --fail-never clean test -Pnative -Dparallel-tests
> > >>> -Drequire.snappy -Drequire.zstd -Drequire.openssl
> > >>> -Dsurefire.rerunFailingTestsCount=3 -DtestsThreadCount=8
> > >>>
> > >>> The following are new test failures for me on 3.2.4:
> > >>> * TestAMRMProxy
> > >>> * TestFsck
> > >>> * TestSLSStreamAMSynth
> > >>> * TestServiceAM
> > >>>
> > >>> The following tests also failed, but they also fail for me on 3.2.3,
> so
> > >>> they aren't likely to be related to this release candidate:
> > >>> * TestCapacitySchedulerNodeLabelUpdate
> > >>> * TestFrameworkUploader
> > >>> * TestSLSGenericSynth
> > >>> * TestSLSRunner
> > >>> * test_libhdfs_threaded_hdfspp_test_shim_static
> > >>>
> > >>> I'm not voting a full -1, because I haven't done any root cause
> > analysis on
> > >>> these new test failures. I don't know if it's a quirk to my
> > environment,
> > >>> though I'm using the start-build-env.sh Docker co

Re: [VOTE] Release Apache Hadoop 3.2.4 - RC0

2022-07-19 Thread Chris Nauroth
-0 (binding)

* Verified all checksums.
* Verified all signatures.
* Built from source, including native code on Linux.
* mvn clean package -Pnative -Psrc -Drequire.openssl -Drequire.snappy
-Drequire.zstd -DskipTests
* Tests mostly passed, but see below.
* mvn --fail-never clean test -Pnative -Dparallel-tests
-Drequire.snappy -Drequire.zstd -Drequire.openssl
-Dsurefire.rerunFailingTestsCount=3 -DtestsThreadCount=8

The following are new test failures for me on 3.2.4:
* TestAMRMProxy
* TestFsck
* TestSLSStreamAMSynth
* TestServiceAM

The following tests also failed, but they also fail for me on 3.2.3, so
they aren't likely to be related to this release candidate:
* TestCapacitySchedulerNodeLabelUpdate
* TestFrameworkUploader
* TestSLSGenericSynth
* TestSLSRunner
* test_libhdfs_threaded_hdfspp_test_shim_static

I'm not voting a full -1, because I haven't done any root cause analysis on
these new test failures. I don't know if it's a quirk to my environment,
though I'm using the start-build-env.sh Docker container, so any build
dependencies should be consistent. I'd be comfortable moving ahead if
others are seeing these tests pass.

Chris Nauroth


On Thu, Jul 14, 2022 at 7:57 AM Masatake Iwasaki 
wrote:

> +1 from myself.
>
> * skimmed the contents of site documentation.
>
> * built the source tarball on Rocky Linux 8 (x86_64) by OpenJDK 8 with
> `-Pnative`.
>
> * launched pseudo distributed cluster including kms and httpfs with
> Kerberos and SSL enabled.
>
>* created encryption zone, put and read files via httpfs.
>* ran example MR wordcount over encryption zone.
>
> * launched 3-node docker cluster with NN-HA and RM-HA enabled and ran some
> example MR jobs.
>
> * built HBase 2.4.11, Hive 3.1.2 and Spark 3.1.2 against Hadoop 3.2.4 RC0
>on CentOS 7 (x86_64) by using Bigtop branch-3.1 and ran smoke-tests.
>https://github.com/apache/bigtop/pull/942
>
>* Hive needs updating exclusion rule to address HADOOP-18088 (migration
> to reload4j).
>
> * built Spark 3.3.0 against Hadoop 3.2.4 RC0 using the staging repository::
>
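>     <repository>
>       <id>staged</id>
>       <name>staged-releases</name>
>       <url>https://repository.apache.org/content/repositories/orgapachehadoop-1354</url>
>       <releases>
>         <enabled>true</enabled>
>       </releases>
>       <snapshots>
>         <enabled>true</enabled>
>       </snapshots>
>     </repository>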
>
> Thanks,
> Masatake Iwasaki
>
> On 2022/07/13 1:14, Masatake Iwasaki wrote:
> > Hi all,
> >
> > Here's Hadoop 3.2.4 release candidate #0:
> >
> > The RC is available at:
> >https://home.apache.org/~iwasakims/hadoop-3.2.4-RC0/
> >
> > The RC tag is at:
> >https://github.com/apache/hadoop/releases/tag/release-3.2.4-RC0
> >
> > The Maven artifacts are staged at:
> >
> https://repository.apache.org/content/repositories/orgapachehadoop-1354
> >
> > You can find my public key at:
> >https://downloads.apache.org/hadoop/common/KEYS
> >
> > Please evaluate the RC and vote.
> > The vote will be open for (at least) 5 days.
> >
> > Thanks,
> > Masatake Iwasaki
> >
> > -
> > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> >
>
> -
> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>
>


Re: [VOTE] Release Apache Hadoop 2.10.2 - RC0

2022-05-29 Thread Chris Nauroth
+1 (binding)

* Verified all checksums.
* Verified all signatures.
* Built from source, including native code on Linux.
* mvn clean package -Pnative -Psrc -Drequire.openssl -Drequire.snappy
-Drequire.zstd -DskipTests
* Almost all unit tests passed.
* mvn clean test -Pnative -Dparallel-tests -Drequire.snappy
-Drequire.zstd -Drequire.openssl -Dsurefire.rerunFailingTestsCount=3
-DtestsThreadCount=8
* TestBookKeeperHACheckpoints consistently has a few failures.
* TestCapacitySchedulerNodeLabelUpdate is flaky, intermittently timing
out.

These test failures don't look significant enough to hold up a release, so
I'm still voting +1.

Chris Nauroth


On Sun, May 29, 2022 at 2:35 AM Masatake Iwasaki <
iwasak...@oss.nttdata.co.jp> wrote:

> Thanks for the help, Ayush.
>
> I committed HADOOP-16663/HADOOP-16664 and cherry-picked HADOOP-16985 to
> branch-2.10 (and branch-3.2).
> If I need to cut RC1, I will try cherry-picking them to branch-2.10.2
>
> Masatake Iwasaki
>
>
> On 2022/05/28 5:23, Ayush Saxena wrote:
> > The checksum stuff was addressed in HADOOP-16985, so that filename stuff
> is
> > sorted only post 3.3.x
> > BTW it is a known issue:
> >
> https://issues.apache.org/jira/browse/HADOOP-16494?focusedCommentId=16927236&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16927236
> >
> > Must not be a blocker for us
> >
> > The RAT check is failing with a dependency issue. That should also work from
> > 3.3.x onward, because the hadoop-yarn-api module no longer depends on
> > hadoop-maven-plugins there; HADOOP-16560 removed it.
> > Ref:
> >
> https://github.com/apache/hadoop/pull/1496/files#diff-f5d219eaf211871f9527ae48da59586e7e9958ea7649de74a1393e599caa6dd6L121-R122
> >
> > So that is why the RAT check passes for 3.3.x+ without needing this
> > module. Committing HADOOP-16663 should solve this, though. (I haven't tried
> > it; this is just from looking at the problem.)
> >
> > These are good patches to have, but they don't look like blockers to me.
> > They are build-related only; nothing is wrong with our core Hadoop code.
> >
> > -Ayush
> >
> > On Sat, 28 May 2022 at 01:04, Viraj Jasani  wrote:
> >
> >> +0 (non-binding),
> >>
> >> * Signature/Checksum looks good, though I am not sure where
> >> "target/artifacts" is coming from for the tars, here is the diff (this
> was
> >> the case for 2.10.1 as well but checksum was correct):
> >>
> >> 1c1
> >> < SHA512 (hadoop-2.10.2-site.tar.gz) =
> >>
> >>
> 3055a830003f5012660d92da68a317e15da5b73301c2c73cf618e724c67b7d830551b16928e0c28c10b66f04567e4b6f0b564647015bacc4677e232c0011537f
> >> ---
> >>> SHA512 (target/artifacts/hadoop-2.10.2-site.tar.gz) =
> >>
> >>
> 3055a830003f5012660d92da68a317e15da5b73301c2c73cf618e724c67b7d830551b16928e0c28c10b66f04567e4b6f0b564647015bacc4677e232c0011537f
> >> 1c1
> >> < SHA512 (hadoop-2.10.2-src.tar.gz) =
> >>
> >>
> 483b6a4efd44234153e21ffb63a9f551530a1627f983a8837c655ce1b8ef13486d7178a7917ed3f35525c338e7df9b23404f4a1b0db186c49880448988b88600
> >> ---
> >>> SHA512 (target/artifacts/hadoop-2.10.2-src.tar.gz) =
> >>
> >>
> 483b6a4efd44234153e21ffb63a9f551530a1627f983a8837c655ce1b8ef13486d7178a7917ed3f35525c338e7df9b23404f4a1b0db186c49880448988b88600
> >> 1c1
> >> < SHA512 (hadoop-2.10.2.tar.gz) =
> >>
> >>
> 13e95907073d815e3f86cdcc24193bb5eec0374239c79151923561e863326988c7f32a05fb7a1e5bc962728deb417f546364c2149541d6234221b00459154576
> >> ---
> >>> SHA512 (target/artifacts/hadoop-2.10.2.tar.gz) =
> >>
> >>
> 13e95907073d815e3f86cdcc24193bb5eec0374239c79151923561e863326988c7f32a05fb7a1e5bc962728deb417f546364c2149541d6234221b00459154576
> >>
> >> However, checksums are correct.
> >>
> >> * Builds from source look good
> >>   - mvn clean install  -DskipTests
> >>   - mvn clean package  -Pdist -DskipTests -Dtar
> -Dmaven.javadoc.skip=true
> >>
> >> * Rat check, if run before building from source locally, fails with
> error:
> >>
> >> [ERROR] Plugin org.apache.hadoop:hadoop-maven-plugins:2.10.2 or one of
> its
> >> dependencies could not be resolved: Could not find artifact
> >> org.apache.hadoop:hadoop-maven-plugins:jar:2.10.2 in central (
> >> https://repo.maven.apache.org/maven2) -> [Help 1]
> >> [ERROR]
> >>
> >> However, once we build locally, rat check passes (because
> >> hadoop-maven-plugins 2.10.2 would be present in 

Re: [VOTE] Release Apache Hadoop 3.3.3 (RC1)

2022-05-16 Thread Chris Nauroth
+1 (binding)

- Verified all checksums.
- Verified all signatures.
- Built from source, including native code on Linux.
- Ran several examples successfully.

Chris Nauroth


On Mon, May 16, 2022 at 10:06 AM Chao Sun  wrote:

> +1
>
> - Compiled from source
> - Verified checksums & signatures
> - Launched a pseudo HDFS cluster and ran some simple commands
> - Ran full Spark tests with the RC
>
> Thanks Steve!
>
> Chao
>
> On Mon, May 16, 2022 at 2:19 AM Ayush Saxena  wrote:
> >
> > +1,
> > * Built from source.
> > * Successful native build on Ubuntu 18.04
> > * Verified Checksums.
> >
> (CHANGELOG.md,RELEASENOTES.md,hadoop-3.3.3-rat.txt,hadoop-3.3.3-site.tar.gz,hadoop-3.3.3-src.tar.gz,hadoop-3.3.3.tar.gz)
> > * Verified Signature.
> > * Successful RAT check
> > * Ran basic HDFS shell commands.
> > * Ran basic YARN shell commands.
> > * Verified version in hadoop version command and UI
> > * Ran some MR example Jobs.
> > * Browsed UI(Namenode/Datanode/ResourceManager/NodeManager/HistoryServer)
> > * Browsed the contents of Maven Artifacts.
> > * Browsed the contents of the website.
> >
> > Thanx Steve for driving the release, Good Luck!!!
> >
> > -Ayush
> >
> > On Mon, 16 May 2022 at 08:20, Xiaoqiao He  wrote:
> >
> > > +1(binding)
> > >
> > > * Verified signature and checksum of the source tarball.
> > > * Built the source code on Ubuntu and OpenJDK 11 by `mvn clean package
> > > -DskipTests -Pnative -Pdist -Dtar`.
> > > * Setup pseudo cluster with HDFS and YARN.
> > > * Run simple FsShell - mkdir/put/get/mv/rm and check the result.
> > > * Run example mr applications and check the result - Pi & wordcount.
> > > * Check the Web UI of NameNode/DataNode/Resourcemanager/NodeManager
> etc.
> > >
> > > Thanks Steve for your work.
> > >
> > > - He Xiaoqiao
> > >
> > > On Mon, May 16, 2022 at 4:25 AM Viraj Jasani 
> wrote:
> > > >
> > > > +1 (non-binding)
> > > >
> > > > * Signature: ok
> > > > * Checksum : ok
> > > > * Rat check (1.8.0_301): ok
> > > >  - mvn clean apache-rat:check
> > > > * Built from source (1.8.0_301): ok
> > > >  - mvn clean install  -DskipTests
> > > > * Built tar from source (1.8.0_301): ok
> > > >  - mvn clean package  -Pdist -DskipTests -Dtar
> -Dmaven.javadoc.skip=true
> > > >
> > > > HDFS, MapReduce and HBase (2.5) CRUD functional testing on
> > > > pseudo-distributed mode looks good.
> > > >
> > > >
> > > > On Wed, May 11, 2022 at 10:26 AM Steve Loughran
> > > 
> > > > wrote:
> > > >
> > > > > I have put together a release candidate (RC1) for Hadoop 3.3.3
> > > > >
> > > > > The RC is available at:
> > > > > https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC1/
> > > > >
> > > > > The git tag is release-3.3.3-RC1, commit d37586cbda3
> > > > >
> > > > > The maven artifacts are staged at
> > > > >
> > >
> https://repository.apache.org/content/repositories/orgapachehadoop-1349/
> > > > >
> > > > > You can find my public key at:
> > > > > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> > > > >
> > > > > Change log
> > > > >
> https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC1/CHANGELOG.md
> > > > >
> > > > > Release notes
> > > > >
> > >
> https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC1/RELEASENOTES.md
> > > > >
> > > > > There's a very small number of changes, primarily critical
> > > code/packaging
> > > > > issues and security fixes.
> > > > >
> > > > > * The critical fixes which shipped in the 3.2.3 release.
> > > > > * CVEs in our code and dependencies
> > > > > * Shaded client packaging issues.
> > > > > * A switch from log4j to reload4j
> > > > >
> > > > > reload4j is an active fork of the log4j 1.2.17 library with the classes
> > > > > which contain CVEs removed. Even though hadoop never used those classes,
> > > > > they regularly raised alerts on security scans and concern from users.
> > > > > Switching to the forked project allows us to ship a secure logging
> > > > > framework. It will complicate the builds of downstream
> > > > > maven/ivy/gradle projects which exclude our log4j artifacts, as they
> > > > > need to cut the new dependency instead/as well.
> > > > >
> > > > > See the release notes for details.
> > > > >
> > > > > This is the second release attempt. It is the same git commit as before,
> > > > > but fully recompiled with another republish to maven staging, which has
> > > > > been verified by building spark, as well as a minimal test project.
> > > > >
> > > > > Please try the release and vote. The vote will run for 5 days.
> > > > >
> > > > > -Steve
> > > > >
> > >
> > > -
> > > To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> > > For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
> > >
> > >
>
> -
> To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
>
>


Re: [DISCUSS] Hadoop 3.3.2 release?

2021-09-08 Thread Chris Nauroth
+1

Chao, thank you very much for volunteering on the release.

Chris Nauroth


On Tue, Sep 7, 2021 at 10:00 PM Igor Dvorzhak 
wrote:

> +1
>
> On Tue, Sep 7, 2021 at 10:06 AM Chao Sun  wrote:
>
>> Hi all,
>>
>> It has been almost 3 months since the 3.3.1 release and branch-3.3 has
>> accumulated quite a few commits (118 atm). In particular, Spark community
>> recently found an issue which prevents one from using the shaded Hadoop
>> client together with certain compression codecs such as lz4 and snappy
>> codec. The details are recorded in HADOOP-17891 and SPARK-36669.
>>
>> Therefore, I'm wondering if anyone is also interested in a 3.3.2 release.
>> If there is no objection, I'd like to volunteer myself for the work as
>> well.
>>
>> Best Regards,
>> Chao
>>
>


Re: [VOTE] Release Apache Hadoop 2.7.3 RC1

2016-08-18 Thread Chris Nauroth
Andrew, thanks for adding your perspective on this.

What is a realistic strategy for us to evolve the HDFS audit log in a 
backward-compatible way?  If the API is essentially any form of ad-hoc 
scripting, then for any proposed audit log format change, I can find a reason 
to veto it on grounds of backward incompatibility.

- I can’t add a new field on the end, because that would break an awk script 
that uses $NF expecting to find a specific field.
- I can’t prepend a new field, because that would break a "cut -f1" expecting 
to find the timestamp.
- HDFS can’t add any new features, because someone might have written a script 
that does "exit 1" if it finds an unexpected RPC in the "cmd=" field.
- Hadoop is not allowed to add full IPv6 support, because someone might have 
written a script that looks at the "ip=" field and parses it by IPv4 syntax.
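
To make the fragility concrete, here is a minimal sketch contrasting positional parsing with key-based parsing. The sample lines are a simplified assumption about the audit format (tab-separated key=value pairs, without the timestamp and logger prefix), and the appended callerContext field is only a hypothetical example of a new trailing field.

```python
def parse_by_position(line):
    """Brittle: assumes the last tab-separated field is always the same one."""
    return line.split("\t")[-1]


def parse_by_key(line):
    """More tolerant: collect key=value pairs and ignore unknown keys."""
    fields = {}
    for token in line.split("\t"):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            fields[key] = value
    return fields


# Simplified sample lines; the field names are assumed for illustration.
OLD_LINE = (
    "allowed=true\tugi=alice\tip=/10.0.0.1\tcmd=open"
    "\tsrc=/data/f\tdst=null\tperm=null\tproto=rpc"
)
NEW_LINE = OLD_LINE + "\tcallerContext=job_42"  # hypothetical added field
```

A consumer like parse_by_position silently returns a different field once anything is appended, while parse_by_key keeps finding proto and cmd. Key-based parsing does not address the other two bullets: a script can still choke on an unexpected cmd value or on IPv6 syntax inside ip=.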

On the CLI, a potential solution for evolving the output is to preserve the old 
format by default and only enable the new format if the user explicitly passes 
a new argument.  What should we do for the audit log?  Configuration flags in 
hdfs-site.xml?  (That of course adds its own brand of complexity.)

I’m particularly interested to hear potential solutions from people like Andrew 
and Allen who have been most vocal about the need for a stable format.  Without 
a solution, this unfortunately devolves into the format being frozen within a 
major release line.

We could benefit from getting a patch on the compatibility doc that addresses 
the HDFS audit log specifically. 

--Chris Nauroth

On 8/18/16, 8:47 AM, "Andrew Purtell" <andrew.purt...@gmail.com> wrote:

An incompatible API change is developer unfriendly. An incompatible 
behavioral change is operator unfriendly. Historically, one dimension of 
incompatibility has had a lot more mindshare than the other. It's great that 
this might be changing for the better. 

Where I work when we move from one Hadoop 2.x minor to another we always 
spend time updating our deployment plans, alerting, log scraping, and related 
things due to changes. For some, it is debatable whether they qualify for the 
'incompatible' designation. I think the audit logging change that triggered 
this discussion is a good example of one that does. If you want to audit HDFS 
actions those log emissions are your API. (Inotify doesn't offer access control 
events.) One has to code regular expressions for parsing them and reverse 
engineer under what circumstances an audit line is emitted so you can make 
assumptions about what transpired. Change either and you might break someone's 
automation for meeting industry or legal compliance obligations. Not a trivial 
matter. If you don't operate Hadoop in production you might not realize the 
implications of such a change. Glad to see Hadoop has community diversity to 
recognize it in some cases. 

> On Aug 18, 2016, at 6:57 AM, Junping Du <j...@hortonworks.com> wrote:
> 
> I think Allen's previous comments are very misleading. 
> In my understanding, only incompatible API changes (RPC, CLIs, WebService, etc.) 
shouldn't land on branch-2, but other incompatible behaviors (logs, audit-log, 
daemon's restart, etc.) should have some flexibility to land. Otherwise, how could 
52 issues (https://s.apache.org/xJk5) marked with incompatible-changes have 
landed on branch-2 after the 2.2.0 release? Most of them are already released. 
> 
> Thanks,
> 
> Junping
> 
> From: Vinod Kumar Vavilapalli <vino...@apache.org>
> Sent: Wednesday, August 17, 2016 9:29 PM
> To: Allen Wittenauer
> Cc: common-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; 
yarn-...@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
> Subject: Re: [VOTE] Release Apache Hadoop 2.7.3 RC1
> 
> I always look at CHANGES.txt entries for incompatible-changes and this 
JIRA obviously wasn’t there.
> 
> Anyways, this shouldn’t be in any of branch-2.* as committers there 
clearly mentioned that this is an incompatible change.
> 
> I am reverting the patch from branch-2* .
> 
> Thanks
> +Vinod
> 
>> On Aug 16, 2016, at 9:29 PM, Allen Wittenauer 
<a...@effectivemachines.com> wrote:
>> 
>> 
>> 
>> -1
>> 
>> HDFS-9395 is an incompatible change:
>> 
>> a) Why is it not marked as such in the changes file?
>> b) Why is an incompatible change in a micro release, much less a minor?
>> c) Where is the release note for this change?
>> 
>> 
>>> On Aug 12, 2016, at 9:45 AM, Vinod Kumar Vavilapalli 
<vino...@apache.org> wrote:
>>> 
>>> Hi all,
>>> 
>>> I've created a release candidate RC1

Apache MSDN Offer is Back

2016-07-19 Thread Chris Nauroth
A few months ago, we learned that the offer for ASF committers to get an MSDN 
license had gone away.  I'm happy to report that as of a few weeks ago, that 
offer is back in place.  For more details, committers can check out 
https://svn.apache.org/repos/private/committers and read 
donated-licenses/msdn.txt.

--Chris Nauroth


Re: ASF OS X Build Infrastructure

2016-05-21 Thread Chris Nauroth
Hi Ravi,

Something certainly seems off about that bootstrapping problem you encountered. 
 :-)  When I've done this, the artifact I downloaded was an .iso file, which I 
could then use to install a VirtualBox VM.

I'm now tuned into the discussion Sean referenced about the ASF MSDN program.  
I'll send another update when I have something more specific to share.

--Chris Nauroth

From: Ravi Prakash <ravihad...@gmail.com>
Date: Friday, May 20, 2016 at 4:56 PM
To: Sean Busbey <bus...@cloudera.com>
Cc: Chris Nauroth <cnaur...@hortonworks.com>, 
Steve Loughran <ste...@hortonworks.com>, 
Hadoop Common <common-...@hadoop.apache.org>, 
"mapreduce-dev@hadoop.apache.org" <mapreduce-dev@hadoop.apache.org>, 
"hdfs-...@hadoop.apache.org" <hdfs-...@hadoop.apache.org>, 
"yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>
Subject: Re: ASF OS X Build Infrastructure

FWIW, I was able to get a response from the form last month. I was issued a new 
MSDN subscriber ID using which I could have downloaded Microsoft Visual Studio 
(and some other products, I think). I was interested in downloading an image of 
Windows to run in a VM, but the downloader is... wait for it... an exe file 
:-) Haven't gotten around to begging someone with a Windows OS to run that 
image downloader.

On Fri, May 20, 2016 at 10:39 AM, Sean Busbey <bus...@cloudera.com> wrote:
Some talk about the MSDN-for-committers program recently passed by on a private
list. It's still active, it just changed homes within Microsoft. The
info should still be in the committer repo. If something is amiss
please let me know and I'll pipe up to the folks already plugged in to
confirming it's active.

On Fri, May 20, 2016 at 12:13 PM, Chris Nauroth
<cnaur...@hortonworks.com> wrote:
> It's very disappointing to see that vanish.  I'm following up to see if I
> can learn more about what happened or if I can do anything to help
> reinstate it.
>
> --Chris Nauroth
>
>
>
>
> On 5/20/16, 6:11 AM, "Steve Loughran" 
> <ste...@hortonworks.com> wrote:
>
>>
>>> On 20 May 2016, at 10:40, Lars Francke 
>>> <lars.fran...@gmail.com> wrote:
>>>
>>>>
>>>> Regarding lack of personal access to anything but Linux, I'll take
>>>>this as
>>>> an opportunity to remind everyone that ASF committers (not just
>>>>limited to
>>>> Hadoop committers) are entitled to a free MSDN license, which can get
>>>>you
>>>> a Windows VM for validating Windows issues and any patches that touch
>>>> cross-platform concerns, like the native code.  Contributors who are
>>>>not
>>>> committers still might struggle to get access to Windows, but all of us
>>>> reviewing and committing patches do have access.
>>>>
>>>
>>> Actually, from all I can tell this MSDN offer has been discontinued for
>>> now. All the information has been removed from the committers repo. Do
>>>you
>>> have any more up to date information on this?
>>>
>>
>>
>>That's interesting.
>>
>>I did an SVN update and it went away..looks like something happened on
>>April 26
>>
>>No idea, though the svn log has a bit of detail
>>
>>-
>>To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
>>For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
>>
>>
>
>
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>



--
busbey

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org




Re: ASF OS X Build Infrastructure

2016-05-20 Thread Chris Nauroth
It's very disappointing to see that vanish.  I'm following up to see if I
can learn more about what happened or if I can do anything to help
reinstate it.

--Chris Nauroth




On 5/20/16, 6:11 AM, "Steve Loughran" <ste...@hortonworks.com> wrote:

>
>> On 20 May 2016, at 10:40, Lars Francke <lars.fran...@gmail.com> wrote:
>> 
>>> 
>>> Regarding lack of personal access to anything but Linux, I'll take
>>>this as
>>> an opportunity to remind everyone that ASF committers (not just
>>>limited to
>>> Hadoop committers) are entitled to a free MSDN license, which can get
>>>you
>>> a Windows VM for validating Windows issues and any patches that touch
>>> cross-platform concerns, like the native code.  Contributors who are
>>>not
>>> committers still might struggle to get access to Windows, but all of us
>>> reviewing and committing patches do have access.
>>> 
>> 
>> Actually, from all I can tell this MSDN offer has been discontinued for
>> now. All the information has been removed from the committers repo. Do
>>you
>> have any more up to date information on this?
>> 
>
>
>That's interesting.
>
>I did an SVN update and it went away..looks like something happened on
>April 26
>
>No idea, though the svn log has a bit of detail
>
>-
>To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
>For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
>
>


-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: ASF OS X Build Infrastructure

2016-05-19 Thread Chris Nauroth
Allen, thank you for doing this.

Regarding lack of personal access to anything but Linux, I'll take this as
an opportunity to remind everyone that ASF committers (not just limited to
Hadoop committers) are entitled to a free MSDN license, which can get you
a Windows VM for validating Windows issues and any patches that touch
cross-platform concerns, like the native code.  Contributors who are not
committers still might struggle to get access to Windows, but all of us
reviewing and committing patches do have access.

It has long been on my TODO list to set up similar Jenkins jobs for
Windows, but it keeps slipping.  I'll try once again to bump up priority.

--Chris Nauroth




On 5/19/16, 9:41 AM, "Allen Wittenauer" <a...@apache.org> wrote:

>   
>   Some of you may not know that the ASF actually does have an OS X machine
>(a Mac mini, so it's not a speed demon) in the build infrastructure.
>While messing around with getting all? of the trunk jobs reconfigured to
>do Java 8 and separate maven repos, I noticed that this box tends to sit
>idle most of the day. Why not take advantage of it?  Therefore, I also set up
>two jobs for us to use to help alleviate the "I don't have access to
>anything but Linux" excuse when writing code that may not work in a
>portable manner.
>
>Job #1:
>
>   https://builds.apache.org/view/H-L/view/Hadoop/job/Precommit-HADOOP-OSX
>
>   This basically runs Apache Yetus precommit with quite a few of the
>unnecessary tests disabled.  For example, there's no point in running
>checkstyle.  Note that this job takes the *full* JIRA issue id as input.
>So 'HADOOP-9902' not '9902'.  This allows for one Jenkins job to be used
>for all the Hadoop sub-projects (HADOOP, HDFS, MR, YARN).  "But my code
>is on github and I don't want to upload a patch!"  I haven't tested it,
>but it should also take a URL, so just add a .diff to the end of your
>github compare URL and put that in the issue box.  It hypothetically
>should work.
>
>Job #2:
>
>   I'm still hammering on this one because the email notifications aren't
>working to my satisfaction plus we have some extremely Linux-specific
>code in YARN... but 
>
>   
> https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-trunk-osx-java8/
>
>   ... is a "build the world" job similar to what is currently running under
>the individual sub projects.  (This actually makes it one of the few
>"build everything" jobs we have running. Most of the other jobs only
>build that particular sub project.)  It does not run the full unit test
>suite and it also does not build all of the native code.  This gives us a
>place to start on our journey of making Hadoop actually, truly run
>everywhere.  (Interesting side note: It's been *extremely* consistent in
>what fails vs. the Linux build hosts.)
>
>   At some point, likely after YETUS-390 is complete, I'll switch this job
>over to be run by Apache Yetus in qbt mode so that it's actually easier
>to track failures across all dirs.  A huge advantage over raw maven
>commands.
>
>   Happy testing everyone.
>
>   NOTE: if you don't have access to launch jobs on builds.apache.org,
>you'll need to send a request to private@.  The Apache Hadoop PMC has the
>keys to give access to folks.
>
>
>
>-
>To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
>For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>
>


-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [VOTE] Merge feature branch HADOOP-12930

2016-05-16 Thread Chris Nauroth
Understood about the tests.

--Chris Nauroth




On 5/15/16, 7:30 AM, "Allen Wittenauer" <a...@apache.org> wrote:

>
>> On May 14, 2016, at 3:11 PM, Chris Nauroth <cnaur...@hortonworks.com>
>>wrote:
>> 
>> +1 (binding)
>> 
>> -Tried a dry-run merge of HADOOP-12930 to trunk.
>> -Successfully built distro on Windows.
>> -Ran "hdfs namenode", "hdfs datanode", and various interactive hdfs
>> commands through Cygwin.
>> -Reviewed documentation.
>> 
>> Allen, thank you for the contribution.  Would you please attach a full
>> patch to HADOOP-12930 to check pre-commit results?
>
>
>   Nope.  The whole reason this was done as a branch with multiple patches
>was to prevent Jenkins from getting overwhelmed since it would trigger
>full unit tests on pretty much the entire code base….
>
>> While testing this, I discovered a bug in the distro build for Windows.
>> Could someone please code review my patch on HADOOP-13149?
>
>   Done!
>
>> 
>> --Chris Nauroth
>> 
>> 
>> 
>> 
>> On 5/9/16, 1:26 PM, "Allen Wittenauer" <a...@apache.org> wrote:
>> 
>>> 
>>> Hey gang!
>>> 
>>> I'd like to call a vote to run for 7 days (ending May 16 at 13:30 PT)
>>>to
>>> merge the HADOOP-12930 feature branch into trunk. This branch was
>>> developed exclusively by me as per the discussion two months ago as a
>>>way
>>> to make what would be a rather large patch hopefully easier to review.
>>> The vast majority of the branch is code movement in the same file,
>>> additional license headers, maven assembly hooks for distribution, and
>>> variable renames. Not a whole lot of new code, but a big diff file
>>> none-the-less.
>>> 
>>> This branch modifies the 'hadoop', 'hdfs', 'mapred', and 'yarn'
>>>commands
>>> to allow for subcommands to be added or modified at runtime.  This
>>>allows
>>> for individual users or entire sites to tweak the execution environment
>>> to suit their local needs.  For example, it has been a practice for
>>>some
>>> locations to change the distcp jar out for a custom one.  Using this
>>> functionality, it is possible that the 'hadoop distcp' command could
>>>run
>>> the local version without overwriting the bundled jar and for existing
>>> documentation (read: results from Internet searches) to work as written
>>> without modification. This has the potential to be a huge win,
>>>especially
>>> for:
>>> 
>>> * advanced end users looking to supplement the Apache Hadoop
>>>experience
>>> * operations teams that may be able to leverage existing
>>>documentation
>>> without having to maintain local "exception" docs
>>> * development groups wanting an easy way to trial experimental
>>>features
>>> 
>>> Additionally, this branch includes the following, related changes:
>>> 
>>> * Adds the first unit tests for the 'hadoop' command
>>> * Adds the infrastructure for hdfs script testing and the first unit
>>> test for the 'hdfs' command
>>> * Modifies the hadoop-tools components to be dynamic rather than hard
>>> coded
>>> * Renames the shell profiles for hdfs, mapred, and yarn to be
>>> consistent with other bundled profiles, including the ones introduced
>>>in
>>> this branch
>>> 
>>> Documentation, including a 'hello world'-style example, is in the
>>> UnixShellGuide markdown file.  (Of course!)
>>> 
>>>  I am at ApacheCon this week if anyone wants to discuss in-depth.
>>> 
>>> Thanks!
>>> 
>>> P.S.,
>>> 
>>> There are still two open sub-tasks.  These are blocked by other issues
>>> so that we may add unit testing to the shell code in those respective
>>> areas.  I'll convert to full issues after HADOOP-12930 is closed.
>>> 
>>> 
>>> 
>>> 
>> 
>
>




Re: [VOTE] Merge feature branch HADOOP-12930

2016-05-14 Thread Chris Nauroth
+1 (binding)

-Tried a dry-run merge of HADOOP-12930 to trunk.
-Successfully built distro on Windows.
-Ran "hdfs namenode", "hdfs datanode", and various interactive hdfs
commands through Cygwin.
-Reviewed documentation.

Allen, thank you for the contribution.  Would you please attach a full
patch to HADOOP-12930 to check pre-commit results?

While testing this, I discovered a bug in the distro build for Windows.
Could someone please code review my patch on HADOOP-13149?

--Chris Nauroth




On 5/9/16, 1:26 PM, "Allen Wittenauer" <a...@apache.org> wrote:

>
>   Hey gang!
>
>   I'd like to call a vote to run for 7 days (ending May 16 at 13:30 PT) to
>merge the HADOOP-12930 feature branch into trunk. This branch was
>developed exclusively by me as per the discussion two months ago as a way
>to make what would be a rather large patch hopefully easier to review.
>The vast majority of the branch is code movement in the same file,
>additional license headers, maven assembly hooks for distribution, and
>variable renames. Not a whole lot of new code, but a big diff file
>none-the-less.
>
>   This branch modifies the 'hadoop', 'hdfs', 'mapred', and 'yarn' commands
>to allow for subcommands to be added or modified at runtime.  This allows
>for individual users or entire sites to tweak the execution environment
>to suit their local needs.  For example, it has been a practice for some
>locations to change the distcp jar out for a custom one.  Using this
>functionality, it is possible that the 'hadoop distcp' command could run
>the local version without overwriting the bundled jar and for existing
>documentation (read: results from Internet searches) to work as written
>without modification. This has the potential to be a huge win, especially
>for:
>   
>   * advanced end users looking to supplement the Apache Hadoop experience
>   * operations teams that may be able to leverage existing documentation
>without having to maintain local "exception" docs
>   * development groups wanting an easy way to trial experimental features
>
>   Additionally, this branch includes the following, related changes:
>
>   * Adds the first unit tests for the 'hadoop' command
>   * Adds the infrastructure for hdfs script testing and the first unit
>test for the 'hdfs' command
>   * Modifies the hadoop-tools components to be dynamic rather than hard
>coded
>   * Renames the shell profiles for hdfs, mapred, and yarn to be
>consistent with other bundled profiles, including the ones introduced in
>this branch
>
>   Documentation, including a 'hello world'-style example, is in the
>UnixShellGuide markdown file.  (Of course!)
>
>I am at ApacheCon this week if anyone wants to discuss in-depth.
>
>   Thanks!
>
>P.S.,
>
>   There are still two open sub-tasks.  These are blocked by other issues
>so that we may add unit testing to the shell code in those respective
>areas.  I'll convert to full issues after HADOOP-12930 is closed.
>
>
>
>





Re: [DISCUSS] Treating LimitedPrivate({"MapReduce"}) as Public APIs for YARN applications

2016-05-10 Thread Chris Nauroth
Yes, I agree with you Andrew.

Sorry, I should clarify my prior response.  I didn't mean to imply a blind 
s/LimitedPrivate/Public/g across the whole codebase.  Instead, I'm +1 for the 
intent of HADOOP-10776: a transition to Public for UserGroupInformation, and by 
extension the related parts of its API like Credentials.

I'm in the camp that generally questions the usefulness of LimitedPrivate, but 
I agree that transitions to Public need case-by-case consideration.

--Chris Nauroth

From: Andrew Wang <andrew.w...@cloudera.com>
Date: Tuesday, May 10, 2016 at 2:40 PM
To: Chris Nauroth <cnaur...@hortonworks.com>
Cc: Hitesh Shah <hit...@apache.org>, 
"yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>, 
"mapreduce-dev@hadoop.apache.org" <mapreduce-dev@hadoop.apache.org>, 
"common-...@hadoop.apache.org" <common-...@hadoop.apache.org>
Subject: Re: [DISCUSS] Treating LimitedPrivate({"MapReduce"}) as Public APIs 
for YARN applications

Why don't we address these on a case-by-case basis, changing the annotations on 
these key classes to Public? LimitedPrivate{"YARN applications"} is the same 
thing as Public.

This way we don't need to add special exceptions to our compatibility policy. 
Keeps it simple.

Best,
Andrew

On Tue, May 10, 2016 at 2:26 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
+1 for transitioning from LimitedPrivate to Public.

I view this as an extension of the need for UserGroupInformation and
related APIs to be Public.  Regardless of the original intent behind
LimitedPrivate, these are de facto public now, because there is no viable
alternative for applications that want to integrate with a secured Hadoop
cluster.

There is prior discussion of this topic on HADOOP-10776 and HADOOP-12913.
HADOOP-10776 is a blocker for 2.8.0 to make the transition to Public.

--Chris Nauroth




On 5/10/16, 11:34 AM, "Hitesh Shah" <hit...@apache.org> wrote:

>There seem to be some incorrect assumptions on why the application had
>an issue. For rolling upgrade deployments, the application bundles the
>client-side jars that it was compiled against and uses them in its
>classpath and expects to be able to communicate with upgraded servers.
>Given that hadoop-common is a monolithic jar, it ends up being used on
>both client-side and server-side. The problem in this case was caused by
>the fact that the ResourceManager was generating the credentials file
>with a format understood only by hadoop-common from 3.x. For an
>application compiled against 2.x and has *only* hadoop-common from 2.x on
>its classpath, trying to read this file fails.
>
>This is not about whether internal implementations can change for
>non-public APIs. The file format for the Credential file in this scenario
>is *not* internal implementation especially when you can have different
>versions of the library trying to read the file. If an older client is
>talking to a newer versioned server, the general backward compat
>assumption is that the client should receive a response that it can parse
>and understand. In this scenario, the credentials file provided to the
>YARN app by the RM should have been written out with the older version or
>at the very least been readable by the older hadoop-common.jar.
>
>In any case, does anyone have any specific concerns with changing
>LimitedPrivate({"MapReduce"}) to Public?
>
>And sure, if we are saying that Hadoop-3.x requires all apps built
>against it to go through a full re-compile as well as downtime as
>existing apps may no longer work out of the box, let's call it out very
>explicitly in the Release notes.
>
>-- Hitesh
>
>> On May 10, 2016, at 9:24 AM, Allen Wittenauer
>><allenwittena...@yahoo.com> wrote:
>>
>>
>>> On May 10, 2016, at 8:37 AM, Hitesh Shah <hit...@apache.org> wrote:
>>>
>>> There have been various discussions on various JIRAs where upstream
>>>projects such as YARN apps ( Tez, Slider, etc ) are called out for
>>>using the above so-called Private APIs. A lot of YARN applications that
>>>have been built out have picked up various bits and pieces of
>>>implementation from MapReduce and DistributedShell to get things to
>>>work.
>>>
>>

Re: [DISCUSS] Treating LimitedPrivate({"MapReduce"}) as Public APIs for YARN applications

2016-05-10 Thread Chris Nauroth
+1 for transitioning from LimitedPrivate to Public.

I view this as an extension of the need for UserGroupInformation and
related APIs to be Public.  Regardless of the original intent behind
LimitedPrivate, these are de facto public now, because there is no viable
alternative for applications that want to integrate with a secured Hadoop
cluster.

There is prior discussion of this topic on HADOOP-10776 and HADOOP-12913.
HADOOP-10776 is a blocker for 2.8.0 to make the transition to Public.

--Chris Nauroth




On 5/10/16, 11:34 AM, "Hitesh Shah" <hit...@apache.org> wrote:

>There seem to be some incorrect assumptions on why the application had
>an issue. For rolling upgrade deployments, the application bundles the
>client-side jars that it was compiled against and uses them in its
>classpath and expects to be able to communicate with upgraded servers.
>Given that hadoop-common is a monolithic jar, it ends up being used on
>both client-side and server-side. The problem in this case was caused by
>the fact that the ResourceManager was generating the credentials file
>with a format understood only by hadoop-common from 3.x. For an
>application compiled against 2.x and has *only* hadoop-common from 2.x on
>its classpath, trying to read this file fails.
>
>This is not about whether internal implementations can change for
>non-public APIs. The file format for the Credential file in this scenario
>is *not* internal implementation especially when you can have different
>versions of the library trying to read the file. If an older client is
>talking to a newer versioned server, the general backward compat
>assumption is that the client should receive a response that it can parse
>and understand. In this scenario, the credentials file provided to the
>YARN app by the RM should have been written out with the older version or
>at the very least been readable by the older hadoop-common.jar.
>
>In any case, does anyone have any specific concerns with changing
>LimitedPrivate({"MapReduce"}) to Public?
>
>And sure, if we are saying that Hadoop-3.x requires all apps built
>against it to go through a full re-compile as well as downtime as
>existing apps may no longer work out of the box, let's call it out very
>explicitly in the Release notes.
>
>-- Hitesh
>
>> On May 10, 2016, at 9:24 AM, Allen Wittenauer
>><allenwittena...@yahoo.com> wrote:
>> 
>> 
>>> On May 10, 2016, at 8:37 AM, Hitesh Shah <hit...@apache.org> wrote:
>>> 
>>> There have been various discussions on various JIRAs where upstream
>>>projects such as YARN apps ( Tez, Slider, etc ) are called out for
>>>using the above so-called Private APIs. A lot of YARN applications that
>>>have been built out have picked up various bits and pieces of
>>>implementation from MapReduce and DistributedShell to get things to
>>>work.
>>> 
>>> A recent example is a backward incompatible change introduced ( where
>>>the API is not even directly invoked ) in the Credentials class related
>>>to the ability to read tokens/credentials from a file.
>> 
>>  Let's be careful here.  It should be noted that the problem happened
>>primarily because the application jar appears to have included some
>>hadoop jars in it.   So the API invocation isn't the problem:  it's
>>the fact that the implementation under the hood changed.  If the
>>application jar didn't bundle hadoop jars --especially given that they were
>>already on the classpath-- this problem should never have happened.
>> 
>>> This functionality is required by pretty much everyone as YARN
>>>provides the credentials to the app by writing the credentials/tokens
>>>to a local file which is read in when
>>>UserGroupInformation.getCurrentUser() is invoked.
>> 
>>  What you're effectively arguing is that implementations should never
>>change for public (and in this case LimitedPrivate) APIs.  I don't think
>>that's reasonable.  Hadoop is filled with changes in major branches
>>where the implementations have changed but the internals have been
>>reworked to perform the work in a slightly different manner.
>> 
>>> This change breaks rolling upgrades for yarn applications from 2.x to
>>>3.x (whether we end up supporting rolling upgrades across 2.x to 3.x is
>>>a separate discussion )
>> 
>>  
>>  At least today, according to the document attached to YARN-666 (lol),
>>rolling upgrades are only supported within the same major version.
>> 
>>> 
>>> I would like to change our compatibility docs to state that any API
>>>that is marked as LimitedPrivate{Mapreduce} impl

Re: [Release thread] 2.8.0 release activities

2016-02-04 Thread Chris Nauroth
FYI, I've just needed to raise HDFS-9761 to blocker status for the 2.8.0
release.

--Chris Nauroth




On 2/3/16, 6:19 PM, "Karthik Kambatla" <ka...@cloudera.com> wrote:

>Thanks Vinod. Not labeling 2.8.0 stable sounds perfectly reasonable to me.
>Let us not call it alpha or beta though, it is quite confusing. :)
>
>On Wed, Feb 3, 2016 at 8:17 PM, Gangumalla, Uma <uma.ganguma...@intel.com>
>wrote:
>
>> Thanks Vinod. +1 for 2.8 release start.
>>
>> Regards,
>> Uma
>>
>> On 2/3/16, 3:53 PM, "Vinod Kumar Vavilapalli" <vino...@apache.org>
>>wrote:
>>
>> >Seems like all the features listed in the Roadmap wiki are in. I'm
>>going
>> >to try cutting an RC this weekend for a first/non-stable release off of
>> >branch-2.8.
>> >
>> >Let me know if anyone has any objections/concerns.
>> >
>> >Thanks
>> >+Vinod
>> >
>> >> On Nov 25, 2015, at 5:59 PM, Vinod Kumar Vavilapalli
>> >><vino...@apache.org> wrote:
>> >>
>> >> Branch-2.8 is created.
>> >>
>> >> As mentioned before, the goal on branch-2.8 is to put improvements /
>> >>fixes to existing features with a goal of converging on an alpha
>>release
>> >>soon.
>> >>
>> >> Thanks
>> >> +Vinod
>> >>
>> >>
>> >>> On Nov 25, 2015, at 5:30 PM, Vinod Kumar Vavilapalli
>> >>><vino...@apache.org> wrote:
>> >>>
>> >>> Forking threads now in order to track all things related to the
>> >>>release.
>> >>>
>> >>> Creating the branch now.
>> >>>
>> >>> Thanks
>> >>> +Vinod
>> >>>
>> >>>
>> >>>> On Nov 25, 2015, at 11:37 AM, Vinod Kumar Vavilapalli
>> >>>><vino...@apache.org> wrote:
>> >>>>
>> >>>> I think we've converged at a high level w.r.t 2.8. And as I just
>>sent
>> >>>>out an email, I updated the Roadmap wiki reflecting the same:
>> >>>>https://wiki.apache.org/hadoop/Roadmap
>> >>>>
>> >>>> I plan to create a 2.8 branch EOD today.
>> >>>>
>> >>>> The goal for all of us should be to restrict improvements & fixes
>>to
>> >>>>only (a) the feature-set documented under 2.8 in the RoadMap wiki
>>and
>> >>>>(b) other minor features that are already in 2.8.
>> >>>>
>> >>>> Thanks
>> >>>> +Vinod
>> >>>>
>> >>>>
>> >>>>> On Nov 11, 2015, at 12:13 PM, Vinod Kumar Vavilapalli
>> >>>>><vino...@hortonworks.com> wrote:
>> >>>>>
>> >>>>> - Cut a branch about two weeks from now
>> >>>>> - Do an RC mid next month (leaving ~4weeks since branch-cut)
>> >>>>> - As with 2.7.x series, the first release will still be called as
>> >>>>>early / alpha release in the interest of
>> >>>>>   -- gaining downstream adoption
>> >>>>>   -- wider testing,
>> >>>>>   -- yet reserving our right to fix any inadvertent
>>incompatibilities
>> >>>>>introduced.
>> >>>>
>> >>>
>> >>
>> >
>>
>>



[jira] [Created] (MAPREDUCE-6563) Streaming documentation contains a stray '%' character.

2015-12-04 Thread Chris Nauroth (JIRA)
Chris Nauroth created MAPREDUCE-6563:


     Summary: Streaming documentation contains a stray '%' character.
         Key: MAPREDUCE-6563
         URL: https://issues.apache.org/jira/browse/MAPREDUCE-6563
     Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
    Reporter: Chris Nauroth
    Assignee: Chris Nauroth
    Priority: Trivial


We have an unneeded '%' character above the title in this page.

http://hadoop.apache.org/docs/r2.7.1/hadoop-streaming/HadoopStreaming.html




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MAPREDUCE-6565) Configuration to use host name in delegation token service is not read from job.xml during MapReduce job execution.

2015-12-04 Thread Chris Nauroth (JIRA)
Chris Nauroth created MAPREDUCE-6565:


     Summary: Configuration to use host name in delegation token service is not read from job.xml during MapReduce job execution.
         Key: MAPREDUCE-6565
         URL: https://issues.apache.org/jira/browse/MAPREDUCE-6565
     Project: Hadoop Map/Reduce
  Issue Type: Bug
    Reporter: Chris Nauroth


By default, the service field of a delegation token is populated based on 
server IP address.  Setting {{hadoop.security.token.service.use_ip}} to 
{{false}} changes this behavior to use host name instead of IP address.  
However, this configuration property is not read from job.xml.  Instead, it's 
read from a separate {{Configuration}} instance created during static 
initialization of {{SecurityUtil}}.  This does not work correctly with 
MapReduce jobs if the framework is distributed by setting 
{{mapreduce.application.framework.path}} and the 
{{mapreduce.application.classpath}} is isolated to avoid reading core-site.xml 
from the cluster nodes.  MapReduce tasks will fail to authenticate to HDFS, 
because they'll try to find a delegation token based on the NameNode IP 
address, even though at job submission time the tokens were generated using the 
host name.
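
A toy model of the mismatch may help when reading the description above. It is written in Python for brevity and is not Hadoop code: the helper names (use_ip, service_name, lookup_token), the host name, the address, and the port are all made up for illustration. The only details taken from this issue are the property name and the fact that the default behavior is IP-based.

```python
# Toy model only: these helpers are hypothetical and do not exist in Hadoop.
KEY = "hadoop.security.token.service.use_ip"
DEFAULT_USE_IP = "true"  # per the description, the service defaults to the IP


def use_ip(conf):
    """Resolve the setting from whatever configuration the caller can see."""
    return conf.get(KEY, DEFAULT_USE_IP).lower() == "true"


def service_name(host, ip, port, conf):
    """Build the token 'service' string from either the IP or the host name."""
    return f"{ip if use_ip(conf) else host}:{port}"


def lookup_token(tokens, host, ip, port, conf):
    """Find a token by the service string computed under the given conf."""
    return tokens.get(service_name(host, ip, port, conf))


# Job submission side: job.xml carries use_ip=false.
job_conf = {KEY: "false"}

# Task side: stands in for a configuration created without job.xml and
# without the cluster's core-site.xml, so the property is simply absent.
isolated_task_conf = {}

# Made-up NameNode coordinates for the example.
HOST, IP, PORT = "nn.example.com", "10.0.0.5", 8020

# Tokens are stored under the host-name-based service at submission time.
tokens = {service_name(HOST, IP, PORT, job_conf): "token-for-nn"}

found_with_job_conf = lookup_token(tokens, HOST, IP, PORT, job_conf)
found_with_isolated_conf = lookup_token(tokens, HOST, IP, PORT, isolated_task_conf)
```

In this model the lookup succeeds only when the same configuration is used on both sides; with the isolated configuration the key is computed from the IP address and nothing is found, which corresponds to the authentication failure described above.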





Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-25 Thread Chris Nauroth
+1.  Thanks, Vinod.

--Chris Nauroth




On 11/25/15, 1:45 PM, "Vinod Kumar Vavilapalli" <vino...@apache.org> wrote:

>Okay, tx for this clarification Chris! I dug more into this and now
>realized the actual scope of this. Given the limited nature of this
>feature (non-Namenode etc) and the WIP nature of the larger umbrella
>HADOOP-11744, we will ship the feature but I’ll stop calling this out as
>a notable feature.
>
>Thanks
>+Vinod
>
>
>> On Nov 25, 2015, at 12:04 PM, Chris Nauroth <cnaur...@hortonworks.com>
>>wrote:
>> 
>> Hi Vinod,
>> 
>> The HDFS-8155 work is complete in branch-2 already, so feel free to
>> include it in the roadmap.
>> 
>> For those watching the thread that aren't familiar with HDFS-8155, I
>>want
>> to call out that it was a client-side change only.  The WebHDFS client
>>is
>> capable of obtaining OAuth2 tokens and passing them along in its HTTP
>> requests.  The NameNode and DataNode server side currently do not have
>>any
>> support for OAuth2, so overall, this feature is only useful in some very
>> unique deployment architectures right now.  This is all discussed
>> explicitly in documentation committed with HDFS-8155, but I wanted to
>> prevent any mistaken assumptions for people only reading this thread.
>> 
>> --Chris Nauroth
>> 
>> 
>> 
>> 
>> On 11/25/15, 11:08 AM, "Vinod Kumar Vavilapalli" <vino...@apache.org>
>> wrote:
>> 
>>> This is the current state from the feedback I gathered.
>>> - Support priorities across applications within the same queue
>>>YARN-1963
>>>   -- Can push as an alpha / beta feature per Sunil
>>> - YARN-1197 Support changing resources of an allocated container:
>>>   -- Can push as an alpha/beta feature per Wangda
>>> - YARN-3611 Support Docker Containers In LinuxContainerExecutor: Well
>>> most of it anyways.
>>>   -- Can push as an alpha feature.
>>> - YARN Timeline Service v1.5 - YARN-4233
>>>   -- Should include per Li Lu
>>> - YARN Timeline Service Next generation: YARN-2928
>>>   -- Per analysis from Sangjin, drop this from 2.8.
>>> 
>>> One open feature status
>>> - HDFS-8155 Support OAuth2 in WebHDFS: Alpha / Early feature?
>>> 
>>> Updated the Roadmap wiki with the same.
>>> 
>>> Thanks
>>> +Vinod
>>> 
>>>> On Nov 13, 2015, at 12:12 PM, Sangjin Lee <sj...@apache.org> wrote:
>>>> 
>>>> I reviewed the current state of the YARN-2928 changes regarding its
>>>> impact
>>>> if the timeline service v.2 is disabled. It does appear that there
>>>>are a
>>>> lot of things that still do get created and enabled unconditionally
>>>> regardless of configuration. While this is understandable when we were
>>>> working to implement the feature, this clearly needs to be cleaned up
>>>>so
>>>> that when disabled the timeline service v.2 doesn't impact other
>>>>things.
>>>> 
>>>> I filed a JIRA for that work:
>>>> https://issues.apache.org/jira/browse/YARN-4356
>>>> 
>>>> We need to complete it before we can merge.
>>>> 
>>>> Somewhat related is the status of the configuration and what it means
>>>>in
>>>> various contexts (client/app-side vs. server-side, v.1 vs. v.2,
>>>>etc.). I
>>>> know there is an ongoing discussion regarding YARN-4183. We'll need to
>>>> reflect the outcome of that discussion.
>>>> 
>>>> My overall impression of whether this can be done for 2.8 is that it
>>>> looks
>>>> rather challenging given the suggested timeframe. We also need to
>>>> complete
>>>> several major tasks before it is ready.
>>>> 
>>>> Sangjin
>>>> 
>>>> 
>>>> On Wed, Nov 11, 2015 at 5:49 PM, Sangjin Lee <sjl...@gmail.com> wrote:
>>>> 
>>>>> 
>>>>> On Wed, Nov 11, 2015 at 12:13 PM, Vinod Vavilapalli <
>>>>> vino...@hortonworks.com> wrote:
>>>>> 
>>>>>>   -- YARN Timeline Service Next generation: YARN-2928: Lots of
>>>>>> momentum,
>>>>>> but clearly a work in progress. Two options here
>>>>>>   -- If it is safe to ship it into 2.8 in a disabled manner, we
>>>>>>can
>>>>>> get the early code into trunk and all the way into 2.8.
>>>>>>   -- If it is not safe, it organically rolls over into 2.9
>>>>>> 
>>>>> 
>>>>> I'll review the changes on YARN-2928 to see what impact it has (if
>>>>> any) if
>>>>> the timeline service v.2 is disabled.
>>>>> 
>>>>> Another condition for it to make 2.8 is whether the branch will be
>>>>>in a
>>>>> shape in a couple of weeks such that it adds value for folks that
>>>>>want
>>>>> to
>>>>> test it. Hopefully it will become clearer soon.
>>>>> 
>>>>> Sangjin
>>>>> 
>>> 
>> 
>> 
>
>



Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-25 Thread Chris Nauroth
Hi Vinod,

The HDFS-8155 work is complete in branch-2 already, so feel free to
include it in the roadmap.

For those watching the thread that aren't familiar with HDFS-8155, I want
to call out that it was a client-side change only.  The WebHDFS client is
capable of obtaining OAuth2 tokens and passing them along in its HTTP
requests.  The NameNode and DataNode server side currently do not have any
support for OAuth2, so overall, this feature is only useful in some very
unique deployment architectures right now.  This is all discussed
explicitly in documentation committed with HDFS-8155, but I wanted to
prevent any mistaken assumptions for people only reading this thread.
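
For anyone who wants a picture of what "passing them along in its HTTP requests" means on the wire, here is a rough sketch. It assumes the token travels as a standard OAuth2 bearer header and uses the documented WebHDFS URL layout (/webhdfs/v1/<path>?op=...). It is a Python illustration, not the Java client code from HDFS-8155; the function name, host, port, path, and token value are all placeholders, and the request is only built, never sent.

```python
from urllib.parse import quote
from urllib.request import Request


def webhdfs_request(host, port, path, op, access_token):
    """Build (but do not send) a WebHDFS request carrying a bearer token.

    Hypothetical helper for illustration; the endpoint that would validate
    the token sits outside the NameNode and DataNode.
    """
    url = f"http://{host}:{port}/webhdfs/v1{quote(path)}?op={op}"
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})


# Placeholder values: an OAuth2-aware endpoint in front of the file system.
req = webhdfs_request(
    "gateway.example.com", 50070, "/user/alice/data.txt",
    "GETFILESTATUS", "example-access-token",
)
```

Since nothing on the NameNode or DataNode side checks such a header today, a request like this is only meaningful when something else in the deployment validates the token.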

--Chris Nauroth




On 11/25/15, 11:08 AM, "Vinod Kumar Vavilapalli" <vino...@apache.org>
wrote:

>This is the current state from the feedback I gathered.
> - Support priorities across applications within the same queue YARN-1963
>   -- Can push as an alpha / beta feature per Sunil
> - YARN-1197 Support changing resources of an allocated container:
>   -- Can push as an alpha/beta feature per Wangda
> - YARN-3611 Support Docker Containers In LinuxContainerExecutor: Well
>most of it anyways.
>   -- Can push as an alpha feature.
> - YARN Timeline Service v1.5 - YARN-4233
>   -- Should include per Li Lu
> - YARN Timeline Service Next generation: YARN-2928
>   -- Per analysis from Sangjin, drop this from 2.8.
>
>One open feature status
> - HDFS-8155 Support OAuth2 in WebHDFS: Alpha / Early feature?
>
>Updated the Roadmap wiki with the same.
>
>Thanks
>+Vinod
>
>> On Nov 13, 2015, at 12:12 PM, Sangjin Lee <sj...@apache.org> wrote:
>> 
>> I reviewed the current state of the YARN-2928 changes regarding its
>>impact
>> if the timeline service v.2 is disabled. It does appear that there are a
>> lot of things that still do get created and enabled unconditionally
>> regardless of configuration. While this is understandable when we were
>> working to implement the feature, this clearly needs to be cleaned up so
>> that when disabled the timeline service v.2 doesn't impact other things.
>> 
>> I filed a JIRA for that work:
>> https://issues.apache.org/jira/browse/YARN-4356
>> 
>> We need to complete it before we can merge.
>> 
>> Somewhat related is the status of the configuration and what it means in
>> various contexts (client/app-side vs. server-side, v.1 vs. v.2, etc.). I
>> know there is an ongoing discussion regarding YARN-4183. We'll need to
>> reflect the outcome of that discussion.
>> 
>> My overall impression of whether this can be done for 2.8 is that it
>>looks
>> rather challenging given the suggested timeframe. We also need to
>>complete
>> several major tasks before it is ready.
>> 
>> Sangjin
>> 
>> 
>> On Wed, Nov 11, 2015 at 5:49 PM, Sangjin Lee <sjl...@gmail.com> wrote:
>> 
>>> 
>>> On Wed, Nov 11, 2015 at 12:13 PM, Vinod Vavilapalli <
>>> vino...@hortonworks.com> wrote:
>>> 
>>>>   -- YARN Timeline Service Next generation: YARN-2928: Lots of
>>>>momentum,
>>>> but clearly a work in progress. Two options here
>>>>   -- If it is safe to ship it into 2.8 in a disabled manner, we can
>>>> get the early code into trunk and all the way into 2.8.
>>>>   -- If it is not safe, it organically rolls over into 2.9
>>>> 
>>> 
>>> I'll review the changes on YARN-2928 to see what impact it has (if
>>>any) if
>>> the timeline service v.2 is disabled.
>>> 
>>> Another condition for it to make 2.8 is whether the branch will be in a
>>> shape in a couple of weeks such that it adds value for folks that want
>>>to
>>> test it. Hopefully it will become clearer soon.
>>> 
>>> Sangjin
>>> 
>



Re: Hadoop 2.6.1 Release process thread

2015-08-14 Thread Chris Nauroth
The HADOOP-10786 patch is compatible with JDK 6.  This was a point of
discussion during the original development of the patch.  If you'd like
full details, please see the comments there.  Like Akira, I also confirmed
that the new test works correctly when running with JDK 6.

Thanks!

--Chris Nauroth




On 8/14/15, 9:22 AM, Akira AJISAKA ajisa...@oss.nttdata.co.jp wrote:

Good point. I ran the regression test in HADOOP-10786 successfully on
ajisakaa/common-merge branch with JDK6.
I'll run all the unit tests against JDK6 locally after merging all the
jiras.

Thanks,
Akira

On 8/14/15 23:21, Allen Wittenauer wrote:

   I hope someone tests this against JDK6, otherwise this is an
incompatible change.

 On Aug 12, 2015, at 2:21 PM, Chris Nauroth cnaur...@hortonworks.com wrote:

 I've just applied the 2.6.1-candidate label to HADOOP-10786.  Since this
 is somewhat late in the process, I thought I'd better follow up over email
 too.

 This bug was originally reported with JDK 8.  A code change in JDK 8 broke
 our automatic relogin from a Kerberos keytab, and we needed to change
 UserGroupInformation to fix it.  Just today I discovered that the JDK code
 change has made it into the JDK 7 code line too.  Specifically, I can
 repro the bug against OpenJDK 1.7.0_85.  Since many users wouldn't expect
 a minor version upgrade of the JDK to cause such a severe problem, I think
 HADOOP-10786 is justified for inclusion in a patch release.

 --Chris Nauroth




 On 8/11/15, 7:48 PM, Sangjin Lee sjl...@gmail.com wrote:

 It might have been because we thought that HDFS-7704 was going to make it.
 Either both make it or neither does. Now that we know HDFS-7704 is out,
 HDFS-7916 should definitely be out. I hope that clarifies.

 On Tue, Aug 11, 2015 at 6:26 PM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:

 I earlier removed HDFS-7916 from the list given HDFS-7704 was only in
 2.7.0.

 Chris Trezzo added it back and so it appeared in my lists.

 I removed it again, Chris, please comment on why you added it back. If you
 want it included, please comment here and we can add it after we figure out
 the why and the dependent tickets.

 Thanks
 +Vinod

 On Aug 11, 2015, at 4:37 PM, Sangjin Lee sjl...@gmail.com wrote:

 Could you double check HDFS-7916? HDFS-7916 is needed only if
HDFS-7704
 makes it. However, I see commits for HDFS-7916 in this list, but not
for
 HDFS-7704. If HDFS-7704 is not in the list, we should not backport
 HDFS-7916 as it fixes an issue introduced by HDFS-7704.

 On Tue, Aug 11, 2015 at 4:10 PM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:
 Put the list here:
 https://wiki.apache.org/hadoop/Release-2.6.1-Working-Notes. And
started
 figuring out ways to fast-path the cherry-picks.

 Thanks
 +Vinod

 On Aug 11, 2015, at 1:15 PM, Vinod Kumar Vavilapalli
vino...@apache.org wrote:

   (2) With Wangda's help offline, I prepared an ordered list of
 cherry-pick commits that we can do from our candidate list [1], will
do
 some ground work today.












Re: Hadoop 2.6.1 Release process thread

2015-08-12 Thread Chris Nauroth
I've just applied the 2.6.1-candidate label to HADOOP-10786.  Since this
is somewhat late in the process, I thought I'd better follow up over email
too.

This bug was originally reported with JDK 8.  A code change in JDK 8 broke
our automatic relogin from a Kerberos keytab, and we needed to change
UserGroupInformation to fix it.  Just today I discovered that the JDK code
change has made it into the JDK 7 code line too.  Specifically, I can
repro the bug against OpenJDK 1.7.0_85.  Since many users wouldn't expect
a minor version upgrade of the JDK to cause such a severe problem, I think
HADOOP-10786 is justified for inclusion in a patch release.

--Chris Nauroth




On 8/11/15, 7:48 PM, Sangjin Lee sjl...@gmail.com wrote:

It might have been because we thought that HDFS-7704 was going to make it.
Either both make it or neither does. Now that we know HDFS-7704 is out,
HDFS-7916 should definitely be out. I hope that clarifies.

On Tue, Aug 11, 2015 at 6:26 PM, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:

 I earlier removed HDFS-7916 from the list given HDFS-7704 was only in
 2.7.0.

 Chris Trezzo added it back and so it appeared in my lists.

 I removed it again, Chris, please comment on why you added it back. If
you
 want it included, please comment here and we can add it after we figure
out
 the why and the dependent tickets.

 Thanks
 +Vinod

 On Aug 11, 2015, at 4:37 PM, Sangjin Lee sjl...@gmail.com wrote:

 Could you double check HDFS-7916? HDFS-7916 is needed only if HDFS-7704
 makes it. However, I see commits for HDFS-7916 in this list, but not for
 HDFS-7704. If HDFS-7704 is not in the list, we should not backport
 HDFS-7916 as it fixes an issue introduced by HDFS-7704.

 On Tue, Aug 11, 2015 at 4:10 PM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:
 Put the list here:
 https://wiki.apache.org/hadoop/Release-2.6.1-Working-Notes. And started
 figuring out ways to fast-path the cherry-picks.

 Thanks
 +Vinod

 On Aug 11, 2015, at 1:15 PM, Vinod Kumar Vavilapalli vino...@apache.org
 wrote:

   (2) With Wangda's help offline, I prepared an ordered list of
 cherry-pick commits that we can do from our candidate list [1], will do
 some ground work today.








Re: 2.7.2 release plan

2015-07-16 Thread Chris Nauroth
I'd be comfortable with inclusion of any doc-only patch in minor releases.
 There is a lot of value to end users in pushing documentation fixes as
quickly as possible, and they don't bear the same risk of regressions or
incompatibilities as code changes.

--Chris Nauroth




On 7/16/15, 12:38 AM, Tsuyoshi Ozawa oz...@apache.org wrote:

Hi,

thank you for starting the discussion about 2.7.2 release.

 The focus obviously is to have blocker issues [2], bug-fixes and *no*
features / improvements.

I've committed YARN-3170, which is an improvement of documentation. I
thought documentation pages which can be fit into branch-2.7 can be
included easily. Should I revert it?

 I need help from all committers in automatically
merging in any patch that fits the above criterion into 2.7.2 instead of
only on trunk or 2.8.

Sure, I'll try my best.

 That way we can include not only blocker but also critical bug fixes to
2.7.2 release.

As Vinod mentioned, we should also apply major bug fixes into branch-2.7.

Thanks,
- Tsuyoshi

On Thu, Jul 16, 2015 at 3:52 PM, Akira AJISAKA
ajisa...@oss.nttdata.co.jp wrote:
 Thanks Vinod for starting 2.7.2 release plan.

 The focus obviously is to have blocker issues [2], bug-fixes and *no*
 features / improvements.

 Can we adopt the plan as Karthik mentioned in Additional maintenance
 releases for Hadoop 2.y versions thread? That way we can include not
only
 blocker but also critical bug fixes to 2.7.2 release.

 In addition, branch-2.7 is a special case. (2.7.1 is the first stable
 release) Therefore I'm thinking we can include major bug fixes as well.

 Regards,
 Akira


 On 7/16/15 04:13, Vinod Kumar Vavilapalli wrote:

 Hi all,


 Thanks everyone for the push on 2.7.1! Branch-2.7 is now open for
commits
 to a 2.7.2 release. JIRA also now has a 2.7.2 version for all the
 sub-projects.


 Continuing the previous 2.7.1 thread on steady maintenance releases
[1],
 we
 should follow up 2.7.1 with a 2.7.2 within 4 weeks. Earlier I tried a
2-3
 week cycle for 2.7.1, but it seems to be impractical given the
community
 size. So, I propose we target a release by the end of 4 weeks from
now,
 starting the release close-down within 2-3 weeks.

 The focus obviously is to have blocker issues [2], bug-fixes and *no*
 features / improvements. I need help from all committers in
automatically
 merging in any patch that fits the above criterion into 2.7.2 instead
of
 only on trunk or 2.8.

 Thoughts?

 Thanks,

 +Vinod

 [1] A 2.7.1 release to follow up 2.7.0
 http://markmail.org/message/zwzze6cqqgwq4rmw

 [2] 2.7.2 release blockers:
 https://issues.apache.org/jira/issues/?filter=12332867






Re: IMPORTANT: automatic changelog creation

2015-07-02 Thread Chris Nauroth
+1

Thank you to Allen for the script, and thank you to Andrew for
volunteering to drive the conversion.

--Chris Nauroth




On 7/2/15, 2:01 PM, Andrew Wang andrew.w...@cloudera.com wrote:

Hi all,

I want to revive the discussion on this thread, since the overhead of
CHANGES.txt came up again in the context of backporting fixes for
maintenance releases.

Allen's automatic generation script (HADOOP-11731) went into trunk but not
branch-2, so we're still maintaining CHANGES.txt everywhere. What do
people
think about backporting this to branch-2 and then removing CHANGES.txt
from
trunk/branch-2 (HADOOP-11792)? Based on discussion on this thread and in
HADOOP-11731, we seem to agree that CHANGES.txt is an unreliable source of
information, and JIRA is at least as reliable and probably much more so.
Thus I don't see any downsides to backporting it.

Would like to hear everyone's thoughts on this, I'm willing to drive the
effort.

Thanks,
Andrew

On Thu, Apr 2, 2015 at 2:00 PM, Tsz Wo Sze szets...@yahoo.com.invalid
wrote:

 Generating change log from JIRA is a good idea.  It is based on an
assumption
 that each JIRA has an accurate summary (a.k.a. JIRA title) to reflect
the
 committed change. Unfortunately, the assumption is invalid for many
cases
 since we never enforce that the JIRA summary must be the same as the
change
 log.  We may compare the current CHANGES.txt with the generated change
 log.  I bet the diff is long.
 Besides, after a release R1 is out, someone may (accidentally or
 intentionally) modify the JIRA summary.  Then, the entry for the same
item
 in a later release R2 could be different from the one in R1.
 I agree that manually editing CHANGES.txt is not a perfect solution.
 However, it works well in the past for many releases.  I suggest we keep
 the current dev workflow.  Try using the new script provided by
 HADOOP-11731 to generate the next release.  If everything works well, we
 shall remove CHANGES.txt and revise the dev workflow.  What do you
think?
 Regards,Tsz-Wo


  On Thursday, April 2, 2015 12:57 PM, Allen Wittenauer 
 a...@altiscale.com wrote:





 On Apr 2, 2015, at 12:40 PM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:

 
  We'd then be doing two commits for every patch. Let's simply not remove
 CHANGES.txt from trunk, keep the existing dev workflow, but doc the
release
 process to remove CHANGES.txt in trunk at the time of a release going
out
 of trunk.



 Might as well copy branch-2's changes.txt into trunk then. (or 2.7's.
 Last I looked, people updated branch-2 and not 2.7's or vice versa for
some
 patches that went into both branches.)  So that folks who are
committing to
 both branches and want to cherry pick all changes can.

 I mean, trunk's is very very very wrong. Right now. Today. Borderline
 useless. See HADOOP-11718 (which I will now close out as won't fix)... and
 that jira is only what is miscategorized, not what is missing.







F 6/19: Jenkins clogged up

2015-06-19 Thread Chris Nauroth
Hi everyone,

I was just in contact with Apache infrastructure.  Jenkins wasn't running
jobs for a while, so there is a large backlog in the queue now (over 200
jobs).  Infra has fixed the problems, so jobs are running now, but our
test-patch runs might have to sit in the queue a long time today.

--Chris Nauroth



Reminder: Apache committers have access to a free MSDN license

2015-06-06 Thread Chris Nauroth
If you are a committer on any Apache project (not just Hadoop), then you
have access to a free MSDN license.  The details are described here.

https://svn.apache.org/repos/private/committers/donated-licenses/msdn-license-grants.txt


You'll need to authenticate with your Apache credentials.

This means that all Hadoop committers, and a large number of contributors
who are also committers on other Apache projects, are empowered to review
and test patches on Windows.

After getting the free MSDN license, you can download the installation iso
for Windows Server 2008 or 2012 and run it in a VirtualBox VM (or your
hypervisor of choice). Instructions for setting up a Windows development
environment have been in BUILDING.txt for a few years.

This would prevent situations where patches are blocked from getting
committed while waiting for me or any other individual to test.

--Chris Nauroth



Re: 2.7.1 status

2015-05-27 Thread Chris Nauroth
Thanks, Larry.  I have marked HADOOP-11934 as a blocker for 2.7.1.  I have
reviewed and +1'd it.  I can commit it after we get feedback from Jenkins.

--Chris Nauroth




On 5/26/15, 12:41 PM, larry mccay lmc...@apache.org wrote:

Hi Vinod -

I think that https://issues.apache.org/jira/browse/HADOOP-11934 should
also
be added to the blocker list.
This is a critical bug in our ability to protect the LDAP connection
password in LdapGroupsMapper.

thanks!

--larry

On Tue, May 26, 2015 at 3:32 PM, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:

 Tx for reporting this, Elliot.

 Made it a blocker, not with a deeper understanding of the problem. Can
you
 please chime in with your opinion and perhaps code reviews?

 Thanks
 +Vinod

 On May 26, 2015, at 10:48 AM, Elliott Clark ecl...@apache.org wrote:

  HADOOP-12001 should probably be added to the blocker list since it's a
  regression that can keep ldap from working.





Re: [DISCUSS] branch-1

2015-05-08 Thread Chris Nauroth
I think it would be fine to auto-close most remaining branch-1 issues
even if the branch is still formally considered alive.  I don't expect us
to create a new 1.x release unless a security vulnerability or critical
bug forces it.  Closing all non-critical issues would match with the
reality that no one is actively developing for the branch, but there would
still be the option of filing new critical bugs if someone decides that
they want a new 1.x release.

--Chris Nauroth




On 5/8/15, 10:50 AM, Karthik Kambatla ka...@cloudera.com wrote:

I would be -1 to declaring the branch dead just yet. There have been 7
commits to that branch this year. I know this isn't comparable to trunk or
branch-2, but it is not negligible either.

I propose we come up with a policy for deprecating past major release
branches. Maybe something along the lines of: deprecate branch-x when
release x+3.0.0 goes GA?



On Fri, May 8, 2015 at 10:41 AM, Allen Wittenauer a...@altiscale.com
wrote:


 May we declare this branch dead and just close bugs (but not
 necessarily concepts, ideas, etc) with won't fix?  I don't think anyone
has
 any intention of working on the 1.3 release, especially given that 1.2.1
 was Aug 2013 ....

 I guess we need a PMC member to declare a vote or whatever....





-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es



Re: Planning Hadoop 2.6.1 release

2015-04-30 Thread Chris Nauroth
Thank you, Arpit.  In addition, I suggest we include the following:

HADOOP-11333. Fix deadlock in DomainSocketWatcher when the notification
pipe is full
HADOOP-11604. Prevent ConcurrentModificationException while closing domain
sockets during shutdown of DomainSocketWatcher thread.
HADOOP-11648. Set DomainSocketWatcher thread name explicitly
HADOOP-11802. DomainSocketWatcher thread terminates sometimes after there
is an I/O error during requestShortCircuitShm

HADOOP-11604 and 11648 are not critical by themselves, but they are
pre-requisites to getting a clean cherry-pick of 11802, which we believe
finally fixes the root cause of this issue.


--Chris Nauroth




On 4/30/15, 3:55 PM, Arpit Agarwal aagar...@hortonworks.com wrote:

HDFS candidates for back-porting to Hadoop 2.6.1. The first two were
requested in [1].

HADOOP-11674. oneByteBuf in CryptoInputStream and CryptoOutputStream
should be non static
HADOOP-11710. Make CryptoOutputStream behave like DFSOutputStream wrt
synchronization

HDFS-7009. Active NN and standby NN have different live nodes.
HDFS-7035. Make adding a new data directory to the DataNode an atomic
operation and improve error handling
HDFS-7425. NameNode block deletion logging uses incorrect appender.
HDFS-7443. Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate
block files are present in the same volume.
HDFS-7489. Incorrect locking in FsVolumeList#checkDirs can hang datanodes
HDFS-7503. Namenode restart after large deletions can cause slow
processReport.
HDFS-7575. Upgrade should generate a unique storage ID for each volume.
HDFS-7579. Improve log reporting during block report rpc failure.
HDFS-7587. Edit log corruption can happen if append fails with a quota
violation.
HDFS-7596. NameNode should prune dead storages from storageMap.
HDFS-7611. deleteSnapshot and delete of a file can leave orphaned blocks
in the blocksMap on NameNode restart.
HDFS-7714. Simultaneous restart of HA NameNodes and DataNode can cause
DataNode to register successfully with only one NameNode.
HDFS-7733. NFS: readdir/readdirplus return null directory attribute on
failure.
HDFS-7831. Fix the starting index and end condition of the loop in
FileDiffList.findEarlierSnapshotBlocks().
HDFS-7885. Datanode should not trust the generation stamp provided by
client.
HDFS-7960. The full block report should prune zombie storages even if
they're not empty.
HDFS-8072. Reserved RBW space is not released if client terminates while
writing block.
HDFS-8127. NameNode Failover during HA upgrade can cause DataNode to
finalize upgrade.


Arpit

[1] Will Hadoop 2.6.1 be released soon?
http://markmail.org/thread/zlsr6prejyogdyvh



On 4/27/15, 11:47 AM, Vinod Kumar Vavilapalli vino...@apache.org
wrote:

There were several requests on the user lists [1] for a 2.6.1 release. I
got many offline comments too.

Planning to do a 2.6.1 release in a few weeks time. We already have a
bunch
of tickets committed to 2.7.1. I created a filter [2] to track pending
tickets.

We need to collectively come up with a list of critical issues. We can
use
the JIRA Target Version field for the same. I see some but not a whole
lot
of new work for this release, most of it is likely going to be pulling in
critical patches from 2.7.1/2.8 etc.

Thoughts?

Thanks
+Vinod

[1] Will Hadoop 2.6.1 be released soon?
http://markmail.org/thread/zlsr6prejyogdyvh
[2] 2.6.1 pending tickets
https://issues.apache.org/jira/issues/?filter=12331711






Re: Set minimum version of Hadoop 3 to JDK 8

2015-04-21 Thread Chris Nauroth
Suppose we configure maven-compiler-plugin with source set to 1.7 but
target set to 1.8 in trunk.  I believe this would have the effect of
generating JDK 8 bytecode, but enforcing that our code sticks to JDK 7
compatibility at compile time.  Does that still satisfy requirements for
HADOOP-11858?

I'd prefer to avoid executing duplicate builds for different JDK versions.
 Pre-commit already takes a long time, and I suspect doubling the number
of builds will leave us starved for executors in Jenkins.

Chris Nauroth
Hortonworks
http://hortonworks.com/






On 4/21/15, 8:38 PM, Sean Busbey bus...@cloudera.com wrote:

A few options:

* Only change the builds for master to use jdk8
* build with both jdk7 and jdk8 by copying jobs
* build with both jdk7 and jdk8 using a jenkins matrix build

Robert, if you'd like help with any of these please send me a ping
off-list.

On Tue, Apr 21, 2015 at 8:19 PM, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:

 We don't want JDK 8 only code going into branch-2 line. Moving Jenkins
to
 1.8 right-away will shield such code, how do we address that?

 Thanks,
 +Vinod

 On Apr 21, 2015, at 5:54 PM, Robert Kanter rkan...@cloudera.com wrote:

  Sure, I'll try to change the Jenkins builds to 1.8 first.
 
  On Tue, Apr 21, 2015 at 3:31 PM, Andrew Wang
andrew.w...@cloudera.com
  wrote:
 
  Hey Robert,
 
  As a first step, could we try switching all our precommit and nightly
  builds over to use 1.8? This is a prerequisite for HADOOP-11858, and
 safe
  to do in any case since it'll still target 1.7.
 
  I'll note that HADOOP-10530 details the pain Steve went through
 switching
  us to JDK7. Might be some lessons learned about how to do this
 transition
  more smoothly.
 
  Thanks,
  Andrew
 
  On Tue, Apr 21, 2015 at 3:15 PM, Robert Kanter rkan...@cloudera.com
  wrote:
 
  + yarn-dev, hdfs-dev, mapred-dev
 
  On Tue, Apr 21, 2015 at 3:14 PM, Robert Kanter
rkan...@cloudera.com
  wrote:
 
  Hi all,
 
  Moving forward on some of the discussions on Hadoop 3, I've created
  HADOOP-11858 to set the minimum version of Hadoop 3 to JDK 8.  I
just
  wanted to let everyone know in case there's some reason we
shouldn't
 go
  ahead with this.
 
  thanks
  - Robert
 
 
 




-- 
Sean



Re: Patch failed to build with eclipse:eclipse.

2015-04-20 Thread Chris Nauroth
Hello Jens,

We have seen spurious failures from mvn eclipse:eclipse like this in the
past.  I recommend that you enter a comment on MAPREDUCE-6320 stating that
you tested mvn eclipse:eclipse locally and it ran fine.

BTW, we're nearly ready to commit a major rewrite of our pre-commit build
automation in HADOOP-11746.  Let's see if the problems subside after that.
 For now, I don't think it's worthwhile to investigate this failure any
further.

Thank you for following up!

Chris Nauroth
Hortonworks
http://hortonworks.com/






On 4/20/15, 12:28 PM, Jens Rabe rabe-j...@t-online.de wrote:

Hello!

I just submitted a patch for MAPREDUCE-6320 and Hadoop QA said it failed
to build with eclipse:eclipse. I looked in the console output from the
build and indeed there is a suspicious output:

/home/jenkins/tools/maven/latest/bin/mvn eclipse:eclipse
-DHadoopPatchProcess >
/home/jenkins/jenkins-slave/workspace/PreCommit-MAPREDUCE-Build/../patchprocess/patchEclipseOutput.txt 2>&1
/home/jenkins/jenkins-slave/workspace/PreCommit-MAPREDUCE-Build/dev-support/test-patch.sh: line 696:
/home/jenkins/jenkins-slave/workspace/PreCommit-MAPREDUCE-Build/../patchprocess/patchEclipseOutput.txt: No such file or directory

I ran mvn eclipse:eclipse -DHadoopPatchProcess manually on my computer
and it worked. Is this warning related to my patch or is this another
issue? mvn apache-rat:check also reports some „no such file or directory“.



committing HADOOP-11746 test-patch improvements

2015-04-16 Thread Chris Nauroth
I'd like to thank Allen Wittenauer for his work on HADOOP-11746 to rewrite
test-patch.sh.  There is a lot of nice new functionality in there.  My
favorite part is that some patches will execute much faster, so I expect
this will make the project more efficient overall at moving patches
through the pre-commit process.

I have +1'd the patch, but since this is a tool that we all use
frequently, I'd like to delay a day before committing.  Please comment on
the jira if you have any additional feedback.  We're aiming to commit on
Friday, 4/17.

Chris Nauroth
Hortonworks
http://hortonworks.com/



Re: [VOTE] Release Apache Hadoop 2.7.0 RC0

2015-04-15 Thread Chris Nauroth
+1 (binding)

- Downloaded source tarball and verified signature and checksum.
- Built from source, including native code on Linux and Windows.
- Ran a 3-node unsecured cluster.
- Tested various HDFS and YARN commands, including sample MapReduce jobs.
- Confirmed that SecondaryNameNode can take a checkpoint.
- HADOOP-9629: Tested integration with Azure storage as an alternative
FileSystem.
- HADOOP-11394/11395/11396: Built site docs and confirmed presence of
these fixes in the documentation for Hadoop Compatible File Systems.
- HDFS-7604: Confirmed presence of DataNode volume failure reporting in
web UI and metrics.
- HDFS-7879: Ran nm -g libhdfs.so and dumpbin /exports hdfs.dll to confirm
export of public API symbols in libhdfs.
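
For anyone repeating the checksum step above, here is a minimal sketch of what that check amounts to. It assumes a SHA-256 digest is published alongside the tarball; the class name ChecksumSketch and its helper methods are made up for illustration and are not part of any Hadoop tooling. It does not cover the signature check, which needs gpg and the project KEYS file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumSketch {

    /** Creates a SHA-256 digest; every standard JDK is required to provide it. */
    static MessageDigest newSha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /** Renders bytes as lowercase hex, two characters per byte. */
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    /** Digest of an in-memory byte array. */
    public static String sha256Hex(byte[] data) {
        return toHex(newSha256().digest(data));
    }

    /** Digest of a file, streamed in chunks so large tarballs are not loaded whole. */
    public static String sha256Hex(Path file) throws IOException {
        MessageDigest digest = newSha256();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        return toHex(digest.digest());
    }

    /** Compares digests, ignoring case and any whitespace in the published value. */
    public static boolean matches(String actualHex, String publishedHex) {
        return actualHex.equalsIgnoreCase(publishedHex.replaceAll("\\s+", ""));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ChecksumSketch <file> <published-sha256-hex>");
            return;
        }
        String actual = sha256Hex(Paths.get(args[0]));
        System.out.println(matches(actual, args[1]) ? "checksum OK" : "checksum MISMATCH: " + actual);
    }
}
```

The comparison strips whitespace and ignores case because published digest files are not always formatted as a single lowercase string.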

Allen mentioned HDFS-8132, which reported a problem using JCarder on the
build.  Brahma, Todd and I have determined that root cause is
incompatibility of the JCarder build with Java 7 classes.  (2.7.0 is our
first release compiling as Java 7.)  This is not a problem that needs to
hold up the release.

Vinod, thank you for putting together the release.


Chris Nauroth
Hortonworks
http://hortonworks.com/






On 4/15/15, 7:07 AM, Mit Desai mitde...@yahoo-inc.com.INVALID wrote:

+1 (non-binding)
   
   - Downloaded and built the source.
   - Deployed to a local cluster and ran sample jobs like sleepJob and
Wordcount.
   - Verified Signatures
   - Verified the RM UI for correctness.

Thanks Vinod for taking the time and effort to drive this release.


-Mit Desai 


 On Wednesday, April 15, 2015 8:03 AM, Vinod Kumar Vavilapalli
vino...@hortonworks.com wrote:
   

 Tx Pat. This is really interesting news!

+Vinod

On Apr 14, 2015, at 11:18 PM, Pat White patwh...@yahoo-inc.com.INVALID
wrote:

 +1 non-binding
 Ran a performance comparison of 2.6.0 Latest with 2.7.0 RC, looks good,
no regressions observed, most metrics are well within 5%  tolerance
(dfsio, sort, amrestart, gmv3) and some tests (scan, amscale,
compression, shuffle) appear to have improvements.
 
 Disclaimer, please note a JDK difference, 2.6.0 ran with jdk1.7 while
2.7.0 had jdk1.8, so some of the improvement may be from Java 1.8.0,
that said, the 2.7.0 benchmarks compare well against current 2.6.0.
 Thanks.
 
 patw
 
 
 - Forwarded Message -  From: Vinod Kumar Vavilapalli
vino...@apache.org
 To: common-...@hadoop.apache.org; hdfs-...@hadoop.apache.org;
yarn-...@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
 Cc: vino...@apache.org
 Sent: Friday, April 10, 2015 6:44 PM
 Subject: [VOTE] Release Apache Hadoop 2.7.0 RC0
  Hi all,
 I've created a release candidate RC0 for Apache Hadoop 2.7.0.
 The RC is available at:
http://people.apache.org/~vinodkv/hadoop-2.7.0-RC0/
 The RC tag in git is: release-2.7.0-RC0
 The maven artifacts are available via repository.apache.org at
https://repository.apache.org/content/repositories/orgapachehadoop-1017/
 As discussed before - This release will only work with JDK 1.7 and
above - I'd like to use this as a starting release for 2.7.x [1],
depending on how it goes, get it stabilized and potentially use a 2.7.1
in a few weeks as the stable release.
 Please try the release and vote; the vote will run for the usual 5 days.
 Thanks, Vinod
 [1]: A 2.7.1 release to follow up 2.7.0
http://markmail.org/thread/zwzze6cqqgwq4rmw


  



Re: A 2.7.1 release to follow up 2.7.0

2015-04-09 Thread Chris Nauroth
+1, full agreement with both Vinod and Karthik.  Thanks!

Chris Nauroth
Hortonworks
http://hortonworks.com/






On 4/9/15, 12:07 PM, Karthik Kambatla ka...@cloudera.com wrote:

Inline.

On Thu, Apr 9, 2015 at 11:48 AM, Vinod Kumar Vavilapalli
vino...@apache.org
 wrote:

 Hi all,

 I feel like we haven't done a great job of maintaining the previous 2.x
 releases. Seeing as how long 2.7.0 release has taken, I am sure we will
 spend more time stabilizing it, fixing issues etc.

 I propose that we immediately follow up 2.7.0 with a 2.7.1 within 2-3
 weeks. The focus obviously is to have blocker issues, bug-fixes and *no*
 features.


+1. Having a 2.7.2/2.7.3 to continue stabilizing is also appealing. Would
greatly help folks who upgrade to later releases for major bug fixes
instead of the new and shiny features.



 Improvements are going to be slightly hard to reason about, but I
 propose limiting ourselves to very small improvements, if at all.


I would avoid any improvements unless they are to fix severe regressions -
performance or otherwise. I guess they become blockers in that case. So,
yeah, I suggest no improvements at all.



 The other area of concern with the previous releases had been
 compatibility. With help from Li Lu, I got jdiff reinstated in branch-2
 (though patches are not yet in), and did a pass. In the unavoidable
event
 that we find incompatibilities with 2.7.0, we can fix those in 2.7.1 and
 promote that to be the stable release.


Sounds reasonable.



 Thoughts?

 Thanks,+Vinod




-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es



Re: map() function call related

2015-04-07 Thread Chris Nauroth
Hello Shahil,

In the current trunk codebase, the relevant files are
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
and
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/Mapper.java.
MapTask manages the execution of the mapper task, and eventually it calls
Mapper#run, which then calls into the implementation of the map method.
BTW, you'll also see a corresponding ReduceTask.java and Reducer.java in
the same directories if you need to look at those too.
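
To make that control flow concrete, here is a rough sketch with no Hadoop dependency. SimpleMapper, SimpleContext and MapLoopSketch are hypothetical stand-ins, not the real Mapper or MapTask classes; only the shape of the loop is meant to mirror how Mapper#run calls map once per key-value pair. The real class also has setup and cleanup hooks, which are omitted here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapLoopSketch {

    /** Hypothetical stand-in for the context that feeds records to a mapper. */
    static class SimpleContext {
        private final Iterator<Map.Entry<Long, String>> records;
        private Map.Entry<Long, String> current;
        final List<String> output = new ArrayList<>();

        SimpleContext(Map<Long, String> input) {
            this.records = input.entrySet().iterator();
        }

        /** Advances to the next record; returns false when the input is exhausted. */
        boolean nextKeyValue() {
            if (!records.hasNext()) {
                return false;
            }
            current = records.next();
            return true;
        }

        Long getCurrentKey() { return current.getKey(); }
        String getCurrentValue() { return current.getValue(); }
        void write(String key, int value) { output.add(key + "=" + value); }
    }

    /** Hypothetical stand-in for a mapper: run() drives map() once per record. */
    abstract static class SimpleMapper {
        abstract void map(Long key, String value, SimpleContext context);

        void run(SimpleContext context) {
            while (context.nextKeyValue()) {
                map(context.getCurrentKey(), context.getCurrentValue(), context);
            }
        }
    }

    /** User code: emits (word, 1) for every whitespace-separated token. */
    static class WordMapper extends SimpleMapper {
        @Override
        void map(Long key, String value, SimpleContext context) {
            for (String word : value.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    context.write(word, 1);
                }
            }
        }
    }

    /** Plays the role of the task: builds the context and hands it to run(). */
    public static List<String> runTask(Map<Long, String> input) {
        SimpleContext context = new SimpleContext(input);
        new WordMapper().run(context);
        return context.output;
    }

    public static void main(String[] args) {
        Map<Long, String> input = new LinkedHashMap<>();
        input.put(0L, "hello hadoop");
        input.put(13L, "hello map");
        System.out.println(runTask(input));
    }
}
```

The point of the split is that the framework owns the loop and the user only overrides map, so reading the loop in Mapper.java is usually enough to see where your own code gets called.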

Input split calculation is performed by a subclass of InputFormat.

http://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/mapreduce/InputFormat.html


I recommend looking at that.  You also can navigate down through those
JavaDocs to identify subclasses of InputFormat, like FileInputFormat and
TextInputFormat, which you can then find in the source code.
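
If it helps before reading those classes, the core idea of splitting a file can be sketched as below. This is a deliberately simplified illustration, not the FileInputFormat algorithm: SplitSketch and Range are made-up names, and the sketch ignores block locations, configured minimum and maximum split sizes, and files that cannot be split at all.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

    /** Hypothetical stand-in for an input split: a byte range within one file. */
    public static final class Range {
        public final long offset;
        public final long length;

        Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        @Override
        public String toString() {
            return "[offset=" + offset + ", length=" + length + "]";
        }
    }

    /** Cuts a file of fileLength bytes into consecutive ranges of at most splitSize bytes. */
    public static List<Range> computeSplits(long fileLength, long splitSize) {
        if (fileLength < 0 || splitSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and splitSize > 0");
        }
        List<Range> splits = new ArrayList<>();
        long offset = 0;
        while (offset < fileLength) {
            // The last range may be shorter than splitSize.
            long length = Math.min(splitSize, fileLength - offset);
            splits.add(new Range(offset, length));
            offset += length;
        }
        return splits;
    }

    public static void main(String[] args) {
        // A 250-byte file with a 100-byte split size yields three ranges.
        System.out.println(computeSplits(250L, 100L));
    }
}
```

A custom InputFormat would produce its own list of splits in place of this loop, and each split is then handed to one map task.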

I hope this helps.

Chris Nauroth
Hortonworks
http://hortonworks.com/






On 4/7/15, 6:09 AM, Shahil Varshney shahilvarsh...@gmail.com wrote:

Sir,
I want to know which class in Hadoop (internal source class) is
responsible for calling the map function for each key-value pair (that is,
which class calls map()).

Also, which class actually does the input split job? I want to create my
own class for input splits, so please tell me.



[jira] [Resolved] (MAPREDUCE-6212) UnsatisfiedLinkError: org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative() happened when starting MRAppMaster

2015-01-08 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved MAPREDUCE-6212.
--
Resolution: Invalid

 UnsatisfiedLinkError: 
 org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative() happened 
 when starting MRAppMaster
 

 Key: MAPREDUCE-6212
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6212
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security
Affects Versions: 2.6.0
 Environment: CentOS 64bit
Reporter: Dinh Hoang Mai
Assignee: Dinh Hoang Mai

 I have just started to work with Hadoop 2.
 After installing with basic configs, I always failed to run any examples. Has 
 anyone seen this problem and please help me?
 This is the log
 2015-01-08 01:52:01,599 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1420648881673_0004_01
 2015-01-08 01:52:01,764 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
 java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
   at org.apache.hadoop.security.Groups.<init>(Groups.java:70)
   at org.apache.hadoop.security.Groups.<init>(Groups.java:66)
   at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280)
   at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271)
   at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:299)
   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1473)
   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1429)
 Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129)
   ... 7 more
 Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V
   at org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native Method)
   at org.apache.hadoop.security.JniBasedUnixGroupsMapping.<clinit>(JniBasedUnixGroupsMapping.java:49)
   at org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.<init>(JniBasedUnixGroupsMappingWithFallback.java:39)
   ... 12 more
 2015-01-08 01:52:01,767 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1
 This is my configs
 core-site.xml
 <property>
   <name>fs.defaultFS</name>
   <value>hdfs://grey5:9000</value>
 </property>

 <property>
   <name>hadoop.tmp.dir</name>
   <value>/home/maidinh/hadoop2/hadoop-data</value>
 </property>

 hdfs-site.xml
 <property>
   <name>dfs.namenode.name.dir</name>
   <value>/home/maidinh/hadoop2/nn</value>
 </property>

 <property>
   <name>dfs.datanode.data.dir</name>
   <value>/data1/maidinh/hadoop2/dn,/data2/maidinh/hadoop2/dn,/data3/maidinh/hadoop2/dn</value>
 </property>
 yarn-site.xml
 <property>
   <name>yarn.resourcemanager.hostname</name>
   <value>grey5</value>
 </property>

 <property>
   <name>yarn.nodemanager.local-dirs</name>
   <value>/data4/maidinh/hadoop2/yarn-data,/data5/maidinh/hadoop2/yarn-data,/data6/maidinh/hadoop2/yarn-data</value>
 </property>

 <property>
   <name>yarn.nodemanager.log-dirs</name>
   <value>/data4/maidinh/hadoop2/yarn-logs,/data5/maidinh/hadoop2/yarn-logs,/data6/maidinh/hadoop2/yarn-logs</value>
 </property>

 <property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
 </property>
 mapred-site.xml
 <property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
 </property>

 <property>
   <name>mapreduce.jobhistory.address</name>
   <value>grey5:10020</value>
 </property>
 <property>
   <name>mapreduce.jobhistory.webapp.address</name>
   <value>grey5:19888</value>
 </property>

 <property>
   <name>mapreduce.jobtracker.address</name>
   <value>grey5:50030</value>
 </property>
 .bashrc
 export JAVA_HOME=/usr/java/latest/
 export HADOOP_PREFIX=/home/maidinh/hadoop2/hadoop-2.6.0
 export HADOOP_YARN_USER=maidinh
 export HADOOP_HOME=$HADOOP_PREFIX
 export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
 export HADOOP_PID_DIR=$HADOOP_PREFIX
 export HADOOP_LOG_DIR=$HADOOP_PREFIX/logs
 export HADOOP_OPTS="$HADOOP_OPTS -Djava.io.tmpdir=$HADOOP_PREFIX/tmp"
 export YARN_HOME=$HADOOP_PREFIX
 export

Re: [VOTE] Release Apache Hadoop 2.6.0

2014-11-14 Thread Chris Nauroth
+1 (binding)

- Verified checksums and signatures for source and binary tarballs.
- Started a pseudo-distributed HDFS cluster in secure mode with SSL.
- Tested various file system operations.
- Verified HDFS-2856, the new feature to run a secure DataNode without
requiring root.
- Verified HDFS-7385, the recent blocker related to incorrect ACLs
serialized to the edit log.
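
The first check in the list, verifying a tarball checksum, can be sketched as follows. This is an illustration only: the message does not say which digest or tools were used, so SHA-256 and the helper names `sha256_of` and `file_digest_matches` are assumptions, and signature verification (normally done with GnuPG) is not shown.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large tarballs do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_digest_matches(path, expected_hex):
    """Compare a file's SHA-256 with a published value, ignoring case and spaces."""
    expected = "".join(expected_hex.split()).lower()
    return hmac.compare_digest(sha256_of(path), expected)
```

Normalizing the expected value first helps because published checksum files often group the hex digits or print them in upper case.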

Thank you to Arun as release manager, and thank you to all of the
contributors for their hard work on this release.

Chris Nauroth
Hortonworks
http://hortonworks.com/


On Fri, Nov 14, 2014 at 10:57 AM, Yongjun Zhang yzh...@cloudera.com wrote:

 Thanks Arun for leading the 2.6 release effort.

 +1 (non-binding)

 - Downloaded rc1 source and did build
 - Created two single-node clusters running 2.6
 - Ran sample mapreduce job
 - Ran distcp between two clusters

 --Yongjun


 On Thu, Nov 13, 2014 at 3:08 PM, Arun C Murthy a...@hortonworks.com
 wrote:

  Folks,
 
  I've created another release candidate (rc1) for hadoop-2.6.0 based on
 the
  feedback.
 
  The RC is available at:
  http://people.apache.org/~acmurthy/hadoop-2.6.0-rc1
  The RC tag in git is: release-2.6.0-rc1
 
  The maven artifacts are available via repository.apache.org at
  https://repository.apache.org/content/repositories/orgapachehadoop-1013.
 
  Please try the release and vote; the vote will run for the usual 5 days.
 
  thanks,
  Arun
 
 
  --
  CONFIDENTIALITY NOTICE
  NOTICE: This message is intended for the use of the individual or entity
 to
  which it is addressed and may contain information that is confidential,
  privileged and exempt from disclosure under applicable law. If the reader
  of this message is not the intended recipient, you are hereby notified
 that
  any printing, copying, dissemination, distribution, disclosure or
  forwarding of this communication is strictly prohibited. If you have
  received this communication in error, please contact the sender
 immediately
  and delete it from your system. Thank You.
 




Re: [VOTE] Release Apache Hadoop 2.6.0

2014-11-14 Thread Chris Nauroth
Hi Eric,

There was a second release candidate created (named rc1), and voting
started fresh in a new thread.  You might want to join in on that second
thread to make sure that your vote gets counted.  Thanks!

Chris Nauroth
Hortonworks
http://hortonworks.com/


On Fri, Nov 14, 2014 at 3:08 PM, Eric Payne erichadoo...@yahoo.com.invalid
wrote:

 +1
 I downloaded and built source. Started local cluster and ran wordcount,
 sleep, and simple streaming job.

 Also, I ran a distributed shell job which tested preserving containers
 across AM restart by setting the
 -keep_containers_across_application_attempts flag and killing the first AM
 once the containers start.
 Enabled the preemption feature and verified containers were preempted and
 queues were levelized.
 Ran unit tests for hadoop-yarn-server-resourcemanager
 Ran unit tests for hadoop-hdfs
 Thank you,
 -Eric Payne


   From: Arun C Murthy a...@hortonworks.com
  To: common-...@hadoop.apache.org common-...@hadoop.apache.org; 
 hdfs-...@hadoop.apache.org hdfs-...@hadoop.apache.org; 
 yarn-...@hadoop.apache.org yarn-...@hadoop.apache.org; 
 mapreduce-dev@hadoop.apache.org mapreduce-dev@hadoop.apache.org
  Sent: Monday, November 10, 2014 8:52 PM
  Subject: [VOTE] Release Apache Hadoop 2.6.0

 Folks,

 I've created a release candidate (rc0) for hadoop-2.6.0 that I would like
 to see released.

 The RC is available at:
 http://people.apache.org/~acmurthy/hadoop-2.6.0-rc0
 The RC tag in git is: release-2.6.0-rc0

 The maven artifacts are available via repository.apache.org at
 https://repository.apache.org/content/repositories/orgapachehadoop-1012.

 Please try the release and vote; the vote will run for the usual 5 days.

 thanks,
 Arun









Re: [VOTE] Release Apache Hadoop 2.6.0

2014-11-13 Thread Chris Nauroth
I'm helping to expedite a complete, approved patch for HDFS-7385 now.
Then, we can make a final decision on its inclusion in 2.6.0.  Thank you
for bringing it up, Yi.

Chris Nauroth
Hortonworks
http://hortonworks.com/


On Wed, Nov 12, 2014 at 7:31 PM, Liu, Yi A yi.a@intel.com wrote:

 Arun, could you wait for HDFS-7385? It can cause issues with HDFS ACLs and
 XAttrs in some cases. The fix is very easy, but I think the issue is critical.
 I'm helping review it, and expect to commit today. Thanks.

 Regards,
 Yi Liu

 -----Original Message-----
 From: Arun C Murthy [mailto:a...@hortonworks.com]
 Sent: Thursday, November 13, 2014 12:58 AM
 To: yarn-...@hadoop.apache.org
 Cc: mapreduce-dev@hadoop.apache.org; Ravi Prakash;
 hdfs-...@hadoop.apache.org; common-...@hadoop.apache.org
 Subject: Re: [VOTE] Release Apache Hadoop 2.6.0

 Sounds good. I'll create an rc1. Thanks.

 Arun

 On Nov 11, 2014, at 2:06 PM, Robert Kanter rkan...@cloudera.com wrote:

  Hi Arun,
 
  We were testing the RC and ran into a problem with the recent fixes
  that were done for POODLE for Tomcat (HADOOP-11217 for KMS and
  HDFS-7274 for HttpFS).  Basically, in disabling SSLv3, we also
  disabled SSLv2Hello, which is required for older clients (e.g. Java 6
  with openssl 0.9.8x) so they can't connect without it.  Just to be
  clear, it does not mean SSLv2, which is insecure.  This also affects the
 MR shuffle in HADOOP-11243.
 
  The fix is super simple, so I think we should reopen these 3 JIRAs and
  put in addendum patches and get them into 2.6.0.
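
The distinction Robert describes can be illustrated with a small sketch. It is not the addendum patch itself; it only shows the intended outcome using JSSE-style protocol names, and the function name `harden_protocols` is hypothetical.

```python
# Protocols removed because of POODLE. "SSLv2Hello" is deliberately absent:
# it is only a hello format that older clients send, not the SSLv2 protocol.
DISABLED = {"SSLv3"}


def harden_protocols(supported):
    """Drop insecure protocols while keeping the SSLv2Hello hello format."""
    return [p for p in supported if p not in DISABLED]


print(harden_protocols(["SSLv2Hello", "SSLv3", "TLSv1", "TLSv1.1", "TLSv1.2"]))
# prints ['SSLv2Hello', 'TLSv1', 'TLSv1.1', 'TLSv1.2']
```

Filtering against an explicit deny list, rather than enumerating allowed protocols, is one way to avoid dropping SSLv2Hello by accident.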
 
  thanks
  - Robert
 
  On Tue, Nov 11, 2014 at 1:04 PM, Ravi Prakash ravi...@ymail.com wrote:
 
  Hi Arun!
  We are very close to completion on YARN-1964 (DockerContainerExecutor).
  I'd also like HDFS-4882 to be checked in. Do you think these issues
  merit another RC?
  Thanks
  Ravi
 
 
  On Tuesday, November 11, 2014 11:57 AM, Steve Loughran 
  ste...@hortonworks.com wrote:
 
 
  +1 binding
 
  -patched slider pom to build against 2.6.0
 
  -verified build did download, which it did at up to ~8Mbps. Faster
  than a local build.
 
  -full clean test runs on OS/X & Linux
 
 
  Windows 2012:
 
  Same thing. I did have to first build my own set of the windows
  native binaries, by checking out branch-2.6.0; doing a native build,
  copying the binaries and then purging the local m2 repository of
  hadoop artifacts to be confident I was building against. For anyone
  who wants those native libs they will be up on
  https://github.com/apache/incubator-slider/tree/develop/bin/windows/
  once it syncs with the ASF repos.
 
  afterwards: the tests worked!
 
 
  On 11 November 2014 02:52, Arun C Murthy a...@hortonworks.com wrote:
 
  Folks,
 
  I've created a release candidate (rc0) for hadoop-2.6.0 that I would
  like to see released.
 
  The RC is available at:
  http://people.apache.org/~acmurthy/hadoop-2.6.0-rc0
  The RC tag in git is: release-2.6.0-rc0
 
  The maven artifacts are available via repository.apache.org at
 
 https://repository.apache.org/content/repositories/orgapachehadoop-1012.
 
  Please try the release and vote; the vote will run for the usual 5
 days.
 
  thanks,
  Arun
 
 
 
 
 
 
 
 

 --
 Arun C. Murthy
 Hortonworks Inc.
 http://hortonworks.com/hdp/




Re: [VOTE] Release Apache Hadoop 2.6.0

2014-11-13 Thread Chris Nauroth
I have committed HDFS-7385 down through branch-2.6.0.  Thank you!

Chris Nauroth
Hortonworks
http://hortonworks.com/


On Thu, Nov 13, 2014 at 9:14 AM, Chris Nauroth cnaur...@hortonworks.com
wrote:

 I'm helping to expedite a complete, approved patch for HDFS-7385 now.
 Then, we can make a final decision on its inclusion in 2.6.0.  Thank you
 for bringing it up, Yi.

 Chris Nauroth
 Hortonworks
 http://hortonworks.com/


 On Wed, Nov 12, 2014 at 7:31 PM, Liu, Yi A yi.a@intel.com wrote:

 Arun, could you wait for HDFS-7385? It can cause issues with HDFS ACLs and
 XAttrs in some cases. The fix is very easy, but I think the issue is critical.
 I'm helping review it, and expect to commit today. Thanks.

 Regards,
 Yi Liu

 -----Original Message-----
 From: Arun C Murthy [mailto:a...@hortonworks.com]
 Sent: Thursday, November 13, 2014 12:58 AM
 To: yarn-...@hadoop.apache.org
 Cc: mapreduce-dev@hadoop.apache.org; Ravi Prakash;
 hdfs-...@hadoop.apache.org; common-...@hadoop.apache.org
 Subject: Re: [VOTE] Release Apache Hadoop 2.6.0

 Sounds good. I'll create an rc1. Thanks.

 Arun

 On Nov 11, 2014, at 2:06 PM, Robert Kanter rkan...@cloudera.com wrote:

  Hi Arun,
 
  We were testing the RC and ran into a problem with the recent fixes
  that were done for POODLE for Tomcat (HADOOP-11217 for KMS and
  HDFS-7274 for HttpFS).  Basically, in disabling SSLv3, we also
  disabled SSLv2Hello, which is required for older clients (e.g. Java 6
  with openssl 0.9.8x) so they can't connect without it.  Just to be
  clear, it does not mean SSLv2, which is insecure.  This also affects
 the MR shuffle in HADOOP-11243.
 
  The fix is super simple, so I think we should reopen these 3 JIRAs and
  put in addendum patches and get them into 2.6.0.
 
  thanks
  - Robert
 
  On Tue, Nov 11, 2014 at 1:04 PM, Ravi Prakash ravi...@ymail.com
 wrote:
 
  Hi Arun!
  We are very close to completion on YARN-1964 (DockerContainerExecutor).
  I'd also like HDFS-4882 to be checked in. Do you think these issues
  merit another RC?
  Thanks
  Ravi
 
 
  On Tuesday, November 11, 2014 11:57 AM, Steve Loughran 
  ste...@hortonworks.com wrote:
 
 
  +1 binding
 
  -patched slider pom to build against 2.6.0
 
  -verified build did download, which it did at up to ~8Mbps. Faster
  than a local build.
 
  -full clean test runs on OS/X & Linux
 
 
  Windows 2012:
 
  Same thing. I did have to first build my own set of the windows
  native binaries, by checking out branch-2.6.0; doing a native build,
  copying the binaries and then purging the local m2 repository of
  hadoop artifacts to be confident I was building against. For anyone
  who wants those native libs they will be up on
  https://github.com/apache/incubator-slider/tree/develop/bin/windows/
  once it syncs with the ASF repos.
 
  afterwards: the tests worked!
 
 
  On 11 November 2014 02:52, Arun C Murthy a...@hortonworks.com wrote:
 
  Folks,
 
  I've created a release candidate (rc0) for hadoop-2.6.0 that I would
  like to see released.
 
  The RC is available at:
  http://people.apache.org/~acmurthy/hadoop-2.6.0-rc0
  The RC tag in git is: release-2.6.0-rc0
 
  The maven artifacts are available via repository.apache.org at
 
 https://repository.apache.org/content/repositories/orgapachehadoop-1012.
 
  Please try the release and vote; the vote will run for the usual 5
 days.
 
  thanks,
  Arun
 
 
 
 
 
 
 
 

 --
 Arun C. Murthy
 Hortonworks Inc.
 http://hortonworks.com/hdp/




[jira] [Created] (MAPREDUCE-6123) TestCombineFileInputFormat incorrectly starts 2 MiniDFSCluster instances.

2014-10-06 Thread Chris Nauroth (JIRA)
Chris Nauroth created MAPREDUCE-6123:


 Summary: TestCombineFileInputFormat incorrectly starts 2 
MiniDFSCluster instances.
 Key: MAPREDUCE-6123
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6123
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial


{{TestCombineFileInputFormat#testGetSplitsWithDirectory}} starts 2 
{{MiniDFSCluster}} instances, one right after the other, using the exact same 
configuration.  There is no need for 2 clusters in this test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


HDFS-573: Windows builds will require CMake

2014-08-03 Thread Chris Nauroth
FYI, I plan to commit HDFS-573 next week, which ports libhdfs to Windows.
 With this patch, we have a new build requirement: Windows build machines
now require CMake.  I've updated BUILDING.txt accordingly.

For those of you working on a Windows dev machine, please install CMake at
your earliest convenience.  I'll hold off committing for a few more days to
give everyone time to react.

Chris Nauroth
Hortonworks
http://hortonworks.com/



Re: [DISCUSS] Assume Private-Unstable for classes that are not annotated

2014-07-24 Thread Chris Nauroth
+1 for the proposal.

I believe stating that classes without annotations are implicitly private
is consistent with what we publish for our JavaDocs.
 IncludePublicAnnotationsStandardDoclet, used in the root pom.xml, filters
out classes that don't explicitly have the Public annotation.
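
The rule proposed in this thread can be written down in a few lines. This is only a sketch of the proposed policy, not code from Hadoop: annotations are modelled as plain strings, `effective_audience` is a hypothetical helper, and the inheritance rule for members is the one Karthik raises as a question rather than a settled decision.

```python
def effective_audience(member_annotation=None, class_annotation=None):
    """Resolve the audience of a class member under the proposed policy.

    A member's own annotation wins; otherwise it inherits the class
    annotation; a class with no annotation is treated as Private.
    """
    if member_annotation is not None:
        return member_annotation
    if class_annotation is not None:
        return class_annotation
    return "Private"


print(effective_audience(None, "Public"))  # member inherits from the class
print(effective_audience(None, None))      # unannotated class and member
```

Under this reading, forgetting an annotation makes an API private by default, which matches what the doclet filtering already does for the published JavaDocs.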

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Wed, Jul 23, 2014 at 10:55 AM, Karthik Kambatla ka...@cloudera.com
wrote:

 Fair points, Jason.

 The fact that we include this in the compatibility guideline should not
 affect how developers go about this. We should still strive to annotate
 every new class we add, and reviewers should continue to check for them.
 However, in case we miss annotations, we won't be burdened to support those
 APIs for essentially eternity.

 I am aware of downstream projects that use @Private APIs, but I have also
 seen that improve in the recent past with compatible 2.x releases. So, I am
 hoping they will let us know of APIs they would like to see and eventually
 use only Public-Stable APIs.


 On Wed, Jul 23, 2014 at 7:22 AM, Jason Lowe jl...@yahoo-inc.com.invalid
 wrote:

  I think that's a reasonable proposal as long as we understand it changes
  the burden from finding all the things that should be marked @Private to
  finding all the things that should be marked @Public. As Tom Graves
 pointed
  out in an earlier discussion about @LimitedPrivate, it may be impossible
 to
  do a straightforward task and use only interfaces marked @Public.  If
 users
  can't do basic things without straying from @Public interfaces then tons
 of
  code can break if we assume it's always fair game to change anything not
  marked @Public.  The "well you shouldn't have used a non-@Public
  interface" argument is not very useful in that context.
 
  So as long as we're good about making sure officially supported features
  have corresponding @Public interfaces to wield them then I agree it will
 be
  easier to track those rather than track all the classes that should be
  @Private.  Hopefully if users understand that's how things work they'll
  help file JIRAs for interfaces that need to be @Public to get their work
  done.
 
  Jason
 
 
  On 07/22/2014 04:54 PM, Karthik Kambatla wrote:
 
  Hi devs
 
  As you might have noticed, we have several classes and methods in them
  that
  are not annotated at all. This is seldom intentional. Avoiding
  incompatible
  changes to all these classes can be considerable baggage.
 
  I was wondering if we should add an explicit disclaimer in our
  compatibility guide that says, "Classes without annotations are to be
  considered @Private".

  For methods, is it reasonable to say, "Class members without specific
  annotations inherit the annotations of the class"?
 
  Thanks
  Karthik
 
 
 




Re: Jenkins Build Slaves

2014-07-10 Thread Chris Nauroth
Thanks, Giri, for taking care of pkgconfig.

It looks like most (all?) pre-commit builds have some new failing tests:

https://builds.apache.org/job/PreCommit-HADOOP-Build/4247/testReport/

On the symlink tests, is there any chance that the new hosts have a
different version/different behavior for the ln command?

The TestIPC failure is in a stress test that checks behavior after spamming
a lot of connections at an RPC server.  Maybe the new hosts have something
different in the TCP stack, such as TCP backlog?

I likely won't get a chance to investigate any more today, but I wanted to
raise the issue in case someone else gets an opportunity to look.

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Wed, Jul 9, 2014 at 10:33 AM, Giridharan Kesavan 
gkesa...@hortonworks.com wrote:

 I don't think so; let me fix that. Thanks, Chris, for pointing that out.


 -giri


 On Wed, Jul 9, 2014 at 9:50 AM, Chris Nauroth cnaur...@hortonworks.com
 wrote:

 Hi Giri,

 Is pkgconfig deployed on the new Jenkins slaves?  I noticed this build
 failed:

 https://builds.apache.org/job/PreCommit-HADOOP-Build/4237/

 Looking in the console output, it appears the HDFS native code failed to
 build due to missing pkgconfig.

  [exec] CMake Error at /usr/share/cmake-2.8/Modules/FindPackageHandleStandardArgs.cmake:108 (message):
  [exec]   Could NOT find PkgConfig (missing: PKG_CONFIG_EXECUTABLE)

 Chris Nauroth
 Hortonworks
 http://hortonworks.com/



 On Wed, Jul 9, 2014 at 7:08 AM, Giridharan Kesavan 
 gkesa...@hortonworks.com wrote:

 Build jobs are now configured to run on the newer set of slaves.



 -giri


 On Mon, Jul 7, 2014 at 4:12 PM, Giridharan Kesavan 
 gkesa...@hortonworks.com
  wrote:

  All
 
  Yahoo is in the process of retiring all the hadoop jenkins build slaves,
  *hadoop[1-9]*, and replacing them with a newer set of beefier hosts. These
  new machines are configured with *ubuntu-14.04*.

  Over the next couple of days I will be configuring the build jobs to run
  on these newly configured build slaves.  To automate the installation of
  tools and build libraries I have put together ansible scripts, and here is
  the link to the toolchain repo.

  *https://github.com/apache/toolchain*

  During the transition, the old build slaves will remain accessible; they
  are expected to be shut down by 07/15.

  I will send out an update later this week when this transition is
  complete.

  *Meanwhile, I would like to request the project owners to remove/clean up
  any stale jenkins jobs for their respective projects and help with any
  build issues to make this transition seamless.*
 
  Thanks
 
  -
  Giri
 








Re: Jenkins Build Slaves

2014-07-09 Thread Chris Nauroth
Hi Giri,

Is pkgconfig deployed on the new Jenkins slaves?  I noticed this build
failed:

https://builds.apache.org/job/PreCommit-HADOOP-Build/4237/

Looking in the console output, it appears the HDFS native code failed to
build due to missing pkgconfig.

 [exec] CMake Error at /usr/share/cmake-2.8/Modules/FindPackageHandleStandardArgs.cmake:108 (message):
 [exec]   Could NOT find PkgConfig (missing: PKG_CONFIG_EXECUTABLE)
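
A build host can be checked for this kind of gap before a job runs. The sketch below is not part of the Jenkins setup discussed here; it simply looks for executables on the PATH, and the tool list and the name `missing_tools` are assumptions for illustration.

```python
import shutil


def missing_tools(tools):
    """Return the entries of `tools` that cannot be found on the PATH."""
    return [t for t in tools if shutil.which(t) is None]


# Tools assumed to be needed for a native build; adjust as required.
print(missing_tools(["cmake", "pkg-config"]))
```

An empty list means every named tool was found; anything else names what still has to be installed on the host.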

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Wed, Jul 9, 2014 at 7:08 AM, Giridharan Kesavan gkesa...@hortonworks.com
 wrote:

 Build jobs are now configured to run on the newer set of slaves.



 -giri


 On Mon, Jul 7, 2014 at 4:12 PM, Giridharan Kesavan 
 gkesa...@hortonworks.com
  wrote:

  All
 
  Yahoo is in the process of retiring all the hadoop jenkins build slaves,
  *hadoop[1-9]*, and replacing them with a newer set of beefier hosts. These
  new machines are configured with *ubuntu-14.04*.

  Over the next couple of days I will be configuring the build jobs to run
  on these newly configured build slaves.  To automate the installation of
  tools and build libraries I have put together ansible scripts, and here is
  the link to the toolchain repo.

  *https://github.com/apache/toolchain*

  During the transition, the old build slaves will remain accessible; they
  are expected to be shut down by 07/15.

  I will send out an update later this week when this transition is
  complete.

  *Meanwhile, I would like to request the project owners to remove/clean up
  any stale jenkins jobs for their respective projects and help with any
  build issues to make this transition seamless.*
 
  Thanks
 
  -
  Giri
 





Re: Moving to JDK7, JDK8 and new major releases

2014-06-28 Thread Chris Nauroth
Following up on ecosystem, I just took a look at the Apache trunk pom.xml
files for HBase, Flume and Oozie.  All are specifying 1.6 for source and
target in the maven-compiler-plugin configuration, so there may be
additional follow-up required here.  (For example, if HBase has made a
statement that its client will continue to support JDK6, then it wouldn't
be practical for them to link to a JDK7 version of hadoop-common.)

+1 for the whole plan though.  We can work through these details.
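
The compatibility concern comes down to the version number stored in each class file header. As a rough sketch (not something from this thread), the header can be read as shown below; the helper name `class_file_version` is hypothetical, and the mapping covers only the JDK releases mentioned in this discussion.

```python
import struct

# Class-file major versions for the JDK releases discussed in this thread.
MAJOR_TO_JDK = {50: "1.6", 51: "1.7", 52: "1.8"}


def class_file_version(data):
    """Return (major, minor) from the first 8 bytes of a .class file."""
    # Header layout: 4-byte magic, 2-byte minor, 2-byte major, big-endian.
    magic, minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor


# Header of a class compiled for Java 7: magic, minor 0, major 51.
header = bytes.fromhex("cafebabe00000033")
major, _ = class_file_version(header)
print(MAJOR_TO_JDK.get(major, "unknown"))  # prints 1.7
```

A JVM rejects class files whose major version is newer than it supports, which is why a hadoop-common jar built for 1.7 would break a downstream client still running on JDK6.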

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Fri, Jun 27, 2014 at 3:10 PM, Karthik Kambatla ka...@cloudera.com
wrote:

 +1 to making 2.6 the last JDK6 release.

 If we want, 2.7 could be a parallel release or one soon after 2.6. We could
 upgrade other dependencies that require JDK7 as well.


 On Fri, Jun 27, 2014 at 3:01 PM, Arun C. Murthy a...@hortonworks.com
 wrote:

  Thanks everyone for the discussion. Looks like we have come to a
 pragmatic
  and progressive conclusion.
 
  In terms of execution of the consensus plan, I think a little bit of
  caution is in order.
 
  Let's give downstream projects more of a runway.
 
  I propose we inform HBase, Pig, Hive etc. that we are considering making
  2.6 (not 2.5) the last JDK6 release and solicit their feedback. Once they
  are comfortable we can pull the trigger in 2.7.
 
  thanks,
  Arun
 
 
   On Jun 27, 2014, at 11:34 AM, Karthik Kambatla ka...@cloudera.com
  wrote:
  
   As someone else already mentioned, we should announce one future
 release
   (may be, 2.5) as the last JDK6-based release before making the move to
  JDK7.
  
   I am comfortable calling 2.5 the last JDK6 release.
  
  
   On Fri, Jun 27, 2014 at 11:26 AM, Andrew Wang 
 andrew.w...@cloudera.com
   wrote:
  
   Hi all, responding to multiple messages here,
  
   Arun, thanks for the clarification regarding MR classpaths. It sounds
  like
   the story there is improved and still improving.
  
   However, I think we still suffer from this at least on the HDFS side.
 We
   have a single JAR for all of HDFS, and our clients need to have all
 the
  fun
   deps like Guava on the classpath. I'm told Spark sticks a newer Guava
 at
   the front of the classpath and the HDFS client still works okay, but
  this
   is more happy coincidence than anything else. While we're leaking
 deps,
   we're in a scary situation.
  
   API compat to me means that an app should be able to run on a new
 minor
   version of Hadoop and not have anything break. MAPREDUCE-4421 sounds
  like
   it allows you to run e.g. 2.3 MR jobs on a 2.4 YARN cluster, but what
   should also be possible is running an HDFS 2.3 app with HDFS 2.4 JARs
  and
   have nothing break. If we muck with the classpath, my understanding is
  that
   this could break.
  
   Owen, bumping the minimum JDK version in a minor release like this
  should
   be a one-time exception as Tucu stated. A number of people have
 pointed
  out
   how painful a forced JDK upgrade is for end users, and it's not
  something
   we should be springing on them in a minor release unless we're *very*
   confident like in this case.
  
   Chris, thanks for bringing up the ecosystem. For CDH5, we standardized
  on
   JDK7 across the CDH stack, so I think that's an indication that most
   ecosystem projects are ready to make the jump. Is that sufficient in
  your
   mind?
  
   For the record, I'm also +1 on the Tucu plan. Is it too late to do
 this
  for
   2.5? I'll offer to help out with some of the mechanics.
  
   Thanks,
   Andrew
  
   On Wed, Jun 25, 2014 at 4:18 PM, Chris Nauroth cnaur...@hortonworks.com
   wrote:

   I understood the plan for avoiding JDK7-specific features in our code,
   and your suggestion to add an extra Jenkins job is a great way to guard
   against that.  The thing I haven't seen discussed yet is how downstream
   projects will continue to consume our built artifacts.  If a downstream
   project upgrades to pick up a bug fix, and the jar switches to 1.7 class
   files, but their project is still building with 1.6, then it would be a
   nasty surprise.

   These are the options I see:

   1. Make sure all other projects upgrade first.  This doesn't sound
   feasible, unless all other ecosystem projects have moved to JDK7
   already.  If not, then waiting on a single long pole project would hold
   up our migration indefinitely.

   2. We switch to JDK7, but run javac with -target 1.6 until the whole
   ecosystem upgrades.  I find this undesirable, because in a certain
   sense, it still leaves a bit of 1.6 lingering in the project.  (I'll
   assume that end-of-life for JDK6 also means end-of-life for the 1.6
   bytecode format.)

   3. Just declare a clean break on some version (your earlier email said
   2.5) and start publishing artifacts built with JDK7 and no -target
   option.  Overall, this is my preferred option.  However, as a side
   effect, this sets us up for longer-term maintenance and patch

Re: Anyone know how to mock a secured hdfs for unit test?

2014-06-27 Thread Chris Nauroth
Hi David and Kai,

There are a couple of challenges with this, but I just figured out a pretty
decent setup while working on HDFS-2856.  That code isn't committed yet,
but if you open patch version 5 attached to that issue and look for the
TestSaslDataTransfer class, then you'll see how it works.  Most of the
logic for bootstrapping a MiniKDC and setting up the right HDFS
configuration properties is in an abstract base class named
SaslDataTransferTestCase.

I hope this helps.

There are a few other open issues out there related to tests in secure
mode.  I know of HDFS-4312 and HDFS-5410.  It would be great to get more
regular test coverage with something that more closely approximates a
secured deployment.

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Thu, Jun 26, 2014 at 7:27 AM, Zheng, Kai kai.zh...@intel.com wrote:

 Hi David,

 Quite some time ago I opened HADOOP-9952 and planned to create secured
 MiniClusters by making use of MiniKDC. Unfortunately, I haven't had the
 chance to work on it since then. If you need something like that and would
 like to contribute, please let me know and I'll see if there is anything I
 can help with. Thanks.

 Regards,
 Kai

 -Original Message-
 From: Liu, David [mailto:liujion...@gmail.com]
 Sent: Thursday, June 26, 2014 10:12 PM
 To: hdfs-...@hadoop.apache.org; hdfs-iss...@hadoop.apache.org;
 yarn-...@hadoop.apache.org; yarn-iss...@hadoop.apache.org;
 mapreduce-dev@hadoop.apache.org; secur...@hadoop.apache.org
 Subject: Anyone know how to mock a secured hdfs for unit test?

 Hi all,

 I need to test my code, which reads data from a secured HDFS. Is there any
 library to mock a secured HDFS? Can MiniDFSCluster do the work?
 Any suggestions are appreciated.


 Thanks


-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


Re: Moving to JDK7, JDK8 and new major releases

2014-06-25 Thread Chris Nauroth
I'm also +1 for getting us to JDK7 within the 2.x line after reading the
proposals and catching up on the discussion in this thread.

Has anyone yet considered how to coordinate this change with downstream
projects?  Would we request downstream projects to upgrade to JDK7 first
before we make the move?  Would we switch to JDK7, but run javac -target
1.6 to maintain compatibility for downstream projects during an interim
period?
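
To make the -target question concrete: every class file records its
bytecode format version in its header (major version 50 for Java 6, 51 for
Java 7), and a JVM refuses to load class files newer than it supports. A
minimal sketch of that check, using only the JDK; the ClassFileVersion
class and its method names are made up for illustration and are not part
of Hadoop:

```java
import java.nio.ByteBuffer;

/** Illustrative helper, not part of Hadoop: inspects the class file format version. */
final class ClassFileVersion {
  private static final int MAGIC = 0xCAFEBABE;
  static final int JAVA_6 = 50; // emitted by javac -target 1.6
  static final int JAVA_7 = 51; // emitted by javac -target 1.7

  /** Returns the major version stored in bytes 6-7 of a class file. */
  static int majorVersion(byte[] classBytes) {
    if (classBytes.length < 8) {
      throw new IllegalArgumentException("too short to be a class file");
    }
    ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
    if (buf.getInt() != MAGIC) {
      throw new IllegalArgumentException("missing 0xCAFEBABE magic number");
    }
    buf.getShort(); // minor version, not needed here
    return buf.getShort() & 0xFFFF;
  }

  /** True if a JVM supporting class files up to jvmMaxMajor can load this class. */
  static boolean loadableBy(int classMajor, int jvmMaxMajor) {
    return classMajor <= jvmMaxMajor;
  }

  public static void main(String[] args) {
    // Hand-built 8-byte header: magic, minor version 0, major version 51.
    byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 51};
    int major = majorVersion(header);
    System.out.println("major=" + major
        + ", loadable on a Java 6 JVM: " + loadableBy(major, JAVA_6));
  }
}
```

A downstream project still running on JDK6 could run a check like this
against the jars it consumes to see whether an upgrade would bite.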

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Wed, Jun 25, 2014 at 9:48 AM, Owen O'Malley omal...@apache.org wrote:

 On Tue, Jun 24, 2014 at 4:44 PM, Alejandro Abdelnur t...@cloudera.com
 wrote:

  After reading this thread and thinking a bit about it, I think it should
  be OK to make such a move up to JDK7 in Hadoop


 I agree with Alejandro. Changing minimum JDKs is not an incompatible change
 and is fine in the 2 branch. (Although I think it would *not* be
 appropriate for a patch release.) Of course we need to do it with
 forethought and testing, but moving off of JDK 6, which is EOL'ed, is a good
 thing. Moving to Java 8 as a minimum seems much too aggressive and I would
 push back on that.

 I also think that we need to let the dust settle on the Hadoop 2 line for
 a while before we talk about Hadoop 3. It seems that it has only been in
 the last 6 months that Hadoop 2 adoption has reached the mainstream users.
 Our user community needs time to digest the changes in Hadoop 2.x before we
 fracture the community by starting to discuss Hadoop 3 releases.

 .. Owen




Re: [VOTE] Change by-laws on release votes: 5 days instead of 7

2014-06-24 Thread Chris Nauroth
+1 (binding)

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Tue, Jun 24, 2014 at 10:58 AM, Jakob Homan jgho...@gmail.com wrote:

 +1 (binding)


 On Tue, Jun 24, 2014 at 10:33 AM, Zhijie Shen zs...@hortonworks.com
 wrote:

  +1 (non-binding)
 
 
  On Wed, Jun 25, 2014 at 1:26 AM, Aaron T. Myers a...@cloudera.com
 wrote:
 
   +1 (binding)
  
   --
   Aaron T. Myers
   Software Engineer, Cloudera
  
  
   On Tue, Jun 24, 2014 at 1:53 AM, Arun C Murthy a...@hortonworks.com
   wrote:
  
Folks,
   
 As discussed, I'd like to call a vote on changing our by-laws to change
 release votes from 7 days to 5.

 I've attached the change to by-laws I'm proposing.

 Please vote; the vote will run for the usual period of 7 days.
   
thanks,
Arun
   

   
[main]$ svn diff
Index: author/src/documentation/content/xdocs/bylaws.xml
===================================================================
--- author/src/documentation/content/xdocs/bylaws.xml   (revision 1605015)
+++ author/src/documentation/content/xdocs/bylaws.xml   (working copy)
@@ -344,7 +344,16 @@
 <p>Votes are open for a period of 7 days to allow all active
 voters time to consider the vote. Votes relating to code
 changes are not subject to a strict timetable but should be
-made as timely as possible.</p></li>
+made as timely as possible.</p>
+
+ <ul>
+ <li> <strong>Product Release - Vote Timeframe</strong>
+   <p>Release votes, alone, run for a period of 5 days. All other
+ votes are subject to the above timeframe of 7 days.</p>
+ </li>
+   </ul>
+   </li>
+
 </ul>
 </section>
  </body>
   
  
 
 
 
  --
  Zhijie Shen
  Hortonworks Inc.
  http://hortonworks.com/
 
 




Re: SIMPLE authentication is not enabled error for secured hdfs read

2014-06-24 Thread Chris Nauroth
Hi David,

UserGroupInformation.createRemoteUser does not attach credentials to the
returned ugi.  I expect the server side is rejecting the connection due to
lack of credentials.  This is actually by design.  The
UserGroupInformation.createRemoteUser method is primarily intended for use
on the server side when it wants to run a piece of its code while
impersonating the client.

I'd say that your second code sample is the correct one.  After running
kinit to get credentials, you can just run your code.  I expect Kerberos
authentication to work without taking any special measures to call
UserGroupInformation directly from your code.

Hope this helps.

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Tue, Jun 24, 2014 at 6:29 AM, Liu, David liujion...@gmail.com wrote:

 Hi experts,

 After running kinit as hadoop, I ran this Java file on a secured Hadoop
 cluster and got the following error:
 14/06/24 16:53:41 ERROR security.UserGroupInformation:
 PriviledgedActionException as:hdfs (auth:SIMPLE)
 cause:org.apache.hadoop.security.AccessControlException: Client cannot
 authenticate via:[TOKEN, KERBEROS]
 14/06/24 16:53:41 WARN ipc.Client: Exception encountered while connecting
 to the server : org.apache.hadoop.security.AccessControlException: Client
 cannot authenticate via:[TOKEN, KERBEROS]
 14/06/24 16:53:41 ERROR security.UserGroupInformation:
 PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException:
 org.apache.hadoop.security.AccessControlException: Client cannot
 authenticate via:[TOKEN, KERBEROS]
 14/06/24 16:53:41 ERROR security.UserGroupInformation:
 PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException:
 Failed on local exception: java.io.IOException:
 org.apache.hadoop.security.AccessControlException: Client cannot
 authenticate via:[TOKEN, KERBEROS]; Host Details : local host is:
 hdsh2-a161/10.62.66.161; destination host is: hdsh2-a161.lss.emc.com
 :8020;
 Exception in thread main java.io.IOException: Failed on local exception:
 java.io.IOException: org.apache.hadoop.security.AccessControlException:
 Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host
 is: hdsh2-a161/10.62.66.161; destination host is: 
 hdsh2-a161.lss.emc.com:8020;
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
 at org.apache.hadoop.ipc.Client.call(Client.java:1351)
 at org.apache.hadoop.ipc.Client.call(Client.java:1300)
 at
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
 at com.sun.proxy.$Proxy9.getBlockLocations(Unknown Source)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
 at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 at com.sun.proxy.$Proxy9.getBlockLocations(Unknown Source)
 at
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:191)
 at
 org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1067)
 at
 org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1057)
 at
 org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1047)
 at
 org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:235)
 at
 org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:202)
 at
 org.apache.hadoop.hdfs.DFSInputStream.init(DFSInputStream.java:195)
 at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1215)
 at
 org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:290)
 at
 org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:286)
 at
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at
 org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:286)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
 at Testhdfs$1.run(Testhdfs.java:43)
 at Testhdfs$1.run(Testhdfs.java:30)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
 at Testhdfs.main(Testhdfs.java:30)


 Here is my code:

 UserGroupInformation ugi = UserGroupInformation.createRemoteUser("hadoop");
 ugi.doAs(new PrivilegedExceptionAction<Void>() {
 public Void run

Re: [DISCUSS] Change by-laws on release votes: 5 days instead of 7

2014-06-22 Thread Chris Nauroth
+1 binding.  Thanks, Arun.

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Sat, Jun 21, 2014 at 10:36 AM, Arun C. Murthy a...@hortonworks.com
wrote:

 Uma,

  Voting periods are defined in *minimum* terms, so it already covers what
 you'd like to see i.e. the vote can continue longer.

 thanks,
 Arun

  On Jun 21, 2014, at 2:19 AM, Gangumalla, Uma uma.ganguma...@intel.com
 wrote:
 
  How about proposing a 5-day vote and giving the RM the chance to extend
  the vote by 2 more days (7 days in total) if the RC does not receive
  enough votes within 5 days? If an RC receives enough votes in 5 days, the
  RM can close the vote.
  One advantage I can see in 7-day voting is that it covers all the
  weekdays and the weekend. So, if someone wants to test on the weekend
  (because of weekday schedules), that gives them the chance.
 
  Regards,
  Uma
 
  -Original Message-
  From: Arun C Murthy [mailto:a...@hortonworks.com]
  Sent: Saturday, June 21, 2014 11:25 AM
  To: hdfs-...@hadoop.apache.org; common-...@hadoop.apache.org;
 yarn-...@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
  Subject: [DISCUSS] Change by-laws on release votes: 5 days instead of 7
 
  Folks,
 
  I'd like to propose we change our by-laws to reduce our voting periods
 on new releases from 7 days to 5.
 
  Currently, it just takes too long to turn around releases; particularly
 if we have critical security fixes etc.
 
  Thoughts?
 
  thanks,
  Arun
 
 


[jira] [Created] (MAPREDUCE-5922) Update distcp documentation to mention option for preserving xattrs.

2014-06-11 Thread Chris Nauroth (JIRA)
Chris Nauroth created MAPREDUCE-5922:


 Summary: Update distcp documentation to mention option for 
preserving xattrs.
 Key: MAPREDUCE-5922
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5922
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Affects Versions: 3.0.0
Reporter: Chris Nauroth
Priority: Minor


In 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/markdown/DistCp.md.vm,
 let's add a mention of the new distcp option for preserving xattrs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MAPREDUCE-5886) Allow wordcount example job to accept multiple input paths.

2014-05-11 Thread Chris Nauroth (JIRA)
Chris Nauroth created MAPREDUCE-5886:


 Summary: Allow wordcount example job to accept multiple input 
paths.
 Key: MAPREDUCE-5886
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5886
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: examples
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
 Attachments: MAPREDUCE-5886.1.patch

It would be convenient if the wordcount example MapReduce job could accept 
multiple input paths and run the word count on all of them.





[jira] [Created] (MAPREDUCE-5852) Prepare MapReduce codebase for JUnit 4.11.

2014-04-21 Thread Chris Nauroth (JIRA)
Chris Nauroth created MAPREDUCE-5852:


 Summary: Prepare MapReduce codebase for JUnit 4.11.
 Key: MAPREDUCE-5852
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5852
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: test
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
 Attachments: MAPREDUCE-5852.1.patch

HADOOP-10503 upgrades the entire Hadoop repo to use JUnit 4.11. Some of the 
MapReduce code needs some minor updates to fix deprecation warnings before the 
upgrade.






[jira] [Created] (MAPREDUCE-5850) PATH environment variable contains duplicate values in map and reduce tasks on Windows.

2014-04-18 Thread Chris Nauroth (JIRA)
Chris Nauroth created MAPREDUCE-5850:


 Summary: PATH environment variable contains duplicate values in 
map and reduce tasks on Windows.
 Key: MAPREDUCE-5850
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5850
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor


The value of the PATH environment variable gets appended twice before execution 
of a container for a map or reduce task.  This is ultimately harmless at 
runtime, but it does cause a failure in {{TestMiniMRChildTask}} when running on 
Windows.





[jira] [Created] (MAPREDUCE-5840) Update MapReduce calls to ProxyUsers#authorize.

2014-04-16 Thread Chris Nauroth (JIRA)
Chris Nauroth created MAPREDUCE-5840:


 Summary: Update MapReduce calls to ProxyUsers#authorize.
 Key: MAPREDUCE-5840
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5840
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Benoy Antony
Priority: Minor


HADOOP-10499 will remove an unnecessary overload of {{ProxyUsers#authorize}}. 
This issue tracks updating call sites in the MapReduce code.





Re: Policy on adding timeouts to tests

2014-04-15 Thread Chris Nauroth
+common-dev, hdfs-dev

My understanding of the current situation is that we had a period where we
tried to enforce adding timeouts on all new tests in patches, but it caused
trouble, and now we're back to not requiring it.  Jenkins test-patch isn't
checking for it anymore.

I don't think patches are getting rejected for using timeouts though.

The difficulty is that execution time is quite sensitive to the build
environment.  (Consider top-of-the-line server hardware used in build
infrastructure vs. a dev running a VirtualBox VM with 1 dedicated CPU, 2 GB
RAM and slow virtualized disk.)  When we were enforcing timeouts, it was
quite common to see follow-up patches tuning up the timeout settings to
make tests work reliably in a greater variety of environments.  At that
point, the benefit of using the timeout becomes questionable, because now
the fast machine is running with the longer timeout too.
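
To illustrate the trade-off: a fixed wall-clock timeout only measures
elapsed time, so the same amount of work can pass on fast hardware and fail
on a slow VM, and raising the limit for the slow case loosens it everywhere.
A minimal plain-JDK sketch, assuming nothing from JUnit or Hadoop; the
TimeoutSketch class and its methods are made up for illustration, and the
sleep merely stands in for test work:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustrative only: runs a task under a fixed wall-clock timeout. */
final class TimeoutSketch {

  /** Returns true if the task finished within timeoutMillis, false if it timed out. */
  static boolean finishesWithin(Runnable task, long timeoutMillis) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Future<?> future = executor.submit(task);
    try {
      future.get(timeoutMillis, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the still-running task
      return false;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      executor.shutdownNow();
    }
  }

  /** Stand-in for test work that takes roughly the given time on this machine. */
  static Runnable sleeping(long millis) {
    return () -> {
      try {
        Thread.sleep(millis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    };
  }

  public static void main(String[] args) {
    // The same 1000 ms limit passes "fast" work and fails "slow" work.
    System.out.println("fast: " + finishesWithin(sleeping(10), 1000));
    System.out.println("slow: " + finishesWithin(sleeping(3000), 1000));
  }
}
```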

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Mon, Apr 14, 2014 at 9:41 AM, Karthik Kambatla ka...@cloudera.com wrote:

 Hi folks

 Just wanted to check what our policy for adding timeouts to tests is. Do we
 encourage/discourage using timeouts for tests? If we discourage using
 timeouts for tests in general, are we okay with adding timeouts for a few
 tests where we explicitly want the test to fail if it takes longer than a
 particular amount of time?

 Thanks
 Karthik




Re: Thinking ahead

2014-04-12 Thread Chris Nauroth
+1

The proposed content for 2.5 in the roadmap wiki looks good to me.
On Apr 12, 2014 7:26 AM, Arun C Murthy a...@hortonworks.com wrote:

 Gang,

  With hadoop-2.4 out, it's time to think ahead.

  In the short-term hadoop-2.4.1 is in order; particularly with
 https://issues.apache.org/jira/browse/MAPREDUCE-5830 (it's a break to
 @Private API, unfortunately something Hive is using - sigh!). There are
 some other fixes which testing has uncovered; so it will be nice to pull
 them in. I'm thinking of an RC by end of the coming week - committers,
 please be *very* conservative when getting stuff into 2.4.1 (i.e. merging
 to branch-2.4).

  Next up, hadoop-2.5.

  I've updated https://wiki.apache.org/hadoop/Roadmap with some candidates
 for consideration - please chime in and say 'aye'/'nay' or add new content.
 IAC, I suspect that list is too large.

  Rather than wait for everything it would be better to plan on releasing
 it in a time-bound manner; particularly around the Hadoop Summit. If that
 makes sense; I think we should target branching for 2.5 by mid-May to get
 it stable and released by early June.

  Thoughts?

 thanks,
 Arun


 --
 Arun C. Murthy
 Hortonworks Inc.
 http://hortonworks.com/





[jira] [Resolved] (MAPREDUCE-5808) Port output replication factor configurable for terasort to Hadoop 1.x

2014-03-24 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved MAPREDUCE-5808.
--

   Resolution: Fixed
Fix Version/s: 1.3.0
   1-win

I committed this to branch-1 and branch-1-win.  Chuan, thank you for the patch.

 Port output replication factor configurable for terasort to Hadoop 1.x
 --

 Key: MAPREDUCE-5808
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5808
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 1-win, 1.3.0
Reporter: Chuan Liu
Assignee: Chuan Liu
Priority: Minor
 Fix For: 1-win, 1.3.0

 Attachments: MAPREDUCE-5808.patch


 Currently, terasort output is hardcoded to have replication factor of 1 in 
 TeraSort.java in Hadoop branch-1 code base and configurable in Hadoop 2.0 and 
 trunk. We would like to back port the changes to make terasort output 
 replication factor configurable in Hadoop 1.0.





Re: [DISCUSS] Clarification on Compatibility Policy: Upgraded Client + Old Server

2014-03-24 Thread Chris Nauroth
Adding back all *-dev lists to make sure everyone is covered.

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Mon, Mar 24, 2014 at 2:02 PM, Chris Nauroth cnaur...@hortonworks.com wrote:

 Thank you, everyone, for the discussion.  There is general agreement, so I
 have filed HADOOP-10423 with a patch to update the compatibility
 documentation.

 Chris Nauroth
 Hortonworks
 http://hortonworks.com/



 On Thu, Mar 20, 2014 at 11:24 AM, Colin McCabe cmcc...@alumni.cmu.edu wrote:

 +1 for making this guarantee explicit.

 It also definitely seems like a good idea to test mixed versions in
 bigtop.

 HDFS is not immune to new client, old server scenarios because the HDFS
 client gets bundled into a lot of places.

 Colin
 On Mar 20, 2014 10:55 AM, Chris Nauroth cnaur...@hortonworks.com
 wrote:

  Our use of protobuf helps mitigate a lot of compatibility concerns, but
  there still can be situations that require careful coding on our part.
  When adding a new field to a protobuf message, the client might need to
  do a null check, even if the server-side implementation in the new
  version always populates the field.  When adding a whole new RPC
  endpoint, the client might need to consider the possibility that the RPC
  endpoint isn't there on an old server, and degrade gracefully after the
  RPC fails.  The original issue in MAPREDUCE-4052 concerned the script
  commands passed in a YARN container submission, where protobuf doesn't
  provide any validation beyond the fact that they're strings.

  Forward compatibility is harder than backward compatibility, and testing
  is a big challenge.  Our test suites in the Hadoop repo don't cover this.
  Does anyone know if anything in Bigtop tries to run with mixed versions?

  I agree that we need to make it clear in the language that upgrading
  client alone is insufficient to get access to new server-side features,
  including new YARN APIs.  Thanks for the suggestions, Steve.
 
  Chris Nauroth
  Hortonworks
  http://hortonworks.com/
 
 
 
  On Thu, Mar 20, 2014 at 5:53 AM, Steve Loughran ste...@hortonworks.com
  wrote:
 
   I'm clearly supportive of this, though of course the testing costs
   needed to back up the assertion make it more expensive than just a
   statement.

   Two issues

   -we'd need to make clear that new cluster features that a client can
   invoke won't be available. You can't expect snapshot or symlink support
   running against a -2.2.0 cluster, even if the client supports it.

   -in YARN, there are no guarantees that an app compiled against later
   YARN APIs will work in old clusters, because YARN apps upload themselves
   to the server and run with their hadoop, hdfs & yarn libraries. We have
   to do a bit of introspection in our code already to support this
   situation. The compatibility doc would need to be clear on that too:
   YARN apps that use new APIs (including new fields in datastructures) can
   expect link exceptions
  
  
  
  
  
   On 20 March 2014 04:25, Vinayakumar B vinayakuma...@huawei.com
 wrote:
  
 +1, I agree with your point Chris. It depends on how the client
 application uses the HDFS jars in its classpath.

 As the implementation already supports the compatibility (through
 protobuf), no extra code changes are required to support new client + old
 server.

 I feel it would be good to explicitly mention the compatibility of
 existing APIs in both versions.

 Anyway, this is not applicable to the new APIs in the latest client, and
 this is understood. We can make it explicit in the document though.
   
   
Regards,
Vinayakumar B
   
-Original Message-
From: Chris Nauroth [mailto:cnaur...@hortonworks.com]
Sent: 20 March 2014 05:36
To: common-...@hadoop.apache.org
Cc: mapreduce-dev@hadoop.apache.org; hdfs-...@hadoop.apache.org;
yarn-...@hadoop.apache.org
Subject: Re: [DISCUSS] Clarification on Compatibility Policy: Upgraded Client + Old Server

I think this kind of compatibility issue still could surface for HDFS,
particularly for custom applications (i.e. something not executed via
hadoop jar on a cluster node, where the client classes ought to be
injected into the classpath automatically).  Running DistCP between 2
clusters of different versions could result in a 2.4.0 client calling a
2.3.0 NameNode.  Someone could potentially pick up the 2.4.0 WebHDFS
client as a dependency and try to use it to make HTTP calls to a 2.3.0
HDFS cluster.
   
Chris Nauroth
Hortonworks
http://hortonworks.com/
   
   
   
On Wed, Mar 19, 2014 at 4:28 PM, Vinod Kumar Vavilapalli vino...@apache.org
wrote:

 It makes sense only for YARN today where we separated out the clients.
 HDFS is still a monolithic jar so this compatibility issue is kind of
 invalid there.

 +vinod

 On Mar 19, 2014, at 1:59 PM, Chris Nauroth

Re: [DISCUSS] Clarification on Compatibility Policy: Upgraded Client + Old Server

2014-03-20 Thread Chris Nauroth
Our use of protobuf helps mitigate a lot of compatibility concerns, but
there still can be situations that require careful coding on our part.
 When adding a new field to a protobuf message, the client might need to do
a null check, even if the server-side implementation in the new version
always populates the field.  When adding a whole new RPC endpoint, the
client might need to consider the possibility that the RPC endpoint isn't
there on an old server, and degrade gracefully after the RPC fails.  The
original issue in MAPREDUCE-4052 concerned the script commands passed in a
YARN container submission, where protobuf doesn't provide any validation
beyond the fact that they're strings.
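
To make the two precautions above concrete, here is a minimal sketch in plain Java.  It deliberately uses stand-ins rather than real Hadoop or protobuf classes: StatusResponse, ClusterProtocol, OldServer and getQuotaUsage are hypothetical names made up for illustration, and the exception an old server really raises for an unknown RPC will differ.  In generated protobuf code the null check would typically be a hasX() call on the optional field.

```java
import java.util.Optional;

// Hypothetical stand-ins for illustration; none of these types exist in Hadoop.
final class CompatSketch {

    /** Response message; 'owner' models a field added in a newer release. */
    static final class StatusResponse {
        final String path;
        final String owner; // null when an old server did not send it

        StatusResponse(String path, String owner) {
            this.path = path;
            this.owner = owner;
        }
    }

    /** RPC surface; getQuotaUsage models an endpoint added in a newer release. */
    interface ClusterProtocol {
        StatusResponse getStatus(String path);

        long getQuotaUsage(String path);
    }

    /** An old server: omits the new field and rejects the new call. */
    static final class OldServer implements ClusterProtocol {
        public StatusResponse getStatus(String path) {
            return new StatusResponse(path, null);
        }

        public long getQuotaUsage(String path) {
            throw new UnsupportedOperationException("unknown method getQuotaUsage");
        }
    }

    /** Null check: the client treats the new field as optional. */
    static String ownerOrDefault(ClusterProtocol server, String path) {
        StatusResponse r = server.getStatus(path);
        return r.owner != null ? r.owner : "unknown";
    }

    /** Degrade gracefully when the endpoint is missing on the server. */
    static Optional<Long> quotaUsage(ClusterProtocol server, String path) {
        try {
            return Optional.of(server.getQuotaUsage(path));
        } catch (UnsupportedOperationException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        ClusterProtocol old = new OldServer();
        System.out.println(ownerOrDefault(old, "/data"));
        System.out.println(quotaUsage(old, "/data").isPresent());
    }
}
```

Returning Optional keeps the "feature not available on this cluster" case visible to the caller instead of hiding it behind a magic value.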

Forward compatibility is harder than backward compatibility, and testing is
a big challenge.  Our test suites in the Hadoop repo don't cover this.
 Does anyone know if anything in Bigtop tries to run with mixed versions?

I agree that we need to make it clear in the language that upgrading client
alone is insufficient to get access to new server-side features, including
new YARN APIs.  Thanks for the suggestions, Steve.

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Thu, Mar 20, 2014 at 5:53 AM, Steve Loughran ste...@hortonworks.com wrote:

 I'm clearly supportive of this, though of course the testing costs needed
 to back up the assertion make it more expensive than just a statement.

 Two issues

 -we'd need to make clear that new cluster features that a client can invoke
 won't be available. You can't expect snapshot or symlink support running
 against a -2.2.0 cluster, even if the client supports it.

 -in YARN, there are no guarantees that an app compiled against later YARN
 APIs will work in old clusters, because YARN apps upload themselves to the
 server and run with their hadoop, hdfs & yarn libraries. We have to do a
 bit of introspection in our code already to support this situation. The
 compatibility doc would need to be clear on that too: YARN apps that use
 new APIs (including new fields in data structures) can expect link
 exceptions.
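
The kind of introspection mentioned above can be sketched with plain Java reflection.  This is only an illustration under assumptions: ApiProbe and its methods are hypothetical helpers, and java.lang.String stands in for a YARN class whose newer method may or may not exist at run time.

```java
import java.lang.reflect.Method;
import java.util.Optional;

// Hypothetical helper for illustration; not part of Hadoop or YARN.
final class ApiProbe {

    /** Returns the public method if the runtime class has it, else empty. */
    static Optional<Method> findMethod(Class<?> type, String name, Class<?>... params) {
        try {
            return Optional.of(type.getMethod(name, params));
        } catch (NoSuchMethodException e) {
            return Optional.empty();
        }
    }

    /** Calls the no-arg method when present, otherwise returns the fallback. */
    static Object invokeOrFallback(Object target, String name, Object fallback) {
        Optional<Method> m = findMethod(target.getClass(), name);
        if (!m.isPresent()) {
            return fallback;
        }
        try {
            return m.get().invoke(target);
        } catch (ReflectiveOperationException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // String stands in for a class whose API differs between versions.
        System.out.println(invokeOrFallback("abc", "length", -1));
        System.out.println(invokeOrFallback("abc", "noSuchMethod", -1));
    }
}
```

Looking the method up by name avoids a compile-time reference to it, so the class still links when the older library is on the classpath.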





 On 20 March 2014 04:25, Vinayakumar B vinayakuma...@huawei.com wrote:

  +1, I agree with your point Chris. It depends on the client application
  how they are using the hdfs jars in their classpath.
 
  As the implementation already supports the compatibility (through protobuf),
  no extra code changes are required to support new client + old server.
 
  I feel it would be good to explicitly mention the compatibility of
  existing APIs in both versions.
 
  Anyway, this is not applicable to the new APIs in the latest client, and this
  is understood. We can make it explicit in the document though.
 
 
  Regards,
  Vinayakumar B
 
  -Original Message-
  From: Chris Nauroth [mailto:cnaur...@hortonworks.com]
  Sent: 20 March 2014 05:36
  To: common-...@hadoop.apache.org
  Cc: mapreduce-dev@hadoop.apache.org; hdfs-...@hadoop.apache.org;
  yarn-...@hadoop.apache.org
  Subject: Re: [DISCUSS] Clarification on Compatibility Policy: Upgraded
  Client + Old Server
 
  I think this kind of compatibility issue still could surface for HDFS,
  particularly for custom applications (i.e. something not executed via
  hadoop jar on a cluster node, where the client classes ought to be
  injected into the classpath automatically).  Running DistCP between 2
  clusters of different versions could result in a 2.4.0 client calling a
  2.3.0 NameNode.  Someone could potentially pick up the 2.4.0 WebHDFS
  client as a dependency and try to use it to make HTTP calls to a 2.3.0
  HDFS cluster.
 
  Chris Nauroth
  Hortonworks
  http://hortonworks.com/
 
 
 
  On Wed, Mar 19, 2014 at 4:28 PM, Vinod Kumar Vavilapalli vino...@apache.org
  wrote:
 
   It makes sense only for YARN today where we separated out the clients.
   HDFS is still a monolithic jar so this compatibility issue is kind of
   invalid there.
  
   +vinod
  

[DISCUSS] Clarification on Compatibility Policy: Upgraded Client + Old Server

2014-03-19 Thread Chris Nauroth
I'd like to discuss clarification of part of our compatibility policy.
 Here is a link to the compatibility documentation for release 2.3.0:

http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-common/Compatibility.html#Wire_compatibility

For convenience, here are the specific lines in question:

Client-Server compatibility is required to allow users to continue using
the old clients even after upgrading the server (cluster) to a later
version (or vice versa). For example, a Hadoop 2.1.0 client talking to a
Hadoop 2.3.0 cluster.

Client-Server compatibility is also required to allow upgrading individual
components without upgrading others. For example, upgrade HDFS from version
2.1.0 to 2.2.0 without upgrading MapReduce.

Server-Server compatibility is required to allow mixed versions within an
active cluster so the cluster may be upgraded without downtime in a rolling
fashion.

Notice that there is no specific mention of upgrading the client ahead of
the server.  (There is no clause for upgraded client + old server.)
 Based on my experience, this is a valid use case when a user wants to pick
up a client-side bug fix ahead of the cluster administrator's upgrade
schedule.

Is it our policy to maintain client compatibility with old clusters within
the same major release?  I think many of us have assumed that the answer is
yes and coded our new features accordingly, but it isn't made explicit in
the documentation.  Do we all agree that the answer is yes, or is it
possibly up for debate depending on the change in question?  In RFC 2119
lingo, is it a MUST or a SHOULD?  Either way, I'd like to update the policy
text to make our decision clear.  After we have consensus, I can volunteer
to file an issue and patch the text of the policy.

This discussion started initially in MAPREDUCE-4052, which involved
changing our scripting syntax for MapReduce YARN container submissions.  We
settled the question there by gating the syntax change behind a
configuration option.  By default, it will continue using the existing
syntax currently understood by the pre-2.4.0 NodeManager, thus preserving
compatibility.  We wanted to open the policy question for wider discussion
though.
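
For readers who have not followed MAPREDUCE-4052, the gating approach looks roughly like the sketch below.  It rests on assumptions: the property name, the {{VAR}} placeholder format and the EnvSyntaxGate class are hypothetical and chosen only for illustration; they are not taken from the actual patch.

```java
import java.util.Properties;

// Hypothetical sketch of gating a syntax change behind a config option.
final class EnvSyntaxGate {

    // Made-up key for this sketch; not necessarily the real property name.
    static final String CROSS_PLATFORM_KEY = "example.submission.cross-platform";

    /**
     * Renders an environment-variable reference for a container launch command.
     * With the gate off (the default), the existing client-OS syntax is used,
     * which older NodeManagers already understand.  With the gate on, a neutral
     * placeholder is emitted for the server side to expand.
     */
    static String envRef(Properties conf, String var, boolean clientIsWindows) {
        boolean crossPlatform =
            Boolean.parseBoolean(conf.getProperty(CROSS_PLATFORM_KEY, "false"));
        if (crossPlatform) {
            return "{{" + var + "}}"; // placeholder format assumed for the sketch
        }
        return clientIsWindows ? "%" + var + "%" : "$" + var;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(envRef(conf, "JAVA_HOME", false));
        conf.setProperty(CROSS_PLATFORM_KEY, "true");
        System.out.println(envRef(conf, "JAVA_HOME", false));
    }
}
```

Defaulting the option to off is what preserves compatibility: an upgraded client changes nothing on the wire unless the user opts in.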

Thanks, everyone.

Chris Nauroth
Hortonworks
http://hortonworks.com/

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


Re: [DISCUSS] Clarification on Compatibility Policy: Upgraded Client + Old Server

2014-03-19 Thread Chris Nauroth
I think this kind of compatibility issue still could surface for HDFS,
particularly for custom applications (i.e. something not executed via
hadoop jar on a cluster node, where the client classes ought to be
injected into the classpath automatically).  Running DistCP between 2
clusters of different versions could result in a 2.4.0 client calling a
2.3.0 NameNode.  Someone could potentially pick up the 2.4.0 WebHDFS client
as a dependency and try to use it to make HTTP calls to a 2.3.0 HDFS
cluster.

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Wed, Mar 19, 2014 at 4:28 PM, Vinod Kumar Vavilapalli vino...@apache.org
 wrote:

 It makes sense only for YARN today where we separated out the clients.
 HDFS is still a monolithic jar so this compatibility issue is kind of
 invalid there.

 +vinod






[jira] [Resolved] (MAPREDUCE-4401) Enhancements to MapReduce for Windows Server and Windows Azure development and runtime environments

2014-02-11 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved MAPREDUCE-4401.
--

Resolution: Fixed

We've completed the intended scope of this issue, so I'm resolving it.  Thank 
you to all contributors!

 Enhancements to MapReduce for Windows Server and Windows Azure development 
 and runtime environments
 ---

 Key: MAPREDUCE-4401
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4401
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: build, mrv2
Affects Versions: 3.0.0
Reporter: Bikas Saha
Assignee: Bikas Saha

 This JIRA tracks the work that needs to be done on trunk to enable Hadoop to 
 run on Windows Server and Azure environments. This incorporates porting 
 relevant work from the similar effort on branch 1 tracked via HADOOP-8079.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


Re: branch development for HADOOP-9639

2013-12-06 Thread Chris Nauroth
+1 for the idea.  The branch committership clause was added for exactly
this kind of scenario.

From the phrasing in the bylaws, it looks like we'll need assistance from
PMC to get the ball rolling.  Is there a PMC member out there who could
volunteer to help start the process with Sangjin?

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Mon, Dec 2, 2013 at 11:47 AM, Sangjin Lee sj...@apache.org wrote:

 We have been having discussions on HADOOP-9639 (shared cache for jars) and
 the proposed design there for some time now. We are going to start work on
 this and have it vetted and reviewed by the community. I have just filed
 some more implementation JIRAs for this feature: YARN-1465, MAPREDUCE-5662,
 YARN-1466, YARN-1467

 Rather than working privately in our corner and sharing a big patch at the
 end, I'd like to explore the idea of developing on a branch in the public
 to foster more public feedback. Recently the Hadoop PMC has passed the
 change to the bylaws to allow for branch committers (

 http://mail-archives.apache.org/mod_mbox/hadoop-general/201307.mbox/%3CCACO5Y4y7HZnn3BS-ZyCVfv-UBcMudeQhndr2vqg%3DXqE1oBiQvQ%40mail.gmail.com%3E
 ),
 and I think it would be a good model for this development.

 I'd like to propose a branch development and a branch committer status for
 a couple of us who are going to work on this per bylaw. Could you please
 let me know what you think?

  Thanks,
 Sangjin




[jira] [Resolved] (MAPREDUCE-5655) Remote job submit from windows to a linux hadoop cluster fails due to wrong classpath

2013-12-02 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved MAPREDUCE-5655.
--

Resolution: Duplicate

Hi, [~padisah].  Thanks for the bug report.

I do think this is a duplicate of MAPREDUCE-4052.  The 0.23.x code line is 
similar to the 2.2.x code line.  It's often the case that a bug in 2.2.x is 
also a bug in 0.23.x.  I've just updated MAPREDUCE-4052 to make the title 
clearer and indicate that it also affects version 2.2.0.

I recommend that you participate in MAPREDUCE-4052.  There is a patch attached 
to that issue, but it's a few months old, so it's likely to be out-of-date at 
this point.  Seeing your latest patch would be valuable.  You can upload your 
patch by clicking the More button at the top and then going through the Attach 
Files dialog.  The Submit Patch button is used to submit your patch to Jenkins 
for a test run against current trunk.

Thanks again!


 Remote job submit from windows to a linux hadoop cluster fails due to wrong 
 classpath
 -

 Key: MAPREDUCE-5655
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5655
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, job submission
Affects Versions: 2.2.0
 Environment: Client machine is a Windows 7 box, with Eclipse
 Remote: there is a multi node hadoop cluster, installed on Ubuntu boxes (any 
 linux)
Reporter: Attila Pados

 I was trying to run a java class on my client, windows 7 developer 
 environment, which submits a job to the remote Hadoop cluster, initiates a 
 mapreduce there, and then downloads the results back to the local machine.
 General use case is to use hadoop services from a web application installed 
 on a non-cluster computer, or as part of a developer environment.
 The problem was that the ApplicationMaster's startup shell script 
 (launch_container.sh) was generated with a wrong CLASSPATH entry. Together with 
 the java process call at the bottom of the file, these entries were generated 
 in Windows style, using % as the shell variable marker and ; as the CLASSPATH 
 delimiter.
 I tracked down the root cause and found that the MrApps.java and 
 YarnRunner.java classes create these entries and pass them forward to the 
 ApplicationMaster, assuming that the OS that runs these classes will match 
 the one running the ApplicationMaster. That is not the case: they run in 2 
 different JVMs, possibly on different OSes, and the strings are generated 
 based on the client/submitter side's OS.
 I made some workaround changes to these 2 files so I could launch my job; 
 however, there may be more problems ahead.
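
 The delimiter half of the problem can be illustrated with a short sketch.  A value such as java.io.File.pathSeparator reflects the JVM that evaluates it, which here is the submitting client; passing the target OS explicitly avoids that.  ClasspathJoin and TargetOs are hypothetical names for illustration and are not taken from the Hadoop code base.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the MRApps/YARNRunner implementation.
final class ClasspathJoin {

    enum TargetOs { LINUX, WINDOWS }

    /** Joins entries with the delimiter of the OS that will run the script. */
    static String join(List<String> entries, TargetOs target) {
        String sep = (target == TargetOs.WINDOWS) ? ";" : ":";
        return String.join(sep, entries);
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList("conf", "lib/app.jar");
        // The delimiter follows the cluster OS, not the submitting client.
        System.out.println(join(entries, TargetOs.LINUX));
        System.out.println(join(entries, TargetOs.WINDOWS));
    }
}
```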



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Reopened] (MAPREDUCE-5606) JobTracker blocked for DFSClient: Failed recovery attempt

2013-11-19 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reopened MAPREDUCE-5606:
--

  Assignee: (was: firegun)

I'm reopening this.  There is an actual bug here (holding a global lock in the 
JT while doing I/O).  Despite the config workaround I described, I don't think 
we can really call it resolved.

What I'm not sure about is if this is a duplicate of MAPREDUCE-1144.  If anyone 
on that issue can tell, then we can close this as duplicate.
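
To illustrate the class of bug (not the actual JobTracker code, whose locking is more involved), here is a minimal sketch.  LockScope, SlowIo and the method names are hypothetical.  The first flush method performs I/O while holding the shared lock, so a hung remote call blocks every other caller; the second copies the state under the lock and does the I/O after releasing it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of lock scope around I/O; not JobTracker code.
final class LockScope {

    /** Stand-in for a remote call that may hang, e.g. a read from a dead node. */
    interface SlowIo {
        void write(List<String> snapshot);
    }

    private final ReentrantLock globalLock = new ReentrantLock();
    private final List<String> pendingJobs = new ArrayList<>();

    void addJob(String id) {
        globalLock.lock();
        try {
            pendingJobs.add(id);
        } finally {
            globalLock.unlock();
        }
    }

    /** Problematic shape: every other caller waits until the I/O returns. */
    void flushHoldingLock(SlowIo io) {
        globalLock.lock();
        try {
            io.write(pendingJobs);
        } finally {
            globalLock.unlock();
        }
    }

    /** Safer shape: copy state under the lock, then do the I/O unlocked. */
    void flushOutsideLock(SlowIo io) {
        List<String> snapshot;
        globalLock.lock();
        try {
            snapshot = new ArrayList<>(pendingJobs);
        } finally {
            globalLock.unlock();
        }
        io.write(snapshot);
    }

    /** True while any thread holds the global lock. */
    boolean isLocked() {
        return globalLock.isLocked();
    }
}
```

The copy costs some memory, but it bounds the time the lock is held by local work only, which is what keeps the web UI and job listing responsive when a remote node hangs.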

 JobTracker blocked for DFSClient: Failed recovery attempt
 -

 Key: MAPREDUCE-5606
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5606
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1.0.3
 Environment: centos 5.8  jdk 1.7 
Reporter: firegun
Priority: Critical

 When a datanode crashes, the server still responds to ping, but RPC calls 
 fail and ssh login is not possible. The JobTracker may then request a block 
 from this datanode.
 When that happens, the JobTracker stops working: the web UI is unresponsive, 
 hadoop job -list does not work either, and the JobTracker logs no other info.
 We then need to restart the datanode.
 After that the JobTracker works again, but the TaskTracker count drops to zero, 
 so we need to run: hadoop mradmin -refreshNodes
 The JobTracker then begins to add TaskTrackers again, but very slowly.
 This problem occurred 5 times in 2 weeks.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (MAPREDUCE-5616) MR Client-AppMaster RPC max retries on socket timeout is too high.

2013-11-08 Thread Chris Nauroth (JIRA)
Chris Nauroth created MAPREDUCE-5616:


 Summary: MR Client-AppMaster RPC max retries on socket timeout is 
too high.
 Key: MAPREDUCE-5616
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5616
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0, 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth


MAPREDUCE-3811 introduced a separate config key for overriding the max retries 
applied to RPC connections from the MapReduce Client to the MapReduce 
Application Master.  This was done to make failover from the AM to the 
MapReduce History Server faster in the event that the AM completes while the 
client thinks it's still running.  However, the RPC client uses a separate 
setting for socket timeouts, and this one is not overridden.  The default for 
this is 45 retries with a 20-second timeout on each retry.  This means that in 
environments subject to connection timeout instead of connection refused, the 
client waits 15 minutes for failover.
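
The arithmetic behind the 15 minutes is simply retries multiplied by the per-retry timeout.  A minimal sketch follows; FailoverWait is a hypothetical name, and the lower retry count of 3 is an arbitrary value for comparison, not a proposed default.

```java
import java.time.Duration;

// Hypothetical helper for illustration only.
final class FailoverWait {

    /** Worst case: every retry runs into the full socket timeout. */
    static Duration worstCase(int maxRetries, Duration perRetryTimeout) {
        return perRetryTimeout.multipliedBy(maxRetries);
    }

    public static void main(String[] args) {
        // Defaults cited above: 45 retries, 20 seconds each.
        System.out.println(worstCase(45, Duration.ofSeconds(20)).toMinutes() + " min");
        // Arbitrary lower retry count, for comparison only.
        System.out.println(worstCase(3, Duration.ofSeconds(20)).getSeconds() + " s");
    }
}
```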



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Re: Next releases

2013-11-08 Thread Chris Nauroth
Arun, what are your thoughts on test-only patches?  I know I've been
merging a lot of Windows test stabilization patches down to branch-2.2.
 These can't rightly be called blockers, but they do improve dev
experience, and there is no risk to product code.

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Fri, Nov 8, 2013 at 1:30 AM, Steve Loughran ste...@hortonworks.com wrote:

 On 8 November 2013 02:42, Arun C Murthy a...@hortonworks.com wrote:

  Gang,
 
   Thinking through the next couple of releases here, appreciate f/b.
 
   # hadoop-2.2.1
 
   I was looking through commit logs and there is a *lot* of content here
  (81 commits as on 11/7). Some are features/improvements and some are fixes
  - it's really hard to distinguish what is important and what isn't.
 
   I propose we start with a blank slate (i.e. blow away branch-2.2 and
  start fresh from a copy of branch-2.2.0) and then be very careful and
  meticulous about including only *blocker* fixes in branch-2.2. So, most of
  the content here comes via the next minor release (i.e. hadoop-2.3)
 
   In future, we continue to be *very* parsimonious about what gets into a
  patch release (major.minor.patch) - in general, these should be only
  *blocker* fixes or key operational issues.
 

 +1


 
   # hadoop-2.3
 
   I'd like to propose the following features for YARN/MR to make it into
  hadoop-2.3 and punt the rest to hadoop-2.4 and beyond:
   * Application History Server - This is happening in  a branch and is
  close; with it we can provide a reasonable experience for new frameworks
  being built on top of YARN.
   * Bug-fixes in RM Restart
   * Minimal support for long-running applications (e.g. security) via
  YARN-896
 

 +1 -the complete set isn't going to make it, but I'm sure we can identify
 the key ones



   * RM Fail-over via ZKFC
   * Anything else?
 
   HDFS???
 
 

 - If I had the time, I'd like to do some work on the HADOOP-9361
   filesystem spec & tests -this is mostly some specification, the basis of a
   better test framework for newer FS tests, and some more tests, with a
   couple of minor changes to some of the FS code, mainly in terms of
   tightening some of the exceptions thrown (IOE -> EOF)

 otherwise:

 - I'd like the hadoop-openstack JAR in; it's already in branch-2 so
   it's a matter of ensuring testing during the release against as many
   providers as possible.
 - There are a fair few JIRAs about updating versions of dependencies
   -the S3 JetS3t update went in this week, but there are more, as well as
   cruft in the POMs which shows up downstream. I think we could update the
   low-risk dependencies (test-time, log4j, &c), while avoiding those we know
   will be trouble (jetty). This may seem minor but it does make a big diff to
   the downstream projects.





[jira] [Created] (MAPREDUCE-5604) TestMRAMWithNonNormalizedCapabilities fails on Windows due to exceeding max path length

2013-10-31 Thread Chris Nauroth (JIRA)
Chris Nauroth created MAPREDUCE-5604:


 Summary: TestMRAMWithNonNormalizedCapabilities fails on Windows 
due to exceeding max path length
 Key: MAPREDUCE-5604
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5604
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, test
Affects Versions: 2.2.0, 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor


The test uses the full class name as a component of the 
{{yarn.nodemanager.local-dirs}} setting for a {{MiniMRYarnCluster}}.  This 
causes container launch to fail when trying to access files at a path longer 
than the maximum of 260 characters.
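
One way to see how much of the 260-character budget the directory name consumes is sketched below.  The names are assumptions: PathBudget is a hypothetical helper, the base directory is made up, and java.util.concurrent.ConcurrentHashMap merely stands in for the test class.  Using the simple class name is shown as one possible way to shorten the path, not necessarily the fix that will be committed.

```java
// Hypothetical helper for illustration; not part of the Hadoop test code.
final class PathBudget {

    /** Maximum path length cited in the issue description. */
    static final int MAX_PATH = 260;

    /** Directory named after the test class, fully qualified or simple. */
    static String testDir(String base, Class<?> testClass, boolean useSimpleName) {
        String name = useSimpleName ? testClass.getSimpleName() : testClass.getName();
        return base + "/" + name;
    }

    /** Characters left for nested container directories and file names. */
    static int headroom(String dir) {
        return MAX_PATH - dir.length();
    }

    public static void main(String[] args) {
        String base = "target/test-dirs"; // made-up base directory
        Class<?> c = java.util.concurrent.ConcurrentHashMap.class; // stand-in test class
        System.out.println(headroom(testDir(base, c, false)));
        System.out.println(headroom(testDir(base, c, true)));
    }
}
```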



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Re: About documentation in apache hadoop site

2013-10-25 Thread Chris Nauroth
Hello Yoonmin,

Thank you for finding the typo and reporting it!  I recommend filing an
issue in jira to track it.  If you're interested in contributing a patch to
fix it, this wiki page describes the process:

http://wiki.apache.org/hadoop/HowToContribute

The cluster setup documentation is sourced from the following file in the
codebase:

hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Thu, Oct 24, 2013 at 9:31 AM, Yoonmin Nam rony...@dgist.ac.kr wrote:

 Hi, I am Yoonmin from DGIST, South Korea.



 I found some typos in

 http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html



 In the Hadoop Startup content,



 Run a script to start DataNodes on all slaves:

 $ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script
 hdfs start datanode



 Should be modified as

 $ $HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script
 hdfs start datanode

 (hadoop-daemon.sh is for the master; hadoop-daemons.sh is for the slaves, when
 every slave address is defined in the slaves file)



 Also, "Run a script to start NodeManagers on all slaves" has the same typo.

 So, it should be modified as

 $ $HADOOP_YARN_HOME/sbin/yarn-daemons.sh --config $HADOOP_CONF_DIR start
 nodemanager



 The Hadoop shutdown section also needs to be modified for the above two scripts.





 Thanks





[jira] [Resolved] (MAPREDUCE-5588) TaskTrackers get killed by JettyBugMonitor because of incredibly high cpu usage

2013-10-17 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved MAPREDUCE-5588.
--

Resolution: Duplicate
  Assignee: Chris Nauroth

Hello, [~moian].

There is a known bug in Jetty that can cause it to consume a ton of CPU and not 
make any real progress.  As a workaround, the Hadoop code monitors for this 
condition and kills the tasktracker process if it finds Jetty consuming too 
much CPU.  (It's better to kill the tasktracker outright than leave it running 
in an unresponsive state.)  This monitoring thread was added in MAPREDUCE-3184. 
 If you review that issue, you can find more background information on 
configuration and tuning.
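
As a rough illustration of the monitoring idea (not the actual JettyBugMonitor logic), a threshold check over a sampling interval could look like the sketch below.  CpuThresholdCheck and the numbers in main are hypothetical; the real threshold comes from configuration, as the mapred.tasktracker.jetty.cpu.threshold.fatal property in the log message suggests.

```java
// Hypothetical sketch; not the JettyBugMonitor implementation.
final class CpuThresholdCheck {

    /** CPU time used over an interval, as a percentage of wall-clock time. */
    static double cpuPercent(long cpuNanosDelta, long wallNanosDelta) {
        if (wallNanosDelta <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return 100.0 * cpuNanosDelta / wallNanosDelta;
    }

    /** True when the measured usage is above the configured fatal threshold. */
    static boolean shouldAbort(double percent, double fatalThresholdPercent) {
        return percent > fatalThresholdPercent;
    }

    public static void main(String[] args) {
        double percent = cpuPercent(950L, 1000L); // made-up sample values
        System.out.println(shouldAbort(percent, 90.0)); // 90.0 is a made-up threshold
    }
}
```

Rejecting a non-positive interval guards against dividing by zero or by a negative elapsed time, either of which would produce a meaningless percentage.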

The bug is set to be fixed in Jetty version 6.1.27, but we've been waiting a 
long time on that release.  We also can't easily jump to a new major version of 
Jetty due to API backwards-compatibility issues.  MAPREDUCE-2980 contains 
ongoing discussion about our upgrade plan.  As a workaround, there are also 
patches available that you can apply on top of Jetty 6.1.26.

I'm going to resolve this as a duplicate of MAPREDUCE-2980.

 TaskTrackers get killed by JettyBugMonitor because of incredibly high cpu 
 usage
 ---

 Key: MAPREDUCE-5588
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5588
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
Affects Versions: 1.1.2
Reporter: Anthony MOI
Assignee: Chris Nauroth
  Labels: cpu-usage, jetty, tasktracker

 We are running a little cluster with 10 servers running task trackers. All of 
 them are getting killed randomly with the following message
 {quote}
 2013-10-17 11:32:31,037 FATAL org.apache.hadoop.mapred.JettyBugMonitor: 
 
 Jetty CPU usage: 120093277.1%. This is greater than the fatal threshold 
 mapred.tasktracker.jetty.cpu.threshold.fatal. Aborting JVM.
 
 2013-10-17 11:32:31,039 INFO org.apache.hadoop.mapred.TaskTracker: 
 SHUTDOWN_MSG: 
 /
 SHUTDOWN_MSG: Shutting down TaskTracker
 /
 {quote}
 Every time, the message reports a CPU usage above 120M%. Everything had been 
 running for a while (since the 1.1.2 release) without any problems, and it 
 started just like that.
 Any idea what could cause this?



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (MAPREDUCE-5546) mapred.cmd on Windows set HADOOP_OPTS incorrectly

2013-10-14 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved MAPREDUCE-5546.
--

   Resolution: Fixed
Fix Version/s: 2.2.1
   3.0.0

I've committed this to trunk, branch-2, and branch-2.2.  Chuan, thank you for 
the patch.

 mapred.cmd on Windows set HADOOP_OPTS incorrectly
 -

 Key: MAPREDUCE-5546
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5546
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Chuan Liu
Assignee: Chuan Liu
 Fix For: 3.0.0, 2.2.1

 Attachments: MAPREDUCE-5546-trunk.patch


 The mapred command on Windows does not set HADOOP_OPTS correctly. As a 
 result, some options and settings are missing from the final command, and 
 some desired behavior is lost. One example is the log file setting: even if 
 one sets HADOOP_ROOT_LOGGER to DRFA, there is no history server log at 
 HADOOP_LOGFILE.





Re: [VOTE] Release Apache Hadoop 2.2.0

2013-10-10 Thread Chris Nauroth
+1 non-binding

I verified the checksum and signature.  I deployed the tarball to a small
cluster of Ubuntu VMs: 1 * NameNode, 1 * ResourceManager, 2 * DataNode, 2 *
NodeManager, 1 * SecondaryNameNode.  I ran a few HDFS commands and sample
MapReduce jobs.  I verified that the 2NN can take a checkpoint
successfully.  Everything worked as expected.

The outcome of the recent discussions on HDFS symlinks was that we need to
disable the feature in this release.  Just to be certain that this patch
took, I wrote a small client to call FileSystem.createSymlink and tried to
run it in my 2.2.0 cluster.  It threw UnsupportedOperationException, which
is the expected behavior.

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Thu, Oct 10, 2013 at 10:18 AM, Bikas Saha bi...@hortonworks.com wrote:

 +1 (non binding)

 -Original Message-
 From: Arpit Gupta [mailto:ar...@hortonworks.com]
 Sent: Thursday, October 10, 2013 10:06 AM
 To: common-...@hadoop.apache.org
 Cc: hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org;
 mapreduce-dev@hadoop.apache.org
 Subject: Re: [VOTE] Release Apache Hadoop 2.2.0

 +1 (non binding)

 Ran secure and non secure multi node clusters and tested HA and RM
 recovery tests.

 --
 Arpit Gupta
 Hortonworks Inc.
 http://hortonworks.com/

 On Oct 7, 2013, at 12:00 AM, Arun C Murthy a...@hortonworks.com wrote:

  Folks,
 
  I've created a release candidate (rc0) for hadoop-2.2.0 that I would
 like to get released - this release fixes a small number of bugs and some
 protocol/api issues which should ensure they are now stable and will not
 change in hadoop-2.x.
 
  The RC is available at:
  http://people.apache.org/~acmurthy/hadoop-2.2.0-rc0
  The RC tag in svn is here:
  http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.2.0-rc0
 
  The maven artifacts are available via repository.apache.org.
 
  Please try the release and vote; the vote will run for the usual 7 days.
 
  thanks,
  Arun
 
  P.S.: Thanks to Colin, Andrew, Daryn, Chris and others for helping nail
 down the symlinks-related issues. I'll release note the fact that we have
 disabled it in 2.2. Also, thanks to Vinod for some heavy-lifting on the
 YARN side in the last couple of weeks.
 
 
 
 
 
  --
  Arun C. Murthy
  Hortonworks Inc.
  http://hortonworks.com/
 
 
 
  --
  CONFIDENTIALITY NOTICE
  NOTICE: This message is intended for the use of the individual or
  entity to which it is addressed and may contain information that is
  confidential, privileged and exempt from disclosure under applicable
  law. If the reader of this message is not the intended recipient, you
  are hereby notified that any printing, copying, dissemination,
  distribution, disclosure or forwarding of this communication is
  strictly prohibited. If you have received this communication in error,
  please contact the sender immediately and delete it from your system.
 Thank You.




Re: [VOTE] Release Apache Hadoop 2.1.1-beta

2013-09-24 Thread Chris Nauroth
Update: HDFS-5228 has been resolved.  It was committed to branch-2.1-beta,
so I think there was an assumption that this would warrant a new RC.  (If
that's not the case, then we ought to pull HDFS-5228 back out of
branch-2.1-beta to avoid confusion.)

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Tue, Sep 24, 2013 at 8:03 AM, Alejandro Abdelnur t...@cloudera.com wrote:

 ping


 On Tue, Sep 24, 2013 at 2:36 AM, Alejandro Abdelnur t...@cloudera.com
 wrote:

  The vote for the 2.1.1-beta release is closing tonight. While we had quite a
  few +1s, it seems we need to address the following before doing a release:
 
  symlink discussion: get a concrete and explicit understanding of what we
  will do and in what release(s).
 
  Also, the following JIRAs seem nasty enough to require a new RC:
 
  https://issues.apache.org/jira/browse/HDFS-5225 (no patch avail)
  https://issues.apache.org/jira/browse/HDFS-5228 (patch avail)
  https://issues.apache.org/jira/browse/YARN-1089 (patch avail)
  https://issues.apache.org/jira/browse/MAPREDUCE-5529 (patch avail)
 
  I won't -1 the release but I'm un-casting my vote as I think we should
  address these things before.
 
  Thanks.
 
  Alejandro
 
 
  On Tue, Sep 24, 2013 at 1:49 AM, Suresh Srinivas sur...@hortonworks.com
 wrote:
 
  +1 (binding)
 
 
  Verified the signatures and hashes for both src and binary tars. Built the
  binary distribution and the documentation from the source. Started a
  single-node cluster and tested the following:
 
  # Started HDFS cluster, verified the hdfs CLI commands such as ls, copying
  data back and forth, verified namenode webUI etc.
 
  # Ran some tests such as sleep job, TestDFSIO, NNBench etc.
 
 
 
 
  On Mon, Sep 16, 2013 at 11:38 PM, Arun C Murthy a...@hortonworks.com
  wrote:
 
   Folks,
  
   I've created a release candidate (rc0) for hadoop-2.1.1-beta that I would
   like to get released - this release fixes a number of bugs on top of
   hadoop-2.1.0-beta as a result of significant amounts of testing.
  
   If things go well, this might be the last of the *beta* releases of
   hadoop-2.x.
  
   The RC is available at:
   http://people.apache.org/~acmurthy/hadoop-2.1.1-beta-rc0
   The RC tag in svn is here:
   http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.1.1-beta-rc0
  
   The maven artifacts are available via repository.apache.org.
  
   Please try the release and vote; the vote will run for the usual 7 days.
  
   thanks,
   Arun
  
  
   --
   Arun C. Murthy
   Hortonworks Inc.
   http://hortonworks.com/
  
  
  
  
 
 
 
  --
  http://hortonworks.com/download/
 
 
 
 
 
  --
  Alejandro
 



 --
 Alejandro




[jira] [Resolved] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-23 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved MAPREDUCE-5508.
--

  Resolution: Fixed
   Fix Version/s: 1.3.0
  1-win
Target Version/s: 1-win, 1.3.0
Hadoop Flags: Reviewed

I have committed this to branch-1 and branch-1-win.  Xi, thank you for 
providing a patch for this tricky issue.  Sandy, thank you for help with code 
reviews.

 JobTracker memory leak caused by unreleased FileSystem objects in 
 JobInProgress#cleanupJob
 --

 Key: MAPREDUCE-5508
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5508
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1-win, 1.2.1
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Critical
 Fix For: 1-win, 1.3.0

 Attachments: MAPREDUCE-5508.1.patch, MAPREDUCE-5508.2.patch, 
 MAPREDUCE-5508.3.patch, MAPREDUCE-5508.patch


 MAPREDUCE-5351 fixed a memory leak but introduced another FileSystem 
 object (see tempDirFs) that is not properly released.
 {code} JobInProgress#cleanupJob()
   void cleanupJob() {
     ...
     tempDirFs = jobTempDirPath.getFileSystem(conf);
     CleanupQueue.getInstance().addToQueue(
         new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
     ...
     if (tempDirFs != fs) {
       try {
         fs.close();
       } catch (IOException ie) {
         ...
   }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (MAPREDUCE-5498) maven Junit dependency should be test only

2013-09-14 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved MAPREDUCE-5498.
--

Resolution: Duplicate

 maven Junit dependency should be test only
 --

 Key: MAPREDUCE-5498
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5498
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Steve Loughran
Assignee: André Kelpe
Priority: Minor
 Attachments: HADOOP-9935-001.patch


 The maven dependencies for the YARN artifacts don't restrict to test time, so 
 it gets picked up by all downstream users.



[jira] [Reopened] (MAPREDUCE-5498) maven Junit dependency should be test only

2013-09-14 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reopened MAPREDUCE-5498:
--


 maven Junit dependency should be test only
 --

 Key: MAPREDUCE-5498
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5498
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Steve Loughran
Assignee: André Kelpe
Priority: Minor
 Attachments: HADOOP-9935-001.patch


 The maven dependencies for the YARN artifacts don't restrict to test time, so 
 it gets picked up by all downstream users.



[jira] [Resolved] (MAPREDUCE-5470) LocalJobRunner does not work on Windows.

2013-08-23 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved MAPREDUCE-5470.
--

   Resolution: Fixed
Fix Version/s: 2.1.1-beta
   3.0.0

I committed to branch-2.1-beta.  Thanks again, Sandy!

 LocalJobRunner does not work on Windows.
 

 Key: MAPREDUCE-5470
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5470
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Chris Nauroth
Assignee: Sandy Ryza
 Fix For: 3.0.0, 2.1.1-beta

 Attachments: MAPREDUCE-5470-branch-2.1-beta.patch, 
 MAPREDUCE-5470.patch


 {{LocalJobRunner#getLocalTaskDir}} creates a directory that is unique to the 
 task ID.  The logic of this method concatenates the local job dir and a 
 task-specific path, but one of the arguments is a {{Path}} with a scheme, so 
 the final result has file: embedded in it.  This works on Linux, but the 
 ':' is an invalid character in a file name on Windows.
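The failure mode described above can be reproduced with plain JDK classes. The following is only a minimal sketch: the class and method names (LocalTaskDirSketch, naiveJoin, schemeFreeJoin) and the sample paths are hypothetical and are not taken from LocalJobRunner or from the committed patch. It shows how joining a local directory with a scheme-qualified path leaves "file:" in the middle of the result, and how using only the path component avoids the ':'.

```java
import java.net.URI;

final class LocalTaskDirSketch {
    // Naive join: the task path is rendered together with its scheme,
    // so "file:" ends up embedded in the middle of the resulting path.
    static String naiveJoin(String localJobDir, URI taskPath) {
        return localJobDir + "/" + taskPath;
    }

    // Scheme-free join: only the path component of the URI is appended.
    static String schemeFreeJoin(String localJobDir, URI taskPath) {
        String p = taskPath.getPath();
        return localJobDir + (p.startsWith("/") ? p : "/" + p);
    }

    public static void main(String[] args) {
        // Hypothetical task-specific path that carries a "file" scheme.
        URI taskPath = URI.create("file:/attempt_0001_m_000000_0");
        System.out.println(naiveJoin("/tmp/localJob", taskPath));
        System.out.println(schemeFreeJoin("/tmp/localJob", taskPath));
    }
}
```

The first result contains a ':' inside a path component, which Linux file systems accept and Windows rejects; the second does not.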



[jira] [Created] (MAPREDUCE-5470) LocalJobRunner does not work on Windows.

2013-08-20 Thread Chris Nauroth (JIRA)
Chris Nauroth created MAPREDUCE-5470:


 Summary: LocalJobRunner does not work on Windows.
 Key: MAPREDUCE-5470
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5470
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Chris Nauroth


{{LocalJobRunner#getLocalTaskDir}} creates a directory that is unique to the 
task ID.  The logic of this method concatenates the local job dir and a 
task-specific path, but one of the arguments is a {{Path}} with a scheme, so 
the final result has file: embedded in it.  This works on Linux, but the ':' 
is an invalid character in a file name on Windows.



[jira] [Created] (MAPREDUCE-5460) MR AppMaster command options does not replace @taskid@ with the current task ID.

2013-08-14 Thread Chris Nauroth (JIRA)
Chris Nauroth created MAPREDUCE-5460:


 Summary: MR AppMaster command options does not replace @taskid@ 
with the current task ID.
 Key: MAPREDUCE-5460
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5460
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Chris Nauroth


The description of {{yarn.app.mapreduce.am.command-opts}} in mapred-default.xml 
states that occurrences of {{@taskid@}} will be replaced by the current task 
ID.  This substitution is not happening.
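For reference, the documented behavior amounts to a plain token replacement. The sketch below is an illustration only: TaskIdSubstitutionSketch and substituteTaskId are hypothetical names, the option string and task ID are made up, and none of it is the MR AppMaster's actual code.

```java
final class TaskIdSubstitutionSketch {
    // Placeholder token named in the mapred-default.xml description.
    static final String PLACEHOLDER = "@taskid@";

    // Replaces every occurrence of the placeholder with the given task ID.
    static String substituteTaskId(String commandOpts, String taskId) {
        return commandOpts.replace(PLACEHOLDER, taskId);
    }

    public static void main(String[] args) {
        // Illustrative option string; the GC log path is an assumption.
        String opts = "-Xmx1024m -Xloggc:/tmp/@taskid@.gc";
        System.out.println(substituteTaskId(opts, "attempt_0001_m_000000_0"));
    }
}
```

With the bug reported here, the literal text @taskid@ would be left in the command line instead of being replaced.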



[jira] [Resolved] (MAPREDUCE-5405) Job recovery can fail if task log directory symlink from prior run still exists

2013-07-22 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved MAPREDUCE-5405.
--

   Resolution: Fixed
Fix Version/s: 1.3.0
   1-win
 Hadoop Flags: Reviewed

Thanks, Arun.  I have committed this to branch-1 and branch-1-win.

 Job recovery can fail if task log directory symlink from prior run still 
 exists
 ---

 Key: MAPREDUCE-5405
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5405
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 1-win, 1.3.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 1-win, 1.3.0

 Attachments: MAPREDUCE-5405.branch-1.1.patch


 During recovery, the task attempt log dir symlink from the prior run might 
 still exist.  If it does, then the recovered attempt will fail while trying 
 to create a symlink at that path.
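One way to make symlink creation tolerate a leftover link is to remove whatever sits at the link path first. The sketch below uses only java.nio.file and is an assumption about the general approach, not a description of the attached patch; RecoverySymlinkSketch, recreateSymlink, runDemo and the directory names are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class RecoverySymlinkSketch {
    // Replaces whatever is at 'link' (for example a symlink left behind by
    // a prior run) with a fresh symlink pointing at 'target'.
    static void recreateSymlink(Path target, Path link) {
        try {
            // Deletes the link itself, not the directory it points to.
            Files.deleteIfExists(link);
            // Without the delete above, this call throws
            // FileAlreadyExistsException when the stale link is present.
            Files.createSymbolicLink(link, target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Simulates a first run followed by a recovered run; returns the final
    // target of the link as a string.
    static String runDemo() {
        try {
            Path dir = Files.createTempDirectory("recovery-sketch");
            Path firstRun = Files.createDirectory(dir.resolve("attempt-run1"));
            Path recovered = Files.createDirectory(dir.resolve("attempt-run2"));
            Path link = dir.resolve("attempt-logs");
            recreateSymlink(firstRun, link);   // prior run
            recreateSymlink(recovered, link);  // recovery: stale link exists
            return Files.readSymbolicLink(link).toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

The sketch assumes a platform where the current user may create symbolic links, and it would fail if a non-empty directory occupied the link path.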



[jira] [Resolved] (MAPREDUCE-5406) Improve logging around Task Tracker exiting with JVM manager inconsistent state

2013-07-20 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved MAPREDUCE-5406.
--

   Resolution: Fixed
Fix Version/s: 1.3.0
   1-win

I committed this to branch-1 and branch-1-win.  Chelsey, thank you for 
contributing this patch.

 Improve logging around Task Tracker exiting with JVM manager inconsistent 
 state
 ---

 Key: MAPREDUCE-5406
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5406
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 1-win, 1.3.0
Reporter: Chelsey Chang
Assignee: Chelsey Chang
 Fix For: 1-win, 1.3.0

 Attachments: hadoop-tasktracker-RD00155D61582F-short.log, 
 MAPREDUCE-5406.branch-1-win.1.patch


 Looks like we are reaching a JVM manager inconsistent state, which causes the 
 TT to crash:
 {code}
 2013-06-09 06:41:11,250 FATAL org.apache.hadoop.mapred.JvmManager: 
 Inconsistent state!!! JVM Manager reached an unstable state while reaping a 
 JVM for task: attempt_201306080400_104812_m_01_0 Number of active JVMs:8
   JVMId jvm_201306080400_104517_m_1331138312 #Tasks ran: 0 Currently busy? 
 true Currently running: attempt_201306080400_104517_m_01_0
   JVMId jvm_201306080400_104641_m_-1631395161 #Tasks ran: 0 Currently busy? 
 true Currently running: attempt_201306080400_104641_m_00_0
   JVMId jvm_201306080400_104494_m_-1702464703 #Tasks ran: 0 Currently busy? 
 true Currently running: attempt_201306080400_104494_m_00_0
   JVMId jvm_201306080400_104784_m_1407576088 #Tasks ran: 0 Currently busy? 
 true Currently running: attempt_201306080400_104784_m_00_0
   JVMId jvm_201306080400_104530_m_186665365 #Tasks ran: 0 Currently busy? 
 true Currently running: attempt_201306080400_104530_m_00_0
   JVMId jvm_201306080400_104589_m_-1080246077 #Tasks ran: 0 Currently busy? 
 true Currently running: attempt_201306080400_104589_m_00_0
   JVMId jvm_201306080400_104674_m_830017814 #Tasks ran: 0 Currently busy? 
 true Currently running: attempt_201306080400_104674_m_00_0
   JVMId jvm_201306080400_104719_m_-226910128 #Tasks ran: 0 Currently busy? 
 true Currently running: attempt_201306080400_104719_m_00_0. Aborting. 
 2013-06-09 06:41:11,250 INFO org.apache.hadoop.mapred.TaskTracker: 
 SHUTDOWN_MSG: 
 {code}
 Although this causes the TT to crash, the error is rare and recoverable, so 
 the priority of the issue is not high.
 However, this does look like a bug in the JVM manager state machine. I'm 
 guessing there is some race condition that we're hitting.
 (Logs attached)



[jira] [Created] (MAPREDUCE-5405) Job recovery can fail if task log directory symlink from prior run still exists

2013-07-19 Thread Chris Nauroth (JIRA)
Chris Nauroth created MAPREDUCE-5405:


 Summary: Job recovery can fail if task log directory symlink from 
prior run still exists
 Key: MAPREDUCE-5405
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5405
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 1-win, 1.3.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth


During recovery, the task attempt log dir symlink from the prior run might 
still exist.  If it does, then the recovered attempt will fail while trying to 
create a symlink at that path.




[jira] [Resolved] (MAPREDUCE-5391) TestNonLocalJobJarSubmission fails on Windows due to missing classpath entries

2013-07-17 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved MAPREDUCE-5391.
--

   Resolution: Fixed
Fix Version/s: 1-win

I committed this to branch-1-win.

 TestNonLocalJobJarSubmission fails on Windows due to missing classpath entries
 --

 Key: MAPREDUCE-5391
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5391
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 1-win
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 1-win

 Attachments: MAPREDUCE-5391.1.patch


 This test works by having the mapper check all classpath entries loaded by 
 the classloader.  On Windows, the classpath is packed into an intermediate 
 jar file with a manifest containing the classpath to work around command line 
 length limitation.  The test needs to be updated to unpack the intermediate 
 jar file and read the manifest when running on Windows.
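The manifest round trip described above can be sketched with java.util.jar alone. This is an illustration under assumptions: ManifestClasspathSketch and its methods are hypothetical names, the entries are made-up relative paths, and the code is not taken from the test or from the Hadoop code that builds the intermediate jar.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

final class ManifestClasspathSketch {
    // Writes a jar whose only content is a manifest with a Class-Path
    // attribute listing the given entries, separated by spaces.
    static Path writeClasspathJar(List<String> entries) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // The version attribute must be set or the manifest is written empty.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(Attributes.Name.CLASS_PATH, String.join(" ", entries));
        try {
            Path jar = Files.createTempFile("classpath-", ".jar");
            try (OutputStream out = Files.newOutputStream(jar);
                 JarOutputStream jos = new JarOutputStream(out, manifest)) {
                // No further entries: the constructor already wrote the manifest.
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the Class-Path attribute back and splits it on whitespace.
    static List<String> readClasspath(Path jar) {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Manifest manifest = jarFile.getManifest();
            String cp = manifest == null ? null
                : manifest.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
            if (cp == null || cp.trim().isEmpty()) {
                return Collections.emptyList();
            }
            return Arrays.asList(cp.trim().split("\\s+"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path jar = writeClasspathJar(Arrays.asList("lib/a.jar", "lib/b.jar"));
        System.out.println(readClasspath(jar));
    }
}
```

A test running on Windows could apply the same read step to the intermediate jar to recover the real classpath entries before checking them.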



[jira] [Created] (MAPREDUCE-5391) TestNonLocalJobJarSubmission fails on Windows due to missing classpath entries

2013-07-15 Thread Chris Nauroth (JIRA)
Chris Nauroth created MAPREDUCE-5391:


 Summary: TestNonLocalJobJarSubmission fails on Windows due to 
missing classpath entries
 Key: MAPREDUCE-5391
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5391
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 1-win
Reporter: Chris Nauroth
Assignee: Chris Nauroth


This test works by having the mapper check all classpath entries loaded by the 
classloader.  On Windows, the classpath is packed into an intermediate jar file 
with a manifest containing the classpath to work around command line length 
limitation.  The test needs to be updated to unpack the intermediate jar file 
and read the manifest when running on Windows.



[jira] [Resolved] (MAPREDUCE-5371) TestProxyUserFromEnv#testProxyUserFromEnvironment failed caused by domains of windows users

2013-07-08 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved MAPREDUCE-5371.
--

Resolution: Fixed

+1 for the patch.  I committed this to branch-1-win.  Thank you, Xi.

 TestProxyUserFromEnv#testProxyUserFromEnvironment failed caused by domains of 
 windows users
 ---

 Key: MAPREDUCE-5371
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5371
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Minor
 Fix For: 1-win

 Attachments: MAPREDUCE-5371.patch


 The error message was:
 Error Message
 expected:[sijenkins-vm2]jenkins but was:[]jenkins
 Stacktrace
 at 
 org.apache.hadoop.security.TestProxyUserFromEnv.testProxyUserFromEnvironment(TestProxyUserFromEnv.java:45)
 The root cause of this failure is the domain used on Windows.



[jira] [Resolved] (MAPREDUCE-5349) TestClusterMapReduceTestCase and TestJobName fail on Windows in branch-2

2013-06-30 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved MAPREDUCE-5349.
--

  Resolution: Fixed
   Fix Version/s: 2.1.0-beta
Target Version/s: 2.1.0-beta
Hadoop Flags: Reviewed

I committed this to branch-2 and branch-2.1-beta.  Thanks again for the 
contribution, Chuan!

 TestClusterMapReduceTestCase and TestJobName fail on Windows in branch-2
 

 Key: MAPREDUCE-5349
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5349
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Chuan Liu
Assignee: Chuan Liu
Priority: Minor
 Fix For: 2.1.0-beta

 Attachments: MAPREDUCE-5349-branch-2.2.patch, 
 MAPREDUCE-5349-branch-2.3.patch, MAPREDUCE-5349-branch-2.patch


 The two unit tests fail because MiniMRCluster in branch-2 uses the test 
 class's full name, instead of the simple name as in trunk, to construct the 
 MiniMRCluster identifier. A full name in the identifier almost always leads 
 to a command script path longer than 260 characters, which causes 
 {{DefaultContainerExecutor.launchContainer()}} to throw an exception when 
 launching the container script.
 The exception looks like the following:
 {noformat}
 2013-06-24 09:45:03,060 WARN  [ContainersLauncher #0] 
 launcher.ContainerLaunch (ContainerLaunch.java:call(262)) - Failed to launch 
 container.
 java.io.IOException: Cannot launch container using script at path 
 C:/Users/chuanliu/AppData/Local/Temp/1/1372092295656/org.apache.hadoop.mapred.ClusterMapReduceTestCaseConfigurableMiniMRCluster_1106798455-localDir-nm-0_1/usercache/chuanliu/appcache/application_1372092193505_0001/container_1372092193505_0001_01_01/default_container_executor.cmd,
  because it exceeds the maximum supported path length of 260 characters.  
 Consider configuring shorter directories in yarn.nodemanager.local-dirs.
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:159)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:257)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:1)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {noformat}
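The difference between the two naming choices, and the length check quoted in the exception, can be sketched as follows. PathLengthSketch, clusterIdentifier and exceedsLimit are hypothetical names; the 260-character limit is the value quoted in the exception message, and treating "exceeds" as strictly greater than 260 is an assumption.

```java
final class PathLengthSketch {
    // Limit quoted in the exception message above.
    static final int MAX_PATH = 260;

    // Picks the identifier the way the description contrasts the branches:
    // fully qualified class name versus simple class name.
    static String clusterIdentifier(Class<?> testClass, boolean useSimpleName) {
        return useSimpleName ? testClass.getSimpleName() : testClass.getName();
    }

    // True when the path is longer than the limit (assumed strict comparison).
    static boolean exceedsLimit(String path) {
        return path.length() > MAX_PATH;
    }

    public static void main(String[] args) {
        Class<?> c = java.util.concurrent.ConcurrentHashMap.class;
        String full = clusterIdentifier(c, false);
        String simple = clusterIdentifier(c, true);
        System.out.println(full + " vs " + simple);
        System.out.println("saved characters: " + (full.length() - simple.length()));
    }
}
```

Because the identifier is one directory component of the container script path, every character saved there comes straight off the total path length.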



Re: Visual debugging tools for hadoop

2013-06-14 Thread Chris Nauroth
Hi Saikat,

You might want to investigate contributing on Apache Ambari, which has
features for visualization of jobs and end-to-end flows consisting of
multiple dependent jobs.

http://incubator.apache.org/ambari/

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Fri, Jun 14, 2013 at 8:20 AM, Saikat Kanjilal sxk1...@hotmail.com wrote:

 Hi Folks,
 I was wondering if anyone is currently working on or thinking about visual
 debugging tools for mapreduce jobs. I was thinking about starting an effort
 to build an end-to-end visual tool that shows all the steps in the
 mapreduce workflow, the data flows, and how variable contents change, to
 speed up debugging of jobs. Please ignore this if something like this
 already exists; if not, I'd love to collaborate with folks to build something.


 Regards



Re: Clarifications on MAPREDUCE-5183

2013-05-21 Thread Chris Nauroth
BTW, there is a handy helper function in Hadoop Common that can help you
with this: org.apache.hadoop.util.StringUtils.formatPercent.
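
To illustrate the kind of conversion under discussion (a 0.0-1.0 progress value shown as a 0-100 percentage), here is a minimal sketch that uses only the JDK. It deliberately does not call the Hadoop helper; ProgressPercentSketch and toPercent are hypothetical stand-ins, and the parameter order (fraction, decimal places) is an assumption made for the example.

```java
import java.util.Locale;

final class ProgressPercentSketch {
    // Hypothetical stand-in for a percent formatter: takes a fraction in
    // [0.0, 1.0] and renders it on a 0-100 scale followed by '%'.
    static String toPercent(double fraction, int decimalPlaces) {
        return String.format(Locale.ROOT, "%." + decimalPlaces + "f%%",
            fraction * 100.0);
    }

    public static void main(String[] args) {
        System.out.println(toPercent(0.0, 1)); // start of the reduce
        System.out.println(toPercent(1.0, 1)); // reduce complete
    }
}
```

With this scaling, a completed reduce is reported as 100.0% rather than 1.0%.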

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Tue, May 21, 2013 at 11:06 AM, maisnam ns maisnam...@gmail.com wrote:

 Hi Sandy,

 Sure, I would do exactly what you are suggesting.

 Regards
 Niranjan Singh


 On Tue, May 21, 2013 at 11:27 PM, Sandy Ryza sandy.r...@cloudera.com
 wrote:

  If you're planning to fix it, it would probably look prettiest to keep the
  percentage sign and have the numbers between 0 and 100.
 
  -Sandy
 
 
  On Tue, May 21, 2013 at 10:55 AM, maisnam ns maisnam...@gmail.com
 wrote:
 
   Thanks Sandy Ryza
  
  
  
  
   On Tue, May 21, 2013 at 11:20 PM, Sandy Ryza sandy.r...@cloudera.com
   wrote:
  
Hi Niranjan,
   
Your understanding is correct.
   
-Sandy
   
   
On Tue, May 21, 2013 at 1:02 AM, maisnam ns maisnam...@gmail.com
   wrote:
   
 Hi,

 I was looking into this issue but would be happy if someone could clarify
 some of my doubts.

 Is the issue related to the log snapshot given below:

 org.apache.hadoop.mapred.TaskTracker: Task attempt_201305211246_0001_r_01_0 is in commit-pending, task state:COMMIT_PENDING
 2013-05-21 12:48:51,058 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201305211246_0001_r_01_0 0.0% reduce  copy
 2013-05-21 12:48:51,769 DEBUG org.apache.hadoop.mapred.TaskTracker: Got heartbeatResponse from JobTracker with responseId: 27 and 2 actions
 2013-05-21 12:48:51,769 INFO org.apache.hadoop.mapred.TaskTracker: Received commit task action for attempt_201305211246_0001_r_00_0
 2013-05-21 12:48:51,769 INFO org.apache.hadoop.mapred.TaskTracker: Received commit task action for attempt_201305211246_0001_r_01_0
 2013-05-21 12:48:53,695 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201305211246_0001_r_01_0 1.0% reduce  reduce
 2013-05-21 12:48:53,777 INFO

 The issue says 'TaskTracker#reportProgress logging of 0.0-1.0 progress
 is followed by percent sign'.

 1. attempt_201305211246_0001_r_01_0 0.0% reduce
 2. attempt_201305211246_0001_r_01_0 1.0% reduce

 If my understanding is correct, the percent sign at the end of 0.0 and 1.0
 should be removed.

 Regards

 Niranjan Singh

   
  
 



Re: [VOTE] Plan to create release candidate for 0.23.8

2013-05-19 Thread Chris Nauroth
+1 (non-binding)

BTW, I left a comment on HDFS-4835 suggesting that you include HDFS-3180
for WebHDFS socket connect/read timeouts.  It's up to you.  (I'm voting +1
for the release plan either way.)

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Fri, May 17, 2013 at 7:25 PM, Eli Collins e...@cloudera.com wrote:

 +1

 On Friday, May 17, 2013, Thomas Graves wrote:

  Hello all,
 
  We've had a few critical issues come up in 0.23.7 that I think warrant a
  0.23.8 release. The main one is MAPREDUCE-5211.  There are a couple of
  other issues that I want to finish up and get in before we spin it.  Those
  include HDFS-3875, HDFS-4805, and HDFS-4835.  I think those are on track
  to finish up early next week.   So I hope to spin 0.23.8 soon after this
  vote completes.
 
  Please vote '+1' to approve this plan. Voting will close on Friday May
  24th at 2:00pm PDT.
 
  Thanks,
  Tom Graves
 
 


